Data Mining
CS 341, Spring 2007
Lecture 4: Data Mining Techniques (I)

Review:

- Information Retrieval
  - Similarity measures
  - Evaluation metrics: precision and recall
- Question Answering
- Web Search Engine
  - An application of IR
  - Related to web mining

Data Mining Techniques Outline

Goal: provide an overview of basic data mining techniques.

- Statistical
  - Point estimation
  - Models based on summarization
  - Bayes theorem
  - Hypothesis testing
  - Regression and correlation
- Similarity measures

Point Estimation

Point estimate: an estimate of a population parameter.
- May be made by calculating the parameter for a sample.
- May be used to predict a value for missing data.
- Example:
  - R contains 100 employees.
  - 99 have salary information.
  - The mean salary of these is $50,000.
  - Use $50,000 as the value of the remaining employee's salary. Is this a good idea?
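
A minimal Python sketch of this kind of mean imputation (the salary values here are hypothetical stand-ins for the 99 known values):

```python
# Point estimation by mean imputation: fill a missing value with the sample mean.
known_salaries = [48_000, 52_000, 50_000, 49_500, 50_500]  # hypothetical known values
estimate = sum(known_salaries) / len(known_salaries)
print(f"Estimated missing salary: ${estimate:,.0f}")  # sample mean as the point estimate
```

Whether this is a good idea depends on whether the missing value is likely to resemble the observed ones; a single atypical employee (say, an executive) would be badly served by the mean.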

Estimation Error

- Bias: the difference between the expected value of the estimator and the actual value of the parameter.
- Mean Squared Error (MSE): the expected value of the squared difference between the estimate and the actual value.
- Root Mean Squared Error (RMSE): the square root of the MSE.
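
The slide's formula images did not survive the transcript; in standard notation, the definitions just stated are:

$\mathrm{Bias}(\hat\theta) = E[\hat\theta] - \theta, \qquad \mathrm{MSE}(\hat\theta) = E[(\hat\theta - \theta)^2], \qquad \mathrm{RMSE}(\hat\theta) = \sqrt{\mathrm{MSE}(\hat\theta)}$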

Jackknife Estimate

- Jackknife estimate: an estimate of a parameter obtained by omitting one value from the set of observed values.
- Named after the jackknife, a "handy and useful tool."
- Used to reduce bias.
- Property: the jackknife estimator lowers the bias from order 1/n to order 1/n².

Jackknife Estimate

Definition:
- Divide the sample of size n into g groups of size m each, so n = mg (often m = 1 and g = n).
- Let $\hat\theta$ be the estimate computed from all n observations, and let $\hat\theta_{(j)}$ be the estimate obtained by ignoring the jth group.
- Let $\bar\theta$ be the average of the $\hat\theta_{(j)}$.
- The jackknife estimator is $\hat\theta_Q = g\hat\theta - (g-1)\bar\theta$, where $\theta$ is the parameter being estimated.

Jackknife Estimator: Example 1

Estimate of the mean for X = {x1, x2, x3}, n = 3, g = 3, m = 1:

$\hat\theta = \bar{x} = (x_1 + x_2 + x_3)/3$

$\hat\theta_{(1)} = (x_2 + x_3)/2, \quad \hat\theta_{(2)} = (x_1 + x_3)/2, \quad \hat\theta_{(3)} = (x_1 + x_2)/2$

$\bar\theta = (\hat\theta_{(1)} + \hat\theta_{(2)} + \hat\theta_{(3)})/3 = (x_1 + x_2 + x_3)/3$

$\hat\theta_Q = g\hat\theta - (g-1)\bar\theta = 3\hat\theta - 2\bar\theta = (x_1 + x_2 + x_3)/3$

In this case, the jackknife estimator is the same as the usual estimator.

Jackknife Estimator: Example 2

Estimate of the variance for X = {1, 4, 4}, n = 3, g = 3, m = 1:

$\hat\theta = \hat\sigma^2 = ((1-3)^2 + (4-3)^2 + (4-3)^2)/3 = 2$

$\hat\theta_{(1)} = ((4-4)^2 + (4-4)^2)/2 = 0, \quad \hat\theta_{(2)} = ((1-2.5)^2 + (4-2.5)^2)/2 = 2.25, \quad \hat\theta_{(3)} = 2.25$

$\bar\theta = (0 + 2.25 + 2.25)/3 = 1.5$

$\hat\theta_Q = g\hat\theta - (g-1)\bar\theta = 3(2) - 2(1.5) = 3$

In this case, the jackknife estimator is different from the usual estimator.
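
A minimal Python sketch of this computation (a generic leave-one-out jackknife, not code from the course):

```python
def jackknife(estimate, xs):
    """Jackknife with g = n groups of size m = 1."""
    g = len(xs)
    theta_hat = estimate(xs)                        # estimate from all observations
    leave_one_out = [estimate(xs[:j] + xs[j + 1:])  # theta_hat_(j): jth value omitted
                     for j in range(g)]
    theta_bar = sum(leave_one_out) / g
    return g * theta_hat - (g - 1) * theta_bar      # g*theta_hat - (g-1)*theta_bar

def biased_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

print(jackknife(biased_variance, [1, 4, 4]))  # 3.0, matching Example 2
```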

Jackknife Estimator: Example 2 (cont'd)

In general, applying the jackknife technique to the biased variance estimator

$\hat\sigma^2 = \sum_i (x_i - \bar{x})^2 / n$

yields the jackknife estimator

$s^2 = \sum_i (x_i - \bar{x})^2 / (n - 1)$

which is known to be unbiased for $\sigma^2$.

Maximum Likelihood Estimate (MLE)

- Obtain parameter estimates that maximize the probability that the sample data occurs for the specific model.
- The likelihood function is the joint probability of observing the sample data, obtained by multiplying the individual probabilities:

$L(\Theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \Theta)$

- Maximize L.

MLE Example

- A coin is tossed five times: {H, H, H, H, T}.
- Assuming a perfect coin with H and T equally likely, the likelihood of this sequence is (1/2)^5 = 0.03125.
- However, if the probability of an H is 0.8, then the likelihood is (0.8)^4(0.2) = 0.08192.

MLE Example (cont'd)

General likelihood formula, with $x_i = 1$ for heads and $x_i = 0$ for tails:

$L(p \mid x_1, \ldots, x_5) = \prod_{i=1}^{5} p^{x_i} (1-p)^{1-x_i} = p^4 (1-p)$

Maximizing L gives the estimate for p: 4/5 = 0.8.
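
A minimal sketch of this maximization by grid search (an illustration, not code from the slides; the closed form above gives the same answer):

```python
# Maximum likelihood for the coin example by brute-force search over p.
def likelihood(p, heads=4, tails=1):
    return p ** heads * (1 - p) ** tails

# Evaluate L on a fine grid of candidate values and keep the argmax.
candidates = [i / 1000 for i in range(1001)]
p_hat = max(candidates, key=likelihood)
print(p_hat)  # 0.8, matching the closed-form estimate 4/5
```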

Expectation-Maximization (EM)

- Solves estimation problems with incomplete data.
- Obtain initial estimates for the parameters.
- Iteratively use the estimates for the missing data and continue until convergence.

EM Example
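
The worked example on this slide did not survive the transcript. As a stand-in, here is a minimal sketch of EM estimating a mean when some observations are missing; the numbers are hypothetical, not the slide's:

```python
# EM sketch: estimate the mean of a sample in which some values are missing.
# E-step: replace each missing value with the current estimate of the mean.
# M-step: re-estimate the mean from the completed data.
observed = [45, 38, 52, 41]      # hypothetical known values
n_missing = 2                    # two values are unknown
mu = 30.0                        # arbitrary initial guess

for _ in range(100):
    completed = observed + [mu] * n_missing      # E-step
    new_mu = sum(completed) / len(completed)     # M-step
    if abs(new_mu - mu) < 1e-9:                  # stop once the estimate converges
        break
    mu = new_mu

print(round(mu, 4))  # converges to the mean of the observed values, 44.0
```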

EM Algorithm
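
The algorithm box on this slide is not reproduced in the transcript; the standard general statement of EM (not transcribed from the slide) is:

E-step: $Q(\theta \mid \theta^{(t)}) = E_{Z \mid X, \theta^{(t)}}\big[\log L(\theta; X, Z)\big]$

M-step: $\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$

repeated until the parameter estimates converge.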

Models Based on Summarization

- Basic concepts that provide an abstraction and summarization of the data as a whole.
  - Statistical concepts: mean, variance, median, mode, etc.
- Visualization: display the structure of the data graphically.
  - Line graphs, pie charts, histograms, scatter plots, hierarchical graphs.

Scatter Diagram
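
The scatter plot itself is not reproduced in the transcript; a minimal matplotlib sketch of the idea, with hypothetical data:

```python
import matplotlib.pyplot as plt

# Scatter diagram: plot two attributes against each other to reveal structure.
x = [1, 2, 3, 4, 5, 6, 7, 8]                      # hypothetical attribute 1
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2]   # hypothetical attribute 2
plt.scatter(x, y)
plt.xlabel("attribute 1")
plt.ylabel("attribute 2")
plt.title("Scatter Diagram")
plt.show()
```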

Bayes Theorem

- Posterior probability: $P(h_1 \mid x_i)$
- Prior probability: $P(h_1)$
- Bayes theorem:

$P(h_j \mid x_i) = \frac{P(x_i \mid h_j)\, P(h_j)}{P(x_i)}$

- Assigns probabilities of hypotheses given a data value.

Bayes Theorem Example

- Credit authorizations (hypotheses): h1 = authorize purchase, h2 = authorize after further identification, h3 = do not authorize, h4 = do not authorize but contact police.
- Assign twelve data values for all combinations of credit and income:

               Income 1   Income 2   Income 3   Income 4
  Excellent       x1         x2         x3         x4
  Good            x5         x6         x7         x8
  Bad             x9         x10        x11        x12

- From training data: P(h1) = 60%; P(h2) = 20%; P(h3) = 10%; P(h4) = 10%.

Bayes Example (cont'd)

Training data:

  ID   Income   Credit      Class   xi
  1    4        Excellent   h1      x4
  2    3        Good        h1      x7
  3    2        Excellent   h1      x2
  4    3        Good        h1      x7
  5    4        Good        h1      x8
  6    2        Excellent   h1      x2
  7    3        Bad         h2      x11
  8    2        Bad         h2      x10
  9    3        Bad         h3      x11
  10   1        Bad         h4      x9

Bayes Example (cont'd)

- Calculate P(xi|hj) and P(xi).
  - Ex: P(x7|h1) = 2/6; P(x4|h1) = 1/6; P(x2|h1) = 2/6; P(x8|h1) = 1/6; P(xi|h1) = 0 for all other xi.
- Predict the class for x4:
  - Calculate P(hj|x4) for all hj.
  - Place x4 in the class with the largest value.
  - Ex: P(h1|x4) = P(x4|h1)P(h1)/P(x4) = (1/6)(0.6)/0.1 = 1.
  - So x4 is placed in class h1.
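
A minimal sketch of this posterior computation over the training data above (the variable names are mine, not the course's):

```python
# Posterior computation for the credit example, estimating P(x|h) by counting.
data = [("x4", "h1"), ("x7", "h1"), ("x2", "h1"), ("x7", "h1"), ("x8", "h1"),
        ("x2", "h1"), ("x11", "h2"), ("x10", "h2"), ("x11", "h3"), ("x9", "h4")]

priors = {"h1": 0.6, "h2": 0.2, "h3": 0.1, "h4": 0.1}  # P(hj) from the training data

def posterior(h, x):
    # Bayes theorem: P(h|x) = P(x|h) * P(h) / P(x).
    in_class = [xi for xi, hj in data if hj == h]
    p_x_given_h = in_class.count(x) / len(in_class)
    p_x = sum(1 for xi, _ in data if xi == x) / len(data)
    return p_x_given_h * priors[h] / p_x

print(posterior("h1", "x4"))  # (1/6)(0.6)/0.1 = 1.0
```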

Hypothesis Testing

- Find a model to explain behavior by creating and then testing a hypothesis about the data.
- Exact opposite of the usual DM approach.
- H0: null hypothesis; the hypothesis to be tested.
- H1: alternative hypothesis.

Chi-Square Test

- One technique for performing hypothesis testing.
- Used to test the association between two observed variables and to determine whether a set of observed values is statistically different from the expected values.
- The chi-squared statistic is defined as:

$\chi^2 = \sum \frac{(O - E)^2}{E}$

- O: observed value; E: expected value based on the hypothesis.

Chi-Square Test

Given the average scores of five schools, determine whether the differences are statistically significant.

- Ex:
  - O = {50, 93, 67, 78, 87}
  - E = 75
  - χ² = 15.55, and therefore significant.
- Examine a chi-squared significance table: with 4 degrees of freedom and a significance level of 95%, the critical value is 9.488. Since 15.55 exceeds this, the variance between the schools' scores and the expected value cannot be attributed to pure chance.
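
A quick sketch that reproduces this statistic:

```python
# Chi-squared statistic for the school-scores example.
observed = [50, 93, 67, 78, 87]
expected = 75
chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(round(chi2, 2))  # 15.55, above the 9.488 critical value (df = 4, 95% level)
```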

Regression

- Predict future values based on past values.
- Fits a set of points to a curve.
- Linear regression assumes a linear relationship exists:

$y = c_0 + c_1 x_1 + \cdots + c_n x_n$

- n input variables (called regressors or predictors).
- One output variable, called the response.
- n+1 constants, chosen during the modeling process to match the input examples.

Linear Regression with One Input Value
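
The slide's plot is not reproduced; a minimal sketch of fitting y = c0 + c1*x by least squares, using the closed-form solution for a single regressor (the data points are hypothetical):

```python
# Least-squares fit of y = c0 + c1*x for one input variable.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Slope: covariance of x and y divided by the variance of x.
c1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
     sum((x - mean_x) ** 2 for x in xs)
c0 = mean_y - c1 * mean_x  # intercept makes the line pass through the means
print(f"y = {c0:.2f} + {c1:.2f} x")  # roughly y = 0.05 + 1.99 x
```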

Correlation

- Examine the degree to which the values of two variables behave similarly.
- Correlation coefficient r:
  - 1 = perfect correlation
  - -1 = perfect but opposite correlation
  - 0 = no correlation

Correlation

$r = \frac{\sum_i (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_i (x_i - \bar{X})^2}\,\sqrt{\sum_i (y_i - \bar{Y})^2}}$

where $\bar{X}$, $\bar{Y}$ are the means of X and Y respectively.

- Suppose X = (1,3,5,7,9) and Y = (9,7,5,3,1). r = ?
- Suppose X = (1,3,5,7,9) and Y = (2,4,6,8,10). r = ?
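
A quick sketch to check both exercises:

```python
def pearson_r(xs, ys):
    # Pearson correlation coefficient, following the formula above.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(pearson_r([1, 3, 5, 7, 9], [9, 7, 5, 3, 1]))   # -1.0: perfect opposite correlation
print(pearson_r([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]))  #  1.0: perfect correlation
```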

Similarity Measures

- Determine the similarity between two objects.
- Similarity characteristics: [formal properties not reproduced in the transcript]
- Alternatively, distance measures quantify how unlike or dissimilar objects are.

Similarity Measures
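
The formulas on this slide did not survive the transcript. As a sketch, here are three measures commonly used for this purpose (Dice, Jaccard, cosine); the assumption that these are the measures the slide listed is mine:

```python
import math

def dice(a, b):
    # Dice: twice the dot product over the sum of squared norms.
    dot = sum(x * y for x, y in zip(a, b))
    return 2 * dot / (sum(x * x for x in a) + sum(y * y for y in b))

def jaccard(a, b):
    # Jaccard: dot product over the sum of squared norms minus the dot product.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)

def cosine(a, b):
    # Cosine: dot product over the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

print(cosine([1, 0, 1], [1, 1, 1]))  # ≈ 0.816
```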

Distance Measures

- Measure the dissimilarity between objects.
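
The slide's formulas are likewise not reproduced; Euclidean and Manhattan distance are the standard examples, treated here as an assumption about what the slide showed:

```python
import math

def euclidean(a, b):
    # Euclidean distance: square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Manhattan distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

print(euclidean([0, 0], [3, 4]))  # 5.0
print(manhattan([0, 0], [3, 4]))  # 7
```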

Next Lecture:

- Data Mining Techniques (II): decision trees, neural networks, and genetic algorithms.
- Reading assignment: Chapter 3.