27 machine learning unsupervised measure properties

92
Machine Learning for Data Mining Measure Properties for Clustering Andres Mendez-Vazquez July 27, 2015 1 / 37

Upload: andres-mendez-vazquez

Post on 16-Apr-2017

215 views

Category:

Engineering


0 download

TRANSCRIPT

Page 1: 27 Machine Learning Unsupervised Measure Properties

Machine Learning for Data MiningMeasure Properties for Clustering

Andres Mendez-Vazquez

July 27, 2015

1 / 37

Page 2: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Outline1 Introduction

Introduction2 Features

Types of Features3 Similarity and Dissimilarity Measures

Basic Definition4 Proximity Measures between Two Points

Real-Valued VectorsDissimilarity MeasuresSimilarity Measures

Discrete-Valued VectorsDissimilarity MeasuresSimilarity Measures

5 Other ExamplesBetween Sets

2 / 37

Page 3: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Outline1 Introduction

Introduction2 Features

Types of Features3 Similarity and Dissimilarity Measures

Basic Definition4 Proximity Measures between Two Points

Real-Valued VectorsDissimilarity MeasuresSimilarity Measures

Discrete-Valued VectorsDissimilarity MeasuresSimilarity Measures

5 Other ExamplesBetween Sets

3 / 37

Page 4: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Introduction

ObservationClusters are considered as groups containing data objects that are similarto each other.

When they are notWe assume there is some way to measure that difference!!!

4 / 37

Page 5: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Introduction

ObservationClusters are considered as groups containing data objects that are similarto each other.

When they are notWe assume there is some way to measure that difference!!!

4 / 37

Page 6: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Therefore

It is possible to give a mathematical definition of two important typesof clustering

Partitional ClusteringHierarchical Clustering

GivenGiven a set of input patterns X = {x1, x2, ..., xN}, wherexj = (x1j , x2j , ..., xdj)T ∈ Rd , with each measure xij called a feature(attribute, dimension, or variable).

5 / 37

Page 7: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Therefore

It is possible to give a mathematical definition of two important typesof clustering

Partitional ClusteringHierarchical Clustering

GivenGiven a set of input patterns X = {x1, x2, ..., xN}, wherexj = (x1j , x2j , ..., xdj)T ∈ Rd , with each measure xij called a feature(attribute, dimension, or variable).

5 / 37

Page 8: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Therefore

It is possible to give a mathematical definition of two important typesof clustering

Partitional ClusteringHierarchical Clustering

GivenGiven a set of input patterns X = {x1, x2, ..., xN}, wherexj = (x1j , x2j , ..., xdj)T ∈ Rd , with each measure xij called a feature(attribute, dimension, or variable).

5 / 37

Page 9: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Hard Partitional Clustering

DefinitionHard partitional clustering attempts to seek a K -partition of X ,C = {C1, C2, ..., CK} with K ≤ N such that

1 Ci 6= ∅ for i = 1, ..., K .2 ∪K

i=1Ci = X .3 Ci ∩ Cj = ∅ for all i, j = 1, ..., K and i 6= j.

ClearlyThe third property can be relaxed to obtain the mathematical definition offuzzy and possibilistic clustering.

6 / 37

Page 10: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Hard Partitional Clustering

DefinitionHard partitional clustering attempts to seek a K -partition of X ,C = {C1, C2, ..., CK} with K ≤ N such that

1 Ci 6= ∅ for i = 1, ..., K .2 ∪K

i=1Ci = X .3 Ci ∩ Cj = ∅ for all i, j = 1, ..., K and i 6= j.

ClearlyThe third property can be relaxed to obtain the mathematical definition offuzzy and possibilistic clustering.

6 / 37

Page 11: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Hard Partitional Clustering

DefinitionHard partitional clustering attempts to seek a K -partition of X ,C = {C1, C2, ..., CK} with K ≤ N such that

1 Ci 6= ∅ for i = 1, ..., K .2 ∪K

i=1Ci = X .3 Ci ∩ Cj = ∅ for all i, j = 1, ..., K and i 6= j.

ClearlyThe third property can be relaxed to obtain the mathematical definition offuzzy and possibilistic clustering.

6 / 37

Page 12: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Hard Partitional Clustering

DefinitionHard partitional clustering attempts to seek a K -partition of X ,C = {C1, C2, ..., CK} with K ≤ N such that

1 Ci 6= ∅ for i = 1, ..., K .2 ∪K

i=1Ci = X .3 Ci ∩ Cj = ∅ for all i, j = 1, ..., K and i 6= j.

ClearlyThe third property can be relaxed to obtain the mathematical definition offuzzy and possibilistic clustering.

6 / 37

Page 13: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Hard Partitional Clustering

DefinitionHard partitional clustering attempts to seek a K -partition of X ,C = {C1, C2, ..., CK} with K ≤ N such that

1 Ci 6= ∅ for i = 1, ..., K .2 ∪K

i=1Ci = X .3 Ci ∩ Cj = ∅ for all i, j = 1, ..., K and i 6= j.

ClearlyThe third property can be relaxed to obtain the mathematical definition offuzzy and possibilistic clustering.

6 / 37

Page 14: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

For example

Given the membership Ai (x j) ∈ [0, 1]We can have

∑Ki=1 Ai (xj) = 1, ∀j and

∑Nj=1 Ai (xj) < N , ∀i.

7 / 37

Page 15: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Hierarchical Clustering

DefinitionHierarchical clustering attempts to construct a tree-like, nested structurepartition of X , H = {H1, ..., HQ} with Q ≤ N such that

If Ci ∈ Hm and Cj ∈ Hl and m > l imply Ci ⊂ Cj or Ci ∩ Cj = ∅ forall i, j = 1, ..., K , i 6= j and m, l = 1, ..., Q.

8 / 37

Page 16: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Hierarchical Clustering

DefinitionHierarchical clustering attempts to construct a tree-like, nested structurepartition of X , H = {H1, ..., HQ} with Q ≤ N such that

If Ci ∈ Hm and Cj ∈ Hl and m > l imply Ci ⊂ Cj or Ci ∩ Cj = ∅ forall i, j = 1, ..., K , i 6= j and m, l = 1, ..., Q.

8 / 37

Page 17: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Outline1 Introduction

Introduction2 Features

Types of Features3 Similarity and Dissimilarity Measures

Basic Definition4 Proximity Measures between Two Points

Real-Valued VectorsDissimilarity MeasuresSimilarity Measures

Discrete-Valued VectorsDissimilarity MeasuresSimilarity Measures

5 Other ExamplesBetween Sets

9 / 37

Page 18: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Features

UsuallyA data object is described by a set of features or variables, usuallyrepresented as a multidimensional vector.

ThusWe tend to collect all that information as a pattern matrix ofdimensionality N × d.

ThenA feature can be classified as continuous, discrete, or binary.

10 / 37

Page 19: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Features

UsuallyA data object is described by a set of features or variables, usuallyrepresented as a multidimensional vector.

ThusWe tend to collect all that information as a pattern matrix ofdimensionality N × d.

ThenA feature can be classified as continuous, discrete, or binary.

10 / 37

Page 20: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Features

UsuallyA data object is described by a set of features or variables, usuallyrepresented as a multidimensional vector.

ThusWe tend to collect all that information as a pattern matrix ofdimensionality N × d.

ThenA feature can be classified as continuous, discrete, or binary.

10 / 37

Page 21: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Thus

ContinuousA continuous feature takes values from an uncountably infinite range set.

DiscreteDiscrete features only have finite, or at most, a countably infinite numberof values.

BinaryBinary or dichotomous features are a special case of discrete features whenthey have exactly two values.

11 / 37

Page 22: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Thus

ContinuousA continuous feature takes values from an uncountably infinite range set.

DiscreteDiscrete features only have finite, or at most, a countably infinite numberof values.

BinaryBinary or dichotomous features are a special case of discrete features whenthey have exactly two values.

11 / 37

Page 23: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Thus

ContinuousA continuous feature takes values from an uncountably infinite range set.

DiscreteDiscrete features only have finite, or at most, a countably infinite numberof values.

BinaryBinary or dichotomous features are a special case of discrete features whenthey have exactly two values.

11 / 37

Page 24: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

MeaningAnother property of features is the measurement level, which reflects therelative significance of numbers.

We have four from lowest to highestNominal, ordinal, interval, and ratio.

12 / 37

Page 25: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

MeaningAnother property of features is the measurement level, which reflects therelative significance of numbers.

We have four from lowest to highestNominal, ordinal, interval, and ratio.

12 / 37

Page 26: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

NominalFeatures at this level are represented with labels, states, or names.The numbers are meaningless in any mathematical sense and nomathematical calculation can be made.For example, peristalsis ∈ {hypermotile, normal, hypomotile, absent}.

OrdinalFeatures at this level are also names, but with a certain order implied.However, the difference between the values is again meaningless.For example, abdominal distension ∈ {slight, moderate, high}.

13 / 37

Page 27: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

NominalFeatures at this level are represented with labels, states, or names.The numbers are meaningless in any mathematical sense and nomathematical calculation can be made.For example, peristalsis ∈ {hypermotile, normal, hypomotile, absent}.

OrdinalFeatures at this level are also names, but with a certain order implied.However, the difference between the values is again meaningless.For example, abdominal distension ∈ {slight, moderate, high}.

13 / 37

Page 28: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

NominalFeatures at this level are represented with labels, states, or names.The numbers are meaningless in any mathematical sense and nomathematical calculation can be made.For example, peristalsis ∈ {hypermotile, normal, hypomotile, absent}.

OrdinalFeatures at this level are also names, but with a certain order implied.However, the difference between the values is again meaningless.For example, abdominal distension ∈ {slight, moderate, high}.

13 / 37

Page 29: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

NominalFeatures at this level are represented with labels, states, or names.The numbers are meaningless in any mathematical sense and nomathematical calculation can be made.For example, peristalsis ∈ {hypermotile, normal, hypomotile, absent}.

OrdinalFeatures at this level are also names, but with a certain order implied.However, the difference between the values is again meaningless.For example, abdominal distension ∈ {slight, moderate, high}.

13 / 37

Page 30: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

NominalFeatures at this level are represented with labels, states, or names.The numbers are meaningless in any mathematical sense and nomathematical calculation can be made.For example, peristalsis ∈ {hypermotile, normal, hypomotile, absent}.

OrdinalFeatures at this level are also names, but with a certain order implied.However, the difference between the values is again meaningless.For example, abdominal distension ∈ {slight, moderate, high}.

13 / 37

Page 31: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

NominalFeatures at this level are represented with labels, states, or names.The numbers are meaningless in any mathematical sense and nomathematical calculation can be made.For example, peristalsis ∈ {hypermotile, normal, hypomotile, absent}.

OrdinalFeatures at this level are also names, but with a certain order implied.However, the difference between the values is again meaningless.For example, abdominal distension ∈ {slight, moderate, high}.

13 / 37

Page 32: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

IntervalFeatures at this level offer a meaningful interpretation of thedifference between two values.However, there exists no true zero and the ratio between two values ismeaningless.For example, the concept cold can be seen as an interval.

RatioFeatures at this level possess all the properties of the above levels.There is an absolute zero.Thus, we have the meaning of ratio!!!

14 / 37

Page 33: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

IntervalFeatures at this level offer a meaningful interpretation of thedifference between two values.However, there exists no true zero and the ratio between two values ismeaningless.For example, the concept cold can be seen as an interval.

RatioFeatures at this level possess all the properties of the above levels.There is an absolute zero.Thus, we have the meaning of ratio!!!

14 / 37

Page 34: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

IntervalFeatures at this level offer a meaningful interpretation of thedifference between two values.However, there exists no true zero and the ratio between two values ismeaningless.For example, the concept cold can be seen as an interval.

RatioFeatures at this level possess all the properties of the above levels.There is an absolute zero.Thus, we have the meaning of ratio!!!

14 / 37

Page 35: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

IntervalFeatures at this level offer a meaningful interpretation of thedifference between two values.However, there exists no true zero and the ratio between two values ismeaningless.For example, the concept cold can be seen as an interval.

RatioFeatures at this level possess all the properties of the above levels.There is an absolute zero.Thus, we have the meaning of ratio!!!

14 / 37

Page 36: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

IntervalFeatures at this level offer a meaningful interpretation of thedifference between two values.However, there exists no true zero and the ratio between two values ismeaningless.For example, the concept cold can be seen as an interval.

RatioFeatures at this level possess all the properties of the above levels.There is an absolute zero.Thus, we have the meaning of ratio!!!

14 / 37

Page 37: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Measurement Level

IntervalFeatures at this level offer a meaningful interpretation of thedifference between two values.However, there exists no true zero and the ratio between two values ismeaningless.For example, the concept cold can be seen as an interval.

RatioFeatures at this level possess all the properties of the above levels.There is an absolute zero.Thus, we have the meaning of ratio!!!

14 / 37

Page 38: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Outline1 Introduction

Introduction2 Features

Types of Features3 Similarity and Dissimilarity Measures

Basic Definition4 Proximity Measures between Two Points

Real-Valued VectorsDissimilarity MeasuresSimilarity Measures

Discrete-Valued VectorsDissimilarity MeasuresSimilarity Measures

5 Other ExamplesBetween Sets

15 / 37

Page 39: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

DefinitionA Similarity Measure s on X is a function

s : X ×X → R (1)

Such that1 s (x, y) ≤ s0.2 s (x, x) = s0 for all x ∈ X .3 s (x, y) = s (y, x) for all x, y ∈ X .4 If s (x, y) = s0 ⇐⇒ x = y.5 s (x, y) s (y, z) ≤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ∈ X .

16 / 37

Page 40: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

DefinitionA Similarity Measure s on X is a function

s : X ×X → R (1)

Such that1 s (x, y) ≤ s0.2 s (x, x) = s0 for all x ∈ X .3 s (x, y) = s (y, x) for all x, y ∈ X .4 If s (x, y) = s0 ⇐⇒ x = y.5 s (x, y) s (y, z) ≤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ∈ X .

16 / 37

Page 41: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

DefinitionA Similarity Measure s on X is a function

s : X ×X → R (1)

Such that1 s (x, y) ≤ s0.2 s (x, x) = s0 for all x ∈ X .3 s (x, y) = s (y, x) for all x, y ∈ X .4 If s (x, y) = s0 ⇐⇒ x = y.5 s (x, y) s (y, z) ≤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ∈ X .

16 / 37

Page 42: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

DefinitionA Similarity Measure s on X is a function

s : X ×X → R (1)

Such that1 s (x, y) ≤ s0.2 s (x, x) = s0 for all x ∈ X .3 s (x, y) = s (y, x) for all x, y ∈ X .4 If s (x, y) = s0 ⇐⇒ x = y.5 s (x, y) s (y, z) ≤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ∈ X .

16 / 37

Page 43: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

DefinitionA Similarity Measure s on X is a function

s : X ×X → R (1)

Such that1 s (x, y) ≤ s0.2 s (x, x) = s0 for all x ∈ X .3 s (x, y) = s (y, x) for all x, y ∈ X .4 If s (x, y) = s0 ⇐⇒ x = y.5 s (x, y) s (y, z) ≤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ∈ X .

16 / 37

Page 44: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

DefinitionA Similarity Measure s on X is a function

s : X ×X → R (1)

Such that1 s (x, y) ≤ s0.2 s (x, x) = s0 for all x ∈ X .3 s (x, y) = s (y, x) for all x, y ∈ X .4 If s (x, y) = s0 ⇐⇒ x = y.5 s (x, y) s (y, z) ≤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ∈ X .

16 / 37

Page 45: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

DefintionA Dissimilarity Measure d on X is a function

d : X ×X → R (2)

Such that1 d (x, y) ≥ d0 for all x, y ∈ X .2 d (x, x) = d0 for all x ∈ X .3 d (x, y) = d (y, x) for all x, y ∈ X .4 If d (x, y) = d0 ⇐⇒ x = y.5 d (x, z) ≤ d (x, y) + d (y, z) for all x, y, z ∈ X .

17 / 37

Page 46: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

DefintionA Dissimilarity Measure d on X is a function

d : X ×X → R (2)

Such that1 d (x, y) ≥ d0 for all x, y ∈ X .2 d (x, x) = d0 for all x ∈ X .3 d (x, y) = d (y, x) for all x, y ∈ X .4 If d (x, y) = d0 ⇐⇒ x = y.5 d (x, z) ≤ d (x, y) + d (y, z) for all x, y, z ∈ X .

17 / 37

Page 47: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

DefintionA Dissimilarity Measure d on X is a function

d : X ×X → R (2)

Such that1 d (x, y) ≥ d0 for all x, y ∈ X .2 d (x, x) = d0 for all x ∈ X .3 d (x, y) = d (y, x) for all x, y ∈ X .4 If d (x, y) = d0 ⇐⇒ x = y.5 d (x, z) ≤ d (x, y) + d (y, z) for all x, y, z ∈ X .

17 / 37

Page 48: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

DefintionA Dissimilarity Measure d on X is a function

d : X ×X → R (2)

Such that1 d (x, y) ≥ d0 for all x, y ∈ X .2 d (x, x) = d0 for all x ∈ X .3 d (x, y) = d (y, x) for all x, y ∈ X .4 If d (x, y) = d0 ⇐⇒ x = y.5 d (x, z) ≤ d (x, y) + d (y, z) for all x, y, z ∈ X .

17 / 37

Page 49: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

DefintionA Dissimilarity Measure d on X is a function

d : X ×X → R (2)

Such that1 d (x, y) ≥ d0 for all x, y ∈ X .2 d (x, x) = d0 for all x ∈ X .3 d (x, y) = d (y, x) for all x, y ∈ X .4 If d (x, y) = d0 ⇐⇒ x = y.5 d (x, z) ≤ d (x, y) + d (y, z) for all x, y, z ∈ X .

17 / 37

Page 50: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

DefintionA Dissimilarity Measure d on X is a function

d : X ×X → R (2)

Such that1 d (x, y) ≥ d0 for all x, y ∈ X .2 d (x, x) = d0 for all x ∈ X .3 d (x, y) = d (y, x) for all x, y ∈ X .4 If d (x, y) = d0 ⇐⇒ x = y.5 d (x, z) ≤ d (x, y) + d (y, z) for all x, y, z ∈ X .

17 / 37

Page 51: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Outline1 Introduction

Introduction2 Features

Types of Features3 Similarity and Dissimilarity Measures

Basic Definition4 Proximity Measures between Two Points

Real-Valued VectorsDissimilarity MeasuresSimilarity Measures

Discrete-Valued VectorsDissimilarity MeasuresSimilarity Measures

5 Other ExamplesBetween Sets

18 / 37

Page 52: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

The Hamming distance for {0, 1}The vectors here are collections of zeros and ones. Thus,

dH (x, y) =d∑

i=1(xi − yi)2 (3)

Euclidean distance

d (x, y) =( d∑

i=1|xi − yi |

12

)2

(4)

19 / 37

Page 53: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

The Hamming distance for {0, 1}The vectors here are collections of zeros and ones. Thus,

dH (x, y) =d∑

i=1(xi − yi)2 (3)

Euclidean distance

d (x, y) =( d∑

i=1|xi − yi |

12

)2

(4)

19 / 37

Page 54: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

The weighted lp metric DMs

dp (x, y) =( d∑

i=1wi |xi − yi |p

) 1p

(5)

WhereWhere xi , yi are the ith coordinates of x and y, i = 1, ..., d and wi ≥ 0 isthe ith weight coefficient.

ImportantA well-known representative of the latter category of measures is theEuclidean distance.

20 / 37

Page 55: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

The weighted lp metric DMs

dp (x, y) =( d∑

i=1wi |xi − yi |p

) 1p

(5)

WhereWhere xi , yi are the ith coordinates of x and y, i = 1, ..., d and wi ≥ 0 isthe ith weight coefficient.

ImportantA well-known representative of the latter category of measures is theEuclidean distance.

20 / 37

Page 56: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

The weighted lp metric DMs

dp (x, y) =( d∑

i=1wi |xi − yi |p

) 1p

(5)

WhereWhere xi , yi are the ith coordinates of x and y, i = 1, ..., d and wi ≥ 0 isthe ith weight coefficient.

ImportantA well-known representative of the latter category of measures is theEuclidean distance.

20 / 37

Page 57: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

The weighted l2 metric DM can be further generalized as follows

d (x, y) =√

(x − y)T B (x − y) (6)

Where B is a symmetric, positive definite matrix

A special caseThis includes the Mahalanobis distance as a special case

21 / 37

Page 58: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

The weighted l2 metric DM can be further generalized as follows

d (x, y) =√

(x − y)T B (x − y) (6)

Where B is a symmetric, positive definite matrix

A special caseThis includes the Mahalanobis distance as a special case

21 / 37

Page 59: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

Special lp metric DMs that are also encountered in practice are

d1 (x, y) =d∑

i=1wi |xi − yi | (Weighted Manhattan Norm)

d∞ (x, y) = max1≤i≤d

wi |xi − yi | (Weighted l∞ Norm)

22 / 37

Page 60: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

Special lp metric DMs that are also encountered in practice are

d1 (x, y) =d∑

i=1wi |xi − yi | (Weighted Manhattan Norm)

d∞ (x, y) = max1≤i≤d

wi |xi − yi | (Weighted l∞ Norm)

22 / 37

Page 61: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

The inner product

sinner (x, y) = xT y =d∑

i=1xiyi (7)

Closely related to the inner product is the cosine similarity measure

scosine (x, y) = xTy‖x‖ ‖y‖ (8)

Where ‖x‖ =√∑d

i=1 x2i and ‖y‖ =

√∑di=1 y2

i , the length of thevectors!!!

23 / 37

Page 62: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

The inner product

sinner (x, y) = xT y =d∑

i=1xiyi (7)

Closely related to the inner product is the cosine similarity measure

scosine (x, y) = xTy‖x‖ ‖y‖ (8)

Where ‖x‖ =√∑d

i=1 x2i and ‖y‖ =

√∑di=1 y2

i , the length of thevectors!!!

23 / 37

Page 63: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

Pearson’s correlation coefficient

rPearson (x, y) = (x − x)T (y − y)‖x‖ ‖y‖ (9)

Where

x = 1d

d∑i=1

xi and y = 1d

d∑i=1

yi (10)

PropertiesIt is clear that rPearson (x, y) takes values between -1 and +1.The difference between sinner and rPearson does not depend directlyon x and y but on their corresponding difference vectors.

24 / 37

Page 64: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

Pearson’s correlation coefficient

rPearson (x, y) = (x − x)T (y − y)‖x‖ ‖y‖ (9)

Where

x = 1d

d∑i=1

xi and y = 1d

d∑i=1

yi (10)

PropertiesIt is clear that rPearson (x, y) takes values between -1 and +1.The difference between sinner and rPearson does not depend directlyon x and y but on their corresponding difference vectors.

24 / 37

Page 65: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

Pearson’s correlation coefficient

rPearson (x, y) = (x − x)T (y − y)‖x‖ ‖y‖ (9)

Where

x = 1d

d∑i=1

xi and y = 1d

d∑i=1

yi (10)

PropertiesIt is clear that rPearson (x, y) takes values between -1 and +1.The difference between sinner and rPearson does not depend directlyon x and y but on their corresponding difference vectors.

24 / 37

Page 66: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

Pearson’s correlation coefficient

rPearson (x, y) = (x − x)T (y − y)‖x‖ ‖y‖ (9)

Where

x = 1d

d∑i=1

xi and y = 1d

d∑i=1

yi (10)

PropertiesIt is clear that rPearson (x, y) takes values between -1 and +1.The difference between sinner and rPearson does not depend directlyon x and y but on their corresponding difference vectors.

24 / 37

Page 67: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

The Tanimoto measure or distance

sT (x, y) = xT y‖x‖2 + ‖y‖2 − xT y

(11)

Something Notable

sT (x, y) = 1

1 + (x−y)T (x−y)xT y

(12)

MeaningThe Tanimoto measure is inversely proportional to the squared Euclideandistance.

25 / 37

Page 68: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

The Tanimoto measure or distance

sT (x, y) = xT y‖x‖2 + ‖y‖2 − xT y

(11)

Something Notable

sT (x, y) = 1

1 + (x−y)T (x−y)xT y

(12)

MeaningThe Tanimoto measure is inversely proportional to the squared Euclideandistance.

25 / 37

Page 69: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

The Tanimoto measure or distance

sT (x, y) = xT y‖x‖2 + ‖y‖2 − xT y

(11)

Something Notable

sT (x, y) = 1

1 + (x−y)T (x−y)xT y

(12)

MeaningThe Tanimoto measure is inversely proportional to the squared Euclideandistance.

25 / 37

Page 70: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Outline1 Introduction

Introduction2 Features

Types of Features3 Similarity and Dissimilarity Measures

Basic Definition4 Proximity Measures between Two Points

Real-Valued VectorsDissimilarity MeasuresSimilarity Measures

Discrete-Valued VectorsDissimilarity MeasuresSimilarity Measures

5 Other ExamplesBetween Sets

26 / 37

Page 71: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

We consider now vectors whose coordinates belong to afinite set

Given F = {0, 1, 2, ..., k − 1} where k is a positive integerThere are exactly kd vectors x ∈ Fd

Now consider x, y ∈ Fd

With A (x, y) = [aij ], i, j = 0, 1, ..., k − 1 be a k × k matrix

WhereThe element aij is the number of places where the first vector has the isymbol and the corresponding element of the second vector has the jsymbol.

27 / 37

Page 72: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

We consider now vectors whose coordinates belong to afinite set

Given F = {0, 1, 2, ..., k − 1} where k is a positive integerThere are exactly kd vectors x ∈ Fd

Now consider x, y ∈ Fd

With A (x, y) = [aij ], i, j = 0, 1, ..., k − 1 be a k × k matrix

WhereThe element aij is the number of places where the first vector has the isymbol and the corresponding element of the second vector has the jsymbol.

27 / 37

Page 73: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

We consider now vectors whose coordinates belong to afinite set

Given F = {0, 1, 2, ..., k − 1} where k is a positive integerThere are exactly kd vectors x ∈ Fd

Now consider x, y ∈ Fd

With A (x, y) = [aij ], i, j = 0, 1, ..., k − 1 be a k × k matrix

WhereThe element aij is the number of places where the first vector has the isymbol and the corresponding element of the second vector has the jsymbol.

27 / 37

Page 74: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Using Indicator functions

We can think of each element aij

aij =d∑

k=I [(i, j) == (xk , yk)] (13)

Thanks to Ricardo Llamas and Gibran Felix!!! Class Summer 2015.

28 / 37

Page 75: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Contingency Matrix

ThusThis matrix is called a contingency table.

For example if d = 6 and k = 3With x = [0, 1, 2, 1, 2, 1]T and y = [1, 0, 2, 1, 0, 1]T

Then

A (x, y) =

0 1 01 2 01 0 1

(14)

29 / 37

Page 76: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Contingency Matrix

ThusThis matrix is called a contingency table.

For example if d = 6 and k = 3With x = [0, 1, 2, 1, 2, 1]T and y = [1, 0, 2, 1, 0, 1]T

Then

A (x, y) =

0 1 01 2 01 0 1

(14)

29 / 37

Page 77: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Contingency Matrix

ThusThis matrix is called a contingency table.

For example if d = 6 and k = 3With x = [0, 1, 2, 1, 2, 1]T and y = [1, 0, 2, 1, 0, 1]T

Then

A (x, y) =

0 1 01 2 01 0 1

(14)

29 / 37

Page 78: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Thus, we have that

Easy to verify thatk−1∑i=0

k−1∑j=0

aij = d (15)

Something NotableMost of the proximity measures between two discrete-valued vectors maybe expressed as combinations of elements of matrix A (x, y).

30 / 37

Page 79: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Thus, we have that

Easy to verify thatk−1∑i=0

k−1∑j=0

aij = d (15)

Something NotableMost of the proximity measures between two discrete-valued vectors maybe expressed as combinations of elements of matrix A (x, y).

30 / 37

Page 80: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity MeasuresThe Hamming distanceIt is defined as the number of places where two vectors differ:

d (x, y) =k−1∑i=0

k−1∑j=0,j 6=i

aij (16)

The summation of all the off-diagonal elements of A, which indicate thepositions where x and y differ.

Special case k = 2, thus vectors are binary valuedThe vectors x ∈ Fd are binary valued and the Hamming distance becomes

dH (x, y) =d∑

i=1(xi + yi − 2xiyi)

=d∑

i=1(xi − yi)2

31 / 37

Page 81: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity MeasuresThe Hamming distanceIt is defined as the number of places where two vectors differ:

d (x, y) =k−1∑i=0

k−1∑j=0,j 6=i

aij (16)

The summation of all the off-diagonal elements of A, which indicate thepositions where x and y differ.

Special case k = 2, thus vectors are binary valuedThe vectors x ∈ Fd are binary valued and the Hamming distance becomes

dH (x, y) =d∑

i=1(xi + yi − 2xiyi)

=d∑

i=1(xi − yi)2

31 / 37

Page 82: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Dissimilarity Measures

The l1 distanceIt is defined as in the case of the continuous-valued vectors,

d1 (x, y) =d∑

i=1wi |xi − yi | (17)

32 / 37

Page 83: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

The Tanimoto measureGiven two sets X and Y , we have that

sT (X , Y ) = nX∩YnX + nY − nX∩Y

(18)

where nX = |X |, nY = |Y |, nX∩Y = |X ∩Y |.

Thus, if x, y are discrete-valued vectorsThe measure takes into account all pairs of corresponding coordinates,except those whose corresponding coordinates (xi , yi) are both 0.

33 / 37

Page 84: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

The Tanimoto measureGiven two sets X and Y , we have that

sT (X , Y ) = nX∩YnX + nY − nX∩Y

(18)

where nX = |X |, nY = |Y |, nX∩Y = |X ∩Y |.

Thus, if x, y are discrete-valued vectorsThe measure takes into account all pairs of corresponding coordinates,except those whose corresponding coordinates (xi , yi) are both 0.

33 / 37

Page 85: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Similarity Measures

The Tanimoto measureGiven two sets X and Y , we have that

sT (X , Y ) = nX∩YnX + nY − nX∩Y

(18)

where nX = |X |, nY = |Y |, nX∩Y = |X ∩Y |.

Thus, if x, y are discrete-valued vectorsThe measure takes into account all pairs of corresponding coordinates,except those whose corresponding coordinates (xi , yi) are both 0.

33 / 37

Page 86: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Thus

We now definenx =

∑k−1i=1

∑k−1j=0 aij and ny =

∑k−1i=0

∑k−1j=1 aij

In other wordsnx and ny denotes the number of the nonzero coordinate of x and y.

Thus, we have

sT (x, y) ==∑k−1

i=1∑k−1

j=i aii

nX + nY −∑k−1

i=1∑k−1

j=i aij(19)

34 / 37

Page 87: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Thus

We now definenx =

∑k−1i=1

∑k−1j=0 aij and ny =

∑k−1i=0

∑k−1j=1 aij

In other wordsnx and ny denotes the number of the nonzero coordinate of x and y.

Thus, we have

sT (x, y) ==∑k−1

i=1∑k−1

j=i aii

nX + nY −∑k−1

i=1∑k−1

j=i aij(19)

34 / 37

Page 88: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Thus

We now definenx =

∑k−1i=1

∑k−1j=0 aij and ny =

∑k−1i=0

∑k−1j=1 aij

In other wordsnx and ny denotes the number of the nonzero coordinate of x and y.

Thus, we have

sT (x, y) ==∑k−1

i=1∑k−1

j=i aii

nX + nY −∑k−1

i=1∑k−1

j=i aij(19)

34 / 37

Page 89: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Outline1 Introduction

Introduction2 Features

Types of Features3 Similarity and Dissimilarity Measures

Basic Definition4 Proximity Measures between Two Points

Real-Valued VectorsDissimilarity MeasuresSimilarity Measures

Discrete-Valued VectorsDissimilarity MeasuresSimilarity Measures

5 Other ExamplesBetween Sets

35 / 37

Page 90: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Between Sets

Jaccard SimilarityGiven two sets X and Y , we have that

J (X , Y ) = |X ∩Y ||X ∪Y | (20)

36 / 37

Page 91: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Between Sets

Jaccard SimilarityGiven two sets X and Y , we have that

J (X , Y ) = |X ∩Y ||X ∪Y | (20)

36 / 37

Page 92: 27 Machine Learning Unsupervised Measure Properties

Images/cinvestav-1.jpg

Although there are more stuff to look for

Please Refer to1 Theodoridis’ Book Chapter 11.2 Rui Xu and Don Wunsch. 2009. Clustering. Wiley-IEEE Press.

37 / 37