evidential clustering: review and some new...

77
Evidential Clustering: Review and some New Developments Thierry Denœux Université de Technologie de Compiègne Heudiasyc (UMR CNRS 7253) https://www.hds.utc.fr/˜tdenoeux 18th Int. Symp. on Advanced Intelligent Systems (ISIS 2017) Daegu, South Korea October 12, 2017 Thierry Denœux Evidential clustering ISIS 2017, Daegu 1 / 77

Upload: others

Post on 31-Jul-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential Clustering:Review and some New Developments

Thierry Denœux

Université de Technologie de CompiègneHeudiasyc (UMR CNRS 7253)

https://www.hds.utc.fr/˜tdenoeux

18th Int. Symp. on Advanced Intelligent Systems (ISIS 2017)Daegu, South Korea

October 12, 2017

Thierry Denœux Evidential clustering ISIS 2017, Daegu 1 / 77

Page 2: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Cluster Analysis

n objects described byAttribute vectors x1, . . . , xn (attributedata) orDissimilarities (proximity data)

Goals:1 Discover groups in the data2 Assess the uncertainty in group

membership

Thierry Denœux Evidential clustering ISIS 2017, Daegu 2 / 77

Page 3: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Hard and soft clustering

Hard clustering: no representation of uncertainty. Each object is assigned toone and only one group. Group membership is represented bybinary variables uik such that uik = 1 if object i belongs to groupk and uik = 0 otherwise.

Fuzzy clustering: each object has a degree of membership uik ∈ [0,1] to eachgroup, with

∑ck=1 uik = 1. The uik ’s can be interpreted as

probabilities.Possibilistic clustering: the uik are free to take any value in [0,1]c . Each

number uik is interpreted as a degree of possibility that object ibelongs to group k .

Thierry Denœux Evidential clustering ISIS 2017, Daegu 3 / 77

Page 4: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Hard and soft clustering (continued)

Rough clustering: each cluster ωk is characterized by a lower approximationωk and an upper approximation ωk , with ωk ⊆ ωk ; themembership of object i to cluster k is described by a pair(uik ,uik ) ∈ 0,12, with uik ≤ uik ,

∑ck=1 uik ≤ 1 and∑c

k=1 uik ≥ 1.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 4 / 77

Page 5: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Clustering and belief functions

clustering structure uncertainty frameworkfuzzy partition probability theory

possibilistic partition possibility theoryrough partition (rough) sets

? belief functions

As belief functions extend probabilities, possibilities and sets, could thetheory of belief functions provide a more general and flexible frameworkfor cluster analysis?Objectives:

Unify the various approaches to clusteringAchieve a richer and more accurate representation of uncertaintyNew clustering algorithms and new tools to compare and combine clusteringstructures.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 5 / 77

Page 6: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 6 / 77

Page 7: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Theory of belief functions: a brief introduction

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 7 / 77

Page 8: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Theory of belief functions: a brief introduction

Mass functionDefinition

Let Q be a question of interest, and Ω be a finite set of possible answers,one and only one of which is true.Evidence about Q can be represented by a mass function m from 2Ω to[0,1] such that ∑

A⊆Ω

m(A) = 1

Ω

A1

A2

A3

A4

The subsets A of Ω such thatm(A) 6= 0 are called the focal setsof m.If m(∅) = 0, m is said to benormalized (usually assumed).

Thierry Denœux Evidential clustering ISIS 2017, Daegu 8 / 77

Page 9: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Theory of belief functions: a brief introduction

Example

Consider a road scene analysis application. Let Q concern the type ofobject in some region of the image, and Ω = G,R,T ,O,S,corresponding to the possibilities Grass, Road, Tree/Bush, Obstacle, SkyAssume that a lidar sensor (laser telemeter) returns the information thatthe object is either a Tree or an Obstacle, but we there is a probabilityp = 0.1 that the information is not reliable (because, e.g., the sensor isout of order).This uncertain evidence can be represented by the following massfunction m on Ω,

m(T ,O) = 0.9, m(Ω) = 0.1

Thierry Denœux Evidential clustering ISIS 2017, Daegu 9 / 77

Page 10: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Theory of belief functions: a brief introduction

Mass functionInterpretation

The meaning of the evidence is uncertain: it points to several subsets ofΩ, with different degrees of support.Each mass m(A) is a measure of the support given exactly to A, and tono more specific subset.In particular, m(Ω) is a measure of lack of information (ignorance).The vacuous mass function defined by m?(Ω) = 1 represents total lack ofinformation (complete ignorance).

Thierry Denœux Evidential clustering ISIS 2017, Daegu 10 / 77

Page 11: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Theory of belief functions: a brief introduction

Belief and plausibility functions

ΩB

A1

A2

A3

A4

Given a normalized mass function m,we can define

Tshe degree of belief in B as

Bel(B) =∑A⊆B

m(A)

The degree of plausibility of B as

Pl(B) = 1− Bel(B)

=∑

A∩B 6=∅

m(A).

Interpretation: Bel(B) measures the total support given to B, while Pl(A)measures the lack of support given to B.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 11 / 77

Page 12: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Theory of belief functions: a brief introduction

Two-dimensional representation

The uncertainty on a proposition B is represented by two numbers:Bel(B) and Pl(B), with Bel(B) ≤ Pl(B).The intervals [Bel(B),Pl(B)] have maximum length when m is thevacuous mass function. Then,

[Bel(B),Pl(B)] = [0,1]

for all subset B of Ω, except ∅ and Ω.The intervals [Bel(B),Pl(B)] are reduced to points when the focal sets ofm are singletons (m is then said to be Bayesian); then,

Bel(B) = Pl(B)

for all B, and Bel is a probability measure.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 12 / 77

Page 13: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Theory of belief functions: a brief introduction

Logical/Consonant mass function

If m has only one focal set, it is said to be logical.If the focal sets of m are nested (A1 ⊂ A2 ⊂ . . . ⊂ An ), m is said to beconsonant.Pl is then a possibility measure, i.e.,

Pl(A ∪ B) = max(Pl(A),Pl(B))

for all A,B ⊆ Ω.We have

Pl(A) = maxω∈A

Pl(ω) for all A ⊆ Ω.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 13 / 77

Page 14: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Theory of belief functions: a brief introduction

Dempster’s rule

Let m1 and m2 be two mass functions induced by independent pieces ofevidence.Their orthogonal sum is the mass function m1 ⊕m2 defined by

(m1 ⊕m2)(A) =1

1− κ∑

B∩C=A

m1(B)m2(C)

for all A 6= ∅ and (m1 ⊕m2)(∅) = 0, where

κ =∑

B∩C=∅

m1(B)m2(C)

is the degree of conflict.⊕ is commutative, associative, and m? is its single neutral element.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 14 / 77

Page 15: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Theory of belief functions: a brief introduction

Belief-probability transformation

It may be useful to transform a mass function m into a probabilitydistribution for approximation or decision-making.Plausibility-probability transformation

pm(ω) =Pl(ω)∑

ω∈Ω Pl(ω)

Property:pm1⊕m2 = pm1 ⊕ pm2

Thierry Denœux Evidential clustering ISIS 2017, Daegu 15 / 77

Page 16: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Theory of belief functions: a brief introduction

Decision

Given a normalized mass function m, how to select an element or asubset of Ω?Several solutions: for instance, choose ω with the largest degree of beliefBel(ω) or the largest plausibility Pl(ω).Interval dominance:

Bel(ω) Pl(ω) Bel(ω) Pl(ω)

ω is dominated by ω′ iff Pl(ω) < Bel(ω′)We may select the set A∗ of possibilities ω that are dominated by no otherpossibility ω′.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 16 / 77

Page 17: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Theory of belief functions: a brief introduction

Main ideas

The theory of belief function combines sets with probabilities, byassigning basic probabilities to focal sets.A mass function can be seen as a generalized set or as a generalizedprobability distribution.Possibility theory is also recovered as a special case, when the focal setsare nested.A mass function can be transformed into

a probability distribution,an element of Ω, ora subset of Ω.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 17 / 77

Page 18: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 18 / 77

Page 19: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Credal partition

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77

Page 20: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Credal partition

Evidential clustering

Let O = o1, . . . ,on be a set of n objects and Ω = ω1, . . . , ωc be a setof c groups (clusters).Each object oi belongs to at most one group.Evidence about the group membership of object oi is represented by amass function mi on Ω:

for any nonempty set of clusters A ⊆ Ω, mi (A) is the degree of support givento the proposition “oi belongs to one of the clusters in A”mi (∅) measures the support given to the proposition “oi does not belong toany of the c groups”

The n-tuple M = (m1, . . . ,mn) is called a credal partition.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 20 / 77

Page 21: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Credal partition

Example

−5 0 5 10

−20

24

68

10

Butterfly data

x1

x 2

1234

5 6 78910

11

12

Credal partition

∅ ω1 ω2 ω1, ω2m3 0 1 0 0m5 0 0.5 0 0.5m6 0 0 0 1m12 0.9 0 0.1 0

Thierry Denœux Evidential clustering ISIS 2017, Daegu 21 / 77

Page 22: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Credal partition

Relationship with other clustering structures

Hardpar''on

Fuzzypar''on Possibilis'cpar''on Roughpar''on

Credalpar''on

micertain

miBayesian miconsonant milogical

Moregeneral

Lessgeneral

Thierry Denœux Evidential clustering ISIS 2017, Daegu 22 / 77

Page 23: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Credal partition

Rough clustering as a special case

Assume that each mi is logical, i.e., mi (Ai ) = 1 for some Ai ⊆ Ω, Ai 6= ∅.We can then define the lower and upper approximations of cluster ωk as

ωk = oi ∈ O|Ai = ωk , ωk = oi ∈ O|ωk ∈ Ai.

The membership values to the lower and upper approximations of clusterωk are uik = Beli (ωk) and uik = Pli (ωk).

m(ω1)=1( m(ω1, ω2)=1( m(ω2)=1(

Lower(approxima4ons(

Upper(approxima4ons(

ω1L( ω2

L( ω2U(ω1

U(

Thierry Denœux Evidential clustering ISIS 2017, Daegu 23 / 77

Page 24: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Summarization of a credal partition

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 24 / 77

Page 25: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Summarization of a credal partition

Summarization of a credal partition

Hardpar''on

Fuzzypar''on Possibilis'cpar''on Roughpar''on

Credalpar''onMorecomplex

Lesscomplex

intervaldominanceormaximummassplausibility

ofsingletons

maximumplausibilitymaximum

probability

plausibility-proba.transforma'on

Thierry Denœux Evidential clustering ISIS 2017, Daegu 25 / 77

Page 26: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Summarization of a credal partition

From evidential to rough clustering

For each i , let Ai ⊆ Ω be the set of non dominated clusters

Ai = ω ∈ Ω|∀ω′ ∈ Ω,Bel∗i (ω′) ≤ Pl∗i (ω),

where Bel∗i and Pl∗i are the normalized belief and plausibility functions.Lower approximation:

uik =

1 if Ai = ωk0 otherwise.

Upper approximation:

uik =

1 if ωk ∈ Ai

0 otherwise.

The outliers can be identified separately as the objects for whichmi (∅) ≥ mi (A) for all A 6= ∅.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 26 / 77

Page 27: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Relational representation of a credal partition

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 27 / 77

Page 28: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Relational representation of a credal partition

Relational representation of a hard partition

A hard partition can be represented equivalently bythe n × c membership matrix U = (uik ) oran n × n relation matrix R = (rij ) representing the equivalence relation

rij =

1 if oi and oj belong to the same group0 otherwise.

The relational representation R is invariant under renumbering of theclusters, and is thus more suitable to compare or combine severalpartitions.What is the counterpart of matrix R in the case of a credal partition?

Thierry Denœux Evidential clustering ISIS 2017, Daegu 28 / 77

Page 29: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Relational representation of a credal partition

Pairwise representation

Let M = (m1, . . . ,mn) be a credal partition.For a pair of objects oi ,oj, let Qij be the question “Do oi and oj belongto the same group?” defined on the frame Θij = Sij ,¬Sij.Θij is a coarsening of Ω2.

ω1 ω2 ω3 ω4

ω1

ω2

ω3

ω4

Ω

Ω

Sij

Given mi and mj on Ω, a mass function mij onΘij can be computed as follows:

1 Extend mi and mj to Ω2

2 Combine the extensions of mi and mj bythe unnormalized Dempster’s rule

3 Compute the restriction of the combinedmass function to Θij

Thierry Denœux Evidential clustering ISIS 2017, Daegu 29 / 77

Page 30: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Relational representation of a credal partition

Pairwise mass function

Mass function:

mij (∅) = mi (∅) + mj (∅)−mi (∅)mj (∅)

mij (Sij) =c∑

k=1

mi (ωk)mj (ωk)

mij (¬Sij) = κij −mij (∅)

mij (Θij ) = 1− κij −∑

k

mi (ωk)mj (ωk)

where κij is the degree of conflict between mi and mj .In particular,

Plij (Sij) = 1− κij

Plij (¬Sij) = 1−mij (∅)−∑

k

mi (ωk)mj (ωk)

Return to CEVCLUS

Thierry Denœux Evidential clustering ISIS 2017, Daegu 30 / 77

Page 31: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Evidential clustering Relational representation of a credal partition

Relational representation of a credal partition

Let M = (m1, . . . ,mn) be a credal partition.The tuple R = (mij )1≤i<j≤n is called the relational representation of credalpartition M.

M = (m1,m2,m3,m4,m5) −→ R =

1 2 3 4 5

1 · m12 m13 m14 m152 · · m23 m24 m253 · · · m34 m354 · · · · m455 · · · · ·

Thierry Denœux Evidential clustering ISIS 2017, Daegu 31 / 77

Page 32: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 32 / 77

Page 33: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms

Evidential clustering algorithms

1 Evidential c-means (ECM): (Masson and Denoeux, 2008):Attribute dataHCM, FCM family

2 EK-NNclus (Denoeux et al, 2015)Attribute or proximity dataSearches for the most plausible partition of a dataset

3 EVCLUS (Denoeux and Masson, 2004; Denoeux et al., 2016):Attribute or proximity (possibly non metric) dataMultidimensional scaling approach

Thierry Denœux Evidential clustering ISIS 2017, Daegu 33 / 77

Page 34: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms EVCLUS

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 34 / 77

Page 35: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms EVCLUS

Learning a Credal Partition from proximity data

Problem: given the dissimilarity matrix D = (dij ), how to build a“reasonable” credal partition ?We need a model that relates cluster membership to dissimilarities.Basic idea: “The more similar two objects, the more plausible it is thatthey belong to the same group”.How to formalize this idea?

Thierry Denœux Evidential clustering ISIS 2017, Daegu 35 / 77

Page 36: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms EVCLUS

Formalization

Let mi and mj be mass functions regarding the group membership ofobjects oi and oj .We have seen that the plausibility that objects oi and oj belong to thesame group is

Plij (Sij) =∑

A∩B 6=∅

mi (A)mj (B) = 1− κij

where κij = degree of conflict between mi and mj .Problem: find a credal partition M = (m1, . . . ,mn) such that largerdegrees of conflict κij correspond to larger dissimilarities dij .

Thierry Denœux Evidential clustering ISIS 2017, Daegu 36 / 77

Page 37: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms EVCLUS

Cost function

Approach: minimize the discrepancy between the dissimilarities dij andthe degrees of conflict κij .Cost (stress) function:

J(M) =∑i<j

(κij − ϕ(dij ))2

where ϕ is an increasing function from [0,+∞) to [0,1], for instance

ϕ(d) = 1− exp(−γd2).

J(M) can be minimized efficiently using an Iterative Row-wise QuadraticProgramming (IRQP) algorithm.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 37 / 77

Page 38: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms EVCLUS

Butterfly exampleCredal partition

−5 0 5 10

−20

24

68

10

Butterfly data

x1

x 2

1234

5 6 78910

11

12

2 4 6 8 10 12

0.0

0.2

0.4

0.6

0.8

1.0

objects

mas

ses m(∅)

m(ω1)m(ω2)m(Ω)

Thierry Denœux Evidential clustering ISIS 2017, Daegu 38 / 77

Page 39: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms EVCLUS

Butterfly exampleShepard diagram

0.2 0.4 0.6 0.8 1.0

0.2

0.4

0.6

0.8

1.0

transformed dissimilarities

degr

ees

of c

onfli

ct

Thierry Denœux Evidential clustering ISIS 2017, Daegu 39 / 77

Page 40: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms EVCLUS

Example with a four-class dataset (2000 objects)

−5 0 5 10

−5

05

10

x[, 1]

x[, 2

]

−5 0 5 10

−5

05

10

x[, 1]

x[, 2

]

−5 0 5 10

−5

05

10

x[, 1]

x[, 2

]

−5 0 5 10

−5

05

10

x[, 1]

x[, 2

]

Thierry Denœux Evidential clustering ISIS 2017, Daegu 40 / 77

Page 41: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms EVCLUS

Modification for large datasets

EVCLUS requires to store the whole dissimilarity matrix: it is inapplicableto large proximity data.However, there is usually some redundancy in a dissimilarity matrix.In particular, if two objects o1 and o2 are very similar, then any object o3that is dissimilar from o1 is usually also dissimilar from o2.Because of such redundancies, it might be possible to compute thedifferences between degrees of conflict and dissimilarities, for only asubset of randomly sampled dissimilarities.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 41 / 77

Page 42: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms EVCLUS

New stress function

Let j1(i), . . . , jk (i) be k integers sampled at random from the set1, . . . , i − 1, i + 1, . . . ,n, for i = 1, . . . ,n.Let Jk the following stress criterion,

Jk (M) =n∑

i=1

k∑r=1

(κi,jr (i) − ϕ(di,jr (i))

)2.

The calculation of Jk (M) requires only O(nk) operations.If k can be kept constant as n increases, then time and spacecomplexities are reduced from quadratic to linear.This modification makes EVCLUS applicable to large datasets(∼ 104 − 105 objects and hundreds of classes).

Thierry Denœux Evidential clustering ISIS 2017, Daegu 42 / 77

Page 43: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms EVCLUS

Example with simulated data (n = 10,000)

0 500 1000 1500 2000

0.80

0.81

0.82

0.83

0.84

0.85

0.86

k

AR

I

0 500 1000 1500 2000

3040

5060

7080

90

k

time

Thierry Denœux Evidential clustering ISIS 2017, Daegu 43 / 77

Page 44: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Handling a large number of clusters

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 44 / 77

Page 45: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Handling a large number of clusters

Need to limit the number of focal sets

If no restriction is imposed on the focal sets, the number of parameters tobe estimated in evidential clustering grows exponentially with the numberc of clusters, which makes it intractable unless c is small.If we allow masses to be assigned to all pairs of clusters, the number offocal sets becomes proportional to c2, which is manageable for moderatevalues of c (say, until 10), but still impractical for larger n.Idea: assign masses only to pairs of contiguous clusters.If each cluster has at most q neighbors, then the number of focal sets isproportional to c.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 45 / 77

Page 46: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Handling a large number of clusters

Example

2e+05 4e+05 6e+05 8e+05 1e+060e+

002e

+05

4e+

056e

+05

8e+

051e

+06

x1

x 2

123456789101112131415

The S2 dataset (n = 5000) and the 15 clusters found by k -EVCLUS withk = 100

Thierry Denœux Evidential clustering ISIS 2017, Daegu 46 / 77

Page 47: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Handling a large number of clusters

Method

Step1: Run a clustering algorithm (e.g., ECM or EVCLUS) with focalsets of cardinalities 0, 1 and (optionally) c. A credal partition M0is obtained.

Step 2: Compute the similarity between each pair of clusters (ωj , ω`) as

S(j , `) =n∑

i=1

plijpli`,

where plij and pli` are the normalized plausibilities that object ibelongs, respectively, to clusters j and `. Determine the set PKof pairs ωj , ω` that are mutual q nearest neighbors.

Step 3: Run the clustering algorithm again, starting from the previouscredal partition M0, and adding as focal sets the pairs in PK .

Thierry Denœux Evidential clustering ISIS 2017, Daegu 47 / 77

Page 48: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Handling a large number of clusters

Pairs of mutual neighbors with q = 1

2e+05 4e+05 6e+05 8e+05 1e+060e+00

2e+05

4e+05

6e+05

8e+05

1e+06

x1

x 2

123456789101112131415

Thierry Denœux Evidential clustering ISIS 2017, Daegu 48 / 77

Page 49: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Handling a large number of clusters

Pairs of mutual neighbors with q = 2

2e+05 4e+05 6e+05 8e+05 1e+060e+00

2e+05

4e+05

6e+05

8e+05

1e+06

x1

x 2

123456789101112131415

Thierry Denœux Evidential clustering ISIS 2017, Daegu 49 / 77

Page 50: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Handling a large number of clusters

Initial credal partitionM0

2e+05 4e+05 6e+05 8e+05 1e+060e+

002e

+05

4e+

056e

+05

8e+

051e

+06

x1

x 2

123456789101112131415

Lower approximations and ambiguous objects for the initial credal partitionM0Thierry Denœux Evidential clustering ISIS 2017, Daegu 50 / 77

Page 51: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Handling a large number of clusters

Final credal partition (q = 1)

2e+05 4e+05 6e+05 8e+05 1e+060e+

002e

+05

4e+

056e

+05

8e+

051e

+06

x1

x 2

123456789101112131415

Lower approximations and ambiguous objects for the final credal partitionThierry Denœux Evidential clustering ISIS 2017, Daegu 51 / 77

Page 52: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Constrained evidential clustering

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 52 / 77

Page 53: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Constrained evidential clustering

Constrained EVCLUS

In some cases, we may have some prior knowledge about the groupmembership of some objects.Such knowledge may take the form of instance-level constraints of twokinds:

1 Must-link (ML) constraints, which specify that two objects certainly belong tothe same cluster

2 Cannot-link (CL) constraints, which specify that two objects certainly belongto different clusters

How to take into account such constraints?

Thierry Denœux Evidential clustering ISIS 2017, Daegu 53 / 77

Page 54: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Constrained evidential clustering

Modified cost-function

To take into account ML and CL constraints, we can modify the costfunction of k -EVCLUS as

JkC(M) = η

n∑i=1

k∑r=1

(κi,jr (i) − δi,jr (i)

)2+

ξ

2(|ML|+ |CL|)(JML + JCL),

with

JML =∑

(i,j)∈ML

Plij (¬Sij) + 1− Plij (Sij),

JCL =∑

(i,j)∈CL

Plij (Sij) + 1− Plij (¬Sij),

whereML and CL are, respectively, the sets of ML and CL constraints.Plij (¬Sij) and Plij (¬Sij) are computed from pairwise mass function mij

Go back to pairwise mass functions

Thierry Denœux Evidential clustering ISIS 2017, Daegu 54 / 77

Page 55: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Constrained evidential clustering

Results

0 500 1000 1500 2000

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Banana data

AR

IA

RI

nbconst0 500 1000 1500 2000

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Letter data

AR

IA

RI

nbconst

Thierry Denœux Evidential clustering ISIS 2017, Daegu 55 / 77

Page 56: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Constrained evidential clustering

Constraint expansion

(a) A CL constraint (oi ,oj ) ∈ CL, with the K -neighborhoods NK (oi ) andNK (oj ) of oi and oj , respectively (K = 2). (b) The set PK (oi ,oj ) of pairs of aneighbor of oi and a neighbor of oj . (c) The K = 2 new CL constraints.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 56 / 77

Page 57: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Algorithms Constrained evidential clustering

Results

0 500 1000 1500 2000

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Banana data

ARI

ARI

nbconst

K=0K=1K=3K=5K=10

0 500 1000 1500 2000

0.2

0.4

0.6

0.8

Letter data

ARI

ARI

nbconst

K=0K=1K=3K=5K=10

Thierry Denœux Evidential clustering ISIS 2017, Daegu 57 / 77

Page 58: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 58 / 77

Page 59: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms

Exploiting the generality of evidential clustering

We have seen that the concept of credal partition subsumes the mainhard and soft clustering structures.Consequently, methods designed to evaluate or combine credal partitionscan be used to evaluate or combine the results of any hard or softclustering algorithms.Two such methods will be described:

1 A generalization of the Rand index to compute the distance between twocredal partitions;

2 A method to combine credal partitions.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 59 / 77

Page 60: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Credal Rand index

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 60 / 77

Page 61: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Credal Rand index

Rand index

The Rand index is a widely used measure of similarity between two hardpartitions.It is defined as

RI =a + b

n(n − 1)/2

witha = number of pairs of objects that are grouped together in both partitionsb = number of pairs of objects that are assigned to different clusters in bothpartitions.

How to generalize the Rand Index to credal partitions?

Thierry Denœux Evidential clustering ISIS 2017, Daegu 61 / 77

Page 62: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Credal Rand index

Belief distance

Let R = (mij ) and R′ = (m′ij ) be the relational representations of twocredal partitions.The assess the distance between R and R′, we can average thedistances between the mij ’s and m′ij ’s.Belief distance between mass functions:

δB(mij ,m′ij ) =12

∑A⊆Θ

| Belij (A)− Bel ′ij (A) |,

where Belij and Bel ′ij are, respectively, the belief functions associated tomij and m′ij .Property: δB(mij ,m′ij ) ∈ [0,1].

Thierry Denœux Evidential clustering ISIS 2017, Daegu 62 / 77

Page 63: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Credal Rand index

Similarity index

We define the Credal Rand Index between two credal partitions as

CRI(R,R′) = 1−∑

i<j δB(mij ,m′ij )n(n − 1)/2

.

Properties:0 ≤ CRI(R,R′) ≤ 1CRI = RI when the two partitions are hardSymmetry: CRI(R,R′) = CRI(R′,R)If R = R′, then CRI(R,R′) = 11− CRI is a metric in the space R of relational representations

The CRI can be used to compare the results of any two hard or softclustering algorithms.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 63 / 77

Page 64: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Credal Rand index

Example: Seeds data

Comp.1

−2 −1 0 1 2 −0.5 0.0 0.5 1.0

−4

−2

02

−2

−1

01

2

Comp.2

Comp.3

−3

−2

−1

01

−4 −2 0 2

−0.

50.

00.

51.

0

−3 −2 −1 0 1

Comp.4

Seeds from three differentvarieties of wheat: Kama,Rosa and Canadian, 70elements each7 features

Thierry Denœux Evidential clustering ISIS 2017, Daegu 64 / 77

Page 65: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Credal Rand index

Clustering algorithms

algorithm R function R packageECM ecm evclust

EVCLUS kevclus evclustHCM kmeans stats

Hierarchical clust. hclust stats(Ward distance)

FCM FKM fclustFuzzy k -medoids FKM.med fclustπ-Rough k -means RoughKMeans_PI SoftClustering

EM Mclust mclust

Thierry Denœux Evidential clustering ISIS 2017, Daegu 65 / 77

Page 66: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Credal Rand index

Result: MDS configuration

−0.2 0.0 0.2 0.4

−0.4

−0.2

0.0

0.2

axis 1

axis

2

TRUEHCM

Ward

ECM

ECMh

ECMf

ECMr

EVC

EVCh

EVCf

EVCr

FCMFKMm

EM

RCM

evidential

fuzzy

rough

hard

Thierry Denœux Evidential clustering ISIS 2017, Daegu 66 / 77

Page 67: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Combining clustering structures

Outline

1 Theory of belief functions: a brief introduction

2 Evidential clusteringCredal partitionSummarization of a credal partitionRelational representation of a credal partition

3 AlgorithmsEVCLUSHandling a large number of clustersConstrained evidential clustering

4 Comparing and combining the results of soft clustering algorithmsCredal Rand indexCombining clustering structures

Thierry Denœux Evidential clustering ISIS 2017, Daegu 67 / 77

Page 68: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Combining clustering structures

Motivations for combining clustering structures

Let M1, . . . ,MN be an ensemble of N credal partitions generated by hardor soft (fuzzy, rough, etc.) clustering structures.It may be useful to combine these credal partitions:

to increase the chance of finding a good approximation to the true partition,orto highlight invariant patterns across the clustering structures.

Combination is easily carried out using pairwise representations.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 68 / 77

Page 69: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Combining clustering structures

Combination method

M1

M2

Mk

R1R2

Rk

combina/on R* M*

Credalpar//ons

Pairwiserepresenta/ons

Combinedcredalpar//on

The combined credal partition can be defined as

M∗ = arg maxM

CRI(R(M),R∗),

where R(M) denotes the relational representation of M.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 69 / 77

Page 70: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Combining clustering structures

Example: seeds dataHard clustering results

−4 −2 0 2

−2

−1

01

2

x1

x 2

HCM

−4 −2 0 2

−2

−1

01

2

x1

x 2

Hierarchical Ward

Thierry Denœux Evidential clustering ISIS 2017, Daegu 70 / 77

Page 71: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Combining clustering structures

Example: seeds dataFuzzy clustering results

−4 −2 0 2

−2

−1

01

2

Variability explained by these two components: 71.61%Principal Component 1

Prin

cipa

l Com

pone

nt 2

FCM

−4 −2 0 2

−2

−1

01

2

Variability explained by these two components: 71.61%Principal Component 1

Prin

cipa

l Com

pone

nt 2

FKM.med

Thierry Denœux Evidential clustering ISIS 2017, Daegu 71 / 77

Page 72: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Comparing and combining the results of soft clustering algorithms Combining clustering structures

Example: seeds dataCombined credal partition (Dubois-Prade rule)

−4 −2 0 2

−2

−1

01

2

x1

x 2Combined (DP)

Thierry Denœux Evidential clustering ISIS 2017, Daegu 72 / 77

Page 73: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Conclusions

Summary

The Dempster-Shafer theory of belief functions provides a rich andflexible framework to represent uncertainty in clustering.The concept of credal partition encompasses the main existing softclustering concepts (fuzzy, possibilistic, rough partitions).Efficient algorithms exist, allowing one to generate credal partitions fromattribute or proximity datasets.These algorithms can be applied to large datasets and large numbers ofclusters (by carefully selecting the focal sets).Concepts from the theory of belief functions make it possible to compareand combine clustering structures generated by various soft clusteringalgorithms.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 73 / 77

Page 74: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Conclusions

Future research directions

Combining clustering structures in various settingsdistributed clustering,combination of different attributes, different algorithms,etc.

Handling huge datasets (several millions of objects)Criteria for selecting the number of clustersSemi-supervised clusteringClustering imprecise or uncertain dataApplications to image processing, social network analysis, processmonitoring, etc.Etc...

Thierry Denœux Evidential clustering ISIS 2017, Daegu 74 / 77

Page 75: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

Conclusions

The evclust package

https://cran.r-project.org/web/packages

Thierry Denœux Evidential clustering ISIS 2017, Daegu 75 / 77

Page 76: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

References

References Icf. https://www.hds.utc.fr/˜tdenoeux

M.-H. Masson and T. Denœux.ECM: An evidential version of the fuzzy c-means algorithm.Pattern Recognition, 41(4):1384-1397, 2008.

T. Denœux and M.-H. Masson.EVCLUS: Evidential Clustering of Proximity Data.IEEE Transactions on SMC B, 34(1):95-109, 2004.

V. Antoine, B. Quost, M.-H. Masson and T. Denoeux.CEVCLUS: Evidential clustering with instance-level constraints forrelational data.Soft Computing, 18(7):1321-1335, 2014.

T. Denœux, S. Sriboonchitta and O. KanjanatarakulEvidential clustering of large dissimilarity data.Knowledge-Based Systems, 106:179–195, 2016.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 76 / 77

Page 77: Evidential Clustering: Review and some New Developmentstdenoeux/dokuwiki/_media/en/talk_isis2017.… · Thierry Denœux Evidential clustering ISIS 2017, Daegu 19 / 77. Evidential

References

References IIcf. https://www.hds.utc.fr/˜tdenoeux

T. Denoeux, O. Kanjanatarakul and S. Sriboonchitta.EK-NNclus: a clustering procedure based on the evidential K-nearestneighbor rule.Knowledge-Based Systems, Vol. 88, pages 57-69, 2015.

T. Denoeux, S. Li and S. Sriboonchitta.Evaluating and Comparing Soft Partitions: an Approach Based onDempster-Shafer Theory.IEEE Transactions on Fuzzy Systems, In Press, 2017.

Thierry Denœux Evidential clustering ISIS 2017, Daegu 77 / 77