Class #3: Clustering. ML4Bio 2013, March 15th, 2013. Quaid Morris.



Overview!

•  Objective functions
•  Parametric clustering (i.e., we are estimating some parameters):
   –  K-means
   –  Gaussian mixture models
•  Network-based (non-parametric clustering, no "parameters" estimated):
   –  Hierarchical clustering
   –  Affinity propagation (objective function-based)
   –  MCL (look it up yourself)


Objective functions!

An objective function, e.g., E(Θ), measures the fit of one or more parameters, indicated by Θ, to a set of observations.

By maximizing the objective function, we can do estimation: find the settings of the parameters that best fit the observations.

Likelihood and log likelihood are common examples of objective functions.


Notes about objective functions!

Beware: sometimes you are supposed to minimize an objective function rather than maximize it; in those cases, the objective function is usually called a cost function or an error function.

Note: you can always turn a minimization problem into a maximization one by putting a minus sign in front of the cost function:

similarity = - distance


Examples of objective functions: log likelihood!

•  Estimating the bias of a coin, p, given that we've observed m heads and n tails.

Pr(m heads and n tails | bias of p) = (m + n choose m) p^m (1 - p)^n

Use the log likelihood minus a constant as the objective function:

E(p) = m log p + n log(1 - p)

Maximum likelihood (ML) estimate:

p_ML = argmax_p E(p)
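A quick numerical check of this, as a Python sketch (the course's examples use R, but the idea is the same; the counts of 7 heads and 3 tails and the grid search are illustrative choices of this sketch):

```python
import numpy as np

def coin_log_likelihood(p, m, n):
    # E(p) = m log p + n log(1 - p): the binomial log likelihood
    # up to the constant log C(m+n, m), which does not depend on p
    return m * np.log(p) + n * np.log(1 - p)

m, n = 7, 3                                    # 7 heads, 3 tails
grid = np.linspace(0.001, 0.999, 999)          # candidate values of p
p_ml = grid[np.argmax(coin_log_likelihood(grid, m, n))]
```

The grid maximum lands on the closed-form answer m / (m + n) = 0.7, which you can also get by setting dE/dp = m/p - n/(1-p) = 0.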


Examples of objective functions: sum of squared errors!

•  Useful for estimating the mean m (or "centroid" if m is a vector) of a set of observations x1, x2, …, xN.

Objective function:

E(m) = - Σ_{i=1}^{N} (m - xi)^2

Minimum sum of squared error (MSSE) estimate:

m_MSSE = argmax_m E(m)
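As a small sanity check in Python (the three sample values and the grid are arbitrary choices of this sketch): maximizing E(m) recovers the sample mean.

```python
import numpy as np

x = np.array([2.0, 4.0, 9.0])

def neg_sse(m, x):
    # E(m) = -sum_i (m - x_i)^2; maximizing E minimizes the SSE
    return -np.sum((m - x) ** 2)

grid = np.linspace(0.0, 10.0, 2001)            # candidate means, step 0.005
m_msse = grid[np.argmax([neg_sse(m, x) for m in grid])]
```

The argmax sits at the sample mean (5.0 here), which is the calculus answer: dE/dm = -2 Σ_i (m - xi) = 0 gives m = (1/N) Σ_i xi.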


Sum of squared errors with vectors!

•  Recall from linear algebra that for vectors v and w (where vj and wj are the elements of these vectors), their dot, or inner, product is:

v^T w = Σ_j vj wj

•  For measuring the SSE between a vector m and observations x1, x2, etc., we use the squared Euclidean distance:

Σ_i ||m - xi||^2 = Σ_i (m - xi)^T (m - xi) = Σ_i Σ_j (mj - xij)^2


K-means!

•  Given K, the number of clusters, and a set of data vectors x1, x2, …, xN, find K clusters defined by their centroids m1, m2, …, mK.

•  Each data vector xi is assigned to its closest mean. Let c(i) be the cluster that xi is assigned to, so:

c(i) = argmin_j ||xi - mj||^2


K-means!

•  Cost function:

E(m1, …, mK) = Σ_i ||xi - m_c(i)||^2

E is often called the distortion.

Recall:

c(i) = argmin_j ||xi - mj||^2
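The assignment rule and the distortion can be written in a few lines of NumPy (the toy points and means below are made up for illustration):

```python
import numpy as np

def distortion(X, means):
    # squared distances from every point to every mean, shape (N, K)
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # c(i) = argmin_j ||x_i - m_j||^2
    c = d2.argmin(axis=1)
    # E = sum_i ||x_i - m_c(i)||^2
    return c, d2[np.arange(len(X)), c].sum()

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
c, E = distortion(X, means)
```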


How to select the right groupings?!

[Figure: scatterplot of datapoints on axes x1, x2]

With N datapoints and 2 clusters, there are 2^N possible groupings to test!


Lloyd's algorithm for K-means!

•  The K-means objective function is multimodal and it's not obvious how to minimize it. There are a number of algorithms for this (see, e.g., the kmeans() help in R).

•  However, one algorithm, Lloyd's, is so commonly used that it's often called "the K-means algorithm".


Lloyd's algorithm for K-means!

•  Step 0: Initialize the means (e.g., by randomly sampling K data points as means, or by randomly assigning data points to clusters).

•  Step 1: Compute the cluster assignments c(i) based on the current means ("E-step").

•  Step 2: Compute the means based on the cluster assignments ("M-step"):
   –  mj = mean of all xi such that c(i) = j.

•  Step 3: If the means don't change, you are done; otherwise go back to Step 1.
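The four steps above can be sketched in NumPy as follows (a minimal version assuming Euclidean distance and data-point initialization; the two-blob toy data is made up for illustration):

```python
import numpy as np

def lloyd_kmeans(X, K, rng=np.random.default_rng(0), max_iter=100):
    # Step 0: initialize the means by sampling K distinct data points
    means = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(max_iter):
        # Step 1 ("E-step"): assign each point to its closest mean
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        c = d2.argmin(axis=1)
        # Step 2 ("M-step"): recompute each mean from its assigned points
        new_means = np.array([X[c == j].mean(axis=0) if np.any(c == j) else means[j]
                              for j in range(K)])
        # Step 3: stop when the means no longer change
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, c

# two well-separated blobs of five identical points each
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
means, c = lloyd_kmeans(X, K=2)
```

Keeping an empty cluster's old mean (the `else means[j]` branch) is one common convention; implementations differ on how they handle empty clusters.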

Clustering example!

[Figure sequence: scatterplots on axes x1, x2 showing Lloyd's algorithm in action: Step 0 (initialization), then alternating Step 1 "E-step" and Step 2 "M-step" over three rounds, until the assignments stop changing: "We're done!"]

Another K-means algorithm that almost always works better (sequential K-means)!

•  Step 0: Initialize the means (e.g., by randomly sampling the means, or by randomly assigning data points to means).

•  Step 1: Compute the means based on the cluster assignments ("M-step"):
   –  mj = mean of all xi such that c(i) = j.

•  Step 2: Recompute the cluster assignment for a single randomly chosen point, xi.

•  Step 3: If you are not done, go back to Step 1.


Local minima in K-means!

[Figure: scatterplot on axes x1, x2 showing a converged but suboptimal clustering; Lloyd's and sequential K-means cannot improve on this solution.]

One solution: do multiple restarts and compare the distortions of the different solutions.


K-means is "stochastic"!

•  Lloyd's is deterministic and guaranteed to reduce the distortion at each iteration (unless converged), but the solution depends on the random initialization. It is hard to guarantee that you have found the global minimum.

•  Methods to find good local minima:
   –  Multiple random initializations: choose the solution that achieves the lowest distortion.
   –  Stochastic search (e.g., simulated annealing, MCMC, or [yuck] genetic algorithms): these occasionally take steps that increase the distortion.
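The multiple-restart recipe can be sketched as below; `kmeans_once` is a compact helper written just for this sketch (not a library function), and the three-blob data is illustrative:

```python
import numpy as np

def kmeans_once(X, K, rng):
    # one run of Lloyd's algorithm from a random initialization
    means = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(100):
        c = ((X[:, None] - means[None]) ** 2).sum(-1).argmin(1)
        new = np.array([X[c == j].mean(0) if np.any(c == j) else means[j]
                        for j in range(K)])
        if np.allclose(new, means):
            break
        means = new
    distortion = ((X - means[c]) ** 2).sum()
    return means, c, distortion

# multiple random restarts: keep the solution with the lowest distortion
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 8.0), np.full((5, 2), 16.0)])
rng = np.random.default_rng(1)
means, c, E = min((kmeans_once(X, 3, rng) for _ in range(10)),
                  key=lambda run: run[2])
```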


A situation where K-means won't work well!

[Figure: scatterplot on axes x1, x2]

What is the lowest distortion solution?


A better solution!

Define a "cluster-specific" distance measure. Ovals indicate equal "Mahalanobis distance" from the cluster mean:

d1(m, x) = (m - x)^T M1 (m - x)
d2(m, x) = (m - x)^T M2 (m - x)

M1 and M2 are matrices that specify the cluster distance metric.
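A numerical illustration in Python (the covariance matrix, cluster mean, and test points are made up for this sketch): taking M as an inverse covariance makes a long step along the ellipse's long axis "shorter" than a small step across it.

```python
import numpy as np

Sigma = np.array([[4.0, 1.2], [1.2, 1.0]])   # hypothetical cluster covariance
M = np.linalg.inv(Sigma)                      # metric matrix for this cluster
m = np.array([1.0, 1.0])                      # cluster mean

def mahalanobis2(m, x, M):
    # d(m, x) = (m - x)^T M (m - x)
    d = m - x
    return float(d @ M @ d)

x_along = np.array([3.0, 1.6])    # big Euclidean step, roughly along the long axis
x_across = np.array([1.0, 2.0])   # small Euclidean step, across the short axis
d_along = mahalanobis2(m, x_along, M)    # ≈ 1.0
d_across = mahalanobis2(m, x_across, M)  # ≈ 1.56
```

Euclidean distance ranks x_along as farther (squared distance 4.36 vs. 1.0), but the Mahalanobis metric reverses the ranking.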


Gaussian mixture models!

•  We're not capturing any information about the "spread" of the data when we do K-means; this can be especially important when choosing the number of means.

•  Rarely can you eyeball the data, as we just did, to choose the number of means. So we'd like to know whether we have one cluster with a broad spread in one dimension and a narrow spread in another, or multiple clusters.

•  This is common, and can happen if your dimensions are measured in different units.


A MSSE estimate is a ML estimate!!

•  The (negative) SSE is the log likelihood (minus a constant) of the data under a Normal distribution with fixed variance.

Recall:

P(x1, x2, …, xN | m) = Π_{i=1}^{N} P(xi | m)

and if xi is normally distributed with mean m and variance 1 (i.e., σ^2 = 1), then:

P(xi | m) = (1/√(2π)) e^{-(xi - m)^2 / 2}
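This identity is easy to verify numerically; a Python sketch with arbitrary observations and candidate means:

```python
import numpy as np

x = np.array([0.5, 1.5, 3.0])

def log_lik(m, x):
    # log of prod_i N(x_i; m, 1) = sum_i [ -0.5 log(2 pi) - (x_i - m)^2 / 2 ]
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - m) ** 2)

def neg_half_sse(m, x):
    # -SSE/2, i.e. the E(m) objective scaled by 1/2
    return -0.5 * np.sum((x - m) ** 2)

# the two differ by a constant that does not depend on m,
# so they have the same maximizer
diffs = [log_lik(m, x) - neg_half_sse(m, x) for m in (0.0, 1.0, 2.5)]
```

The constant is -(N/2) log(2π), so maximizing the log likelihood and minimizing the SSE give the same estimate of m.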


Multivariate normal density

For a d-dimensional x:

N(x; µ, Σ) = (2π)^(-d/2) |Σ|^(-1/2) exp( -(x - µ)^T Σ^(-1) (x - µ) / 2 )


What's Σ???!

•  Σ is the covariance matrix; it specifies the shape of the distribution.
•  Σij = covariance between xi and xj.
•  If Σ = σ^2 I (a scaled identity matrix), the contours of equal density are "circular" or "spherical".
•  If Σ is a diagonal matrix, the distribution is "axis-aligned" but the contours may be elliptical.
•  If Σ is neither, the ellipses are slanted.


[Figure: four scatterplots, each of 300 samples drawn in R with rmvnorm(n = 300, mean = c(1, 1), sigma = Σ), axes running from -4 to 4, with the mean µ marked:
•  Σ = (1.0, -1.9; -1.9, 4.0): "full covariance, neg. corr."
•  Σ = (1, 0; 0, 4): "axis-aligned, or diagonal"
•  Σ = (1, 0; 0, 1): "spherical"
•  Σ = (1.0, 0.5; 0.5, 1.0): "full covariance, positively correlated"]
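A NumPy analogue of the R rmvnorm panels above (one of the four Σ's, with the sample size enlarged from 300 so the sample covariance visibly matches Σ):

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.array([1.0, 1.0])
Sigma = np.array([[1.0, -1.9], [-1.9, 4.0]])   # the "full covariance, neg. corr." panel

# with many samples, the sample mean and covariance recover the parameters
samples = rng.multivariate_normal(mean, Sigma, size=20000)
S_hat = np.cov(samples.T)
```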


Log of multivariate normal density is (almost) a Mahalanobis distance

N(x; µ, Σ) = (2π)^(-d/2) |Σ|^(-1/2) exp( -(x - µ)^T Σ^(-1) (x - µ) / 2 )

so

log N(x; µ, Σ) = c - (x - µ)^T Σ^(-1) (x - µ) / 2   (c does not depend on x or µ)

Compare with: dM(x, µ) = (x - µ)^T M (x - µ)

If M = 0.5 Σ^(-1), then dM differs from -log N by only a constant.


Math that makes life worth living

Say we want a ML estimate for a multivariate Gaussian:

P(x1, x2, …, xN | m, Σ) = Π_{i=1}^{N} P(xi | m, Σ)

It turns out that the MLE for m is simply the mean of the data, and the MLE for Σ is the covariance matrix of the data!!!!
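A numerical check of this claim in Python (the 2-D Gaussian used to generate the data is an arbitrary choice of this sketch): the total log likelihood at the sample mean and 1/N sample covariance beats nearby perturbations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 2.0], [[2.0, 0.6], [0.6, 1.0]], size=500)

m_hat = X.mean(axis=0)            # candidate MLE of the mean
S_hat = np.cov(X.T, bias=True)    # candidate MLE of the covariance (1/N, not 1/(N-1))

def total_log_lik(m, S):
    # sum_i log N(x_i; m, S) for a 2-D Gaussian
    d = X - m
    Sinv = np.linalg.inv(S)
    _, logdet = np.linalg.slogdet(S)
    quad = np.einsum('ij,jk,ik->i', d, Sinv, d)
    return -0.5 * (len(X) * (2 * np.log(2 * np.pi) + logdet) + quad.sum())

best = total_log_lik(m_hat, S_hat)
```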


Gaussian mixture model!

•  Objective function:

E(m1, …, mK, Σ1, …, ΣK) = Π_i P(xi | m1, …, mK, Σ1, …, ΣK)

where

P(xi | m1, …, mK, Σ1, …, ΣK) = Σ_k πk N(xi; mk, Σk)

and N(xi; mk, Σk) is the multivariate normal density.

Gaussian mixture model!

[Figure sequence: scatterplots on axes x1, x2. First, the clustering we want, with covariance ellipses Σ1 and Σ2. Then fits where Σ1 is the identity matrix, where Σ1 is diagonal, and where Σ1 is not diagonal.]

Expectation-Maximization for fitting Gaussian mixture models!

•  E-step: compute the "responsibilities" rik of each cluster k for each data point xi:
   –  rik = πk N(xi; mk, Σk) / Σ_{k'} πk' N(xi; mk', Σk')

•  M-step: fit the parameters given the responsibilities:
   –  mk = Σ_i rik xi / Σ_i rik
   –  [Σk]rc = Σ_i rik (xir - mkr)(xic - mkc) / Σ_i rik, where [Σk]rc is the (r,c)-th entry of Σk
   –  πk = Σ_i rik / N, where N is the number of datapoints.
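The E-step/M-step updates above can be sketched in NumPy as below. The two-blob data, the deterministic extreme-point initialization, and the small ridge (1e-6 I) added to each covariance for numerical stability are choices of this sketch, not part of the slides:

```python
import numpy as np

def gaussian_density(X, m, S):
    # N(x; m, S) evaluated at every row of X
    d = X - m
    Sinv = np.linalg.inv(S)
    _, logdet = np.linalg.slogdet(S)
    quad = np.einsum('ij,jk,ik->i', d, Sinv, d)
    return np.exp(-0.5 * (quad + logdet + X.shape[1] * np.log(2 * np.pi)))

def gmm_em(X, means_init, n_iter=100):
    N, D = X.shape
    means = np.array(means_init, dtype=float)
    K = len(means)
    covs = np.array([np.eye(D)] * K)
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: r_ik proportional to pi_k N(x_i; m_k, Sigma_k)
        r = np.column_stack([pis[k] * gaussian_density(X, means[k], covs[k])
                             for k in range(K)])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means, covariances, mixing weights
        Nk = r.sum(axis=0)
        means = (r.T @ X) / Nk[:, None]
        for k in range(K):
            d = X - means[k]
            covs[k] = (r[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(D)
        pis = Nk / N
    return means, covs, pis

# two well-separated blobs; initialize the means at the two x-extremes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-4.0, 0.0], 0.5, (150, 2)),
               rng.normal([4.0, 0.0], 0.5, (150, 2))])
means, covs, pis = gmm_em(X, means_init=[X[X[:, 0].argmin()], X[X[:, 0].argmax()]])
```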


Model selection!

•  So, we've talked about a number of different ways to model the data within a cluster (spherical or axis-aligned covariance matrices, and so on): what's the right one to use?

•  Furthermore, you need to select the number of clusters appropriate for your dataset.

•  Making these decisions is called "model selection".


Why model selection is hard!

•  More "complex" models (i.e., models with more parameters) will almost always fit data better than less complex models. So if your objective function (e.g., likelihood) depends only on the data, it will always select the model with more parameters. For example, with one cluster per datapoint, each cluster mean lies exactly on top of its datapoint and the squared error is zero, but this solution doesn't tell us much about the data.


Model selection via penalized likelihood!

•  A simple way to do model selection is to add a term to the objective function that penalizes parameters, so that additional parameters have to be offset by improvements in the fit to the data.

•  These are called "penalized likelihood" methods.


BIC and AIC!

•  Let LL(Θ; X) be the log likelihood of the parameters Θ given data X, let k be the number of free parameters in the model, and let n be the number of datapoints.

•  Akaike Information Criterion:
   –  AIC = 2k - 2 LL(Θ; X)

•  Bayesian Information Criterion:
   –  BIC = k log(n) - 2 LL(Θ; X)

•  BIC is usually a stronger penalty than AIC (log(n) > 2 once n ≥ 8).
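The two criteria can disagree; a Python sketch with made-up log likelihoods and parameter counts (the numbers are invented to illustrate the penalties, not taken from a real fit):

```python
import numpy as np

def aic(ll, k):
    # AIC = 2k - 2 LL
    return 2 * k - 2 * ll

def bic(ll, k, n):
    # BIC = k log(n) - 2 LL
    return k * np.log(n) - 2 * ll

# hypothetical comparison of two fitted mixtures on n = 500 points:
# the bigger model fits a little better but uses more parameters
n = 500
ll_small, k_small = -1040.0, 5     # e.g., spherical clusters
ll_big, k_big = -1030.0, 11        # e.g., full-covariance clusters
```

Here AIC (penalty 2 per parameter) prefers the bigger model, while BIC (penalty log(500) ≈ 6.2 per parameter) prefers the smaller one; lower scores are better for both.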


Hierarchical agglomerative clustering!

It is often difficult to determine the correct number of clusters ahead of time, and we may want to group observations at different levels of resolution.

[Figure: dendrogram and clustergram, from Eisen et al., PNAS 1998]


Hierarchical agglomerative clustering!

•  Start with a set of clusters C = {{x1}, {x2}, …, {xN}}, each containing exactly one of the observations; also assume a distance dij is defined for each pair xi, xj.

•  While not done:
   –  Find the most similar (i.e., least distant) pair of clusters in C, say Ca and Cb.
   –  Merge Ca and Cb to make a new cluster Cnew; remove Ca and Cb from C and add Cnew.
   –  Done when C contains only one cluster.


Hierarchical agglomerative clustering!

Algorithms vary in how they calculate the distance between clusters, d(Ca, Cb). In all cases, if both clusters contain only one item, say Ca = {xi} and Cb = {xj}, then d(Ca, Cb) = dij.


Hierarchical agglomerative clustering!

If clusters have more than one item, then you have to choose a linkage criterion:

Average linkage (UPGMA): d(Ca, Cb) = mean of the distances between items in the two clusters.
Single linkage: d(Ca, Cb) = minimum distance between items in the two clusters.
Complete linkage: d(Ca, Cb) = maximum distance between items in the two clusters.
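The merge loop and the three linkage criteria can be sketched together; `agglomerative` is a naive O(N^3) function written for this sketch (real implementations, e.g. R's hclust, are far more efficient), and the three points on a line are illustrative:

```python
import numpy as np

def agglomerative(D, linkage='average'):
    # naive hierarchical agglomerative clustering on a distance matrix D;
    # returns the list of merges as (items_in_Ca, items_in_Cb, d(Ca, Cb))
    clusters = {i: [i] for i in range(len(D))}
    agg = {'single': min, 'complete': max,
           'average': lambda ds: sum(ds) / len(ds)}[linkage]
    merges = []
    while len(clusters) > 1:
        # find the least distant pair of clusters under the linkage criterion
        best = None
        for a in clusters:
            for b in clusters:
                if a < b:
                    d = agg([D[i][j] for i in clusters[a] for j in clusters[b]])
                    if best is None or d < best[0]:
                        best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# three points on a line: 0, 1, 10
x = np.array([0.0, 1.0, 10.0])
D = np.abs(x[:, None] - x[None, :])
merges = agglomerative(D, 'average')
```

Under average linkage, {0} and {1} merge first at distance 1, and the final merge with {10} happens at (10 + 9) / 2 = 9.5; single and complete linkage would report 9 and 10 instead.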


Drawing the dendrogram!

[Figure: two subtrees Ca and Cb joined at height d(Ca, Cb); the joining node represents Cnew.]


Advanced topic: ordering the items!

Rotation of the subtrees in the dendrogram is arbitrary.

[Figure: dendrogram and clustergram, from Eisen et al., PNAS 1998]


Advanced topic: ordering the items!

•  When displaying the clustergram, the ordering needs to be consistent with the dendrogram (and the clustering), but there are many consistent orderings, since you can arbitrarily rotate the subtrees.

•  You can use the "TreeArrange" algorithm (Ziv Bar-Joseph et al., 2001) to find the ordering of the items that minimizes the distance between adjacent items while remaining consistent with the dendrogram (and clustering).


Affinity propagation (Frey and Dueck, Science 2007)!

An exemplar-based clustering method, i.e., each cluster centre is one of the datapoints. It also "automatically" chooses the number of cluster centres to use. It requires the similarities sij for each pair of data items xi and xj.


Affinity propagation!

•  Objective function (for similarities sij):

E(c) = Σ_i s_{i, c(i)}

where c(i) is the index of the centre that data item xi is assigned to.

•  The self-similarities sii can determine the number of centres: e.g., if sii is less than sij for all j ≠ i, there will be only one centre; if sii is greater than sij for all j, all points will be their own centres.


Propagation algorithm for assessing c(i)!

•  Updates two sets of quantities:
   –  rik, the responsibility of k for i
   –  aik, the availability of k to serve as i's centre

rik = sik - max_{k' ≠ k} (aik' + sik')
aik = min{0, rkk + Σ_{i' ∉ {i,k}} max(0, ri'k)}   (for i ≠ k)
akk = Σ_{i' ≠ k} max(0, ri'k)

•  All aik initially start at 0.
•  Once converged, c(i) = argmax_k (rik + aik).
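The message updates above can be sketched in NumPy. The damping factor (mixing each new message with the old one) is a standard stabilizing choice of this sketch rather than part of the slides, as are the four toy points and the hand-picked preferences:

```python
import numpy as np

def affinity_propagation(S, n_iter=200, damping=0.5):
    # responsibilities R and availabilities A, updated with damping for stability
    N = S.shape[0]
    R = np.zeros((N, N))
    A = np.zeros((N, N))
    for _ in range(n_iter):
        # r_ik = s_ik - max_{k' != k} (a_ik' + s_ik')
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(N), idx]           # row-wise best a + s
        AS[np.arange(N), idx] = -np.inf
        second = AS.max(axis=1)                 # row-wise second best
        competitor = np.where(np.arange(N)[None, :] == idx[:, None],
                              second[:, None], first[:, None])
        R = damping * R + (1 - damping) * (S - competitor)
        # a_ik = min(0, r_kk + sum_{i' not in {i,k}} max(0, r_i'k)),
        # a_kk = sum_{i' != k} max(0, r_i'k)
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())      # keep r_kk itself in the column sums
        Anew = Rp.sum(axis=0)[None, :] - Rp
        diag = Anew.diagonal().copy()
        Anew = np.minimum(0, Anew)
        np.fill_diagonal(Anew, diag)
        A = damping * A + (1 - damping) * Anew
    # once converged, c(i) = argmax_k (r_ik + a_ik)
    return (R + A).argmax(axis=1)

# four points in two tight pairs; self-similarities (preferences) on the
# diagonal are set between the within-pair and cross-pair similarities,
# so the method should pick one exemplar per pair
x = np.array([0.0, 0.6, 5.0, 5.9])
S = -(x[:, None] - x[None, :]) ** 2
np.fill_diagonal(S, [-6.0, -6.01, -6.02, -6.03])
c = affinity_propagation(S)
```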