Neural Networks - Lecture 8
Unsupervised competitive learning
• Particularities of unsupervised learning
• Data clustering
• Neural networks for clustering
• Vector quantization
• Topological mapping and Kohonen networks
Particularities of unsupervised learning
Supervised learning:
• The training set contains both inputs and correct answers
• Example: classification into predefined classes for which labeled examples are known
• It is similar to optimizing an error function which measures the difference between the correct answers and the answers given by the network
Unsupervised learning:
• The training set contains only input data
• Example: grouping data into categories based on the similarities between them
• Relies on the statistical properties of the data
• Does not use an error concept, but a clustering quality concept which should be maximized
Data clustering

Data clustering = identifying natural groups (clusters) in the data set, such that:
– Data in the same cluster are highly similar
– Data belonging to different clusters are dissimilar enough

Rmk: There is no a priori labeling of the data (unlike in supervised classification) and sometimes even the number of clusters is unknown
Applications:
• Identification of user profiles in e-commerce systems, e-learning systems etc.
• Identification of homogeneous regions in digital images (image segmentation)
• Electronic document categorization
• Biological data analysis
Example: image segmentation = identification of the homogeneous regions in the image = reduction of the number of labels (colours) in the image, in order to support the image analysis

Rmk: [figure] satellite image obtained by combining three spectral bands
A key element in data clustering is the similarity/dissimilarity measure

The choice of an appropriate similarity/dissimilarity measure is influenced by the nature of the attributes:
• Numerical attributes
• Categorical attributes
• Mixed attributes
Measures appropriate for numerical data:

Similarity measure (cosine):
$S(X, Y) = \dfrac{X^T Y}{\lVert X \rVert \, \lVert Y \rVert}$

Dissimilarity measure (Euclidean distance):
$d(X, Y) = \lVert X - Y \rVert$

For normalized vectors:
$d^2(X, Y) = 2\,(1 - S(X, Y))$
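As a quick numerical check of the relation above, here is a minimal NumPy sketch (the function names are ours):

```python
import numpy as np

def cos_similarity(x, y):
    # S(X,Y) = X^T Y / (||X|| ||Y||)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

def euclid_distance(x, y):
    # d(X,Y) = ||X - Y||
    return np.linalg.norm(x - y)

# for unit-norm vectors, d^2(X,Y) = 2(1 - S(X,Y))
x = np.array([3.0, 1.0]); x /= np.linalg.norm(x)
y = np.array([1.0, 2.0]); y /= np.linalg.norm(y)
print(euclid_distance(x, y) ** 2, 2 * (1 - cos_similarity(x, y)))  # equal values
```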
Clustering methods:

• Partitional methods:
  – Lead to a partition of the data into disjoint or partially overlapping clusters
  – Each cluster is characterized by one or several prototypes

• Hierarchical methods:
  – Lead to a hierarchy of partitions which can be visualized as a tree-like structure called a dendrogram
The prototypes can be:
• The average of the data in the cluster
• The median of the data in the cluster

The data are assigned to clusters based on the nearest prototype
Neural networks for clustering
Problem: unsupervised classification of N-dimensional data into K clusters

Architecture:
• One input layer with N units
• A linear output layer with K units
• Full connectivity between the input and the output layers (the weight matrix W contains on row k the prototype corresponding to cluster k)
Functioning:
For an input X, compute the distances between X and all prototypes and find the closest one; it indicates the cluster to which X belongs.

The unit whose weight vector is closest to X is called the winning unit.
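A minimal sketch of this functioning phase, assuming NumPy and the prototype matrix W stored with one prototype per row as described above:

```python
import numpy as np

def winner(W, x):
    # distances between the input X and all prototypes (rows of W)
    distances = np.linalg.norm(W - x, axis=1)
    return int(np.argmin(distances))  # index of the winning unit / cluster
```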
Training:
Training set: {X1,X2,…, XL}
Algorithms:
• If the number of clusters (output units) is known:
  – “Winner Takes All” algorithm (WTA)
• If the number of clusters (output units) is unknown:
  – “Adaptive Resonance Theory” (ART algorithms)
Particularities of these algorithms:
• Only the weights corresponding to the winning unit are adjusted
• The learning rate is decreasing
Example: the WTA algorithm

Initialization:
  W(0) := randomly selected elements from the training set
  t := 1
REPEAT
  FOR l := 1, L DO
    find k* such that d(W_{k*}, X^l) <= d(W_k, X^l) for all k = 1..K
    Prototype adjustment:
      W_{k*} := W_{k*} + η(t) * (X^l - W_{k*})
  ENDFOR
  t := t + 1
UNTIL (stopping condition is satisfied, e.g. a maximal number of epochs is reached)
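A possible NumPy rendering of the scheme above; the schedule η(t) = η(0)/t and the fixed number of epochs t_max used as stopping condition are illustrative choices, not prescribed by the slide:

```python
import numpy as np

def wta(X, K, eta0=0.5, t_max=100, seed=0):
    rng = np.random.default_rng(seed)
    # initialization: prototypes = randomly selected training examples
    W = X[rng.choice(len(X), size=K, replace=False)].copy()
    for t in range(1, t_max + 1):
        eta = eta0 / t                       # decreasing learning rate
        for x in X[rng.permutation(len(X))]:
            # winning unit: the prototype closest to x
            k_star = int(np.argmin(np.linalg.norm(W - x, axis=1)))
            # adjust only the winner, pulling it toward x
            W[k_star] += eta * (x - W[k_star])
    return W
```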
Remarks:
• Decreasing learning rate: the sequence $\eta(t)$ should satisfy

  $\lim_{t \to \infty} \eta(t) = 0, \qquad \sum_{t \ge 1} \eta(t) = \infty, \qquad \sum_{t \ge 1} \eta^2(t) < \infty$

  Examples:
  $\eta_1(t) = \eta(0)/t^{\alpha}, \quad \alpha \in [0.5, 1]$
  $\eta_2(t) = \eta(0)/\ln(1 + t)$
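The two example schedules, written out as Python helpers (names and default values are ours):

```python
import numpy as np

def eta1(t, eta0=0.5, alpha=0.75):
    # eta1(t) = eta(0) / t^alpha, with alpha in [0.5, 1]
    return eta0 / t ** alpha

def eta2(t, eta0=0.5):
    # eta2(t) = eta(0) / ln(1 + t)
    return eta0 / np.log(1.0 + t)
```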
Remarks:
• “Dead” units: units which are never winners
Cause: inappropriate initialization of prototypes
Solutions:
• Using vectors from the training set as initial prototypes
• Adjusting not only the winning units but also the other units (using a much smaller learning rate)
• Penalizing the units which frequently become winners
• Penalizing the units which frequently become winners: change the criterion used to decide whether a unit is the winner:

  $d(X, W_{k^*}) - \theta_{k^*} \le d(X, W_k) - \theta_k$, for all $k$

  where $\theta_k$ is a threshold which decreases as the number of situations in which unit k is the winner increases
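One possible implementation of this penalization; the slide does not fix the exact form of the threshold, so the 1/(1 + wins) decay below is only an illustrative choice:

```python
import numpy as np

def penalized_winner(W, x, wins, gamma=0.1):
    # theta_k decreases as unit k accumulates wins, so frequent
    # winners need a clearly smaller distance in order to win again
    theta = gamma / (1.0 + wins)
    k_star = int(np.argmin(np.linalg.norm(W - x, axis=1) - theta))
    wins[k_star] += 1
    return k_star

# usage: wins = np.zeros(K) before training, then one call per input
```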
It is useful to normalize both the input data and the weights (prototypes):

• The normalization of the data from the training set is performed before the training

• The weights are normalized during the training, e.g. after each adjustment:
  $W_{k^*} := W_{k^*} / \lVert W_{k^*} \rVert$
Adaptive Resonance Theory gives solutions to the following problems arising in the design of unsupervised classification systems:

• Adaptability
  – Refers to the capacity of the system to assimilate new data and to identify new clusters (this usually means a variable number of clusters)

• Stability
  – Refers to the capacity of the system to preserve the cluster structure, such that during the adaptation process the system does not radically change its output for a given input
Example: the ART2 algorithm

Initialization:
  Choose the initial number of prototypes (K < L)
  W := randomly selected elements from the training set
  t := 1
REPEAT
  FOR l := 1, L DO
    find k* such that d(W_{k*}, X^l) <= d(W_k, X^l) for all k = 1..K
    IF (d(W_{k*}, X^l) < ρ) OR (K = K_max) THEN
      Prototype adjustment:
        W_{k*} := (card(k*) * W_{k*} + X^l) / (1 + card(k*))
    ELSE
      Create a new cluster:
        K := K + 1;  W_K := X^l
    ENDIF
  ENDFOR
  t := t + 1
UNTIL t > t_max

(card(k*) denotes the number of data currently assigned to cluster k*)
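A compact Python sketch of this scheme, simplified to start from a single prototype (ρ, K_max and t_max as above):

```python
import numpy as np

def art2(X, rho, K_max, t_max=10, seed=0):
    rng = np.random.default_rng(seed)
    W = [X[rng.integers(len(X))].copy()]   # initial prototype taken from the data
    card = [1]                             # number of data assigned to each cluster
    for t in range(t_max):
        for x in X:
            dists = [np.linalg.norm(w - x) for w in W]
            k_star = int(np.argmin(dists))
            if dists[k_star] < rho or len(W) == K_max:
                # running-mean update: W := (card*W + x) / (1 + card)
                W[k_star] += (x - W[k_star]) / (1 + card[k_star])
                card[k_star] += 1
            else:
                W.append(x.copy())         # open a new cluster with prototype x
                card.append(1)
    return np.array(W)
```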
Remarks:
• The value of ρ influences the number of output units (clusters):
  – A small value of ρ leads to a large number of clusters
  – A large value of ρ leads to a small number of clusters
• Main drawback: the presentation order of the training data influences the training process
• The main difference between this algorithm and the one used to find the centers of an RBF network lies only in the adjustment equation
• There are also versions for binary data (the ART1 algorithm)
Vector quantization

Aim of vector quantization:

• Mapping a region of R^N onto a finite set of prototypes

• Allows the partitioning of an N-dimensional region into a finite number of subregions, such that each subregion is identified by a prototype

• The quantization process allows replacing an N-dimensional vector with the index of the region containing it, leading to a very simple lossy compression method (see the sketch below); the aim is to minimize the loss of information

• The number of prototypes should be large in dense regions and small in less dense regions
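The compression idea in two small functions (a hedged sketch; the prototype matrix W would come from a clustering algorithm such as WTA):

```python
import numpy as np

def encode(X, W):
    # replace each N-dimensional vector by the index of its nearest prototype
    return np.array([int(np.argmin(np.linalg.norm(W - x, axis=1))) for x in X])

def decode(codes, W):
    # lossy reconstruction: each index is replaced by its prototype
    return W[codes]
```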
Example: [figure: vector quantization of two-dimensional data in [0,1] x [0,1]]
• If the number of regions is predefined then one can use the WTA algorithm

• There is also a supervised variant of vector quantization (LVQ - Learning Vector Quantization)

Training set: {(X1,d1),…,(XL,dL)}

LVQ algorithm:
1. Initialize the prototypes by applying a WTA algorithm to the set {X1,…,XL}
2. Identify the clusters based on the nearest neighbour criterion
3. Establish the label of each class by using the labels from the training set: each class is assigned the most frequent label among the d1,…,dL of the data it contains
4. Iteratively adjust the prototypes by applying an algorithm similar to the perceptron algorithm. At each iteration one applies:

FOR l := 1, L DO
  find k* such that d(X^l, W_{k*}) <= d(X^l, W_k) for all k
  IF c(k*) = d_l THEN
    W_{k*} := W_{k*} + η(t) * (X^l - W_{k*})
  ELSE
    W_{k*} := W_{k*} - η(t) * (X^l - W_{k*})
  ENDIF
ENDFOR

(c(k*) denotes the class label assigned to prototype k* in step 3)
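A direct transcription of this adjustment step into Python (assuming NumPy; c holds the label assigned to each prototype in step 3):

```python
import numpy as np

def lvq_epoch(X, d, W, c, eta):
    for x, label in zip(X, d):
        # nearest prototype to the training example x
        k_star = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        if c[k_star] == label:
            W[k_star] += eta * (x - W[k_star])   # correct label: pull closer
        else:
            W[k_star] -= eta * (x - W[k_star])   # wrong label: push away
    return W
```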
Topological mapping

• This is a variant of vector quantization which ensures the preservation of the neighbourhood relationship:
  – Similar input data belong either to the same cluster or to neighbouring clusters
  – It is necessary to define a neighbourhood relation on the prototypes and on the output units of the network
  – The networks for topological mapping have the output units placed on the nodes of a geometrical grid (one-, two- or three-dimensional)
  – Networks having such an architecture are called Kohonen networks
Kohonen networks

• Architecture: [figure: a network with two inputs and a bidimensional grid of output units; square grid and hexagonal grid variants]
• Functioning: similar to all competitive networks – the winning unit is found by using the nearest neighbour criterion

• In order to define a topology on the output level one needs:
  – A position vector r_p for each output unit p
  – A distance between position vectors
$d_1(r_p, r_q) = \sqrt{\sum_{i=1}^{n} (r_p^i - r_q^i)^2}$  (Euclidean distance)

$d_2(r_p, r_q) = \sum_{i=1}^{n} |r_p^i - r_q^i|$  (Manhattan distance)

$d_3(r_p, r_q) = \max_{i=1,\dots,n} |r_p^i - r_q^i|$  (maximum distance)
• The neighbourhood of radius s of unit p is: $V_s(p) = \{\, q \mid d(r_p, r_q) \le s \,\}$

• Example: for a square grid, the neighbourhoods of radius 1 defined by the previous three distances can be computed as in the sketch below
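A small sketch that computes the three grid distances and the resulting radius-s neighbourhoods (function and parameter names are ours):

```python
import numpy as np

def grid_distance(rp, rq, kind="euclid"):
    diff = np.abs(np.asarray(rp) - np.asarray(rq))
    if kind == "euclid":
        return float(np.sqrt(np.sum(diff ** 2)))  # d1
    if kind == "manhattan":
        return float(np.sum(diff))                # d2
    return float(np.max(diff))                    # d3 (maximum distance)

def neighbourhood(p, units, s, kind="euclid"):
    # all units q with d(r_p, r_q) <= s
    return [q for q in units if grid_distance(p, q, kind) <= s]

# e.g. radius-1 neighbourhoods of unit (2, 2) on a 5x5 square grid:
units = [(i, j) for i in range(5) for j in range(5)]
for kind in ("euclid", "manhattan", "maximum"):
    print(kind, neighbourhood((2, 2), units, 1, kind))
```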