Neurocomputing 63 (2005) 171–192
www.elsevier.com/locate/neucom
doi:10.1016/j.neucom.2004.04.010

Visualizing asymmetric proximities with SOM and MDS models

Manuel Martín-Merino (a), Alberto Muñoz (b)

(a) Computer Science Department, University Pontificia de Salamanca, C/Compañía 5, 37002 Salamanca, Spain
(b) Statistics Department, University Carlos III de Madrid, C/Madrid 126, 28903 Getafe, Madrid, Spain

Received 21 October 2003; received in revised form 19 February 2004; accepted 23 April 2004. Available online 20 August 2004.

Abstract
Multidimensional scaling (MDS) and self-organizing map (SOM) algorithms are useful to visualize object relationships in a data set. These algorithms rely on symmetric distance or similarity measures, for instance the Euclidean distance. There are a number of relevant applications, such as text mining and DNA microarray processing, for which it is worth considering non-symmetric similarity measures that allow us to properly represent hierarchical relationships. In this paper we present asymmetric versions of the SOM and MDS algorithms able to deal with asymmetric proximity matrices, and we compare these approaches to the corresponding symmetric versions. Experimental work on text databases and gene expression data sets shows that the proposed asymmetric algorithms outperform their symmetric counterparts.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Multidimensional scaling; Self-organizing maps; Textual data analysis; Asymmetric proximities; DNA microarray processing

This work was partially supported by DGICYT Grant BEC2000-0167 (Spain).
Corresponding author. Tel.: +34-923-277-199; fax: +34-923-277-101.
E-mail addresses: [email protected] (M. Martín-Merino), [email protected] (A. Muñoz).
1. Introduction
Visualization algorithms are helpful tools to represent high-dimensional data in an intuitive way [6,9,17,22]. This is particularly useful when dealing with complex data sets where the underlying structure is unknown. Textual analysis constitutes a paradigmatic example of this situation. A number of data visualization techniques have been applied to textual information in the past: self-organizing maps (SOM) [18], multidimensional scaling (MDS) algorithms [9,20] and correspondence analysis (CA) [22] are a few representative examples.
The primary source of information for any visualization algorithm is an n × n matrix D made up of data point dissimilarities d_ij, where d is some predefined distance measure. Alternatively, an n × n matrix S made up of object similarities s_ij can be used. Often distances (or similarities) are derived from an n × p data matrix X, where n is the number of data points and p the number of variables used to represent them.
There are a number of relevant application fields, such as text mining and DNA microarray processing, where the very high dimension causes problems related to the curse-of-dimensionality phenomenon [13]. In particular, most of the similarities s_ij tend to be close to zero (see [3] for details) and visualization algorithms, being based on the use of the S matrix, will provide poor results [5]. Moreover, there are some considerations that may advise the use of non-symmetric similarity (or distance) measures. Focusing on text mining problems, consider the task of building word associations [26]. When modeling word relations, many people will relate, for instance, "neural" to "networks" more strongly than conversely. Therefore, word relations should be modeled by asymmetric similarities. Regarding DNA processing, it has been noticed that asymmetry could play an important role in microarray gene expression data sets [29]: specific genes appearing in a small number of diseases can be considered as subsets of broad genes that appear in a larger number of diseases (while the inverse relation is much weaker).
The aim of this paper is to introduce asymmetric versions of the SOM and MDS algorithms to deal with data sets where the described problems appear. To this end, a new class of weighting factors, called asymmetry coefficients, will be introduced to derive new versions of the aforementioned algorithms.
The rest of the paper is organized as follows. Section 2 presents some considerations about asymmetric similarities and their implications. Section 3 presents the SOM and MDS algorithms in a form suitable to facilitate the later introduction of their asymmetric counterparts. Section 4 proposes the new SOM and MDS algorithms able to process asymmetric proximity matrices. Experiments on text and gene expression databases are presented in Section 5. Finally, Section 6 concludes.

2. Asymmetry
Consider a set of n objects and let S = (s_ij) be the similarity matrix made up of object proximities. Asymmetry arises when s_ij ≠ s_ji. In this case, several algorithms have been proposed to generate a visual representation of object relationships. In [8] the similarity matrix is first decomposed into a symmetric and a skew-symmetric component. Both matrices are processed independently and visualized by two different maps. The first one represents the object proximities and the second one the deviation from symmetry. However, the symmetric component does not accurately reflect the object proximities [23]. Therefore, object distances in the map often become meaningless. Other asymmetric MDS algorithms have been proposed in the literature (see for instance [12,16,36,38]) but they suffer from similar drawbacks: only the symmetric component of the similarity measure is used to produce the map of object proximities.
In this section, we first analyze how asymmetry impacts a number of commonly used similarity measures. Next, new coefficients of asymmetry are introduced in order to model the skew-symmetric component of a given similarity matrix. These coefficients will be incorporated into visualization algorithms to prevent the negative consequences of asymmetry.
2.1. Asymmetry implications
When asymmetry arises, symmetric similarities produce small values for most pairs of data points and do not accurately reflect the object relationships [26]. We are going to consider a simple example from text mining to illustrate the problem.
Consider a collection of abstracts from scientific journals where, for instance, the term "mathematics" appears in 400 documents, while the more specific term "bayesian" occurs just in a subset of 10 documents. The relation between "bayesian" and "mathematics" is strongly asymmetric, in the sense that the concept represented by the word "bayesian" is a subset of the concept represented by the word "mathematics" (but not conversely). Let x_m, x_b denote the binary vector space representations [2] of both terms.
A large number of measures can be considered to quantify the similarity between two binary vectors (see [9,10] for a review), depending on the field of application. For text data sets, measures that rank similar terms by the number of co-occurrences are preferred [2]. The Jaccard similarity coefficient is a popular measure in this group that has been widely used in the information retrieval literature [7,20,33]. This similarity coefficient is strongly correlated with other important measures considered in the textual literature such as the cosine, Dice or Kulczynski coefficients [7] (r ≈ 0.99 for the textual collection used in the experiments here). This suggests that the Jaccard similarity is representative of the behavior of a broad range of similarities over textual data. Textual data being a paradigmatic example of the problems studied here, in the following we will focus on this measure. Let us now compute the Jaccard coefficient [9] in the example considered above:
J_mb = Σ_k x_mk x_bk / (Σ_k x_mk + Σ_k x_bk − Σ_k x_mk x_bk) = 10 / (10 + 400 − 10) = 0.025.  (1)
The similarity value is very close to 0, which implies that the terms "mathematics" and "bayesian" are hardly related; this is not true.
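The computation in Eq. (1) can be checked with a small sketch. The binary vectors below are a hypothetical reconstruction of the example (a 500-document toy corpus), not the paper's actual data:

```python
def jaccard(x, y):
    """Jaccard coefficient between two binary vectors (Eq. (1))."""
    inter = sum(a * b for a, b in zip(x, y))
    return inter / (sum(x) + sum(y) - inter)

# 400 documents mention "mathematics"; the first 10 of them also mention "bayesian"
x_math = [1] * 400 + [0] * 100
x_bayes = [1] * 10 + [0] * 490

print(jaccard(x_math, x_bayes))   # 10 / (400 + 10 - 10) = 0.025
```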
This situation is similar for the Euclidean distance, commonly used by mapping algorithms (including SOM). To alleviate this problem it is common practice in the textual literature [18] to normalize the terms by their L2 norm before computing the Euclidean distance. However, the resulting dissimilarity is equivalent to the cosine measure. As we have mentioned earlier, this similarity is affected by the terms' L1 norm and it is strongly correlated with the Jaccard index. Finally, an interesting alternative to the Euclidean distance is the χ² distance [22], which normalizes objects by the L1 norm before computing a weighted Euclidean distance. Nonetheless, there is empirical evidence [7] that the dependence of this index on the L1 norm is only slightly weaker than for the cosine similarity.
Finally, it is worth noting that several asymmetric measures, such as the fuzzy logic similarity [21], have been proposed in the literature, but they do not overcome the problem analyzed above because only the symmetric component of the similarity matrix is used to derive the proximity map. In our example, using the fuzzy logic similarity (2), we get a symmetric component s^s_mb ≈ 0.5, which suggests that this class of measures is affected by asymmetry as well.

From the above example one could infer that there is a relation between asymmetry and hierarchy. This relation has been noticed in [28] and is explained below.
There is a particular choice of the asymmetric similarity measure s_ij that makes sense in a number of interesting cases, including the one mentioned above. Denote by ∧ the fuzzy AND operator, and define:

s_ij = |x_i ∧ x_j| / |x_i| = Σ_k min(x_ik, x_jk) / Σ_k x_ik,  (2)

where the existence of a data matrix X is assumed. Suppose X corresponds to a terms × documents matrix. |x_i| measures the number of documents indexed by term i, and |x_i ∧ x_j| the number of documents indexed by both terms i and j. Therefore, s_ij may be interpreted as the degree to which the topic represented by term i is a subset of the topic represented by term j. This numeric measure of subsethood is due to Kosko [21]. In the example above, s_bm = 1 while s_mb = 0.025. In the case of a cocitation matrix, |x_i| is the number of cites received by author (or Web page) i, and |x_i ∧ x_j| measures the number of authors (or Web pages) that simultaneously cite authors i and j. The case of gene expression is similar, as explained in Section 1. All these problems have in common that the norms of the objects (computed by the |x_i|'s) follow a Zipf's law [2,26]: there are a few individuals with very large norms and, at the opposite side of the distribution, a lot of individuals with very small norms. Therefore, asymmetry can be interpreted as a particular type of hierarchy. Individuals organize into a kind of tree: at the top lie words with large norms, corresponding to broad topics (genes present in many diseases in the DNA data sets); at the base lie words with small norms, corresponding to rare topics (rare genes).
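The subsethood measure of Eq. (2) makes the asymmetry of the example explicit. The binary vectors below are a hypothetical reconstruction of the mathematics/bayesian example, not data from the paper:

```python
def subsethood(x, y):
    """Degree to which the concept of x is a subset of the concept of y (Eq. (2))."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(x)

x_math = [1] * 400 + [0] * 100    # broad term: appears in 400 documents
x_bayes = [1] * 10 + [0] * 490    # specific term: appears in 10 of those documents

print(subsethood(x_bayes, x_math))   # 1.0  : "bayesian" is fully contained in "mathematics"
print(subsethood(x_math, x_bayes))   # 0.025: the converse relation is much weaker
```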
We are next going to show the relation between the concepts of norm and asymmetry. In the decomposition s_ij = (1/2)(s_ij + s_ji) + (1/2)(s_ij − s_ji), the second term conveys the information provided by asymmetry (it equals zero if S is symmetric). This skew-symmetric term can be written as follows:

(1/2)(s_ij − s_ji) = (1/2) (|x_i ∧ x_j| / |x_i| − |x_i ∧ x_j| / |x_j|) = (1/2) |x_i ∧ x_j| (|x_j| − |x_i|) / (|x_i| |x_j|) ∝ |x_j| − |x_i|.  (3)
Thus, asymmetry is directly related to the difference in norms, and will naturally arise when the norms of the data points follow a Zipf's law. Fig. 1 shows an L1 norm histogram for the terms of a textual database, illustrating this phenomenon. The figure shows that most individuals have a norm close to zero and therefore most similarities will be close to zero too. This is illustrated in Fig. 2 for the cosine similarity, whose standard deviation is as low as 0.03. This behavior is similar for any of the similarities considered in this paper, and in particular for the χ² distance (σ ≈ 0.11). Consequently, distances may become almost constant and the maps generated by MDS algorithms will be highly distorted [5].
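As a rough illustration (not taken from the paper's data), a Zipf-like norm distribution concentrates the vast majority of terms near zero norm, which is the mechanism behind the histogram of Fig. 1:

```python
# Hypothetical Zipf-like norms: frequency of the r-th most common term ~ 1/r.
n_terms = 1000
norms = [1000 // r for r in range(1, n_terms + 1)]   # 1000, 500, 333, ...

# Fraction of terms whose L1 norm is "close to zero" (here: at most 10)
small = sum(1 for n in norms if n <= 10)
print(small / n_terms)   # 0.91 -- most terms have a tiny norm
```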
In addition, it has been pointed out in [1] that for sparse databases such as textual data sets, the relations among specific (low-norm) terms can only be established through relations with related broader terms (larger norm). In particular, if similarities between specific and broad terms are underestimated, specific terms will occur together in the map just because they are far away from most of the terms in the database [6]. Therefore, it is important to model accurately the similarities between broad and specific terms in order to obtain meaningful maps.

Fig. 1. Histogram for the L1 norms of terms in a textual database.
Fig. 2. Histogram for the cosine similarity between terms of a textual database.
2.2. Asymmetry coefficients
Next we are going to associate an asymmetry coefficient with each data point to model the skew-symmetric component of the similarity matrix.
Consider the fuzzy logic similarity defined in Eq. (2). Its skew-symmetric component is given by Eq. (3). This suggests that any asymmetry coefficient a_i should verify:

s^sk_ij ∝ a_j − a_i,  (4)

where s^sk_ij denotes the skew-symmetric component of the similarity matrix. We will then choose the L1 norm as the asymmetry coefficient for the fuzzy logic similarity:

a_i = |x_i|.  (5)
In text mining problems, this coefficient will take large values for broad terms and small values for specific terms. A similar interpretation applies to gene databases.
Consider now a general asymmetric similarity s_ij. If we impose the condition s^sk_ij = a_j − a_i (by analogy with Eq. (4)) and sum both sides of this equation over i, we get

N s̄^sk_j = N a_j − Σ_{i=1}^N a_i,  (6)

where s̄^sk_j is defined as (1/N) Σ_i s^sk_ij. Because the asymmetry coefficients suffer from a translation indetermination, the restriction (1/N) Σ_i a_i = 0 is imposed. Applying this restriction to Eq. (6), the following expression for the coefficient a_j is obtained:

a_j = s̄^sk_j + (1/N) Σ_{i=1}^N a_i = s̄^sk_j.  (7)
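In code, Eq. (7) reduces the coefficient a_j to the mean over i of the skew-symmetric entries s^sk_ij. A minimal sketch with a made-up 3 × 3 similarity matrix:

```python
def asymmetry_coefficients(S):
    """Asymmetry coefficients of Eq. (7) for an asymmetric similarity matrix S."""
    n = len(S)
    # skew-symmetric component: sk_ij = (s_ij - s_ji) / 2
    sk = [[(S[i][j] - S[j][i]) / 2 for j in range(n)] for i in range(n)]
    # a_j = (1/N) * sum_i sk_ij  (Eq. (7); the coefficients are mean-zero by construction)
    return [sum(sk[i][j] for i in range(n)) / n for j in range(n)]

# Illustrative matrix: object 1 has large outgoing similarities (a specific object),
# so its coefficient comes out negative; object 0 behaves as a broader object.
S = [[1.0, 0.025, 0.4],
     [1.0, 1.0,   0.9],
     [0.6, 0.5,   1.0]]
a = asymmetry_coefficients(S)
print(a)
print(abs(sum(a)) < 1e-12)   # True: translation indetermination fixed at mean zero
```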
The above expression allows us to derive general asymmetry coefficients once s_ij is given. For instance, [27] presents a coefficient of asymmetry based on the Kullback–Leibler divergence.

3. Symmetric mapping algorithms
Let V = {y_1, y_2, ..., y_n} be a set of objects codified in R^p and D = (d_ij) the dissimilarity matrix made up of object dissimilarities. Mapping algorithms embed the original object configuration in a low-dimensional space (usually 2D for visualization purposes), preserving the object dissimilarities as much as possible. This is accomplished by the optimization of one of the stress functions proposed in the literature [9]. This paper will focus on the widely used C measure [11], defined as:

C = (1/2) Σ_{i=1}^N Σ_{j<i} d_ij d̂(M(y_i), M(y_j)),  (8)

where M : R^p → R^m is the mapping function, d_ij is the original dissimilarity and d̂ refers to the distance defined in R^m. The C measure maximizes the correlation between the object dissimilarities in the input and output spaces. Moreover, it has been shown in [11] that a map that maximizes function (8) will preserve the neighbor ordering induced by the original dissimilarity d_ij: that is, if d_ij < d_ik then d̂(M(y_i), M(y_j)) < d̂(M(y_i), M(y_k)) for all i, j, k.
The next sections introduce two mapping algorithms derived from the optimization of the C measure. Section 3.1 introduces the iterative MDS algorithm presented in [20], and Section 3.2 the SOM [17].
3.1. Iterative MDS algorithm
The MDS algorithm considered in this paper was presented in [20] as a heuristic model based on ideas from classical mechanics. It has been chosen for its adequacy for text processing tasks. In this section the model is presented as an MDS algorithm that optimizes a C measure. This new interpretation helps to better understand the model's performance and rigorously justifies the convergence to an ordered map.
Consider the following stress function:

ρ = Σ_i Σ_{j≠i} f(s_ij) ‖x_i − x_j‖²,  (9)

where (s_ij) denotes the similarity matrix, f is a monotonic function and ‖x_i − x_j‖ is the Euclidean distance between the objects in the map. The error function (9) is obviously a particular case of the C measure (see Eq. (8)). Therefore, as we have mentioned earlier, the minimization of (9) will converge to an ordered map that preserves the original similarities s_ij.
However, in our application we are interested in mapping algorithms that achieve a balance between distance preservation and minimum overlap between non-related clusters in the map. To reduce the inter-cluster overlap, the original similarity is transformed using the following monotonic function:

f(s_ij) = (s_ij − T) / max_ij {s_ij − T},  (10)

where s_ij is any symmetric similarity measure and T is a threshold that splits the similarities into two groups: those corresponding to related objects (s_ij > T) and those corresponding to non-related objects (s_ij < T). The transformation (10) increases the distance between non-related objects, favoring a smaller overlap between different classes in the map. T must be determined experimentally by analysis of the similarity histogram [28].
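A minimal sketch of this iterative MDS, with made-up data: similarities are transformed as in Eq. (10) and the map coordinates follow the gradient of the stress (9). This is an illustration under our own parameter choices, not the authors' implementation:

```python
import random

def transform(S, T):
    """Eq. (10): shift similarities by the threshold T and rescale."""
    n = len(S)
    m = max(S[i][j] - T for i in range(n) for j in range(n) if i != j)
    return [[(S[i][j] - T) / m for j in range(n)] for i in range(n)]

def iterative_mds(S, T=0.2, dim=2, alpha=0.05, iters=200, seed=0):
    """Gradient descent on the stress (9); below-threshold pairs repel."""
    n = len(S)
    rng = random.Random(seed)
    F = transform(S, T)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        for i in range(n):
            for k in range(dim):
                grad = sum(F[i][j] * (X[j][k] - X[i][k]) for j in range(n) if j != i)
                X[i][k] += alpha * grad
    return X

# Objects 0 and 1 are related; object 2 is unrelated to both.
S = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
X = iterative_mds(S)
d01 = sum((a - b) ** 2 for a, b in zip(X[0], X[1]))
d02 = sum((a - b) ** 2 for a, b in zip(X[0], X[2]))
print(d01 < d02)   # related objects end up closer than unrelated ones
```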
Finally, the minimization of ρ by a gradient descent technique gives the following iterative solution for each coordinate k:

x_ik(t+1) = x_ik(t) − α ∂ρ/∂x_ik = x_ik(t) + α Σ_{j≠i} f(s_ij) (x_jk − x_ik).  (11)

This adaptation rule is equivalent to the one derived for the heuristic model in [20].

3.2. Self-organizing maps
The SOM [17] is a nonlinear visualization technique for high-dimensional data. Input vectors are represented by neurons arranged according to a regular grid (usually 1D or 2D) in such a way that similar vectors in the input space become spatially close in the grid.
It can be shown that the results obtained by the SOM algorithm are equivalent to those obtained by optimizing the following energy function [14]:

E(W) = Σ_r Σ_{x_μ ∈ V_r} Σ_s h_rs D(x_μ, w_s)  (12)

≈ Σ_r Σ_{x_μ ∈ V_r} D(x_μ, w_r)  [quantization error]  +  K Σ_r Σ_{s≠r} h_rs D(w_r, w_s)  [C measure],  (13)

where we have considered that the number of prototypes is large enough so that D(x_μ, w_s) ≈ D(x_μ, w_r) + D(w_r, w_s); h_rs is a neighborhood function (for instance a Gaussian kernel) that transforms the neuron distances nonlinearly (see [17] for other possible choices), D denotes the squared Euclidean distance and V_r is the Voronoi region corresponding to prototype w_r.
Eq. (13) shows that the SOM energy function may be decomposed as the sum of a quantization error and a C measure. The first term minimizes the information lost when the input patterns are represented by a set of prototypes. The second term maximizes the correlation between the prototype dissimilarities and the corresponding neuron distances.
The SOM energy function may be optimized by an iterative algorithm made up of two steps [14]. First, a quantization algorithm is run that represents each pattern by its nearest prototype. This operation minimizes the first term in Eq. (13). Next, the prototypes are organized along the grid of neurons by minimizing the second term of the error function. The optimization problem can be solved explicitly using the following adaptation rule for each prototype [17]:

w_s = Σ_{r=1}^M Σ_{x_μ ∈ V_r} h_rs x_μ / Σ_{r=1}^M Σ_{x_μ ∈ V_r} h_rs,  (14)

where M is the number of neurons and h_rs is, for instance, a Gaussian kernel of width σ_t. The kernel width is adapted in each iteration using the rule proposed in [24], σ_t = σ_i (σ_f / σ_i)^{t / N_iter}, where σ_i = M/2 is usually considered in the literature [17] and σ_f is a parameter that determines the degree of smoothing of the principal curve generated by the SOM [24].

4. Asymmetric mapping algorithms
As we have shown earlier, ordinary similarities underestimate object proximities when relations are asymmetric (or hierarchical). Therefore, any visualization algorithm based on such similarities will not properly represent the object relationships. On the other hand, there are a number of asymmetric MDS algorithms proposed in the literature [12,16,36,38]. These algorithms optimize a quadratic error measure, called stress, which takes the form (up to normalization factors) Σ_ij (δ_ij − d_ij)², where δ_ij is the original dissimilarity between objects i and j, and d_ij is the Euclidean distance between their mappings via MDS. If we are given similarities s_ij instead of dissimilarities or distances, they will first have to be transformed into distances [9]. Considering the decomposition of the dissimilarities into symmetric and skew-symmetric components, δ_ij = δ^s_ij + δ^sk_ij = (1/2)(δ_ij + δ_ji) + (1/2)(δ_ij − δ_ji), the stress can be expressed as:

Σ_ij (δ_ij − d_ij)² = Σ_ij (δ^s_ij − d^s_ij)² + Σ_ij (δ^sk_ij − d^sk_ij)²,  (15)

just considering that the sum of the elements of a skew-symmetric matrix equals zero. The optimization of Eq. (15) is equivalent to building two maps that approximate independently the symmetric and skew-symmetric components of the dissimilarity matrix (δ_ij). Therefore, the map that visualizes object proximities is exclusively derived from the symmetric component of δ_ij and is degraded by asymmetry.
In this section we propose asymmetric variants of the algorithms introduced in Section 3. The new algorithms will try to improve the position in the map of objects that have a small asymmetry coefficient. This will be accomplished by improving their relative position with respect to objects with larger asymmetry coefficients. Notice that the objects with smaller asymmetry coefficients are the most problematic for classical mapping algorithms [1] (see Section 2.1).

4.1. MDS algorithm based on asymmetric similarities
A natural way to extend the MDS algorithm introduced in Section 3 is to incorporate asymmetric similarities that better reflect the object proximities. Therefore, an asymmetric index is first defined by

e_ij = s'_ij a_j,  (16)

where s'_ij is any symmetric similarity transformed in an appropriate fashion to reduce cluster overlap in the map (see Section 3.1) and a_j is any of the asymmetry coefficients defined in Section 2.2. Notice that the object proximities induced by the symmetric component of e_ij are now modeled by the following coefficient:

e^s_ij = s'_ij (a_i + a_j) / 2.  (17)

This proximity index is larger than s'_ij whenever just one of the objects has a large asymmetry coefficient. In this case, the relation is highly asymmetric (a_i ≫ a_j or conversely) and s'_ij is compensated proportionally to the degree of asymmetry, since (a_i + a_j)/2 ≈ max(a_i, a_j)/2 when one coefficient dominates.
The stress function for the asymmetric MDS algorithm may now be written as:

S = Σ_i Σ_{j≠i} e_ij d²_ij = Σ_i Σ_{j≠i} e^s_ij ‖x_i − x_j‖²,  (18)

where we have considered that any asymmetric matrix can be expressed as the sum of a symmetric and a skew-symmetric component, e_ij = e^s_ij + e^sk_ij [38], and that the sum of the elements of a skew-symmetric matrix equals 0.

Eq. (18) shows that object similarities are compensated whenever one of the objects has a large asymmetry coefficient. Therefore, distances between related objects become smaller in the map, even if their asymmetry coefficients are disparate. In addition, the similarity compensation reduces the percentage of similarities that are almost 0 and smooths the histogram of similarities. This helps to alleviate the indifferentiation effect [5] that arises when a large proportion of the object similarities are very similar to each other. In particular, specific terms (regardless of their semantic meaning) will no longer concentrate around the central region of the map.
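The compensation of Eqs. (16)–(17) can be illustrated numerically. The similarity and coefficient values below are hypothetical (and, for simplicity, not mean-centered as in Section 2.2); the point is only that the symmetric component of e_ij grows with either coefficient:

```python
def compensated(s, a_i, a_j):
    """Symmetric component of e_ij = s'_ij * a_j (Eq. (17))."""
    return s * (a_i + a_j) / 2

s_raw = 0.025                    # raw similarity between a specific and a broad term
a_broad, a_specific = 4.0, 0.2   # illustrative asymmetry coefficients

e_s = compensated(s_raw, a_specific, a_broad)
print(e_s)            # 0.0525: larger than the raw 0.025, driven by the broad term
print(e_s > s_raw)    # True
```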
Finally, the stress function (18) may be optimized by a gradient descent technique in the same way as its symmetric counterpart (see Section 3.1). We obtain the following iterative rule for the adaptation of each object's coordinates:

x_ik(t+1) = x_ik(t) + α Σ_{j≠i} s'_ij a_j (x_jk − x_ik).  (19)

4.2. MDS algorithm based on asymmetric distances
An alternative way to extend the MDS algorithm to the asymmetric case is to model object relations in the map by an asymmetric distance. In our model, the asymmetric distance is defined as follows:

d'_ij = ω_ij ‖x_i − x_j‖²,  (20)

where ω_ij is an asymmetric weight matrix defined as ω_ij = 1 + s'_ij a_j. The proximity measure induced by the symmetric component of the new dissimilarity is expressed as

(d'_ij)^s = ω^s_ij ‖x_i − x_j‖² = (1 + s'_ij (a_i + a_j)/2) ‖x_i − x_j‖².  (21)

This dissimilarity measure yields larger values than the Euclidean distance for related objects such that one of them has a large asymmetry coefficient. This feature allows us to approximate large dissimilarities when asymmetry arises while keeping the Euclidean distance between the objects in the map conveniently small.
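A small numerical sketch of Eq. (20), with made-up values: the weight ω_ij = 1 + s'_ij a_j inflates the dissimilarity attributed to a related broad/specific pair, so a large target dissimilarity can be matched while the mapped Euclidean distance stays small:

```python
def weighted_dissimilarity(x, y, s, a_j):
    """Asymmetric distance of Eq. (20): d'_ij = (1 + s'_ij * a_j) * ||x_i - x_j||^2."""
    d2 = sum((p - q) ** 2 for p, q in zip(x, y))
    return (1 + s * a_j) * d2

x_i, x_j = [0.0, 0.0], [1.0, 0.0]   # close together in the map (squared distance 1)
d = weighted_dissimilarity(x_i, x_j, s=0.8, a_j=4.0)
print(d)   # 4.2: much larger than the plain squared Euclidean distance of 1
```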
Next, a stress function is defined that incorporates the asymmetric distance (20) introduced earlier:

S = Σ_i Σ_{j≠i} ω_ij s'_ij ‖x_i − x_j‖² = Σ_i Σ_{j≠i} ω^s_ij s'_ij ‖x_i − x_j‖²,  (22)

where we have considered, as in the previous algorithm, that an asymmetric matrix decomposes as ω_ij = ω^s_ij + ω^sk_ij and that the sum of the elements of a skew-symmetric matrix equals zero.

As we have detailed in the previous section, the optimization of the error function (22) reduces the distances in the map between objects with disparate asymmetry coefficients. Consequently, it is expected that the positions of the more specific objects improve.

Finally, the optimization of the stress function by a gradient descent technique gives a simple updating rule for each object coordinate:

x_ik(t+1) = x_ik(t) + α Σ_{j≠i} s'_ij ω_ij (x_jk − x_ik).  (23)

4.3. Asymmetric SOM algorithm
As we mentioned in Section 3.2, from a practical point of view the SOM algorithm may be derived from the optimization of a stress function. Therefore, the SOM algorithm may be extended to the asymmetric case taking advantage of the ideas presented for the MDS algorithms.
To derive an asymmetric version of the SOM we proceed as follows. First, a new asymmetric similarity based on the Euclidean distance is defined. Next, an energy function which incorporates the asymmetric similarity is introduced. Finally, the error function is optimized following the same procedure as in the symmetric SOM.

Consider a dissimilarity measure, for instance the squared Euclidean distance d(x_i, x_j) = ‖x_i − x_j‖². Using Eq. (24) it can be transformed into a similarity (see [9]):

s_ij = C − ‖x_i − x_j‖²,  (24)

where the constant C is an upper bound for the Euclidean distances. Next, an asymmetric similarity is defined that takes into account the object frequencies:

s_ij = (C − ‖x_i − x_j‖²) |x_i|.  (25)

This asymmetric similarity compensates the bias of the Euclidean distance toward small values when considering objects of disparate frequencies. This drawback of the Euclidean distance was explained in Section 2.1.
Substituting the new similarity in Eq. (12), the error function of our asymmetric model is expressed as:

E(W) = Σ_r Σ_{x_μ ∈ V_r} Σ_s h_rs ω_μ (C − ‖x_μ − w_s‖²),  (26)

where ω_μ is any of the asymmetry coefficients defined in Section 2.2. Eq. (26) shows that, in the asymmetric version, the object similarities become larger when one of the objects has a large asymmetry coefficient. In this case, the object relationships become asymmetric and the corresponding distances along the grid of neurons are reduced proportionally to the degree of asymmetry.
The error function (26) may be optimized in two steps, in a similar manner to the symmetric case. First, a quantization algorithm is run that generates the SOM prototypes w_s. Next, the error function is maximized by solving the set of linear equations ∂E(W)/∂w_s = 0. This system of linear equations can be solved explicitly, giving a simple updating rule for each SOM prototype:

w_s = Σ_{r=1}^M Σ_{x_μ ∈ V_r} ω_μ h_rs x_μ / Σ_{r=1}^M Σ_{x_μ ∈ V_r} ω_μ h_rs,  (27)

where h_rs is, for instance, a Gaussian kernel of width σ_t that determines the degree of smoothing of the principal curve. σ_t is adapted in each iteration using the same rule proposed for the symmetric version.

Notice that the asymmetric version of the SOM maintains the simplicity of the original algorithm and does not add computational burden.
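A minimal sketch of the weighted batch update of Eq. (27), for a 1-D grid of neurons. Patterns with a large asymmetry coefficient ω_μ pull the prototypes harder; the Gaussian neighborhood and the toy data are illustrative assumptions, not the paper's setup:

```python
import math

def asymmetric_som_step(X, W, coeff, sigma):
    """One batch update (Eq. (27)): each prototype becomes a coefficient- and
    neighborhood-weighted average of the input patterns."""
    def nearest(x):
        # Voronoi assignment: index of the prototype closest to pattern x
        return min(range(len(W)),
                   key=lambda s: sum((a - b) ** 2 for a, b in zip(x, W[s])))

    wins = [nearest(x) for x in X]
    new_W = []
    for s in range(len(W)):
        num = [0.0] * len(W[0])
        den = 0.0
        for m, x in enumerate(X):
            # Gaussian neighborhood on the 1-D grid of neuron indices
            h = math.exp(-((wins[m] - s) ** 2) / (2 * sigma ** 2))
            for k in range(len(x)):
                num[k] += coeff[m] * h * x[k]
            den += coeff[m] * h
        new_W.append([v / den for v in num])
    return new_W

X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
coeff = [1.0, 1.0, 5.0]                  # third pattern plays the "broad object" role
W = [[0.2, 0.2], [0.8, 0.8]]
W = asymmetric_som_step(X, W, coeff, sigma=0.5)
print(W)   # prototypes move toward weighted pattern averages
```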
4.4. Computational complexity considerations
The MDS algorithm introduced in Section 3 computes n(n − 1)/2 distances in each iteration. Therefore, the computational complexity is quadratic in the number of patterns, O(N²) [19]. However, several ideas have been proposed in the literature to overcome this drawback. Some of them are briefly commented on next.

In [31] a simple method is proposed that reduces the number of patterns used by the MDS algorithm by drawing a small random sample from the original data set. However, the performance is poor because no method is used to determine the relevance of each pattern. In [19,31] the data set is first submitted to a quantization algorithm (SOM or k-means) and the MDS algorithm is then carried out only on the small set of prototypes. Finally, the authors have proposed two different methods to generate a subset of prototypes taking advantage of the information provided by the L1 norm [29]. These methods improve the map quality when the object L1 norms follow a Zipf law (see [29] for an explanation and some experimental results).

The computational complexity of the proposed asymmetric SOM is O(KN²) [18]. However, the same methods proposed to increase the efficiency of the symmetric version [17,18] are also applicable to the asymmetric model proposed in this paper.

5. Experimental results
In this section we apply the proposed algorithms to the construction of maps that visualize the relationships between database terms. Some preliminary experiments are also carried out with DNA microarray expression data, suggesting interesting new areas of application.

We now briefly describe the textual collections used in the experiments. The first collection is made up of 2000 scientific abstracts retrieved from three commercial databases: LISA, INSPEC and Sociological Abstracts. The collection may be reproduced by submitting the queries shown in Table 1 to the corresponding databases. For each database a thesaurus created by human experts is available. The thesaurus therefore induces a classification of terms according to their semantic meaning, which will allow us to exhaustively check the term associations created by the map.

The second collection is made up of 6702 abstracts corresponding to the journals of the ACM digital library (available at http://www.acm.org). The collection was retrieved by means of a robot developed under the project [32]. In this case, no thesaurus is available for the collection and therefore the evaluation must rely on unsupervised measures.

Assessing the performance of algorithms that generate word maps is not an easy task. There are no theoretical arguments to prefer one map to another in the absence of labelled information. This holds even for simpler mapping algorithms like principal component analysis: would it be convenient to normalize the data points to zero mean, unit variance or both? In this paper the maps are evaluated from different viewpoints through several objective functions. This study is complemented with a qualitative evaluation of the maps.

The first measure considered is the Spearman rank correlation coefficient [4] (Sp.). This coefficient checks whether the neighbor ordering induced by a dissimilarity defined in
Table 1. Semantic groups for the multi-topic database.

LISA: business archives; Lotka's law; biology; automatic abstracting.
INSPEC: self-organizing maps; dimensionality reduction; power semiconductor devices; optical cables; feature selection.
Sociological Abstracts: intelligence tests; retirement communities; sociology of literature and discourse; rural areas and rural poverty.
M. Martn-Merino, A. Munoz / Neurocomputing 63 (2005) 171192184Rp agree with the one induced by the Euclidean distance in the map. Larger valuessuggest that object proximities according to the dissimilarity defined in Rp are betterrepresented in the map. However, the Sp. coefficient is useless when the originaldissimilarity does not reflect term relationships, due for instance to the existence ofasymmetry. To avoid this problem, a new dissimilarity is defined in Rp that is notaffected by asymmetry:
d'_ij = d_ij ω_ij = (1 − s_ij) ω_ij,   (28)

where s_ij is the symmetric component of the fuzzy logic similarity, d_ij = 1 − s_ij the corresponding dissimilarity [9] and ω_ij is a weight matrix that reduces the dissimilarities d'_ij for asymmetric relations.
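A minimal numerical sketch of Eq. (28) follows. The weight values ω_ij below are hypothetical placeholders, since the construction of ω is defined earlier in the paper:

```python
import numpy as np

def weighted_dissimilarity(s, omega):
    """d'_ij = d_ij * omega_ij = (1 - s_ij) * omega_ij   (Eq. (28)).

    s     : symmetric component of the similarity, values in [0, 1]
    omega : weights < 1 shrink the dissimilarity of pairs related
            through asymmetry (their construction is given in the paper)
    """
    return (1.0 - np.asarray(s, float)) * np.asarray(omega, float)

s = np.array([[1.0, 0.6], [0.6, 1.0]])      # symmetric similarities
omega = np.array([[1.0, 0.5], [0.5, 1.0]])  # hypothetical weights
d_prime = weighted_dissimilarity(s, omega)
print(d_prime[0, 1])  # 0.2
```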
Finally, we also evaluate the Sp. coefficient taking into account only the 10% nearest neighbors (Sp. 10). Notice that the nearest neighbors of specific terms are frequently broad terms [23,26] (see Section 2.1). Therefore, this index provides more specific information about the preservation of dissimilarities between specific and broad terms. Notice also that the value of the Sp. coefficient usually depends on the number of patterns considered: the correct ordering of objects in the map becomes more difficult as the number of patterns increases.
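The Sp. index can be sketched as follows, assuming a square dissimilarity matrix over the objects and 2-D map coordinates; scipy's `spearmanr` performs the rank correlation between the two sets of pairwise distances:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def sp_coefficient(d_orig, coords):
    """Sp. index: Spearman rank correlation between the input
    dissimilarities and the Euclidean distances in the map
    (upper-triangular entries only, in matching order)."""
    iu = np.triu_indices(d_orig.shape[0], k=1)
    rho, _ = spearmanr(d_orig[iu], pdist(coords))
    return rho

# sanity check: a map identical to the original space preserves all ranks
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
d_orig = squareform(pdist(coords))
sp = sp_coefficient(d_orig, coords)
print(sp)  # 1.0
```

The Sp. 10 variant restricts the correlation to each object's 10% nearest neighbors before ranking.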
The second group of measures quantifies the agreement between the semantic word classes induced by the map and those induced by the thesaurus. Therefore, once the objects have been mapped, they are grouped into topics with a clustering algorithm (for instance PAM [15]). Next, the partition induced by the map is evaluated through the following measures:
• F measure [2]: a compromise between recall and precision that has been widely used by the information retrieval community. Intuitively, F measures whether words associated by the thesaurus are clustered together in the map.
• Entropy measure E [25,35]: measures the uncertainty in the classification of words that belong to the same cluster. Small values suggest little overlapping between different topics in the maps; obviously smaller values are preferred.
• Mutual information I [35]: a nonlinear correlation measure between the word classification induced by the thesaurus and the word classification given by the clustering algorithm. Notice that this measure gives more weight to specific terms [37] and therefore provides valuable information about changes in the position of less frequent terms.
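The entropy and mutual information measures can be sketched from the joint distribution of the two labelings. This is an illustration of the definitions above; the exact normalisations used in [25,35] may differ:

```python
import numpy as np

def entropy_and_mutual_info(classes, clusters):
    """Entropy E (uncertainty of the thesaurus class inside each
    cluster; smaller is better) and mutual information I between
    the two labelings, both in bits."""
    classes, clusters = np.asarray(classes), np.asarray(clusters)
    n = len(classes)
    # joint counts: cluster k versus thesaurus class c
    joint = np.zeros((clusters.max() + 1, classes.max() + 1))
    for k, c in zip(clusters, classes):
        joint[k, c] += 1.0
    p_kc = joint / n
    p_k = p_kc.sum(axis=1)   # cluster marginals
    p_c = p_kc.sum(axis=0)   # class marginals
    E = I = 0.0
    for k in range(p_kc.shape[0]):
        for c in range(p_kc.shape[1]):
            if p_kc[k, c] > 0:
                E -= p_kc[k, c] * np.log2(p_kc[k, c] / p_k[k])
                I += p_kc[k, c] * np.log2(p_kc[k, c] / (p_k[k] * p_c[c]))
    return E, I

# a perfect clustering: zero entropy, I equals the class entropy (1 bit)
E, I = entropy_and_mutual_info([0, 0, 1, 1], [0, 0, 1, 1])
print(E, I)  # 0.0 1.0
```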
Table 2 shows the experimental results for the asymmetric mapping algorithms proposed in Section 4. They are compared with their symmetric counterparts introduced in Section 3 and with the Sammon nonlinear mapping presented in [34]. This algorithm is an interesting reference because it has been successfully applied to a wide range of multivariate applications [19,31]. In all the experiments, term relations have been measured by the fuzzy logic similarity (2) and the L1 norm is used as asymmetry coefficient. Term vectors have been normalized by the L2 norm. The primary conclusions are the following:
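Equation (2) itself lies outside this excerpt, so the following is an assumed form for illustration only: a common fuzzy-logic similarity between term occurrence vectors, which is naturally asymmetric (a specific term is highly similar to a broad term that subsumes it, but not vice versa):

```python
import numpy as np

def fuzzy_similarity(xi, xj):
    """One common fuzzy-logic similarity between term vectors
    (an assumed form; the paper's Eq. (2) may differ).
    Asymmetric: fuzzy_similarity(a, b) != fuzzy_similarity(b, a)."""
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    return np.minimum(xi, xj).sum() / xi.sum()

broad = np.array([1, 1, 1, 1, 1, 1])   # term occurring in many documents
narrow = np.array([1, 1, 0, 0, 0, 0])  # term occurring in a subset
s_nb = fuzzy_similarity(narrow, broad)
s_bn = fuzzy_similarity(broad, narrow)
print(s_nb, s_bn)  # 1.0 versus roughly 0.33
```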
• The linear MDS algorithm introduced in Section 4 (row 2) outperforms the Sammon nonlinear mapping. On the one hand, the F measure suggests that the overall quality of both maps is similar. On the other hand, the entropy E suggests a smaller overlapping between the clusters for the linear mapping. This may be considered a consequence of the similarity matrix transformation (10), which favors the separation of weakly related terms. Finally, the mutual information I shows that the position of non-frequent terms is slightly worse than in the Sammon mapping, possibly due to the effect of asymmetry. This should be improved by the asymmetric versions.
• The MDS algorithm that defines asymmetric similarities (row 4) significantly outperforms both the symmetric counterpart and the Sammon mapping. The position of non-frequent terms is significantly improved (ΔI = 24%). Consequently, distances from specific terms to their respective nearest neighbors (usually broad terms [23,26]) are better preserved. This fact is supported by an important increase of the Sp. 10 coefficient (17%). In this way, the specific term
Table 2
Results for the asymmetric versions of the SOM and MDS algorithms

                                             Multi-topic collection          ACM corpus
                                             Sp.   Sp. 10  F     E     I     Sp.   Sp. 10
1 Sammon mapping                             0.17  0.20    0.53  0.51  0.18  0.11  0.14
2 Symmetric MDS                              0.26  0.30    0.53  0.48  0.17  0.25  0.30
3 Symmetric SOM                              0.43  0.64    0.70  0.38  0.23  0.43  0.74
4 Asymmetric MDS (asymmetric similarities)   0.28  0.35    0.60  0.43  0.21  0.27  0.34
  Improvement (%)                            8     17      13    10    24    8     13
5 Asymmetric MDS (asymmetric distances)      0.29  0.34    0.60  0.48  0.19  0.27  0.33
  Improvement (%)                            12    13      13    0     12    8     10
6 Asymmetric SOM                             0.57  0.76    0.78  0.35  0.27  0.51  0.76
  Improvement (%)                            33    16      11    8     17    19    3

Left columns give results for the multi-corpus database and right columns for the ACM digital library. The percentages of improvement are computed taking the symmetric version as reference. Parameters, multi-topic corpus: (1) N_iter = 70, α = 0.28, T = 0.015; (2) N_iter = 25, α = 0.02; (4) N_iter = 20, α = 0.01; (5) N_iter = 25, α = 0.03; (3,6) N_neur = 88, N_iter = 30, α_i = 30, α_f = 2. ACM: (1) N_iter = 130, α = 0.4; (2) N_iter = 25, α = 0.01, T = 0.009; (4) N_iter = 20, α = 0.02, T = 0.006; (5) N_iter = 18, α = 0.01, T = 0.008; (3,6) N_neur = 100, N_iter = 30, α_i = 36, α_f = 2.
concentration around the center of the map (see Section 2.1) is smoothed and cluster overlapping is reduced (ΔE = 10%). The experiments over the ACM digital library support similar conclusions.
• The MDS algorithm that incorporates asymmetric distances (row 5) outperforms the symmetric alternatives as well. However, the reorganization of non-frequent terms is weaker than in the previous model (ΔI = 12%, ΔSp. 10 = 13%). This can be explained because the compensation of the similarities for weakly related objects, via the weights defined in Section 4.2, is smaller than in the previous model. This case arises in some relations between broad and specific terms. Finally, the experiments over the ACM digital library corroborate the previous conclusions.
• The asymmetric SOM proposed in Section 4.3 (row 6) improves on the symmetric counterpart (row 3) and performs significantly better than any of the MDS algorithms shown in Table 2. In particular, the position of specific terms in the map is significantly improved in the asymmetric model (ΔI = 17%) and, as a result, the overlapping between specific weakly related objects is reduced (ΔE = 8%). Finally, the overall word map quality is 10% better than in the symmetric version. The ACM digital library collection corroborates the superiority of the proposed asymmetric version.
Next we show some word maps that illustrate the performance of the algorithms from a qualitative point of view.
Figs. 3 and 4 show the visual maps generated by the symmetric MDS algorithm and by the asymmetric version, respectively. The experimental corpus used is the multi-topic collection. For the sake of clarity, only a small sample of terms belonging to two topics is shown. Terms with L1 norm > 30 and ≤ 30 are visualized in different colors. Fig. 3 shows that the symmetric version tends to group the terms by L1 norm and the overlapping between clusters is severe. These problems are alleviated by the asymmetric version (Fig. 4). On the other hand, the following associations between specific and broad terms improve in the asymmetric map: PCA, projection ↔ principal, dimensionality; pattern recognition ↔ perceptron, generalization; laser ↔ optical, fiber, light; diodes, doped ↔ semiconductor.
Finally, Fig. 5 shows the visual map generated by the asymmetric SOM for the same subset of terms. The SOM prototypes have been projected using the Sammon mapping (see [17]) and those corresponding to neighboring neurons are joined by a continuous trace. Terms with L1 norm > 30 and ≤ 30 are visualized in different colors.
The figure shows that the terms are spread along the map regardless of their frequency (L1 norm). The term associations induced by the map are satisfactory even for words with a disparate degree of generality (L1 norm); see for instance: self organizing ↔ mapping, cluster, Kohonen; dimensionality reduction ↔ discriminant, projection, nonlinear, PCA; statistical ↔ Bayesian, Gaussian; communication ↔ internet, telecommunications. Notice also that the network organization is satisfactory.
[Figure: 2-D word map; term clusters labelled "Electronics and communications" and "Pattern classification".]
Fig. 3. Map generated by the symmetric MDS algorithm for two subjects of the multi-topic collection.
Finally, we have applied the proposed asymmetric algorithms to the visualization of gene relations using DNA microarrays. Results are shown in Table 3. The dataset has been considered earlier in [13]. Once more, the algorithm evaluation relies on unsupervised measures because no a priori classification of genes is available.
For the sake of computational efficiency, the objects are first submitted to a quantization algorithm (see Section 4.4 and [29] for more detail) before running the MDS algorithms. The number of prototypes selected equals 5% of the sample. For the same reason, the SOM algorithms have been run on a random sample of 3000 points.
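The quantization step can be sketched as follows; `kmeans2` is used here as a generic vector quantizer, and the paper's own quantization procedure (Section 4.4 and [29]) may differ in detail:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def quantize(X, fraction=0.05, seed=0):
    """Reduce the data set to a small codebook of prototypes before
    running MDS: prototype count = 5% of the sample, as in the text."""
    k = max(1, int(round(fraction * len(X))))
    centroids, _labels = kmeans2(X, k, seed=seed, minit="++")
    return centroids

X = np.random.default_rng(0).random((400, 10))  # toy data set
prototypes = quantize(X)
print(prototypes.shape)  # (20, 10)
```

MDS is then run on the 20 prototypes instead of the 400 original points, which cuts the cost of the pairwise-distance computations roughly by a factor of 400.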
Table 3 shows that the asymmetric MDS algorithms proposed (rows 4, 5) significantly improve the quality of the maps generated by their symmetric alternatives (rows 1, 2), in agreement with the textual data results. On the other hand, the asymmetric
[Figure: 2-D word map; term clusters labelled "Electronics and communications" and "Pattern classification".]
Fig. 4. Map generated by the asymmetric version of the MDS algorithm for two subjects of the multi-topic collection.
SOM (row 6) appears to be the best technique to deal with the problems considered in this paper.

6. Conclusions and future research trends
In this work, we have proposed new asymmetric versions of the SOM and MDS algorithms that model data relationships more accurately when asymmetry problems arise. The new algorithms have been tested on real data sets such as the ACM digital library collection and gene expression microarray datasets. Besides, they have been exhaustively evaluated through several objective functions and from a qualitative point of view.
[Figure: 2-D word map of SOM prototypes projected with the Sammon mapping; term clusters labelled "Electronics and communications" and "Pattern classification".]
Fig. 5. Map generated by the asymmetric version of the SOM algorithm for two subjects of the multi-topic collection.
Table 3
Experimental results for the asymmetric techniques proposed against some symmetric alternatives.
Experiments for the microarray gene expression dataset

                                             Sp.   Sp. 10
1 Sammon mapping                             0.48  0.54
2 Symmetric MDS                              0.43  0.49
3 Symmetric SOM                              0.58  0.74
4 Asymmetric MDS (asymmetric similarities)   0.63  0.59
5 Asymmetric MDS (asymmetric distances)      0.57  0.59
6 Asymmetric SOM                             0.63  0.77

Parameters: (1) α = 0.8, N_iter = 100; (2,5) α = 0.002, N_iter = 13, T = 0.4; (4) α = 0.005, N_iter = 13, T = 0.4; (3,6) N_iter = 30, N_neur = 100, α_i = 50, α_f = 2.
The experimental results show that the asymmetric algorithms significantly improve the maps generated by mapping techniques that rely solely on traditional symmetric distances. In particular, the position of rare objects in the map is strongly improved. Finally, it is worth noting that the asymmetric SOM gives excellent results and arises as the best visualization technique for the problems considered in this paper.
Future research will focus on the study of asymmetric techniques for classification purposes.

Acknowledgements
The authors wish to thank two anonymous referees for their useful comments and suggestions. The authors also thank Professors Yannis Dimitriadis and Pablo de la Fuente (University of Valladolid) and their research team for the help provided with the ACM data collection used in this paper.

References
[1] C.C. Aggarwal, P.S. Yu, Redefining clustering for high-dimensional applications, IEEE Trans.
Knowledge and Data Eng. 14 (2) (2002) 210–225.
[2] R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, Addison Wesley, Wokingham,
UK, 1999.
[3] K. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft, When is Nearest Neighbor Meaningful?
Springer, Lecture Notes in Computer Science, vol. 1540, 1999, pp. 217–235.
[4] J.C. Bezdek, N.R. Pal, An index of topological preservation for feature extraction, Pattern
Recognition 28 (3) (1995) 381–391.
[5] A. Buja, B. Logan, F. Reeds, R. Shepp, Inequalities and positive-definite functions arising from a
problem in multidimensional scaling, Ann. Statist. 22 (1994) 406–438.
[6] A. Buja, D. Swayne, M. Littman, N. Dean, XGVIS: interactive data visualization with
multidimensional scaling, J. Comput. Graphical Statist. 2003, submitted for publication, available
at: http://www.research.att.com/~andreas.
[7] Y.M. Chung, J.Y. Lee, A corpus-based approach to comparative evaluation of statistical term
association measures, J. Am. Soc. Inf. Sci. Technol. 52 (4) (2001) 283–296.
[8] A.G. Constantine, J.C. Gower, Graphical representation of asymmetric matrices, Appl. Statist. 27 (3)
(1978) 297–304.
[9] T.F. Cox, M.A.A. Cox, Multidimensional Scaling, second ed., Chapman & Hall/CRC, USA, 2001.
[10] B.V. Cutsem, Classification and Dissimilarity Analysis, Lecture Notes in Statistics, Springer, New
York, 1994.
[11] G.J. Goodhill, T.J. Sejnowski, A unifying objective function for topographic mappings,
Neurocomputing 9 (1997) 1291–1303.
[12] R.A. Harshman, P. Green, Y. Wind, M.E. Lundy, A model for the analysis of asymmetric data in
marketing research, Marketing Sci. 1 (2) (1982) 205–242.
[13] T. Hastie, T. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer, Heidelberg,
2001, available at: http://www-stat.stanford.edu/~tibs.
[14] T. Heskes, Self-organizing maps, vector quantization, and mixture modeling, IEEE Trans. Neural
Networks 12 (6) (2001) 1299–1305.
[15] L. Kaufman, P.J. Rousseeuw, Finding groups in data, An Introduction to Cluster Analysis, Wiley,
New York, 1990.
[16] H.A.L. Kiers, Y. Takane, A generalization of GIPSCAL for the analysis of nonsymmetric data,
J. Classification 11 (1994) 79–99.
[17] T. Kohonen, Self-Organizing Maps, second ed., Springer, Germany, 1997.
[18] T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero, A. Saarela, Organization of a
massive document collection, IEEE Trans. Neural Networks 11 (3) (2000) 574–585.
[19] A. König, Interactive visualization and analysis of hierarchical neural projections for data mining,
IEEE Trans. Neural Networks 11 (3) (2000) 615–624.
[20] A. Kopcsa, E. Schievel, Science and technology mapping: a new iteration model for representing
multidimensional relationships, J. Am. Soc. Inf. Sci. 49 (1) (1998) 7–17.
[21] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Approach to Machine Intelligence,
Prentice-Hall, Englewood Cliffs, New Jersey, 1991.
[22] L. Lebart, A. Morineau, J.F. Warwick, Multivariate Descriptive Statistical Analysis, Wiley, New
York, 1984.
[23] M. Martín-Merino, A. Muñoz, Self Organizing Map and Sammon Mapping for Asymmetric
Proximities, in: Lecture Notes in Computer Science, vol. 2130, Springer, Berlin, 2001, pp. 429–435.
[24] F. Mulier, V. Cherkassky, Self-organization as an iterative kernel smoothing process, Neural
Comput. 7 (1995) 1165–1177.
[25] A. Muñoz, Neural Networks for non supervised organization of document databases, Ph.D. Thesis,
1994 (in Spanish).
[26] A. Muñoz, Compound key word generation from document databases using a hierarchical clustering
ART model, J. Intell. Data Anal. 1 (1) (1997) 25–48.
[27] A. Muñoz, M. Martín-Merino, New asymmetric iterative scaling models for the generation of textual
word maps, Proceedings of the International Conference on Textual Data Statistical Analysis
JADT02, Saint Malo, France, 2002, pp. 593–603, available from Lexicometrica Journal at
www.cavi.univ-paris3.fr/lexicometrica/index-gb.htm.
[28] A. Muñoz, I. Martín, J.M. Moguerza, Support Vector Machine Classifiers for Asymmetric
Proximities, in: Lecture Notes in Computer Science, vol. 2714, 2003, pp. 217–224.
[29] A. Muñoz, M. Martín-Merino, Visualizing asymmetric proximities with MDS models, in:
Proceedings of the European Symposium on Artificial Neural Networks ESANN03, Bruges,
Belgium, 2003, pp. 51–58.
[30] A. Okada, Asymmetric multidimensional scaling of two-mode three-way proximities, J. Classification
14 (1997) 195–224.
[31] N.R. Pal, V.K. Eluri, Two efficient connectionist schemes for structure preserving dimensionality
reduction, IEEE Trans. Neural Networks 9 (6) (1998) 1142–1154.
[32] O. Riano de Antonio, M.A. Mulero Martínez, Visucluster: visualización de jerarquías web [visualization
of web hierarchies], Master Thesis, P. de la Fuente and Y. Dimitriadis, Computer Science School, University of Valladolid,
February, 2002.
[33] M. Rorvig, Images of similarity: a visual exploration of optimal similarity metrics and scaling
properties of TREC topic-document sets, J. Am. Soc. Inf. Sci. 50 (8) (1999) 639–651.
[34] J.W. Sammon, A nonlinear mapping for data structure analysis, IEEE Trans. Comput. C-18 (1969)
401–409.
[35] A. Strehl, J. Ghosh, R. Mooney, Impact of similarity measures on web-page clustering, in:
Proceedings of the 17th National Conference on Artificial Intelligence: Workshop of Artificial
Intelligence for Web Search, Austin, Texas, USA, July 2000, pp. 58–64.
[36] Y. Takane, Latent class DEDICOM, J. Classification 14 (1997) 225–247.
[37] Y. Yang, J.O. Pedersen, A comparative study on feature selection in text categorization, in:
Proceedings of the 14th International Conference on Machine Learning, Nashville, Tennessee, USA,
July 1997, pp. 412–420.
[38] B. Zielman, W.J. Heiser, Models for asymmetric proximities, British J. Math. Statist. Psychol. 49
(1996) 127–146.

Manuel Martín-Merino received the B.S. degree in physics from the University of
Salamanca (Spain) in 1996 and the PhD. degree in applied physics from the same
University in 2003. He is currently an Associate Professor in the Computer
Science school at the University Pontificia of Salamanca. His research interests
include visualization algorithms, pattern recognition, neural networks and data
mining applications. He is a member of the IEEE.
Alberto Muñoz received a B.S. degree in Mathematics from the University of
Salamanca (Spain) in 1988 and the PhD in applied Mathematics in 1994, from the
same University. He is currently an Associate Professor of Statistics at the Carlos
III University (Madrid). His research interests include cluster analysis, data
visualization, Support Vector Machines and Kernel Methods in general.