
Visualizing asymmetric proximities with SOM and MDS models

Manuel Martín-Merino (a,*), Alberto Muñoz (b)

a Computer Science Department, University Pontificia de Salamanca, C/Compañía 5, 37002 Salamanca, Spain
b Statistics Department, University Carlos III de Madrid, C/Madrid 126, 28903 Getafe, Madrid, Spain

Neurocomputing 63 (2005) 171-192. Received 21 October 2003; received in revised form 19 February 2004; accepted 23 April 2004; available online 20 August 2004. doi:10.1016/j.neucom.2004.04.010

Abstract

Multidimensional scaling (MDS) and self-organizing map (SOM) algorithms are useful to visualize object relationships in a data set. These algorithms rely on the use of symmetric distance or similarity measures, for instance the Euclidean distance. There are a number of relevant applications, such as text mining and DNA microarray processing, for which it is worth considering non-symmetric similarity measures that allow us to properly represent hierarchical relationships. In this paper we present asymmetric versions of the SOM and MDS algorithms able to deal with asymmetric proximity matrices. We also compare these approaches to the corresponding symmetric versions. Experimental work on text databases and gene expression data sets shows that the proposed asymmetric algorithms outperform their symmetric counterparts.

Keywords: Multidimensional scaling; Self-organizing maps; Textual data analysis; Asymmetric proximities; DNA microarray processing

This work was partially supported by DGICYT Grant BEC2000-0167 (Spain).
* Corresponding author. Tel.: +34-923-277-199; fax: +34-923-277-101. E-mail addresses: [email protected] (M. Martín-Merino), [email protected] (A. Muñoz).

© 2004 Elsevier B.V. All rights reserved.


1. Introduction

Visualization algorithms are helpful tools to represent high-dimensional data in an intuitive way [6,9,17,22]. This is particularly useful when dealing with complex data sets where the underlying structure is unknown. Textual analysis constitutes a paradigmatic example of this situation. A number of data visualization techniques have been applied to textual information in the past: self-organizing maps (SOM) [18], multidimensional scaling (MDS) algorithms [9,20] and correspondence analysis (CA) [22] are a few representative examples.

The primary source of information for any visualization algorithm is an $n \times n$ matrix $D$ made up of data point dissimilarities $d_{ij}$, where $d$ is some predefined distance measure. Alternatively, an $n \times n$ matrix $S$ made up of object similarities $s_{ij}$ can be used. Often distances (or similarities) are derived from an $n \times p$ data matrix $X$, where $n$ is the number of data points and $p$ the number of variables used to represent them.

There are a number of relevant application fields, such as text mining and DNA microarray processing, where the very high dimension causes the occurrence of problems related to the curse of dimensionality phenomenon [13]. In particular, most of the similarities $s_{ij}$ tend to be close to zero (see [3] for details), and visualization algorithms, being based on the use of the $S$ matrix, will provide poor results [5]. Moreover, there are some considerations that may advise the use of non-symmetric similarity (or distance) measures. Focusing on text mining problems, consider the task of building word associations [26]. When modeling word relations, many people will relate, for instance, 'neural' to 'networks' more strongly than conversely. Therefore, word relations should be modeled by asymmetric similarities. Regarding DNA processing, it has been noticed that asymmetry could play an important role in microarray gene expression datasets [29]: specific genes appearing in a small number of diseases can be considered as subsets of broad genes that appear in a larger number of diseases (while the inverse relation is much weaker).

The aim of this paper is to introduce asymmetric versions of the SOM and MDS algorithms to deal with data sets where the described problems appear. To this aim, a new class of weighting factors, called asymmetry coefficients, will be introduced to derive new versions of the aforementioned algorithms.

The rest of the paper is organized as follows. Section 2 presents some considerations about asymmetric similarities and their implications. Section 3 presents the SOM and MDS algorithms in a suitable form to facilitate the posterior introduction of their asymmetric counterparts. Section 4 proposes the new SOM and MDS algorithms able to process asymmetric proximity matrices. Experiments on text and gene expression databases are presented in Section 5. Finally, Section 6 concludes.

2. Asymmetry

Consider a set of $n$ objects and let $S = (s_{ij})$ be the similarity matrix made up of object proximities. Asymmetry arises when $s_{ij} \neq s_{ji}$. In this case, several algorithms have been proposed to generate a visual representation of object relationships. In [8] the similarity matrix is first decomposed into a symmetric and a skew-symmetric component. Both matrices are processed independently and visualized by two different maps. The first one represents the object proximities and the second one the deviation from symmetry. However, the symmetric component does not reflect accurately the object proximities [23]; therefore, object distances in the map often become meaningless. Other asymmetric MDS algorithms have been proposed in the literature (see for instance [12,16,36,38]), but they suffer from similar drawbacks: only the symmetric component of the similarity measure is used to produce the map of object proximities.

In this section, we first analyze how asymmetry impacts a number of commonly used similarity measures. Next, new coefficients of asymmetry are introduced in order to model the skew-symmetric component of a given similarity matrix. These coefficients will be incorporated into visualization algorithms to prevent the negative consequences of asymmetry.

    2.1. Asymmetry implications

When asymmetry arises, symmetric similarities produce small values for most pairs of data points and do not reflect accurately the object relationships [26]. We are going to consider a simple example from text mining to illustrate the problem.

Consider a collection of abstracts from scientific journals where, for instance, the term 'mathematics' appears in 400 documents, while the more specific term 'bayesian' occurs in a subset of just 10 documents. The relation between 'bayesian' and 'mathematics' is strongly asymmetric, in the sense that the concept represented by the word 'bayesian' is a subset of the concept represented by the word 'mathematics' (but not conversely). Let $x_m$, $x_b$ denote the binary vector space representations [2] of both terms.

A large number of measures can be considered to quantify the similarity between two binary vectors (see [9,10] for a review), depending on the field of application. For text data sets, measures that rank similar terms by the number of co-occurrences are preferred [2]. The Jaccard similarity coefficient is a popular measure inside this group that has been widely used in the information retrieval literature [7,20,33]. This similarity coefficient is strongly correlated with other important measures considered in the textual literature such as the Cosine, Dice or Kulczynski coefficients [7] ($r \approx 0.99$ for the textual collection used in the experiments here). This suggests that the Jaccard similarity represents reasonably well the behavior of a broad range of similarities over textual data. Textual data being a paradigmatic example of the problems studied here, in the following we will focus on this measure. Let us now compute the Jaccard coefficient [9] for the example considered above:

$$J_{mb} = \frac{\sum_k x_{mk} x_{bk}}{\sum_k x_{mk} + \sum_k x_{bk} - \sum_k x_{mk} x_{bk}} = \frac{10}{10 + 400 - 10} = 0.025. \qquad (1)$$

The similarity value is very close to 0, which implies that the terms 'mathematics' and 'bayesian' are hardly related; this is not true.


This situation is similar for the Euclidean distance, commonly used by mapping algorithms (including SOMs). To alleviate this problem, it is common practice in the textual literature [18] to normalize the terms by their L2 norm before computing the Euclidean distance. However, the resulting dissimilarity is equivalent to the cosine measure. As we have mentioned earlier, this similarity is affected by the terms' L1 norm and is strongly correlated with the Jaccard index. Finally, an interesting alternative to the Euclidean distance is the $\chi^2$ distance [22], which normalizes objects by the L1 norm before computing a weighted Euclidean distance. Nonetheless, there is empirical evidence [7] that the dependence of this index on the L1 norm is only slightly weaker than for the cosine similarity.

Finally, it is worth noting that several asymmetric measures such as the fuzzy logic similarity [21] have been proposed in the literature, but they do not overcome the problem analyzed above because only the symmetric component of the similarity matrix is used to derive the proximity map. In our example, using the fuzzy logic similarity (2), the symmetric component gives $s^{s}_{mb} \approx 0.5$, which suggests that this class of measures is affected by asymmetry as well.

From the above example one could infer that there is a relation between asymmetry and hierarchy. This relation has been noticed in [28] and is explained below.

There is a particular choice of the asymmetric similarity measure $s_{ij}$ that makes sense in a number of interesting cases, including those mentioned above. Denote by $\wedge$ the fuzzy AND operator, and define

$$s_{ij} = \frac{|x_i \wedge x_j|}{|x_i|} = \frac{\sum_k |\min(x_{ik}, x_{jk})|}{\sum_k |x_{ik}|}, \qquad (2)$$

where the existence of a data matrix $X$ is assumed. Suppose $X$ corresponds to a terms $\times$ documents matrix. Then $|x_i|$ measures the number of documents indexed by term $i$, and $|x_i \wedge x_j|$ the number of documents indexed by both terms $i$ and $j$. Therefore, $s_{ij}$ may be interpreted as the degree to which the topic represented by term $i$ is a subset of the topic represented by term $j$. This numeric measure of subsethood is due to Kosko [21]. In the example above, $s_{bm} = 1$ while $s_{mb} = 0.024$. In the case of a cocitation matrix, $|x_i|$ is the number of cites received by author (or Web page) $i$, and $|x_i \wedge x_j|$ measures the number of authors (or Web pages) that simultaneously cite authors $i$ and $j$. The case of gene expression is similar, as explained in Section 1. All these problems have in common that the norms of the objects (computed by the $|x_i|$'s) follow a Zipf's law [2,26]: there are a few individuals with very large norms and, at the opposite side of the distribution, a lot of individuals with very small norms. Therefore, asymmetry can be interpreted as a particular type of hierarchy. Individuals organize in a kind of tree: at the top lie words with large norms, corresponding to broad topics (genes present in many diseases in the DNA data sets); at the base lie words with small norms, corresponding to rare topics (rare genes).
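To make Eq. (2) concrete, here is a minimal sketch (ours) of the subsethood similarity evaluated on the same hypothetical vectors used for the Jaccard example; note how strongly the measure departs from symmetry:

```python
import numpy as np

def subsethood(x, y):
    """Kosko subsethood of Eq. (2): degree to which term x is a subset of term y."""
    return np.sum(np.minimum(x, y)) / np.sum(x)

# Same hypothetical term-document vectors as in the Jaccard snippet.
n_docs = 1000
x_m = np.zeros(n_docs); x_m[:400] = 1   # broad term "mathematics"
x_b = np.zeros(n_docs); x_b[:10] = 1    # specific term "bayesian"

print(subsethood(x_b, x_m))  # s_bm = 1.0: "bayesian" is fully contained in "mathematics"
print(subsethood(x_m, x_b))  # s_mb = 0.025 with these toy counts: the converse is weak
```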

We are going next to show the relation between the concepts of norms and asymmetry. In the decomposition $s_{ij} = \frac{1}{2}(s_{ij} + s_{ji}) + \frac{1}{2}(s_{ij} - s_{ji})$, the second term conveys the information provided by asymmetry (it equals zero if $S$ is symmetric). This skew-symmetric term can be written as follows:

$$\frac{1}{2}(s_{ij} - s_{ji}) = \frac{1}{2}\left( \frac{|x_i \wedge x_j|}{|x_i|} - \frac{|x_i \wedge x_j|}{|x_j|} \right) = \frac{1}{2}\, \frac{|x_i \wedge x_j|}{|x_i|\,|x_j|}\, (|x_j| - |x_i|) \propto |x_j| - |x_i|. \qquad (3)$$

Thus, asymmetry is directly related to the difference in norms, and will naturally arise when the norms of data points follow a Zipf's law. Fig. 1 shows an L1 norm histogram for the terms of a textual database, illustrating this phenomenon. The figure shows that most individuals have a norm close to zero and therefore most similarities will be close to zero too. This is illustrated in Fig. 2 for the cosine similarity, whose standard deviation is as low as $\sigma = 0.03$. This behavior is similar for any of the similarities considered in this paper, and in particular for the $\chi^2$ ($\sigma = 0.11$). Consequently, distances may become almost constant and the maps generated by MDS algorithms will be highly distorted [5].

In addition, it has been pointed out in [1] that for sparse databases such as textual data sets, the relations among specific (low norm) terms can only be established through relations with related broader terms (larger norm). In particular, if similarities between specific and broad terms are underestimated, specific terms will occur together in the map just because they are far away from most of the terms in the database [6]. Therefore, it is important to model accurately the similarities between broad and specific terms in order to obtain meaningful maps.

Fig. 1. Histogram for the L1 norms of terms in a textual database.


Fig. 2. Histogram for the cosine similarity between terms of a textual database.

2.2. Asymmetry coefficients

Next we are going to associate an asymmetry coefficient with each data point to model the skew-symmetric component of the similarity matrix.

Consider the fuzzy logic similarity defined in Eq. (2). The skew-symmetric component is given by Eq. (3). This suggests that any asymmetry coefficient $a_i$ should verify

$$s^{sk}_{ij} \propto a_j - a_i, \qquad (4)$$

where $s^{sk}_{ij}$ denotes the skew-symmetric component of the similarity matrix. Then we will choose the L1 norm as the asymmetry coefficient for the fuzzy logic similarity:

$$a_i = |x_i|. \qquad (5)$$

In text mining problems, this coefficient will take large values for broad terms and small values for specific terms. A similar interpretation applies for gene databases.

Consider now a general asymmetric similarity $s_{ij}$. If we impose the condition $s^{sk}_{ij} = a_j - a_i$ (by analogy with Eq. (4)) and sum both sides of this equation over $i$, we get

$$N \bar{s}^{sk}_{\cdot j} = N a_j - \sum_{i=1}^{N} a_i, \qquad (6)$$

where $\bar{s}^{sk}_{\cdot j}$ is defined as $\frac{1}{N} \sum_i s^{sk}_{ij}$. Due to the fact that the asymmetry coefficients suffer from a translation indetermination, the restriction $\frac{1}{N} \sum_i a_i = 0$ is imposed. Applying this restriction to Eq. (6), the following expression for the coefficient $a_j$ is obtained:

$$a_j = \bar{s}^{sk}_{\cdot j} + \frac{1}{N} \sum_{i=1}^{N} a_i = \bar{s}^{sk}_{\cdot j}. \qquad (7)$$

The above expression allows us to derive general asymmetry coefficients once $s_{ij}$ is given. For instance, [27] presents a coefficient of asymmetry based on the Kullback-Leibler divergence.
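Computing these coefficients from a given similarity matrix is direct; the sketch below (ours, under the definitions above) extracts the skew-symmetric component and applies Eq. (7). The toy matrix `S` is hypothetical:

```python
import numpy as np

def asymmetry_coefficients(S):
    """Eq. (7): a_j is the column mean of the skew-symmetric part of S."""
    Sk = 0.5 * (S - S.T)      # skew-symmetric component s^sk_ij
    return Sk.mean(axis=0)    # a_j = (1/N) sum_i s^sk_ij

# Hypothetical asymmetric similarity matrix.
S = np.array([[1.0, 0.9, 0.8],
              [0.1, 1.0, 0.5],
              [0.2, 0.4, 1.0]])
print(asymmetry_coefficients(S))
```

The coefficients sum to zero by construction, which is exactly the translation restriction imposed above.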

3. Symmetric mapping algorithms

Let $V = \{y_1, y_2, \ldots, y_n\}$ be a set of objects codified in $\mathbb{R}^p$ and $D = (d_{ij})$ the dissimilarity matrix made up of object dissimilarities. Mapping algorithms embed the original object configuration in a low-dimensional space (usually 2 for visualization purposes), preserving the object dissimilarities as much as possible. This is accomplished by the optimization of any of the Stress functions proposed in the literature [9]. This paper will focus on the widely used C measure [11], defined as

$$C = \frac{1}{2} \sum_{i=1}^{N} \sum_{j < i} d_{ij}\, d(M(y_i), M(y_j)), \qquad (8)$$

where $M$ is the mapping function $M : \mathbb{R}^p \to \mathbb{R}^m$, $d_{ij}$ is the original dissimilarity and $d$ refers to the distance defined in $\mathbb{R}^m$. The C measure maximizes the correlation between the object dissimilarities in input and output spaces. On the other hand, it has been shown in [11] that a map that maximizes function (8) will preserve the neighbor order induced by the original dissimilarity $d_{ij}$; that is, if $d_{ij} < d_{ik}$ then $d(M(y_i), M(y_j)) < d(M(y_i), M(y_k))$ $\forall i, j, k$.
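For concreteness, a direct transcription of Eq. (8) might look as follows (our sketch; `input_dissim` stands for the matrix $d_{ij}$ and `map_points` for the images $M(y_i)$, both hypothetical inputs):

```python
import numpy as np

def c_measure(input_dissim, map_points):
    """Eq. (8): half the sum over pairs j < i of the product between the input
    dissimilarity d_ij and the Euclidean distance of the mapped points."""
    diff = map_points[:, None, :] - map_points[None, :, :]
    map_dist = np.linalg.norm(diff, axis=-1)      # d(M(y_i), M(y_j))
    iu = np.triu_indices_from(map_dist, k=1)      # each pair counted once
    return 0.5 * np.sum(input_dissim[iu] * map_dist[iu])
```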

The next sections introduce two mapping algorithms derived from the optimization of the C measure. Section 3.1 introduces the iterative MDS algorithm presented in [20] and Section 3.2 the SOM [17].

    3.1. Iterative MDS algorithm

The MDS algorithm considered in this paper was presented in [20] as a heuristic model based on ideas from classical mechanics. It has been chosen for its adequacy for text processing tasks. In this section the model is presented as an MDS algorithm that optimizes a C measure. This new interpretation helps to better understand the model performance and rigorously justifies the convergence to an ordered map.

Consider the following Stress function:

$$\rho(x_i, x_j) = \sum_{i} \sum_{j \neq i} f(s_{ij})\, \|x_i - x_j\|^2, \qquad (9)$$

where $s_{ij}$ denotes the similarity matrix, $f$ is a monotonic function and $\|x_i - x_j\|$ is the Euclidean object distance defined in the map. The error function (9) is obviously a particular case of the C measure (see Eq. (8)). Therefore, as we have mentioned earlier, the minimization of (9) will converge to an ordered map that preserves the original similarities $s_{ij}$.

However, in our application we are interested in mapping algorithms that achieve a balance between distance preservation and minimum overlap between non-related clusters in the map. To reduce the inter-cluster overlapping, the original similarity is transformed using the following monotonic function:

$$f(s_{ij}) = \frac{s_{ij} - T}{\max_{ij}\{s_{ij} - T\}}, \qquad (10)$$

where $s_{ij}$ is any symmetric similarity measure and $T$ is a threshold that serves to split the similarities into two groups: those corresponding to related objects ($s_{ij} > T$) and those corresponding to non-related objects ($s_{ij} < T$). The transformation (10) increases the distance between non-related objects, favoring a smaller overlap between different classes in the map. $T$ must be determined experimentally by analysis of the similarity histogram [28].

Finally, the minimization of $\rho$ by a gradient descent technique gives the following iterative solution for each coordinate $k$:

$$x_{ik}(t+1) = x_{ik}(t) - \alpha \frac{\partial \rho}{\partial x_{ik}} = x_{ik}(t) + \alpha \sum_{j \neq i} f(s_{ij})\,(x_{jk} - x_{ik}). \qquad (11)$$

This adaptation rule is equivalent to the one derived for the heuristic model in [20].
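The complete loop of Section 3.1 is short enough to sketch. The following is our reading of Eqs. (10)-(11), not the authors' code; the learning rate `alpha`, the threshold `T`, the iteration count and the random initialization are all free choices:

```python
import numpy as np

def iterative_mds(S, T, alpha=0.05, n_iter=50, dim=2, seed=0):
    """Iterative MDS of Section 3.1: gradient descent on Eq. (9) using the
    thresholded similarity transformation of Eq. (10)."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    F = (S - T) / np.max(S - T)     # Eq. (10); negative entries repel non-related pairs
    np.fill_diagonal(F, 0.0)
    X = rng.normal(scale=0.1, size=(n, dim))
    for _ in range(n_iter):
        # Eq. (11) for all i at once: x_i += alpha * sum_j f(s_ij) (x_j - x_i)
        X += alpha * (F @ X - F.sum(axis=1, keepdims=True) * X)
    return X
```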

3.2. Self-organizing maps

The SOM [17] is a nonlinear visualization technique for high-dimensional data. Input vectors are represented by neurons arranged according to a regular grid (usually 1D-2D) in such a way that similar vectors in input space become spatially close in the grid.

It can be shown that the results obtained by the SOM algorithm are equivalent to those obtained by optimizing the following energy function [14]:

$$E(W) = \sum_{r} \sum_{x_\mu \in V_r} \sum_{s} h_{rs}\, D(x_\mu, w_s) \qquad (12)$$

$$\approx \underbrace{\sum_{r} \sum_{x_\mu \in V_r} D(x_\mu, w_r)}_{\text{Quantization error}} + K \underbrace{\sum_{r} \sum_{s \neq r} h_{rs}\, D(w_r, w_s)}_{\text{C measure}}, \qquad (13)$$

where we have considered that the number of prototypes is large enough so that $D(x_\mu, w_s) \approx D(x_\mu, w_r) + D(w_r, w_s)$; $h_{rs}$ is a neighborhood function (for instance the Gaussian kernel) that transforms nonlinearly the neuron distances (see [17] for other possible choices), $D$ denotes the squared Euclidean distance and $V_r$ is the Voronoi region corresponding to prototype $w_r$.


Eq. (13) shows that the SOM energy function may be decomposed as the sum of a quantization error and a C measure. The first one minimizes the information lost when the input patterns are represented by a set of prototypes. The second one maximizes the correlation between the prototype dissimilarities and the corresponding neuron distances.

The SOM energy function may be optimized by an iterative algorithm made up of two steps [14]. First, a quantization algorithm is run that represents each pattern by its nearest prototype. This operation minimizes the first term in Eq. (13). Next, the prototypes are organized along the grid of neurons by minimizing the second term of the error function. The optimization problem can be solved explicitly using the following adaptation rule for each prototype [17]:

$$w_s = \frac{\sum_{r=1}^{M} \sum_{x_\mu \in V_r} h_{rs}\, x_\mu}{\sum_{r=1}^{M} \sum_{x_\mu \in V_r} h_{rs}}, \qquad (14)$$

where $M$ is the number of neurons and $h_{rs}$ is, for instance, a Gaussian kernel of width $\sigma_t$. The kernel width is adapted in each iteration using the rule proposed in [24], $\sigma_t = \sigma_i (\sigma_f / \sigma_i)^{t / N_{iter}}$, where $\sigma_i = M/2$ is usually considered in the literature [17] and $\sigma_f$ is a parameter that determines the degree of smoothing of the principal curve generated by the SOM [24].
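One batch iteration of this two-step scheme might be sketched as follows (ours; a 1-D grid of neurons and a Gaussian neighborhood `h` are assumed for brevity, whereas the experiments in Section 5 use 2-D grids):

```python
import numpy as np

def batch_som_step(X, W, sigma):
    """One iteration: assign patterns to Voronoi regions, then apply Eq. (14)."""
    # Step 1: quantization -- best matching unit (nearest prototype) per pattern.
    bmu = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
    # Step 2: organize the prototypes along a 1-D grid of neurons.
    grid = np.arange(W.shape[0])
    h = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2 * sigma ** 2))  # h_rs
    Hx = h[bmu]                                   # row mu holds h_{r(mu), s}
    return (Hx.T @ X) / Hx.sum(axis=0)[:, None]   # Eq. (14)
```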

4. Asymmetric mapping algorithms

As we have shown earlier, ordinary similarities underestimate object proximities when relations are asymmetric (or hierarchical). Therefore, any visualization algorithm based on such similarities will not properly represent the object relationships. On the other hand, there are a number of asymmetric MDS algorithms proposed in the literature [12,16,36,38]. These algorithms optimize a quadratic error measure, called Stress, which takes the form (up to normalization factors) $\sum_{ij} (d_{ij} - \hat{d}_{ij})^2$, where $d_{ij}$ is the original dissimilarity between objects $i$ and $j$, and $\hat{d}_{ij}$ is the Euclidean distance between their mappings via MDS. If we are given similarities $s_{ij}$ instead of dissimilarities or distances, they will have to be first transformed into distances [9]. Considering the decomposition of distances $d_{ij} = d^{s}_{ij} + d^{sk}_{ij} = \frac{1}{2}(d_{ij} + d_{ji}) + \frac{1}{2}(d_{ij} - d_{ji})$, the Stress can be expressed as

$$\sum_{ij} (d_{ij} - \hat{d}_{ij})^2 = \sum_{ij} (d^{s}_{ij} - \hat{d}^{s}_{ij})^2 + \sum_{ij} (d^{sk}_{ij} - \hat{d}^{sk}_{ij})^2, \qquad (15)$$

just considering that the sum of the elements of a skew-symmetric similarity equals zero. The optimization of Eq. (15) is equivalent to building two maps that approximate independently the symmetric and skew-symmetric components of the dissimilarity matrix $(d_{ij})$. Therefore, the map that visualizes object proximities is exclusively derived from the symmetric component of $d_{ij}$ and is degraded by asymmetry.


In this section we propose asymmetric variants of the algorithms introduced in Section 3. The new algorithms will try to improve the position in the map of objects that have a small asymmetry coefficient. This will be accomplished by improving their relative position with respect to objects of larger asymmetry coefficient. Notice that the objects with smaller asymmetry coefficients are the most problematic for classical mapping algorithms [1] (see Section 2.1).

4.1. MDS algorithm based on asymmetric similarities

A natural way to extend the MDS algorithm introduced in Section 3 is to incorporate asymmetric similarities that better reflect the object proximities. Therefore, an asymmetric index is first defined by

$$e_{ij} = s'_{ij}\, a_j, \qquad (16)$$

where $s'_{ij}$ is any symmetric similarity transformed in an appropriate fashion to reduce the cluster overlapping in the map (see Section 3.1) and $a_j$ is any of the asymmetry coefficients defined in Section 2.2. Notice that the object proximities induced by the symmetric component of $e_{ij}$ are now modeled by the following coefficient:

$$e^{s}_{ij} = s'_{ij}\, \frac{a_i + a_j}{2}. \qquad (17)$$

This proximity index is larger than $s'_{ij}$ even when just one of the objects has a large asymmetry coefficient. In this case the relation is highly asymmetric ($a_i \gg a_j$ or conversely) and $s'_{ij}$ is compensated proportionally to the degree of asymmetry, $(a_j + a_i)/2 \approx \max(a_j, a_i)/2$.

The Stress function for the asymmetric MDS algorithm may now be written as

$$S = \sum_{i} \sum_{j \neq i} e_{ij}\, d^2_{ij} = \sum_{i} \sum_{j \neq i} e^{s}_{ij}\, \|x_i - x_j\|^2, \qquad (18)$$

where we have considered that any asymmetric matrix can be expressed as the sum of a symmetric and a skew-symmetric component, $e_{ij} = e^{s}_{ij} + e^{sk}_{ij}$ [38], and that the sum of the elements of a skew-symmetric matrix equals 0.

Eq. (18) shows that object similarities are compensated whenever one of the objects has a large asymmetry coefficient. Therefore, distances between related objects become smaller in the map, even if their asymmetry coefficients are disparate. In addition, the similarity compensation reduces the percentage of similarities that are almost 0 and smooths the histogram of similarities. This fact helps to alleviate the indifferentiation effect [5] that arises when a large proportion of the object similarities are very similar to each other. In particular, specific terms (regardless of their semantic meaning) will no longer concentrate around the central region of the map.

Finally, the Stress function (18) may be optimized by a gradient descent technique in the same way as its symmetric counterpart (see Section 3.1). After that, we obtain the following iterative rule for the adaptation of each object coordinate:

$$x_{ik}(t+1) = x_{ik}(t) + \alpha \sum_{j \neq i} s'_{ij}\, a_j\, (x_{jk} - x_{ik}). \qquad (19)$$
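In code, the only change with respect to the symmetric MDS sketch of Section 3.1 is the factor $a_j$ of Eq. (16); a minimal version of the update (19), ours:

```python
import numpy as np

def asymmetric_mds_step(X, S_prime, a, alpha):
    """One sweep of Eq. (19): x_i += alpha * sum_{j!=i} s'_ij a_j (x_j - x_i).
    S_prime is the transformed symmetric similarity, a the asymmetry coefficients."""
    F = S_prime * a[None, :]          # e_ij = s'_ij a_j, Eq. (16)
    np.fill_diagonal(F, 0.0)
    return X + alpha * (F @ X - F.sum(axis=1, keepdims=True) * X)
```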

4.2. MDS algorithm based on asymmetric distances

An alternative way to extend the MDS algorithm to the asymmetric case is to model object relations in the map by an asymmetric distance.

In our model, the asymmetric distance is defined as follows:

$$d'_{ij} = \omega_{ij}\, \|x_i - x_j\|^2, \qquad (20)$$

where $\omega_{ij}$ is an asymmetric weight matrix defined as $\omega_{ij} = 1 + s'_{ij}\, a_j$. The proximity measure induced by the symmetric component of the new dissimilarity is expressed as

$$(d'_{ij})^{s} = \omega^{s}_{ij}\, \|x_i - x_j\|^2 = \left( 1 + s'_{ij}\, \frac{a_i + a_j}{2} \right) \|x_i - x_j\|^2. \qquad (21)$$

This dissimilarity measure yields larger values than the Euclidean distance for related objects such that one of them has a large asymmetry coefficient. This feature allows us to approximate large dissimilarities when asymmetry arises while keeping the Euclidean distance between objects in the map conveniently small.

Next, a Stress function is defined that incorporates the asymmetric distance (21) introduced earlier:

$$S = \sum_{i} \sum_{j \neq i} \omega_{ij}\, s'_{ij}\, \|x_i - x_j\|^2 = \sum_{i} \sum_{j \neq i} \omega^{s}_{ij}\, s'_{ij}\, \|x_i - x_j\|^2, \qquad (22)$$

where we have considered, as in the previous algorithm, that an asymmetric matrix decomposes as $\omega_{ij} = \omega^{s}_{ij} + \omega^{sk}_{ij}$ and that the sum of the elements of a skew-symmetric matrix equals zero.

As we have detailed in the previous section, the optimization of the error function (22) reduces the distances in the map between objects with disparate coefficients of asymmetry. Consequently, it is expected that the position of the more specific objects gets improved.

Finally, the optimization of the Stress function by a gradient descent technique gives a simple updating rule for each object coordinate:

$$x_{ik}(t+1) = x_{ik}(t) + \alpha \sum_{j \neq i} s'_{ij}\, \omega_{ij}\, (x_{jk} - x_{ik}). \qquad (23)$$
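The corresponding sketch (ours) only changes the pairwise weights, which now follow Eq. (20):

```python
import numpy as np

def asymmetric_distance_mds_step(X, S_prime, a, alpha):
    """One sweep of Eq. (23) with the weights omega_ij = 1 + s'_ij a_j of Eq. (20)."""
    F = S_prime * (1.0 + S_prime * a[None, :])   # s'_ij * omega_ij
    np.fill_diagonal(F, 0.0)
    return X + alpha * (F @ X - F.sum(axis=1, keepdims=True) * X)
```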

4.3. Asymmetric SOM algorithm

As we have mentioned in Section 3.2, from a practical point of view the SOM algorithm may be derived from the optimization of a Stress function. Therefore, the SOM algorithm may be extended to the asymmetric case taking advantage of the ideas presented for the MDS algorithms.


To derive an asymmetric version of the SOM we proceed as follows. First, a new asymmetric similarity based on the Euclidean distance is defined. Next, an energy function which incorporates the asymmetric similarity is introduced. Finally, the error function is optimized following the same procedure as in the symmetric SOM.

Consider a dissimilarity measure, for instance the Euclidean distance $d(x_i, x_j) = \|x_i - x_j\|^2$. Using Eq. (24) it can be transformed into a similarity (see [9]):

$$s_{ij} = C - \|x_i - x_j\|^2, \qquad (24)$$

where the constant $C$ is an upper bound for the Euclidean distances. Next, an asymmetric similarity is defined that takes into account the object frequency:

$$s_{ij} = (C - \|x_i - x_j\|^2)\, |x_i|. \qquad (25)$$

This asymmetric similarity compensates the bias of the Euclidean distance toward small values when considering objects of disparate frequencies. This drawback of the Euclidean distance has been explained in Section 2.1.

Substituting the new similarity into Eq. (12), the error function for our asymmetric model is expressed as

$$E(W) = \sum_{r} \sum_{x_\mu \in V_r} \sum_{s} h_{rs}\, \omega_\mu\, (C - \|x_\mu - w_s\|^2), \qquad (26)$$

where $\omega_\mu$ is any of the asymmetry coefficients defined in Section 2.2. Eq. (26) shows that, in the asymmetric version, the object similarities become larger when one of the objects has a large asymmetry coefficient. In this case the object relationships become asymmetric and the corresponding distances along the grid of neurons are reduced proportionally to the degree of asymmetry.

The error function (26) may be optimized in two steps, in a similar manner as in the symmetric case. First, a quantization algorithm is run that generates the SOM prototypes $w_s$. Next, the error function is maximized by solving the set of linear equations $\partial E(W) / \partial w_s = 0$. This system of linear equations can be solved explicitly, giving a simple updating rule for each SOM prototype:

$$w_s = \frac{\sum_{r=1}^{M} \sum_{x_\mu \in V_r} \omega_\mu\, h_{rs}\, x_\mu}{\sum_{r=1}^{M} \sum_{x_\mu \in V_r} \omega_\mu\, h_{rs}}, \qquad (27)$$

where $h_{rs}$ is, for instance, a Gaussian kernel of width $\sigma_t$ that determines the degree of smoothing of the principal curve. $\sigma_t$ is adapted in each iteration using the same rule proposed for the symmetric version.

Notice that the asymmetric version of the SOM maintains the simplicity of the original algorithm and does not add computational burden.
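Indeed, relative to the symmetric batch step sketched in Section 3.2, the only change is the per-pattern weight $\omega_\mu$; a minimal sketch, ours, under the same simplifying assumptions (1-D grid, Gaussian neighborhood):

```python
import numpy as np

def asymmetric_batch_som_step(X, W, omega, sigma):
    """Batch update of Eq. (27): each pattern x_mu is weighted by its asymmetry
    coefficient omega_mu in both the numerator and the denominator."""
    bmu = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
    grid = np.arange(W.shape[0])
    h = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2 * sigma ** 2))  # h_rs
    Hx = omega[:, None] * h[bmu]                  # omega_mu * h_{r(mu), s}
    return (Hx.T @ X) / Hx.sum(axis=0)[:, None]   # Eq. (27)
```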

    4.4. Computational complexity considerations

The MDS algorithm introduced in Section 3 computes $n(n-1)/2$ distances in each iteration. Therefore, the computational complexity is quadratic in the number of patterns, $O(N^2)$ [19]. However, several ideas have been proposed in the literature to overcome this drawback. Some of them are commented briefly next.

In [31] a simple method is proposed that reduces the number of patterns used by the MDS algorithm by drawing a small random sample from the original dataset. However, the performance is poor because no method is used to determine the relevance of each pattern. In [19,31] the data set is first submitted to a quantization algorithm (SOM or k-means), and the MDS algorithm is then carried out only on the small set of prototypes. Finally, the authors have proposed two different methods to generate a subset of prototypes taking advantage of the information provided by the L1 norm [29]. These methods improve the map quality when the object L1 norms follow a Zipf's law (see [29] for an explanation and some experimental results).

The computational complexity of the proposed asymmetric SOM is $O(KN^2)$ [18]. However, the same methods proposed to increase the efficiency of the symmetric version [17,18] are also applicable to the asymmetric model proposed in this paper.

5. Experimental results

In this section we apply the proposed algorithms to the construction of maps to visualize the relationships between database terms. Some preliminary experiments are also carried out with DNA microarray expression data, suggesting new interesting areas of application.

We now briefly describe the textual collections used in the experiments. The first collection is made up of 2000 scientific abstracts retrieved from three commercial databases: LISA, INSPEC and Sociological Abstracts. The collection may be reproduced by submitting the queries shown in Table 1 to the corresponding databases. For each database a thesaurus created by human experts is available. The thesaurus therefore induces a classification of terms according to their semantic meaning, which will allow us to exhaustively check the term associations created by the map.

The second collection is made up of 6702 abstracts corresponding to the journals of the ACM digital library (available at http://www.acm.org). The collection was retrieved by means of a robot developed under the project [32]. In this case no thesaurus is available for the collection, and therefore the evaluation must rely on unsupervised measures.

Assessing the performance of algorithms that generate word maps is not an easy task. There are no theoretical arguments to prefer one map over another in the absence of labelled information. This holds even for simpler mapping algorithms like principal component analysis: is it convenient to normalize data points to zero mean, to unit variance, or both? In this paper the maps are evaluated from different viewpoints through several objective functions. This study will be complemented with a qualitative evaluation of the maps.

Table 1
Semantic groups for the multi-topic database

LISA: Business archives; Lotka's law; biology; automatic abstracting
INSPEC: Self-organizing maps; dimensionality reduction; power semiconductor device; optical cables; feature selection
Sociological Abstracts: Intelligence tests; retirement communities; sociology of literature and discourse; rural areas and rural poverty

The first measure considered is the Spearman rank correlation coefficient [4] (Sp.). This coefficient checks whether the neighbor ordering induced by a dissimilarity defined in $\mathbb{R}^p$ agrees with the one induced by the Euclidean distance in the map. Larger values suggest that object proximities according to the dissimilarity defined in $\mathbb{R}^p$ are better represented in the map. However, the Sp. coefficient is useless when the original dissimilarity does not reflect term relationships, due for instance to the existence of asymmetry. To avoid this problem, a new dissimilarity is defined in $\mathbb{R}^p$ that is not affected by asymmetry:

$$d'_{ij} = \frac{d_{ij}}{1 + s_{ij}\, \omega_{ij}}, \qquad (28)$$

where $s_{ij}$ is the symmetric component of the fuzzy logic similarity, $d_{ij} = 1 - s_{ij}$ the corresponding dissimilarity [9], and $\omega_{ij}$ is a weight matrix that reduces the dissimilarities $d'_{ij}$ for asymmetric relations.

Finally, we also evaluate the Sp. coefficient taking into account only the 10% nearest neighbors (Sp. 10). Notice that the nearest neighbors of specific terms are frequently broad terms [23,26] (see Section 2.1). Therefore, this index provides more specific information about the preservation of dissimilarities between specific and broad terms. Notice also that the value of the Sp. coefficient usually depends on the number of patterns considered: the correct ordering of objects in the map becomes more difficult as the number of patterns increases.
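The Sp. coefficient itself is straightforward to compute; a sketch (ours) using SciPy, where `input_dissim` is the matrix of (possibly corrected) dissimilarities in $\mathbb{R}^p$ and `map_points` are the coordinates produced by the mapping algorithm:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def sp_coefficient(input_dissim, map_points):
    """Spearman rank correlation between input-space dissimilarities and the
    Euclidean distances measured in the map, taken over all object pairs."""
    n = input_dissim.shape[0]
    iu = np.triu_indices(n, k=1)                  # same pair ordering as pdist
    rho, _ = spearmanr(input_dissim[iu], pdist(map_points))
    return rho
```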

The second group of measures quantifies the agreement between the semantic word classes induced by the map and by the thesaurus. Therefore, once the objects have been mapped, they are grouped into topics with a clustering algorithm (for instance PAM [15]). Next, the partition induced by the map is evaluated through the following measures:

- F measure [2]: a compromise between Recall and Precision that has been widely used by the information retrieval community. Intuitively, F measures whether words associated by the thesaurus are clustered together in the map.
- Entropy measure E [25,35]: it measures the uncertainty in the classification of words that belong to the same cluster. Small values suggest little overlapping between different topics in the maps; obviously smaller values are preferred.
- Mutual information I [35]: a nonlinear correlation measure between the word classification induced by the thesaurus and the word classification given by the clustering algorithm. Notice that this measure gives more weight to specific terms [37] and therefore provides valuable information about changes in the position of less frequent terms.


Table 2 shows the experimental results for the asymmetric mapping algorithms proposed in Section 4. They are compared with their symmetric counterparts introduced in Section 3 and with the Sammon nonlinear mapping presented in [34]. This algorithm is an interesting reference because it has been successfully applied to a wide range of multivariate applications [19,31]. In all the experiments, term relations have been measured by the fuzzy logic similarity (2) and the L1 norm is considered as the asymmetry coefficient. Term vectors have been normalized by the L2 norm.

Table 2
Results for the asymmetric versions of SOM and MDS algorithms

                                                Multi-topic collection              ACM corpus
                                                Sp.    Sp. 10  F      E      I      Sp.    Sp. 10
(1) Sammon mapping                              0.17   0.20    0.53   0.51   0.18   0.11   0.14
(2) Symmetric MDS                               0.26   0.30    0.53   0.48   0.17   0.25   0.30
(3) Symmetric SOM                               0.43   0.64    0.70   0.38   0.23   0.43   0.74
(4) Asymmetric MDS (asymmetric similarities)    0.28   0.35    0.60   0.43   0.21   0.27   0.34
    Improvement (%)                             8      17      13     10     24     8      13
(5) Asymmetric MDS (asymmetric distances)       0.29   0.34    0.60   0.48   0.19   0.27   0.33
    Improvement (%)                             12     13      13     0      12     8      10
(6) Asymmetric SOM                              0.57   0.76    0.78   0.35   0.27   0.51   0.76
    Improvement (%)                             33     16      11     8      17     19     3

The left columns give results for the multi-topic database and the right columns for the ACM digital library. The percentages of improvement are computed considering the symmetric version as reference. Parameters, multi-topic corpus: (1) N_iter = 70, α = 0.28, T = 0.015; (2) N_iter = 25, α = 0.02; (4) N_iter = 20, α = 0.01; (5) N_iter = 25, α = 0.03; (3,6) N_neur = 8×8, N_iter = 30, σ_i = 30, σ_f = 2. ACM corpus: (1) N_iter = 130, α = 0.4; (2) N_iter = 25, α = 0.01, T = 0.009; (4) N_iter = 20, α = 0.02, T = 0.006; (5) N_iter = 18, α = 0.01, T = 0.008; (3,6) N_neur = 100, N_iter = 30, σ_i = 36, σ_f = 2.

The primary conclusions are the following:

- The linear MDS algorithm introduced in Section 3 (row 2) outperforms the Sammon nonlinear mapping. On the one hand, the F measure suggests that the overall quality of both maps is similar. On the other hand, the entropy E suggests a smaller overlapping between the clusters for the linear mapping. This may be considered a consequence of the similarity matrix transformation (10), which favors the separation of weakly related terms. Finally, the mutual information I shows that the position of non-frequent terms is slightly worse than in the Sammon mapping, possibly due to the effect of asymmetry. This should be improved by the asymmetric versions.

- The MDS algorithm that defines asymmetric similarities (row 4) significantly outperforms both the symmetric counterpart and the Sammon mapping. The position of non-frequent terms is significantly improved (ΔI = 24%). Consequently, distances from specific terms to their respective nearest neighbors (usually broad terms [23,26]) are better preserved. This fact is supported by an important increase of the Sp. 10 coefficient (17%). In this way, the concentration of specific terms around the center of the map (see Section 2.1) is smoothed and cluster overlapping is reduced (ΔE = 10%). The experiments over the ACM digital library support similar conclusions.

- The MDS algorithm that incorporates asymmetric distances (row 5) outperforms the symmetric alternatives as well. However, the reorganization of non-frequent terms is weaker than in the previous model (ΔI = 12%, ΔSp. 10 = 13%). This can be explained because the compensation of the similarities for weakly related objects via the weights defined in Section 4.2 is smaller than in the previous model. This case arises in some relations between broad and specific terms. Finally, the experiments over the ACM digital library corroborate the previous conclusions.

- The asymmetric SOM proposed in Section 4.3 (row 6) improves on its symmetric counterpart (row 3) and performs significantly better than any of the MDS algorithms shown in Table 2. In particular, the position of specific terms in the map is significantly improved in the asymmetric model (ΔI = 17%) and, as a result, the overlapping between weakly related specific objects is reduced (ΔE = 8%). Finally, the overall word map quality is 10% better than in the symmetric version. The ACM digital library collection corroborates the superiority of the proposed asymmetric version.

Next we show some word maps that illustrate the performance of the algorithms from a qualitative point of view.

Figs. 3 and 4 show the visual maps generated by the symmetric MDS algorithm and by the asymmetric version, respectively. The experimental corpus used is the multi-topic collection. For the sake of clarity, only a small sample of terms belonging to two topics is shown. Terms with L1 norm > 30 and ≤ 30 are visualized in different colors. Fig. 3 shows that the symmetric version tends to group the terms by L1 norm, and the overlapping between clusters is severe. These problems are alleviated by the asymmetric version (Fig. 4). Moreover, the following associations between specific and broad terms improve in the asymmetric map: PCA, projection ↔ principal, dimensionality; pattern recognition ↔ perceptron, generalization; laser ↔ optical, fiber, light; diodes, doped ↔ semiconductor.

Finally, Fig. 5 shows the visual map generated by the asymmetric SOM for the same subset of terms. The SOM prototypes have been projected using the Sammon mapping (see [17]), and those corresponding to neighboring neurons are joined by a continuous trace. Terms with L1 norm > 30 and ≤ 30 are visualized in different colors.

The figure shows that the terms are spread along the map regardless of their frequency (L1 norm). The term associations induced by the map are satisfactory even for words with disparate degrees of generality (L1 norm). See for instance: self-organizing ↔ mapping, cluster, Kohonen; dimensionality reduction ↔ discriminant, projection, nonlinear, PCA; statistical ↔ bayesian, Gaussian; communication ↔ internet, telecommunications. Notice also that the network organization is satisfactory.


Fig. 3. Map generated by the symmetric MDS algorithm for two subjects of the multi-topic collection.

Fig. 4. Map generated by the asymmetric version of the MDS algorithm for two subjects of the multi-topic collection.

Fig. 5. Map generated by the asymmetric version of the SOM algorithm for two subjects of the multi-topic collection.

Finally, we have applied the proposed asymmetric algorithms to the visualization of gene relations using DNA microarrays. Results are shown in Table 3. The dataset has been considered earlier in [13]. Once again, the algorithm evaluation relies on unsupervised measures because no a priori classification of genes is available.

For the sake of computational efficiency, the objects are first submitted to a quantization algorithm (see Section 4.4 and [29] for more detail) before running the MDS algorithms. The number of prototypes selected equals 5% of the sample. For the same reason, the SOM algorithms have been run on a random sample of 3000 points.

Table 3
Experimental results for the asymmetric techniques proposed against some symmetric alternatives, on the microarray gene expression dataset

                                                Sp.    Sp. 10
(1) Sammon mapping                              0.48   0.54
(2) Symmetric MDS                               0.43   0.49
(3) Symmetric SOM                               0.58   0.74
(4) Asymmetric MDS (asymmetric similarities)    0.63   0.59
(5) Asymmetric MDS (asymmetric distances)       0.57   0.59
(6) Asymmetric SOM                              0.63   0.77

Parameters: (1) α = 0.8, N_iter = 100; (2,5) α = 0.002, N_iter = 13, T = 0.4; (4) α = 0.005, N_iter = 13, T = 0.4; (3,6) N_iter = 30, N_neur = 100, σ_i = 50, σ_f = 2.

Table 3 shows that the proposed asymmetric MDS algorithms (rows 4, 5) significantly improve the quality of the maps generated by their symmetric alternatives (rows 1, 2), in agreement with the textual data results. On the other hand, the asymmetric SOM (row 6) appears to be the best technique to deal with the problems considered in this paper.

6. Conclusions and future research trends

In this work we have proposed new asymmetric versions of the SOM and MDS algorithms that model data relationships more accurately when asymmetry problems arise. The new algorithms have been tested on real data sets such as the ACM digital library collection and gene expression microarray datasets. Besides, they have been exhaustively evaluated through several objective functions and from a qualitative point of view.


The experimental results show that the asymmetric algorithms significantly improve the maps generated by mapping techniques that rely solely on the use of traditional symmetric distances. In particular, the position in the map of rare objects is strongly improved. Finally, it is worth noting that the asymmetric SOM gives excellent results and arises as the best visualization technique for the problems considered in this paper.


Future research will focus on the study of asymmetric techniques for classification purposes.

Acknowledgements

The authors wish to thank two anonymous referees for their useful comments and suggestions. The authors also thank Professors Yannis Dimitriadis and Pablo de la Fuente (University of Valladolid) and their research team for the help provided with the ACM data collection used in this paper.

References

[1] C.C. Aggarwal, P.S. Yu, Redefining clustering for high-dimensional applications, IEEE Trans. Knowledge and Data Eng. 14 (2) (2002) 210-225.
[2] R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Wokingham, UK, 1999.
[3] K. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft, When is nearest neighbor meaningful?, in: Lecture Notes in Computer Science, vol. 1540, Springer, 1999, pp. 217-235.
[4] J.C. Bezdek, N.R. Pal, An index of topological preservation for feature extraction, Pattern Recognition 28 (3) (1995) 381-391.
[5] A. Buja, B. Logan, F. Reeds, R. Shepp, Inequalities and positive-definite functions arising from a problem in multidimensional scaling, Ann. Statist. 22 (1994) 406-438.
[6] A. Buja, D. Swayne, M. Littman, N. Dean, XGvis: interactive data visualization with multidimensional scaling, J. Comput. Graphical Statist., 2003, submitted for publication; available at http://www.research.att.com/~andreas.
[7] Y.M. Chung, J.Y. Lee, A corpus-based approach to comparative evaluation of statistical term association measures, J. Am. Soc. Inf. Sci. Technol. 52 (4) (2001) 283-296.
[8] A.G. Constantine, J.C. Gower, Graphical representation of asymmetric matrices, Appl. Statist. 27 (3) (1978) 297-304.
[9] T.F. Cox, M.A.A. Cox, Multidimensional Scaling, second ed., Chapman & Hall/CRC, USA, 2001.
[10] B.V. Cutsem, Classification and Dissimilarity Analysis, Lecture Notes in Statistics, Springer, New York, 1994.
[11] G.J. Goodhill, T.J. Sejnowski, A unifying objective function for topographic mappings, Neural Comput. 9 (1997) 1291-1303.
[12] R.A. Harshman, P. Green, Y. Wind, M.E. Lundy, A model for the analysis of asymmetric data in marketing research, Marketing Sci. 1 (2) (1982) 205-242.
[13] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer, Heidelberg, 2001; available at http://www-stat.stanford.edu/~tibs.
[14] T. Heskes, Self-organizing maps, vector quantization, and mixture modeling, IEEE Trans. Neural Networks 12 (6) (2001) 1299-1305.
[15] L. Kaufman, P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York, 1990.
[16] H.A.L. Kiers, Y. Takane, A generalization of GIPSCAL for the analysis of nonsymmetric data, J. Classification 11 (1994) 79-99.
[17] T. Kohonen, Self-Organizing Maps, second ed., Springer, Germany, 1997.
[18] T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero, A. Saarela, Organization of a massive document collection, IEEE Trans. Neural Networks 11 (3) (2000) 574-585.
[19] A. König, Interactive visualization and analysis of hierarchical neural projections for data mining, IEEE Trans. Neural Networks 11 (3) (2000) 615-624.
[20] A. Kopcsa, E. Schiebel, Science and technology mapping: a new iteration model for representing multidimensional relationships, J. Am. Soc. Inf. Sci. 49 (1) (1998) 7-17.
[21] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Approach to Machine Intelligence, Prentice-Hall, Englewood Cliffs, NJ, 1991.
[22] L. Lebart, A. Morineau, J.F. Warwick, Multivariate Descriptive Statistical Analysis, Wiley, New York, 1984.
[23] M. Martín-Merino, A. Muñoz, Self-organizing map and Sammon mapping for asymmetric proximities, in: Lecture Notes in Computer Science, vol. 2130, Springer, Berlin, 2001, pp. 429-435.
[24] F. Mulier, V. Cherkassky, Self-organization as an iterative kernel smoothing process, Neural Comput. 7 (1995) 1165-1177.
[25] A. Muñoz, Neural networks for non-supervised organization of document databases, Ph.D. Thesis, 1994 (in Spanish).
[26] A. Muñoz, Compound key word generation from document databases using a hierarchical clustering ART model, J. Intell. Data Anal. 1 (1) (1997) 25-48.
[27] A. Muñoz, M. Martín-Merino, New asymmetric iterative scaling models for the generation of textual word maps, in: Proceedings of the International Conference on Textual Data Statistical Analysis (JADT'02), Saint Malo, France, 2002, pp. 593-603; available from the Lexicometrica Journal at www.cavi.univ-paris3.fr/lexicometrica/index-gb.htm.
[28] A. Muñoz, I. Martín, J.M. Moguerza, Support vector machine classifiers for asymmetric proximities, in: Lecture Notes in Computer Science, vol. 2714, Springer, 2003, pp. 217-224.
[29] A. Muñoz, M. Martín-Merino, Visualizing asymmetric proximities with MDS models, in: Proceedings of the European Symposium on Artificial Neural Networks (ESANN'03), Bruges, Belgium, 2003, pp. 51-58.
[30] A. Okada, Asymmetric multidimensional scaling of two-mode three-way proximities, J. Classification 14 (1997) 195-224.
[31] N.R. Pal, V.K. Eluri, Two efficient connectionist schemes for structure preserving dimensionality reduction, IEEE Trans. Neural Networks 9 (6) (1998) 1142-1154.
[32] O. Riaño de Antonio, M.A. Mulero Martínez, Visucluster: visualización de jerarquías web [Visucluster: visualization of web hierarchies], Master Thesis (supervised by P. de la Fuente and Y. Dimitriadis), Computer Science School, University of Valladolid, February 2002.
[33] M. Rorvig, Images of similarity: a visual exploration of optimal similarity metrics and scaling properties of TREC topic-document sets, J. Am. Soc. Inf. Sci. 50 (8) (1999) 639-651.
[34] J.W. Sammon, A nonlinear mapping for data structure analysis, IEEE Trans. Comput. C-18 (1969) 401-409.
[35] A. Strehl, J. Ghosh, R. Mooney, Impact of similarity measures on web-page clustering, in: Proceedings of the 17th National Conference on Artificial Intelligence: Workshop of Artificial Intelligence for Web Search, Austin, TX, USA, July 2000, pp. 58-64.
[36] Y. Takane, Latent class DEDICOM, J. Classification 14 (1997) 225-247.
[37] Y. Yang, J.O. Pedersen, A comparative study on feature selection in text categorization, in: Proceedings of the 14th International Conference on Machine Learning, Nashville, TN, USA, July 1997, pp. 412-420.
[38] B. Zielman, W.J. Heiser, Models for asymmetric proximities, British J. Math. Statist. Psychol. 49 (1996) 127-146.

Manuel Martín-Merino received the B.S. degree in Physics from the University of Salamanca (Spain) in 1996 and the Ph.D. degree in Applied Physics from the same university in 2003. He is currently an Associate Professor at the Computer Science School of the University Pontificia of Salamanca. His research interests include visualization algorithms, pattern recognition, neural networks and data mining applications. He is a member of the IEEE.


Alberto Muñoz received a B.S. degree in Mathematics from the University of Salamanca (Spain) in 1988 and the Ph.D. in Applied Mathematics in 1994 from the same university. He is currently an Associate Professor of Statistics at the Carlos III University (Madrid). His research interests include cluster analysis, data visualization, Support Vector Machines and kernel methods in general.
