[ieee 2009 international conference on complex, intelligent and software intensive systems (cisis) -...

8

Click here to load reader

Upload: andri

Post on 16-Apr-2017

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: [IEEE 2009 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS) - Fukuoka, Japan (2009.03.16-2009.03.19)] 2009 International Conference on Complex,

Link Structure Ranking Algorithm for Trading Networks

Andri Mirzal Graduate School of Information Science and Technology,

Hokkaido University, Kita 14 Nishi 9, Kita-Ku, Sapporo 060-0814, Japan [email protected]

Abstract

Ranking algorithms based on link structure of the network are well-known methods in web search engines to improve the quality of the searches. The most famous ones are PageRank and HITS. PageRank uses probability of a random surfer to visit a page as the score of that page, and HITS instead of produces one score, proposes using two scores, authority and hub scores. In this paper, we introduce a new link structure ranking algorithm for trading network based on the differences between trading network and WWW network in the links addition process, a process that known to be the foundation of PageRank and HITS formulation. In the last section, we describe the using of proposed algorithm as a tool for network clustering in addition to its original function as a ranking method. 1. Introduction

With the fast growing of Internet use, trading activities that use the Internet and WWW technologies as the media are flourishing. Some works have tried to analyze online trading activities to find similarity among users so that recommendation schemes can be built. For example Kawachi et al. [1] use ranking algorithm to characterize users in online auction network and group them based on their similarity in characteristic vectors. Then by using the most similar users’ preferentials, the recommendation to the users under consideration can be created. The recommendation schemes are not trivial tasks, for examples Netflix, an online DVD rental, on October 2006 offered one million USD prize to the first developer who can create a better recommendation algorithm at predicting customer ratings.

In [1] the authors use their modified HITS to group the users based on similarity on their labeled links. Online auction network is a labeled-link network, where the information about the transactions can not be

represented only by nodes connected with weighted links, but to which category a link belongs to must also be included. The method first calculates authority and hub vector for each category (in yahoo japan auction), and then uses these vectors as signatures to form characteristic vectors (see section 5). Grouping the most similar users is done by inputting the characteristic vectors into self-organizing map (SOM) [2]. The closer the distance between two users on the map, the more similar they are. However, there is one puzzled point here, even though the transactions are conducted on the Internet, is online trading network really similar to WWW network so that a technique borrowed from WWW network research like HITS can be used there. And if they are different, what the differences are.

In this work, we explore the differences between these networks and propose a new ranking algorithm based on those differences. To the best of our knowledge, we are the first to propose ranking algorithm for trading network based on links addition process that occurs in this network. To test the algorithm, instead of using online trading data, international trading data from United Nations [3], [4] is used. The reason for choosing the datasets is solid, as shown in the next section, the trading networks are similar to each other. So by choosing much smaller networks with clear classification of goods and almost fixed price for the same product, problem analysis and adjacency matrices construction become much easier. Not to mention that the matrices are not sparse, so some manipulation steps to ensure the convergence of power method [9] applied to the matrices can be avoided (even though the proposed algorithm is written with assumption that the matrices are sparse). For the sake of algorithm testing it provides us with the best condition.

Actually there are also some issues in the online trading networks that prevent them from being good test datasets. For example in an auction network, the sold goods are too diverse in both type and price to

International Conference on Complex, Intelligent and Software Intensive Systems

978-0-7695-3575-3/09 $25.00 © 2009 IEEE

DOI 10.1109/CISIS.2009.27

120

International Conference on Complex, Intelligent and Software Intensive Systems

978-0-7695-3575-3/09 $25.00 © 2009 IEEE

DOI 10.1109/CISIS.2009.27

120

Page 2: [IEEE 2009 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS) - Fukuoka, Japan (2009.03.16-2009.03.19)] 2009 International Conference on Complex,

allow any classifications work. If clustering method like SOM is applied to this data, it is difficult to infer that goods in the vicinity on the map are more similar to each other than goods that are far away. If it is the case, there is no point to use this classification as the base for users clustering. And if classification cannot be used, adjacency matrix for each kind of good must be constructed, which is really an unfortunate situation, because there will be too many sparse matrices with only one, or two nonzero entries (number of identical goods bought by a user) for one network

And for other type of trading network like online shopping where there are two kinds of nodes, buyers and sellers, the network becomes bipartite graph. Thus ranking and clustering task become different problem, which is beyond the scope of this paper.

2. Trading network

The usual way to calculate degree of importance of nodes in a trading network is by using total amount of export/import of a particular good. This method, however, fails to capture the link structure of the network; to which nodes a node connect to and being connected to. For example the same amount of export to an insignificant country and to an important country will give the same weight to ranking score. This problem actually had ever occurred in WWW network, where the methods of only calculating content scores of web pages were no longer adequate to deal with users’ satisfaction and be able to provide needed accuracy to distinguish important pages from less ones in the fast growing WWW network environment. The solutions of this problem were proposed independently by Brin and Page [5], [6], and Kleinberg [7]. Both solutions use link structure of WWW network to improve the quality of web searches.

In PageRank, an important page is pointed to by other important pages [8 pp. 31], and usually the degree of importance is reflected by the number of inlinks a page has. On the other hand, HITS instead of producing only one score, proposes using two scores; authority and hub score. Good authorities are pointed to by good hubs and good hubs point to good authorities [8 pp. 115]. Authority scores are usually used to describe the degree of importance of the pages, and hub scores are usually used to find portal pages, pages that link to the important pages. In web search purpose, usually only authority scores are used.

Even though PageRank and HITS are the most popular and widely used algorithms in WWW network, they cannot simply be used in trading network because the nature of these networks is different. Each node in trading network has at least one type of resource before

any transactions can occur. The links addition happens when two nodes with different type of resources exchange their resources. Thus, the amount of resources limits number and weight of links a node can have. In WWW network, links addition is simply the creation of new hyperlinks on web pages, so there is no resource needs to be allocated in creating new links.

Another important point that differentiates these networks is links addition in trading network is mutual process, if node A creates a new link to node B, node B also creates a new link to node A. This is not the case in WWW network, whereas if page A has a hyperlink to page B, it doesn’t necessary that page B also has a hyperlink to page A. Another difference, but not related to the construction of the algorithm, is weights of the links in trading network are usually the price/quantity of exchanged goods, whereas in WWW network are usually only binary, either 0 if there is no link and 1 if there is a link connected two pages.

Figure 1. The differences between trading network and WWW network.

Further, the purpose of links addition in trading network is to maximize the benefit of the transactions. Thus, in the selling side, each node competes to get transactions from other nodes that are lack of the resources it offers, and in buying side, each node competes to get transactions from other nodes that have abundant resources it needs. In WWW network, the purpose of links addition is to get hyperlinks from popular pages, and popular pages are likely to get new hyperlinks than less popular ones.

3. Proposed algorithm

In trading network, every node should be careful in making new inlinks and outlinks due to the needed resources in the transactions. Each node competes to get inlinks from other important nodes (nodes with abundant resources that competing nodes need) and competes to make outlinks to other important ones (nodes that need resources from competing nodes) by considering the costs.

Due to this nature, none of the previous discussed web ranking algorithms are suitable. PageRank which

A

B

1 barrel

$70

A

B

1

1

(a) Trading Network (b) WWW network

121121

Page 3: [IEEE 2009 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS) - Fukuoka, Japan (2009.03.16-2009.03.19)] 2009 International Conference on Complex,

focuses on inlinks clearly cannot be used in the environment where inlinks as well as outlinks are highly regarded. HITS is more interesting than PageRank, because it accommodates both inlinks and outlinks. In HITS’s a node should establish new outlinks to others with many inlinks to increase hub score, and should receive inlinks from others with many outlinks to increase authority score. However, in the context of our problem, where making and receiving new links can be expensive, it is more appropriate to make new outlinks to nodes that have many outlinks and receive inlinks from nodes that have many inlinks, because receiving inlinks means getting resources and creating outlinks means giving up resources. Figure 2 shows the differences in links addition process between trading network and WWW network. In trading network, A prefers B (node with many outlinks therefore lack of resource) when making a new outlink and C (node with many inlinks therefore full of resource) when looking for a new inlink. This preferential is opposite to WWW network.

Figure 2. Links addition processes.

Proposed algorithm is defined with following statement: a node becomes more important if being pointed to by others that have many inlinks and points to others that have many outlinks. By comparing the links addition processes in these networks (shown in figure 2) and HITS mathematical model [8 pp. 115], this definition can be written as:

( ) ( )∑∑

→→

×−+×=ji

jhjij

jaji ncnrncnrnr )()()1()()()( ββ (1)

where

,)(outdegree)(indegree)(degree

)(outdegree)(

,)(outdegree)(indegree)(degree)(indegree

)(

jpjj

j

jjh

jpjj

j

jja

nnn

nnc

nnnn

nc

−−×=

−×=

and

⎪⎩

⎪⎨

=<>

=)(outdegree)(indegree if0)(outdegree)(indegree if1-)(outdegree)(indegree if1

jj

jj

jj

j

nnnnnn

p

r(ni) denotes ranking score of node i, |*| denotes absolute value of *, i → j denotes node i links to node j, indegree(nj) / outdegree(nj) / degree(nj) denotes the number of inlinks / outlinks / links node j has, and parameter β (0.0 ≤ β ≤ 1.0) is used to determine which type of link is more important. If outlink is more important than inlink set β < 0.5, if inlink is more important than outlink set β > 0.5, and β = 0.5 otherwise.

As shown in eq. (1), r(nj) is weighted with constant ca(nj) in the first term of right hand part, where the bigger the number of inlinks and the smaller the number of outlinks of node j, the larger ca(nj) becomes. And in the second term, r(nj) is weighted with constant ch(nj), where the bigger the number of outlinks and the smaller the number of inlinks of node j, the larger ch(nj) becomes. Thus, the above equation agrees with proposed algorithm definition.

Figure 3. A schematic explanation of the differences among algorithms.

There is one problem in eq. (1); the r(ni) values are unknown. Consequently this equation is unsolvable. To sidestep this problem, iterative procedure will be applied to eq. (1), and initial values are set to 1/N for every node, where N is the number of nodes in the network (other initial values are possible, see [8 pp. 36, 164 for details]). Let k = 0, 1, … is the iterations number, then eq. (1) can be rewritten as:

( ) ( )∑∑

→→

+ ×−+×=ji

jhjk

ijjaj

ki

k ncnrncnrnr )()()1()()()( )()()1( ββ (2)

By introducing iterative procedure into algorithm

formulation, convergence issue arises because there is no guarantee that eq. (2) will converge into a single unique positive score value for every node in a reasonable number of iterations. To overcome this difficulty, we will modify the algorithm representation in matrix form and do some manipulations to ensure the convergence property.

Let M = βF+(1-β)G, where F = KD-1DiL is the authority part which describes fraction of scores a node receives from its inlinks, G = K-1D-1DoLT is the hub part which describes fraction of scores a node receives from its outlinks. and L is N×N the adjacency matrix of the network. Thus, eq. (2) can be rewritten as:

C

B

A

C

B

A

C

B

A

C

B

A

making a link receiving a link making a link receiving a link

(a) Trading network (b) WWW network

PageRank HITS Proposed Algorithm

122122

Page 4: [IEEE 2009 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS) - Fukuoka, Japan (2009.03.16-2009.03.19)] 2009 International Conference on Complex,

( )TTk

TkTk

LDDKLDKDrMrr

oi111)(

)()1(

)1( −−−

+

−+==

ββ (3)

where r(k)T is the ranking vector at kth iteration, diagonal matrices Di, Do, D are defined as:

oiooii DDDdDdD +=== and ),(diag ),(diag (4)

and K is a diagonal matrix with diagonal entries defined as:

( ) ip

iiii oi DDK −= (5)

given ii = ∑j Lji is indegree of node i, oi = ∑k Lik is outdegree of node i, di = (i1 i2 … iN)T is inlink vector, and do = (o1 o2 … oN)T is outlink vector of the network.

The calculation of r(k)T can be done in two different ways, the first is by using direct method by inspecting the linear system property of the equation [8 pp. 71-74] and the second is by using iteration process (power method [9]), a common method in calculating ranking vector of web pages. For small network the first method is preferable because it is much faster than the second method. As the network size is getting bigger, only the second method is viable because the power method, like many other iterative methods, is matrix-free, so that no matrix manipulation is needed [8, pp. 40]. In this paper, however, the power method is used because we want to compare convergence property of proposed algorithm with convergence properties of PageRank and HITS.

To ensure the power method applied to matrix M converges to a positive and unique dominant eigenvector of M, two adjustments are required. The first is stochasticity adjustment, normalize all nonzero rows of M and then fill all zero rows with 1×N positive real vectors that have 1-norm equals to one. Usually, every entry of these vectors is set to 1/N.

Let eT is a 1×N row vector where every entry is one and c is a N×1 column vector which its ith row is set to 1 if row i of M is zero row, and 0 otherwise. Then stochastic version of matrix M is: S = M + (1/N)ceT. And the second, primitivity adjustment is done by replacing all zero entries of S with a small positive number, P = αS + (1/N)(1-α)eeT, where α (0.0 < α < 1.0, usually set near to 1.0) is a parameter that controls the amount of error (eeT) introduced to S. Thus, eq. (2) can be written as:

Prr TkTk )()1( =+ (6)

for initial condition r(0)T = (1/N)eT, until error of the process ||r(k+1)T - r(k)T||1 is smaller than desired error. Note that instead of using 1-norm termination criterion, the comparison between previous rank and current rank order can also be used to terminate the iteration process [10], [11], [12], [13], [14], [15].

4. Numerical results

Because matrix P is stochastic and primitive, the

power method applied to it converges to a unique positive vector called stationary vector for any starting vector [8 pp. 36]. So the question left is ‘will it converge to something that makes sense in the context of measuring the degree of importance of nodes in trading network’. We try to answer this question by calculating the similarity between vector of proposed algorithm and standard measure, vector of total export and import.

The data used in the experiments is international trading data from United Nations [3], [4] where the nodes are the countries that involved in the export and import activities, and the links are the flow of the products (weights of the links are the price of the products). The performance of the proposed algorithm is measured by comparing the number of required iterations to achieve the same desired error to the results of HITS and PageRank (note that it is only used for performances comparison, not for results comparison). In the experiments, termination criterion is set to 10-8 and β is set to 0.5. The number of iterations is chosen instead of computational time because the sizes of trading networks are very small, so the power method applied to the data produces negligible computational times.

Then similarity measures, (1) cosine of the angle between vector of proposed algorithm (u) and vector of total export and import (v),

22

cosvuvu ⋅=θ (7)

and (2) Spearman rank order correlation coefficient,

( )( )1

)()(61

21

2

−−=∑

=

NN

vrurN

iii

ρ (8)

are used to measure ranking quality, where r(u) is ranking order of vector u. For example if u = [0.3397 0.1819 0.3328] then r(u) = [1 3 2]. Table 1 gives summary of the results.

123123

Page 5: [IEEE 2009 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS) - Fukuoka, Japan (2009.03.16-2009.03.19)] 2009 International Conference on Complex,

Table 1. The performance of proposed algorithm.

Data Nodes NonzeroNumber of iteration

cos θ ρ HITS PageRank Proposed

algorithm Steel Products 97 2627 26 54 42 0.862 0.874 Ethylene 43 169 7 44 54 0.849 0.916 Propylene 38 144 10 40 143 0.974 0.905 Sodium 49 268 11 53 143 0.808 0.850 Hydrogen Peroxide 47 261 51 61 99 0.752 0.902 Carbon 51 535 22 37 65 0.912 0.929 Radio-active 53 717 25 23 26 0.884 0.927 Plastics 53 1410 20 37 39 0.985 0.968 Medicinal Products 53 1504 9 18 14 0.989 0.965 Average 54 848 20 41 69 0.891 0.915

Table 2. Top ten countries in hydrogen peroxide trading.

Ranked by total export and import (normalized) Ranked by link structure Overall ranking

Country Score Country Score Country Score Netherlands 0.132290 Japan 0.172970 Netherlands 0.123250 Canada 0.095014 Norway 0.123360 Japan 0.110820 United States 0.088694 Netherlands 0.114200 Canada 0.088638 Moldova 0.065088 Canada 0.082261 Norway 0.071148 Austria 0.059850 Turkey 0.053170 United States 0.067877 China 0.054194 United States 0.047059 Moldova 0.051716 Japan 0.048676 Rep. Korea 0.043684 China 0.045555 Italy 0.045744 Moldova 0.038344 Turkey 0.045261 Colombia 0.037772 China 0.036916 Austria 0.043115 Turkey 0.037353 Thailand 0.034545 Italy 0.033318

Table 3. Top ten countries in medicinal products trading.

Ranked by total export and import (normalized) Ranked by link structure Overall ranking

Country Score Country Score Country Score Germany 0.133530 Germany 0.139490 Germany 0.136510 United States 0.114520 United Kingdom 0.107270 United States 0.106520 United Kingdom 0.096001 United States 0.098509 United Kingdom 0.101640 France 0.092408 Switzerland 0.095938 Switzerland 0.089591 Switzerland 0.083244 France 0.085463 France 0.088935 Italy 0.067707 Italy 0.064711 Italy 0.066209 Belg-Luxemb. 0.056696 Belg-Luxemb. 0.051169 Belg-Luxemb. 0.053932 Netherlands 0.051564 Netherlands 0.047270 Netherlands 0.049417 Japan 0.049308 Ireland 0.043663 Japan 0.039471 Sweden 0.033573 Sweden 0.041134 Sweden 0.037353

As shown in table 1 the proposed algorithm takes more iterations to converge, but because trading networks are usually much smaller than WWW network, this will not become a problem. And the similarity measures in both criterions give promising results, 89.1% and 91.5% in average respectively.

Table 2 and 3 show lists of top ten countries in hydrogen peroxide trading (similarity measure 0.752, the least similar to standard measure in cosine criterion) and medicinal products trading (similarity measure 0.989, the most similar to standard measure in cosine criterion). Note that overall ranking is the combination of two other scores.

5. Network clustering

To cluster the most similar nodes using ranking algorithm, instead of utilizing SOM [3] that requires entries of characteristic vectors (rows of matrix C in eq. (16)) being mapped first into previously chosen fixed-value numbers (thus will introduce errors before map is constructed), and is also computationally expensive, a simpler method, called multidimensional scaling [16] that reduces dimension of the characteristic vectors into two-dimensional plane, where the distances between nodes in the this plane reflect the real distances in multidimensional space, is used.

124124

Page 6: [IEEE 2009 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS) - Fukuoka, Japan (2009.03.16-2009.03.19)] 2009 International Conference on Complex,

The first step is to rewrite eq. (3) into HITS-like notation by separating authority vector from hub vector.

( )

⎟⎟⎠

⎞⎜⎜⎝

−++

=−

+

T

TTkTk

NN

eeωeLDKD

ha i

)/1)(1()/1(1

)()1(

αα

(9)

( )

⎟⎟⎠

⎞⎜⎜⎝

−++

=−−

++

T

TTTkTk

NN

eeσeLDDK

ah o

)/1)(1()/1(11

)1()1(

αα

(10)

where a is authority vector, h is hub vector, ω is a N×1 column vector which its ith row is set to 1 if node i has no outlink, 0 otherwise, and σ is a N×1 column vector which its ith row is set to 1 if node i has no inlink (note that the above equations are still maintaining the stochasticity and primitivity property).

By doing normalization for each iteration, the stochasticity and primitivity requirement can be relaxed, and the equations are simplified into:

( )LDKDha i

1)()1( −+ = TkTk (11)

( )TTkTk LDDKah o11)1()1( −−++ = (12)

Because vector a and h will be calculated for each product category, category index is given to the above equations.

( )iiiiTk

iTk

i LDDKha i1)()1( −+ = (13)

( )T

iiiiTk

iTk

i LDDKah o11)1()1( −−++ = (14)

where i = 1, 2, …, I is the category number, Li is N×N adjacency matrix for category i, Di, Doi, and Dii are D, Do, and Di (shown in eq. (4)) for category i, and ai and hi are authority and hub vector for category i.

The next step is to form N×2I characteristic matrix, C = [c1 c2 … cN]T, where row is the node and column is the product category.

[ ]IIII ahahahhaC |||| 112211 −−= (15)

In order to simplify the process of separating import

and export activities, eq. (15) can also be arranged into:

[ ]IIII hhhhaaaaC 121121 |part lexport/sel part import/buy

−−= (16)

Each row of matrix C (cn, n = 1, 2, …, N), defined as characteristic vector of node n, consists of node’s score

as an importer (authority score) and node’s score as an exporter (hub score) for each category i. Thus cn functions as a signature of node’s activities in trading networks.

To visualize the clustering in two-dimensional plane, first real distances between each pair of characteristic vectors are calculated and distance matrix D (entries of D will be used as the target distances) is defined with following equation.

⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢

=

×−

××

×××

00

00

0

1

232

13121

NN

N

N

d

ddddd

D (17)

where Nnmd nmnm ≤≤−=× ,1,

2cc

Then every node is placed randomly in two-

dimensional plane, and the distances between each pair of the nodes are calculated (defined as the current distances). For every pair of nodes, the current distance is compared to the target distance. If the current distance is smaller than the target distance, one of the nodes is moved a small amount further and if the current distance is bigger than the target distance, one of the nodes is moved a small amount closer. This procedure is repeated until error between the target distances and the current distances can not be reduced anymore, or other conditions to stop the iterations have been satisfied.

By using the same dataset, where there are 97 nodes (countries) and 9 product categories, we apply multidimensional scaling to plot the cluster in two-dimensional plane. Figure 4 shows the result. 6. Conclusions

Due to the different nature in links addition process between WWW network and trading network, ranking algorithms that created based on this process for WWW network, like PageRank and HITS, cannot be used to analyze trading network. Thus, a new algorithm that takes the differences into account must be developed. By considering resources exchange process in the network, we propose a new link structure ranking algorithm for trading network. The similarity measures between proposed algorithm results and standard measure, total export and import sorted in decreasing order (normalized), in cosine criterion and Spearman rank order correlation criterion, are promising.

125125

Page 7: [IEEE 2009 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS) - Fukuoka, Japan (2009.03.16-2009.03.19)] 2009 International Conference on Complex,

Figure 4. Clustering result in two-dimensional plane.

ANGOLA SWITZERLAND

GERMANY ITALY NEW ZEALAND

BANGLADESH NIGERIA

IRELAND SOUTH AFRICA

OTHER OCEANIA

ESTONIA

PERU AUSTRALIA

PHILIPPINES

CZECH REP.

UNITED STATES

OTHER NORTH AMERICA

NETHERLANDS

BOSNIA AND HERZEGOVINA CHILE ARGENTINA

CHINA UZBEKISTAN PORTUGAL

SINGAPORE EGYPT

MOLDOVA REP. OF SAUDI ARABIA

OTHER MIDDLE EAST

OTHER AFRICA

OTHER CIS

AZERBAIJAN SLOVENIA

LATVIA OTHER CENTRAL AND SOUTH AMERICA ISRAEL KOREA REP. OF

ROMANIA

AUSTRIA LIBYA

JAPAN MEXICO

PAKISTAN

HONG KONG

COSTA RICA

GREECE

KAZAKHSTAN CYPRUS

CROATIA

INDIA

POLAND

JORDAN ALBANIA MACEDONIA

SPAIN

UNITED KINGDOM

MALAYSIA KENYA

OTHER FAR EAST

LITHUANIA

COLOMBIA IRAN

ALGERIA YUGOSLAVIA

IRAQ

COTE D’IVOIRE

TUNISIA

BELARUS INDONESIA

THAILAND CANADA

ICELAND GEORGIA ECUADOR

DOMINICAN REP.

UKRAINE MOROCCO

GUATEMALA

RUSSIA FED. KOREA DPR

DENMARK

FRANCE

PANAMA

FINLAND VENEZUELA BULGARIA HUNGARY

BELGIUM-LUXEMBORG

SYRIAN ARAB REP.

KUWAIT

SLOVAKIA NORWAY

BRAZIL

UNITED REP. OF TANZANIA

SWEDEN BOLIVIA

TURKEY

126126

Page 8: [IEEE 2009 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS) - Fukuoka, Japan (2009.03.16-2009.03.19)] 2009 International Conference on Complex,

And then by utilizing this algorithm, we describe a clustering method to group the most similar nodes on two-dimensional plane based on their similarities in characteristic vectors using multi-dimensional scaling. Because the distances between nodes in the plane reflect their similarities, this method can be used as a recommendation system to find the most similar nodes to the nodes under consideration. 7. References [1] Y. Kawachi, S. Yoshii, and M. Furukawa, “Labeled Link Analysis for Extracting User Characteristics in E-commerce Activity Network,” in Proc. IEEE/WIC/ACM International Conference on Web Intelligence, Hong Kong, 2006, pp. 73—80. [2] T. Kohonen, Self-organizing Maps, Springer-Verlag, 1995. [3] Annual Bulletin of Statistics of World Trade in Steel 1998. Destination of Export of Steel Products (Total), United Nations, 1999. [4] Annual Bulletin of Trade in Chemical Products 1996, United Nations, 1996. [5] S. Brin and L. Page, “The Anatomy of a Large-scale Hypertextual Web Search Engine,” Computer Networks and ISDN Systems vol. 33, 1998, pp. 107—117. [6] S. Brin, L. Page, R. Motwami, and T. Winograd, “The PageRank Citation Ranking: Bringing Order to the Web,” Technical Report 1999-0120, Computer Science Department, Stanford University, 1999. [7] J. Kleinberg, “Authoritative Sources in A Hyperlink Environment,” Journal of the ACM vol. 46, 1999, pp. 604—632.

[8] A.N. Langville and C.D. Meyer, Google’s PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press, 2006. [9] C.D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, 2000, pp. 534. [10] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar, “Rank Aggregation Methods for the Web,” in Proc. 10th International WWW Conference, New York, ACM Press, 2001. [11] R. Fagin, R. Kumar, K.S. McCurley, J. Novak, D. Sivakumar, J.A. Tomlin, and D.P. Williamson, “Searching the Workplace Web,” in Proc. 12th International WWW Conference, New York, ACM Press, 2003, pp. 366—75. [12] R. Fagin, R. Kumar, and D. Sivakumar, “Comparing Top k Lists,” in Proc. ACM SIAM Symposium on Discrete Algorithms, 1999, pp. 251—62 [13] T.H. Haveliwala, “Efficient Computation of PageRank,” Technical Report 1999-31, Computer Science Department, Stanford University, 1999. [14] T.H. Haveliwala, “Topic Sensitive PageRank,” in Proc. 11th International WWW Conference, New York, ACM Press, 2002. [15] A.O. Mendelzon and D. Rafiei, “An Autonomous Page Ranking Method for Metasearch Engine,” in Proc. 11th International WWW Conference, New York, ACM Press, 2002. [16] T. Segaran, Programming Collective Intelligence: Building Smart Web 2.0 Applications, O’Reilly Media Inc., 2007, pp. 49—52.

127127