General Chairman: Veljko Milutinovic, School of Electrical Engineering, University of Belgrade

Deputy General Chairman: Frédéric Patricelli, Telecom Italia Learning Services (Head of International Education)

Conference organization staff:

Conference Managers: Miodrag Stefanovic, Cesira Verticchio

Conference Staff: Renato Ciampa, Veronica Ferrucci, Maria Rosaria Fiori, Maria Grazia Guidone, Natasa Kukulj, Bratislav Milic, Zaharije Radivojevic, Milan Savic

SSGRR-2002s - Papers

1. .NET All New?

Jürgen Sellentin, Jochen Rütschlin

2. A center for Knowledge Factory Network Services (KoFNet) as a support to e-business

Giuseppe Visaggio, Piernicola Fiore

3. A concept-oriented math teaching and diagnosis system

Wei-Chang Shann, Peng-Chang Chen

4. A contradiction-free proof procedure with visualization for extended logic programs

Susumu Yamasaki, Mariko Sasakura

5. A Framework For Developing Emerging Information Technologies Strategic Plan

Amran Rasli

6. A Generic Approach to the Design of Linear Output Feedback Controllers

Yazdan Bavafa-Toosi, Ali Khaki-Sedigh

7. A Knowledge Management Framework for Integrated design

Niek du Preez, Bernard Katz

8. A Method Component Programming Tool with Object Databases

Masayoshi Aritsugi, Hidehisa Takamizawa, Yusuke Yoshida and Yoshinari Kanamori

9. A Model for Business Process Supporting Web Applications

Niko Kleiner, Joachim Herbst

10. A Natural Language Processor for Querying Cindi

Niculae Stratica, Leila Kosseim, Bipin C. Desai

11. A New Approach to the Construction of Parallel File Systems for Clusters

Felix Garcia, Alejandro Calderón, Jesús Carretero, Javier Fernández, Jose M. Perez

12. A New model of On-Line Learning

Marjan Gusev, Ljupco N. Antovski, Vangel V. Ajanovski

13. A New Paradigm for Network Management: Business Driven Device Management

John Strassner

14. A Prototype of a Retail Internet Banking for Thai Customers

Rawin Raviwongse, Pornpriya Koedrabruen

15. A Reuse-Oriented Approach for the Construction of Hypermedia Applications

Naoufel Kraiem

16. A Scientific Paradigm On Image Processing's Lecture

Sar Sardy

17. A Theory of Programming for e-Science and Software Engineering

Juris Reinfelds

18. A video based laboratory on the Internet, and the experiences obtained with high-school teachers

Fernando Gamboa Rodríguez, J.L. Pérez Silva, F. Lara Rosano, A. Miranda Vitela, F. Cabiedes Contreras

19. Web Engineering: Methods and Tools for Education

George E. Cormack, G. Griffiths, B. D. Hebbron, M. A. Lockyer, B. J. Oates

20. Adding Security to Quality of Service Architectures

Stefan Lindskog, Erland Jonsson

21. Advanced Mobile Multipoint Real-Time Military Conferencing System (AMMCS)

R. Sureswaran, A. Osman, M. S. Mushardin, M. Yusof, B. Husain

22. Advanced Optical Infrastructure for the Emerging Optical Internet Services

Marian Marciniak, Marian Kowalewski, Miroslaw Klinkowski

23. Agent-based Intelligent Clinical Information System

Il Kon Kim, Ji Hyun Yun, Sang Wook Lee, Hang Chan Kim

24. An approach for implementing Object Persistence in C++ using Broker

Kulathu Sarma

25. Digital Learning: Infrastructure and Web Culture

Alexei L. Semenov

26. An Efficient and Adaptive Method for Reservation of Multiple Multicast Trees



Shiniji Inoue, Makoto Amamiya, Kasuga-shi, Yoshiaki Kakuda

27. Distributed Information Systems Building Techniques

Petr Smolík, Tomáš Hruška

28. An eye-gaze input device for people with severe motor disabilities

Laura Farinetti, Fulvio Corno

29. Asia-Pacific and Australian Grid developments, the coming information Grid and the iSpace

Bernard A. Pailthorpe, Nicole S. Bordes

30. Askemos - a distributed settlement

Jörg F. Wittenberger

31. Automatic Determination of Cluster Size Using Machine Learning Algorithm

Jihoon Yang, Sung-Hae Jun, Kyung-Whan Oh

32. B2E Business to Employee and Business to Everything

Prashant Killipara

33. Business Methods: Patent Practice at the European Patent Office

Jörg Machek

34. CAL-Visual, an E-Education Tool for the Management of Digital Resources

Dino Bouchlaghem

35. Casual databases

Ken Roger Riggs

36. CLINICAL ANTHROPOLOGY - A NEW EDUCATIONAL METHOD FOR ETHICS AND HUMANITY USING INTERNET

Shinichi Shoji

37. Communication Behavior and Collaboration in Virtual Seminars - Experiences

Birgit Feldmann, G. Schlageter

38. DEC (Decision Making in Business Environment) Computer Age of business (Admiration or Admission?)

Nanayaa Owusu Prempeh

39. Symbolic Computation in Research

Qi Zheng

40. Computer-Mediated Communication (CMC): A Shift Towards E-Education Systems In Malaysia

Rozhan M Idrus



41. Computer Simulation Of Combustion in Particulate 2-Phase Flow

Aleksandar Saljnikov, Simeon Oka, Elmira Karbozova, Miroslav Sijercic

42. COMPUTING OUR WAY TO THE ULTIMATE TOY

Kristinn R. Thorisson

43. Cooperative Learning: Multi-user Support for Advanced Distance Learning Services

Andrea B•

44. Coordinating Representations for Collaborative Systems

Richard Alterman, Alex Feinman, Seth Landsman, Josh Introne

45. Coordinatized Graphs: Interplay Between Graphical Properties and Adjacency Systems

Andrew Woldar

46. Cryptographic Schemes in Secure e-Course eXchange (eCX) for e-Course Workflow

Lucas C. K. Hui, Joe C.K. Yau

47. Data Mining from a Web Browser

David J. Haglin, Richard J. Roiger

48. Design and implementation of an adaptive learning management system

Giorgio Casadei, Matteo Magnani

49. E-learning, Metacognition and Visual design

David Kirsh

50. Development of Computer-Based Activities for Peer-Led Team Learning in University-Level General Chemistry

John Goodwin

51. Differentiated Multilayer Resilience in IP over Optical Networks

Achim Autenrieth

52. Distributed Medical Intelligence via Broadband Communication Networks

Constantinos Makropoulos

53. Multimedia based Learning and Working: a Cooperation of University with Industry

Peter Deussen, Hartmut Ehrich, Tim Young Weisssch臈el, Christian Zorn

54. Document Ontology: A Statistical Approach

Sadanand Srivastava, James Gil de Lamadrid, Chakravarthi S. Velvadapu



55. Does Attentional Load Affect Discourse Management in On-Line Communication?

Claude G. Cech, Sherri L. Condon

56. E-Book, an e-learning tool for Engineering Undergraduates

Eduardo Gomez-Ramirez

57. E-Business Management and Workflow Technologies

Zeljko Djuricic, Natasa Ilic, Zeljko Djuricic, Veljko Milutinovic

58. Economic Decision-making in a Technological Age

James R. Forcier

59. Complexity and the Emergent Web

Sorin Solomon, Eran Shir

60. E-Diagnosis Using GeneChip Technologies

Zhao Lue-Ping, S. Gilbert, C. Defty

61. e-DOCPROS: An e-Business Document Processing System

Zhenfu Cheng, Xuhong Li

62. Effects of Changing the Pedagogical Concept of a Part-time Bachelor of Science in Accounting from Traditional Lectures into an IT-supported Asynchronous and Flexible Teaching & Learning Concept

Lars Kiertzner, Maya Dole, Tage Rasmussen

63. e-Infrastructure in a complex environment

Julian Smith

64. E-learning at ENSAIT: a case study

Pierre Douillet, S. Pessé, A. M. Jolly

65. E-Learning Content Creation with MPEG-4

Michael Stepping

66. E-Learning of Spanish with Interactive Video and Blackboard Technologies for Elementary School Children

Julia Coll

67. EMERGENCY! Medicine and Modern Education Technology

Dag K.J. E. von Lubitz, Benjamin Carrasco, Francesco Gabbrielli, Frederic Patricelli, Tymoty Pletcher, Caleb Poirier, Simon Richir

68. e-Medicine Utilization: Socio-cultural issues



Robert Doktor, David Bangert

69. Emerging market mechanisms in Business-to-Business E Commerce A framework

B. Mahadevan

70. Enhanced Security Watermarking and Authentication based on Watermark Semantics

Dimitrios Koukopoulos, Y. C. Stamatiou

71. Environment for Teaching Support in the Medical Area

Rosa Maria Vicari, Cecilia Dias Flores, Louise Seixas, André Silvestre

72. Epidemic Communication Mechanisms in Distributed Computing

Oznur Ozkasap

73. Evaluating Java Applets for Teaching on the Internet

Michael R. Healy, Dale E. Berger, Victoria L. Romero, Amanda Saw

74. Evaluating network intrusion detection algorithm performance as attack complexity increases

Dirk Ourston, Bryan Hopkins, Sara Matzner, William Stump

75. Evaluating the Quality of Service for a Satellite Based Content Delivery Network

Helmut Hlavacs, Guido Aschenbrenner, Ewald Hotop, Aadarsh Baijal, Ashish Garg

76. Evaluation and perspectives of innovative Tunisian e-learning experimentation

Mohamed Jemni, Henda Chorfi

77. Evaluation of Minimal Deterministic Routing in Irregular Networks

Tor Skeie, Ingebjørg Theiss, Olav Lysne

78. Evolution and Convergence in Telecommunications

Gennady G. Yanovsky

79. Extending SOAP for handling lightweight transactional information

Mario Jeckle

80. Extending the Personal Response System (PRS) to Further Enhance Student Learning

Joan Wines, Julius Bianchi

81. Federated Profile Information Architecture

Guoping Jia

82. Fusion of Multiple Images with Robust Random Field Models



Kie B. Eom

83. e-Business MUSIC: New Ways to Perform Introspection Within the Corporation

Enrique Espinosa

84. Generating color palettes for compressed video sequences

Yuk-Hee Chan, Wan-Fung Cheung

85. Genetic Algorithms for Internet Search: Examining the Sensitivity of Internet Search by Varying the Relevant Components of Genetic Algorithm

Vesna Šešum, Dragana Cvetkovic

86. GroupIntelligence: Automated Support for Capitalising On Group Knowledge

Jules de Waart, Michiel van Genuchten

87. From Innovators to Laggards: Computer Scientists and E-learning

Roger Boyle, Martyn Clark

88. How to find similar web sites by using only link information

Satoshi Kurihara, Toshio Hirotsu, Toshihiro Takada, Osamu Akashi, Toshiharu Sugawara

89. Hardware RAID - 5 versus Non-RAID solution under UNIX Operating System

Borislav Djordjevic, Stanislav Miskovic, Nemanja Jovanovic, Veljko Milutinovic

90. Identity Management: a Key e-Business Enabler

Marco Casassa Mont, Pete Bramhall, Mickey Gittler, Joe Pato, Owen Rees

91. Impacts of the Global Information Society on the Banking Industry

Ondrej Slapak

92. Implementation of a remote-assistant application via Web over IP networks: CIMA Project

Francisco Sandoval, Francisco Javier González Cañete, Francisco Miguel García Palomo, Eduardo Casilari Pérez

93. Implementation of Feedback TM an Application for Quality Assurance, Learning and e-Communication of Diagnosis of Medical Images

M. Bergquist, H. Gater, O. Flodmark, J. Hedin, S. Hedin, M. Hellström, B. Jacobson, B. Johansson, N. Lundberg, K. Måre, J. Wallberg, P. Wenngren

94. The social infrastructure of E-Education

Peter Lyman



95. Information Publishing on FRIENDS

Alfons H. Salden, Ronald J. van Eijk, Mortaza S. Bargh, Johan de Heer

96. Infrastructure For E-Business, E-Education, E-Science, and E-Medicine; Challenges For Developing economies. The Nigerian Experience

Babatunde O.R. Ogundele

97. Observations and prescriptions for Web Standards

David Bodoff, Mordechai Ben-Mehachem

98. Infrastructure in Education time to learn lessons from elsewhere?

Tony Shaw

99. Infrastructure, requirements and applications for eScience: a European perspective

Ron Perrott

100. Infrastructures for Mobile Services in e-Medicine

Heinz Thielmann

101. SEMPEL: A Software Engineering Milieu for PEer-Learning

V.Lakshmi Narasimhan

102. Integrating Emerging E-Technologies into Traditional Classroom Settings

Jay M. Lightfoot

103. Integrating the Teaching of Psychology with Web-Based Distant Learning: Practicum and Internship

V. Wayne Leaver

104. Interconnecting Networks and the Performance of Multithreaded Multiprocessors

Wlodek Zuberek

105. INTERNET PRIVACY CONCERNS AND TRADE-OFF FACTORS EMPIRICAL STUDY AND BUSINESS IMPLICATIONS

Tamara Dinev, Paul Hart

106. Internet: A Powerful Tool in Disseminating Medical Knowledge in Urban and Rural India

Deena Suresh, Dr. CB Sridhar

107. Introduction of Information Infrastructure for Medical Academic Activities in Japan - UMIN and MINCS-UH

Takahiro Kiuchi

108. Learning and Networking Concepts and Components of the Global Seminar

Dean Sutphin



109. Learning Objects -Pedagogy Based Structuring of Course Materials

Paul Juell, Elizabeth Smith, Lisa Daniels, Vijayakumar Shanmugasundaram

110. Living Book: an Interactive and Personalized Book

Margret Gross-Hardt, Peter Baumgartner, Anna B. Simon

111. Local telematics services for higher education

Joze Rugelj

112. Maestro: A Middleware for Distributed applications based in components software

Jorge Risco Becerra

113. Magenta Multi-Agent Engines For Decision-Making Support

Peter Skobelev, V. Andrejev, S. Batishchev, K. Ivkushkin, I. Minakov, G. Rzevski, A. Safronov

114. Mapping Object Oriented Models into Relational Models: a formal approach

Pedro Ramos, Luís Rio

115. Marketing and Engineering Criteria for the Implementation of a top level Internet Infrastructure

Enrique S. Draier

116. Marrying Sanskrit to Java - an e-tutor for Sanskrit

Sudhir Kaicker, Jayant Shekhar

117. m-commerce: why it does not fly (yet?)

Peter Langendoerfer

118. Measurement Technique for Object Oriented Systems

Sallie Henry, Cary Long

119. Measuring the Effectiveness of Internal Electronic Communication Channels in Achieving Business Goals

Angela Sinickas

120. Medical eLearning, eTraining and Interactive Telemedicine via Satellite in the operating room of the future

G. Graschew, T. A. Roelofs, S. Rakowsky, P. M. Schlag

121. Meta-Learning Functionality in eLearning Systems

Ulrik Schroeder

122. Mobile Commerce: Some Extensions of Core Concepts and Key Issues

Christer Carlsson, Pirkko Walden



123. Models for E-Learning environment evaluation: a proposal

Francesco Colace

124. Multidrop Generic Framing Procedure (GFP-MD)

Kari Seppänen

125. Multi-grid Parallel Algorithm with Virtual Boundary Forecast for Solving 2D Transient Equation

Guo Qingping, Yakup Paker, Dennis Parkinson, Wei Jialin, Zhang Sheng

126. Multilingual Multimedia Electronic Dictionary for Children

Valentin E. Brimkov, Reneta P. Barneva, Peter L. Stanchev

127. Mutual indexing of video and bulletin board for lecture video

Hirohide Haga

128. Navigation Support System for Live e-CRM

Hideto Ikeda, Nikolaos Vogiatzis, Aki Shibuya

129. New media - traditional universities: Success factors and obstacles for e-learning technologies

Georg J. Anker, Yuka Sasaki

130. Object Oriented Communication Design Tool usable for Everyone

Hajime Nonogaki

131. On the Viability on E-learning

Sunil Choenni

132. Optimal Link Allocation and Charging Model

Jyrki Joutsensalo, Timo Hamalainen

133. Oral Metaphor Construct. New direction in cognitive linguistics

Asa Stepak

134. Our Progress Into E-Business Education: How we have Incorporated Higher-Order Thinking Skills into our Web-based class

Rexford H. Draman, Robin Eanes

135. Overview of the role of ATM/AAL2 Aggregator in UMTS Access Network

Aleksandar D. Petrovic

136. Parallel Solutions of Coupled Problems

Felicja Okulicka Dluzewska



137. Practical Traffic Grooming Formulation for SONET/WDM Rings

Paul Ghobril

138. Privacy Issues Arising from a Smart_ID Application in eHealth

John Fulcher

139. The Brave New World of the Cyber Speech and Hearing Clinic: Treatment Possibilities

William R. Culbertson, Dennis C. Tanner

140. GIS and DGPS via Web: the GIS on line of the Everest National Park

Giorgio Vassena, Roberto Cantoni, Carlo Lanzi, Giuseppe Stefini

141. Holarchies on The Internet: Enabling Global Collaboration

Mihaela Ulieru

142. Purdue Center for Technology Roadmapping: A Resource for Research and Education in Technology Roadmapping

Edward J. Coyle

143. QUALITY MANAGEMENT SYSTEM BASED ON THE NATIONAL TRAUMA REGISTRY

Drago Brilej, Radko Komadina

144. Rationales for Consumer Adoption or Rejection of E-Commerce: Exploring the Impact of Product Characteristics

Bill Anckar

145. Some remarks on time modelling in interactive computing systems

Merik Meriste, Leo Motus

146. Schema Validation Applied to Native XML Databases

Gongzhu Hu, Qinglan Li

147. Search and Discover on the Web

Bipin C. Desai

148. Server Load Balancing in the Next Generation Internet

Jamalul-lail Abdul Manan, Habibah Hashim

149. Service Oriented Community Systems for Mobile Commerce

Kinji Mori

150. SIP and the Internet

Gianni Scandroglio



151. Socrates Meets The Web: Incorporating the Internet Into U.S. Law Classes

Anna Williams Shavers

152. Software Issues for Applying Conversation Theory For Effective Collaboration Via the Internet

William Klemm

153. Solving scaling problems with the modern GUI

Peter M. Bagnall

154. Some like it soft

Ole Lauridsen

155. A Monitoring System for Manufacturing Machines Based on SNMP

Sangyong Lee, Joongsoon Jang, Gihyun Jung, Kyunghee Choi

156. Enhancing IDS performance through dropping hacking-free packets

Jongwook Moon, Jongsu Kim, Gihyun Jung, Kangbin Yim, Kyunghee Choi, Haiyoung Yoo

157. Analysis on Utilization and Delay of Memory in a Lossless Packet Processing System

Jongsu Kim, Jongwook Moon, Gihyun Jung, Kangbin Yim, Kyunghee Choi, Joongsoon Jang

158. Protecting Mail Server using the CBT algorithm

Hyun-Suk Lee, Soo-Juong Lee, Hui-Sug Jung, Gihyun Jung, Kyunghee Choi

159. Storage Technologies for an Efficient e-Infrastructure

Satish Rege

160. Structured Metadata Analysis

Steve Probets

161. Superscalar in City-1: An Educational Guide to the next step beyond Pipelining

Ryuichi Takahashi, Noriyoshi Yoshida

162. Supervision of Electrical Utility Works Based on Internet

Felipe Alaniz, Pablo R. de Buen

163. Teaching Novices Programming Skills Efficiently: What, When and How?

Yuh-Huei Shyu

164. Teaching, Technology and Teamwork

Elaine Carbone, Shaun Stemmler, Jon Beal



165. Software solutions for Science e-Education: A case study from the VISIT Project

Yichun Xie

166. Technologies for Student-Generated Work in a Peer-Led, Peer-Review Instructional Environment

Brian P. Coppola, Ian C. Stewart

167. TEN WAYS TO IMPACT THE WEB WITHOUT A WEB MEISTER

Ken McNaughton

168. The Architecture of Knowledge: Representation and Theorization of Violence on the Internet

Lily Alexander

169. The emergence of web-mediated genres: the home page

Anne Ellerup Nielsen

170. The Emerging Autosophy Internet

Klaus Holtz, Eric Holtz

171. The Future of Education

Lalita Rajasingham

172. The impact of internet technologies on the financial markets

Ross A. Lumley

173. The Mathematical Structure model of a Word-unit-based Program

Hamid Fujita, Osamu Arai

174. The Role of XML in E-Business

Betty Harvey

175. Think before you click: customers' challenges in the e-commerce

Zita Zoltay Paprika

176. Topological Design of Multiple VPNs over MPLS Network

Anotai Srikitja, David Tipper

177. Toward logical-probabilistic modeling of complex systems

Taisuke Sato, Yoichi Motomura

178. Towards to e-transport

Miroslav Svítek, Mirko Novák

179. Emergence and Evolution of Microturbine Generators [MTGs] to Provide Infrastructure for E-Related Applications



Stephanie L. Hamilton

180. Using Building Blocks to Implement a Business-to-Supplier Portal

Shannon Fowler

181. Using CORBA Interceptors to Implement a Security Wrapper

Luigi Romano, D. Cotroneo, A. Mazzeo, S. Russo

182. Using Internet and Database Technology to Enable Collaboration between Researchers and Teachers Developing Educational Websites Featuring Endangered Species Research and Conservation

Mary A. Overby, Mark MacAllister, Jeffrey Hoffman, Chris Bulla

183. Using the Quick Look Methodology to Plan and Implement Complex Information Technology Transformations

Richard C. Staats

184. Verifying and Leveraging Software Frameworks

Trent Larson

185. Virtual Communities for Service Delivery: Transferring the Notion of Pro-Social Behavior from "Place" to "Space"

Ko de Ruyter, Caroline Wiertz, Sandra Streukens

186. Visualizing Molecules Helps Students 'See' Chemistry in a New Light

Harry Ungar, Albion Baucom

187. Wavelet-based Blind Watermark Embedding Technique

Sanghyun Joo, Yongseok Seo, Youngho Suh

188. Web-based Tools for Supporting Health Education

William B. Hansen

189. What can you do with a frozen leg of lamb? - Connecting products and information services in a web-based environment

Benkt Wangler, Ingi Jonasson, Eva Söderström

190. What is Virtual about the Web?

Murat Karamuftuoglu

191. Wireless Control of the Virtual Kiosk

Charles A. Milligan, Steven H. McCown

192. World-wide interaction with 3D-data

Gerd Kaupp, Svetlana Stepanenko, Andreas Herrmann



193. XML Technologies, Value-based Marketing, Franchising and the New Paradigm in Business

Dino Karabeg

194. Electronic Public Transmission Act of 2002 to cope with the Convergence and as the Minimum Regulations on the Internet

Koichiro Hayashi

195. Domain-Specific Language Agents

Merik Meriste, Jüri Helekivi, Tõnis Kelder, Leo Motus

196. Development of Distributed Package of Finite Element Method

F. Okulicka-Dluzewska, J.M. Dluzewski

197. Feasibility Study and Strategic Business Analysis of Fuel Ethanol Production in Indiana (May 2002)

Dusan V. Milutinovic

198. Virtual Marketplace on the Internet (May 2002)

Zaharije R. Radivojevic, Živoslav Adamovic, Veljko M. Milutinovic

199. Dissemination Of World Health Organisation Reproductive Health Library (WHO-RHL) Information To Doctors In India: RECON HEALTHCARE, BANGALORE MODEL

CB Sridhar, Deena Suresh

200. Technological High School Education through Internet

L. Pérez Silva, F. Cabiedes Contreras, F. Gamboa Rodríguez, F. Lara Rosano, A. Viniegra Hernández

201. Agent-based brokerage of personalised B2B mobile services

Alfons H. Salden, Ronald J. van Eijk, Mortaza S. Bargh, Johan de Heer

202. Denial of Service Attacks: methods, tools, defenses

Fred Darnell, Bratislav Milic, Milan Savic, Veljko Milutinovic

203. Scalability and Knowledge Reusability in Ontology Modeling

Mustafa Jarrar, Robert Meersman

204. (Web) Self Service

Tanja MILOŠEVIC


How to find similar web sites by using only link information

Satoshi Kurihara, Toshio Hirotsu, Toshihiro Takada, Osamu Akashi, and Toshiharu Sugawara

NTT Network Innovation Labs., 3-9-11 Midori-cho, Musashino-shi, Tokyo 180-8585, JAPAN

[email protected]

Abstract—We are studying techniques that allow even ordinary end users to make efficient use of the Internet. We previously proposed an algorithm for determining the degree of similarity between web sites by using link information to find web sites that are mirrors of each other and ones that are not mirrors but have similar content and can be used as substitutes for each other. As a result of verifying the basic effectiveness of that algorithm, we found that when trying to find similar web sites to site-A, in addition to ones found to have almost 100% similarity to site-A, there were also ones that were thoroughly adequate for use as substitutes for site-A, even though they had a low degree of similarity of 50% or less. Therefore, for practical use of that algorithm, it is essential to be able to automatically judge whether web sites that can be inferred to have some kind of similarity are actually mirror sites or similar sites that can be used as substitutes. To solve this problem, in this paper, we propose and evaluate the basic effectiveness of an automatic judgment methodology, and we focus on its operation and propose a methodology for effectively finding candidates for a similar site by using a user’s Internet access history.

Index Terms—Internet, Mirror site, Access history, Link information

I. INTRODUCTION

Due to the rapid expansion of the Internet, it has become possible for ordinary end users to obtain many kinds of information easily. However, it is still difficult for them to use the network effectively. For example, although mirror servers and cache servers have been provided in order to improve scalability and response times, it is difficult for users to identify the optimal server.

To solve this problem we have already proposed a “URL Resolver” framework, which allows users to select the optimal server from multiple servers that provide various kinds of services via data storage facilities such as caches or mirror servers [1]. To enable users to select one of the servers, it is first necessary to gather information such as a list of servers that might be useful to the user. Initially, we focused on information related to mirror sites or similar sites, and have already proposed a basic algorithm for detecting similar web sites by focusing on the link information embedded in web pages [2]. As a result of verifying the basic effectiveness of that algorithm, we then found that there are some sites that are thoroughly adequate for use as substitutes yet have a degree of similarity of no more than 50%. But to use this detection method in practice, it is necessary to employ a mechanism for automatically judging whether sites for which a low degree of similarity has been detected are mirrors or similar sites that can actually be used instead of mirrors. So, in this paper, we propose an automatic determination algorithm, in which web pages are divided into hub-type and content-type and the judgment is done based on the results of judgment algorithms specifically tailored to each type of web page. Initial trials of this approach have yielded favorable detection results. We also examine the operation of this detection methodology and propose an algorithm for effectively finding candidates for similar web sites by using the user’s access history to the Internet.

Section 2 reviews our previously proposed algorithm for detecting similar web sites based on link information and describes our new automatic similar web site detection method. Section 3 discusses a similar-web-site candidate finding method.

II. USING LINK INFORMATION TO FIND SIMILAR WEB SITES

A. HOW TO FIND A SIMILAR WEB SITE?

We define a mirror site as follows:

If the link structure of site-A is very similar to that of site-B, then sites-A and -B are mirrors of each other.

This definition is based on the observation that mirror sites or sites that hold information that is so similar that they closely resemble mirror sites should more or less match in terms of the number and types of links embedded within them, even if there are slight differences such as inserted advertising banners (this definition is based on [3]).

The detection method we proposed in reference [2] is as follows. Assume a starting site-A and a mirror candidate site-B. The degree of similarity between these two sites is defined as follows. The total number of embedded inward links that can be gathered when tracing the links of web pages to a depth of N levels from the top web page of site-A is referred to as url(Ain)N, and the total number of embedded outward links is referred to as url(Aout)N. Here, an inward link is one in which the host part of the link’s destination URL is the same as that of the current host, and an outward link is one in which the host part of the link destination is different. The corresponding properties of site-B are similarly expressed as url(Bin)N and url(Bout)N. Then, the total number of inward links in url(Bin)N that are also included in url(Ain)N is expressed as url(Ain ∩ Bin)N, while the corresponding property of the outward links is expressed as url(Aout ∩ Bout)N. When determining the value of url(Ain ∩ Bin)N, the comparisons are made after replacing the host parts of site-A and -B with the same arbitrary text string. The degree of similarity between site-A and -B when links are followed to a depth of N levels is denoted by the symbol α, which is given by

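$$\alpha = \frac{\mathrm{url}(A_{\mathrm{in}} \cap B_{\mathrm{in}})_N + \mathrm{url}(A_{\mathrm{out}} \cap B_{\mathrm{out}})_N}{\mathrm{url}(A_{\mathrm{in}})_N + \mathrm{url}(A_{\mathrm{out}})_N} \times 100\ (\%).$$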
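The calculation of α can be illustrated with a minimal Python sketch. It assumes that the links of each site have already been gathered to depth N as lists of absolute URLs, and that link counts are counts of distinct URLs; the function names, the "HOST" placeholder, and the choice to keep the query string are illustrative assumptions, not details taken from the paper.

```python
from urllib.parse import urlsplit

# Arbitrary placeholder that replaces the host part of inward links
# (the paper only requires "the same arbitrary text string").
PLACEHOLDER_HOST = "HOST"


def split_links(site_host, links):
    """Split absolute link URLs into (inward, outward) sets for one site."""
    inward, outward = set(), set()
    for url in links:
        parts = urlsplit(url)
        if parts.hostname == site_host:
            # Inward link: same host as the site. Normalize the host so
            # that inward links of two different sites become comparable.
            key = PLACEHOLDER_HOST + (parts.path or "/")
            if parts.query:
                key += "?" + parts.query
            inward.add(key)
        else:
            # Outward link: different host, compared as-is.
            outward.add(url)
    return inward, outward


def similarity_alpha(a_host, a_links, b_host, b_links):
    """Degree of similarity (percent) of site-B to site-A."""
    a_in, a_out = split_links(a_host, a_links)
    b_in, b_out = split_links(b_host, b_links)
    total = len(a_in) + len(a_out)
    if total == 0:
        return 0.0  # assumption: a site with no links matches nothing
    shared = len(a_in & b_in) + len(a_out & b_out)
    return 100.0 * shared / total
```

Note that the denominator uses only the links of site-A, so the measure is not symmetric: swapping the roles of site-A and site-B can give a different value.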

Since this procedure only compares the link structures, it does not perform a text-level comparison of every character in every word on the web pages. The reasons for adopting this approach are as follows:

(1) As mentioned above, we think it is possible to judge the similarity of hypertext documents such as web pages by comparing only their link structures. The practicality of focusing on the link structure of web pages is also highlighted in other studies [4], [5].

(2) In this procedure, a lot of processing time is taken up by gathering web pages. However, the amount of text to be compared also increases substantially when links are followed to a depth of several levels, and much more time is required for comparing text than for comparing just the links. Besides, the gathering work can be sped up by increasing the network bandwidth, so we decided not to compare the processing times required for a text-level comparison.

(3) We are planning to use this similar-web-site finding method even in environments with limited processing resources, such as users’ notebook PCs. Performing a text-level comparison would require all the text information obtained when calculating the degree of similarity to be stored, which would be a waste of resources. The link information takes up considerably less space than the text information, which is another reason why we decided not to perform a text-level comparison.

Next, we discuss the way to find candidates for mirror sites. Using a web robot to recursively access suitable sites indiscriminately and compare them with site-A to find a candidate for site-B would be far too inefficient, so instead we adopted the following strategy:

Since web sites that employ mirror servers do so with the aim of dispersing the load, they will probably want to provide users accessing the site with information about these mirrors. In other words, it is reasonable to assume that the site will make some mention of where its mirrors can be found. Accordingly, it is highly likely that site-B can be found by gathering and analyzing the content accessible within a certain number of link levels from the top page of site-A. It is also highly likely that the web site will contain links to sites of a similar nature, so there should be a high likelihood of being able to find similar sites by checking the link structure.

For the actual trials, we extracted 1000 URL entries from the access log stored in a proxy server used by our organization of about 200 people, and applied our similar-site detection program to each one (see α in Fig. 1). Similar site candidates were detected for 65% of these. It is interesting to note that many sites can be used as substitutes, even though their degrees of similarity were less than 50% (see Table 1). So, if a way can be found to automatically judge whether or not they are actually capable of being used as substitutes, then it should be possible to present a greater number of sites to the users in addition to the sites having a high degree of similarity for which judgment is unnecessary.

Fig. 1: Detection results
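The candidate-finding strategy described in this section (gather the pages within a certain number of link levels of site-A's top page and look at where they point) might be sketched as follows. This is only an illustration under stated assumptions: `get_links` is a hypothetical caller-supplied function that returns the absolute URLs linked from a page, the top page is counted as level 0, and every host reached by an outward link is treated as a candidate for site-B.

```python
from collections import deque
from urllib.parse import urlsplit


def find_candidate_hosts(top_url, get_links, max_depth):
    """Collect hosts linked from pages within max_depth levels of top_url.

    get_links(url) is a hypothetical helper returning the absolute URLs
    embedded in the page at url (for example, backed by a crawler cache).
    """
    site_host = urlsplit(top_url).hostname
    seen = {top_url}
    queue = deque([(top_url, 0)])  # breadth-first trace from the top page
    candidates = set()
    while queue:
        url, depth = queue.popleft()
        for link in get_links(url):
            host = urlsplit(link).hostname
            if host is None:
                continue  # skip links without a host part
            if host != site_host:
                candidates.add(host)  # outward link: candidate site
            elif depth + 1 <= max_depth and link not in seen:
                seen.add(link)  # inward link: keep tracing within site-A
                queue.append((link, depth + 1))
    return candidates
```

Each candidate host returned this way would then be compared with site-A using the degree of similarity α, rather than crawling arbitrary sites.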

Degree of similarity    Strength of relationship between two sites
0%–10%                  Probably unrelated
10%–60%                 May include some sites of a similar nature
60%–90%                 Either a mirror site or a site that is highly similar
90%–100%                Almost certainly a mirror site

Table 1: Results of classifying detected sites
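The bands of Table 1 can be expressed as a small lookup, sketched below. The table does not say which band a boundary value such as exactly 10% belongs to, so assigning boundary values to the higher band is an assumption made here for illustration; the function name is likewise hypothetical.

```python
def classify_similarity(alpha):
    """Map a degree of similarity (percent) to its Table 1 description.

    Assumption: a value on a band boundary (10, 60, 90) is placed in the
    higher band, since the table itself leaves this open.
    """
    if not 0.0 <= alpha <= 100.0:
        raise ValueError("alpha must be between 0 and 100")
    if alpha < 10.0:
        return "Probably unrelated"
    if alpha < 60.0:
        return "May include some sites of a similar nature"
    if alpha < 90.0:
        return "Either a mirror site or a site that is highly similar"
    return "Almost certainly a mirror site"
```

Sites that fall in the second band are the ones for which the additional judgment described in the next subsection is needed.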

B. SITES CAPABLE OF BEING USED AS SUBSTITUTES

Thus, we propose the following detection method. First, we divide web pages into the following two broad categories according to the style of user access.

• Web pages that are accessed as a starting point for net surfing are called hub-type sites.

• Web pages that are accessed in order to view the content on the page itself are called content-type sites.

Then, by considering the conditions of sites that can be considered as substitutes for hub- and content-type sites, respectively, we propose the following degree-of-similarity calculation methods.

1) METHODOLOGY FOR JUDGING A HUB-TYPE SITE

For example, consider hub-type sites-A and -B. If site-A has many embedded external links that are the same as those in site-B, then it is highly likely that the user will be able to use both sites equally.

That is, site-A and the possible substitute candidate site-B are deemed to have a greater degree of similarity with respect to their outward links if they satisfy the condition

α < β , (1)

where α is as defined in Section 2.1 and

$$\beta = \frac{\mathrm{url}(A_{out} \cap B_{out})_N}{\mathrm{url}(A_{out})_N} \times 100\ (\%)$$

is the degree of similarity related to outward links only, from which it can be inferred that site-B is highly likely to be suitable for use as a substitute hub-type site.

2) METHODOLOGY FOR JUDGING A CONTENT-TYPE SITE

On the other hand, in a content-type site we think that there may be some differences in the page structure, such as the way links are embedded, even if the site is capable of being used as a substitute. To deal with this, in addition to the links themselves, we also use the label strings of links as important elements expressing the attributes of the links, and add them to the calculation of the degree of similarity. A label is the text enclosed within a link; e.g., the text string “XXXXXX” in the link “<A href="url">XXXXXX</A>”, and the text string “YYYYY” in the link “<A href="url"><IMG src="url" alt="YYYYY"></A>”.

We decided to rate these links by scoring them according to the length of the text strings “XXXXXX” and “YYYYY” embedded in their labels when a match is found between a pair of labels. Note that the calculation is performed using only the text string “XXXXXX” for links where the text strings “XXXXXX” and “YYYYY” match.

In content-type sites like a news site, the headline of the article is usually used as a link label, and the string length of the headline is usually longer than that of a link label whose reference address is another Web site.

An example of a link in which the alt option is set is as follows: <A href="http://www.apple.com/store/"><IMG src="http://a772.g.ak···/2.gif" width="84" height="42" alt="The Apple Store." border="0"></A>

In the algorithm in [2], the degree of similarity was calculated using only the text string of the URL part “http://www.apple.com/store”, but here in addition to this, the link label “The Apple Store.” is also included in the calculation. The labels are scored according to the following rules:

(1) When the URL parts and label parts both match, the link is awarded a score corresponding to the number of characters in the label.

(2) When the URL parts match but the label parts are different, the link is awarded a score of 70% of the number of characters in the starting link label.

(3) When the label parts match but the URL parts are different, the link is awarded a score of 50% of the number of characters in the starting link label.

(4) When both parts are different, the link is awarded no score.

These scoring settings are based on experience, and further study is required to investigate their validity. Also, for rule (3), since we focus on the similarity of the link structure, it could conceivably be wrong to consider cases where the URL parts are different. However, in the case of content-type sites, since we concentrate on the label parts, we decided to include cases where the URL parts are different in the detection by reducing the score awarded. In the above example of “The Apple Store.”, if a link is detected whose URL parts and label parts are identical when matching the link with the mirror candidate site, then this link is awarded 16 points (the number of characters in “The Apple Store.”). In calculating the number of characters in a label, all single-byte and double-byte characters (including English and Japanese characters, spaces, and so on) are each counted as one character.

The value of urllabel(Ain)N for the starting site is given by the sum of the number of characters in the labels added to each link in url(Ain)N, and is awarded the maximum possible score when matching is performed with an identical mirror site. The value of urllabel(Bin)N for a mirror candidate site-B is defined in the same way. Furthermore, the value of urllabel(Ain ∩ Bin)N is given by the sum of the scores awarded for matching combinations of the abovementioned URL parts and label parts in each respective label. If the degree of similarity between site-A and -B in terms of inward links is given by

$$\gamma = \frac{\mathrm{urllabel}(A_{in} \cap B_{in})_N}{\mathrm{urllabel}(A_{in})_N} \times 100\ (\%),$$

then if

α < γ , (2)

it is judged to be likely that the mirror candidate can be used as a substitute for a content-type site.
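The scoring rules (1) to (4) and the ratio γ can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Link` type, the function names, and the choice to give each starting-site link its best score over all candidate-site links are assumptions made for illustration, since the text does not say how links are paired. The URLs on example.com are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    url: str    # URL part of the link
    label: str  # anchor text, or the alt text of an embedded image


def pair_score(start: Link, cand: Link) -> float:
    """Score one starting-site link against one candidate-site link (rules 1-4)."""
    n = len(start.label)  # len() counts each character once, single- or double-byte
    url_match = start.url == cand.url
    label_match = start.label == cand.label
    if url_match and label_match:
        return float(n)   # rule (1): full label length
    if url_match:
        return 0.7 * n    # rule (2): 70% of the starting label length
    if label_match:
        return 0.5 * n    # rule (3): 50% of the starting label length
    return 0.0            # rule (4): no score


def gamma(a_links: List[Link], b_links: List[Link]) -> float:
    """Degree of similarity (%) of candidate site-B relative to starting site-A."""
    denom = sum(len(a.label) for a in a_links)  # urllabel(Ain): maximum possible score
    if denom == 0:
        return 0.0
    # Assumption: each starting link takes its best score over all candidate links.
    matched = sum(max((pair_score(a, b) for b in b_links), default=0.0)
                  for a in a_links)
    return 100.0 * matched / denom


def is_content_substitute(alpha: float, gamma_value: float) -> bool:
    """Condition (2): the candidate is judged a likely substitute when alpha < gamma."""
    return alpha < gamma_value


site_a = [Link("http://www.apple.com/store/", "The Apple Store."),
          Link("http://a.example.com/news", "Headline")]
site_b = [Link("http://www.apple.com/store/", "The Apple Store."),
          Link("http://b.example.com/news", "Headline")]
print(gamma(site_a, site_b))  # (16 + 0.5 * 8) / (16 + 8) * 100
```

In this sketch the first link earns the full 16 points under rule (1), and the second earns half of its 8 characters under rule (3).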

When the number of links used as the denominator of α or β or γ, when judging Equations (1) and (2), is small (in the current version, less than 10), the degree of similarity is recalculated by following links to a greater depth. Unfortunately, this procedure is unable to detect sites that have a mirror relationship but have differences between both the link structure and the labels added to the links. However, it is doubtful whether many sites of this sort actually exist.
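The recalculation step can be sketched for the outward-link measure β as follows. This is a hedged illustration only: the callables that return the set of outward URLs reachable within a given depth, the `max_depth` cut-off, and the reading of url(·)N as a count of distinct URLs are all assumptions; only the threshold of 10 links comes from the text.

```python
from typing import Callable, Set, Tuple

MIN_LINKS = 10  # threshold quoted in the text ("less than 10")


def beta(a_out: Set[str], b_out: Set[str]) -> float:
    """Share (%) of site-A's outward URLs that also appear in site-B."""
    if not a_out:
        return 0.0
    return 100.0 * len(a_out & b_out) / len(a_out)


def beta_with_deepening(out_links_a: Callable[[int], Set[str]],
                        out_links_b: Callable[[int], Set[str]],
                        start_depth: int = 1,
                        max_depth: int = 3) -> Tuple[float, int]:
    """Recompute beta at greater link depth while the denominator is too small.

    out_links_a / out_links_b are hypothetical callables that return the set of
    outward URLs gathered within `depth` link levels of each site's top page.
    Returns the similarity and the depth at which it was computed.
    """
    depth = start_depth
    while True:
        a_out = out_links_a(depth)
        if len(a_out) >= MIN_LINKS or depth >= max_depth:
            return beta(a_out, out_links_b(depth)), depth
        depth += 1
```

The same loop would apply to α or γ by swapping in the corresponding similarity function and denominator.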

At present, the following simple techniques are used to select a URL from the list of URLs that are possibly mirrors: (1) select the site with the highest likelihood, and (2) if step (1) yields multiple candidates, select the candidate with the highest transfer rate at the time of retrieval. In the future, we also plan to make use of the transfer rate at the time of user access, feedback data from users, and so on. Here we note that, in terms of user access, the process of indicating the most suitable URL must work in real time, unlike the mirror search process. In this connection, we are investigating a technique that can flexibly choose the best strategy for selecting a URL at any time by means of an algorithm that executes multiple strategies in parallel [9]. For example, when a user is the first to access a certain URL, there is no time available for measuring the transfer rate, and the most suitable URL is selected on the basis of information from the mirror information managing agent. On the other hand, there may be a small amount of time available up until the user clicks an anchor within that URL, and if such is the case, it might be possible to select the most suitable URL according to new transfer rate information. If this can be accomplished, access will be forcibly changed to the most suitable URL when the user clicks the anchor.

C. INITIAL TRIALS

As mentioned in Section 2.1, Fig. 1 shows the results of calculating β and γ for 1000 URLs. About 18% of the URLs were classified as hub-type sites with a degree of similarity of 30% or more, and about 6% of them were classified as content-type sites. Both these include URLs that were detected as hub-type and content-type pairs respectively. We then manually checked 50 sites for which either β or γ was 30% or more from among the sites classified as hub- or content-type sites, and found that all of them were indeed suitable for use as substitutes for these hub- and content-type sites. In future we plan to perform verifications with a greater number of access logs and to investigate and examine the reliability of the degree of similarity calculations.

III. USING USER’S ACCESS HISTORY

With the procedure in Section 2.1, we were able to detect similar sites by using an access log as a starting point. However, it will be more effective if it is possible to detect similar sites that are significant for each individual end user. Of course, facilities such as proxy servers contain access logs that reflect the character of the community that uses them, and can themselves be thought of as candidates for similar web sites that may match the users’ preferences. However, in the current version, we can only find mirror or similar sites that are limited to the range of sites traced from the starting host. That is, we do not evaluate the degree of similarity between different hosts in the access log. This is because it would lead to a combinatorial explosion and we judge it to be inefficient. However, it is clear that it is highly effective to detect similar sites including hosts that cannot be reached from the detection origin host, not just the hosts recorded in the access log. Therefore, we propose a method for retrieving mirror or similar sites by using the users’ access history to filter similar site candidates.


Figure 2 shows excerpts from web pages related to the same content (new digital camera products) in four web sites. On finding an article about a new product on one web site, many people (certainly the authors do) often habitually browse through other related web sites and look for articles related to the same content. This is because the content of the articles changes slightly from one site to the next. In the example shown in Fig. 2, an article related to resolution and the number of pictures that can be taken was only mentioned at www.nikkeibp.co.jp, while an article relating to the manufacturer’s business strategy was only mentioned at www.watch.impress.co.jp, and detailed specifications were only mentioned at www.zdnet.co.jp.

Here, when a user browses through some content zzz at a certain site-A, if sites-B, -C, and -D which are highly likely to contain similar articles related to content zzz—i.e., sites that have a high degree of similarity to site-A—have been detected, then the user will be more likely to obtain a greater amount of information if these sites are recommended to him/her. Next, we discuss how to efficiently extract sites-A, -B, -C, and -D.

Fig. 2: Examples of the same content in different web sites. [Panel labels as extracted: www.zdnet.co.j, www.watch.impress.co.jp/p, www.nikkeibp.co.j, www.watch.impress.co.jp/]

1. First, we acquire the user’s access history as follows: every time the user clicks on a link, we extract site-R in which this link is embedded, site-T which is the destination site of the link, and label-L which is the text string of the link’s label. Label-L is then subjected to morphological analysis (using widely used general-purpose morphological analysis software, like [7]) to extract several noun parts n1···N ¹, and {R, T, n1···N} triplets are recorded. In the example shown in Fig. 2, the following lists are recorded:

{biztech.nikkeibp.co.jp, www.kodak.co.jp, “Kodak”}
{www.watch.impress.co.jp/pc, www.kodak.co.jp, “Kodak”}
{www.watch.impress.co.jp/av, www.kodak.co.jp, “Kodak”}
{www.zdnet.co.jp/news, www.kodak.co.jp, “Kodak”}

¹ In the morphological analysis, each noun is classified as a proper noun, a general noun, or an unknown word; we use proper nouns and unknown words to express the character of a link.

2. When there are pages carrying the same content in different sites, it is highly likely that the destinations of the outward links embedded within these pages will be the same, so there is a higher possibility that access histories such as {Ra, T, N} and {Rb, T, N} where only site-R is different can be extracted to evaluate similarity. In Fig. 2, a link to the manufacturer’s site “Kodak” is embedded in all the sites.

3. Then, by extracting from the access history the sites-R1...n for which the site-T and N terms are the same and only the R terms are different, we can obtain a list of sites where the same content appears, and the degree of similarity between these sites is calculated. Of course, a different site list could be accumulated from all the sites where only the noun parts N are the same, but in practice the content will have a lower likelihood of being related.
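Steps 1 to 3 can be sketched in Python as a grouping of recorded triplets by their (T, N) part. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and each triplet carries a single noun string rather than the several noun parts that morphological analysis may yield.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triplet = Tuple[str, str, str]  # (R: referring site, T: destination site, N: noun part)


def sites_sharing_content(history: Iterable[Triplet]) -> Dict[Tuple[str, str], List[str]]:
    """Group referring sites R by identical (T, N); keep groups with 2+ distinct R."""
    groups = defaultdict(set)
    for r, t, n in history:
        groups[(t, n)].add(r)
    return {key: sorted(rs) for key, rs in groups.items() if len(rs) > 1}


# The access history recorded for the example in Fig. 2.
history = [
    ("biztech.nikkeibp.co.jp", "www.kodak.co.jp", "Kodak"),
    ("www.watch.impress.co.jp/pc", "www.kodak.co.jp", "Kodak"),
    ("www.watch.impress.co.jp/av", "www.kodak.co.jp", "Kodak"),
    ("www.zdnet.co.jp/news", "www.kodak.co.jp", "Kodak"),
]
candidates = sites_sharing_content(history)
print(candidates[("www.kodak.co.jp", "Kodak")])
```

Each resulting list is a set of sites-R1...n between which the degree of similarity would then be calculated.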

This procedure is very general-purpose because it can learn {R, T, N} triplets as soon as the user first accesses sites, even if they have not yet been registered in the access log.

As for a way to recommend detected sites to the users, several methodologies are considered as follows: When the user has accessed any one of these sites, he/she is recommended to browse other sites with priority given to those having a higher degree of similarity. Moreover, when the user has accessed site-T which has already been recorded in {R1...n, T, N}, he/she is recommended to browse other sites with priority given to pages in the individual Rm of R1...n whose content has been updated recently. Another effective measure is to pre-fetch the contents of similar sites related to sites accessed by the user and to display a compiled list of these sites.

If there is a similarity relationship between site-A and site-B, but site-B is a competitor of site-A, it may be difficult to find the similar web site-B from site-A by using the strategy proposed in Section 2.1. However, by using the user’s access history, it may be possible to find a similarity relationship between site-A and site-B.

To verify the basic efficiency of this methodology, we investigated how many {R, T, N}s having the same noun part N and the same destination-site URL T could be found from two similar sites. Figure 3 shows the results of this investigation. First, we extracted the link information from www.watch.impress.co.jp and www.zdnet.co.jp, which are hub-type sites concerning new products in the computer or office automation fields, and from www.asahi.com and www.yomiuri.co.jp, which are web sites of newspapers. Specifically, we extracted the noun parts and destination sites of the outward links to a depth of 3 levels (N=3). Second, we searched this link information for the following types of {R, T, N}s:

Case-I: {R, T, N} in which noun part N and destination site T are both the same.

Case-II: {R, T, N} in which only noun part N is the same.

Case-III: {R, T, N} in which only destination site T is the same.

Finally, for the {R, T, N}s found in Case-I, we checked whether each one did indeed express the character of site T. The results show that although many {R, T, N}s were found for Case-II and Case-III from every combination of sites, in Case-I {R, T, N}s were extracted mainly from combinations of similar sites (“watch vs. zdnet” and “asahi vs. yomiuri”). Although several {R, T, N}s were also extracted from the unrelated combination “watch vs. asahi”, we could find only one that indeed expressed the character of both sites (see the following partial lists of extracted {T, N}s).

watch vs. zdnet
{http://www.minolta.co.jp/, MINOLTA} (The company name)
{http://www.newtech.co.jp/, NEWTECH} (The company name)
{http://www.melcoinc.co.jp/, MELCO} (The company name)
{http://www.tsutaya.co.jp/, TSUTAYA} (The company name)
{http://www.sony.co.jp/sd/, SONY} (The company name)

watch vs. asahi
{http://www.microsoft.com/japan/misc/cpyright.htm, Microsoft} (The company name)
{http://www.microsoft.com/japan/misc/cpyright.htm, Corporation.}
{http://www.microsoft.com/japan/misc/cpyright.htm, ALL}
{http://www.microsoft.com/japan/misc/cpyright.htm, rights}
{http://www.microsoft.com/japan/misc/cpyright.htm, reserved}

Site pair            All      Case-I   Case-II   Case-III   Case-I (verified)
watch vs. zdnet      24482    22       23966     494        20
asahi vs. yomiuri    2288     15       2198      76         14
watch vs. asahi      4389     8        4281      100        1
zdnet vs. asahi      15879    1        14789     89         0
watch vs. yomiuri    1869     1        1845      23         0
zdnet vs. yomiuri    6350     3        6289      58         0

Case-I: both N and T are same. Case-II: only N is same. Case-III: only T is same. Case-I (verified): when both N and T are same, N indeed expressed the character of T.
(watch: www.watch.impress.co.jp, zdnet: www.zdnet.co.jp, asahi: www.asahi.com, yomiuri: www.yomiuri.co.jp)

Fig. 3: Number of extracted {R, T, N}s

Therefore, from this initial investigation, we can infer that if two sites have links with the same noun parts and the same destination sites, these two sites can be regarded as strong candidates for having a similarity relationship. Of course, this procedure is still at the stage of initial trials, and we are planning to verify its effectiveness by conducting full-scale verification trials.
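The Case-I, Case-II, and Case-III classification used in this investigation can be sketched as follows. The paper does not state how matches were counted, so this sketch assumes that every cross-site pair of distinct (T, N) entries is compared once; the function name and the example.com URLs are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Iterable, Tuple

LinkInfo = Tuple[str, str]  # (T: destination site, N: noun part)


def classify_pairs(links_a: Iterable[LinkInfo], links_b: Iterable[LinkInfo]) -> Counter:
    """Count cross-site link pairs falling into Case-I, Case-II and Case-III."""
    counts = Counter()
    for (t_a, n_a), (t_b, n_b) in product(set(links_a), set(links_b)):
        if t_a == t_b and n_a == n_b:
            counts["Case-I"] += 1    # noun part and destination both the same
        elif n_a == n_b:
            counts["Case-II"] += 1   # only the noun part is the same
        elif t_a == t_b:
            counts["Case-III"] += 1  # only the destination is the same
    return counts


site_1 = [("http://www.sony.co.jp/sd/", "SONY"),
          ("http://a.example.com/", "NEWS")]
site_2 = [("http://www.sony.co.jp/sd/", "SONY"),
          ("http://b.example.com/", "NEWS"),
          ("http://a.example.com/", "TOP")]
result = classify_pairs(site_1, site_2)
print(result["Case-I"], result["Case-II"], result["Case-III"])
```

Under this counting, a high Case-I count relative to the other two cases is what marks a pair of sites as a candidate for a similarity relationship.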

IV. CONCLUDING REMARKS

In this study, only link information was used to detect similarity on the grounds that in hypertext environments such as the WWW, links express the most information regarding the characteristics of content. On the other hand, a considerable amount of research is being done in the field of natural language processing for procedures that determine the degree of similarity by analyzing the text content. Reference [8] describes one example of a study in which this procedure is applied to the WWW. However, it has been concluded that this sort of conventional text-based procedure does not function effectively in hypertext environments such as the WWW [5],[6].

Studies of ways to detect mirror sites by focusing on the link structure include references [3] and [4]. However, the aim of those methods is to detect only complete mirror sites, and in these procedures, other information—such as the link connection relationships and the information from a DNS, etc—is used besides the calculated degree of similarity corresponding to α in this study. Our procedure is different in that it regards some sites as being capable of being used as substitutes even though they have a low α value, and aims to detect these sites as well. To do this, we broadly divide web sites into hub-type and content-type sites, and the degree of similarity is calculated using methods tailored to each type. By comparing the degrees of similarity thereby obtained, it is possible to automatically judge whether or not web pages can be used as substitutes for each other.

In this paper, we focused on the operation of this similar web site detection method and proposed an effective procedure for finding candidates for similar web sites that match the user’s preferences. This involves storing the connection relationships and label parts of links in sites accessed by the user, and extracting similar site candidates by starting from sites where the noun parts of the labels are the same.

ACKNOWLEDGEMENTS

We thank our executive manager, Dr. Keiichi Koyanagi of NTT Network Innovation Labs, and the researchers of the Computer Networking Principles Research Group.

REFERENCES

[1] Toshio Hirotsu, Satoshi Kurihara, Toshihiro Takada, and Toshiharu Sugawara: ARESAIN - Alternative Resource Access Information Navigator, Thirteenth IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2001), 2001.

[2] Satoshi Kurihara, Toshio Hirotsu, Toshihiro Takada, and Toshiharu Sugawara: Mirror Site Navigator using Link Information, Proceedings of World Multiconference on Systemics, Cybernetics and Informatics (SCI2000), pp. 283–290, 2000.

[3] Krishna Bharat, Andrei Z. Broder, Jeffrey Dean, and Monika Rauch Henzinger: A Comparison of Techniques to Find Mirrored Hosts on the WWW, Journal of the American Society for Information Science (JASIS), Vol. 51, No. 12, pp. 1114–1122, Nov. 2000.

[4] Narayanan Shivakumar and Hector Garcia-Molina: Finding near-replicas of documents on the web, International Workshop on the World Wide Web and Databases (WebDB ’98), 1998.

[5] O. Zamir and O. Etzioni: Grouper - A Dynamic Clustering Interface to Web Search Results, The Eighth International WWW Conference, 1999.

[6] L. Page, S. Brin, R. Motwani, and T. Winograd: The PageRank Citation Ranking: Bringing Order to the Web, Work in progress. http://google.stanford.edu/~backrub/pageranksub.ps.

[7] http://chasen.aist-nara.ac.jp/

[8] S. Chakrabarti, B. Dom, P. Raghavan, S. Rajagopalan, D. Gibson, and J. Kleinberg: Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text, The Seventh International WWW Conference, pp. 65–74, 1998.

[9] S. Kurihara, S. Aoyagi, R. Onai, and T. Sugawara: Adaptive Selection of Reactive/Deliberate Planning for the Dynamic Environment, Robotics and Autonomous Systems, Vol. 24, No. 3–4, pp. 183–195, 1998.
