

Expert Systems with Applications 36 (2009) 5411–5418


On the functional quality of service (FQoS) to discover and compose interoperable web services

Buhwan Jeong a,*, Hyunbo Cho b, Choonghyun Lee c

a Data Mining Team, Daum Communications Corporation, 1730-8 Odeung, Jeju 690-150, Republic of Korea
b Department of Industrial and Management Engineering, Pohang University of Science and Technology (POSTECH), San 31 Hyoja, Pohang, Kyungbuk 790-784, Republic of Korea
c Industrial and Information Engineering, Yonsei University, 135 Sinchon, Seodamun, Seoul 120-749, Republic of Korea

Article info

Keywords: Functional attribute; Information compatibility; Quality of service (QoS); Semantic similarity; Service discovery; Service-oriented architecture (SOA)

Abstract

Despite its prevalence, the service-oriented architecture (SOA) still faces a pressing challenge: achieving interoperability within and across enterprise applications. The minimal conditions for interoperability are (1) to discover and plug in proper services for integration and (2) to support seamless data exchange between component services. Similarity-based approximate matching is a practical approach to both, in that service discovery relies on functional matches between a query and service descriptions, and seamless data exchange is granted by mapping information from one service to others. To these ends, this paper comprehensively investigates the functional attributes of web services and their manipulation, and particularly highlights an information compatibility and mapping analysis. The functional quality of service (FQoS) allows service discovery and selection to step forward. Simulation results show that the presented FQoS metrics are effective in service discovery.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Service-oriented architecture (SOA) and web services, which enable flexible and loose integration of applications within and across enterprises, have become one of the most prominent subjects in both academia and industry. Within SOA, web service providers describe their services in WSDL¹ to designate what they are and how to invoke them, and then publish these descriptions via a public UDDI² registry. A service requester, in turn, subscribes to those WSDL descriptions and selects the services that satisfy an integration need. The requester often has to compose several services to accomplish complex tasks, and then invokes the selected web services using XML/SOAP³ messages. A widely accepted view is that service discovery relies on service descriptions, service selection on quality attributes, and service composition on formal models and/or process models (e.g., Petri net, BPEL, WS-CDL) (Chi & Lee, 2008; Day & Deters, 2004; Wang, Lee, & Ho, 2007; Menasce, 2004; Peltz, 2003). However, as indicated in Jeong, Cho, Kulvatunyou, and Jones (2007), the overall service composition and execution must be dealt with in an integrated manner to establish interoperable interactions among web services.

* Corresponding author. Address: Data Mining Team, Daum Communications Corporation, 1730-8 Odeung, Jeju 690-150, Republic of Korea. E-mail addresses: [email protected], bjeong@gmail.com (B. Jeong).
1 Web Services Description Language: http://www.w3.org/TR/wsdl
2 Universal Description, Discovery, and Integration: http://www.uddi.org
3 Simple Object Access Protocol: http://www.w3.org/TR/2000/NOTE-SOAP-20000508
0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2008.06.087

The importance of finding and selecting proper web services is increasing, owing not only to the growing number of available services but also to dynamic changes in the existing services themselves. Non-functional QoS (quality of service) metrics, for example operating cost, availability, security, throughput, and latency (Menasce, 2002), help us select a high-quality service from the discovered ones. However, mission-critical service discovery, which is a matchmaking process between a service query and service descriptions, relies wholly on functional attributes such as service type, operation name, and input/output (I/O) data format and semantics. Until now, little attention has been paid to how to manipulate functional attributes semantically for service discovery. Moreover, advances in similarity measures from research fields such as natural language processing (NLP), information retrieval, and text mining are seldom applied to service engineering. Precise service discovery further enables interoperable data exchange with other services. There is an urgent need to consolidate a rigorous measure for evaluating the degree of semantic match between functional properties.

The primary objective of this paper is to present the functional attributes of web services and to comprehensively investigate mathematical methods to manipulate and quantify those attributes, namely the functional quality of service (FQoS). The FQoS metrics are defined in terms of term similarity, tree similarity, and text similarity measures in accordance with the attribute types. In particular, the paper presents an information compatibility and mapping analysis to support interoperable service discovery and composition. Accordingly, we address an FQoS-driven service discovery procedure. Additionally, the service composition problem and quality attributes are surveyed and discussed.

The rest of this paper is organized as follows. Section 2 provides a multi-criteria service composition procedure and service attributes. Section 3 presents various FQoS metrics to quantify the degree of similarity of various functional attributes, the information compatibility and mapping analysis, and an FQoS-driven service discovery. Section 4 gives a preliminary validation of using the FQoS metrics for service discovery. Finally, Section 5 concludes the paper and outlines future work.

2. Service composition and service attributes

2.1. Multi-criteria service composition

Using processes is the typical approach to service composition (Zeng, Benatallah, Dumas, Kalagnanam, & Sheng, 2003), in which the interdependencies and interaction logic among web services are expressed in a process model. Fig. 1 depicts a representative procedure for discovering and allocating web services for a given composition model. This allocation procedure illustrates that a composite web service is an optimal combination of associated web services that meets the quality constraints as well as the composition rules and functional attributes. Consider a simple scenario in which a WeatherForecaster system performs two sub-tasks in order: inquiring the zip code for a given city, and then retrieving the weather information based on that zip code. In this case, the composition model states those tasks in that order. We first look up services that provide the functionality specified for each task, generate various combinations of the discovered services, and then choose the best execution plan. If weather forecasting has two possible services, WeatherByZip and FastWeather, with operation costs of $1 and $3, respectively, and if the allowable cost for this task is up to $2, then the final composite service includes the WeatherByZip service.

For each sub-task, the discovery activity finds and aligns potential services that match the functional attributes. The activity receives a composition model as input and returns another composition model aligned with sets of relevant services as output. In particular, the initial composition model, referred to as a service-independent composition model (SICM), only indicates implicit sub-tasks (as nodes) and their interdependencies (as directed arcs) necessary to accomplish a complex task. We assume that the SICM is complete, in that each sub-task has at least one corresponding service available, thereby requiring no further task decomposition. The service discovery finds and maps corresponding services to each sub-task. After this process, the SICM becomes a service-mapped composition model (SMCM), each node of which indicates a list of alternative services. This activity is undertaken by MatchMaker, which compares functional attributes such as service names and I/O data definitions.

[Fig. 1. Procedure for allocating and optimizing web service composition. Activities: Discover Services (MatchMaker), Generate Execution Plans (Scheduler), Evaluate Execution Plans and Select Optimal One (Optimizer). Models: Service-Independent Composition Model, Service-Mapped Composition Model, Execution Plans, Execution Plan (composite service). Other labels: Functional Attributes, Quality Constraints.]

An SMCM can represent various execution plans with different services allocated and/or different execution orders. Each execution plan provides the same functionality, but with a different combination of services and a different overall quality assessment. An execution plan is represented as a directed acyclic graph (DAG) of services, in which a node represents a service and an arc represents the execution order between two services (Zeng et al., 2003). The plan allows concurrent executions (i.e., AND relations) where possible, but not alternative executions (i.e., OR relations). In addition, the execution plan must be information compatible (see Section 3.5 below). For a given SMCM, Scheduler generates all the possible execution plans.

Finally, Optimizer evaluates those execution plans based on their quality assessments (QoS) and selects the best one as the final composite service. Selecting the final execution plan is formulated as a multi-objective optimization problem that trades off multiple quality attributes. In addition, since the evaluation uses QoS values averaged over simulations or historical data, the final plan may not be optimal at run time. For run-time recovery, Optimizer may choose alternative plan(s) or specify alternative service(s) for each task. The following subsection deals with a few quality attributes and functional attributes of web services.
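As a minimal sketch of how Scheduler and Optimizer might interact, the following Python fragment enumerates every combination of alternative services in an SMCM and picks the cheapest plan within a budget. The zip-code service names (ZipLookup, CityToZip), their costs, the total budget, and the single cost objective are illustrative assumptions; only WeatherByZip ($1) and FastWeather ($3) come from the scenario above, and a real optimizer would trade off several QoS attributes rather than cost alone.

```python
from itertools import product

# Hypothetical SMCM: each sub-task maps to alternative services and their costs.
# Only the two weather services and their costs are taken from the text.
smcm = {
    "lookup_zip": {"ZipLookup": 0.5, "CityToZip": 1.0},   # assumed services
    "get_weather": {"WeatherByZip": 1.0, "FastWeather": 3.0},
}

def execution_plans(model):
    """Yield every plan as a tuple of (task, service, cost) triples."""
    tasks = list(model)
    for choice in product(*(model[t].items() for t in tasks)):
        yield tuple((t, name, cost) for t, (name, cost) in zip(tasks, choice))

def plan_cost(plan):
    """Total operation cost of a plan."""
    return sum(cost for _, _, cost in plan)

def best_plan(model, budget):
    """Return the cheapest plan whose total cost fits the budget, or None."""
    feasible = [p for p in execution_plans(model) if plan_cost(p) <= budget]
    return min(feasible, key=plan_cost, default=None)

plan = best_plan(smcm, budget=2.0)
chosen = [service for _, service, _ in plan]
```

Under these assumed numbers the sketch selects WeatherByZip for the weather task, consistent with the scenario; replacing `plan_cost` with a weighted score over several quality attributes would be one simple way to approximate the multi-objective formulation.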

2.2. Service attributes

The service composition problem is closely tied to the interpretation of service attributes, usually functional attributes and quality attributes. As the name indicates, functional attributes represent the functions and operations a web service performs, whereas quality attributes are non-functional properties describing the performance and quality the service provides. Roughly stated, functional attributes are more involved in service advertisement and discovery, while quality attributes are mainly responsible for service selection.

First, the quality attributes, a.k.a. QoS (quality of service) metrics, are critical in selecting an optimal execution plan within resource constraints such as total operation cost and lead time. Many studies have thoroughly investigated the quality attributes. Major quality attributes include operation cost (&)⁴ to invoke a service; performance (%) in terms of latency and throughput; availability (%), the probability that the service is ready for immediate use; accessibility (%), indicating the capability of serving requests; security (%) in terms of confidentiality, authenticity, and integrity; interoperability (%) between services; and reliability (%) of both message and service. Other minor attributes are usability, scalability, extensibility, adaptability, testability, auditability, operability, deployability, modifiability, reputation, and so forth (Menasce, 2002; O’Brien, Bass, & Merson, 2005). In addition to those objective attributes, according to Wang et al. (2007), we may define subjective quality attributes (e.g., user satisfaction level) that reflect users’ feedback and thoughts about the services. Since most of these attributes are not explicit, they should be evaluated with historical data and/or simulations.

Second, the functional attributes are explicit, but have been less studied from the engineering perspective. Although services are transparent to any user, their implementations are like a black box. This means that users cannot know exactly how the services implement the intended functionalities. In other words, the only part of a service that is visible to the outside world is what is exposed via its description and formal contracts, but the underlying logic is invisible and irrelevant to service requesters (O’Brien et al., 2005). Therefore, we should infer functionalities from visible information via a reverse engineering process. Service discovery has to rely completely on semantics-rich descriptions of the service, including WSDL, UDDI, and any semantic annotations in, for example, OWL-S⁵. The descriptions mainly capture representations of what the service does (its capabilities), the information it needs to complete a task (its inputs), and the information it provides when the task is done (its outputs) (Kokash, 2006; Dong, Halevy, Madhavan, Nemes, & Zhang, 2004). The functional attributes that can be captured in WSDL, UDDI, and OWL-S⁶ are as follows:

Service Category: The service category, or service type, specifies a classification of the service, for example the business domains to which the service belongs. Widely accepted classification schemes are UNSPSC⁷ and NAICS⁸.

Service Name: The service name refers to the name of the service that is being offered. It can be used as a unique identifier of the service and also provides a high-level description of what the service represents.

Operation Name: Since a service can contain a collection of operations, the operation name designates a more specific functionality the service provides.

Data Definition: The essential attributes describing a service’s functionalities are the input and output data/messages consumed and produced by the service. Those data definitions are assumed to be expressed in XML Schema for maximum interoperability and platform neutrality.

Annotation: The annotation is any auxiliary description of a service, either in plain text or in a structured and ontological form.

3. Functional quality of service (FQoS)

Service discovery finds and ranks potential services based on a quantitative analysis of functional attributes for a given task description (i.e., service query). We refer to this functional analysis as functional quality of service (FQoS), which is defined by the semantic similarity of each functional attribute. We can classify the functional attributes into atomic labels (i.e., service category, service name, and operation name), structured XML data (i.e., input and output data definitions), and textual data (i.e., annotation). Each class requires a different approach to exploit its semantics.

4 & and % indicate ‘less-is-better’ and ‘more-is-better’ attributes, respectively.
5 Web Ontology Language for Services: http://www.daml.org/services
6 Note that OWL-S refers to non-functional quality attributes as functional attributes.
7 The United Nations Standard Products and Services Code: http://www.unspsc.org
8 The North American Industry Classification System: http://www.naics.com

3.1. Name reasoning: term similarity

The first group of functional attributes consists of atomic terms such as a service category and name. To deal with such atomic terms, name reasoning computes the similarity between terms. Term similarity is defined as lexical similarity (Jeong, Kulvatunyou, Ivezic, Cho, & Jones, 2005), which intuitively compares the labels of service categories or names, i.e., quantifies the commonality between individual terms using purely lexical information. Many approaches exist, including lexical form-based (syntactic) measures such as prefix/suffix, n-gram, and edit distance (Shvaiko & Euzenat, 2005; Do & Rahm, 2003), and lexical semantics-based (semantic) measures such as word sense and synonym, edge counting, and information content (Jarmasz & Szpakowicz, 2003; Pedersen, Patwardhan, & Michelizzi, 2004; Resnik, 1995; Wu & Palmer, 1994). The semantic measures usually provide more accurate predictions. For example, even though the two services HotelReservation and AccommodationBooking provide the same functionality, a syntactic measure gives a very low score, whereas a semantic measure is expected to give a high score owing to the synonymous relations between Hotel and Accommodation and between Reservation and Booking. The syntactic measures, however, are useful for comparing symbols such as abbreviations/acronyms and classification codes. Note that accurate semantic measurement requires a well-compiled lexical knowledge resource such as a thesaurus or WordNet.
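To make the weakness of syntactic measures concrete, the sketch below computes one possible n-gram measure, a Dice coefficient over character bigrams, for the two example labels. The choice of bigrams and of the Dice coefficient is an assumption for illustration; the paper does not prescribe a particular n-gram formula.

```python
def bigrams(term):
    """Return the set of character bigrams of a lower-cased term."""
    t = term.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}

def dice_similarity(a, b):
    """Dice coefficient over character bigrams, in the range [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Both terms are too short to have bigrams: fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))

# The two labels share little beyond the "-ation" suffix, so the score is low
# even though the services offer the same functionality.
score = dice_similarity("HotelReservation", "AccommodationBooking")
```

A semantic measure backed by a resource such as WordNet would be needed to recognize that the two labels are near synonyms.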

In addition, as the terms HotelReservation and AccommodationBooking show, labels are often compound words, which concatenate individual dictionary words and/or allowable abbreviations to enhance expressivity. Compound words provide more information than individual words do because the additional words provide contextual information to the others. In other words, a compound word makes the meaning of its constituent words more specific. For example, Reservation is restricted to HotelReservation. Special treatment is necessary to compute the similarity between such compound words. First, compound words must be normalized by a recursive process of (1) tokenization, which separates a compound word into atomic dictionary words; (2) lemmatization, which analyzes the separated words morphologically to find their base forms; and (3) elimination, which discards meaningless stop words such as articles, prepositions, and conjunctions (Jeong, 2006). After normalization, we compute the compound term similarity by solving a stable marriage problem, in which two parties make a perfect marriage (a complete matching) based on a perfectionist egalitarian polygamy (Jeong, 2006). The similarity of compound words is assessed as follows:

(1) For two compound words, normalize them and assign the corresponding individual words to the vertex sets of a fully connected bipartite graph.

(2) For each pair of words from opposite partitions, compute the term similarity with any measure, and assign it to the connecting edge as a preference.

(3) Using the preference-attached bipartite graph, solve the stable marriage problem (or an assignment problem).

(4) Assess the overall similarity between the compound words. Possible functions for this step include the max, min, or average of the individual similarities of the matched word pairs. Alternatively, we introduce a new computation: the sum of the matched pairs’ similarities divided by the larger of the two compound words’ counts of meaningful individual words.

[Fig. 2. An example of measuring compound words’ similarity. A bipartite graph connects Hotel and Reservation (from HotelReservation) with Accommodation and Booking (from AccommodationBooking); the edge similarities shown are 0.7368, 1.0, 0.2105, and 0.8421.]

9 Here, we do not deal with a structured ontological representation, for it may further require a complicated ontology reasoning process such as that of Anicic, Marjanovic, Ivezic, and Jones (2007).

As shown in Fig. 2, for instance: (1) the terms HotelReservation and AccommodationBooking are normalized and assigned to a bipartite graph; (2) the term similarity of each pair is computed by Wu and Palmer’s algorithm (Wu & Palmer, 1994); (3) plausible mappings are selected accordingly; and (4) the overall similarity is assessed as $Sim_{comp} = (1 + 0.7368)/\max(2, 2) = 0.8684$.
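The four steps can be sketched in Python as follows. Several simplifications are assumptions rather than the authors' implementation: lemmatization is omitted, the matching is solved as a brute-force assignment problem instead of a stable marriage, and the term similarities come from a toy lookup table instead of Wu and Palmer's measure over WordNet. The table reuses the four values from Fig. 2, but which value belongs to which word pair is assumed here (the worked example only implies that the two matched pairs score 1 and 0.7368).

```python
import re
from itertools import permutations

STOP_WORDS = {"a", "an", "the", "of", "for", "and", "or", "to", "by"}

def normalize(compound):
    """Tokenize a CamelCase label and drop stop words (lemmatization omitted)."""
    tokens = re.findall(r"[A-Z][a-z]+|[a-z]+|[A-Z]+(?![a-z])", compound)
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]

def compound_similarity(a, b, term_sim):
    """Best one-to-one matching total divided by the larger word count."""
    xs, ys = normalize(a), normalize(b)
    if not xs or not ys:
        return 0.0
    small, large = (xs, ys) if len(xs) <= len(ys) else (ys, xs)
    # Brute-force assignment: try every way to pair the smaller side's words.
    best = max(
        sum(term_sim(s, l) for s, l in zip(small, perm))
        for perm in permutations(large, len(small))
    )
    return best / len(large)

# Toy similarity table; the pairing of values to word pairs is an assumption.
TABLE = {
    frozenset(("hotel", "accommodation")): 0.7368,
    frozenset(("reservation", "booking")): 1.0,
    frozenset(("hotel", "booking")): 0.2105,
    frozenset(("reservation", "accommodation")): 0.8421,
}

def toy_sim(u, v):
    """Look up a precomputed similarity; identical words score 1.0."""
    return 1.0 if u == v else TABLE.get(frozenset((u, v)), 0.0)

score = compound_similarity("HotelReservation", "AccommodationBooking", toy_sim)
```

With this table the best matching pairs Hotel with Accommodation and Reservation with Booking, reproducing (1 + 0.7368)/2 = 0.8684. Brute-force enumeration is only practical for the handful of words found in a typical label.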

3.2. Structure matching: tree similarity

The second technique, structure matching, exploits the semantics captured in structured documents. Structure matching receives two (or more) input XML documents and establishes a plausible set of semantic mappings among them. The input and output data definitions in XML Schema are subject to structure matching because of their structured expression. Structure matching is a more complicated analysis than name reasoning because it must deal with complex tree structures (i.e., XML DOM trees) beyond atomic labels. We define structural similarity, or tree similarity, as a measure for structured XML documents, both schemas and instances. The tree similarity counts the commonality between XML documents by taking into consideration the lexical similarities of multiple, structurally related terms. Widely used tree similarity measures include node, edge, or path matching; inclusive path matching; tree edit distance (TED); tag similarity; and a Fourier transformation-based measure (Zhang, Li, Cao, & Zhu, 2003; Flesca et al., yyy; Dalamagas, Cheng, Winkel, & Sellis, 2006). Note that the trees must be uniquely labeled and ordered for the matching problem not to be NP-complete.

Recently, we introduced a kernel-based similarity measure for XML documents, which provides a more reliable estimate than conventional ones do (Jeong, Lee, Cho, & Kulvatunyou, 2007). The kernel method (Shawe-Taylor & Cristianini, 2004) implicitly computes the inner product of a pair of data in a feature space, which serves as the tree similarity between XML documents. Before that computation, we need to normalize the XML documents into plain text while keeping structural properties such as parent-to-child and left-to-right order. We serialize an XML document tree by visiting every node in a depth-first traversal, and then apply a modified string kernel to the serialized text. The inner product of the texts becomes the tree similarity between the corresponding XML documents.
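The serialize-then-kernel idea can be illustrated with a deliberately simple stand-in. The sketch below serializes each document into its depth-first sequence of element tags and compares the sequences with a normalized k-spectrum kernel over tags. This is not the modified string kernel of the cited work: the spectrum kernel, the value k = 2, and the flat tag list (which only approximates parent-to-child structure) are all assumptions made for illustration, as are the two sample documents.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

def serialize(xml_text):
    """Depth-first (document-order) list of element tags."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter()]

def spectrum(tokens, k=2):
    """Count the contiguous k-token subsequences of a token list."""
    return Counter(tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1))

def kernel_similarity(xml_a, xml_b, k=2):
    """Normalized inner product of the two k-spectrum feature vectors."""
    fa, fb = spectrum(serialize(xml_a), k), spectrum(serialize(xml_b), k)
    dot = sum(fa[g] * fb[g] for g in fa.keys() & fb.keys())
    norm = math.sqrt(sum(v * v for v in fa.values()) *
                     sum(v * v for v in fb.values()))
    return dot / norm if norm else 0.0

# Hypothetical documents that differ in one leaf element.
doc_a = "<order><id/><customer><name/><zip/></customer></order>"
doc_b = "<order><id/><customer><name/><city/></customer></order>"
sim = kernel_similarity(doc_a, doc_b)
```

Identical documents score 1.0, and the score drops as the tag sequences diverge; a kernel that also weights non-contiguous subsequences would be closer in spirit to the string kernels discussed in the kernel-methods literature.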

Furthermore, an information compatibility analysis exploits the semantic relationships at the item-to-item (or element-to-element) level, as well as at the document-to-document level described above. This implies that for two XML schemas having a partial match (i.e., similarity < 1.0), the analysis maps each information item in one document to the corresponding item in the other. This information mapping is important because some information items are mandatory while others are conditional or optional. Two documents are not an exact match, for example, when one has only mandatory items but the other has optional items as well; that is, the latter is more general. In that case, transformation from the latter to the former is seamless, but not vice versa. This kind of analysis is necessary because a service is an information transformer from inputs to outputs, so service composition (or connection) is an information transfer from the outputs of services already invoked (or initial inputs from external applications) to the inputs of services to be invoked.

3.3. Annotation mining: text similarity

The third attribute type is an unstructured textual description⁹, which allows us to adopt statistical text mining techniques. This analysis is optional and applies only when annotations are available. A vector space model (VSM) (Salton, Wong, & Yang, 1975) is a numerical ‘document-by-term’ matrix V for a set of documents, where each element indicates the presence or absence of a word in the corresponding document, either by a Boolean value or by a real-valued weight. For the real-valued model, TF-IDF (term frequency-inverse document frequency), which takes into account how frequently a word appears across the document set, is often used. The weight $w_{ij}$ of the ith term in the jth document is determined by $w_{ij} = TF_{ij} \times IDF_i = TF_{ij} \times \log(n/DF_i)$, where $TF_{ij}$ is the number of occurrences of the ith term within the jth document, and $DF_i$ is the number of documents (out of n) in which the term appears. Using this model, the text similarity between two annotations, $v_1$ and $v_2$, is simply calculated as the cosine of the angle between their vectors, i.e., $Sim(v_1, v_2) = V_{v_1}^{T} V_{v_2} / (\|V_{v_1}\|_2 \, \|V_{v_2}\|_2)$. The original VSM often suffers from inefficiency on massive real-world data. For this reason, latent semantic analysis/indexing (LSA/LSI) reduces the VSM’s dimension using singular value decomposition (SVD) (Landauer, Foltz, & Laham, 1998).
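The weighting and cosine formulas above translate directly into a few lines of Python. This is a minimal sketch: whitespace tokenization, the natural logarithm, and the three sample annotations are assumptions, and no stop-word removal or LSA/LSI reduction is applied.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, with w = TF * log(n / DF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # DF counts the documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values())) *
            math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

# Hypothetical service annotations.
annotations = [
    "returns weather forecast for a zip code",
    "returns weather report for a postal code",
    "books a hotel room",
]
vectors = tfidf_vectors(annotations)
sim_close = cosine(vectors[0], vectors[1])   # two weather services
sim_far = cosine(vectors[0], vectors[2])     # weather vs. hotel booking
```

Note that a term occurring in every document (here "a") receives weight log(1) = 0, so it contributes nothing to the similarity, which is the intended effect of the IDF factor.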

3.4. Integrated functional similarity

Using a single criterion is a better way to make a firm decision in service discovery, especially for automation, than using different criteria together. For this reason, it is natural to synthesize the various similarity measures above into an integrated one, $S = f(s_1, s_2, \ldots, s_n)$. A typical function is the weighted average, defined by $S = w^{T}s$, in which $s \in \mathbb{R}^n$ is a column vector of individual similarity values and $w \in \mathbb{R}^n$ is an aggregation weight vector. A more sophisticated way is to use a machine learning classifier (e.g., artificial neural networks, partial least squares, support vector machines), which gives a more robust and accurate aggregation than arithmetic computations do (Jeong, 2006). In addition, feature selection techniques offer a good estimate of the aggregation weights. The feature selection techniques also train a classifier, as shown in Fig. 3, and estimate the weights by quantifying the influence of each similarity measure on the final aggregation (Jeong & Cho, 2006). According to Jeong (2006), if an appropriate set of weights is given, the weighted average is a fast and reliable method. One disadvantage of using machine learning is that it requires a reliable set of training data, i.e., similarities of pairs of service descriptions together with evaluations by human operators.
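As a small worked instance of the weighted average, the sketch below combines three per-attribute similarities into one score. The similarity values and the weights are made up for illustration, and normalizing the weights so that they sum to 1 is an added assumption; the text defines only the inner product of the weight and similarity vectors.

```python
def integrated_similarity(scores, weights):
    """Weighted average S = w^T s, with the weights normalized to sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total

# Hypothetical per-attribute similarities: name, I/O structure, annotation.
s = [0.8684, 0.75, 0.40]
w = [0.5, 0.3, 0.2]   # assumed weights, not taken from the paper
S = integrated_similarity(s, w)   # 0.5*0.8684 + 0.3*0.75 + 0.2*0.40
```

In practice the weights would come from domain judgment or from the feature selection procedure described above rather than being fixed by hand.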

[Fig. 3. Feature selection-based weighted aggregation. Labels: WSDL, FQoS metrics, Training Data (similarity columns s1, s2, s3, ..., sT), Evaluate, Weight Determination, w, S = wTs.]

[Fig. 4. Service connection via information mapping. Labels: Service A, Service B, Information Mapping Tool, External inputs and/or outputs from previous services, Sources, Destination.]

3.5. Interoperable service composition: information compatibility and mapping

We now take a closer look at information compatibility, for it is critical to seamless data integration and exchange among services. A number of services that offer the same functionality may be available, possibly labeled with the same or similar names, but consuming different input data definitions, producing different output data definitions, or both. This is the case when different vendors provide similar services for different industry verticals, when one customizes a standard service to its running environment, or even when a service itself evolves over time. That is, the services to be connected should be compatible in data exchange, a property we call information compatibility.

The paper raises two practical issues in dealing with information compatibility and mapping. Before going into detail, consider a simple service connection as shown in Fig. 4, where the information mapping tool (footnote 10) merges, fragments, and transforms external inputs and/or outputs from previous services, namely the sources, into an input consumable by Service B, namely the destination. The first issue is the relation between the sources and a destination, which goes beyond the degree of similarity between data definitions. The sources must be sufficient for the destination. In other words, the sources must contain all the information items required by the destination. The ideal case is that a source is identical to the destination (Exact Match). The second case is that a source or several sources collectively cover the destination (General Match). In this case, since the sources have more information than, or a different structure from, the destination, the mapping tool merges, segments, and re-organizes the sources to be identical to the destination. The third case is that the sources provide the destination with insufficient information (Partial Match), thereby requiring additional inputs from external applications.
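The three relations between sources and a destination can be sketched in Python as follows. This is a simplified sketch that assumes each data definition is flattened into a set of information item names; real message types are trees, and the names `classify_match` and `missing_items` are hypothetical.

```python
from typing import Iterable, Set


def covered_items(sources: Iterable[Set[str]]) -> Set[str]:
    """Union of the information items offered by all sources."""
    return set().union(*[set(s) for s in sources])


def classify_match(sources: Iterable[Set[str]], destination: Set[str]) -> str:
    """Return 'exact', 'general', or 'partial' for the given sources."""
    source_sets = [set(s) for s in sources]
    # Exact Match: some single source is identical to the destination.
    if any(s == destination for s in source_sets):
        return "exact"
    # General Match: the sources collectively cover the destination.
    if destination <= covered_items(source_sets):
        return "general"
    # Partial Match: at least one required item is not provided.
    return "partial"


def missing_items(sources: Iterable[Set[str]], destination: Set[str]) -> Set[str]:
    """Items that would have to come from external applications."""
    return destination - covered_items(sources)


print(classify_match([{"zip"}, {"city", "state"}], {"zip", "city"}))
print(missing_items([{"zip"}], {"zip", "city"}))
```

In the Partial Match case, `missing_items` lists what the mapping tool cannot derive from the sources alone.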

The second issue is about information essentiality. For a specific information item I, if a source defines it as optional, the item may not appear in an instance message. Suppose, on the other hand, that a destination defines the same item I as mandatory. In this case, even though the two documents are an exact match at the schema level, they may be only a partial match at the instance message level because item I may be absent from the source message. Therefore, a seamless data exchange is granted only after such an information essentiality condition is also met. A minimal interoperability condition between services is that all the mandatory items in the destination are also defined as mandatory in the sources. Although a source and the destination may share exactly the same information items (their similarity = 1.0), mismatches in information essentiality may depreciate their information compatibility (<1.0). To sum up, information compatibility is measured not just by the degree of similarity between the sources and the destination, but also by the relations between essential information items.

10 The tool can use either static XSLT rules or ontology-based mappings.

An information compatibility index is computed as follows:

(1) Visit the destination by a breadth-first traversal from the root element. For each information item (or element), compare it with those in the source, also by breadth-first traversal, until a match is found using tree similarity. The sub-elements under matched items, both in the source and the destination, are removed from the further comparison candidates.

(2) For every pair of matched information items (and their sub-elements), compute the match score for information essentiality as $MS = N_E + \alpha \times N_M$, where $N_E$ is the number of essentiality matches, $N_M$ is the number of essentiality mismatches, and $\alpha \le 1.0$ is a depreciation factor.

(3) Compute the information compatibility as the ratio of the total match score to the number of elements in the destination document. By construction, the information compatibility of identical documents is 1.0.
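The computation can be sketched in Python under strong simplifying assumptions: documents are flattened into dictionaries that map an item name to a mandatory flag, items are matched by identical names rather than by tree similarity, and an essentiality mismatch is taken to mean "mandatory in the destination but optional in the source". The function name and the default depreciation factor of 0.5 are hypothetical.

```python
from typing import Dict


def information_compatibility(
    source: Dict[str, bool],
    destination: Dict[str, bool],
    alpha: float = 0.5,
) -> float:
    """Match score MS = N_E + alpha * N_M, divided by the destination size.

    Each dictionary maps an information item name to True (mandatory)
    or False (optional). Unmatched destination items contribute nothing.
    """
    if not destination:
        raise ValueError("destination must contain at least one item")
    if alpha > 1.0:
        raise ValueError("alpha must not exceed 1.0")
    n_match = 0      # N_E: matched items whose essentiality is compatible
    n_mismatch = 0   # N_M: matched items with an essentiality mismatch
    for name, dest_mandatory in destination.items():
        if name not in source:
            continue
        if dest_mandatory and not source[name]:
            n_mismatch += 1
        else:
            n_match += 1
    match_score = n_match + alpha * n_mismatch
    return match_score / len(destination)


source = {"zip": True, "city": False, "state": True}
destination = {"zip": True, "city": True}
print(information_compatibility(source, destination))  # (1 + 0.5 * 1) / 2 = 0.75
```

Identical documents yield 1.0 under this sketch, in line with step (3), while an optional source item feeding a mandatory destination item lowers the index by a factor controlled by `alpha`.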

3.6. FQoS-driven service discovery and composition

Service discovery is an activity to find potential services that not only meet the functional requirements specified in a service query, but are also compatible with neighboring services. An integrated approach to the FQoS metrics as well as the QoS metrics is needed for correct discovery. The integration ensures not only that services of high quality are discovered, but also that they are seamlessly connected. For computational efficiency, we design a successive discovery procedure as follows:

(1) (Filtering) For a service query in each SICM node, look up all the services belonging to the relevant service categories, both the explicit categories specified in the query and the categories associated with the explicit ones. This look-up step compares the query with the tModels in the UDDI registry.

Fig. 5. Similarity-based web service discovery. (Diagram: Registry (WSDL) and Query feed Feature Engineering [service name, operation name, input message type, output message type, annotation], followed by Similarity Computation [term similarity, tree similarity, text similarity], followed by Aggregation & Evaluation [$S = w^{T}s > S_0$].)


(2) (Discovery) Using an integrated FQoS measure, sort out the services that not only have names semantically associated with the name specified in the query, but also have input and output message types similar to those of the query. Annotations, if available, are also used here, and human intervention may be required. The paragraphs following this procedure detail this similarity-based service discovery. The result is an SMCM.

(3) (Planning) Identify the service(s) whose input and output definitions are highly compatible with those of connected services as well as similar to those specified in the service query. This step may be iterative because of the intricate data transfers among services. It produces a collection of execution plans.

(4) (Selection) Finally, optimize and select the best configuration for the composition model using the quality attributes of interest.

Fig. 5 delineates the service discovery process using an integrated similarity measure. The process is roughly made up of the following steps:

Feature engineering: This step extracts the necessary pieces of information from both the WSDL descriptions and the query and represents them in a format consumable by the similarity computation. The necessary information for discovery includes service and operation names, input/output message type definitions, and descriptive annotations. We assume that the candidate WSDLs are already filtered by the service classification in the UDDI registry.

Similarity computation: Depending on the type of information, this step compares the extracted information by means of term, tree, and text similarities.

Aggregation and evaluation: The individual measures are aggregated via, for example, a weighted average function. Accordingly, the step determines whether candidate services match the query or not.
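The feature engineering step can be illustrated with a short Python sketch that reads a WSDL 1.1 description with the standard library. The sample document, the `extract_features` function, and the decision to represent a message type only by its part names are assumptions of this sketch; real message types usually refer to XML Schema trees, which the tree similarity would consume.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical WSDL 1.1 description used only for illustration.
SAMPLE_WSDL = """<definitions name="ZipFinder"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="urn:example:zip"
    targetNamespace="urn:example:zip">
  <message name="GetCityRequest">
    <part name="zip" type="xsd:string"/>
  </message>
  <message name="GetCityResponse">
    <part name="city" type="xsd:string"/>
    <part name="state" type="xsd:string"/>
  </message>
  <portType name="ZipPort">
    <operation name="GetCityByZip">
      <documentation>Returns the city and state for a ZIP code.</documentation>
      <input message="tns:GetCityRequest"/>
      <output message="tns:GetCityResponse"/>
    </operation>
  </portType>
  <service name="ZipFinderService"/>
</definitions>"""


def local_name(qname: str) -> str:
    """Strip a namespace prefix such as 'tns:' from a qualified name."""
    return qname.split(":")[-1]


def extract_features(wsdl_text: str) -> List[Dict[str, object]]:
    """Collect one feature record per operation in the description."""
    root = ET.fromstring(wsdl_text)
    # Map each message name to the names of its parts.
    messages = {
        m.get("name"): [p.get("name") for p in m.findall("wsdl:part", WSDL_NS)]
        for m in root.findall("wsdl:message", WSDL_NS)
    }
    service = root.find("wsdl:service", WSDL_NS)
    service_name = service.get("name") if service is not None else None

    def parts_of(element) -> List[str]:
        if element is None or element.get("message") is None:
            return []
        return messages.get(local_name(element.get("message")), [])

    records = []
    for op in root.findall("wsdl:portType/wsdl:operation", WSDL_NS):
        doc = op.find("wsdl:documentation", WSDL_NS)
        records.append({
            "service": service_name,
            "operation": op.get("name"),
            "input": parts_of(op.find("wsdl:input", WSDL_NS)),
            "output": parts_of(op.find("wsdl:output", WSDL_NS)),
            "annotation": (doc.text or "").strip() if doc is not None else "",
        })
    return records


for record in extract_features(SAMPLE_WSDL):
    print(record)
```

Each record keeps the attributes named above (names, input/output definitions, annotation) side by side, so that a term, tree, or text similarity can be applied to the matching field of a query and a candidate.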

4. Validation

To evaluate the effectiveness of FQoS-based service discovery, we conduct an experiment using a collection of web services from XMethods (footnote 11) (Kokash, 2006; Wu & Wu, 2005). It has a total of 38 services from five categories: weather information finder (6), currency rate converter (7), DNA information searcher (5), SMS sender (10), and ZIP code finder (10). The previous papers designed the experiment as follows: (1) use each service description as a query, (2) compare the query description with the other descriptions, and (3) compute Precision and Recall based on whether the responses contain the services in the same category, including the query service itself. In effect, this design is a clustering problem. Therefore, we show how well the FQoS measures group similar service descriptions together.

11 http://www.xmethods.com.
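For reference, Precision and Recall for a single query under this design can be computed as in the following Python sketch. The category labels and retrieved indices are made-up toy data, and the function name is hypothetical; the relevant set includes the query service itself, as described above.

```python
from typing import Iterable, Sequence, Tuple


def precision_recall(
    query_index: int,
    retrieved: Iterable[int],
    categories: Sequence[str],
) -> Tuple[float, float]:
    """Precision and Recall of the retrieved services for one query.

    A retrieved service is relevant when it belongs to the same
    category as the query; the query itself counts as relevant.
    """
    relevant = {
        i for i, category in enumerate(categories)
        if category == categories[query_index]
    }
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant)
    return precision, recall


# Toy data: services 0-1 are weather finders, services 2-4 are ZIP finders.
toy_categories = ["weather", "weather", "zip", "zip", "zip"]
print(precision_recall(0, [0, 1, 2], toy_categories))
```

Averaging these values over all queries gives the collection-level figures that the comparison with earlier work relies on.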

In this experiment, we use the service operation name, input message type, and output message type as functional attributes. In particular, Wu and Palmer's term similarity algorithm (Wu & Palmer, 1994) is used to compare the operation names, while the kernel-based measure (Jeong, Lee, Cho, & Kulvatunyou, 2007) is used to measure the tree similarity of both input and output message types. An average and a weighted average are used for aggregation. Figs. 6 and 7 visualize the pair-wise similarity matrices for each FQoS metric. They show that web services in the same category tend to be more similar (dark color), whereas services in different categories are less similar. However, the region-based services (i.e., Weather and ZIP) are likely to flock together because they share common terms such as zip, city, and state. Another observation is that the message types separate services between categories better than the operation name does, because they carry more information about the services. In addition, Fig. 7C and D imply that the weighted average discriminates services better than the plain average does.

Fig. 6. Pair-wise similarity matrix based on operation name: deeper color indicates more similar.
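The Wu and Palmer measure scores two terms by the depth of their lowest common ancestor in a taxonomy, $sim(a, b) = 2 \cdot depth(lcs) / (depth(a) + depth(b))$. The Python sketch below applies it to a tiny hand-made taxonomy; the taxonomy, the convention that the root has depth 1, and the function names are assumptions for illustration (the experiment itself would rely on a lexical resource such as WordNet).

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: each term maps to its parent (None for the root).
TAXONOMY: Dict[str, Optional[str]] = {
    "entity": None,
    "location": "entity",
    "city": "location",
    "state": "location",
    "message": "entity",
    "sms": "message",
}


def path_to_root(term: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of nodes from the term up to the root."""
    path = [term]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def wu_palmer(a: str, b: str, parent: Dict[str, Optional[str]]) -> float:
    """Wu-Palmer similarity with the root counted as depth 1."""
    path_a = path_to_root(a, parent)
    path_b = path_to_root(b, parent)
    ancestors_a = set(path_a)
    # Walking upward from b, the first node shared with a is the deepest
    # common ancestor in a tree-shaped taxonomy.
    lcs = next(node for node in path_b if node in ancestors_a)
    depth_lcs = len(path_to_root(lcs, parent))
    return 2.0 * depth_lcs / (len(path_a) + len(path_b))


print(wu_palmer("city", "state", TAXONOMY))
print(wu_palmer("city", "sms", TAXONOMY))
```

Terms under the same parent, such as city and state, score higher than terms that only share the root, which matches the observation that Weather and ZIP services tend to flock together.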

Fig. 7. Similarity matrices: (A) input message type, (B) output message type, (C) average, and (D) weighted average.


Using the PAM (partitioning around medoids) algorithm, we cluster those web service descriptions, and the results are summarized in Table 1. The ratios of correct classifications correspond to Precision values of approximately 0.74 to 0.9. These Precision values are at least as good as, or better than, those in (Kokash, 2006; Wu & Wu, 2005). This implies that service discovery is highly sensitive to the choice of similarity measures and aggregation functions. In other words, the improvements come from the use of the kernel-based similarity measure and of feature selection to determine the aggregation weights. In addition, although each functional attribute alone discriminates those services well, the integrated measures outperform the individual ones.

We have also clustered another 209 web services in 21 categories (Kokash, 2006). From the original 447 service descriptions, we removed the service categories that have only a few descriptions. The ratio of correct classifications is around 65%, while the average Precision for the 209 services reported by Kokash (2006) is around 0.4. Note that the adjusted Precision metric used in (Kokash, 2006) is nearly the same as the ratio of correct discriminations in the clustering analysis.
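To make the clustering step concrete, the following Python sketch implements a simplified, swap-only variant of PAM on a precomputed distance matrix (for instance, one minus a pair-wise similarity). The naive initialization with the first k items, the function names, and the toy distances are assumptions of this sketch; the configuration actually used in the experiment is not specified here.

```python
from typing import List, Sequence, Tuple


def total_cost(dist: Sequence[Sequence[float]], medoids: List[int]) -> float:
    """Sum of distances from every item to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist: Sequence[Sequence[float]], k: int) -> Tuple[List[int], List[int]]:
    """Swap-based k-medoids: returns (medoids, nearest medoid per item)."""
    n = len(dist)
    medoids = list(range(k))  # naive start; PAM's BUILD phase is omitted
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for position in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                # Try replacing one medoid with a non-medoid item.
                trial = medoids[:position] + [candidate] + medoids[position + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels


# Toy data: two well-separated groups on a line.
points = [0, 1, 2, 50, 51, 52]
distances = [[abs(p - q) for q in points] for p in points]
print(pam(distances, 2))
```

Because the procedure needs only pair-wise distances, any of the individual or aggregated similarity matrices can be plugged in after conversion to a distance.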

Table 1. Summary of service descriptions clustering

Category   Name      Input   Output   Average   W. average
Weather    2/4 (a)   4/2     6/0      6/0       6/0
Currency   5/2       7/0     4/3      7/0       7/0
DNA        5/0       5/0     5/0      5/0       5/0
SMS        10/0      10/0    9/1      10/0      10/0
ZIP        6/4       6/4     5/5      6/4       6/4
Total      28/10     32/6    29/9     34/4      34/4

(a) No. of correct classifications/no. of mis-classifications.
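The ratios of correct classifications quoted in the text follow directly from the counts in Table 1. The short Python sketch below recomputes them; the dictionary layout and the function name are illustrative choices, while the numbers are copied from the table.

```python
from typing import Dict, List, Tuple

# (correct, misclassified) per category, in the order
# Weather, Currency, DNA, SMS, ZIP, as listed in Table 1.
TABLE_1: Dict[str, List[Tuple[int, int]]] = {
    "Name": [(2, 4), (5, 2), (5, 0), (10, 0), (6, 4)],
    "Input": [(4, 2), (7, 0), (5, 0), (10, 0), (6, 4)],
    "Output": [(6, 0), (4, 3), (5, 0), (9, 1), (5, 5)],
    "Average": [(6, 0), (7, 0), (5, 0), (10, 0), (6, 4)],
    "W. average": [(6, 0), (7, 0), (5, 0), (10, 0), (6, 4)],
}


def correct_ratio(cells: List[Tuple[int, int]]) -> float:
    """Share of correctly classified services over all services."""
    correct = sum(c for c, _ in cells)
    total = sum(c + m for c, m in cells)
    return correct / total


for metric, cells in TABLE_1.items():
    print(f"{metric}: {correct_ratio(cells):.2f}")
```

The operation name alone gives 28/38 (about 0.74), and both aggregated measures give 34/38 (about 0.89), which is the range stated in the discussion of the clustering results.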

5. Conclusion

The service orientation paradigm has already passed the critical mass in integrating and deploying enterprise applications. An imperative challenge is to achieve interoperability among services by means of proper discovery and selection of services and support for seamless data exchanges. To this end, the paper identified a comprehensive list of functional attributes and their numerical manipulations for better discovery and composition, and presented the information compatibility and mapping analysis for interoperable data exchanges. According to the type of the functional attribute, an FQoS metric incorporates one of the term, tree, and text similarity measures. In addition, we addressed the FQoS-driven approximate service discovery procedure. The preliminary simulation results demonstrate that the FQoS metrics are suitable for discovering proper web services. It is worth noting that the simulations were not intended to evaluate the performance of particular similarity measures. The information mapping is critical both to service discovery and selection, and to interoperable data exchanges among component services. We need to customize the general-purpose naïve FQoS metrics to be more specific to web service descriptions. Hence, future work includes consolidating the information compatibility measure, advancing web service description similarity measures such as those in (Kokash, 2006; Dong et al., 2004), and conducting additional experiments with a larger collection of industrial services.

References

Anicic, N., Marjanovic, Z., Ivezic, N., & Jones, A. (2007). Semantic enterprise application integration standards. International Journal of Manufacturing Technology and Management, 10(2/3), 205–226.


Chi, Y.-L., & Lee, H.-M. (2008). A formal modeling platform for composing web services. Expert Systems with Applications, 34(2), 1500–1507.

Dalamagas, T., Cheng, T., Winkel, K., & Sellis, T. (2006). A methodology for clustering XML documents by structures. Information Systems, 31(3), 187–228.

Day, J., & Deters, R. (2004). Selecting the best web service. In Proceedings of the 2004 conference of the centre for advanced studies on collaborative research (CASCON'04), (pp. 293–307).

Do, H., & Rahm, E. (2003). COMA – A system for flexible combination of schema matching approach. In Proceedings of the 29th international conference on very large data base (VLDB), (pp. 610–621).

Dong, X., Halevy, A., Madhavan, J., Nemes, E., & Zhang, J. (2004). Similarity search for web services. In Proceedings of the 30th VLDB conference, (pp. 372–383).

Flesca, S., Manco, G., Masciari, E., Pontieri, L., & Pugliese, A. Fast detection of XML structural similarity. IEEE Transactions on Knowledge and Data Engineering, 17(2).

Jarmasz, M., & Szpakowicz, S. (2003). Roget's thesaurus and semantic similarity. In Proceedings of conference on recent advances in natural language processing (RANLP), (pp. 212–219).

Jeong, B. (2006). Machine learning-based semantic similarity measures to assist discovery and reuse of data exchange XML schemas, Ph.D. thesis, Department of Industrial and Management Engineering, Pohang University of Science and Technology.

Jeong, B., & Cho, H. (2006). Feature selection techniques and comparative studies for large-scale manufacturing processes. International Journal of Advanced Manufacturing Technology, 28(9), 1006–1011.

Jeong, B., Kulvatunyou, B., Ivezic, N., Cho, H., & Jones, A. (2005). Enhance reuse of standard e-business XML schema documents. In Proceedings of international workshop on contexts and ontology: Theory, practice and application (C&O'05) in the 20th national conference on artificial intelligence (AAAI'05).

Jeong, B., Lee, D., Cho, H., & Kulvatunyou, B. (2007). A kernel method for measuring structural similarity between XML documents. In Proceedings of the 20th international conference on industrial engineering and other applications of applied intelligent systems (IEA/AIE-2007), (pp. 572–581).

Jeong, B., Cho, H., Kulvatunyou, B., & Jones, A. (2007). A multi-criteria web services composition problem. In Proceedings of the IEEE international conference on information reuse and integration (IRI 2007), (pp. 379–384).

Kokash, N. (2006). A comparison of web service interface similarity measures. In Proceedings of the European starting AI researcher symposium (STAIRS), (pp. 220–231).

Landauer, T., Foltz, P., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25, 259–284.

Menasce, D. A. (2002). QoS issues in web services. IEEE Internet Computing, 72–75.

Menasce, D. A. (2004). Composing web service: A QoS view. IEEE Internet Computing, 8(6), 88–90.

O'Brien, L., Bass, L., & Merson, P. (2005). Quality attributes and service-oriented architectures, Technical report CMU/SEI-2005-TN-014, Carnegie Mellon University.

Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004). WordNet::Similarity: Measuring the relatedness of concepts. In Proceedings of the 19th national conference on artificial intelligence (AAAI'04), (pp. 1024–1025).

Peltz, C. (2003). Web service orchestration and choreography: A look at WSCI and BPEL4WS. WSJ Feature, 1–5.

Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th international joint conference on artificial intelligence (IJCAI-95), (pp. 448–453).

Salton, G., Wong, A., & Yang, C. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.

Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. New York, NY: Cambridge University Press.

Shvaiko, P., & Euzenat, J. (2005). A survey of schema-based matching. Journal of Data Semantics IV, 3730, 14–171.

Wang, H.-C., Lee, C.-S., & Ho, T.-H. (2007). Combining subjective and objective QoS factors for personalized web service selection. Expert Systems with Applications, 32(2), 571–584.

Wu, Z., & Palmer, M. (1994). Verb semantics and lexical selection. In Proceedings of the 32nd annual meeting of the Association for Computational Linguistics, (pp. 133–138).

Wu, J., & Wu, Z. (2005). Similarity-based web service matchmaking. In Proceedings of the 2005 IEEE international conference on service computing (SCC'05), (pp. 287–294).

Zeng, L., Benatallah, B., Dumas, M., Kalagnanam, J., & Sheng, Q. (2003). Quality driven web services composition. In Proceedings of WWW2003, (pp. 411–421).

Zhang, Z., Li, R., Cao, S., & Zhu, Y. (2003). Similarity metric for XML documents. In Proceedings of workshop on knowledge and experience management (FGWM 2003).