COIN - Collaboration & Interoperability for Networked Enterprises
Project N. 216256
Deliverable D5.2.1b Information Interoperability Services Specifications M29 issue
Date 30/06/2010
D5.2.1b Information Interoperability Services Specifications
M29 issue
Author: Del Grosso TXT, Taglino CNR, Smith CNR
Contributors: Del Grosso TXT, Taglino CNR, Smith CNR
Dissemination: Public
Contributing to: WP 5.2
Date: 30.06.2010
Revision: V1.0
TABLE OF CONTENTS
1. EXECUTIVE SUMMARY
2. INTRODUCTION
2.1. WP 5.2 Innovative Information Interoperability Services
2.2. Metrics and Indicators
2.3. Structure of this deliverable
3. INTEROPERABILITY SPACES
4. INNOVATIVE SERVICES FOR SEMANTIC RECONCILIATION
4.1. Semantic Reconciliation Approach
4.2. Semantic Annotation
4.3. Mapping Discovery Service
4.4. Semantic Reconciliation Rule Generation Service
4.5. Source2Target Mediator Generation Service
4.6. Semantic Reconciliation Suite Platform
4.7. State of the Art
5. DATA PAYLOAD INTEROPERABILITY SERVICE
5.1. Updated requirements
5.2. Rules application
5.3. JRuleEngine
5.4. The negotiation process
6. INNOVATIVE SERVICES FOR FEDERATED INTEROPERABILITY
6.1. The COIN approach
6.2. Requested features
7. CONCLUSIONS
8. REFERENCES
1. Executive Summary
This deliverable specifies the Innovative Information Interoperability Services (from now on IIIS) that will be implemented and tested by month 24 of the COIN project.
The revised specifications of the IIIS have been defined starting from the results of the previous set of services released at month 18.
Those services were tested and evaluated by end users, and the new specifications were designed on the basis of that evaluation.
The IIIS introduce the concept of Interoperability Space, within which three groups of services have been defined:
- Innovative Services for Semantic Reconciliation: these start from the results obtained in the EU project ATHENA (ATHENA IP 507849) and develop new features that extend the translation between documents and ontologies and make it more efficient.
- Data Payload Interoperability Services: these analyse and study new ways to perform communication, coordination and exchange of business documents in interoperability spaces (1:1, 1:n and n:m communications).
- Innovative Services for Federated Interoperability: these analyse and study new ways to perform information interoperability in a federated space, where no common reference models are available.
2. Introduction
2.1. WP 5.2 Innovative Information Interoperability Services
SP5 is the COIN Sub-Project (SP) that deals with Enterprise Interoperability (EI) problems. It started with WP5.1, where a baseline of services was created starting from the results obtained in past projects.
The next steps in SP5 are the analysis and development of the Innovative Services, which will cover three different fields of Enterprise Interoperability:
- Information Interoperability in WP5.2,
- Knowledge Interoperability in WP5.3, and
- Business Interoperability in WP5.4.
The aim of WP5.2 is to develop new services that make it possible to:
- Exchange business documents (BODs) written in UBL (Universal Business Language) inside new interoperability spaces, evaluating several exchange patterns: 1:1, 1:n and n:m.
- Semantically annotate BODs in order to automatically derive the mediation rules necessary for the semantic reconciliation of formats and contents.
- Study how to create federated spaces where business documents can interoperate with each other without the need for a common reference model.
2.2. Metrics and Indicators
The COIN description of work, as well as deliverable D1.2.1 Quality Assurance Manual,
report the minimum number of services which must be implemented in the scope of WP5.2 in
order to reach a satisfactory base of services as a result of the work performed.
Such minimal requirements are summarized in the following table:
Table 1: Metrics and performance indicators for EI innovative services
Metric | Milestone M24 | Milestone M48
Number of Innovative Information Interoperability Services | 1 | 2
2.3. Structure of this deliverable
This deliverable is structured into eight chapters:
- Chapter 1 is the executive summary of this deliverable.
- Chapter 2 introduces the objectives and structure of this deliverable.
- Chapter 3 describes the interoperability spaces.
- Chapter 4 describes the innovative services for semantic reconciliation.
- Chapter 5 describes the data payload interoperability service.
- Chapter 6 describes the innovative services for federated interoperability.
- Chapter 7 presents the conclusions.
- Chapter 8 lists the references.
3. Interoperability Spaces
Work package 5.2 introduces the concept of Interoperability Space in the context of the COIN project.
By Interoperability Spaces we refer to a set of services whose purpose is to take into account all the possible kinds of data transformation that can be applied to documents.
Data Interoperability can be divided into two main branches:
- Payload interoperability: refers to the transformations applied to the content of the documents.
- Schema interoperability: refers to the transformations applied to the structure of the documents.
Schema interoperability, in turn, can be divided into two branches according to the approach followed for the transformation:
- Unified approach: implies the use of a reference meta-model for managing the transformations.
- Federated approach: implies the absence of a reference meta-model for managing the transformations.
Figure 5: Interoperability spaces structure
4. Innovative Services for Semantic Reconciliation
The goal of semantic interoperability is to allow the (seamless) cooperation of software applications that were not initially developed for this purpose, by using semantics-based techniques. In this context, the focus is on the exchange of business documents, both:
- between different enterprises, for instance to automatically transfer a purchase order from a client to a supplier, and
- within the same enterprise, for instance to share a document between departments that use different data organizations.
Relevant work in this field was done during the ATHENA project, where a Semantic Reconciliation suite for the reconciliation of business documents was developed. This suite has been brought into COIN as part of the Baseline Interoperability Services (D5.1.2).
For the COIN innovative services for semantic reconciliation, the approach is to start from the Semantic Reconciliation suite, as part of the baseline services, and to improve and enrich it, with particular focus on providing (semi)automatic support for, and optimizing, certain steps of the reconciliation process.
4.1. Semantic Reconciliation Approach
The semantic reconciliation approach is based on the use of a domain ontology as a common reference for the harmonization of heterogeneous data. This harmonization is accomplished in two distinct phases: a preparation phase and a run-time phase (see Figure 1).
Figure 1: Semantic Reconciliation Approach
In the preparation phase, the schemas of the documents to be shared by the software applications (say, RS1, RS2, RS3) are mapped against the reference ontology (RO). The mapping is a two-step activity that starts with the semantic annotation and ends with the building of semantic reconciliation rules.
The semantic annotation describes the elements composing a document in terms of the reference ontology, by identifying conceptual correspondences between a schema and the reference ontology. Nevertheless, semantic annotations do not contain enough information to perform actual data reconciliation. For this reason, operational semantic reconciliation rules are built. In particular, starting from the previous annotation, a pair of reconciliation rule sets is built for each document schema: a forward rule set and a backward rule set, which allow data transformation from the original format into the ontology representation and vice versa, respectively.
The run-time phase concerns the actual exchange of data from one application to another. For instance, when an application, say App3, wants to send a document to another application, say App2, the reconciliation between the format of RS3 and the format of RS2 is done by applying first the forward rule set of RS3 and then the backward rule set of RS2.
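The run-time step described above can be sketched as a composition of two rule sets. This is only an illustration under stated assumptions: the field names (OrderNo, WeightOz, order_id, weight_kg), the ontology keys prefixed with "ro:" and the rule functions are hypothetical and are not taken from the COIN suite.

```python
from typing import Callable, Dict

Document = Dict[str, object]
RuleSet = Callable[[Document], Document]

KG_PER_OUNCE = 0.0283495  # approximate conversion factor


def rs3_forward(doc: Document) -> Document:
    """Hypothetical forward rule set of RS3: RS3 format -> ontology representation."""
    return {
        "ro:orderId": doc["OrderNo"],
        "ro:weightKg": doc["WeightOz"] * KG_PER_OUNCE,
    }


def rs2_backward(onto: Document) -> Document:
    """Hypothetical backward rule set of RS2: ontology representation -> RS2 format."""
    return {
        "order_id": onto["ro:orderId"],
        "weight_kg": round(onto["ro:weightKg"], 3),
    }


# Each schema registers its own pair of rule sets; only the ones used here are filled in.
RULES: Dict[str, Dict[str, RuleSet]] = {
    "RS3": {"forward": rs3_forward},
    "RS2": {"backward": rs2_backward},
}


def reconcile(doc: Document, source: str, target: str) -> Document:
    """Apply the forward rules of the source schema, then the backward rules of the target."""
    ontology_view = RULES[source]["forward"](doc)
    return RULES[target]["backward"](ontology_view)


result = reconcile({"OrderNo": "PO-17", "WeightOz": 100}, "RS3", "RS2")
print(result)
```

Because every schema is mapped only against the reference ontology, adding a new schema requires one new pair of rule sets rather than one mediator per pair of schemas.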
The objective of this deliverable is to provide the final specification of the innovative services for Semantic Reconciliation and to highlight the enhancements with respect to the first release. In particular, we refer to the following services:
- Semantic Mapping Discovery Service: the aim is to develop a powerful mapping discovery service that helps in the identification of semantic correspondences between a document schema and the reference ontology.
- Semantic Reconciliation Rule Generation Service: like the semantic annotation, the building of reconciliation rules was previously a manual activity. To provide automatic support, rule generation is now guided by the reuse of the knowledge represented by the semantic annotation.
- Source2Target Mediator Generation Service: this service allows the generation and publication of a specific web service for the run-time translation of document instances between a given source schema and a given target schema.
4.2. Semantic Annotation
The specifications of the Semantic Annotation have changed substantially compared to the previous version, for two main reasons: i) to increase the expressive power of annotation expressions in order to cover more complex heterogeneities among the structures to be matched; ii) to allow an automatic translation into abstract reconciliation rules, i.e. first-order logic rules (see section 5.4).
4.2.1. A classification of Semantic Mismatches
In the ATHENA project, a number of possible kinds of mismatch were identified and classified. These mismatch categories are summarized in Tables 2 and 3. The examples reported in the tables are drawn from an e-procurement application.
Our classification follows to some extent the one in [7], but the kinds of mismatch have been divided into two broad categories: lossless and lossy mismatches. Lossless mismatches are cases in which an annotation can fully capture the intended semantics, while lossy mismatches are cases where it is not possible to find a semantic annotation that fully captures the intended semantics. Furthermore, we focus only on Conceptualization Mismatches, i.e. mismatches between two conceptualizations of the same domain that differ in the ontological concepts distinguished or in the way these concepts are related. We do not address other classes of mismatch, such as Explication Mismatches, related to differences in the way the entities are described, or Language Mismatches, related to the heterogeneity of the formalisms used for the definition of the conceptualizations.
Lossless mismatches
Name | Description | Example in Resource Schema | Example in Reference Ontology
Naming | Different labels for the same concept. | Request for quotation indicated as: RFQuote. | Request for quotation indicated as: RFQ.
Abstraction | Level of specialization/refinement of the information. The same concepts are recognized, but they are defined at different levels of abstraction. | A manager is an Employee who is the supervisor of some project. | The concept Manager is recognized as a specialization of Employee.
Structuring | The same set of concepts is modeled, but the way these concepts are structured by means of relations differs. | A Department is related to its controlled Projects and to the Employee that is its manager. | A Project is related to the supervisor Employee and to the controlling Department.
SubClass - Attribute value | An attribute, with a predefined value set, is represented by a set of subclasses, one for each value. | The type RawMaterial can be represented as an enumeration: (iron, copper) | or as two subclasses: iron subClassOf RawMaterial, copper subClassOf RawMaterial.
Class - Relation | A concept is represented as a relation. | A Product is related to a Buyer by a relation. | The concept Sale is related to a Product and to a Buyer.
Attribute Granularity | The same information is decomposed into a different number of attributes (or sub-attributes). | Telephone is represented as a single string. | Telephone is composed of hasPartCountryCode, hasPartAreaCode, hasPartLocalPhoneNumber.
Attribute Assignment | Two conceptualizations differ in the way they assign attributes to concepts. | Department has the attribute SupervisorName and controls some project. | Project has the attribute SupervisorName.
Complex Attribute | A set of attributes is grouped and represented as a concept. | Name, Address and PhoneNumber are attributes of Employee. | Name, Address and PhoneNumber are grouped in the concept ContactDetails, related to Employee.
Encoding | Different formats of data or units of measure. | Weight expressed in ounces. | Weight expressed in kilograms.
Table 2: Lossless mismatch categories
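To make the notion of a lossless mismatch concrete, the sketch below shows two of the categories in Table 2 (Encoding and Attribute Granularity) as pairs of transformations that can be undone. The hyphen-separated telephone format and the function names are assumptions made only for this illustration; the ontology attribute names come from the table.

```python
OUNCES_PER_KILOGRAM = 35.27396195  # 1 kg expressed in avoirdupois ounces


def ounces_to_kilograms(ounces: float) -> float:
    """Encoding mismatch: RS weight in ounces -> RO weight in kilograms."""
    return ounces / OUNCES_PER_KILOGRAM


def kilograms_to_ounces(kilograms: float) -> float:
    """Inverse transformation, RO -> RS."""
    return kilograms * OUNCES_PER_KILOGRAM


def split_telephone(telephone: str) -> dict:
    """Attribute Granularity mismatch: one RS string -> three RO parts.

    Assumes a hypothetical 'country-area-local' layout such as '+39-06-1234567'.
    """
    country, area, local = telephone.split("-")
    return {
        "hasPartCountryCode": country,
        "hasPartAreaCode": area,
        "hasPartLocalPhoneNumber": local,
    }


def join_telephone(parts: dict) -> str:
    """Inverse transformation, RO -> RS."""
    return "-".join(
        [
            parts["hasPartCountryCode"],
            parts["hasPartAreaCode"],
            parts["hasPartLocalPhoneNumber"],
        ]
    )


parts = split_telephone("+39-06-1234567")
print(parts)
print(join_telephone(parts))
print(kilograms_to_ounces(ounces_to_kilograms(16.0)))
```

In both cases a round trip gives back the original value (up to floating-point rounding for the unit conversion), which is what distinguishes these cases from lossy mismatches such as Precision.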
Lossy mismatches
Name | Description | Example in Resource Schema | Example in Reference Ontology
Overlapping | There is an intersection between the extensions of different concepts/attributes/roles. | Executive | Manager
Subsumption | There is an inclusion between the extensions of different concepts/attributes/roles. | Person | Employee
Categorization | Two conceptualizations distinguish the same concept but divide it into different and incomparable sub-concepts. | PublicOrganization and PrivateOrganization. | ResearchOrganization and BusinessOrganization
Coverage | Two conceptualizations do not model all the entities or information of a given domain. | preferredDeliveryDate is not considered in the RS. | preferredDeliveryDate is present in the RO.
Precision | The accuracy of information. | Size of a pallet expressed by three integer values: height, length, width. | Size of a pallet expressed by a constant conventional value: (small, medium, large).
Table 3: Lossy mismatch categories
4.2.2. Mismatch Templates
In the following we adopt standard notation from First-Order Logic (FOL) and Description Logic theory [2]. Regarding the formalism used to represent the RS and the RO, we do not restrict ourselves to any particular ontology language in this work. Instead, we use a generic conceptual model (CM), which contains common aspects of most semantic data models, UML, ontology languages such as OWL, and description logics. In the sequel, we assume that both the RS and the RO are represented using this generic CM. Specifically, A, B, C denote atomic concepts, i.e. sets of individuals; D denotes data types, e.g. String; P and Q denote atomic roles, i.e. binary relations between individuals; U and V denote attributes, i.e. relations between individuals and data values. Individuals are denoted by a and b, data values by d.
Concepts are organized in the familiar is-a hierarchy, and disjointness relations can be specified among them. Roles, and their inverses (which are always present), are subject to constraints such as the specification of domain and range. We represent a given CM using a directed and labelled ontology graph, which has concept nodes labelled with concept names, and edges labelled with role names. For each attribute of a concept, we create a separate attribute node. For expressive languages such as OWL, we also connect C1 to C2 by P if we find (by reasoning over the ontology) an existential restriction stating that each instance of C1 is related to some (or all) instances of C2 by P.
General roles, denoted R, can be:
- atomic;
- constructed as the inverse of a role, i.e. $P^-$;
- constructed as the composition of roles, i.e. $P_1 \circ \dots \circ P_n$, which represents the paths in the ontology graph traversing the edges $P_1, \dots, P_n$;
- constructed as the constrained composition of roles, which represents the paths in the ontology graph traversing given edges and given intermediate concept nodes.
General attributes, denoted Z, can be atomic or attribute chains, i.e. a composition of relations $R \circ U$, where $R$ is a (possibly constrained) composition of roles and U is an attribute. Complex concepts, denoted by C, can be constructed as the intersection of concepts, i.e. $A \sqcap B$, or by restrictions over roles and attributes. In particular we consider:
- $\exists R.A$, the set of individuals related to an individual instance of A by the general role R;
- $\exists Z.D$, the set of individuals related to a value ranging in D by the general attribute Z;
- $\exists R.\{a\}$, the set of individuals related to the individual a by the general role R;
- $\exists Z.\{d\}$, the set of individuals related to the value d by the general attribute Z.
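The ontology graph described above can be sketched as a small data structure. This is a minimal illustration, not the representation used by the COIN services: the class and method names are hypothetical, inverse roles are written with a trailing "-", and the example concepts are borrowed from the Structuring row of Table 2.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class OntologyGraph:
    # concept name -> list of (role name, target concept)
    edges: Dict[str, List[Tuple[str, str]]] = field(
        default_factory=lambda: defaultdict(list)
    )
    # concept name -> list of (attribute name, data type): the attribute nodes
    attributes: Dict[str, List[Tuple[str, str]]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def add_role(self, source: str, role: str, target: str) -> None:
        """Add a role edge and its inverse, which is always present."""
        self.edges[source].append((role, target))
        self.edges[target].append((role + "-", source))

    def add_attribute(self, concept: str, attribute: str, datatype: str) -> None:
        """Attach a separate attribute node to a concept."""
        self.attributes[concept].append((attribute, datatype))

    def role_chains(self, start: str, end: str, max_length: int = 3) -> List[List[str]]:
        """Enumerate role compositions, i.e. simple paths from start to end."""
        chains: List[List[str]] = []

        def walk(node: str, path: List[str], visited: frozenset) -> None:
            if node == end and path:
                chains.append(path)
                return
            if len(path) == max_length:
                return
            for role, nxt in self.edges[node]:
                if nxt not in visited:
                    walk(nxt, path + [role], visited | {nxt})

        walk(start, [], frozenset({start}))
        return chains


graph = OntologyGraph()
graph.add_role("Department", "controls", "Project")
graph.add_role("Project", "supervisedBy", "Employee")
graph.add_attribute("Employee", "Name", "String")

print(graph.role_chains("Department", "Employee"))
print(graph.role_chains("Employee", "Department"))
```

The second call returns the same path read backwards through the inverse roles, which is how a chain role in one model can be matched against a differently oriented relation in the other.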
Equivalence is denoted by $\equiv$, whose FOL counterpart is $\leftrightarrow$; subsumption (i.e. inclusion) is denoted by $\sqsubseteq$ (resp. $\sqsupseteq$), whose FOL counterpart is $\rightarrow$ (resp. $\leftarrow$).

Concept 2 Concept Mismatch Templates

Name: Atomic Concept
Related Mismatches: Naming
Description: There is an overlap between the instances of A and B.
Formal Notation:
Example:

Name: Conjunctive Concept
Related Mismatches: Naming, Abstraction
Description: There is an overlap between the instances of A and the intersection of two other concepts.
Formal Notation:
Example:

Name: Role Restriction
Related Mismatches: Naming, Abstraction
Description: There is an overlap between the instances of C and those instances of A that are related by the role R to instances of B.
Formal Notation:
Example:

Name: Role Restriction by individual
Related Mismatches: Naming, Abstraction
Description: There is an overlap between the instances of C and those instances of A that are related by the role R to the individual b.
Formal Notation:
Example:

Name: Attribute Restriction
Related Mismatches: Naming, Abstraction, Sub-class attribute
Description: There is an overlap between the instances of C and those instances of A that are related by the attribute Z to values of the type D.
Formal Notation:
Example:

Name: Attribute Restriction by value
Related Mismatches: Naming, Abstraction, Sub-class attribute
Description: There is an overlap between the instances of C and those instances of A that are related by the attribute Z to the value d.
Formal Notation:
Example:

Role 2 Role Mismatch Templates

Name: Atomic Role
Related Mismatches: Naming
Description: There is an overlap between the instances of P and Q.
Formal Notation: P
Example:

Name: Inverse Role
Related Mismatches: Naming, Abstraction, Organization
Description: There is an overlap between the instances of P and the inverse of Q.
Formal Notation:
Example:

Name: Chain Role
Related Mismatches: Naming, Abstraction, Organization
Description: There is an overlap between the instances of Q and the composite role.
Formal Notation:
Example:

Name: Constrained Chain Role
Related Mismatches: Naming, Abstraction, Organization
Description: There is an overlap between the instances of Q and the constrained composite role.
Formal Notation:
Example:

Attribute 2 Attribute Mismatch Templates

Name: Atomic Attribute
Related Mismatches: Naming, Attribute Assignment
Description: There is an overlap between the instances of U and V.
Formal Notation: U
Example:

Name: Constrained Attribute
Related Mismatches: Naming, Attribute assignment, Organization
Description: There is an overlap between the instances of V and those instances of U defined over instances of the concept C.
Formal Notation: C U
Example:

Name: Value transformation
Related Mismatches: Naming, Attribute assignment, Encoding
Description: There is an overlap between the instances of U and V, but their values have to be transformed by the application of a given function f.
Formal Notation:
Example:

Name: Attribute composition/aggregation
Related Mismatches: Naming, Attribute assignment, Granularity
Description: A set of attributes has to be aggregated to be translated into instances of V.
Formal Notation:
Example:

Complex Mismatch Templates

Name: Concept - Role
Related Mismatches: Concept - Role
Description: Instances of C are related by two roles to a pair of individuals a and b, which constitute the instances of Q.
Formal Notation:
Example:

Name: Complex attribute
Related Mismatches: Complex attribute
Description: There is an overlap between the instances of the attributes defined over instances of and the instances of defined over instances of related to instances of . Furthermore and are matched to.
Formal Notation:
Example:

Name: Subclass by Attribute Value
Related Mismatches: Subclass by Attribute Value
Description: There is an overlap between instances of related to the value d by the attribute U and the instances of related to instances of A by the role P.
Formal Notation:
Example:

Name: Attribute Assignment over chain roles
Related Mismatches: Attribute Assignment over chain roles
Description: There is an overlap between the instances of the attributes and the attributes defined over instances involved in the composite role and , respectively.
Formal Notation:
Example:
4.2.3. Semantic Annotation by Mismatch Templates
Given a Resource Schema (RS) and a Reference Ontology (RO), each describing a set of entities (concepts, roles and attributes), the Semantic Annotation of RS in terms of RO is a set of relations holding, or supposed to hold, between such entities. Basically, the Semantic Annotation SemAnn(RS,RO) is an alignment [8] made up of a set of instantiations of the templates discussed above. Such templates are grouped into four categories of annotations, namely Concept Annotations, Attribute Annotations, Path Annotations and Complex Annotations. In the following we introduce these notions in detail.
Concept Annotation (CA). A CA is a tuple, where:
- C1 is a concept of RS (see atomic concept mismatch template) or a set of concepts of RS intended as their conjunction (see conjunctive concept mismatch template);
- C2 is a concept of RO (see atomic concept mismatch template) or a set of concepts of RO intended as their conjunction (see conjunctive concept mismatch template);
- REL is the relation supposed to hold between C1 and C2. It may be an equivalence or a subsumption in either direction, specifying whether the mapping is bidirectional or unidirectional;
- R1 (resp. R2) is a set of restrictions, each in one of the following forms:
o a role restriction (see Role Restriction mismatch template);
o a role restriction by individual (see Role Restriction by individual mismatch template);
o an attribute restriction (see Attribute Restriction mismatch template);
o an attribute restriction by value (see Attribute Restriction by value mismatch template).
Attribute Annotation (AA). An AA is a tuple, where:
- U1 is an attribute of RS (see atomic attribute mismatch template) or a set of attributes of RS (see attribute composition/aggregation mismatch template);
- U2 is an attribute of RO (see atomic attribute mismatch template) or a set of attributes of RO (see attribute composition/aggregation mismatch template);
- FN is a function to be applied to U1 and U2 (see attribute composition/aggregation and value transformation mismatch templates), e.g. SPLIT, EQ, CAST, CONVERT, COUNT;
- CA is an optional concept annotation that can be specified to constrain the domain of U1 and U2.
Path Annotation (PA). A PA is a tuple, where:
- P1 is a role of RS (see atomic role mismatch template), or the inverse of a role (see inverse role mismatch template), or a (constrained) composition of roles (see chain role and constrained chain role mismatch templates);
- P2 is a role of RO (see atomic role mismatch template), or the inverse of a role (see inverse role mismatch template), or a (constrained) composition of roles (see chain role and constrained chain role mismatch templates);
- REL is the relation supposed to hold between P1 and P2. It may be an equivalence or a subsumption in either direction, specifying whether the mapping is bidirectional or unidirectional;
- CA1 (resp. CA2) is a concept annotation that can be specified to constrain the domain (resp. the range) of P1 and P2.
Complex Annotations. Complex annotations are constituted by a Path Annotation and a set of Attribute Annotations. Such expressions cover the complex attribute, subclass by attribute value and attribute assignment over chain roles mismatch templates.
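The four categories of annotation can be sketched as plain record types. This is a hedged illustration only: the class names, field names, the string encoding of REL ("equivalent", "subsumed_by", "subsumes") and the representation of restrictions as strings are assumptions, not the serialization used by the COIN services.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConceptAnnotation:
    c1: List[str]                 # concept(s) of RS, read as a conjunction
    c2: List[str]                 # concept(s) of RO, read as a conjunction
    rel: str                      # hypothetical encoding: "equivalent", "subsumed_by", "subsumes"
    r1: List[str] = field(default_factory=list)   # restrictions on the RS side
    r2: List[str] = field(default_factory=list)   # restrictions on the RO side


@dataclass
class AttributeAnnotation:
    u1: List[str]                 # attribute(s) of RS
    u2: List[str]                 # attribute(s) of RO
    fn: str                       # e.g. SPLIT, EQ, CAST, CONVERT, COUNT
    ca: Optional[ConceptAnnotation] = None        # optional domain constraint


@dataclass
class PathAnnotation:
    p1: List[str]                 # role, inverse role or role chain of RS
    p2: List[str]                 # role, inverse role or role chain of RO
    rel: str
    ca1: Optional[ConceptAnnotation] = None       # domain constraint
    ca2: Optional[ConceptAnnotation] = None       # range constraint


@dataclass
class ComplexAnnotation:
    path: PathAnnotation
    attributes: List[AttributeAnnotation]


# Example drawn from Table 2 (Attribute Granularity): one RS string, three RO parts.
telephone = AttributeAnnotation(
    u1=["Telephone"],
    u2=["hasPartCountryCode", "hasPartAreaCode", "hasPartLocalPhoneNumber"],
    fn="SPLIT",
)
employee = ConceptAnnotation(c1=["Employee"], c2=["Employee"], rel="equivalent")
telephone.ca = employee
print(telephone)
```

Keeping the annotations as structured records rather than free text is what makes it possible to translate them mechanically into reconciliation rules later on.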
4.3. Mapping Discovery Service
The objective of this service is to provide semi-automatic support for the discovery of semantic annotations of a structured business document schema (e.g., a purchase order schema), here referred to as the resource schema (RS), against a reference ontology (RO). Mapping
discovery is a hard task since, given a fragment of reality, there are infinitely many ways of modelling it, using different names, different relationships and different complex structures. We assume that the observed (and modelled) reality is faithfully represented by the RO and that the RS is therefore a sub-model, in semantic terms, of the former. The differences that appear on the terminological, syntactic and structural ground are then reconciled on the semantic ground.
In the literature the problem of mapping discovery (often referred to as Ontology or Schema Matching) has been widely addressed. However, the existing proposals have a limited scope, since they mainly address the mapping between individual elements (e.g., concepts and relationships) and only a few address complex sub-models as a whole. Furthermore, we go beyond logical correspondence (e.g., subsumption) by introducing a formalism to represent mappings based on the instantiation of a set of mismatch templates defining rule-based transformations. Our ultimate goal is to discover, in a (semi)automatic way, the set of operations that allow one structure to be transformed into another without loss of semantics. Since the support is semi-automatic, a final validation by a human actor is needed.
The Mapping Discovery Service provides semi-automatic support for the definition of Semantic Annotations. The user is involved in this task for the validation and revision of the intermediate and final proposed results. The strategy is depicted in Figure 2 and summarized here:
1. In the first step of the matching process we consider only lexical knowledge. We start by processing the entity labels of the two graphs to build a term similarity matrix. The similarity matrix reports a similarity value (between 0 and 1) for every pair of elements (A, B), where A belongs to RS and B belongs to RO. This is achieved by running in parallel a string similarity algorithm and a linguistic similarity algorithm. The former is based on the Monge-Elkan distance [1], while the latter is a slightly modified version of the SemSim criteria [4], based on the Lin measure [3] applied to the WordNet [5] lexical taxonomy (see section 5.3.1).
2. After the terminological analysis, relying on the similarity values computed in the previous step, a set of evidences is selected. An evidence is a pair of concepts (c1, c2), where c1 belongs to RS, c2 belongs to RO, their similarity value is high and they are adjacent to similar entities (see section 5.3.2).
3. Semantic Annotation Expressions are finally built by the Mismatch Detection algorithm. The mismatch detection algorithm implements a search strategy for every mismatch pattern provided as input, relying on the information collected in the previous two steps (see section 5.3.3).
Figure 2: Semantic Annotation Discovery Strategy
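The first two steps of the strategy can be sketched as follows. This is a simplified illustration under explicit assumptions: Python's difflib.SequenceMatcher stands in for the combined string and linguistic similarity, the threshold of 0.8 is arbitrary, "adjacent to similar entities" is read as "at least one pair of neighbours is itself similar", and the labels are invented. The third step (mismatch detection) is not covered.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Matrix = Dict[Tuple[str, str], float]


def term_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not the measure used by the service."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(rs_labels: List[str], ro_labels: List[str]) -> Matrix:
    """Step 1: one similarity value for every pair (A, B), A in RS and B in RO."""
    return {(a, b): term_similarity(a, b) for a in rs_labels for b in ro_labels}


def select_evidences(
    matrix: Matrix,
    rs_neighbours: Dict[str, List[str]],
    ro_neighbours: Dict[str, List[str]],
    threshold: float = 0.8,
) -> List[Tuple[str, str]]:
    """Step 2: keep similar pairs that are also adjacent to a similar pair."""
    evidences = []
    for (a, b), value in matrix.items():
        if value < threshold:
            continue
        supported = any(
            matrix.get((na, nb), 0.0) >= threshold
            for na in rs_neighbours.get(a, [])
            for nb in ro_neighbours.get(b, [])
        )
        if supported:
            evidences.append((a, b))
    return evidences


rs_labels = ["PurchaseOrder", "Buyer"]
ro_labels = ["purchaseOrder", "buyer"]
rs_neighbours = {"PurchaseOrder": ["Buyer"], "Buyer": ["PurchaseOrder"]}
ro_neighbours = {"purchaseOrder": ["buyer"], "buyer": ["purchaseOrder"]}

matrix = similarity_matrix(rs_labels, ro_labels)
evidences = select_evidences(matrix, rs_neighbours, ro_neighbours)
print(evidences)
```

Requiring structural support from the neighbours filters out pairs that are similar only by name, which reduces the number of candidates the user has to validate.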
4..3.1. Label Similarity Lsim
In order to assign a similarity value to a pair of labels from the two graphs we combine the
results of both a string similarity measure and a linguistic similarity measure. In the former,
we consider labels as sequences of letters in an alphabet, while in the latter we consider labels as bags of words of a natural language (English in our case) and we compute a similarity
value based on their meaning. The label similarity value Lsim between two labels is hence
obtained by taking i) the higher of the similarity values computed according to the two measures, if it is
greater than a threshold, ii) 0 otherwise.
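The combination rule above can be sketched in a few lines. This is only an illustration: the function name `lsim` and the default threshold of 0.5 are assumptions, since the deliverable does not state the threshold value.

```python
def lsim(string_sim: float, linguistic_sim: float, threshold: float = 0.5) -> float:
    """Combine two similarity scores in [0, 1] following the Lsim rule.

    The threshold default is a hypothetical value chosen for illustration.
    """
    # Keep the higher of the two scores ...
    best = max(string_sim, linguistic_sim)
    # ... but only if it exceeds the threshold; otherwise the labels do not match.
    return best if best > threshold else 0.0
```

For instance, `lsim(0.8, 0.3)` keeps the string score, while `lsim(0.4, 0.3)` is cut to 0 by the assumed threshold.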
String Similarity. We experimented with several string similarity measures and finally we selected
the Monge-Elkan distance [1], which was proposed as a field matching (i.e., record linkage)
algorithm. The Monge-Elkan distance measures the similarity of two strings s and t by evaluating
recursively every substring from s and t; this approach is also capable of supporting the analysis
of abbreviations or acronyms. To improve the accuracy of the algorithm, the input strings are normalized, i.e. characters are converted to lower case and special characters (digits,
whitespaces, apostrophes, underscores) are discarded.
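A minimal sketch of the normalization step described above, assuming that exactly the listed character classes (digits, whitespace, apostrophes, underscores) are dropped; the helper name `normalize_label` is hypothetical.

```python
import re


def normalize_label(label: str) -> str:
    """Lower-case a label and discard digits, whitespace, apostrophes and underscores."""
    lowered = label.lower()
    # \d = digits, \s = whitespace; apostrophe and underscore are listed explicitly.
    return re.sub(r"[\d\s'_]", "", lowered)
```

The normalized strings would then be passed to the string similarity measure.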
Linguistic Similarity. Our approach is based on the Lin Information-theoretic similarity [3]
applied to the lexical database WordNet [5], which is particularly well suited for similarity
measures since it organizes synsets (a synset is a collection of synonyms denoting a particular
meaning of a term) into hierarchies of ISA relations. Given two synsets of the WordNet
taxonomy, the Lin measure can be used to state their similarity depending on the increase of
the information content from the synsets to their common subsumers. In order to use the Lin
measure to compare two strings s and t, we consider such strings as word entries of WordNet,
and apply the Lin measure to all the pairs of synsets related to s and t, respectively. We then
define Ssim(s,t) as the highest computed similarity value, since we expect that words used in the RS and in the RO belong to the same domain, sharing the same intended meaning.
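For reference, the Lin measure in its usual formulation, and the resulting definition of Ssim, can be written as follows. The symbols IC (information content), p (probability of encountering a synset), lcs (the most informative common subsumer) and syn (the set of synsets of a word) are notation introduced here for illustration; the slightly modified variant used in the service may differ in detail.

```latex
\[
  sim_{Lin}(c_1, c_2) = \frac{2 \cdot IC(lcs(c_1, c_2))}{IC(c_1) + IC(c_2)},
  \qquad IC(c) = -\log p(c)
\]
\[
  Ssim(s, t) = \max_{c_s \in syn(s),\; c_t \in syn(t)} sim_{Lin}(c_s, c_t)
\]
```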
Entity labels are considered bags of words since they are, in general, compound words
(LegalVerification, contact_details). To tokenize a string into a bag of words we implemented a
label resolution algorithm looking (from right to left) for maximal substrings that have, after
stemming, an entry in WordNet; some special characters (e.g., _, -) are also taken into
account. In this way, we can also filter the noise in the labels, deleting substrings
recognized as prepositions, conjunctions, adverbs and adjectives that are not included in the
WordNet taxonomy.
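The label resolution step can be sketched as follows. This is a simplified illustration and not the implemented algorithm: a plain set of words (`vocabulary`) stands in for the WordNet lookup, stemming is omitted, and the function name `resolve_label` is hypothetical.

```python
import re
from typing import List, Set


def resolve_label(label: str, vocabulary: Set[str]) -> List[str]:
    """Split a compound label into known words, scanning each chunk right to left."""
    words: List[str] = []
    # Separator characters split the label into chunks resolved independently.
    for chunk in re.split(r"[\s_\-]+", label.lower()):
        found: List[str] = []
        end = len(chunk)
        while end > 0:
            # Try the longest substring ending at `end` first (maximal match).
            for start in range(end):
                if chunk[start:end] in vocabulary:
                    found.append(chunk[start:end])
                    end = start
                    break
            else:
                # No known word ends here: drop one character as noise.
                end -= 1
        # Words were collected right to left; restore reading order.
        words.extend(reversed(found))
    return words
```

With a vocabulary containing "date" and "birth" but not "of", the label dateOfBirth resolves to the two known words and the preposition is dropped as noise.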
Finally we can define an algorithm to compute a linguistic similarity value given two
labels, represented as bags of strings:
begin
Double: Lmatch(bag_of_strings term1, bag_of_strings term2)
  minL = min(term1.length, term2.length)
  maxL = max(term1.length, term2.length)
  totalscore = 0
  while (term1 and term2 notEmpty)
    score = max Ssim(s,t), for all s in term1 and t in term2
    totalscore += score
    remove s from term1 and t from term2
  denum = max(minL, minL + log(maxL - minL + 1))
  return totalscore / denum
end
The algorithm iteratively looks for the most similar pair of strings from term1 and term2, adding
their similarity value to totalscore. Then it returns totalscore divided by the number of
words in the label with fewer words; if the input terms have different sizes, the
denominator is increased by a logarithmic function to reduce their similarity. For example,
given the labels contact_info and RepresentativeDetails, we see that
sim(contact,representative)=0.768 and sim(info,details)=0.746. Therefore we calculate:
Lmatch([contact, info],[representative, details]) = 1.514/2 = 0.757
where the denominator is the number of words (after the label resolution) contained in the
shortest label.
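The Lmatch pseudocode and the worked example can be reproduced with the following sketch. The word-level similarity is passed in as a function (`ssim`); in the example it is a hypothetical lookup table holding only the two values quoted in the text, with every other pair assumed to score 0. The natural logarithm is also an assumption, since the base is not stated.

```python
import math
from typing import Callable, Sequence


def lmatch(term1: Sequence[str], term2: Sequence[str],
           ssim: Callable[[str, str], float]) -> float:
    """Greedy bag-of-words similarity following the Lmatch pseudocode."""
    left, right = list(term1), list(term2)
    if not left or not right:
        return 0.0
    min_len = min(len(left), len(right))
    max_len = max(len(left), len(right))
    total = 0.0
    while left and right:
        # Pick the most similar remaining pair of words.
        score, s, t = max(((ssim(s, t), s, t) for s in left for t in right),
                          key=lambda candidate: candidate[0])
        total += score
        left.remove(s)
        right.remove(t)
    # Labels of different sizes are penalised through a logarithmic term.
    denom = max(min_len, min_len + math.log(max_len - min_len + 1))
    return total / denom


# Hypothetical lookup table with the two values quoted in the text.
SCORES = {("contact", "representative"): 0.768, ("info", "details"): 0.746}


def table_ssim(s: str, t: str) -> float:
    return SCORES.get((s, t), 0.0)


example = lmatch(["contact", "info"], ["representative", "details"], table_ssim)
```

Both labels contain two words, so the logarithmic term vanishes and `example` equals 1.514/2.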
4.3.2. Neighbour Similarity and Evidence Selection
The goal of this step is to discover the set CE of concept evidences, i.e., pairs of concepts that
exhibit a high level of semantic similarity and that will be used as input for the mismatch
detection.
Neighbour Matching
The Neighbour Matching algorithm Nsim has been designed to overcome the limits of the
purely lexical approach followed by Lsim. Nsim computes the similarity of two concepts by
assigning a score to the similarity of their neighbours by a wedding approach. Given two
concepts A and B, Nsim is defined as follows:
Where:
NA (resp. NB) is the set of neighbour entities of A (resp. B), i.e.
o the incident roles of A (resp. B) in the ontology graph;
o the adjacent concepts of A (resp. B) in the ontology graph;
o the attributes of the adjacent concepts of A (resp. B) in the ontology graph.
Each entity of NA and NB can participate in one pair exclusively.
On the basis of Lsim and Nsim we define the following measure Sim, that takes into account
three different aspects in stating the similarity of two concepts:
1. The similarity between their labels;
2. The similarity between their neighbours;
3. The percentage of the similar entities among their neighbours.
Where:
NA (resp. NB) is the set of neighbour entities of A (resp. B);
M is the set of the pairs considered in the computation of Nsim(A,B);
and are constants.
A first set CE is computed according to the criteria described above. Sim is computed for
every pair of concepts belonging to RS and RO respectively, and the pairs (A,B) with
Sim(A,B) greater than a threshold are added to CE.
Taxonomic Mismatch Patterns
The set of concept evidences discovered in the previous step does not take into account the semantic
consistency with respect to the constraints (i.e. axioms) asserted in the two conceptualizations
to be matched. In particular there are two situations that may lead to undesired consequences:
1. Inconsistent evidences, e.g. the concept A is matched with both B1 and B2, but B1 is declared to be disjoint from B2. This may lead to state that some individuals of A
are individuals of both B1 and B2.
2. Cross-subsumption evidences, e.g. the concept A1 is matched with B1 and the concept A2 with B2, but A1 subsumes A2, while B2 subsumes B1. This may lead to state a cyclic
inclusion among A1, A2, B1, B2.
These situations may in some cases be detected by searching for the following taxonomic
patterns, which only approximate the consistency of the evidences, trying to avoid potentially
dangerous (and hence incorrect) evidences.
Sibling Concepts. This pattern requires that and there exists a concept
C such that are asserted axioms. constitute the sibling concepts set of A.
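Inconsistent Correspondences. Given a concept evidence (A,B), the set of evidences
inconsistent with it is defined as
$IE_{AB} = \{(X,Y) \in CE \mid (X = A \wedge Y \neq B \wedge B \sqcap Y \sqsubseteq \bot) \vee (Y = B \wedge X \neq A \wedge A \sqcap X \sqsubseteq \bot)\}$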
Cross Subsumption. Given a concept evidence (A,B), the set of cross-subsumption evidences
of (A,B) is defined as:
$CCE_{AB} = \{(X,Y) \in CE \mid (X \sqsubset A \wedge B \sqsubset Y) \vee (Y \sqsubset B \wedge A \sqsubset X)\}$
The set CE computed in the previous step is then refined by the search of the taxonomic
mismatch patterns discussed above, according to the following strategy:
1. Sibling Concepts sets are identified;
2. CE is ordered by decreasing values of similarity;
3. For every (A,B) in CE (iterating over decreasing values of similarity):
a. Remove from CE the set IE_AB;
b. Remove from CE the set CCE_AB;
4. Sibling Concepts sets are added to CE.
Basically we start from the evidences showing higher confidence, and we delete from CE the
set of evidences that may conflict with them.
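Steps 2 and 3 of this strategy can be sketched as below. The sketch is an approximation under stated assumptions: the sibling-concept handling of steps 1 and 4 is left out, an evidence is treated as inconsistent with (A,B) when it shares one concept with it and the other two concepts are declared disjoint, and as cross-subsuming when the subsumption directions on the two sides are opposite. The callables `disjoint` and `subsumes` and the function name are hypothetical stand-ins for queries to the two conceptualizations.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]


def refine_evidences(sim: Dict[Pair, float],
                     disjoint: Callable[[str, str], bool],
                     subsumes: Callable[[str, str], bool]) -> List[Pair]:
    """Keep the most similar evidences and drop those conflicting with them.

    disjoint(u, v): True if u and v are declared disjoint (assumed symmetric).
    subsumes(sup, sub): True if sup strictly subsumes sub.
    """
    ordered = sorted(sim, key=sim.get, reverse=True)
    removed: Set[Pair] = set()
    kept: List[Pair] = []
    for a, b in ordered:
        if (a, b) in removed:
            continue
        kept.append((a, b))
        for x, y in ordered:
            if (x, y) in removed or (x, y) in kept:
                continue
            # Same concept on one side, disjoint concepts on the other side.
            inconsistent = (x == a and disjoint(y, b)) or (y == b and disjoint(x, a))
            # Opposite subsumption directions on the two sides.
            crossed = ((subsumes(a, x) and subsumes(y, b))
                       or (subsumes(b, y) and subsumes(x, a)))
            if inconsistent or crossed:
                removed.add((x, y))
    return kept
```

Because the evidences are visited by decreasing similarity, a conflict is always resolved in favour of the evidence with the higher score.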
4.3.3. Mismatch Detection
In this step SemAnn(RS,RO) is populated by searching for the following mismatch templates in
the given order.
Atomic Concepts. For every (A,B) in CE, if (A,B) is a 1:1 match in CE, the corresponding concept annotation is
added to SemAnn(RS,RO).
Conjunctive Concepts. For every A in RS (resp. RO), if A is matched with other concepts
in CE, and they are not included in the sibling concepts set of A, the corresponding concept
annotations are added to SemAnn(RS,RO).
Disjunctive Concepts. For every A in RS (resp. RO), if A has an associated sibling concepts
set, the corresponding concept annotations are added to SemAnn(RS,RO).
Constrained Attributes. For every concept annotation ca between A and B in SemAnn(RS,RO),
given an attribute U having domain A and an attribute V having domain B, the attribute annotation between U and V is added to SemAnn(RS,RO) if:
1. Lsim(U,V) > lth;
2. There is no attribute X having domain A such that Lsim(X,V) > Lsim(U,V);
3. There is no attribute Y having domain B such that Lsim(U,Y) > Lsim(U,V).
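The three conditions amount to selecting attribute pairs that are mutual best matches above the threshold lth. A minimal sketch, assuming attributes are identified by name and that `lsim` is any label similarity function (the names below are hypothetical):

```python
from typing import Callable, List, Sequence, Tuple


def constrained_attribute_matches(attrs_a: Sequence[str], attrs_b: Sequence[str],
                                  lsim: Callable[[str, str], float],
                                  lth: float) -> List[Tuple[str, str]]:
    """Return the (U, V) pairs satisfying the three Constrained Attributes conditions."""
    matches = []
    for u in attrs_a:
        for v in attrs_b:
            score = lsim(u, v)
            # Condition 1: the similarity must exceed the threshold.
            if score <= lth:
                continue
            # Condition 2: no other attribute of A is a better match for V.
            if any(lsim(x, v) > score for x in attrs_a):
                continue
            # Condition 3: no other attribute of B is a better match for U.
            if any(lsim(u, y) > score for y in attrs_b):
                continue
            matches.append((u, v))
    return matches
```

A pair such as (addr, street) is rejected even when it is above the threshold if (addr, address) scores higher.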
Constrained Roles. For every pair of roles (R1,R2) such that:
1. there is a concept annotation CA1 involving the domains of R1 and R2;
2. there is a concept annotation CA2 involving the ranges of R1 and R2;
3. Lsim(R1,R2) > lth;
a path annotation between R1 and R2 is added to SemAnn(RS,RO).
Class-Attribute. This mismatch is identified if the following conditions hold:
1. (A1,A2) is involved in the concept annotation CA1;
2. An attribute U with domain A1 (resp. A2) is not involved in any attribute annotation
related to CA1;
3. There is a concept C having attributes V1...Vn adjacent to A1 (resp. A2) by means of the role R;
4. Lsim(U,R) > lth or Lsim(U,C) > lth;
5. C is not matched with any concept, except for A1 (resp. A2), in any concept annotation.
If this mismatch is identified, the following complex annotation is built:
(resp. ) (resp. )
If present, the concept annotation between C and A1 (resp. A2) is removed.
Complex Attribute. This mismatch can be considered as more general than the class-attribute
mismatch. It is identified if the following conditions hold:
1. (A1,A2) is involved in the concept annotation CA1;
2. A set of attributes U1...Un with domain A1 (resp. A2) is not involved in any attribute
annotation related to CA1;
3. There is a concept C having attributes V1...Vn adjacent to A1 (resp. A2) by means of the role R;
4. C is not matched with any concept, except for A1 (resp. A2), in any concept annotation;
5. Every Ui can be matched with a Vi with Lsim(Ui,Vi) > lth.
If this mismatch is identified, the following complex annotation is built, where every Ui and
Vi participate only in the match with higher similarity:
(resp. ) (resp. ) .
(resp. )
If present, the concept annotation between C and A1 (resp. A2) is removed.
Chain Role. This mismatch is identified if the following conditions hold:
1. A role R of RS (resp. RO) is not involved in any path annotation;
2. The domain of R is involved in a concept annotation CA1 and the range in a concept
annotation CA2;
3. R1....R2 is the shortest path in the ontology graph representation of RO (resp. RS) such that the domain of R1 is involved in CA1 and the range of R2 is involved in
CA2.
If this mismatch is identified, the following complex annotation is built:
(resp.
Attribute Assignment over chain roles. This mismatch is identified if the following conditions
hold:
1. p is a path annotation previously discovered;
2. U1...Un are the attributes having as domain some concepts involved in the path
(possibly composed of a single atomic role) of RS, and V1...Vn are the attributes having as
domain some concepts involved in the path (possibly composed of a single atomic role)
of RO;
3. U1...Un and V1...Vn are not involved in any attribute annotation;
4. Every Ui can be matched with a Vi with Lsim(Ui,Vi) > lth.
If this mismatch is identified, a complex annotation is built by adding to p an attribute annotation
for every (Ui,Vi), such that every Ui and Vi participate only in the match
with higher similarity.
Other heuristics. The last step aims at enriching the 1:n concept annotations with restrictions
over roles and attributes. To this end we adopt the algorithms described in [9] to discover role
restriction and attribute restriction templates.
4.4. Semantic Reconciliation Rule Generation Service
The objective of the Semantic Reconciliation Rule Generation service is to provide
semi-automatic support for the definition of backward and forward reconciliation rules (i.e.,
operational mappings) starting from the previously defined declarative mappings (Semantic
Annotations). A Semantic Annotation is not able to fully represent how to actually transform
data from one format to another. Nevertheless, the knowledge carried by these declarative
mappings is extremely useful for generating actual transformation rules.
The service works according to the following steps:
• Abstract rules generation. Starting from the previously defined semantic annotations,
an abstract representation of transformation rules (i.e., FOL rules) is generated,
following the FOL grounding presented for every Mismatch Template in Section
5.2.2;
• Rule validation and completion by a human user. Not all the knowledge needed for
generating a transformation rule is contained in the annotation. For instance, splitting one string (e.g., name) into two strings (e.g., firstname and surname) needs the
specification of a separator to identify the two substrings. In this phase, the human
user operates through a graphical user interface, whose objective is to shield the
user from the complexities of a formal syntax by showing the rules in a friendly way.
Here we intend as abstract reconciliation rules First Order Logic formulas of the form:
$\forall x.\,(\exists y.\,\phi(x,y) \rightarrow \exists w.\,\psi(x,w))$
where $\phi$ and $\psi$ are conjunctive formulas defined:
• in Forward Rules over the alphabets of RS and RO, respectively;
• in Backward Rules over the alphabets of RO and RS, respectively.
This kind of logic-based representation of schema mappings is known in the literature as GLAV
mappings or TGDs [6].
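As a purely illustrative instance of such a rule, consider the name-splitting case mentioned among the service steps. The predicates Person, name, firstname and surname are hypothetical and do not come from the actual resource schemas or reference ontology:

```latex
\[
  \forall p.\,\big(\exists n.\ \mathit{Person}(p) \wedge \mathit{name}(p, n)
  \;\rightarrow\; \exists f\, \exists s.\ \mathit{firstname}(p, f) \wedge \mathit{surname}(p, s)\big)
\]
```

The rule states that a first name and a surname exist for every named person, but not how f and s are computed from n. That missing piece (for example, the separator) is what the human user supplies in the validation and completion step.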
Figure 3 shows an example of forward reconciliation rule generation. The
upper part graphically represents a Complex Semantic Annotation, i.e. the
instantiation of the Complex Attribute + Attribute Composition mismatch templates, together
with the corresponding abstract reconciliation rule.
Figure 3: Example of abstract reconciliation rule generation
4.5. Source2Target Mediator Generation Service
The Source2Target Mediator Generation Service allows the generation and the publication of
a specific web service for the run-time translation of document instances between a given pair
of source and target schemas. It works according to the following steps:
• A source resource schema (SRS) and a target resource schema (TRS) are provided as input to the service, together with i) the reference ontology (RO) to be used in the
reconciliation process, ii) the abstract forward rules defined between SRS and RO, iii)
the abstract backward rules defined between TRS and RO.
• An executable representation of the two sets of rules is compiled. In particular, abstract rules are serialized into Jena2 rules, in order to allow their execution by the
reconciliation engine SIRE, based on the Transitive Rule Reasoner of the Jena2
toolkit.
• A web service S is automatically generated and published. This service takes as input the URI of an instance file conforming to SRS, and returns an instance file conforming to
TRS. S represents an indirection level, wrapping a customized execution of the
reconciliation engine SIRE.
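Conceptually, the generated service chains the forward rules (source schema to reference ontology) with the backward rules (reference ontology to target schema). The following sketch only illustrates that composition with plain Python functions and dictionaries; it is not the Jena2-based implementation, and all names, field keys and the sample data are hypothetical.

```python
from typing import Callable, Dict, List

Instance = Dict[str, str]
Rule = Callable[[Instance], Instance]


def make_mediator(forward_rules: List[Rule],
                  backward_rules: List[Rule]) -> Callable[[Instance], Instance]:
    """Build a source-to-target translator from forward and backward rules."""
    def mediate(source_instance: Instance) -> Instance:
        # Forward step: express the source data in terms of the reference ontology.
        ontology_view: Instance = {}
        for rule in forward_rules:
            ontology_view.update(rule(source_instance))
        # Backward step: express the ontology-level data in the target schema.
        target_instance: Instance = {}
        for rule in backward_rules:
            target_instance.update(rule(ontology_view))
        return target_instance
    return mediate


def split_name(doc: Instance) -> Instance:
    # Hypothetical forward rule: split a full name on the first space.
    first, _, last = doc["name"].partition(" ")
    return {"ro:firstname": first, "ro:surname": last}


def to_target(view: Instance) -> Instance:
    # Hypothetical backward rule: rename ontology properties to target fields.
    return {"givenName": view["ro:firstname"], "familyName": view["ro:surname"]}


mediator = make_mediator([split_name], [to_target])
translated = mediator({"name": "Ada Lovelace"})
```

Keeping the reference ontology as the pivot means that each schema only needs rules to and from the ontology, rather than one rule set per pair of schemas.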
4.6. Semantic Reconciliation Suite Platform
In Figure 4, a functional view of the Semantic Suite is depicted and its components are recalled here:
• Athos is the ontology management system; in the reconciliation process it acts as the
Ontology Repository through the Ontology Catalog interface. Athos is an
autonomous system, provided with a Web user interface.
• Semantic Annotation Tool (SAT) exposes the Annotation Definition interface that provides functionalities to define and edit Annotations between resources. The
Semantic Mapping Discovery Service is implemented within this module (Section 5.2
and 5.3).
• Semantic Abstract Rule Builder (SARB) is the reconciliation rule building tool; it exposes the Rule Building interface that provides functionalities to define
transformation rules (backward and forward). The Semantic Reconciliation Rule Generation Service is implemented within this module (Section 5.4).
• Semantic Interoperability Reconciliation Engine (SIRE) is the reconciliation engine; it exposes the Rule Execution interface that provides functionalities to
perform the actual data reconciliation between resource schemas.
• Semantic Interoperability Mediator Generator (SIMEG) exposes in the S2T Mediator Generation interface the functionalities of the Source2Target Mediator
Generation Service (Section 5.5).
• Resource Repository stores the schemas and the instances of the resources that have
to be reconciled.
• Annotation Repository and Rule Repository store Semantic Annotations and
Transformation Rules respectively.
• Reconciliation Suite Web App is the server-side web application that provides a
unified User Interface for the services of SARB, SAT, SIRE and SIMEG.
• User Web Browser is a user web client.
Figure 4: Functional View of the Reconciliation Suite Platform
The Reconciliation Suite has been implemented as a Java application. In particular, the web
application is based on the Google Web Toolkit. With respect to the first release of the
Reconciliation Suite Platform, some functionalities have been added, regarding user
administration and the management of the persistent resources.
4.7. State of the Art
This section presents some existing results addressing the problem of interoperability among
software applications. The section is divided into three sub-sections, which present standards
for document exchange, semantics-based platforms for document reconciliation, and methods
for mapping discovery, respectively.
International standards for business documents exchange
Universal Business Language (UBL) is a library of standard electronic XML business
documents such as purchase orders and invoices. UBL was developed by an OASIS Technical
Committee with participation from a variety of industry data standards organizations. The
UBL 2.0 Standard includes 31 documents in total, roughly grouped into the following
categories: Presale Ordering, Delivery, Invoicing, and Payment. The Core Components
Technical Specification (CCTS) defines meta models and rules necessary for describing the structure
and contents of conceptual and physical/logical data models, process models, and information
exchange models. Therefore, CCTS describes an approach for developing a common set of
semantic building blocks that represent the general types of business data in use today. This
approach provides for the creation of new business vocabularies as well as restructuring of
existing business vocabularies to achieve semantic interoperability of data.
Open Financial Exchange (OFX) is a unified specification for the electronic
exchange of financial data between financial institutions, businesses and consumers via the Internet. In particular, it defines the request and response messages used by each financial
service as well as the common framework and infrastructure to support the communication of
those messages.
The e-GIF defines the technical policies and specifications governing information
flows across government and the public sector. They cover interconnectivity, data integration,
e-services access and content management. The e-GIF is presented as a set of policies,
technical standards, and guidelines, which cover ways to achieve interoperability of public
sector data and information resources, information and communications technology (ICT),
and electronic business processes. The aim is to enable any agency to join its information,
ICT or processes with those of any other agency using a predetermined framework based on
open (i.e. non-proprietary) international standards.
The adoption of standards to face the problems of document exchange implies a strong
effort in the refactoring of legacy software applications, which is exactly what the semantic
reconciliation suite described here wants to avoid. However, the existence of standards is very
relevant, because they represent an important result in terms of description and organization
of business documents. As such, they are a crucial resource in the construction of the
reference ontology that is at the basis of the usage of the semantic reconciliation suite. For
instance, in the ATHENA project, the e-procurement ontology, concerning purchase order
and invoice, has been built considering some standards like UBL and RosettaNet.
Semantic reconciliation platforms
AMEF, the ARTEMIS Message Exchange Framework [11] for document reconciliation, is the
result of the ARTEMIS project. It allows the mediation of two OWL ontologies which represent the schemas of the documents to be reconciled. For this reason, the schemas of the
documents to be reconciled are first transformed into OWL by using a lift and
normalization process. The semantic mediation is realized in two phases: (i) Message
Ontology Mapping Process, where the two ontologies are mapped one to another, in order to
build Mapping definitions (i.e., transformation rules), with the support of the OWLmt
ontology mapping tool; (ii) Message Instance Mapping, where XML instances are first
transformed into OWL instances, and then Mapping definitions are applied to transform
messages from the original to the destination format.
MAFRA (MApping FRAmework) [21] is a framework for mapping distributed
ontologies. It is based on the definition of Semantic Bridges as instances of a Semantic Bridging ontology which represents the types of allowed bridges. Such bridges represent
transformation rules. Also in this case a lift and normalization activity is performed to
transform the original documents (schemas and data) into the ontology format. Afterwards,
considering the transformed schemas, the Semantic Bridges between the two parties are
created on the basis of the Semantic Bridge Ontology (SBO). With respect to the automatic
support provided by the platform, AMEF does not provide any facility: mappings and rules
have to be created manually. Concerning the MAFRA platform, very limited automatic
support is provided. Conversely, the main goal of our work is to provide effective
automatic support for those activities that are error-prone and time consuming (i.e., semantic
annotation and transformation rules building).
The Web Service Modeling Toolkit (WSMT) [18] is an integrated development environment for Semantic Web Services that enables developers to develop Ontologies, Web
Services, Goals and Mediators through the Web Service Modeling Ontology (WSMO)
formalism. The WSMT is implemented as a collection of plug-ins for the Eclipse framework
that cover several areas of functionality; among them the Mapping Perspective [24] is a tool
for defining mappings between ontologies. Mappings are defined through a formal model,
linked to a logic-based Abstract Mapping Language that does not commit to any existing
ontology representation language. Such mappings are then grounded to a concrete and
executable representation language, the WSML-Rule language designed for instance transformation. In this approach operational mapping rules are seen as a set of WSML axioms
that are evaluated by a WSML reasoner. WSMT also offers automatic support for mapping
discovery, accomplished by using a set of suggestion algorithms for both lexical and
structural analysis of the concepts. Concerning methodological aspects of the resource
mapping process, WSMT shares some analogies with the semantic reconciliation suite
approach. However, the WSMT logical architecture is designed to create mediation mappings
between a service requester and a service provider that use different conceptual models
(ontologies) to describe the same domain. On the other hand, the purpose of the semantic
suite is to allow interoperability within a network of software applications through the
adoption of a common and shared conceptualization of the domain (the reference ontology)
that provides a common view over the heterogeneous data sources. Furthermore, the WSMT is released as an Eclipse plug-in, which means a standalone application. On the contrary, the
semantic reconciliation suite is being implemented as a web application and consequently follows a
more service-oriented logic.
The Interoperability Service Utility (ISU) [16], developed within the scope of the
iSURF project, provides interoperability between different UN/CEFACT CCTS based
document standards (i.e., OAGIS, UBL, GS1). The proposed approach is centred on the
notion of Harmonized Ontology that contains two types of OWL-DL ontologies: (1) the
Upper Ontology that describes the CCTS artefacts as generic classes; (2) the Document
Schema Ontologies that describe the actual document artefacts for each electronic business
document standard as subclasses of the classes in the upper ontology. A Description Logic
Reasoner and a Rule Reasoner are used to identify the equivalence and subsumption relations in the Harmonized Ontology. The discovered similarities among the document artefacts are
then used to generate XSLT definitions for XML instance translation. The overall
methodological framework is very relevant to the proposed reconciliation suite; however the
ISU focuses on the integration of XML data defined with respect to CCTS-based standards, while
we intend to be more general, aiming at the integration of heterogeneous data without any
assumption regarding the semantics and the structure of the resource schemata. Furthermore,
the definition of the relations (DL-axioms or logic rules) between artefacts belonging to a
Document Schema Ontology and to the Upper Ontology is mainly a manual activity, while
our aim is to provide effective support for this activity, which is basically a mapping
discovery task.
Mapping discovery methods
In the recent period, the automation of mapping discovery (i.e., finding correspondences
or relationships) between entities of different schemas (or models) has attracted much attention
in both the database (schema matching) and AI (ontology alignment) communities (see
[17,26] for a survey). Schema matching and ontology alignment use a plethora of techniques
to semi-automatically find semantic matches. These techniques are based on different
information sources that can be classified into: intensional knowledge (entities and associated
textual information, schema structure and constraints), extensional knowledge (content and
meaning of instances) and external knowledge (thesauri, ontologies, corpora of documents,
user input). We can divide the methods/algorithms used in existing matching solutions into:
• rule-based methods, where several heuristics are used to exploit intensional knowledge, e.g. Prompt [25], Cupid [20];
• graph analysis, where ontologies are treated as graphs and the corresponding sub-graphs are compared, e.g. Similarity flooding (Melnik, Garcia-Molina, & Rahm,
2002), Anchor-prompt [25];
• machine learning based on statistics of data content, e.g. GLUE [13];
• probabilistic approaches that combine results produced by other heuristics, e.g.
OMEN [23].
Complex approaches, obtained by the combination of the above techniques, have
also been proposed (e.g., OLA [14]), as well as frameworks that provide extensible libraries of
matching algorithms and an infrastructure for the management of mappings (e.g., COMA
[12]). To the best of our knowledge, most of the previous matching approaches focus on
finding a set of correspondences (typically 1:1) between elements of the input schemas,
possibly enriched by some kinds of relationship (equivalence, subsumption, intersection,
disjointness, part-of, merge/split).
The construction of operational mapping rules (i.e., directly usable for integration and
data transformation tasks) from such kinds of correspondences is another challenging aspect.
Clio [15], taking as input n:m entity correspondences together with constraints coming from
the input schemas (relational schema or XSD), produces a set of logical mappings with formal semantics that can be serialized into different query languages (e.g., SQL, XSLT, XQuery).
MapOnto [10] can be viewed as an extension of Clio for the case in which the target schema is an ontology.
Most previous mapping constructors concentrate on creating executable mapping rules
between particular data models; data sources, however, follow many different data models
(e.g., XML, RDF, Relational, OWL). In our framework we allow general and rich
relationships (Semantic Annotations) that allow the mapping between a wide variety of data
models. Such general declarative mappings can then be used for the construction of
model-dependent operational mapping rules (e.g. conjunctive queries, SQL views, XSLT
transformations).
The main difference with respect to the ontology matching and mapping construction
approaches present in the literature is that these two steps are not seen as two separate tasks to
be executed in sequence. On the contrary, in the proposed mapping discovery algorithm the
matching phase is driven by the search for templates that can be seen as abstractions of
mapping rules (and hence a mapping construction task). The output of the mapping discovery
service is then a declarative mapping, close to the output of an ontology matching algorithm,
but capable of capturing complex correspondences that are directly interpreted as complex
mapping rules, to be used in a mediation task.
5. Data Payload Interoperability Service
The specifications of the Data Payload Interoperability Service have changed substantially
compared to the previous version.
After the evaluation of the first set of services, described in deliverable D5.2.1a, the majority
of the comments were that the services were too collaboration-oriented.
Moreover, the set of services dealing with Data Interoperability lacked tools for
managing document payload transformations (those transformations that act on the
content of the document, rather than on its format).
For these reasons we decided to change the specification and to develop a service whose focus
is the content transformation of business documents.
5.1. Updated requirements
The new service has one very specific requirement: automation.
The service needs to be highly automated in its procedures.
After a necessary setup step that defines the user's environment, the service should be able
to apply that environment to the submitted document and return the results.
The following table summarizes the other technical requirements expressed for the Data
Payload Interoperability Service:
ID | Name of the feature | Description of the feature
REQ 1 | Creation of negotiation | The service should allow the possibility to create a new negotiation, by providing at least name and description.
REQ 2 | 1:1 and 1:N negotiation scenarios | The service should provide the possibility to select multiple users for the negotiation, in order to enable the 1:N scenario.
REQ 3 | Creation of business rule | The service should provide the possibility to create business rules.
REQ 4 | Management of business rules | The service should allow managing the business rules: look at the content and delete them.
REQ 5 | Definition of user roles | The service should give the possibility to define the rules for different roles, depending on whether the user is the creator of the negotiation or just a participant.
REQ 6 | Application of rules | The service should give the possibility to select which rules to apply to specific negotiations and see the results.
5.2. Rules application
The most efficient way to meet the automation requirement is the application of business
rules.
Business rules are expressions in the form IF-THEN-ELSE, which define the
behaviour of the software when some conditions are (or are not) verified.
Such rules are executed by a rule engine and can be written in a structured language, which
can be based on a standard format (like XML) or be proprietary to the engine.
The syntax of the rules themselves is always engine-specific, since every engine understands only its own language.
Many business rule engines are available on the net; here are some examples of those
evaluated:
Drools - http://drools.org/
OpenRules - http://openrules.com
Mandarax - http://mandarax.sourceforge.net/
SweetRules - http://sweetrules.projects.semwebcentral.org
TermWare - http://www.gradsoft.ua/products/termware_eng.html
JRuleEngine - http://jruleengine.sourceforge.net/
JLisa - http://jlisa.sourceforge.net/
JEOPS - http://sourceforge.net/projects/jeops/
Prova - http://comas.soi.city.ac.uk/prova
Open Lexicon - http://openlexicon.org
Zilonis - http://www.zilonis.org
Hammurapi - http://www.hammurapi.biz
For the development of the Data Payload Interoperability Service we decided to use
JRuleEngine.
5.3. JRuleEngine
JRuleEngine (http://jruleengine.sourceforge.net/) is a Java rule engine based on
Java Specification Request (JSR) 94.
It has been selected among the other candidate engines because of its ease of use and
the fact that the rules can be written in a very simple XML-based format.
Another very useful functionality is the possibility to wrap the execution of the rules around
methods defined at code level.
This makes it possible to enrich the behavior of the rules by adding logic to the
execution of the IF-THEN statements.
The code below represents an example of the definition of a rule in the JRuleEngine language:
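[Listing: JRuleEngine rule definition (XML); only the name "RuleExecutionSet1" and the description "Rule Execution Set" are recoverable.]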
The engine allows the definition of multiple IF statements (connected with the logical AND
operator) and multiple THEN statements.
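The IF/THEN structure described here can be illustrated with a minimal Python sketch. It is not JRuleEngine code and does not use its XML syntax or the JSR 94 API; the `Rule` class, the `bulk_discount` rule and the order fields (`quantity`, `unit_price`, `status`) are hypothetical names chosen only to show several IF conditions combined with AND and several THEN actions executed in sequence.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


@dataclass
class Rule:
    """Hypothetical rule: all IF conditions must hold (AND) before the THEN actions run."""
    name: str
    conditions: List[Callable[[Document], bool]] = field(default_factory=list)
    actions: List[Callable[[Document], None]] = field(default_factory=list)

    def fire(self, doc: Document) -> bool:
        """Run every THEN action if all IF conditions hold; report whether the rule fired."""
        if all(cond(doc) for cond in self.conditions):
            for action in self.actions:
                action(doc)
            return True
        return False


# Assumed example rule on a simplified order: large proposed orders get a discount and a new status.
bulk_discount = Rule(
    name="bulk-discount",
    conditions=[
        lambda d: d["quantity"] >= 100,       # IF quantity is at least 100
        lambda d: d["status"] == "proposed",  # AND the order is still a proposal
    ],
    actions=[
        lambda d: d.update(unit_price=round(d["unit_price"] * 0.9, 2)),  # THEN lower the price
        lambda d: d.update(status="counter-proposed"),                   # THEN mark the order
    ],
)

order = {"quantity": 120, "unit_price": 10.0, "status": "proposed"}
fired = bulk_discount.fire(order)
```

In this sketch a rule whose conditions are not all satisfied leaves the document untouched, which mirrors the AND semantics of the IF statements.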
5.4. The negotiation process
The core of the Data Payload Interoperability Service is the creation and
management of business rules to enable the negotiation of business documents in 1:1 and 1:N
scenarios.
The documents used within the service are UBL orders.
The service gives users the possibility to define business rules and select which ones to
apply to specific negotiations.
The service is composed of three main parts:
Negotiation creation: it allows the creation of new negotiations; the user can specify
the name and description of the negotiation and select the participants (one or many)
allowed to take part in it.
Rules creation: it allows the creation and management of business rules.
Rules are created with a specific role (sender and/or receiver) according to the
foreseen use of the rule.
Sender rules (the sender is the person who creates the negotiation) are used when the negotiation was
created by the current user, while Receiver rules (the receiver is a participant in the negotiation, selected
by the sender) are used when the current user is participating in a negotiation he
does not own.
Rules application: this is the core part of the service. It allows the selection of the
specific negotiation to evaluate and the selection of which rules to apply to the
negotiation.
The service decomposes each 1:N negotiation into several 1:1 negotiations for easier
management inside the service logic.
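A minimal sketch of this decomposition, under the assumption that a negotiation is described by a name, a description, a sender and a list of receivers; the `Negotiation` class and the `decompose` function are hypothetical and are not taken from the service implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Negotiation:
    """Hypothetical negotiation record: one sender, one or many receivers."""
    name: str
    description: str
    sender: str
    receivers: Tuple[str, ...]


def decompose(negotiation: Negotiation) -> List[Negotiation]:
    """Split a 1:N negotiation into one 1:1 negotiation per receiver."""
    return [
        Negotiation(negotiation.name, negotiation.description, negotiation.sender, (receiver,))
        for receiver in negotiation.receivers
    ]


one_to_many = Negotiation("order-42", "Negotiation of a UBL order", "alice", ("bob", "carol"))
one_to_one = decompose(one_to_many)
```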
When a negotiation is selected for management, the system automatically applies all the rules of
the participant user, without making this visible to the current user.
This means, for example, that if person X is the creator of a negotiation and is
currently evaluating it, the service will automatically apply all the rules defined by the
participant for that negotiation.
Since the order of the rules matters in the rules application system (the
application of one rule can change the content seen by the next one), the logic of the service is to
apply the rules of the participants first, and then the rules of the current user.
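The effect of this ordering can be shown with a small sketch. Rules are modelled here as plain Python functions from document to document, which is an assumption made for illustration; `apply_rules`, `cap_quantity` and `compute_total` are hypothetical names and the order fields are simplified.

```python
from typing import Any, Callable, Dict, Iterable

Document = Dict[str, Any]
RuleFn = Callable[[Document], Document]


def apply_rules(document: Document,
                participant_rules: Iterable[RuleFn],
                current_user_rules: Iterable[RuleFn]) -> Document:
    """Apply the participant's rules first, then the current user's rules, on a copy."""
    result = dict(document)
    for rule in list(participant_rules) + list(current_user_rules):
        result = rule(result)
    return result


def cap_quantity(doc: Document) -> Document:
    """Assumed participant rule: never accept more than 50 units."""
    return {**doc, "quantity": min(doc["quantity"], 50)}


def compute_total(doc: Document) -> Document:
    """Assumed current-user rule: derive the total from quantity and unit price."""
    return {**doc, "total": doc["quantity"] * doc["unit_price"]}


order = {"quantity": 80, "unit_price": 2.0}
negotiated = apply_rules(order, [cap_quantity], [compute_total])
swapped = apply_rules(order, [compute_total], [cap_quantity])
```

With the participant's cap applied first, the total is computed on 50 units; if the two rule sets were swapped, the total would still reflect the original 80 units, which is why the service fixes the order.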
6. Innovative Services for Federated Interoperability
The federated approach is characterized by the absence of any reference meta-model that
can be used to reconcile documents from one format to another.
The Federated Interoperability service works on the structure of UBL documents, and gives
the user the possibility to choose how to perform the transformation and reconciliation of
different formats.
6.1. The COIN approach
The approach adopted in COIN is the composition of micro-services.
The UBL document is analyzed at schema level and decomposed into several parts, each one
representing a single main node of the document.
For each part several possible transformations exist, either provided according to the target document
format or uploaded by end users.
For this version of the service the target domain will be the transformation of the Swedish
invoice (UBL 1.0) to the Turkish invoice (UBL 2.0).
The system will offer a default set of micro-services which perform transformations based on
XSLT, but the user has the possibility to upload different ones according to his specific needs.
The users can then select and combine different micro-services to get a more complex and
complete transformation.
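The composition idea can be sketched as follows. The service itself is described as XSLT-based; in this sketch plain Python functions stand in for the XSLT micro-services so that the example needs only the standard library. The element names (`ID`, `IssueDate`, `Note`, `InvoiceID`, `Date`, `TargetInvoice`) and the helpers `rename` and `compose` are hypothetical and do not reproduce the actual Swedish or Turkish invoice schemas.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

Transform = Callable[[ET.Element], ET.Element]


def rename(new_tag: str) -> Transform:
    """Build a micro-service that copies a node under a new tag name."""
    def transform(node: ET.Element) -> ET.Element:
        out = ET.Element(new_tag, dict(node.attrib))
        out.text = node.text
        out.extend(list(node))
        return out
    return transform


def compose(document: ET.Element, services: Dict[str, Transform], root_tag: str) -> ET.Element:
    """Apply the selected micro-service to each main node; copy nodes with no service unchanged."""
    target = ET.Element(root_tag)
    for part in document:
        service = services.get(part.tag)
        target.append(service(part) if service else part)
    return target


source = ET.fromstring(
    "<Invoice><ID>42</ID><IssueDate>2010-06-30</IssueDate><Note>test</Note></Invoice>"
)
# The user selects one micro-service per part; parts without a selection pass through.
selected = {"ID": rename("InvoiceID"), "IssueDate": rename("Date")}
result = compose(source, selected, "TargetInvoice")
output = ET.tostring(result, encoding="unicode")
```

Keeping one small transformation per main node is what lets a user replace a single default micro-service with an uploaded one without touching the others.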
6.2. Requested features
This chapter summarizes the features required of the service.
Selection of document formats: the user must be able to select which kind of
documents he wants to apply the transformation to.
Presentation of single parts: the service must be able to decompose the structure of
the selected document into smaller parts that are independent of one another.
Default transformation: the system must provide a basic way of performing
the transformation.
Manual transformation: the system must give the user the possibility to upload
private transformations for the selected parts.
Testing: each transformation (default or private) must be testable by the user.
Composition of micro-services: the user must be able to select different parts and use
a set of micro-services to compose a more complex and complete transformation.
7. Conclusions
WP5.2 introduces the concept of Interoperability Space.
This is a set of services whose purpose is to take into account all the possible kinds of data
transformation which can be applied to documents.
Data Interoperability can be divided into two big branches:
Payload interoperability: refers to the transformations applied to the content of the
documents.
Schema interoperability: refers to the transformations applied to the structure of the
documents.
Schema interoperability, in turn, can be divided into two branches according to the approach
followed for the transformation:
Unified approach: implies the use of a reference meta-model for managing the
transformations.
Federated approach: implies the absence of a reference meta-model for managing
the transformations.
For each of the three main groups (payload, unified and federated) WP 5.2 has developed a
set of services.
The Innovative Services for Semantic Reconciliation group, in a unified environment, a set
of functionalities that provide effective automatic support for the definition and execution of
expressive mappings between heterogeneous resources, with the aim of providing a
reconciliation framework for eBusiness resource exchange.
The Data Payload Interoperability Service works on the content of the documents and is
used in the negotiation process of UBL orders.
The Innovative Services for Federated Interoperability work on the structure of the
documents and propose an approach where no reference meta-models are available;
instead, the users are free to provide personal transformations.
8. References
1. The field matching problem: Algorithms and applications. Monge, A. and Elkan, C. 1996. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. pp. 267-270.
2. D. Nardi, R. J. Brachman. An Introduction to Description Logics. In the Description Logic Handbook, edited by F. Baader, D. Calvanese, D.L. McGuinness, D. Nardi, P.F. Patel-Schneider, Cambridge University Press, 2002, pages 5-44.
3. An information-theoretic definition of similarity. Lin, D. 1998. In Proc. 15th International Conference of Machine Learning (ICML). pp. 296-304.
4. Formica A., Missikoff M., Pourabbas E., Taglino F. Weighted Ontology for Semantic Search. R. Meersman and Z. Tari (Eds.): OTM 2008, Part II, LNCS 5332, pp. 1289-1303, 2008.
5. Fellbaum, C. WordNet: An Electronic Lexical Database. The MIT Press, 1998.
6. Schema mappings, data exchange, and metadata management. Kolaitis, P. G. 2005. PODS. pp. 61-75.
7. P. R. S. Visser, D. M. Jones, T. J. M. Bench-Capon, M. J. R. Shave, An analysis of ontological mismatches: Heterogeneity versus interoperability, in: AAAI 1997 Spring Symposium on Ontological Engineering, Stanford, USA, 1997.
8. Euzenat, J., Shvaiko, P.: Ontology Matching. 2007.
9. Ritze, D., Meilicke, C., Zamal, O., Stuckenschmidt, H.: A pattern-based ontology matching approach for detecting complex correspondence. Chantilly 25.10.2009. In: Ontology Matching 2009. CEUR-WS, 2009.
10. An, Y., Borgida, A., & Mylopoulos, J. (2005). Constructing Complex Semantic Mappings Between XML Data and Ontologies. International Semantic Web Conference, (pp. 6-20). Galway, IE.
11. Bicer, V., Laleci, G., Dogac, A., & Kabak, Y. (2005). Providing Semantic Interoperability in the Healthcare Domain through Ontology Mapping. eChallenges Conference. Ljubljana, Slovenia.
12. Do, H., & Rahm, E. (2002). COMA - A system for flexible combination of Schema Matching Approaches. Very Large Databases, (p. 610-621). Hong Kong, CN.
13. Doan, A., Madhavan, J., Domingos, P., & Halevy, A. (2003). Ontology Matching: A Machine Learning Approach. In S. Staab, & R. Studer (Ed.), Handbook on ontologies (p. 397-416). Berlin, DE: Springer.
14. Euzenat, J., & Valtchev, P. (2004). Similarity-based ontology alignment in OWL-Lite. European Conference on Artificial Intelligence, (p. 333-337). Valencia, ES.
15. Haas, L. M., Hernandez, M. A., Ho, H., Popa, L., & Roth, M. (2005). Clio grows up: from research prototype to industrial tool. In Proceedings of the international conference on Management of data, (p. 805-810). Baltimore, US.
16. Kabak, Y., Dogac, A., Ocalan, C., Cimen, S., & Laleci, G. B. (2009). iSurf Semantic Interoperability Service Utility for Collaborative Planning, Forecasting and Replenishment. eChallenges Conference. Istanbul, Turkey.
17. Kalfoglou, Y., & Schorlemmer, M. (2003). Ontology mapping: the state of the art. The Knowledge Engineering Review, 18 (1), 1-31.
18. Kerrigan, M., Mocan, A., Tanler, M., & Fensel, D. (2007). The Web Service Modeling Toolkit - An Integrated Development Environment for Semantic Web Services. European Semantic Web Conference. Innsbruck, AT.
19. Kolaitis, P. G. (2005). Schema mappings, data exchange, and metadata management. Principles of Database Systems, (p. 61-75). Baltimore, US.
20. Madhavan, J., Bernstein, P. A., & Rahm, E. (2001). Generic schema matching with Cupid. Very Large Data Bases, (p. 49-58). Roma, IT.
21. Maedche, A., Motik, B., Silva, N., & Volz, R. (2002). MAFRA - a mapping framework for distributed ontologies. International Conference on Knowledge Engineering and Knowledge Management, (p. 235-250). Siguenza, ES.
22. Melnik, S., Garcia-Molina, H., & Rahm, E. (2002). Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching. International Conference on Data Engineering, (p. 117-128). San Jose, CA US.
23. Mitra, P., Noy, N. F., & Jaiswal, A. R. (2005). Ontology Mapping Discovery with Uncertainty. International Conference on the Semantic Web (ISWC), (p. 537-547). Galway, IE.
24. Mocan, A., & Cimpian, E. (2007). An Ontology-b