THE CREATION AND USE OF ONTOLOGIES FOR COMPUTERISED INFORMATION SYSTEMS:
A CRITICAL EXPLORATION
A study submitted in partial fulfilment of the requirements for the degree of
Master of Science in Information Management
at
THE UNIVERSITY OF SHEFFIELD
by
BRYN LOBAN
September 2003
TABLE OF CONTENTS
1. The Tower of Babel: What is an Ontology? 1
1.1. A Parasitical Relationship: Legacy Systems 5
1.2. Ontological Form: Expression 6
1.3. Current Ontological Projects 9
2. The Babel Syndrome: Why use Ontologies? 11
2.1. Human Interoperability: Semantic Integration 14
2.1.1. The Promise of Better Information Management 17
2.2. Machine Interoperability: Semantic Intelligence 19
2.2.1. The Semantic Web 21
2.2.2. The Formal Aspect 25
3. Building the Tower of Babel: Ontological Engineering 27
3.1. Engineering of Ontologies: General Guidelines 28
3.1.1. Gathering and Identification of the Raw Material 29
3.1.2. Shared Conceptualisation 30
3.1.3. Conceptual Scope: Multiplicity and Singularity 35
3.2. Methodology: Representational Ontology Languages 40
3.2.1. Representational Ontology Languages 42
3.2.1.1. DAML+OIL and OWL 42
4. Conclusion: Towards Babel: Will Ontologies Work? 45
4.1 Mapping Complexity: the Need for Tools 50
4.2. Ontological Building Environment 54
4.3. OntoP2P: Consensus-Building Environment 58
Abstract: The dissertation undertakes a critical exploration of the role, creation
and use of ontologies for Computerised Information Systems (CIS) and seeks to
provide a framework for the issues relevant to an ontological conception of CIS.
Based on the data obtained from the main research projects currently being carried
out, as portrayed in the literature, this dissertation will critically explore ontologies
for CIS, for which research and practice on a major scale is still in its infancy. No
significant comparative data on either the usability or the construction aspects of
ontologies for CIS is yet available. Bearing these points in mind, the investigation
will be based on a critical analysis of various international projects and research
groups. Ultimately, the critical exploration will focus on mapping the topology of
current research and drawing conclusions from an analysis of the data obtained, in
order to establish the nature and usability of ontologies for CIS. The methodology
used is inductive, i.e. reasoning from particular evidence arising from case
situations, as reported in the literature, and drawing general conclusions based on a
systematic analysis of the evidence. The dissertation starts with a definition of the
subject and thus provides a detailed outline of the nature of an ontological approach
to CIS. Then an in-depth exploration is carried out of the reasons why it is
considered necessary to develop and construct ontologies, supplemented with a
detailed discussion of the problematic which the ontological approach is attempting
to overcome. In addition, a systematic analysis is made of how the ontological
approach is being built and achieved, both conceptually and in practice. The main
critical conclusions will be drawn in the last section as the various aspects of
ontological expression defined are interdependent. Several conclusions will be
drawn by identifying the problems encountered and, by implication, deducing
potential solutions for a way forward that may be of interest for future development
and research.
1. The Tower of Babel: What is an Ontology?
The term “Ontology” was developed in the field of “artificial intelligence”,
which took the concept from philosophy and applied it to computerised or robotic
systems. Subsequently, the rise of highly powered, networked and online data
repositories has, in recent years, prompted increased research interest in “ontology”
applied to the more circumscribed world of “computerised information
systems” (CIS) (Guarino, 1998).
On the most basic and philosophical level, ontologies are nothing new. Any
system, the function of which is to classify and manage information, by describing
and thereby representing data as an “abstract” model expression, can be viewed as an
“ontology”.
As such, the major “ontologies” in use pre-date the present use of the concept
of ontology. There are many “classification” systems in use, without which it would
be impossible to manage, search and retrieve information residing in various data
repositories. These information management systems are expressed mostly in the
form of thesauri, taxonomies, controlled vocabularies and directories.
These so-called established “ontologies” (what would now be called
“lightweight ontologies”) range from comprehensive library systems, like the
“Dewey Decimal Classification” system, the Yahoo and DMOZ Web directories and
schemas for databases, to current metadata systems like “Dublin Core” and others,
which manage information online as well as offline.
A clear, unequivocal distinction between an ontology and other information
management systems is not easy to draw. This is because, in practice, the
expression of an ontology (i.e. its form) is not so different from the traditional
information management forms mentioned above.
However, ontology, in the ambit of CIS, has recently (since the early 1990s)
taken on a new and more distinct definition and role. This shift of emphasis has
taken place due to the increased accessibility and speed of retrieval brought about by
new information technologies, such as the WWW, which have simultaneously
brought the “information overload” problem to critical levels.
As Guarino (1998) expresses it, an ontology:
“… in the simplest case…describes a hierarchy of concepts
related by subsumption relationships; in more sophisticated
cases, suitable axioms are added in order to express other
relationships between concepts and to constrain their intended
interpretation.”
It is the latter, more sophisticated, case that presently serves as the more
appropriate definition of the theory and practice of ontology and which distinguishes
it from other “classifying” systems in existence. Ontology enhances traditional
thesauri and information management systems, by trying to develop a deeper
semantics within “digital objects”, both conceptually and relationally.
The current interest in ontology lies in its potential to describe or represent
more of the “semantics” of a field of knowledge (or “metadata” that explicitly
represents the “semantics” of a data domain) in both a human-understandable way
(by establishing consistency and consensus) and, more importantly, in a computer-
processible way. It is this latter sphere which is one of the more ambitious and
critical aspects of ontological practice for CIS.
The most widely used definition of ontologies for “Computerised Information
Systems” is given by Gruber (1993:199): “An ontology is a formal, explicit
specification of a shared conceptualization” .
Conceptualization means an abstract model of data in a CIS “world”, which
identifies the relevant and, therefore, main concepts of the specific information
domain. Formal refers to the fact that the ontology must be machine-understandable
(i.e. the data has to be structured in a “logical content” way). Explicit means that the
type of concepts used and the constraints on their use are explicitly or axiomatically
defined. Finally, Shared signifies that an ontology is about consensual “knowledge
representation”, which is not restricted to an individual but must be accepted by a
group of “agents”.
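Gruber’s four criteria can be made concrete with a small sketch. The following is a minimal, hypothetical illustration, not any real ontology library: the class names (Concept, Ontology) and the medical fragment are invented here purely to show how a formal (machine-computable), explicit (axiomatically defined) conceptualisation might look in code.

```python
class Concept:
    """An explicitly defined concept: a name plus a subsumption link."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # "is-a" link to a broader concept

class Ontology:
    """A formal specification: a machine-processable set of concepts."""
    def __init__(self):
        self.concepts = {}

    def define(self, name, parent=None):
        self.concepts[name] = Concept(name, parent)

    def is_a(self, name, ancestor):
        """Formal subsumption check, computable by a machine."""
        c = self.concepts.get(name)
        while c is not None:
            if c.name == ancestor:
                return True
            c = self.concepts.get(c.parent)
        return False

# A toy "shared conceptualisation" of a medical domain fragment.
medicine = Ontology()
medicine.define("ClinicalFinding")
medicine.define("Disease", parent="ClinicalFinding")
medicine.define("Diabetes", parent="Disease")

print(medicine.is_a("Diabetes", "ClinicalFinding"))  # True
```

The “shared” criterion is not visible in the code itself: it lies in the prior consensus of the group of agents on which concepts to define and how to relate them.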
Another similar definition of an ontology is: “a shared formalization of a
conceptualization of a domain” (Grüninger and Uschold, 1996:111). A domain
refers to a specific subject area of human knowledge/information, such as medicine,
real estate, finance, business, etc.
An ontology is a classification methodology for formalising a knowledge
domain in a structured way. The true core of an ontology revolves around the theory
and practice of the “shared conceptualization”: a “neutral” description or theory of a
given domain which is acceptable to, and can be reused by, all information gatherers
in a particular domain. An ontology is designed with the aim of identifying and
establishing consensus through “semantic commonalities” within a circumscribed
knowledge domain.
In other words, a definition of an ontology as a “shared conceptualisation” is
an extreme, or extended, form of the information management practices previously in
use. The major distinction, however, is that an ontology seeks to explicitly
represent/model any domain of knowledge contained in CIS in a far more complex
manner by capturing more of the meaning/semantics than has hitherto been achieved.
An ontology is engineered to permit a higher expressive structure or “knowledge
representation”, i.e. to have the ability to express relationships between data entities,
as well as the entities/concepts themselves.
An ontology’s “shared conceptualisation” is an “engineering artefact”, which
is basically: “constituted by a specific vocabulary used to describe a certain reality,
plus a set of explicit assumptions regarding the intended meaning of the vocabulary
words” (Guarino, 1998). It is not enough for a “real” ontology to describe
data/content with metadata tags and controlled vocabularies, etc. An ontology also
needs to represent the background or context and, therefore, the relations between the
data units (i.e. the “semantics” involved). Furthermore, this needs to be expressed in
a formally explicit manner in order to be readable by machines.
Ultimately, if an ontology is going to be an improvement over traditional
information management systems, for humans and, especially, for machine
processing, it must capture the intended meaning of concepts and statements in any
domain of interest. To achieve this an ontology must aim to bring out the semantic
commonalities within extensive bodies of detailed and specialised knowledge.
Furthermore, it must be able to express/represent some degree of tacit or “meta-level
knowledge” (i.e. implicit and background knowledge).
A simple example serves to illustrate this. In Newton’s second law of motion,
the expression “F=ma” states that force is the product of mass and
acceleration. However, this association can only be made against a web of
background/context knowledge. That contextual semantics needs to be represented
explicitly when engineering the ontology: “F=ma” is a mathematical expression, with its
related concepts of equations, parameters and variables; it invokes a
significant body of expert knowledge in terms of the concept called “mass”; and,
in this context, it is not the same kind of expression as the formula for
electrical voltage (Ohm’s law), and so on, along the semantic chain.
The knowledge that is required to understand Newton’s law, which “means”
and describes the acceleration of an object under the influence of a force, is not
obvious, either for a computer for which it merely represents a string of characters, or
for humans who need to explicitly represent the main conceptuality of the physics
domain, to which they, as a group, belong and on which they have to reach
consensus.
Therefore, an ontology of the Physics domain would need to represent, in an
explicit manner, a selection of the “principal” concepts and semantic background
underlying the system of Physics: for instance, the understanding that a formula can
be a mathematical description of a process or that mathematical variables represent
certain quantities related to Physics.
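The point can be sketched in code. The structure below is purely illustrative (the concept names and the dictionary layout are assumptions, not a real ontology format): it shows how an ontology binds each symbol of “F=ma” to an explicit domain concept, so that the formula is more than a string of characters to the machine.

```python
# Hypothetical fragment of a Physics ontology: the context of "F = m * a"
# is represented explicitly, distinguishing it from, say, Ohm's law.
physics = {
    "Formula": {"is_a": "MathematicalExpression"},
    "NewtonSecondLaw": {
        "is_a": "Formula",
        "text": "F = m * a",
        "variables": {
            "F": "Force",         # each symbol is bound to a domain concept,
            "m": "Mass",          # not left as an uninterpreted character
            "a": "Acceleration",
        },
        "asserts": "Force is the product of Mass and Acceleration",
    },
}

def interpretation(ontology, formula, symbol):
    """Resolve a symbol to its intended domain concept."""
    return ontology[formula]["variables"][symbol]

print(interpretation(physics, "NewtonSecondLaw", "m"))  # Mass
```

Without such explicit bindings, the same characters could equally well denote quantities from another domain; with them, a machine can constrain the intended interpretation.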
Basically, an ontology is designed to be a reusable construct or information
management system, which represents the “common” semantic assumptions
underlying a “knowledge base” within the same domain and/or sub-domains.
1.1 A Parasitical Relationship: Legacy Systems
Another way to characterise an ontology is to note that, in practice, it is rarely
built from “free”, unstructured documentation. Most current CIS retrieve and manage
information with the aid of taxonomies, thesauri and controlled vocabularies, etc.
Currently the main role and function of ontologies is to synthesise/integrate these
various information management structures. In essence, the true nature of an
ontology is to operate at this higher meta-level, due to the fact that it homogenises
and/or merges pre-existing knowledge bases.
An ontology acts as a “neutral” meta-level description/representation of a
knowledge base, by engineering shared/common assumptions underlying that base.
By operating on this high meta-level, an ontology should provide a more synthetic,
“neutral description” or theory of a given knowledge domain, which can thus be
accepted and reused by all the information gatherers using that particular domain.
1.2 Ontological Form: Expression
In theory, an ontology should represent, to the extent possible, the semantic
complexity of human knowledge. However, in practice, this is extremely difficult to
achieve. Grüninger and Uschold (1996) describe the various “forms” of ontologies
across a spectrum, from more informal (expressed in natural language vocabularies
with the relations between them) at one extreme, to rigorously formal (refined terms
with theorems and axioms, etc.) at the other extreme, and various combinations in
between.
Uschold offers a useful definition of what form or expression an ontology may
take in practice:
[an ontology] “may take a variety of forms, but necessarily it will
include a vocabulary of terms, and some specification of their
meaning. This includes definitions and an indication of how
concepts are inter-related which collectively impose a structure
on the domain and constrain the possible interpretations of
terms.” (Uschold et al, 1998:32)
An ontology, like any other “taxonomic” structure in use, may be expressed
as an interface mapping out concepts, instances and the relations between them.
Moreover, an ontology may have a more linguistic aspect to its taxonomic structure,
resembling previous CIS “controlled vocabularies” and thesauri formats.
A commercial company, “Applied Semantics” (AS, 2003), illustrates this with
its CIS named CIRCA: “…the soul of CIRCA is its ontology”, which is described as:
“... consisting of millions of words, meanings, and their
conceptual relationships to other meanings in the human
language…with more than 1.2 million words, half-a-million
concepts, and tens of millions of relationships - CIRCA matches
words and phrases to its ontology, performs linguistic analysis,
disambiguates them into meanings, and weighs those meanings
by importance….” (Marla, 2002:35)
Typically, an ontology identifies the important classes (categories of objects)
of a domain and organises these classes in a sub-class hierarchy. Each class is
characterised by properties that are shared/inherited by all elements in that class,
which, taken together, provide detailed, consistent distinctions in a holistic structure.
Thus, an ontology, like any traditional taxonomic form, illustrates associative
and hierarchical relationships among concepts, with the major distinction that it
embodies a far more elaborate taxonomic framework. In theory at least, this
ontological taxonomy should incorporate machine processing powers, i.e. automatic
inference and deduction rules.
Tim Berners-Lee sums this up succinctly:
“The most typical kind of ontology for the Web has a taxonomy
and a set of inference rules… the taxonomy defines classes of
objects and relations among them. For example, an address may
be defined as a type of location, and city codes may be defined to
apply only to locations, and so on.” (Berners-Lee et al,
2001)
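Berners-Lee’s example can be sketched as follows. This is a minimal illustration under assumed names (the class labels and rule are invented for the sketch, not taken from any Web standard): a taxonomy supplies the subsumption link, and a single inference rule uses it to constrain where city codes may apply.

```python
# Taxonomy: "Address" is defined as a type (subclass) of "Location".
subclass_of = {"Address": "Location"}

def is_location(cls):
    """Walk the subsumption chain upwards, looking for 'Location'."""
    while cls is not None:
        if cls == "Location":
            return True
        cls = subclass_of.get(cls)
    return False

def may_have_city_code(cls):
    """Inference rule: city codes may be attached only to locations."""
    return is_location(cls)

print(may_have_city_code("Address"))  # True: inferred via Address -> Location
print(may_have_city_code("Person"))   # False: the constraint rejects it
```

The machine never stores the fact that an address may carry a city code; it deduces it from the taxonomy plus the rule, which is precisely the “automatic inference” the ontological taxonomy is meant to add.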
The goal of an ontology is to seek to offer a collection of terms and
relationships within a homogeneous structure, i.e. an explicit encoding of knowledge
in a domain, or that spans sub-domains. Such knowledge-representation needs to
have the potential to be shared and reused by different information systems
communities (acting as a “dictionary” of the specific domain).
Ultimately, this “ontological representation” is the consensual agreement on
the concepts and relations characterising the specific information of the domain in
question, resulting in a shared understanding of a domain that is processible by both
computers and human beings. This dual aspect will be explored in more detail
further on.
In conclusion, it is the purpose for which the ontology will be used that will
eventually determine its final engineered form. Recent research offered in the
literature focuses purely on the more developed and complex forms of ontologies, as
they are crucial for covering the wide scale needed for current CIS applications. This
is what is referred to in Gruber's initial theoretical definition of ontologies.
1.3 Current Ontological Projects
At this stage, a review of some of the research projects currently in progress is
necessary, in order to understand the implications of ontologies for CIS --- it would
be no exaggeration to say that the praxis of ontology is an attempt to map out all
fields of human knowledge containable within a CIS.
The potential of ontologies to provide an “objective” specification of a
domain of information, by representing a consensual agreement on the concepts and
relations characterising the way knowledge in that domain is expressed, is aimed at
supporting information management in various fields.
The following is a brief outline of various ontological research projects,
spanning several knowledge domains (some features of these projects will
subsequently be explored in greater detail).
As reported in the literature, several research groups are currently involved in
the creation of ontologies in the biomedical field. The current most ambitious
project is “IFOMIS” (2003), the goal of which is to develop a “formal ontology” that
will be applied to the “whole” domain of medical science (Smith, 2002).
Another development in the biomedical domain is the “Gene Ontology
Consortium” (GO, 2003), which aims to produce a “controlled vocabulary” that can
be applied to all organisms, taking into account the fact that knowledge of gene and
protein roles in cells is in a perpetual state of evolution. Representing genetic data
necessitates a flexible ontological structure --- a structure that is capable of changing
as the knowledge of genetics changes. The Gene Ontology (GO) is used to
“annotate/describe” gene data, at various levels, mainly in three knowledge domains:
the molecular function, the biological process and the cellular component.
Several biomedical research groups have annotated their databases according
to the “controlled vocabularies” contained in the ontologies produced by the GO
Consortium.
In a similar vein, the goal of the “Plant Ontology Consortium” (PC, 2002) is
to produce an ontology that can be applied to plant-based database information.
A medical ontology project is currently underway, with the aim of building a
wide scale ontology called the “UMLS Semantic Network” (McCray, 2003).
The “Agricultural Ontology Service Project” (FAO, 2003b) includes several
ontological projects based on the task of aggregating agricultural information.
One related project is the “Fishery Ontology Service” (FOS) which is a FAO
based project (Pisanelli et al, 2002) for the “fishery” information domain.
Research has been carried out on a potential ontology for the geographical
domain (Kokla and Kavouras, 2001).
“Applied Semantics” (AS, 2003) is selling its various information
management services to the pharmaceutical, biotechnology and financial service
industries, etc., based on its ontology called “CIRCA”.
One interesting aspect of the CIRCA ontology is that it was bought by
Google, the most effective search engine to date, in April 2003 (Google,
2003). This recent development indicates the importance that ontologies seem to be
acquiring for the future of CIS.
The existence of such projects leaves the proponents of ontologies with the
idea that no future computerised information systems can be designed without
adopting an ontological approach or, as Nicola Guarino, the President of one of the
main ontological research groups, OntoLab (2003), says, without binding CIS “to the
perspective of ontology driven information systems.” (Guarino, 1998).
2. The Babel Syndrome: Why use Ontologies?
In the ambit of CIS: “We design ontologies so we can share knowledge with
and among...agents.” (Gruber, 1995:908).
In the same way that a group of people, working together, need to have an
“agreement” on the basic meaning and definition of the words they employ in their
communications, CIS, as a community of “agents” (whether human or machine),
need a common mechanism for agreeing on the precise semantics of terms, in order
for them to be able to effectively communicate/exchange their data/content.
Data repositories use specific classification terminology and concepts to
represent and process the information or data/content that is received and stored within
a given knowledge domain, but this is not necessarily understood by all “agents”.
Indeed, CIS have always faced the problem of what might be termed the “Tower
of Babel syndrome”: the multiplicity of languages creates the problem of
communication and understanding, making the act of “translation” necessary. In the
ontological context, this takes the meaning of constructing an intermediary or bridge,
i.e. a representational “lingua franca” or one “meta-language”.
“Ontology-driven” CIS are necessary because “if no man is an island”, “no
system is an island” either. What could be deduced from this is that the phenomenon of
“globalisation” in the economic and social sphere can only continue to develop if there
is an equivalent form of “globalisation” in the CIS sphere. In order to share and reuse
data, from one CIS data repository to another, there is a growing need for data or
knowledge integration.
To be more precise, CIS interoperability has two aspects related to the
capability of information systems to exchange/share data. One is purely syntactic,
i.e. technical/functional interoperability. The other, more important, aspect of CIS
interoperability is the semantic component, i.e. the need for “semantic
interoperability” (the representation of the content of the data repositories).
“Semantic interoperability” is essential in order for CIS to exchange information
received from other systems. This is the main role for which ontologies are
designed --- to be used by a group of people who share an essential need to
communicate information through the medium of CIS data repositories in their
specific field of interest.
In summary, based on the above observations, two principal applications for
“semantic interoperability” in CIS can be highlighted: “semantic integration” at the
human level and “semantic intelligence” at the machine level. Both these inter-
related aspects will now be explored in detail.
2.1 Human Interoperability: Semantic Integration
In practice, the effective demand for interoperable CIS comes from groups or
organisations, such as the medical, scientific or business domains, which need to
share a considerable quantity of content/data within their specific domain.
Human knowledge creation is caught in an eternal tension between the
effectiveness of specialist groups, acting independently, and the need for them to
integrate with the wider domain to which they belong. Most commonly, a specialist
group may develop “innovations” and this tends to produce a sub-domain. How the
resulting data is classified by their information management system is not necessarily
understood, or interpreted, in the same way by other information systems in the
domain.
In more general terms, when, for the sake of accessibility (mainly to facilitate
the search and retrieval process), an attempt is made to integrate information from
different CIS repositories, incompatibilities arise in the terminological or conceptual
representation of the content/data. Partridge (2002) states that:
“More and more enterprises are currently undertaking projects
to integrate their applications. They are finding that one of the
more difficult tasks facing them is determining how the data from
one application matches semantically with the data from the
other applications.”
Semantic heterogeneity may occur in CIS in instances where there are two or
more systems of concepts/terms covering, more or less, the same universe of
information, or when information expressed in one system cannot be exactly
expressed in the other, or when other semantic mismatches occur.
“Not only is there a need to determine when different databases
are talking about the same thing (within a paradigm) - but also
of determining how to map the things that exist according to one
paradigm into the ‘same but different things’ that exist according
to another (across paradigms). There is also the task of
determining which paradigm underlies each database and which
should underlie the unified database.” (Partridge, 2002)
In the final analysis, an ontology is a unified framework or paradigm, which
acts as a common reference “taxonomy”. As such, it is designed to resolve (or
reduce) the conceptual or terminological problems encountered when integrating
disparate data. An ontology aims to develop a homogeneous base from existing
heterogeneous terminological sources, by establishing a semantically explicit
conceptual consensus.
“Semantic interoperability”, by the means of “semantic integration”, is
simultaneously the aim and result of a successful ontology, which is, itself, the
outcome of an engineered shared conceptualisation. Thus, the underlying premise of
this engineering praxis is that it should be possible to construct a classification
system by “cross-calibrating” disparate CIS data residing in a specific domain, as if
translating from one language into another, and establishing a benchmark taxonomic
framework.
In practice, this “semantic integration” process would need to involve a
substantial “semantic equaliser” element, in order to provide “relations” and
equivalences, as well as ensuring the absence of polysemy, with the direct result of
achieving a certain degree of semantic compatibility with the other CIS content
repositories in the domain. This would facilitate communication and sharing
between different CIS within the domain, enabling them to be accessed by different
applications/agents.
The following examples highlight, in greater detail, the nature of the
problematic which an ontology is designed to resolve:
In the medical domain, as in many other fields, nomenclatures (e.g.
standardised, controlled vocabularies) have been used for the purpose of managing
and retrieving its knowledge base. The domain of medicine has historically used:
“…many terminological systems, i.e. classifications, thesauri, vocabularies,
nomenclatures and various unstructured coding systems, each of them being
designed for a specific purpose.” (Burgun et al, 2001:96). Each of the above formats
for accessing and retrieving information in the medical domain reflects: “the
diversity of goals, approaches, and achievements across the different medical
communities according to their main orientation, which may be patient’s care,
research or public health.” (Burgun et al, 2001:96).
These various perspectives make it difficult to agree even on accepted
medical terminological classifications. This has the implication that, in most cases,
not only do medical terms and definitions differ between groups but, more
importantly, different groups use identical terms with different meanings. Even a
concept such as “gene”, which is common to all groups, is used with a different
semantic focus by the major genomic databases (Burgun et al, 2001). In summary,
different databases may use identical “labels” but with different meanings;
alternatively the same meaning may be expressed using different labels.
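This label/meaning mismatch, and the ontological remedy for it, can be sketched briefly. The mappings below are entirely invented for illustration (the concept identifiers and the two databases are assumptions, not real genomic resources): each local label is mapped to a shared ontology concept, and agreement is judged at the concept level rather than the label level.

```python
# Local label -> shared ontology concept (all identifiers invented here).
db_a = {"gene": "SequenceRegion",  # database A: a gene is a DNA region
        "protein": "Polypeptide"}
db_b = {"gene": "FunctionalUnit",  # database B: a gene is a unit of heredity
        "protein": "Polypeptide"}

def same_meaning(label_a, label_b):
    """Two labels agree only if both map to the same shared concept."""
    ca, cb = db_a.get(label_a), db_b.get(label_b)
    return ca is not None and ca == cb

print(same_meaning("gene", "gene"))        # False: same label, different concepts
print(same_meaning("protein", "protein"))  # True: labels converge on one concept
```

The sketch captures the essential move: once labels are grounded in a shared conceptualisation, identical labels with different meanings no longer silently pass as equivalent.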
The same fundamental problem arises in the geographical data domain:
“In order to achieve information exchange between different
geographical databases, it is necessary to develop suitable
methods for formally defining and representing geographical
knowledge. However, the plethora and diversity of data
standards and terminologies representing different geographical
concepts further complicate the problem of geographical
information sharing and reuse. Semantic differences occur
between heterogeneous geographical data standards and raise
problems during the integration process.” (Kokla and Kavouras,
2001:683)
This problematic is currently being tackled in the development of the
“Fishery Ontology Service” (FOS), which is a FAO based project (FAO, 2003a),
designed for merging several “fishery terminologies”, in order to support an
integrated information search and retrieval process.
A fishery portal is one of the main goals that FOS is designed to achieve, by
semantically integrating the pre-existing fishery taxonomies and terminologies
(directories, reference tables and online thesauri), used by different groups in the
fishery domain, into one accessible portal.
For example, the concept “aquaculture” is classified differently according to
the “fishery groups” involved. The ASFA (FAO, 2002a) and AGROVOC
(FAO, 2002b) thesauri place the term “aquaculture” in different relational
hierarchies. Pisanelli et al (2002) sum up the dilemma encountered in this regard:
“two different contexts relating respectively to species and environment points of
view.” This has the direct implication that, without an ontology (in this case FOS),
the fishery search and retrieval process related, for instance, to the term
“aquaculture” (beyond the traditional keyword-match IR system) will produce
inconsistent results.
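An FOS-style merge might be sketched as follows. The hierarchy paths shown are invented for illustration only (the literature notes merely that ASFA and AGROVOC place “aquaculture” in species- and environment-oriented contexts respectively): the merged entry preserves both source classifications, keyed by origin, so a query from either perspective resolves to the same concept.

```python
# Invented, simplified hierarchy paths for the two source thesauri.
asfa    = {"aquaculture": ["Fisheries", "Culture"]}           # species-oriented
agrovoc = {"aquaculture": ["Environment", "AquaticSystems"]}  # environment-oriented

def merged_contexts(term):
    """A merged ontology entry keeps every source classification,
    rather than forcing a single hierarchy to win."""
    return {"ASFA": asfa.get(term), "AGROVOC": agrovoc.get(term)}

entry = merged_contexts("aquaculture")
print(entry["ASFA"])     # ['Fisheries', 'Culture']
print(entry["AGROVOC"])  # ['Environment', 'AquaticSystems']
```

Retrieval through the merged entry then returns consistent results regardless of which fishery group’s terminology the searcher started from.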
Another example of the type of semantic complexity, which an ontology’s
“shared conceptualisation” would need to address, is the “UMLS semantic network”
(McCray, 2003) project currently under development. The role of the UMLS
“semantic network” is to engineer an ontology out of the US national Library of
Medicine information management system, the UMLS, which:
“currently interrelates some 60 controlled vocabularies in the
biomedical domain. The vocabularies vary in nature, size and
scope and have been created for widely differing purposes.
Some vocabularies have been created for document retrieval
systems, others for coding medical records for billing and
administrative purposes, and yet others have been created for
use in medical decision support systems.” (McCray, 2003:82)
The UMLS “semantic network” ontology is designed to: “provide an
overarching conceptual framework for all UMLS concepts.” This is in order to
develop: “a system whose goal it is to provide integrated access to a large number
of biomedical resources by unifying the domain vocabularies that are used to access
those resources.” (McCray, 2003:81).
The aim of ontologies, in general, is to provide integrated access in such a
way as to achieve consensus throughout a knowledge intensive community. Thus, an
ontology should be designed to resolve the representational disparities arising from
the different perspectives within a domain. Existing classification information
management systems have proved inadequate for such a task. They have been
unable to fully synthesise and do justice to the complexity and contextually-rooted
knowledge of human expertise, e.g. the differences in perspective or emphasis of
specialist groups which have the need to search and retrieve one common knowledge
base.
While the technology for running CIS has reached an impressive state of
maturity and power, the classification information systems, upon which the CIS
technology relies, are based on a myriad of singular “ad hoc” classifying systems: the
interpretation of the terms or categories is still left to the intuition of each individual
management system. In addition, the level of expressiveness of each system is quite
restricted (e.g. broader term, narrower term, synonymous term, related term).
2.1.1 The Promise of Better Information Management
A successful ontology, by providing CIS interoperability through engineered
“semantic integration”, could enhance the information management for CIS in many
ways. Better organised and explicit information would, by definition, support a more
targeted information retrieval or “extraction”, i.e. delivering the right information, in
the right amount and at the right time.
The search and retrieval process, aided by an ontology, would retrieve only
those “documents” that refer to the explicit concept defined, instead of using the
ambiguous and poly-semantic keyword match of traditional IR systems, i.e. the
explicit conceptuality of the ontology is used when querying the domain.
For example, “plant data” residing in CIS databases needs inter-base querying
between different plant databases. However, the querying process is presently
severely hampered:
“terms used to describe comparable objects within and between
databases are sometimes quite variable and limit the ability to
accurately and successfully query information in and across
different databases. One solution to this problem involves the
development of an ontology.” (PC, 2002:138)
To this end, the “Plant Ontology Consortium” is currently developing such an
ontology.
The development of an ontology for plant data would integrate and provide a
richer structure, which would ensure more precise and complex querying. Due to the
explicit nature of the relationships between types of entities and features,
consultation of the ontology during the search and retrieval process would ensure
accurate recall of the specialist terminology.
The “Gene Ontology” (GO, 2003) is an interoperable standardisation
project under development for the biological domain. The aim of the GO is to facilitate
more sophisticated queries by synthesising the wide ranging biological-genetic
information domain. An ontology, in this context, could determine that the process
of photosynthesis occurs in plants but not in mammals (as explicitly pre-defined in
the ontology) and thus bind the query to that “represented conceptuality”.
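By way of illustration, this kind of query binding could be sketched in a few lines of Python. The concept names and relations below are purely illustrative assumptions and are not drawn from the actual Gene Ontology schema:

```python
# Minimal sketch of an ontology recording which biological processes
# occur in which kinds of organism. All names are illustrative; this
# is not the actual Gene Ontology structure.
ontology = {
    "photosynthesis": {"occurs_in": {"plant", "algae"}},
    "lactation": {"occurs_in": {"mammal"}},
}

def organisms_for(process):
    """Return the organism kinds the ontology binds a process to."""
    return ontology.get(process, {}).get("occurs_in", set())

def query_allowed(process, organism):
    """A query is bound to the ontology's 'represented conceptuality':
    pairing photosynthesis with mammals is ruled out in advance."""
    return organism in organisms_for(process)
```

A retrieval system consulting such a structure would reject, or re-scope, a query pairing “photosynthesis” with “mammal” before any documents were searched at all.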
If data is stored in an unstructured, “free text” form, as in the Google CIS
database, the search engine capability is inherently limited. For instance, in the
context of the search and retrieval of biological data, Harris and Parkinson
(2002:119) observe that:
“…In contrast, if ontologies are used to describe the species,
compound and developmental stage, structured queries, such as
‘what experiments use compound Y ?’ are possible…the source
ontology provides an unambiguous definition for Y.”
In summary, the human search and retrieval process is aided by an ontology
which permits a higher “information extraction” (i.e. targeted retrieval of data) than
is currently possible.
2.2 Machine Interoperability: Semantic Intelligence
An ontology, by describing a set of assumptions about a domain, is designed
not only to help humans communicate but also to enable different computer agents to
communicate and “manipulate” a domain’s “conceptual consensus”. This
constructive anteriority leads to another inter-related aspect of CIS “semantic
interoperability”, that of “semantic intelligence” (i.e. machine driven applications).
According to the proponents of “ontology-driven” CIS, ontologies would
permit the emergence of a more “intelligent” information system. Ontologies, in
addition to providing a common semantic structure, would allow for computers to
process data semantically, in a more “aware” manner.
Ontologies would offer an active “content-driven” role, as opposed to the
passive keyword “context blind” role of current information retrieval systems, thus
providing targeted retrieval of data in an “automatic” machine-driven manner.
An “intelligent” application, enabled by an ontologically driven CIS, would
have the potential to enhance the performance of everyday information systems, such
as e-mail and “calendar plus”.
Currently these types of applications are capable of filtering out unwanted
messages, or alerting the user to conflicts in meeting schedules, or identifying
meeting room availability, as the case may be. However, the functions that these CIS
applications are capable of performing are restricted, due to the lack of an
“understanding” of the context. In other words, the system’s information processing
has a “passive” reaction towards its content.
In order to play a more active, content-driven role, these CIS applications
would need to “act” upon the information content retrieved, i.e. by automatic
inference and/or deduction. This more powerful processing, however, would need to
be endogenous to the context, i.e. the actual “meaning/semantics” intended by the
content/users.
A series of automatic actions could then be generated in sequence by the CIS:
e.g. determining availability of participants and meeting rooms, scheduling meetings,
sending out automated messages, and so on along the “semantic chain”. In order to
achieve this there may be the need for creating an ontology of appointments (the
concepts of dates and available time slots in the context of the particular
organisational structure).
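Such an “ontology of appointments”, and the ensuing semantic chain, might be sketched as follows; the participants, rooms and time slots are entirely hypothetical:

```python
# Hypothetical sketch of an "ontology of appointments": time slots are
# shared concepts, so an agent can chain availability checks, room
# booking and automated messages (the "semantic chain") without a user.
participants = {
    "alice": {"2023-05-01T10:00", "2023-05-01T14:00"},
    "bob": {"2023-05-01T14:00", "2023-05-02T09:00"},
}
rooms = {"room-a": {"2023-05-01T14:00"}}

def schedule(people, messages):
    """Find a slot free for every participant and some room, then
    append an automated notification for each attendee."""
    common = set.intersection(*(participants[p] for p in people))
    for room, free in rooms.items():
        slots = sorted(common & free)
        if slots:
            slot = slots[0]
            for p in people:
                messages.append(f"{p}: meeting in {room} at {slot}")
            return room, slot
    return None

messages = []
booking = schedule(["alice", "bob"], messages)
```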
At present, most CIS use standard keyword based methodologies, which
fundamentally create the problem of information overload. In general, any query
process produces the effect of displaying a quantity of inappropriate data: the CIS
interrogated is “contextually blind”. As the number of information units grows, and
the specificity of specialist needs increases, the ability of current CIS to find the most
appropriate data is strained to the limit.
Typically, computerised information systems, supporting complex tasks and
domains, do not automatically possess the body of knowledge necessary for
generating adequate interpretations, but instead rely on the fact that the user performs
this function. Thus, inevitably most of the burden is placed on the user. Ontological
“intelligent” support implies that this burden should, to a maximum degree, be
shifted away from the user towards the CIS.
2.2.1 The Semantic Web
One of the main advocates of this semantic/intelligent vision is Tim Berners-
Lee, in his W3C “Semantic Web Activity” research project (W3C, 2002). The
“Semantic Web” project entails adding an extra ontological layer or infrastructure to
the current HTML/XML World Wide Web (WWW), permitting Web resources to be
more readily accessible to automated processes. In essence, Web content would be
structured and formalised in such a way as to provide content which could be
accessible and interpretable by machines.
The current WWW environment, as already indicated in general for CIS, is
entirely aimed at “human readers”. Machines are “blind” to the actual information
content: Web-enabled CIS, such as browsers, servers and search engines, do not
semantically distinguish between information sectors (the weather forecast, academic
papers, personal home pages, or corporate information, and so on). The “Semantic
Web vision” is set to change this.
The “inventor” of the WWW considers the development of ontologies to be
the core foundation for building a more “intelligent” Internet, i.e. the “Semantic
Web”. This could best be represented as a “gigantic electronic brain” which
“understands” Web resources. The “Semantic Web” would understand the
“meaning” of a Web page by following hyperlinks, from Web documents to domain-
specific ontologies (which could be seen as the “neurons” of the WWW). This
could make it possible to offer cross-references and automatic “inferences”, in order
to enable a computer to “understand”, for instance, that different words/terms are
different expressions of the same concept (e.g. “movie”, “film” and “motion
picture”).
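This resolution of different terms to a single concept could be sketched as follows; the term-to-concept mapping shown is illustrative only:

```python
# Sketch: an ontology lets a machine treat different words as
# expressions of the same concept. The mapping is illustrative.
concept_of = {
    "movie": "MotionPicture",
    "film": "MotionPicture",
    "motion picture": "MotionPicture",
    "picture": "Image",
}

def same_concept(a, b):
    """True when both terms resolve to one ontological concept."""
    ca, cb = concept_of.get(a), concept_of.get(b)
    return ca is not None and ca == cb
```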
“The Semantic Web is an infrastructure that will bring structure
to the meaningful content of Web pages, creating an environment
where software agents roaming from page to page can readily
carry out sophisticated or ‘intelligent’ tasks for different users.”
(Berners-Lee et al, 2001)
Berners-Lee illustrates how agents, supported by semantic information, could
be used to conduct research into everyday tasks, such as investigating health care
provider options, prescription treatments, or available appointment times. Each of
these tasks is now usually conducted by a human researcher but, with the “Semantic
Web”, thanks to the development of ontologies, this could be done by machines.
This important semantic feature would permit intelligent agents, and other
ontological driven Web-based applications, to not only passively swallow
representations/descriptions but to act on them as well.
“...an agent coming to the clinic's Web page will know not just
that the page has keywords such as ‘treatment, medicine,
physical, therapy’ (as might be encoded today) but also that Dr.
Hartman works at this clinic on Mondays, Wednesdays and
Fridays and that the script takes a date range in yyyy-mm-dd
format and returns appointment times.” (Berners-Lee et al,
2001)
The application/agent in the Semantic Web CIS environment would be able to
“know” and, therefore, act upon what it “knows”, because the “semantics” were
encoded by the Web site creator using appropriate “semantic software”, in much the
same way as Web page software is currently used to create HTML Web pages.
As Berners-Lee further explains, the ontology/semantics would encode data
such as the following:
“…professors work at universities and they generally have
doctorates. Further markup on the page (not displayed by the
typical Web browser) uses the ontology's concepts to specify that
Hendler received his Ph.D. from the entity described at the URI
http://www.brown.edu... and so on. All that information is
readily processed by a computer and could be used to answer
queries (such as where Dr. Hendler received his degree) that
currently would require a human to sift through the content of
various pages turned up by a search engine.” (Berners-Lee et al,
2001)
Berners-Lee’s intelligent Web CIS application would be composed of many
ontologies with links to other ontologies. The Semantic Web is designed to rely on
many decentralised ontologies, rather than one centralised, monolithic ontology.
These decentralised ontologies would be made available by their specialised domain
creators (and formalised by use of appropriate “markup”).
In order for intelligent agents to process Web-based information, it is
essential that they operate in a pre-defined human context of agreement, as to the
meaning of the terminology used within any particular domain. This ontological
agreement, at the human level of “semantic integration”, as detailed earlier, needs to
be further elaborated in a formal structure, in order to allow inference from the
ontological integration.
Intelligent Web services would use multiple ontologies in parallel (as Web
information is, by definition, located in numerous sources) and subsequently make
inferences from them.
If, for instance, the owners of different Web sites containing medical
information or providing medical services were to share and publish their underlying
ontologies, with their specialist variations of the terms they use, computer agents
would be able to extract and aggregate information from these different sites. As
such, the intelligent automated agent would “parse” the ontologies linked to the sites
and process and deliver the information accordingly, rather than merely matching
keyword “patterns”, as at present on the Web.
An ontology could build its hierarchic/taxonomic form with a focus on
“automatic” inferred inheritance. Thus, machines would correctly understand a
number of relationships among entities by assigning properties to higher classes and
then assuming that sub-classes inherit these properties. For example, if “Britney
Spears” is a type of “Pop Star” in a hierarchy marked “Singers”, a software
programme could make assumptions about the singer even if the details of her
biography are not explicitly known. An ontology may express the rule: “If a singer
has an agent or manager and released an album last year, then assume he or she has a
fan club”.
An agent/programme could then readily infer, for example, that Ms Spears
has a fan club and process information accordingly. Software-agents would not truly
“understand” the meaning of the particular information at hand, but inference
capabilities allow applications to effectively use or circumscribe language/concepts
in ways that are contextually significant to human users.
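The inheritance and rule mechanism outlined above could be sketched as follows, using the “Pop Star” example; the hierarchy, the rule and the instance data for Ms Spears are all illustrative:

```python
# Sketch of inferred inheritance plus a rule. A property assigned
# higher in the hierarchy is assumed to apply to sub-classes.
hierarchy = {"PopStar": "Singer", "Singer": "Person"}

def is_a(cls, ancestor):
    """Walk the sub-class chain: a PopStar is a Singer is a Person."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = hierarchy.get(cls)
    return False

def has_fan_club(instance):
    """Rule: a singer with an agent who released an album last year
    is assumed to have a fan club."""
    return (is_a(instance["type"], "Singer")
            and instance.get("has_agent", False)
            and instance.get("album_last_year", False))

britney = {"type": "PopStar", "has_agent": True, "album_last_year": True}
```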
Furthermore, because the Semantic Web is composed of Web
pages hyperlinked to their respective ontologies, it would allow a more semantically
rich environment to emerge, thereby providing an enhanced framework for
expressing human content/data on the Web (and simultaneously allowing a more
specific and complex querying process), for, on the Semantic Web:
“…we can incrementally add meaning and express a whole new set of
relationships (hasLocation, worksFor, isAuthorOf, hasSubjectOf, dependsOn, etc)
among resources, making explicit the particular contextual relationships that are
implicit in the current Web.” (Berners-Lee and Miller, 2002)
Thus the domain mapped out on the Semantic Web would allow such
statements as “all journal papers are publications” or “the authors of all publications
are people”. When combined with facts, these definitions allow other facts to be
inferred.
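The combination of such definitions with facts could be sketched as follows; the fact data and identifiers are illustrative:

```python
# Sketch: schema-level definitions ("all journal papers are
# publications"; "the authors of all publications are people")
# combined with facts to infer new facts. Data is illustrative.
subclass_of = {"JournalPaper": "Publication"}
domain_of = {"authorOf": "Person"}  # the author of anything is a person

facts = {("paper-1", "type", "JournalPaper"),
         ("hendler", "authorOf", "paper-1")}

def infer(facts):
    """Derive inherited types and relation-domain types from facts."""
    derived = set(facts)
    for s, p, o in facts:
        if p == "type" and o in subclass_of:
            derived.add((s, "type", subclass_of[o]))
        if p in domain_of:
            derived.add((s, "type", domain_of[p]))
    return derived
```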
2.2.2 The Formal Aspect
In order for the Semantic Web, or any other “intelligent” driven CIS, to
function, an ontology must be structured in a formal manner to allow the expressions
of a given ontology to be processed unambiguously by a machine. This
“programming act” of formalisation is the essential process which would allow
software agents to manipulate/infer the terms in the ontology.
Existing knowledge representations for communicating an ontological
vocabulary and structure to humans, such as Yahoo's “lightweight” taxonomy, are
too informally expressed for automatic machine processing/inference abilities. No
machine processing leverage can be derived from such a form, or the unstructured,
“free” data common to most information on the Web.
The Extensible Markup Language (XML) is a clear example of a formal
expression that could be applied to ontologies. XML is currently being proposed by
the W3C’s “Semantic Web Research Group” as the “carrier” body-syntax for the
ontologies making up the “Semantic Web”: a computer readable structure.
XML has already been accepted as the emerging standard for data
interchange on the Web. XML allows authors to create their own markup. The
formal data structure of XML “replaces” the format-based markup structure of
HTML. For instance, <B> My name is Bond, James Bond </B> is replaced by the
formal data structure of XML markup that provides a context of understanding for
the semantics of the data, i.e. <author> Ian Fleming </author>.
However, from a computational perspective, “semantic tags” like <author>
carry as little semantics as a tag like HTML’s <B>. A computer simply does not
know in what semantic sense “author” is intended.
XML markup allows users to structure a formal hierarchy of information in a
document, shows the relationship between items and is able to encode metadata in a
more “semantically unambiguous” way. However, it is not possible to know in
which semantic sense tags, such as “publisher” or “editor”, are being used. While
XML has been designed to be a more successful binding for highly structured,
complex metadata, the same problematic arises as with HTML’s original “meta-
tags”.
“XML lets everyone create their own tags …. Scripts, or
programs, can make use of these tags in sophisticated ways, but
the script writer has to know what the page writer uses each tag
for. In short, XML allows users to add arbitrary structure to
their documents but says nothing about what the structures
mean.” (Berners-Lee et al, 2001)
This is why XML markup, in order to fulfil its machine processing potential
to the full, needs to be supplemented by an ontology, for which XML could be the
formal and body expression.
3. Building the Tower of Babel: Ontological Engineering
To summarise briefly, ontologies of CIS data are designed by establishing an
explicit/formal agreement to use the same terms with the same meanings and thus
“engineer” an ontology within a knowledge domain. The fundamental difference
between an ontology and a conventional representational vocabulary system lies in
the fact that it is machine understandable and, most importantly, that it provides an
enhanced semantic expression.
The shape and form of the constructed “shared conceptualisation” varies
considerably. In cases where the main need for CIS interoperability is “semantic
integration”, the ontological framework to be constructed would only be concerned
with how the particular community of users searches and queries the domain. To this
end, the engineering process would necessitate building a “conceptual consensus”
among the members of a group, representing what they “truly” intend, and
consolidating this into an explicit agreement. In other words, the ontological
information structure would solely focus on mapping the topic for the benefit of a
synthesised and accurate search and retrieval process, throughout the domain.
Thus the ontological engineers would focus on the role that ontologies play in
the reuse and exchange of data, to help the protagonists locate and interpret their own
information/domain.
If, on the other hand, the application is required to be principally machine-
driven, the ontological engineering would need to allow its “shared
conceptualisation” to be expressible in a formal manner. This would involve
translating into a formal expression (or fitting into a formal framework) each of the
ontological components/concepts agreed upon by the group.
Bearing these points in mind, ontologies are best understood through their
functional role, as opposed to adopting a purely format-based definition: “The
purpose to which the ontology will be put determines the nature and type of ontology
that is created.” (McCray, 2003:84)
But how is an ontology’s “shared conceptualisation” engineered? How does
the relevant conceptual distillation and abstraction emerge? How is the abstraction
constructed by a group possessing the relevant sphere of knowledge?
3.1 Engineering of Ontologies: General Guidelines
It is relatively easy to design an ontology based on concrete facts, such as
names, birthdates, etc. However, human knowledge is rarely that simple to represent
and to render the assumptions of a domain explicit is a complex task.
While no precise methodological procedure for “ontology generation” or
design can be deduced at present, a core streamlining process can, nevertheless, be
identified:
• Raw material: acquiring and extracting domain knowledge.
Collection of the “relevant” information resources and domain expertise that
will define, with consensus and consistency, the terms used to explicitly
describe things in the domain of interest.
• Shared Conceptuality: purpose, scope, and organisation of the ontology.
The design of the overall conceptual structure of the domain involves
identifying the domain's principal concepts and their properties (commitment
to the basic terms) and the relationships among the concepts.
• Iterating this identification (inductive and deductive) process by adding
further concepts and relations at more taxonomic levels of detail or
resolution.
• Identification of the key concepts and relationships has to be established in an
explicit manner, by defining each conceptual taxonomic layer/level.
Ontological engineering is a consensual methodology and hence involves a
thorough negotiating process, with the added complexity of continual monitoring of
the syntax, logic and semantics of each element constituting the ontology.
3.1.1 Gathering and Identification of the Raw Material
As previously mentioned, the “meta-level” nature of ontologies inevitably
places them in a “parasitical” relation to other CIS knowledge bases. The “building
blocks” or “raw material”, from which the ontology is constructed, stem from the
pre-existing knowledge base of the domain to be “ontologised”. An “ontological
analysis” or “mining” needs to be made of the domain, in order to subsequently
extract the key concepts from either “free” documentation or previous information
management systems: taxonomies, thesauri, etc. (Guarino and Welty, 2000).
A degree of arbitrariness is, by definition, involved if an ontology is built
“from scratch”; the researchers need to obtain information from forms,
questionnaires, free-text, etc., which, all combined, the group uses for the
formulation of the “conceptualisation”. While, in theory, this is possible, current
practice is based on the “conceptual anchorage” provided by “legacy” information
management systems.
For example, in the case of the fishery ontology, FOS, the previous
knowledge bases are the AGROVOC and ASFA online thesauri, as well as the
OneFish (2003) directory and “reference tables” of the FIGIS portal. The fishery
ontology FOS is constituted by, and created from, these heterogeneous fishery
information management systems (Pisanelli et al, 2002).
In the case of the “UMLS semantic network” biomedical ontology, the
“Unified Medical Language System” (UMLS) is used to carry out the ontological
analysis/mining necessary to build the ontology (McCray, 2003).
3.1.2 Shared Conceptualisation
The conceptualisation underlying the ontology is the fruit of a highly
collaborative and interdisciplinary process. The group responsible for engineering an
ontology is principally composed of computer scientists and the domain experts and
users.
In order to achieve “semantic interoperability”, the group must first agree on
the scope of their semantic interoperation and then reach a consensus, entity by
entity, relationship by relationship, attribute by attribute, and finally identify the
main concepts which give a “schematic representation” of the domain. This process
is called “ontological commitment” and is the essence of the ontological engineering
practice.
“Ontological commitment” is the agreement by multiple parties to adopt
particular ontological elements, when communicating about the domain of interest,
even though they do not necessarily have the same experience, theory, or perception
about the domain. For instance, all financial services practitioners agree that trade
execution and trade settlement exist and that execution precedes settlement.
However, there may be disagreement about whether the time limit should be two
days or six days, and so on.
Ultimately, there is no one correct way to model or conceptually represent a
domain; there are always viable alternatives. As such, the ontology is always
developed through a process of mediation, with the goal of ensuring, to the extent
possible, that all its users will find its characteristics to be sufficiently semantically
complex, clear and unambiguous, to be of practical use.
The resulting schematic representation thus achieved is based on the level of
granularity agreed upon and must be consistent, extendible (i.e. able to scale) and
updatable. Data sources that commit to the same ontological schema explicitly agree
to use a standardised set of terms with an explicit description/definition of those
terms.
The ontological conceptual process is analogous to creating a descriptive
language or grammar: constituted by nouns/adjectives (objects/concepts), verbs and
conjunctions/prepositions (relationships), expressed in an array or taxonomy of
classes, within which the main concepts of the domain are placed.
For example, an ontology of wine would be based on the overall class of wine
(which represents all wines), within which the different types of wine would be
sub-classes of that class: Bordeaux wine, for instance, would be a sub-class of the
red wine sub-class. A class will, subsequently, have further sub-classes that
represent concepts that are more specific than the super-class. The classes could
be further divided by properties such as red and white wines, further divided into
sparkling and non-sparkling, and so on, until the “wine ontology language” is
sufficiently representational for all the protagonists involved.
The important elements in the “wine ontology language” could include
“nouns-concepts” such as: grapes, location, colour, body, flavour and sugar content,
as well as adding the relevant “verbs-relations”, which are primarily responsible for
rendering the semantics of the whole domain explicit.
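The wine hierarchy outlined above could be sketched as a simple class structure with inherited properties; the classes and property values shown are illustrative simplifications:

```python
# Sketch of the wine hierarchy: classes, sub-classes and a few
# "noun-concept" properties. Values are illustrative.
wine_ontology = {
    "Wine": {"parent": None, "properties": {}},
    "RedWine": {"parent": "Wine", "properties": {"colour": "red"}},
    "WhiteWine": {"parent": "Wine", "properties": {"colour": "white"}},
    "Bordeaux": {"parent": "RedWine", "properties": {"location": "Bordeaux"}},
}

def properties(cls):
    """Collect properties up the super-class chain, so Bordeaux
    inherits the colour 'red' from RedWine."""
    props = {}
    while cls is not None:
        for k, v in wine_ontology[cls]["properties"].items():
            props.setdefault(k, v)
        cls = wine_ontology[cls]["parent"]
    return props
```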
To take another field, in the biological domain, the main concepts needed for
representing “protein interactions” in developing a “protein-interaction ontology”
are described as follows: “we can identify two main concepts needed for the
representation of protein interactions: the interacting compounds and the
interactions themselves.” (Ratsch et al, 2003:86)
Another case in point is the “Plant Ontology Consortium” (2002), which is
attempting to develop a “plant ontology” to represent current and future
understanding of relationships among various plant knowledge domains. The
following example shows the conceptual complexity and degree of detail that the
plant ontology would need to represent:
“This protein can be equally described as a ‘DNA binding
protein’ and as a ‘catalyst’ (enzyme). Consequently, this protein
should be a ‘child’ of both ‘parents’ within an ontology. The
presence of such multiple ‘parent’ situations in biology requires
that they be accurately represented in a conceptual framework.”
(“Plant Ontology Consortium”, 2002:139)
The ontological engineering process is the development of a class hierarchy,
as well as the “properties” of those “classes”, while ensuring that an explicit
definition of those concepts in the hierarchy, and the properties contained within
them, is present and codified throughout the construction process.
There are several possible approaches to developing and constructing this
class hierarchy:
A top-down engineering process starts with the definition of the most general
concepts in the domain and progressively divides the conceptual definitions into
more specific sub-classes. For instance, a class for the general concept of
“wine” is created first; the class is then further defined by generating
sub-classes and instances (white wine, red wine etc.), proceeding with categorisation
of the “red wine class” into Burgundy, Bordeaux, etc., and continuously refining
each sub-class by adding the corresponding properties/characteristics, until the
domain hierarchy is fully mapped out.
A bottom-up engineering process starts with a definition of the most specific
sub-classes of the domain (which is how most domain information is initially
presented) and progressively constructs the hierarchy by expanding these sub-classes
into more general concepts, i.e. super-classes.
A third approach is to combine both the top-down and bottom-up formats, by
defining the more salient concepts first and carrying out a continuous process of
switching from general, to middle or specialised levels, and vice versa, as
appropriate, to generate the ontological hierarchy.
The principal difficulty in engineering an ontological hierarchy (of classes
and sub-classes) is to map out a sufficiently rich set of semantic relationships
between the concepts/classes. Numerous relations need to be extracted and made
explicit, such as: “is-a”, “part-of”, “manufactured-by”, “owned-by”, etc. and other
types of relational “constraints”.
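Such explicit relations could be sketched as simple subject-relation-object triples; the entities and relation names below are illustrative:

```python
# Sketch: relations such as "is-a" and "part-of" made explicit as
# subject-relation-object triples. Entities are illustrative.
triples = {
    ("engine", "part-of", "car"),
    ("car", "is-a", "vehicle"),
    ("car", "manufactured-by", "factory"),
}

def related(subject, relation):
    """All objects the subject is linked to by the given relation."""
    return {o for s, r, o in triples if s == subject and r == relation}
```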
The following is an illustration of some of the semantic complexity of the
relationships to be mapped out for an ontology: in this case the UMLS semantic
network ontology (McCray, 2003):
is a; associated with; physically related to; part of; consists of; contains;
connected to; interconnects; tributary of; ingredient of; spatially related to;
location of; adjacent to; surrounds; traverses; functionally related to; affects;
manages; treats; disrupts; complicates; interacts with; prevents; occurs in;
process of; uses; manifestation of; indicates; result of; temporally related to;
co-occurs with; precedes; conceptually related to; evaluation of; degree of, etc.
The following types of concepts for describing/defining persons working in an
academic environment (with their characteristics) could be used for an “Academic”
or “Person Ontology” (Benjamins et al, 1999):
Academic staff class hierarchy (class definitions):
Lecturer, Researcher, Administrative-Staff, Secretary, Technical-Staff, Student, PhD-Student, etc.
Some relation definitions: Address, Affiliation, Cooperates-With, Editor-of, Email, First-Name, Has-Publication, Head-Of-Group, Head-Of-Project, Last-Name, Member-Of-Organization, etc.
In the same academic environment, a publication ontology could be
developed with the following classes and relations:
Publication class hierarchy
Conference-Paper, Journal-Article, Technical-Report, Workshop-Paper, etc.
Some relation definitions:
In-Conference, In-Journal, In-Organization, In-Workshop, Journal-Editor, Journal-Number, Journal-Publisher, Journal-Year, Last-Page, etc.
3.1.3 Conceptual Scope: Multiplicity and Singularity
Engineering an ontology that achieves the goal of “semantic
interoperability”, as outlined previously, is a difficult process to put into practice
(some would say an impossible one) because, as touched upon earlier, there are multiple
ways to model the same information (due to differences in perspectives, different
organisations and different professions within the same domain). There is no single,
canonical view of a particular domain of knowledge or the conceptual elements
within.
Furthermore, in order to uphold Gruber’s original theoretical definition
(Gruber, 1993), an ontology must have the resources to sustain the many levels of
representation/classification which any “authentic” human semantics calls for. In
other words, any “true” ontology must sustain representation/classification of
“entities” at different resolutions. An ontology, if it is to be of any use within a
domain, and at least fulfil its “higher” knowledge mission (as compared to traditional
systems), would have to permit a multitude of representational perspectives to be
reused by multiple groups within a domain through a single access point (i.e. the
synthesised engineered consensus, the ontology itself).
For example, an overarching medical ontology would need to represent an
anatomical ontology, both at the level of organs within the structure of the human
body, and also at the cellular, protein, genetic and molecular levels, constituting
ontological representations/classifications at successively finer resolutions. Thus the
extreme semantic complexity of medical information calls for a corresponding
ontology, capable of supporting numerous applications, from the perspectives of
doctor, patient, pharmacologist, geneticist, etc.
The degree of mediation achievable between different levels of granularity
and human perspectivism (kaleidoscopic subjectivity) will be the essential
determinant of whether an ontology, as both discipline and praxis, is successful
in years to come.
As Pisanelli et al (2002) state, a “full” ontology would:
“… provide the framework to integrate in a meaningful and
intersubjective way different views on the same domain, such as those
represented by the queries that can be done to an information system….”
The following examples reflect some of the semantic ambiguity arising from
the problematic of perspectivism:
Specialist ontologies are essential to the acquisition and expression of domain
knowledge. On the other hand, the more detailed and/or specialised an ontology, the
more “ontological commitments” are made to particular or specialist tasks (i.e.
creating further sub-domains), which may not be taxonomically compatible with the
classification of the parent domain.
Semantic problems may arise in the situation where two ontological
structures, representing the same domain, refer to the same concept in different ways,
such as zip/postal code or gender/sex. A “tumour”, for example, can be defined at
the same time as an “anatomical structure” and a “pathological phenomenon”.
This type of ambiguity creates severe problems for the ontological
representations of the “shared conceptual” engineering process. This problematic
could be resolved if ontologies were to provide equivalence/mapping relations: one
or both ontologies may contain the information that a term is equivalent to another
term or that it comes under a particular conceptual heading (equivalence between
classes and properties).
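A minimal sketch of such an equivalence relation, using the zip/postal-code and gender/sex examples above (the ontology names are invented for illustration):

```python
# A hedged sketch of an equivalence-mapping table between two
# hypothetical ontologies that name the same concepts differently.
# Each entry declares that a (ontology, term) pair denotes the same
# concept as another (ontology, term) pair.
EQUIVALENT = {
    ("ontology_a", "zip_code"): ("ontology_b", "postal_code"),
    ("ontology_a", "gender"):   ("ontology_b", "sex"),
}

def same_concept(term_a, term_b):
    """True if the two (ontology, term) pairs are declared equivalent."""
    return (EQUIVALENT.get(term_a) == term_b or
            EQUIVALENT.get(term_b) == term_a)
```

With such a table, a query against either ontology can be rewritten in the other's vocabulary before being executed.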
One solution to tackle this type of problematic, which is endogenous to the
knowledge endeavour, would be to establish an integrated consensus on the scale
needed, by building a multiplicity of linked ontologies, known as an “ontology-
library” (Ding and Fensel, 2000). This would have the effect of creating synergy in a
domain, mainly by means of a cross-referencing process between different types of
ontologies.
In the engineering process, a substantial number of ontologies may need to be
created to fully represent the semantics of a domain, at the level of detail required.
Thus, ontologies would be taxonomically integrated or linked within the domain (i.e.
primitives/relationships that allow ontologies to map terms to their equivalents in
other ontologies).
This could be best described as a “Russian doll” taxonomic modality,
whereby an ontology can be layered according to different perspectives/requirements
and resolutions, and thus has the ability to conceptually represent specialised data
(i.e. sub-domain within sub-domain).
As mentioned before in the case of the “Semantic-Web” scenario, an
ontology of a domain is rarely a monolithic unit but is rather constituted of several
ontologies that compose the overall ontology, i.e. an “ontology-library”.
In essence, an ontology-library of a given domain is a combination of
“domain ontologies” and “core-ontologies” and, more rarely, the possibility of
including domain independent “top-level” ontologies (Guarino, 1998) and (Smith,
2002).
The most prominent types of ontologies are “core-ontologies”, describing
more specialist fields of endeavour, such as anatomy, clinical guidelines or
diagnosis, etc. in the medical sphere, and “domain-ontologies” which operate at a
more general level, in the numerous fields of medicine itself.
Thus, the “subject” of an ontology may be the area pertaining to a single
specialist task (core ontology) or a particular subject domain (domain ontology). The
latter allows “communication” with other related domains and contains more general
conceptual elements, while the “core-ontology” contains the representational
elements needed for conceptualising a domain, or sub-domain, according to some
specialist task or data.
The actual ontology results from the combination of several such types of
ontologies, which are linked under an overarching ontological umbrella, i.e. a further
domain ontology which provides a framework of the domain to be mapped out. This
“higher” ontological framework provides the singular ontologies with the appropriate
links or taxonomic subsumption relations and equivalencies.
Basically, in order to fully capture the representations of a knowledge
domain, an ontology-library could be viewed as a synergetic zooming in and out
platform, which would facilitate the search and retrieval process, not only at different
levels of resolution but also, to some degree, across specialist domains. Thus, a “full”
ontology has an in-built modular structure and approach, and each “module” is in
effect created and maintained by different working groups.
Such a multiple, “modular ontology-library” approach (Gangemi, 2003) is
necessary to achieve the standardisation/integration process, which provides the
degree of “semantic integration” required, as described in section 2 above.
To this end, even more ambitiously, it has been speculated (Smith, 2002) that,
at the other extreme, it could even be possible to engineer upper or “top-level”
ontologies, which are independent of any domain. This type of ontological
construction is called “formal ontology” (Guarino, 1998).
Formal ontology seeks to provide a robust general foundation, by focusing on
categories common to all domains. The development of “top-level” ontologies
engineers theories or specifications of such highly general (domain-independent)
categories as time, space, inherence, instantiation, identity, processes, events, etc. A
top-level ontology seeks to cover a wide range of areas, by providing a foundational
ontological framework for defining categories to be reused by different domains and,
therefore, to be applicable to large communities of users.
This would be useful for integrating multiple specialist sub-domains into
wider knowledge domains, for instance in the context of geographical data:
“an ontology may contribute to the unification of different
conceptualizations of geographical space into an ultimate
geographical ontology. However, this integration can be
accomplished only if these ontologies are embedded within a
more general, top-level ontology, which provides a solid
framework for more specialized applications.” (Guarino, 1998)
Such a formal ontology could consist of an integrated interchange between
“top-level” ontologies, which are domain independent and could thus be reused by
many non-related groups (consisting of general notions such as “species”,
“organism” or even descriptors for space, time, etc.) and “core ontologies” or
“domain ontologies”.
A top-level ontology is a further attempt to integrate sub-disciplines of a wide
domain, such as physics, biology or medicine. At this level, high-level concepts,
such as ‘organism’ (animal, plant, virus), ‘process’ (photosynthesis) or ‘structure’
(animal or plant anatomy), have to be appropriately organised so that they link to the
more specific, concrete domain descriptions in a structured way.
For instance, models for many different domains need to represent the notion
of time. This representation includes the notions of time intervals, points in time,
relative measures of time, and so on. If one group of ontological engineers develops
such an ontology in detail, others can simply reuse (or adapt) it for their own
domains.
In creating an ontology to manage a bookstore inventory, the ontological
engineers could begin by defining a class of objects (in this case books) which have a
temporal extent (life within the bookstore), a position (such as on a particular shelf)
and physical characteristics (format and size).
In this sort of situation, it would be ideal to adopt and reuse a standard
“upper-level ontology”, which is not created by the book domain experts, and this
would involve significant time and labour saving effects, as well as providing
guidance for adapting categories or classifications, for more specific use.
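A sketch of the bookstore case, with a hypothetical reused upper-level category supplying temporal extent, position and physical characteristics (all class and field names here are invented for illustration, not taken from any standard upper-level ontology):

```python
from dataclasses import dataclass

# Hypothetical reused "upper-level" category: in practice this would come
# from a standard top-level ontology, not from the book domain experts.
@dataclass
class PhysicalObject:
    position: str        # e.g. a shelf location
    acquired: str        # start of the object's temporal extent (ISO date)
    dimensions: tuple    # physical characteristics (width, height in cm)

# The domain-specific class merely specialises the reused category,
# adding only what is particular to the book domain.
@dataclass
class Book(PhysicalObject):
    title: str = ""
    isbn: str = ""

b = Book(position="shelf-3A", acquired="2003-09-01",
         dimensions=(13, 20), title="Ontology 101", isbn="0-00-000000-0")
```

The time and labour saving lies in the domain experts defining only the `Book` specialisation, while the general categories are inherited ready-made.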
3.2 Methodology: “Representational Ontology Languages”
As is apparent by now, while in theory building ontologies would be possible
without aid (a purely manual process), the representational complexity and scale to
be represented call for a precise streamlining methodology and/or, more accurately,
“representational languages” --- in order to provide a structure or guide for building
the descriptive content and to help make it explicit.
For representing the shared conceptuality, “ontological languages” and
methodologies have recently been developed to guide the ontological engineering
process. However, as Michael Denny (XML, 2002) accurately sums up the situation,
a complete and standard ontological methodology and/or “ontological language” is
still lacking:
“The problem is that these procedures have not coalesced into
popular development styles or protocols…to the degree one
expects in other software practices. Further, full support for the
latest ontology languages is lacking.”
“The Laboratory for Applied Ontology” (OntoLab, 2003) is a
multidisciplinary research group, which works in collaboration with other
international ontology developers, such as the W3C “Semantic Web Activity”, to
develop languages and methodologies for “Ontological Engineering”.
Research scientist Gangemi (2003) of OntoLab addresses the nature of a
methodology which could guide the engineering process:
“For example, a domain ontology in biology may contain
definitions of ‘species’, ‘organism’, ‘pathway’, ‘anatomical
structure’, ‘biological process’, etc. Our tools help the encoder
of the ontology decide whether his/her meaning of ‘species’ is
about organisms or classes of organisms; whether the meaning
of ‘function’ is about substances or processes involving
substances. A user of that ontology (or a software agent using it)
will then be aware of the encoder’s meaning on a transparent
basis.” (p105)
3.2.1 Representational Ontology Languages
To be able to specify a “shared conceptualisation” explicitly and formally,
some representational languages have recently begun to be developed. Thus, the
“shared conceptuality” can be guided and developed with the aid of “representational
ontology languages”.
An “ontological language” provides the means of streamlining and defining
data and, thereby, structuring it into classes, attributes, instances, functions and
relations (along the lines of the process described in section 3).
These representational languages are designed to act as a “template
language”, in order to provide a more precise content framework or conceptual slot,
with the aim of increasing the consistency and logicality of the ontological building
process.
3.2.1.1 DAML+OIL and OWL
Two of the most powerful ontology languages to date, which are considered
to have a high degree of machine-processible capability, are currently being
developed: “DAML+OIL” (DAML, 2003a) and the “Web Ontology Language”
(W3C, 2002). The latter is designed especially for use in the CIS Semantic Web type
scenario.
The DAML “semantic mark-up language” (the DARPA Agent Markup
Language) was created by the Defense Advanced Research Projects Agency
(DARPA) and has recently been combined with the EU-based ontology
language OIL, the “Ontology Inference Layer” (OntoKnowledge, 2002). Together
they form the DAML+OIL ontology standard.
The recently established “Web Ontology Working Group” (W3C, 2002) is
working towards defining a “semantic markup ontology” standard for creating and
managing ontologies within Web documents. W3C’s “Web Ontology Language”
(OWL) is a “language” for defining “Web ontologies” (providing the “soul” or
“neurons” of the Semantic Web).
All three ontological languages, OWL, DAML, and OIL, whether used
together or separately, provide sets of modelling primitives and “conceptual
containers” for creating ontologies.
“In order to write an ontology that can be interpreted unambiguously and
used by software agents we require a syntax and formal semantics” (W3C, 2003c),
for which all three ontology standards provide specifications.
In general terms, “representational ontology languages”, such as DAML+OIL
and OWL, have the following modelling features:
• “Description logics”, which describe knowledge in terms of classes, or
frames. The meaning of any expression in a description can be described in a
mathematically precise way.
• Logical/formal type of “markup” framework, which allows users to operate
within a consistent “tagging” framework, in which to fit or structure their
content/data. The goal is to define a machine-readable markup knowledge
representational language in a formal semantics that clearly delineates what is
entailed in any particular content construct.
• A taxonomic structure for elaborating the codification for organising the
interaction between concepts, relations and attributes, etc. --- hierarchies of
classes and properties based on sub-class and sub-property relations which,
taken together, describe the domain. Classes are built from other classes,
using combinations of intersection (AND), union (OR) and complement
(NOT).
• The semantic markup is designed to provide a basic infrastructure that would
allow a machine to make simple inferences. One of the most important
characteristics of such ontology languages is a degree of support for inference
power. Any expression would entail a certain set of conclusions from any
information system that conforms to that ontology language. The following
is an example of the automatic taxonomic/inference parsing power of an
ontology:
“Parenthood is a more general relationship than motherhood”
and “Mary is the mother of Bill” together allow a system
conforming to DAML to conclude that “Mary is the parent of
Bill”. Accordingly, if a user poses a query such as “Who are
Bill’s parents?” to a DAML search system, the system can
respond that Mary is one of Bill's parents, even though that fact is
not explicitly stated anywhere.” (DAML, 2002)
In a formal expression, the ontological statement becomes:
(motherOf subPropertyOf parentOf) (Mary motherOf Bill)
A DAML+OIL compliant system can conclude: (Mary parentOf Bill)
based on the logical definition of "subPropertyOf", as given in the DAML+OIL
specification.
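The subPropertyOf inference just described can be sketched in a few lines of Python (a toy illustration of the entailment, not the DAML+OIL machinery itself; the names are those of the quoted example):

```python
# Facts are (subject, property, object) triples. A property entails
# every property it is declared a sub-property of, so motherOf
# entails parentOf, as in the DAML example quoted above.
sub_property_of = {"motherOf": "parentOf"}   # motherhood is a kind of parenthood
facts = {("Mary", "motherOf", "Bill")}

def holds(s, p, o):
    """Check a triple, following subPropertyOf links upward."""
    for (fs, fp, fo) in facts:
        if fs == s and fo == o:
            prop = fp
            while prop is not None:
                if prop == p:
                    return True
                prop = sub_property_of.get(prop)
    return False
```

Here `holds("Mary", "parentOf", "Bill")` succeeds even though only the motherOf triple is explicitly stated, which is exactly the behaviour of the DAML search system described above.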
In summary, the semantic markup languages OWL and DAML+OIL provide a
framework for “annotating” content/data. They structure domain statements into an
array of classes and properties and are able to represent a complex range of
relationships between them, such as subclassOf, subPropertyOf, inverseOf and, in
addition, a set of defining restrictions, such as oneOf, disjointWith and
intersectionOf.
As such, ontologies built with these ontology languages could represent
axioms describing a domain, such as “all newspapers or magazines are publications”
or “the authors of all publications are writers”. Thus content providers in a domain
would be able to “annotate” their content at successive levels of detail, as required.
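The class constructors and axioms just described can be sketched with ordinary Python sets standing in for description-logic class extensions (the titles are invented for illustration):

```python
# Hypothetical class extensions for the axiom example above.
newspapers = {"The Gazette", "Daily Echo"}
magazines = {"Science Monthly"}
everything = newspapers | magazines | {"Old Manuscript"}

# union (OR): the axiom "all newspapers or magazines are publications"
publications = newspapers | magazines
# intersection (AND): things that are both newspapers and magazines
both = newspapers & magazines
# complement (NOT): everything that is not a publication
non_publications = everything - publications

# the subclass axiom holds: the union is contained in publications
assert (newspapers | magazines) <= publications
```

This set-theoretic reading is only an analogy, but it is the same semantics that gives a description-logic reasoner its mathematically precise interpretation of class expressions.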
Inevitably, such a process relies on content providers annotating their data
with these ontology languages. However, it can be assumed that it is to the benefit of
the content providers that their content be accessed as widely as possible and that
they would, therefore, be willing to make the necessary effort to engineer the
ontology.
4. Conclusion: Towards Babel: Will Ontologies Work?
As can be concluded from the ontological projects discussed above, there is
no doubt that, if successful, an ontological approach for CIS would improve all
aspects of information management and the retrieval process and that it is, therefore,
a desirable enterprise and research endeavour. This is all the more true as the
problems that an “ontological approach” is designed to tackle are real and destined
to increase with time, severely hampering the effective use of CIS. It is evident that
it is impossible for traditional information management systems to deal with the ever
increasing problem of information overload. It is, therefore, essential for a
practicable solution to be found if the information age is to continue to flourish and it
is to this end that ontologies have been conceived.
Nevertheless, two main critical areas need to be further explored, both related
to the ontological aspect of CIS semantic interoperability: “semantic intelligence”
and “semantic integration”, if future research is to be pursued:
• to what degree is an ontology constructible?
• to what degree and extent is an ontology workable?
Machine processing: the formal aspect
An ontology must be sufficiently formal to enable it to be processed by
computers, while at the same time allowing a sufficiently complex/comprehensive
knowledge representation of the domain. Clearly this is a difficult balancing act,
which the formal ontological languages of OWL and DAML+OIL do not in any way
resolve.
It is difficult to see how a trade-off can be avoided between the two sides of
the equation: gaining in formality and losing in semantic expressibility or vice versa.
Any formal ontological representation, due to its axiomatic nature, will gain
in automatic inference power but will, simultaneously, limit the range of human
semantic expression, endogenous to natural language (where most human knowledge
resides).
In other terms, the creators of an ontology, supporting machine processing
powers, face the dilemma of reconciling two extremes when engineering the
taxonomic ontological structure. At one extreme, there is the need to capture the
web-like complexity of human semantics. However, this gives rise to the situation
that the ontological conceptual representations are too ambiguous to be “inferred” by
software applications. At the other extreme, explicit formal ontologies are too
inflexible and insufficiently rich and expressive to convey the range and intricacy of
human knowledge --- yet, paradoxically, this is the prime purpose for which an
ontology has been devised.
Even working on the assumption that a complete ontology could feasibly be
explicitly formalised with “knowledge representation languages” such as OWL and
DAML+OIL, these languages would, by virtue of their minimal expressivity, only be
capable of supporting a limited and simplistic set of inferences, which would be far
from adequate to express the human knowledge context.
In the case of the Semantic Web scenario, while it is relatively easy to
surmise its application for simple, automatic tasks, it is daunting to conceive how it
could work in contexts where more complex inferences are demanded by users of
any knowledge domain. However, Tim Berners-Lee seems to imply that, not only is
the semantic integration created by an ontology easily achievable, but that
formalisation is possible without a significant loss of meaning.
The fundamental problem for engineering an ontology is to try to encode the
semantics in a formal expression without sacrificing flexibility, i.e. the “semantic
range” of unstructured data. Basically computability (and simplicity) is, to some
degree, incompatible with semantic expressiveness (depending on the semantic range
to be captured).
Ultimately, a great deal of knowledge communication between humans is not
formally explicit but has a contextual background and is, furthermore, of a tacit
nature. An attempt could be made to make this contextual communication formally
explicit but this is a problem of infinite regress and, therefore, the question arises as
to whether it would even be feasible. The formalisation needed for intelligent
applications for CIS will always be limited and partial, due to the intrinsic social
nature of knowledge and language.
Semantic Consensus
As far as the second part of the semantic interoperability equation goes, i.e.
semantic integration at the human level, it poses the opposite problem to the limited
expressivity of the formal aspect of machine processing. The semantic complexity
and richness of any domain to be represented is boundless. It is for this reason that it
is important to understand that a conceptualisation has to be shared, not only in the
sense of making it possible for an ontology constructed by different parties to be
reused, but also in the sense of building a consensus. This process of consensus
building, which is extremely difficult to achieve in practice, is the major obstacle
encountered when engineering an ontology.
If, as some might assume, constructing an ontology is purely an objective
process of classifying representations of the knowledge domain, consensus building
would be a relatively easy matter. However, ultimately, all human
knowledge/information is based, to a large degree, on social conventions and
agreements, which relate to a particular knowledge community and its dynamics.
The “Tower of Babel” problem, within CIS knowledge domains, which the
creation of ontologies is intended to resolve, still remains, but at a higher meta-level.
The founder of “Infomis”, Barry Smith, sums the problem up as follows: “Ironically,
the very Tower of Babel conditions which the ontological project was initially
designed to address have been recreated within ontology itself” (Smith, 2002).
The “Babel” factor means that the same domain may be “semantically” sliced
in different ways when addressed from different perspectives of interest/use, as
described earlier. However, it is difficult to see, even with an “ontology-library”
approach, how this multitude of perspectives could be fully and completely
integrated by an ontological framework and, thus, how the pitfalls of previous
information management systems could be avoided.
It is thus equally hard to imagine how the issues of “semantic heterogeneity”
would be resolved to the satisfaction of all the parties concerned. The conceptual
representation of any CIS knowledge domain does not rest on a neutral
framework, from which to deduce all descriptions and fit them into an explicit array
of classes. This is due to the fact that not all conceptualisations are equal to each
other.
Every aspect of conceptual representation is highly context-dependent. This
contextual background needs to be made explicit, in order for all parties to describe
and define the terminology in a structured way. Without this explicit mapping out
process, a large amount of poorly articulated or ambiguous knowledge would
severely impede the necessary consensus building. The question then remains as to
how to distinguish between the fundamental ontological commitments, arising from
the contextual background, and the more superficial issues, which are relatively easy
to negotiate.
Based on the “Babel” factor, in practice the following consensus engineering
problems can be highlighted:
• Different authors create substantially different conceptualisations of any
domain, despite the fact that their purposes are similar. Differences in
ontological conceptualisation by members of a group do not necessarily
reflect differences in the concepts identified --- it is not so much a difference
in kind but a difference of “degree”. Conceptualisations vary in focus and the
type of emphasis and priority accorded to them by each member of the
engineering team.
• It is difficult to reach an agreement even with regard to what constitute the
most elementary building blocks of any knowledge domain, especially when
representing relationships, due to the different focus or perspectives, from
which the ontological creators generate their conceptualisations.
• Terms defined in an ontology vary in their reusability across sub-domains.
Some terms are reusable across all sub-domains, whilst others are particular
to a specialised knowledge domain (and thus not reusable).
If, for all the above reasons, ontologies are not as “completely” feasible as the
proponents of ontologies might wish, it is still essential to focus on the “degree” to
which ontologies could still be constructed and used for mapping out knowledge
domains.
While an ontological approach could prove highly useful as a consolidator of
existing specialist CIS knowledge bases, thereby creating a small-scale ontology, the
main problem would seem to arise when such specialist sub-domain ontologies need
to be integrated into the overarching representation of the domain.
In the final analysis, whatever the scale represented, the problems
encountered are no doubt intrinsic to the human knowledge domain. However, more
importantly, they are also contingent on the lack of the right “tools” and environment
for consensus building.
4.1 Mapping Complexity: The Need for Tools
As Ding and Foo (2002a:18) conclude: “It is evident that much needs to be
done in the area of ontology research before any viable large scale system can
emerge to demonstrate ontology’s promise of superior information organization,
management and understanding.”
If the ontological approach for CIS is to achieve a degree of success, and
hence usability, it is essential to develop accessible and efficient tools, that would
provide, at minimum, some “semi-automatic” help. Without such tools, ontological
engineering is an unduly complex, laborious, and time-consuming process --- this is
for some critics the major detraction of the ontological approach.
“The majority of existing ontologies have been generated
manually. Generating ontologies in this manner has been the
normal approach undertaken by most ontology engineers.
However, this process is very time-intensive, error-prone, and
poses problems in maintaining and updating ontologies. For this
reason, researchers are looking for other alternatives to
generating ontologies in a more efficient and effective way.”
(Ding and Foo, 2002a:135)
One of the main problems of manually constructed ontologies is that they are
particularly subject to significant delay in updating their content and maintaining
currency.
The more complex the ontological construction to be made of the domain, the
more essential is the necessity for tools which would ideally streamline the process,
by helping for instance to:
guide, in a consistent and coherent way, the “shared conceptualisation” and
representational (and inference) complexity required to cover the range and
appropriate formalisation, as well as providing verification and validation
throughout the engineering process.
• enhance the deductive and inference powers of the creators of the ontology
and thus facilitate consensus building amongst members of the group at each
conceptual stage of the construction process.
• capture the complexity involved in constructing a “true” ontology with a
consistent “template measure” (and formalisation guide), for all participants
throughout the ontological process. In other terms, to acquire, organise and
conceptually visualise the domain knowledge, before and during the building
of an ontology.
• enable a number of different users to create machine-readable content
without being experts in logic, which is a crucial aspect to the potential
success of the Semantic Web. In other words, “formalisation” or “semantic
mark-up” should be a by-product of normal computer use. Ultimately, much
as in the case of current Web content, a small number of tool creators and
web ontology designers will need to know the details, but most users will not
even be aware that ontologies exist.
It is of the utmost necessity to develop tools which would incorporate such
features. These tools would support and accelerate the ontological process in a
consistent and less ad-hoc manner: inspecting/browsing, codifying, modifying and
maintaining the ontology.
Recently, a range of editing software tools (XML, 2002) have been developed
to help to accomplish some aspects of ontological engineering. Michael Denny
(XML, 2002) portrays the current situation, in what is probably the first concise
survey of ontology tools, termed “ontology editors”:
“Despite the immaturity of the field, or perhaps because of it, we were able to
identify a surprising number of ontology editors -- more than 50 overall”. Although,
as the author states, some are general data modelling tools, such as “Microsoft’s
Visio”, and not specifically designed for the purpose of “ontological editing”, most
of the other editing tools are specific to a particular ontological purpose and/or
domain.
In general, one could conclude that the design of an ideal “ontology editor”
or, better still, viewer, would need to focus on supporting the most laborious and
complex parts of ontological engineering, such as assisting with the arduous task of
maintenance which needs to be done on a regular and ongoing basis, as well as
providing an “easy” mechanism for the validation process. Both tasks would need
to be provided by the “ontology editor” in a semi-automatic manner. In fact, all such
tools would need to provide a high degree of semi-automatic mechanisms which, all
combined, would help to consolidate and, therefore, streamline the ontological
engineering praxis.
At present this lack of consistent and coordinated streamlining of the
ontological process is a major drawback. Bearing in mind the scale to be
represented, it would seem impossible that, without some form of semi-automatic
mechanism, semantic consistency/coherence could be maintained throughout the
building of the ontology.
The key to the usability of an “ontological editor” or viewer would be the
ability to organise and manage the taxonomic structure of an ontology through an
accessible interface, which would give a visual representation and enable
manipulation of the ontology's framework (the interlinking concepts and relation
hierarchies, etc.). The use of a multiple tree viewer with expanding and contracting
levels could be a solution.
In effect, this would be similar to a homepage-creation tool like
Dreamweaver or FrontPage. A user could thus choose from a menu to add
information about a person and then choose a relative or a professional colleague,
etc. Users, with the support of the “ontological editor”, could build the semantic
elements of their ontological structures through a visual display.
An “ontological editor”, Protégé 2000, is being developed by Stanford
University (2002) to provide an ontological editing tool along the lines described.
“OILEd” is another “ontological editor”, which has been developed by
Manchester University (2003), and is ready for full use. OILEd allows the user to
build ontologies using the ontological language DAML+OIL.
In general, Ding and Foo (2002a/b), in their survey of tool-aided ontological
engineering, where they examine current research groups working on semi-automatic
(and even automatic) ontology generation, come to the conclusion that human input
is essential at each stage of the ontological building process and that tools can only
support, and not replace, the expertise of humans.
4.2 Ontological Building Environment
Ultimately, if ontologies are to be practicable, a new field of research into the
ontological approach for CIS could be explored beyond the development of singular
tools. Ontological tools are a useful aid but, by themselves, are limited in
effectiveness for the users in a group. Another aspect needs to be developed: the
building environment itself.
Because of the highly collaborative process that the creation of ontologies
demands, in order to cover the wide scale and semantic mapping out needed, it is
imperative for all the participants to have a propitious environment, in which to
encourage a maximum of collaboration. The members of the group need to work in
close collaboration with each other to make full use of the various streamlining tools
used for specific aspects of the ontological engineering.
In other terms, the process of ontological engineering needs to be embedded
in an environment which would foster what is essentially a collaborative, consensus
building endeavour. Without an accessible “arena” within which to perform the
ontological building tasks, the ontological project is in effect severely impaired, no
matter what tools each member of the group may use. However automated the tools
may become in the future, they cannot suffice on their own without the human
collaborative element.
Basically, the technology needs to be harnessed to create an environment
which would encourage and cultivate a synergy and synchronisation between the
participants in the building process.
The Ontolingua Server (Farquhar et al, 1997) is probably the first practical
example of what form such an “ontological environment” might take.
The “Ontolingua Server” provides an interface for users and
applications/tools to access or manipulate ontology-libraries. It stores
extensible libraries of sharable/reusable ontological structures which can be browsed
and conveniently accessed. In addition it can integrate editing tools for ontological
engineering. The WWW accessible client/server architecture forms the core of this
“ontology server” type of ontological environment.
The main function of an “ontology server” is to allow ontologies to be edited,
evaluated, published, maintained and reused within a remotely accessible environment.
Thus, the main significance of this type of technology is its ability to support and
facilitate collaborative work through the decentralised medium of the World Wide
Web.
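These functions can be made concrete with a small sketch. The class and method names below are hypothetical illustrations of the “ontology server” idea described above, not the actual Ontolingua Server API: a shared library of named ontologies that remote collaborators can browse, edit, publish and reuse.

```python
# Minimal sketch of an "ontology server": a shared library of named
# ontologies that collaborators can browse, edit, publish and reuse.
# All names here are illustrative assumptions, not the real Ontolingua API.

class OntologyServer:
    def __init__(self):
        self._library = {}       # ontology name -> {term: definition}
        self._published = set()  # names frozen for safe reuse

    def browse(self):
        """List the ontologies currently held in the library."""
        return sorted(self._library)

    def edit(self, name, term, definition):
        """Add or revise a term; published ontologies are read-only."""
        if name in self._published:
            raise PermissionError(f"{name!r} is published and read-only")
        self._library.setdefault(name, {})[term] = definition

    def publish(self, name):
        """Freeze an ontology so other projects can safely reuse it."""
        self._published.add(name)

    def reuse(self, name):
        """Return a copy of a published ontology for a new project."""
        return dict(self._library[name])


server = OntologyServer()
server.edit("fisheries", "Vessel", "A craft used for catching fish")
server.publish("fisheries")
print(server.browse())           # ['fisheries']
print(server.reuse("fisheries"))
```

The key design point, in line with the discussion above, is that the library is a single shared resource: publication freezes an ontology for reuse, while editing remains open to every collaborator until then.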
In fact, one of the most important aspects of the “ontology server” ontological
environment is that numerous collaborators, regardless of geographical location, can
contribute to the ontological engineering process in a manner analogous to the
“Open-Source” methodology for engineering software. In the ontological context,
the engineering process would be applied not to the “writing” of computer code, but
rather to the semantic representational elements of the “shared conceptualisation”.
As such, this Web-based “distributed” engineering process could overcome
some of the semantic mapping out problems previously identified. The distributed
nature of ontological engineering (from the perspective of participants accessing
the WWW), epitomised by the ontology server, should be encouraged and
decentralised even further, which is precisely the essence of open-source
methodology.
Eric S. Raymond (1998) has produced two guiding metaphors to describe the
phenomenon of open-source software, exemplified by the remarkable success of the
Linux operating system (which, like Microsoft Windows, involves millions of lines of
computer code). The “bazaar” style of software development is characterised by
decentralised cooperation, which can lead to increased productivity, reliability and
quality; it stands opposed to the centralised “cathedral” style, characteristic of
Microsoft software.
With the bazaar style of open source, the software producer relinquishes
certain intellectual property rights over its software in exchange for other engineering
benefits: “Given enough eyeballs, all bugs are shallow.” This is what Raymond
(1998) calls “Linus' Law”, after the creator of Linux. The basic advantage of open-
source methodology is extremely simple: putting a programme into open-source
mode subjects it to the ultimate code review, an ongoing “peer review” of
suggestions and code revisions.
The “bazaar” model of development and engineering has another essential
feature. The code of a programme is made available and “free”, i.e. open code, to
any interested party, thereby allowing many users to contribute to, modify and improve
it without proprietary restrictions. The philosophy of open source became so
powerful and successful because:
“Linux was the first project to make a conscious and successful
effort to use the entire world as its talent pool. I don't think it's a
coincidence that the gestation period of Linux coincided with the
birth of the World Wide Web...Linus was the first person who
learned how to play by the new rules that pervasive Internet
made possible.” (Raymond, 1998)
The supporters of the open-source paradigm argue that from this distinct
philosophy emerges a methodology responsible for producing software superior in
quality to that of proprietary applications.
To draw a final conclusion, it would seem that the wide scale and complexity
of mapping out ontologies could profit from the open-source paradigm. The
open-source approach to software engineering could be beneficially transposed to
the ontological semantic engineering of the required conceptual representations,
since building the “shared conceptualisation” (as opposed to programming computer
code) involves a similarly collaborative and “open” consensus-building process.
An “open-source” ontological environment would in effect comprise three
main components:
• Remote access for a vast number of collaborators, who would contribute to, review
and update the material at hand on a far larger scale than is currently possible.
• Online contribution by multiple parties from wherever they are located, with each
automatically informed of the others’ activities.
• An ontology that, both during the construction process and after, is made
“freely” available, i.e. “open” in the non-proprietary sense of a lack of
centralised ownership, to encourage modifications and updates from as many
contributing sources as possible.
The next section is a brief proposal of how such an open-source ontological
environment might work in practice.
4.3 OntoP2P: Consensus-Building Environment
It may be feasible to use a peer-to-peer (P2P) technology-driven
methodology to support the consensus-driven “shared conceptualisation” process, as
well as the maintenance of ontologies.
The aim of this “OntoP2P” system would be to assist the “ontological
commitment” process amongst the members of a domain (i.e. to facilitate the inter-
subjective negotiation procedure) and aid the building of more semantically complete
types of ontologies, at the “core” and “domain” levels.
This “OntoP2P” is an intensively collaborative methodology. The more
individuals (i.e. “peers”) involved in the process, the more complete the ontological
construction becomes.
In basic terms, this “OntoP2P” approach relies on building one centralised
ontology from multiple decentralised ontologies located on individual
computers (“peers”), which together form an overall “OntoP2P” network.
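The basic synthesis step can be sketched as follows. This is a minimal, hypothetical illustration of the “OntoP2P” idea, under the assumption that each peer's ontology is reduced to a set of subclass assertions; the function name and merge rule (a simple union) are not part of any specification.

```python
# Hypothetical sketch of the "OntoP2P" synthesis: each peer holds its own
# decentralised ontology, here reduced to a set of (child, parent) subclass
# assertions, and the network builds one centralised ontology from them all.
# The merge rule (plain union) is an illustrative assumption.

def merge_peers(peer_ontologies):
    """Union the subclass assertions contributed by every peer."""
    central = set()
    for assertions in peer_ontologies.values():
        central |= assertions
    return central

peers = {
    "peer-A": {("Trout", "Fish"), ("Fish", "Animal")},
    "peer-B": {("Fish", "Animal"), ("Salmon", "Fish")},
}
central = merge_peers(peers)
# The centralised ontology now covers what any single peer knew,
# including assertions contributed by only one peer.
```

In a real system the merge would of course need conflict detection rather than a blind union, but the sketch shows the core idea: the central ontology grows as more peers contribute.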
As such, the decentralised ontologies located on the various “peer” units
which make up the network could be built using ontological languages such as OWL
or DAML+OIL. The ontologies produced by all the participating “peers”
(whatever their “taxonomic levels” or degrees of completeness) would then be entered
into the “OntoP2P” network to “compete” with each other.
In addition, a “neural network” behaviour would be embedded in this
“OntoP2P” framework. That is to say, the ontological content input by each
individual “peer” entering the network would be monitored and logged. The
“OntoP2P” neural network would then output an overall index of the ontological
content of all the competing “peers”. This index could be consulted to see whether a
consistent, consensus order emerges (which conceptual terminology is common to all,
what new fundamental terminologies appear, repetitions, etc.).
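A simple way to realise such an index is to rank terms by how many peers use them. The sketch below is an illustrative assumption about how the logging-and-indexing step might work, not a specification; all names are hypothetical, and the real mechanism would operate over full ontological structures rather than bare term lists.

```python
# Illustrative sketch of the "OntoP2P" consensus index: log the terms each
# competing peer contributes, then rank terms by how many peers use them.
# Terms common to all peers head the index (candidate consensus); terms
# used by a single peer are candidate new terminologies.

from collections import Counter

def consensus_index(peer_terms):
    """Return (term, peer_count) pairs, most widely shared first."""
    counts = Counter()
    for terms in peer_terms.values():
        counts.update(set(terms))   # each peer counted once per term
    return counts.most_common()

peers = {
    "peer-A": ["Fish", "Vessel", "Quota"],
    "peer-B": ["Fish", "Vessel"],
    "peer-C": ["Fish", "Catch"],
}
index = consensus_index(peers)
# "Fish" is common to all three peers and heads the index;
# "Quota" and "Catch" appear once each, as candidate new terminologies.
```

Consulting the head of this index is what the text above calls checking whether a “consistent, consensus order emerges”: high counts signal shared conceptual terminology, low counts signal novelty.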
In short, one ontology of a particular domain would be synthesised,
maintained and continuously updated within this networked “OntoP2P”
environment, which would readily allow an iterative process.
Ontologies could thus be built and treated more in the manner of the
philosopher Wittgenstein's “Language-Games” (1953): his vision of language as
essentially not a monolithic but a multiple structure (in use and task).
Word count: 14,896
Bibliography
(AS) Applied Semantics (2003). About Applied Semantics (Online). www.appliedsemantics.com (Accessed 27 May 2003).
Ashenhurst, R.L. (1996). Ontological aspects of information modelling. Minds and Machines, 6, 287-394.
Bateman, J.A. (1995). On the relationship between ontology construction and natural language: a socio-semiotic view. International Journal of Human-Computer Studies, 43, 929-944.
Berners-Lee, T. and Miller, E. (2002). The Semantic Web lifts off (Online). http://www.ercim.org/publication/Ercim_News/enw51/berners-lee.html (Accessed 22 May 2003).
Benjamins, V.R. et al (1999). (KA)2: building ontologies for the Internet -- a mid-term report. International Journal of Human-Computer Studies, 51, 687-712.
Berners-Lee, T. et al (2001). The Semantic Web. Scientific American (Online). http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21 (Accessed 22 May 2003).
Berners-Lee, T. and Fischetti, M. (1999). Weaving the web. San Francisco: HarperSanFrancisco.
Borst, W.N. et al (1997). Engineering ontologies. International Journal of Human-Computer Studies, 46, 365-406.
Burgun, A. et al (2001). Issues in the design of medical ontologies used for knowledge sharing. Journal of Medical Systems, 25(2), 95-106.
DAML (2003a). DAML+OIL (Online). http://www.daml.org/about.html (Accessed 11 July 2003).
DAML (2003b). DAML tools (Online). http://www.daml.org/tools/#all (Accessed 11 July 2003).
DAML (2002). Why use DAML (Online). http://www.daml.org/2002/04/why.html (Accessed 13 July 2003).
DAML (2001a). DAML+OIL index (Online). http://www.daml.org/2001/03/daml+oil-index (Accessed 13 July 2003).
DAML (2001b). DAML examples (Online). http://www.daml.org/2001/03/daml+oil
Ding, Y. and Foo, S. (2002a). Ontology research and development Part 1: a review of ontology generation. Journal of Information Science, 28(2), 123-136.
Ding, Y. and Foo, S. (2002b). Ontology research and development Part 2: a review of ontology mapping and evolving. Journal of Information Science, 28(5), 375-388.
Ding, Y. and Fensel, D. (2000). Ontology library systems: the key to successful ontology re-use (Online). www.cs.vu.nl/~ying,~dieter (Accessed 30 June 2003).
Everett, J.O. et al (2002). Making ontologies work for resolving redundancies across documents. Communications of the ACM, 45(2), 57-69.
FAO (2003a). FAO FI (Online). http://www.fao.org/fi (Accessed 29 May 2003).
FAO (2003b). AOS (Online). http://www.fao.org/agris/aos/ (Accessed 29 May 2003).
FAO (2002a). FAO ASFA (Online). http://www.fao.org/asfa (Accessed 29 May 2003).
FAO (2002b). FAO Agrovoc (Online). http://www.fao.org/agrovoc (Accessed 29 May 2003).
Farquhar, A. et al (1997). The Ontolingua Server: a tool for collaborative ontology construction. International Journal of Human-Computer Studies, 46, 707-727.
Gaines, B. (1997). Using explicit ontologies in knowledge-based system development. International Journal of Human-Computer Studies, 46, 181-9.
Gangemi, A. (2003). Some tools and methodologies for domain ontology building. Comparative and Functional Genomics, 4, 104-110.
Google (2003). Google acquires Applied Semantics (Online). http://www.google.com/press/pressrel/applied.html (Accessed 24 July 2003).
GO (2003). Gene Ontology Consortium (Online). http://www.geneontology.org/ (Accessed 1 July 2003).
Gruber, T.R. (1995). Toward principles for the design of ontologies used for knowledge sharing. International Journal of Human-Computer Studies, 43(5/6), 907-928.
Gruber, T.R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5, 199-220.
Gruninger, M. and Lee, J. (2002). Ontology applications and design: introduction. Communications of the ACM, 45(2), 39-41.
Grüninger, M. and Uschold, M. (1996). Ontologies: principles, methods and applications. Knowledge Engineering Review, 11(2), 110-127.
Guarino, N. and Welty, C. (2000). A formal ontology of properties (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 30 June 2003).
Guarino, N. et al (1999). OntoSeek: content-based access to the web (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 30 June 2003).
Guarino, N. (1998). Formal ontology and information systems (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 30 June 2003).
Guarino, N. (1997). Understanding, building and using ontologies. International Journal of Human-Computer Studies, 46, 293-310.
Guarino, N. (1995). Formal ontology, conceptual analysis and knowledge representation. International Journal of Human-Computer Studies, 43(5/6), 625-640.
Harris, M. and Parkinson, H. (2002). Conference report: standards and ontologies for functional genomics: towards unified ontologies for biology and biomedicine. Comparative and Functional Genomics, 4, 116-120.
Holsapple, C.W. and Joshi, K.D. (2002). A collaborative approach to ontology design. Communications of the ACM, 45(2), 42-47.
Infomis (2003). The Institute for Formal Ontology and Medical Information Science (Online). http://www.ifomis.uni-leipzig.de/ (Accessed 2 June 2003).
Kokla, M. and Kavouras, M. (2001). Fusion of top-level and geographical domain ontologies based on context formation and complementarity. International Journal of Geographical Information Science, 15(7), 679-687.
Kwasnik, B. (1999). The role of classification in knowledge representation and discovery. Library Trends, 48(1), 22-47.
Manchester University (2003). OilEd (Online). http://oiled.man.ac.uk/ (Accessed 7 July 2003).
Marla, M. (2002). Applied Semantics: making meaning matter. E-Content Magazine, June 2002, 34-39.
McCray, A.T. (2003). An upper-level ontology for the biomedical domain. Comparative and Functional Genomics, 4, 80-84.
Nilsson, N. (1991). Logic and artificial intelligence. Journal of Artificial Intelligence, 3, 31-55.
Onefish (2003). Onefish.org (Online). http://www.onefish.org (Accessed 23 April 2003).
OntoKnowledge (2002). Welcome to OIL (Online). http://www.ontoknowledge.org/oil/ (Accessed 13 July 2003).
OntoKnowledge (2000). Harmelen, F.V. and Horrocks, I. FAQ: OIL, the Ontology Inference Layer for the Semantic Web (Online). www.ontoknowledge.oil/faq (Accessed 12 July 2003).
OntoLab (2003). Laboratory for Applied Ontology, Institute of Cognitive Sciences and Technology, National Research Council (Online). http://www.ladseb.pd.cnr.it/ (Accessed 30 June 2003).
OntoWeb (2003). About OntoWeb (Online). www.ontoweb.org (Accessed 30 May 2003).
Paling, S. and Qin, J. (2001). Converting a controlled vocabulary into an ontology: the case of GEM. Information Research, 6(2), 120-38.
Partridge, C. (2002). The role of ontology in integrating semantically heterogeneous databases (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 5 July 2003).
PC: The Plant Ontology Consortium (2002). The Plant Ontology Consortium and plant ontologies. Comparative and Functional Genomics, 3, 137-142.
Pisanelli, D.M. et al (2002). Ontologies and Information Systems: the marriage of the century? Proceedings of Lyee Workshop, Paris (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 3 July 2003).
Pisanelli, D.M. et al (2000). The role of ontologies for an effective and unambiguous dissemination of clinical guidelines (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 12 July 2003).
Ratsch, E. et al (2003). Developing a protein-interactions ontology. Comparative and Functional Genomics, 4, 85-89.
Raymond, E.S. (1998). The cathedral and the bazaar (Online). http://www.firstmonday.org/issues/issue3_3/raymond/index.html (Accessed 23 June 2003).
Schlosser, M. et al (2002). A scalable and ontology-based P2P infrastructure for Semantic Web Services. IEEE International Conference on Peer-to-Peer Computing (Online). http://citeseer.nj.nec.com/schlosser02scalable.html (Accessed 7 July 2003).
Smith, B. (2002). From classical metaphysics to medical informatics (Online). http://ontology.buffalo.edu/smith (Accessed 5 July 2003).
Stanford University (2002). Protégé editor (Online). http://protege.stanford.edu/ (Accessed 12 July 2003).
Udo, H. (2003). Turning informal thesauri into formal ontologies: a feasibility study on biomedical knowledge re-use. Comparative and Functional Genomics, 4, 94-97.
Uschold, M. et al (1998). The enterprise ontology. Knowledge Engineering Review, 13, 31-89.
W3C (2003a). OWL, Web Ontology Language: overview (Working Draft 31 March 2003) (Online). http://www.w3.org/TR/2003/WD-owl-features-20030331/ (Accessed 30 June 2003).
W3C (2003b). OWL, Web Ontology Language: test cases (Working Draft 28 May 2003) (Online). http://www.w3.org/TR/2003/WD-owl-test-20030528/ (Accessed 30 June 2003).
W3C (2003c). OWL Web Ontology Language guide (Working Draft 31 March 2003) (Online). http://www.w3.org/TR/2003/WD-owl-guide-20030331/ (Accessed 30 June 2003).
W3C (2002). Web-Ontology (WebOnt) Working Group (Online). http://www.w3c.org/2001/sw/WebOnt/ (Accessed 20 July 2003).
W3C (2001a). Annotated DAML+OIL ontology markup (Online). http://www.w3.org/TR/daml+oil-walkthru/ (Accessed 25 June 2003).
W3C (2001b). Semantic Web (Online). http://www.w3.org/2001/sw/ (Accessed 3 June 2003).
Weinstein, P.C. and Birmingham, P. (1998). Creating ontological metadata for digital library content and services. International Journal on Digital Libraries, 2(1), 19-36.
Weinstein, P. (1998). Ontology-based metadata: transforming the MARC legacy. In Akscyn, F. and Shipman, F.M. (eds). Digital Libraries 98: Third ACM Conference on Digital Libraries. ACM Press, 254-263.
Wittgenstein, L. (1953). Philosophical Investigations. Oxford: Blackwell (1991 edition).
XML.Com (2002). Denny, M. Ontology building: a survey of editing tools (Online). http://www.xml.com/pub/a/2002/11/06/ontologies.html (Accessed 20 July 2003).
XML.Com (2000). Dumbill, E. Tim Berners-Lee's lecture, XML 2000 Conference (Online). http://www.xml.com/pub/a/2000/12/xml2000/timbl.html (Accessed 23 June 2003).