THE CREATION AND USE OF ONTOLOGIES FOR COMPUTERISED INFORMATION SYSTEMS:
A CRITICAL EXPLORATION
A study submitted in partial fulfilment of the requirements for the degree of
Master of Science in Information Management
at
THE UNIVERSITY OF SHEFFIELD
by
BRYN LOBAN
September 2003
TABLE OF CONTENTS
1. The Tower of Babel: What is an Ontology? 1
1.1. A Parasitical Relationship: Legacy Systems 5
1.2. Ontological Form: Expression 6
1.3. Current Ontological Projects 9
2. The Babel Syndrome: Why use Ontologies? 11
2.1. Human Interoperability: Semantic Integration 14
2.1.1. The Promise of Better Information Management 17
2.2. Machine Interoperability: Semantic Intelligence 19
2.2.1. The Semantic Web 21
2.2.2. The Formal Aspect 25
3. Building the Tower of Babel: Ontological Engineering 27
3.1. Engineering of Ontologies: General Guidelines 28
3.1.1. Gathering and Identification of the Raw Material 29
3.1.2. Shared Conceptualisation 30
3.1.3. Conceptual Scope: Multiplicity and Singularity 35
3.2. Methodology: Representational Ontology Languages 40
3.2.1. Representational Ontology Languages 42
3.2.1.1. DAML+OIL and OWL 42
4. Conclusion: Towards Babel: Will Ontologies Work? 45
4.1 Mapping Complexity: the Need for Tools 50
4.2. Ontological Building Environment 54
4.3. OntoP2P: Consensus-Building Environment 58
Abstract: The dissertation undertakes a critical exploration of the role, creation
and use of ontologies for Computerised Information Systems (CIS) and seeks to
provide a framework for the issues relevant to an ontological conception of CIS.
Based on the data obtained from the main research projects currently being carried
out, as portrayed in the literature, this dissertation will critically explore ontologies
for CIS, for which research and practice on a major scale is still in its infancy. No
significant comparative data on either the usability or the construction aspects of
ontologies for CIS is yet available. Bearing these points in mind, the investigation
will be based on a critical analysis of various international projects and research
groups. Ultimately, the critical exploration will focus on mapping the topology of
current research and drawing conclusions from an analysis of the data obtained, in
order to establish the nature and usability of ontologies for CIS. The methodology
used is inductive, i.e. reasoning from particular evidence arising from case
situations, as reported in the literature, and drawing general conclusions based on a
systematic analysis of the evidence. The dissertation starts with a definition of the
subject and thus provides a detailed outline of the nature of an ontological approach
to CIS. Then an in-depth exploration is carried out of the reasons why it is
considered necessary to develop and construct ontologies, supplemented with a
detailed discussion of the problematic which the ontological approach is attempting
to overcome. In addition, a systematic analysis is made of how the ontological
approach is being built and achieved, both conceptually and in practice. The main
critical conclusions will be drawn in the last section as the various aspects of
ontological expression defined are interdependent. Several conclusions will be
drawn by identifying the problems encountered and, by implication, deducing
potential solutions for a way forward that may be of interest for future development
and research.
1. The Tower of Babel: What is an Ontology?
The term “Ontology” was developed in the field of “artificial intelligence”,
which took the concept from philosophy and applied it to computerised or robotic
systems. Subsequently, the rise of highly powered, networked and online data
repositories has, in recent years, prompted increased research interest in “ontology”
applied to the more circumscribed world of “computerised information
systems” (CIS) (Guarino, 1998).
On the most basic and philosophical level, ontologies are nothing new. Any
system, the function of which is to classify and manage information, by describing
and thereby representing data as an “abstract” model expression, can be viewed as an
“ontology”.
As such, the major “ontologies” in use pre-date the present use of the concept
of ontology. There are many “classification” systems in use, without which it would
be impossible to manage, search and retrieve information residing in various data
repositories. These information management systems are expressed mostly in the
form of thesauri, taxonomies, controlled vocabularies and directories.
These so-called established “ontologies” (what would now be called
“lightweight ontologies”) range from comprehensive library systems, like the
“Dewey Decimal Classification” system, the Yahoo and DMOZ Web directories and
schemas for databases, to current metadata systems like “Dublin Core” and others,
which manage information online as well as offline.
A clear, unequivocal distinction between an ontology and other information
management systems is not easy to draw. This is because, in practice, the
expression of an ontology (i.e. its form) is not so different from the traditional
information management forms mentioned above.
However, ontology, in the ambit of CIS, has recently (since the early 1990s)
taken on a new and more distinct definition and role. This shift of emphasis has
taken place due to the increased accessibility and speed of retrieval brought about by
new information technologies, such as the WWW, which have simultaneously
brought the “information overload” problem to critical levels.
As Guarino (1998) expresses it, an ontology:
“… in the simplest case…describes a hierarchy of concepts
related by subsumption relationships; in more sophisticated
cases, suitable axioms are added in order to express other
relationships between concepts and to constrain their intended
interpretation.”
It is the latter, more sophisticated, case that presently serves as the more
appropriate definition of the theory and practice of ontology and which distinguishes
it from other “classifying” systems in existence. Ontology enhances traditional
thesauri and information management systems, by trying to develop a deeper
semantics within “digital objects”, both conceptually and relationally.
The current interest in ontology lies in its potential to describe or represent
more of the “semantics” of a field of knowledge (or “metadata” that explicitly
represents the “semantics” of a data domain) in both a human-understandable way
(by establishing consistency and consensus) and, more importantly, in a computer-
processible way. It is this latter sphere which is one of the more ambitious and
critical aspects of ontological practice for CIS.
The most widely used definition of ontologies for “Computerised Information
Systems” is given by Gruber (1993:199): “An ontology is a formal, explicit
specification of a shared conceptualization” .
Conceptualization means an abstract model of data in a CIS “world”, which
identifies the relevant and, therefore, main concepts of the specific information
domain. Formal refers to the fact that the ontology must be machine-understandable
(i.e. the data has to be structured in a “logical content” way). Explicit means that the
type of concepts used and the constraints on their use are explicitly or axiomatically
defined. Finally, Shared signifies that an ontology is about consensual “knowledge
representation”, which is not restricted to an individual but must be accepted by a
group of “agents”.
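Gruber’s four criteria can be made concrete with a small sketch. The following is a minimal, hypothetical illustration, not any real ontology library: the class names (Concept, Ontology) and the medical fragment are invented here purely to show how a formal (machine-computable), explicit (axiomatically defined) conceptualisation might look in code.

```python
class Concept:
    """An explicitly defined concept: a name plus a subsumption link."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # "is-a" link to a broader concept

class Ontology:
    """A formal specification: a machine-processable set of concepts."""
    def __init__(self):
        self.concepts = {}

    def define(self, name, parent=None):
        self.concepts[name] = Concept(name, parent)

    def is_a(self, name, ancestor):
        """Formal subsumption check, computable by a machine."""
        c = self.concepts.get(name)
        while c is not None:
            if c.name == ancestor:
                return True
            c = self.concepts.get(c.parent)
        return False

# A toy "shared conceptualisation" of a medical domain fragment.
medicine = Ontology()
medicine.define("ClinicalFinding")
medicine.define("Disease", parent="ClinicalFinding")
medicine.define("Diabetes", parent="Disease")

print(medicine.is_a("Diabetes", "ClinicalFinding"))  # True
```

The “shared” criterion is not visible in the code itself: it lies in the prior consensus of the group of agents on which concepts to define and how to relate them.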
Another similar definition of an ontology is: “a shared formalization of a
conceptualization of a domain” (Grüninger and Uschold, 1996:111). A domain
refers to a specific subject area of human knowledge/information, such as medicine,
real estate, finance, business, etc.
An ontology is a classification methodology for formalising a knowledge
domain in a structured way. The true core of an ontology revolves around the theory
and practice of the “shared conceptualization”: a “neutral” description or theory of a
given domain which is acceptable to, and can be reused by, all information gatherers
in a particular domain. An ontology is designed with the aim of identifying and
establishing consensus through “semantic commonalities” within a circumscribed
knowledge domain.
In other words, a definition of an ontology as a “shared conceptualisation” is
an extreme, or extended, form of the information management practices previously in
use. The major distinction, however, is that an ontology seeks to explicitly
represent/model any domain of knowledge contained in CIS in a far more complex
manner by capturing more of the meaning/semantics than has hitherto been achieved.
An ontology is engineered to permit a higher expressive structure or “knowledge
representation”, i.e. to have the ability to express relationships between data entities,
as well as the entities/concepts themselves.
An ontology’s “shared conceptualisation” is an “engineering artefact”, which
is basically: “constituted by a specific vocabulary used to describe a certain reality,
plus a set of explicit assumptions regarding the intended meaning of the vocabulary
words” (Guarino, 1998). It is not enough for a “real” ontology to describe
data/content with metadata tags and controlled vocabularies, etc. An ontology also
needs to represent the background or context and, therefore, the relations between the
data units (i.e. the “semantics” involved). Furthermore, this needs to be expressed in
a formally explicit manner in order to be readable by machines.
Ultimately, if an ontology is going to be an improvement over traditional
information management systems, for humans and, especially, for machine
processing, it must capture the intended meaning of concepts and statements in any
domain of interest. To achieve this an ontology must aim to bring out the semantic
commonalities within extensive bodies of detailed and specialised knowledge.
Furthermore, it must be able to express/represent some degree of tacit or “meta-level
knowledge” (i.e. implicit and background knowledge).
A simple example serves to illustrate this. In Newton’s second law of motion,
the expression “F=ma” states that force is the product of mass and
acceleration. However, this association can only be made against a web of
background/context knowledge. That contextual semantics needs to be represented
explicitly when engineering the ontology: “F=ma” is a mathematical expression, with its
related concepts of equations, parameters and variables; it invokes a
significant body of expert knowledge in terms of the concept called “mass”; and,
in this context, it is not the same kind of expression as the formula for
electrical voltage (Ohm’s law), and so on, along the semantic chain.
The knowledge that is required to understand Newton’s law, which “means”
and describes the acceleration of an object under the influence of a force, is not
obvious, either for a computer for which it merely represents a string of characters, or
for humans who need to explicitly represent the main conceptuality of the physics
domain, to which they, as a group, belong and on which they have to reach
consensus.
Therefore, an ontology of the Physics domain would need to represent, in an
explicit manner, a selection of the “principal” concepts and semantic background
underlying the system of Physics: for instance, the understanding that a formula can
be a mathematical description of a process or that mathematical variables represent
certain quantities related to Physics.
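The point can be sketched in code. The structure below is purely illustrative (the concept names and the dictionary layout are assumptions, not a real ontology format): it shows how an ontology binds each symbol of “F=ma” to an explicit domain concept, so that the formula is more than a string of characters to the machine.

```python
# Hypothetical fragment of a Physics ontology: the context of "F = m * a"
# is represented explicitly, distinguishing it from, say, Ohm's law.
physics = {
    "Formula": {"is_a": "MathematicalExpression"},
    "NewtonSecondLaw": {
        "is_a": "Formula",
        "text": "F = m * a",
        "variables": {
            "F": "Force",         # each symbol is bound to a domain concept,
            "m": "Mass",          # not left as an uninterpreted character
            "a": "Acceleration",
        },
        "asserts": "Force is the product of Mass and Acceleration",
    },
}

def interpretation(ontology, formula, symbol):
    """Resolve a symbol to its intended domain concept."""
    return ontology[formula]["variables"][symbol]

print(interpretation(physics, "NewtonSecondLaw", "m"))  # Mass
```

Without such explicit bindings, the same characters could equally well denote quantities from another domain; with them, a machine can constrain the intended interpretation.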
Basically, an ontology is designed to be a reusable construct or information
management system, which represents the “common” semantic assumptions
underlying a “knowledge base” within the same domain and/or sub-domains.
1.1 A Parasitical Relationship: Legacy Systems
Another way to characterise an ontology is to note that, in practice, it is rarely
built from “free”, unstructured documentation. Most current CIS retrieve and manage
information with the aid of taxonomies, thesauri and controlled vocabularies, etc.
Currently the main role and function of ontologies is to synthesise/integrate these
various information management structures. In essence, the true nature of an
ontology is to operate at this higher meta-level, due to the fact that it homogenises
and/or merges pre-existing knowledge bases.
An ontology acts as a “neutral” meta-level description/representation of a
knowledge base, by engineering shared/common assumptions underlying that base.
By operating on this high meta-level, an ontology should provide a more synthetic,
“neutral description” or theory of a given knowledge domain, which can thus be
accepted and reused by all the information gatherers using that particular domain.
1.2 Ontological Form: Expression
In theory, an ontology should represent, to the extent possible, the semantic
complexity of human knowledge. However, in practice, this is extremely difficult to
achieve. Grüninger and Uschold (1996) describe the various “forms” of ontologies
across a spectrum, from more informal (expressed in natural language vocabularies
with the relations between them) at one extreme, to rigorously formal (refined terms
with theorems and axioms, etc.) at the other extreme, and various combinations in
between.
Uschold offers a useful definition of what form or expression an ontology may
take in practice:
[an ontology] “may take a variety of forms, but necessarily it will
include a vocabulary of terms, and some specification of their
meaning. This includes definitions and an indication of how
concepts are inter-related which collectively impose a structure
on the domain and constrain the possible interpretations of
terms.” (Uschold et al, 1998:32)
An ontology, like any other “taxonomic” structure in use, may be expressed
as an interface mapping out concepts, instances and the relations between them.
Moreover, an ontology may have a more linguistic aspect to its taxonomic structure,
resembling previous CIS “controlled vocabularies” and thesauri formats.
A commercial company, “Applied Semantics” (AS, 2003), illustrates this with
its CIS named CIRCA: “…the soul of CIRCA is its ontology”, which is described as:
“... consisting of millions of words, meanings, and their
conceptual relationships to other meanings in the human
language…with more than 1.2 million words, half-a-million
concepts, and tens of millions of relationships - CIRCA matches
words and phrases to its ontology, performs linguistic analysis,
disambiguates them into meanings, and weighs those meanings
by importance….” (Marla, 2002:35)
Typically, an ontology identifies the important classes (categories of objects)
of a domain and organises these classes in a sub-class hierarchy. Each class is
characterised by properties that are shared/inherited by all elements in that class,
which, taken together, provide detailed, consistent distinctions in a holistic structure.
Thus, an ontology, like any traditional taxonomic form, illustrates associative
and hierarchical relationships among concepts, with the major distinction that it
embodies a far more elaborate taxonomic framework. In theory at least, this
ontological taxonomy should incorporate machine processing powers, i.e. automatic
inference and deduction rules.
Tim Berners-Lee sums this up succinctly:
“The most typical kind of ontology for the Web has a taxonomy
and a set of inference rules… the taxonomy defines classes of
objects and relations among them. For example, an address may
be defined as a type of location, and city codes may be defined to
apply only to locations, and so on.” (Berners-Lee et al,
2001)
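Berners-Lee’s example can be sketched as follows. This is a minimal illustration under assumed names (the class labels and rule are invented for the sketch, not taken from any Web standard): a taxonomy supplies the subsumption link, and a single inference rule uses it to constrain where city codes may apply.

```python
# Taxonomy: "Address" is defined as a type (subclass) of "Location".
subclass_of = {"Address": "Location"}

def is_location(cls):
    """Walk the subsumption chain upwards, looking for 'Location'."""
    while cls is not None:
        if cls == "Location":
            return True
        cls = subclass_of.get(cls)
    return False

def may_have_city_code(cls):
    """Inference rule: city codes may be attached only to locations."""
    return is_location(cls)

print(may_have_city_code("Address"))  # True: inferred via Address -> Location
print(may_have_city_code("Person"))   # False: the constraint rejects it
```

The machine never stores the fact that an address may carry a city code; it deduces it from the taxonomy plus the rule, which is precisely the “automatic inference” the ontological taxonomy is meant to add.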
The goal of an ontology is to seek to offer a collection of terms and
relationships within a homogeneous structure, i.e. an explicit encoding of knowledge
in a domain, or that spans sub-domains. Such knowledge-representation needs to
have the potential to be shared and reused by different information systems
communities (acting as a “dictionary” of the specific domain).
Ultimately, this “ontological representation” is the consensual agreement on
the concepts and relations characterising the specific information of the domain in
question, resulting in a shared understanding of a domain that is processible by both
computers and human beings. This dual aspect will be explored in more detail
further on.
In conclusion, it is the purpose for which the ontology will be used that will
eventually determine its final engineered form. Recent research offered in the
literature focuses purely on the more developed and complex forms of ontologies, as
they are crucial for covering the wide scale needed for current CIS applications. This
is what is referred to in Gruber's initial theoretical definition of ontologies.
1.3 Current Ontological Projects
At this stage, a review of some of the research projects currently in progress is
necessary, in order to understand the implications of ontologies for CIS --- it would
be no exaggeration to say that the praxis of ontology is an attempt to map out all
fields of human knowledge containable within a CIS.
The potential of ontologies to provide an “objective” specification of a
domain of information, by representing a consensual agreement on the concepts and
relations characterising the way knowledge in that domain is expressed, is aimed at
supporting information management in various fields.
The following is a brief outline of various ontological research projects,
spanning several knowledge domains (some features of these projects will
subsequently be explored in greater detail).
As reported in the literature, several research groups are currently involved in
the creation of ontologies in the biomedical field. The current most ambitious
project is “IFOMIS” (2003), the goal of which is to develop a “formal ontology” that
will be applied to the “whole” domain of medical science (Smith, 2002).
Another development in the biomedical domain is the “Gene Ontology
Consortium” (GO, 2003), which aims to produce a “controlled vocabulary” that can
be applied to all organisms, taking into account the fact that knowledge of gene and
protein roles in cells is in a perpetual state of evolution. Representing genetic data
necessitates a flexible ontological structure --- a structure that is capable of changing
as the knowledge of genetics changes. The Gene Ontology (GO) is used to
“annotate/describe” gene data, at various levels, mainly in three knowledge domains:
the molecular function, the biological process and the cellular component.
Several biomedical research groups have annotated their databases according
to the “controlled vocabularies” contained in the ontologies produced by the GO
Consortium.
In a similar vein, the goal of the “Plant Ontology Consortium” (PC, 2002) is
to produce an ontology that can be applied to plant-based database information.
A medical ontology project is currently underway, with the aim of building a
wide scale ontology called the “UMLS Semantic Network” (McCray, 2003).
The “Agricultural Ontology Service Project” (FAO, 2003b) includes several
ontological projects based on the task of aggregating agricultural information.
One related project is the “Fishery Ontology Service” (FOS) which is a FAO
based project (Pisanelli et al, 2002) for the “fishery” information domain.
Research has been carried out on a potential ontology for the geographical
domain (Kokla and Kavouras, 2001).
“Applied Semantics” (AS, 2003) is selling its various information
management services to the pharmaceutical, biotechnology and financial service
industries, etc., based on its ontology called “CIRCA”.
One interesting aspect of the CIRCA ontology is that it was bought by
Google, the most effective search engine to date, in April 2003 (Google,
2003). This recent development indicates the importance that ontologies seem to be
acquiring for the future of CIS.
The existence of such projects leaves the proponents of ontologies with the
idea that no future computerised information systems can be designed without
adopting an ontological approach or, as Nicola Guarino, the President of one of the
main ontological research groups, OntoLab (2003), says, without binding CIS “to the
perspective of ontology driven information systems.” (Guarino, 1998).
2. The Babel Syndrome: Why use Ontologies?
In the ambit of CIS: “We design ontologies so we can share knowledge with
and among...agents.” (Gruber, 1995:908).
In the same way that a group of people, working together, need to have an
“agreement” on the basic meaning and definition of the words they employ in their
communications, CIS, as a community of “agents” (whether human or machine),
need a common mechanism for agreeing on the precise semantics of terms, in order
for them to be able to effectively communicate/exchange their data/content.
Data repositories use specific classification terminology and concepts to
represent and process the information or data/content that is received and stored within
a given knowledge domain, but this is not necessarily understood by all “agents”.
Indeed, CIS have always faced the problem of what might be termed the “Tower
of Babel syndrome”: the multiplicity of languages creates the problem of
communication and understanding, making the act of “translation” necessary. In the
ontological context, this takes the meaning of constructing an intermediary or bridge,
i.e. a representational “lingua franca” or one “meta-language”.
“Ontology-driven” CIS are necessary because “if no man is an island”, “no
system is an island” either. What could be deduced from this is that the phenomenon of
“globalisation” in the economic and social sphere can only continue to develop if there
is an equivalent form of “globalisation” in the CIS sphere. In order to share and reuse
data, from one CIS data repository to another, there is a growing need for data or
knowledge integration.
To be more precise, CIS interoperability has two aspects related to the
capability of information systems to exchange/share data. One is purely syntactic,
i.e. technical/functional interoperability. The other, more important, aspect of CIS
interoperability is the semantic component, i.e. the need for “semantic
interoperability” (the representation of the content of the data repositories).
“Semantic interoperability” is essential in order for CIS to exchange information
received from other systems. This is the main role for which ontologies are
designed --- to be used by a group of people who share an essential need to
communicate information through the medium of CIS data repositories in their
specific field of interest.
In summary, based on the above observations, two principal applications for
“semantic interoperability” in CIS can be highlighted: “semantic integration” at the
human level and “semantic intelligence” at the machine level. Both these inter-
related aspects will now be explored in detail.
2.1 Human Interoperability: Semantic Integration
In practice, the effective demand for interoperable CIS comes from groups or
organisations, such as the medical, scientific or business domains, which need to
share a considerable quantity of content/data within their specific domain.
Human knowledge creation is caught in an eternal tension between the
effectiveness of specialist groups, acting independently, and the need for them to
integrate with the wider domain to which they belong. Most commonly, a specialist
group may develop “innovations” and this tends to produce a sub-domain. How the
resulting data is classified by their information management system is not necessarily
understood, or interpreted, in the same way by other information systems in the
domain.
In more general terms, when, for the sake of accessibility (mainly to facilitate
the search and retrieval process), an attempt is made to integrate information from
different CIS repositories, incompatibilities arise in the terminological or conceptual
representation of the content/data. Partridge (2002) states that:
“More and more enterprises are currently undertaking projects
to integrate their applications. They are finding that one of the
more difficult tasks facing them is determining how the data from
one application matches semantically with the data from the
other applications.”
Semantic heterogeneity may occur in CIS in instances where there are two or
more systems of concepts/terms covering, more or less, the same universe of
information, or when information expressed in one system cannot be exactly
expressed in the other, or when other semantic mismatches occur.
“Not only is there a need to determine when different databases
are talking about the same thing (within a paradigm) - but also
of determining how to map the things that exist according to one
paradigm into the ‘same but different things’ that exist according
to another (across paradigms). There is also the task of
determining which paradigm underlies each database and which
should underlie the unified database.” (Partridge, 2002)
In the final analysis, an ontology is a unified framework or paradigm, which
acts as a common reference “taxonomy”. As such, it is designed to resolve (or
reduce) the conceptual or terminological problems encountered when integrating
disparate data. An ontology aims to develop a homogeneous base from existing
heterogeneous terminological sources, by establishing a semantically explicit
conceptual consensus.
“Semantic interoperability”, by the means of “semantic integration”, is
simultaneously the aim and result of a successful ontology, which is, itself, the
outcome of an engineered shared conceptualisation. Thus, the underlying premise of
this engineering praxis is that it should be possible to construct a classification
system by “cross-calibrating” disparate CIS data residing in a specific domain, as if
translating from one language into another, and establishing a benchmark taxonomic
framework.
In practice, this “semantic integration” process would need to involve a
substantial “semantic equaliser” element, in order to provide “relations” and
equivalences, as well as ensuring the absence of polysemy, with the direct result of
achieving a certain degree of semantic compatibility with the other CIS content
repositories in the domain. This would facilitate communication and sharing
between different CIS within the domain, enabling them to be accessed by different
applications/agents.
The following examples highlight, in greater detail, the nature of the
problematic which an ontology is designed to resolve:
In the medical domain, as in many other fields, nomenclatures (e.g.
standardised, controlled vocabularies) have been used for the purpose of managing
and retrieving its knowledge base. The domain of medicine has historically used:
“…many terminological systems, i.e. classifications, thesauri, vocabularies,
nomenclatures and various unstructured coding systems, each of them being
designed for a specific purpose.” (Burgun et al, 2001:96). Each of the above formats
for accessing and retrieving information in the medical domain reflects: “the
diversity of goals, approaches, and achievements across the different medical
communities according to their main orientation, which may be patient’s care,
research or public health.” (Burgun et al, 2001:96).
These various perspectives make it difficult to agree even on accepted
medical terminological classifications. This has the implication that, in most cases,
not only do medical terms and definitions differ between groups but, more
importantly, different groups use identical terms with different meanings. Even a
concept such as “gene”, which is common to all groups, is used with a different
semantic focus by the major genomic databases (Burgun et al, 2001). In summary,
different databases may use identical “labels” but with different meanings;
alternatively the same meaning may be expressed using different labels.
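This label/meaning mismatch, and the ontological remedy for it, can be sketched briefly. The mappings below are entirely invented for illustration (the concept identifiers and the two databases are assumptions, not real genomic resources): each local label is mapped to a shared ontology concept, and agreement is judged at the concept level rather than the label level.

```python
# Local label -> shared ontology concept (all identifiers invented here).
db_a = {"gene": "SequenceRegion",  # database A: a gene is a DNA region
        "protein": "Polypeptide"}
db_b = {"gene": "FunctionalUnit",  # database B: a gene is a unit of heredity
        "protein": "Polypeptide"}

def same_meaning(label_a, label_b):
    """Two labels agree only if both map to the same shared concept."""
    ca, cb = db_a.get(label_a), db_b.get(label_b)
    return ca is not None and ca == cb

print(same_meaning("gene", "gene"))        # False: same label, different concepts
print(same_meaning("protein", "protein"))  # True: labels converge on one concept
```

The sketch captures the essential move: once labels are grounded in a shared conceptualisation, identical labels with different meanings no longer silently pass as equivalent.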
The same fundamental problem arises in the geographical data domain:
“In order to achieve information exchange between different
geographical databases, it is necessary to develop suitable
methods for formally defining and representing geographical
knowledge. However, the plethora and diversity of data
standards and terminologies representing different geographical
concepts further complicate the problem of geographical
information sharing and reuse. Semantic differences occur
between heterogeneous geographical data standards and raise
problems during the integration process.” (Kokla and Kavouras,
2001:683)
This problematic is currently being tackled in the development of the
“Fishery Ontology Service” (FOS), which is a FAO based project (FAO, 2003a),
designed for merging several “fishery terminologies”, in order to support an
integrated information search and retrieval process.
A fishery portal is one of the main goals that FOS is designed to achieve, by
semantically integrating the pre-existing fishery taxonomies and terminologies
(directories, reference tables and online thesauri), used by different groups in the
fishery domain, into one accessible portal.
For example, the concept “aquaculture” is classified differently according to
the “fishery groups” involved. The ASFA (FAO, 2002a) and AGROVOC
(FAO, 2002b) thesauri place the term “aquaculture” in different relational
hierarchies. Pisanelli et al (2002) sum up the dilemma encountered in this regard:
“two different contexts relating respectively to species and environment points of
view.” This has the direct implication that, without an ontology (in this case FOS),
the fishery search and retrieval process related, for instance, to the term
“aquaculture” (beyond the traditional keyword-match IR system) will produce
inconsistent results.
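An FOS-style merge might be sketched as follows. The hierarchy paths shown are invented for illustration only (the literature notes merely that ASFA and AGROVOC place “aquaculture” in species- and environment-oriented contexts respectively): the merged entry preserves both source classifications, keyed by origin, so a query from either perspective resolves to the same concept.

```python
# Invented, simplified hierarchy paths for the two source thesauri.
asfa    = {"aquaculture": ["Fisheries", "Culture"]}           # species-oriented
agrovoc = {"aquaculture": ["Environment", "AquaticSystems"]}  # environment-oriented

def merged_contexts(term):
    """A merged ontology entry keeps every source classification,
    rather than forcing a single hierarchy to win."""
    return {"ASFA": asfa.get(term), "AGROVOC": agrovoc.get(term)}

entry = merged_contexts("aquaculture")
print(entry["ASFA"])     # ['Fisheries', 'Culture']
print(entry["AGROVOC"])  # ['Environment', 'AquaticSystems']
```

Retrieval through the merged entry then returns consistent results regardless of which fishery group’s terminology the searcher started from.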
Another example of the type of semantic complexity, which an ontology’s
“shared conceptualisation” would need to address, is the “UMLS semantic network”
(McCray, 2003) project currently under development. The role of the UMLS
“semantic network” is to engineer an ontology out of the US national Library of
Medicine information management system, the UMLS, which:
“currently interrelates some 60 controlled vocabularies in the
biomedical domain. The vocabularies vary in nature, size and
scope and have been created for widely differing purposes.
Some vocabularies have been created for document retrieval
systems, others for coding medical records for billing and
administrative purposes, and yet others have been created for
use in medical decision support systems.” (McCray, 2003:82)
The UMLS “semantic network” ontology is designed to: “provide an
overarching conceptual framework for all UMLS concepts.” This is in order to
develop: “a system whose goal it is to provide integrated access to a large number
of biomedical resources by unifying the domain vocabularies that are used to access
those resources.” (McCray, 2003:81).
The aim of ontologies, in general, is to provide integrated access in such a
way as to achieve consensus throughout a knowledge intensive community. Thus, an
ontology should be designed to resolve the representational disparities arising from
the different perspectives within a domain. Existing classification information
management systems have proved inadequate for such a task. They have been
unable to fully synthesise and do justice to the complexity and contextually-rooted
knowledge of human expertise, e.g. the differences in perspective or emphasis of
specialist groups which have the need to search and retrieve one common knowledge
base.
While the technology for running CIS has reached an impressive state of
maturity and power, the classification information systems, upon which the CIS
technology relies, are based on a myriad of singular “ad hoc” classifying systems: the
interpretation of the terms or categories is still left to the intuition of each individual
management system. In addition, the level of expressiveness of each system is quite
restricted (e.g. broader term, narrower term, synonymous term, related term).
2.1.1 The Promise of Better Information Management
A successful ontology, by providing CIS interoperability through engineered
“semantic integration”, could enhance the information management for CIS in many
ways. Better organised and explicit information would, by definition, support a more
targeted information retrieval or “extraction”, i.e. delivering the right information, in
the right amount and at the right time.
The search and retrieval process, aided by an ontology, would retrieve only
those “documents” that refer to the explicit concept defined, instead of using the
ambiguous and poly-semantic keyword match of traditional IR systems, i.e. the
explicit conceptuality of the ontology is used when querying the domain.
For example, “plant data” residing in CIS databases needs inter-base querying
between different plant databases. However, the querying process is presently
severely hampered:
“terms used to describe comparable objects within and between
databases are sometimes quite variable and limit the ability to
accurately and successfully query information in and across
different databases. One solution to this problem involves the
development of an ontology.” (PC, 2002:138)
To this end, the “Plant Ontology Consortium” is currently developing such an
ontology.
The development of an ontology for plant data would integrate and provide a
richer structure, which would ensure more precise and complex querying. Due to the
explicit nature of the relationships between types of entities and features,
consultation of the ontology during the search and retrieval process would ensure
accurate recall of the specialist terminology.
The “Gene Ontology” (GO, 2003) is an interoperable standardisation
project under development for the biological domain. The aim of the GO is to facilitate
more sophisticated queries by synthesising the wide ranging biological-genetic
information domain. An ontology, in this context, could determine that the process
of photosynthesis occurs in plants but not in mammals (as explicitly pre-defined in
the ontology) and thus bind the query to that “represented conceptuality”.
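By way of illustration, this kind of query binding could be sketched in a few lines of Python. The concept names and relations below are purely illustrative assumptions and are not drawn from the actual Gene Ontology schema:

```python
# Minimal sketch of an ontology recording which biological processes
# occur in which kinds of organism. All names are illustrative; this
# is not the actual Gene Ontology structure.
ontology = {
    "photosynthesis": {"occurs_in": {"plant", "algae"}},
    "lactation": {"occurs_in": {"mammal"}},
}

def organisms_for(process):
    """Return the organism kinds the ontology binds a process to."""
    return ontology.get(process, {}).get("occurs_in", set())

def query_allowed(process, organism):
    """A query is bound to the ontology's 'represented conceptuality':
    pairing photosynthesis with mammals is ruled out in advance."""
    return organism in organisms_for(process)
```

A retrieval system consulting such a structure would reject, or re-scope, a query pairing “photosynthesis” with “mammal” before any documents were searched at all.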
If data is stored in an unstructured, “free text” form, as in the Google CIS
database, the search engine capability is inherently limited. For instance, in the
context of the search and retrieval of biological data, Harris and Parkinson
(2002:119) observe that:
“…In contrast, if ontologies are used to describe the species,
compound and developmental stage, structured queries, such as
‘what experiments use compound Y ?’ are possible…the source
ontology provides an unambiguous definition for Y.”
In summary, the human search and retrieval process is aided by an ontology
which permits a higher “information extraction” (i.e. targeted retrieval of data) than
is currently possible.
2.2 Machine Interoperability: Semantic Intelligence
An ontology, by describing a set of assumptions about a domain, is designed
not only to help humans communicate but also to enable different computer agents to
communicate and “manipulate” a domain’s “conceptual consensus”. This
constructive anteriority leads to another inter-related aspect of CIS “semantic
interoperability”, that of “semantic intelligence” (i.e. machine driven applications).
According to the proponents of “ontology-driven” CIS, ontologies would
permit the emergence of a more “intelligent” information system. Ontologies, in
addition to providing a common semantic structure, would allow for computers to
process data semantically, in a more “aware” manner.
Ontologies would offer an active “content-driven” role, as opposed to the
passive keyword “context blind” role of current information retrieval systems, thus
providing targeted retrieval of data in an “automatic” machine-driven manner.
An “intelligent” application, enabled by an ontologically driven CIS, would
have the potential to enhance the performance of everyday information systems, such
as e-mail and “calendar plus”.
Currently these types of applications are capable of filtering out unwanted
messages, or alerting the user to conflicts in meeting schedules, or identifying
meeting room availability, as the case may be. However, the functions that these CIS
applications are capable of performing are restricted, due to the lack of an
“understanding” of the context. In other words, the system’s information processing
has a “passive” reaction towards its content.
In order to play a more active, content-driven role, these CIS applications
would need to “act” upon the information content retrieved, i.e. by automatic
inference and/or deduction. This more powerful processing, however, would need to
be endogenous to the context, i.e. the actual “meaning/semantics” intended by the
content/users.
A series of automatic actions could then be generated in sequence by the CIS:
e.g. determining availability of participants and meeting rooms, scheduling meetings,
sending out automated messages, and so on along the “semantic chain”. In order to
achieve this there may be the need for creating an ontology of appointments (the
concepts of dates and available time slots in the context of the particular
organisational structure).
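Such an “ontology of appointments”, and the ensuing semantic chain, might be sketched as follows; the participants, rooms and time slots are entirely hypothetical:

```python
# Hypothetical sketch of an "ontology of appointments": time slots are
# shared concepts, so an agent can chain availability checks, room
# booking and automated messages (the "semantic chain") without a user.
participants = {
    "alice": {"2023-05-01T10:00", "2023-05-01T14:00"},
    "bob": {"2023-05-01T14:00", "2023-05-02T09:00"},
}
rooms = {"room-a": {"2023-05-01T14:00"}}

def schedule(people, messages):
    """Find a slot free for every participant and some room, then
    append an automated notification for each attendee."""
    common = set.intersection(*(participants[p] for p in people))
    for room, free in rooms.items():
        slots = sorted(common & free)
        if slots:
            slot = slots[0]
            for p in people:
                messages.append(f"{p}: meeting in {room} at {slot}")
            return room, slot
    return None

messages = []
booking = schedule(["alice", "bob"], messages)
```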
At present, most CIS use standard keyword based methodologies, which
fundamentally create the problem of information overload. In general, any query
process produces the effect of displaying a quantity of inappropriate data: the CIS
interrogated is “contextually blind”. As the number of information units grows, and
the specificity of specialist needs increases, the ability of current CIS to find the most
appropriate data is strained to the limit.
Typically, computerised information systems, supporting complex tasks and
domains, do not automatically possess the body of knowledge necessary for
generating adequate interpretations, but instead rely on the fact that the user performs
this function. Thus, inevitably most of the burden is placed on the user. Ontological
“intelligent” support implies that this burden should, to a maximum degree, be
shifted away from the user towards the CIS.
2.2.1 The Semantic Web
One of the main advocates of this semantic/intelligent vision is Tim Berners-
Lee, in his W3C “Semantic Web Activity” research project (W3C, 2002). The
“Semantic Web” project entails adding an extra ontological layer or infrastructure to
the current HTML/XML World Wide Web (WWW), permitting Web resources to be
more readily accessible to automated processes. In essence, Web content would be
structured and formalised in such a way as to provide content which could be
accessible and interpretable by machines.
The current WWW environment, as already indicated in general for CIS, is
entirely aimed at “human readers”. Machines are “blind” to the actual information
content: Web-enabled CIS, such as browsers, servers and search engines, do not
semantically distinguish between information sectors (the weather forecast, academic
papers, personal home pages, or corporate information, and so on). The “Semantic
Web vision” is set to change this.
The “inventor” of the WWW considers the development of ontologies to be
the core foundation for building a more “intelligent” Internet, i.e. the “Semantic
Web”. This could best be represented as a “gigantic electronic brain” which
“understands” Web resources. The “Semantic Web” would understand the
“meaning” of a Web page by following hyperlinks, from Web documents to domain-
specific ontologies (which could be seen as the “neurons” of the WWW). This
could make it possible to offer cross-references and automatic “inferences”, in order
to enable a computer to “understand”, for instance, that different words/terms are
different expressions of the same concept (e.g. “movie”, “film” and “motion
picture”).
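This resolution of different terms to a single concept could be sketched as follows; the term-to-concept mapping shown is illustrative only:

```python
# Sketch: an ontology lets a machine treat different words as
# expressions of the same concept. The mapping is illustrative.
concept_of = {
    "movie": "MotionPicture",
    "film": "MotionPicture",
    "motion picture": "MotionPicture",
    "picture": "Image",
}

def same_concept(a, b):
    """True when both terms resolve to one ontological concept."""
    ca, cb = concept_of.get(a), concept_of.get(b)
    return ca is not None and ca == cb
```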
“The Semantic Web is an infrastructure that will bring structure
to the meaningful content of Web pages, creating an environment
where software agents roaming from page to page can readily
carry out sophisticated or ‘intelligent’ tasks for different users.”
(Berners-Lee et al, 2001)
Berners-Lee illustrates how agents, supported by semantic information, could
be used to conduct research into everyday tasks, such as investigating health care
provider options, prescription treatments, or available appointment times. Each of
these tasks is now usually conducted by a human researcher but, with the “Semantic
Web”, thanks to the development of ontologies, this could be done by machines.
This important semantic feature would permit intelligent agents, and other
ontological driven Web-based applications, to not only passively swallow
representations/descriptions but to act on them as well.
“...an agent coming to the clinic's Web page will know not just
that the page has keywords such as ‘treatment, medicine,
physical, therapy’ (as might be encoded today) but also that Dr.
Hartman works at this clinic on Mondays, Wednesdays and
Fridays and that the script takes a date range in yyyy-mm-dd
format and returns appointment times.” (Berners-Lee et al,
2001)
The application/agent in the Semantic Web CIS environment would be able to
“know” and, therefore, act upon what it “knows”, because the “semantics” were
encoded by the Web site creator using appropriate “semantic software”, in much the
same way as Web page software is currently used to create HTML Web pages.
As Berners-Lee further explains, the ontology/semantics would encode data
such as the following:
“…professors work at universities and they generally have
doctorates. Further markup on the page (not displayed by the
typical Web browser) uses the ontology's concepts to specify that
Hendler received his Ph.D. from the entity described at the URI
http://www.brown.edu... and so on. All that information is
readily processed by a computer and could be used to answer
queries (such as where Dr. Hendler received his degree) that
currently would require a human to sift through the content of
various pages turned up by a search engine.” (Berners-Lee et al,
2001)
Berners-Lee’s intelligent Web CIS application would be composed of many
ontologies with links to other ontologies. The Semantic Web is designed to rely on
many decentralised ontologies, rather than one centralised, monolithic ontology.
These decentralised ontologies would be made available by their specialised domain
creators (and formalised by use of appropriate “markup”).
In order for intelligent agents to process Web-based information, it is
essential that they operate in a pre-defined human context of agreement, as to the
meaning of the terminology used within any particular domain. This ontological
agreement, at the human level of “semantic integration”, as detailed earlier, needs to
be further elaborated in a formal structure, in order to allow inference from the
ontological integration.
Intelligent Web services would use multiple ontologies in parallel (as Web
information is, by definition, located in numerous sources) and subsequently make
inferences from them.
If, for instance, the owners of different Web sites containing medical
information or providing medical services were to share and publish their underlying
ontologies, with their specialist variations of the terms they use, computer agents
would be able to extract and aggregate information from these different sites. As
such, the intelligent automated agent would “parse” the ontologies linked to the sites
and process and deliver the information accordingly, rather than merely matching
keyword “patterns”, as at present on the Web.
An ontology could build its hierarchic/taxonomic form with a focus on
“automatic” inferred inheritance. Thus, machines would correctly understand a
number of relationships among entities by assigning properties to higher classes and
then assuming that sub-classes inherit these properties. For example, if “Britney
Spears” is a type of “Pop Star” in a hierarchy marked “Singers”, a software
programme could make assumptions about the singer even if the details of her
biography are not explicitly known. An ontology may express the rule: “If a singer
has an agent or manager and released an album last year, then assume he or she has a
fan club”.
An agent/programme could then readily infer, for example, that Ms Spears
has a fan club and process information accordingly. Software-agents would not truly
“understand” the meaning of the particular information at hand, but inference
capabilities allow applications to effectively use or circumscribe language/concepts
in ways that are contextually significant to human users.
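The inheritance and rule mechanism outlined above could be sketched as follows, using the “Pop Star” example; the hierarchy, the rule and the instance data for Ms Spears are all illustrative:

```python
# Sketch of inferred inheritance plus a rule. A property assigned
# higher in the hierarchy is assumed to apply to sub-classes.
hierarchy = {"PopStar": "Singer", "Singer": "Person"}

def is_a(cls, ancestor):
    """Walk the sub-class chain: a PopStar is a Singer is a Person."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = hierarchy.get(cls)
    return False

def has_fan_club(instance):
    """Rule: a singer with an agent who released an album last year
    is assumed to have a fan club."""
    return (is_a(instance["type"], "Singer")
            and instance.get("has_agent", False)
            and instance.get("album_last_year", False))

britney = {"type": "PopStar", "has_agent": True, "album_last_year": True}
```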
Furthermore, because the Semantic Web is composed of Web
pages hyperlinked to their respective ontologies, it would allow a more semantically
rich environment to emerge, thereby providing an enhanced framework for
expressing human content/data on the Web (and simultaneously allowing a more
specific and complex querying process), for, on the Semantic Web:
“…we can incrementally add meaning and express a whole new set of
relationships (hasLocation, worksFor, isAuthorOf, hasSubjectOf, dependsOn, etc)
among resources, making explicit the particular contextual relationships that are
implicit in the current Web.” (Berners-Lee and Miller, 2002)
Thus the domain mapped out on the Semantic Web would allow such
statements as “all journal papers are publications” or “the authors of all publications
are people”. When combined with facts, these definitions allow other facts to be
inferred.
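The combination of such definitions with facts could be sketched as follows; the fact data and identifiers are illustrative:

```python
# Sketch: schema-level definitions ("all journal papers are
# publications"; "the authors of all publications are people")
# combined with facts to infer new facts. Data is illustrative.
subclass_of = {"JournalPaper": "Publication"}
domain_of = {"authorOf": "Person"}  # the author of anything is a person

facts = {("paper-1", "type", "JournalPaper"),
         ("hendler", "authorOf", "paper-1")}

def infer(facts):
    """Derive inherited types and relation-domain types from facts."""
    derived = set(facts)
    for s, p, o in facts:
        if p == "type" and o in subclass_of:
            derived.add((s, "type", subclass_of[o]))
        if p in domain_of:
            derived.add((s, "type", domain_of[p]))
    return derived
```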
2.2.2 The Formal Aspect
In order for the Semantic Web, or any other “intelligent” driven CIS, to
function, an ontology must be structured in a formal manner to allow the expressions
of a given ontology to be processed unambiguously by a machine. This
“programming act” of formalisation is the essential process which would allow
software agents to manipulate/infer the terms in the ontology.
Existing knowledge representations for communicating an ontological
vocabulary and structure to humans, such as Yahoo's “lightweight” taxonomy, are
too informally expressed for automatic machine processing/inference abilities. No
machine processing leverage can be derived from such a form, or the unstructured,
“free” data common to most information on the Web.
The Extensible Markup Language (XML) is a clear example of a formal
expression that could be applied to ontologies. XML is currently being proposed by
the W3C’s “Semantic Web Research Group” as the “carrier” body-syntax for the
ontologies making up the “Semantic Web”: a computer readable structure.
XML has already been accepted as the emerging standard for data
interchange on the Web. XML allows authors to create their own markup. The
formal data structure of XML “replaces” the format-based markup structure of
HTML. For instance, <B> My name is Bond, James Bond </B> is replaced by the
formal data structure of XML markup that provides a context of understanding for
the semantics of the data, i.e. <author> Ian Fleming </author>.
However, from a computational perspective, “semantic tags” like <author>
carry as little semantics as a tag like HTML’s <B>. A computer simply does not
know in what semantic sense “author” is intended.
XML markup allows users to structure a formal hierarchy of information in a
document, shows the relationship between items and is able to encode metadata in a
more “semantically unambiguous” way. However, it is not possible to know in
which semantic sense tags, such as “publisher” or “editor”, are being used. While
XML has been designed to be a more successful binding for highly structured,
complex metadata, the same problematic arises as with HTML’s original “meta-
tags”.
“XML lets everyone create their own tags …. Scripts, or
programs, can make use of these tags in sophisticated ways, but
the script writer has to know what the page writer uses each tag
for. In short, XML allows users to add arbitrary structure to
their documents but says nothing about what the structures
mean.” (Berners-Lee et al, 2001)
This is why XML markup, in order to fulfil its machine processing potential
to the full, needs to be supplemented by an ontology, for which XML could be the
formal and body expression.
3. Building the Tower of Babel: Ontological Engineering
To summarise briefly, ontologies of CIS data are designed by establishing an
explicit/formal agreement to use the same terms with the same meanings and thus
“engineer” an ontology within a knowledge domain. The fundamental difference
between an ontology and a conventional representational vocabulary system lies in
the fact that it is machine understandable and, most importantly, that it provides an
enhanced semantic expression.
The shape and form of the constructed “shared conceptualisation” varies
considerably. In cases where the main need for CIS interoperability is “semantic
integration”, the ontological framework to be constructed would only be concerned
with how the particular community of users searches and queries the domain. To this
end, the engineering process would necessitate building a “conceptual consensus”
among the members of a group, representing what they “truly” intend, and
consolidating this into an explicit agreement. In other words, the ontological
information structure would solely focus on mapping the topic for the benefit of a
synthesised and accurate search and retrieval process, throughout the domain.
Thus the ontological engineers would focus on the role that ontologies play in
the reuse and exchange of data, to help the protagonists locate and interpret their own
information/domain.
If, on the other hand, the application is required to be principally machine-
driven, the ontological engineering would need to allow its “shared
conceptualisation” to be expressible in a formal manner. This would involve
translating into a formal expression (or fitting into a formal framework) each of the
ontological components/concepts agreed upon by the group.
Bearing these points in mind, ontologies are best understood through their
functional role, as opposed to adopting a purely format-based definition: “The
purpose to which the ontology will be put determines the nature and type of ontology
that is created.” (McCray, 2003:84)
But how is an ontology’s “shared conceptualisation” engineered? How does
the relevant conceptual distillation and abstraction emerge? How is the abstraction
constructed by a group possessing the relevant sphere of knowledge?
3.1 Engineering of Ontologies: General Guidelines
It is relatively easy to design an ontology based on concrete facts, such as
names, birthdates, etc. However, human knowledge is rarely that simple to represent
and to render the assumptions of a domain explicit is a complex task.
While no precise methodological procedure for “ontology generation” or
design can be deduced at present, a core streamlining process can, nevertheless, be
identified:
• Raw material: acquiring and extracting domain knowledge.
Collection of the “relevant” information resources and domain expertise that
will define, with consensus and consistency, the terms used to explicitly
describe things in the domain of interest.
• Shared Conceptuality: purpose, scope, and organisation of the ontology.
The design of the overall conceptual structure of the domain involves
identifying the domain's principal concepts and their properties (commitment
to the basic terms) and the relationships among the concepts.
• Iterating this identification (inductive and deductive) process by adding
further concepts and relations at more taxonomic levels of detail or
resolution.
• Identification of the key concepts and relationships has to be established in an
explicit manner, by defining each conceptual taxonomic layer/level.
Ontological engineering is a consensual methodology and hence involves a
thorough negotiating process, with the added complexity of continual monitoring of
the syntax, logic and semantics of each element constituting the ontology.
3.1.1 Gathering and Identification of the Raw Material
As previously mentioned, the “meta-level” nature of ontologies inevitably
places them in a “parasitical” relation to other CIS knowledge bases. The “building
blocks” or “raw material”, from which the ontology is constructed, stem from the
pre-existing knowledge base of the domain to be “ontologised”. An “ontological
analysis” or “mining” needs to be made of the domain, in order to subsequently
extract the key concepts from either “free” documentation or previous information
management systems: taxonomies, thesauri, etc. (Guarino and Welty, 2000).
A degree of arbitrariness is, by definition, involved if an ontology is built
“from scratch”; the researchers need to obtain information from forms,
questionnaires, free-text, etc., which, all combined, the group uses for the
formulation of the “conceptualisation”. While, in theory, this is possible, current
practice is based on the “conceptual anchorage” provided by “legacy” information
management systems.
For example, in the case of the fishery ontology, FOS, the previous
knowledge bases are the AGROVOC and ASFA online thesauri, as well as the
OneFish (2003) directory and “reference tables” of the FIGIS portal. The fishery
ontology FOS is constituted by, and created from, these heterogeneous fishery
information management systems (Pisanelli et al, 2002).
In the case of the “UMLS semantic network” biomedical ontology, the
“Unified Medical Language System” (UMLS) is used to carry out the ontological
analysis/mining necessary to build the ontology (McCray, 2003).
3.1.2 Shared Conceptualisation
The conceptualisation underlying the ontology is the fruit of a highly
collaborative and interdisciplinary process. The group responsible for engineering an
ontology is principally composed of computer scientists and the domain experts and
users.
In order to achieve “semantic interoperability”, the group must first agree on
the scope of their semantic interoperation and then reach a consensus, entity by
entity, relationship by relationship, attribute by attribute, and finally identify the
main concepts which give a “schematic representation” of the domain. This process
is called “ontological commitment” and is the essence of the ontological engineering
practice.
“Ontological commitment” is the agreement by multiple parties to adopt
particular ontological elements, when communicating about the domain of interest,
even though they do not necessarily have the same experience, theory, or perception
about the domain. For instance, all financial services practitioners agree that trade
execution and trade settlement exist and that execution precedes settlement.
However, there may be disagreement about whether the time limit should be two
days or six days, and so on.
Ultimately, there is no one correct way to model or conceptually represent a
domain; there are always viable alternatives. As such, the ontology is always
developed through a process of mediation, with the goal of ensuring, to the extent
possible, that all its users will find its characteristics to be sufficiently semantically
complex, clear and unambiguous, to be of practical use.
The resulting schematic representation thus achieved is based on the level of
granularity agreed upon and must be consistent, extendible (i.e. able to scale) and
updatable. Data sources that commit to the same ontological schema explicitly agree
to use a standardised set of terms with an explicit description/definition of those
terms.
The ontological conceptual process is analogous to creating a descriptive
language or grammar: constituted by nouns/adjectives (objects/concepts), verbs and
conjunctions/prepositions (relationships), expressed in an array or taxonomy of
classes, within which the main concepts of the domain are placed.
For example, an ontology of wine would be based on the overall class of wine
(which represents all wines), within which the different types of wine would be
sub-classes of that class: Bordeaux wine, for instance, would be a sub-class of the
red wine sub-class. A class will, subsequently, have further sub-classes that
represent concepts that are more specific than the super-class. The classes could
be further divided by properties such as red and white wines, further divided into
sparkling and non-sparkling, and so on, until the “wine ontology language” is
sufficiently representational for all the protagonists involved.
The important elements in the “wine ontology language” could include
“nouns-concepts” such as: grapes, location, colour, body, flavour and sugar content,
as well as adding the relevant “verbs-relations”, which are primarily responsible for
rendering the semantics of the whole domain explicit.
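The wine hierarchy outlined above could be sketched as a simple class structure with inherited properties; the classes and property values shown are illustrative simplifications:

```python
# Sketch of the wine hierarchy: classes, sub-classes and a few
# "noun-concept" properties. Values are illustrative.
wine_ontology = {
    "Wine": {"parent": None, "properties": {}},
    "RedWine": {"parent": "Wine", "properties": {"colour": "red"}},
    "WhiteWine": {"parent": "Wine", "properties": {"colour": "white"}},
    "Bordeaux": {"parent": "RedWine", "properties": {"location": "Bordeaux"}},
}

def properties(cls):
    """Collect properties up the super-class chain, so Bordeaux
    inherits the colour 'red' from RedWine."""
    props = {}
    while cls is not None:
        for k, v in wine_ontology[cls]["properties"].items():
            props.setdefault(k, v)
        cls = wine_ontology[cls]["parent"]
    return props
```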
To take another field, in the biological domain, the main concepts needed for
representing “protein interactions” in developing a “protein-interaction ontology”
are described as follows: “we can identify two main concepts needed for the
representation of protein interactions: the interacting compounds and the
interactions themselves.” (Ratsch et al, 2003:86)
Another case in point is the “Plant Ontology Consortium” (2002), which is
attempting to develop a “plant ontology” to represent current and future
understanding of relationships among various plant knowledge domains. The
following example shows the conceptual complexity and degree of detail that the
plant ontology would need to represent:
“This protein can be equally described as a ‘DNA binding
protein’ and as a ‘catalyst’ (enzyme). Consequently, this protein
should be a ‘child’ of both ‘parents’ within an ontology. The
presence of such multiple ‘parent’ situations in biology requires
that they be accurately represented in a conceptual framework.”
(“Plant Ontology Consortium”, 2002:139)
The ontological engineering process is the development of a class hierarchy,
as well as the “properties” of those “classes”, while ensuring that an explicit
definition of those concepts in the hierarchy, and the properties contained within
them, is present and codified throughout the construction process.
There are several possible approaches to developing and constructing this
class hierarchy:
A top-down engineering process starts with the definition of the most general
concepts in the domain and progressively divides the conceptual definitions into
more specific sub-classes. For instance, a class for the general concept of
“wine” is created first; the class is then further defined by generating
sub-classes and instances (white wine, red wine etc.), proceeding with categorisation
of the “red wine class” into Burgundy, Bordeaux, etc., and continuously refining
each sub-class by adding the corresponding properties/characteristics, until the
domain hierarchy is fully mapped out.
A bottom-up engineering process starts with a definition of the most specific
sub-classes of the domain (which is how most domain information is initially
presented) and progressively constructs the hierarchy by expanding these sub-classes
into more general concepts, i.e. super-classes.
A third approach is to combine both the top-down and bottom-up formats, by
defining the more salient concepts first and carrying out a continuous process of
switching from general, to middle or specialised levels, and vice versa, as
appropriate, to generate the ontological hierarchy.
The principal difficulty in engineering an ontological hierarchy (of classes
and sub-classes) is to map out a sufficiently rich set of semantic relationships
between the concepts/classes. Numerous relations need to be extracted and made
explicit, such as: “is-a”, “part-of”, “manufactured-by”, “owned-by”, etc. and other
types of relational “constraints”.
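Such explicit relations could be sketched as simple subject-relation-object triples; the entities and relation names below are illustrative:

```python
# Sketch: relations such as "is-a" and "part-of" made explicit as
# subject-relation-object triples. Entities are illustrative.
triples = {
    ("engine", "part-of", "car"),
    ("car", "is-a", "vehicle"),
    ("car", "manufactured-by", "factory"),
}

def related(subject, relation):
    """All objects the subject is linked to by the given relation."""
    return {o for s, r, o in triples if s == subject and r == relation}
```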
The following is an illustration of some of the semantic complexity of the
relationships to be mapped out for an ontology: in this case the UMLS semantic
network ontology (McCray, 2003):
is a; associated with; physically related to; part of; consists of; contains;
connected to; interconnects; tributary of; ingredient of; spatially related to;
location of; adjacent to; surrounds; traverses; functionally related to; affects;
manages; treats; disrupts; complicates; interacts with; prevents; occurs in;
process of; uses; manifestation of; indicates; result of; temporally related to;
co-occurs with; precedes; conceptually related to; evaluation of; degree of, etc.
The following types of concepts for describing/defining persons working in an
academic environment (with their characteristics) could be used for an “Academic”
or “Person Ontology” (Benjamins et al, 1999):
Academic staff class hierarchy (class definitions):
Lecturer, Researcher, Administrative-Staff, Secretary, Technical-Staff, Student, PhD-Student, etc.
Some relation definitions: Address, Affiliation, Cooperates-With, Editor-of, Email, First-Name, Has-Publication, Head-Of-Group, Head-Of-Project, Last-Name, Member-Of-Organization, etc.
In the same academic environment, a publication ontology could be
developed with the following classes and relations:
Publication class hierarchy
Conference-Paper, Journal-Article, Technical-Report, Workshop-Paper, etc.
Some relation definitions:
In-Conference, In-Journal, In-Organization, In-Workshop, Journal-Editor, Journal-Number, Journal-Publisher, Journal-Year, Last-Page, etc.
3.1.3 Conceptual Scope: Multiplicity and Singularity
Engineering an ontology that achieves the goal of “semantic
interoperability”, as outlined previously, is a difficult process to put into practice
(some would say an impossible one) because, as touched upon earlier, there are multiple
ways to model the same information (due to differences in perspectives, different
organisations and different professions within the same domain). There is no single,
canonical view of a particular domain of knowledge or the conceptual elements
within.
Furthermore, in order to uphold Gruber’s original theoretical definition
(Gruber, 1993), an ontology must have the resources to sustain the many levels of
representation/classification which any “authentic” human semantics calls for. In
other words, any “true” ontology must sustain representation/classification of
“entities” at different resolutions. An ontology, if it is to be of any use within a
domain, and at least fulfil its “higher” knowledge mission (as compared to traditional
systems), would have to permit a multitude of representational perspectives to be
reused by multiple groups within a domain through a single access point (i.e. the
synthesised engineered consensus, the ontology itself).
For example, an overarching medical ontology would need to represent an
anatomical ontology, both at the level of organs within the structure of the human
body, and also at the cellular, protein, genetic and molecular levels, constituting
ontological representations/classifications at successively finer resolutions. Thus the
extreme semantic complexity of medical information calls for a corresponding
ontology, capable of supporting numerous applications, from the perspectives of
doctor, patient, pharmacologist, geneticist, etc.
The degree of mediation achievable between different levels of granularity
and human perspectivism (kaleidoscopic subjectivity) will be the essential
determinant of whether an ontology, as both discipline and praxis, is successful
in years to come.
As Pisanelli et al (2002) state, a “full” ontology would:
“… provide the framework to integrate in a meaningful and
intersubjective way different views on the same domain, such as those
represented by the queries that can be done to an information system….”
The following examples reflect some of the semantic ambiguity arising from
the problematic of perspectivism:
Specialist ontologies are essential to the acquisition and expression of domain
knowledge. On the other hand, the more detailed and/or specialised an ontology, the
more “ontological commitments” are made to particular or specialist tasks (i.e.
creating further sub-domains), which may not be taxonomically compatible with the
classification of the parent domain.
Semantic problems may arise in the situation where two ontological
structures, representing the same domain, refer to the same concept in different ways,
such as zip/postal code or gender/sex. A “tumour”, for example, can be defined at
the same time as an “anatomical structure” and a “pathological phenomenon”.
This type of ambiguity creates severe problems for the ontological
representations of the “shared conceptual” engineering process. This problematic
could be resolved if ontologies were to provide equivalence/mapping relations: one
or both ontologies may contain the information that a term is equivalent to another
term or that it comes under a particular conceptual heading (equivalence between
classes and properties).
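A minimal sketch of such an equivalence relation, using the zip/postal-code and gender/sex examples above (the ontology names are invented for illustration):

```python
# A hedged sketch of an equivalence-mapping table between two
# hypothetical ontologies that name the same concepts differently.
# Each entry declares that a (ontology, term) pair denotes the same
# concept as another (ontology, term) pair.
EQUIVALENT = {
    ("ontology_a", "zip_code"): ("ontology_b", "postal_code"),
    ("ontology_a", "gender"):   ("ontology_b", "sex"),
}

def same_concept(term_a, term_b):
    """True if the two (ontology, term) pairs are declared equivalent."""
    return (EQUIVALENT.get(term_a) == term_b or
            EQUIVALENT.get(term_b) == term_a)
```

With such a table, a query against either ontology can be rewritten in the other's vocabulary before being executed.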
One solution to tackle this type of problematic, which is endogenous to the
knowledge endeavour, would be to establish an integrated consensus on the scale
needed, by building a multiplicity of linked ontologies, known as an “ontology-
library” (Ding and Fensel, 2000). This would have the effect of creating synergy in a
domain, mainly by means of a cross-referencing process between different types of
ontologies.
In the engineering process, a substantial number of ontologies may need to be
created to fully represent the semantics of a domain, at the level of detail required.
Thus, ontologies would be taxonomically integrated or linked within the domain (i.e.
primitives/relationships that allow ontologies to map terms to their equivalents in
other ontologies).
This could be best described as a “Russian doll” taxonomic modality,
whereby an ontology can be layered according to different perspectives/requirements
and resolutions, and thus has the ability to conceptually represent specialised data
(i.e. sub-domain within sub-domain).
As mentioned before in the case of the “Semantic-Web” scenario, an
ontology of a domain is rarely a monolithic unit but is rather constituted of several
ontologies that compose the overall ontology, i.e. an “ontology-library”.
In essence, an ontology-library of a given domain is a combination of
“domain ontologies” and “core-ontologies” and, more rarely, the possibility of
including domain independent “top-level” ontologies (Guarino, 1998) and (Smith,
2002).
The most prominent types of ontologies are “core-ontologies”, describing
more specialist fields of endeavour, such as anatomy, clinical guidelines or
diagnosis, etc. in the medical sphere, and “domain-ontologies” which operate at a
more general level, in the numerous fields of medicine itself.
Thus, the “subject” of an ontology may be the area pertaining to a single
specialist task (core ontology) or a particular subject domain (domain ontology). The
latter allows “communication” with other related domains and contains more general
conceptual elements, while the “core-ontology” contains the representational
elements needed for conceptualising a domain, or sub-domain, according to some
specialist task or data.
The actual ontology results from the combination of several such types of
ontologies, which are linked under an overarching ontological umbrella, i.e. a further
domain ontology which provides a framework of the domain to be mapped out. This
“higher” ontological framework provides the singular ontologies with the appropriate
links or taxonomic subsumption relations and equivalencies.
Basically, in order to fully capture the representations of a knowledge
domain, an ontology-library could be viewed as a synergetic zooming in and out
platform, which would facilitate the search and retrieval process, not only at different
levels of resolution but also, to some degree, across specialist domains. Thus, a “full”
ontology has an in-built modular structure and approach, and each “module” is in
effect created and maintained by different working groups.
Such a multiple, “modular ontology-library” approach (Gangemi, 2003) is
necessary to achieve the standardisation/integration process, which provides the
degree of “semantic integration” required, as described in section 2 above.
To this end, even more ambitiously, it has been speculated (Smith, 2002) that,
at the other extreme, it could even be possible to engineer upper or “top-level”
ontologies, which are independent of any domain. This type of ontological
construction is called “formal ontology” (Guarino, 1998).
Formal ontology seeks to provide a robust general foundation, by focusing on
categories common to all domains. The development of “top-level” ontologies
engineers theories or specifications of such highly general (domain-independent)
categories as time, space, inherence, instantiation, identity, processes, events, etc. A
top-level ontology seeks to cover a wide range of areas, by providing a foundational
ontological framework for defining categories to be reused by different domains and,
therefore, to be applicable to large communities of users.
This would be useful for integrating multiple specialist sub-domains into
wider knowledge domains, for instance in the context of geographical data:
“an ontology may contribute to the unification of different
conceptualizations of geographical space into an ultimate
geographical ontology. However, this integration can be
accomplished only if these ontologies are embedded within a
more general, top-level ontology, which provides a solid
framework for more specialized applications.” (Guarino, 1998)
Such a formal ontology could consist of an integrated interchange between
“top-level” ontologies, which are domain independent and could thus be reused by
many non-related groups (consisting of general notions such as “species”,
“organism” or even descriptors for space, time, etc.) and “core ontologies” or
“domain ontologies”.
A top-level ontology is a further attempt to integrate sub-disciplines of a wide
domain, such as physics, biology or medicine. At this level, high-level concepts,
such as ‘organism’ (animal, plant, virus), ‘process’ (photosynthesis) or ‘structure’
(animal or plant anatomy), have to be appropriately organised so that they link to the
more specific, concrete domain descriptions in a structured way.
For instance, models for many different domains need to represent the notion
of time. This representation includes the notions of time intervals, points in time,
relative measures of time, and so on. If one group of ontological engineers develops
such an ontology in detail, others can simply reuse (or adapt) it for their own
domains.
In creating an ontology to manage a bookstore inventory, the ontological
engineers could begin by defining a class of objects (in this case books) which have a
temporal extent (life within the bookstore), a position (such as on a particular shelf)
and physical characteristics (format and size).
In this sort of situation, it would be ideal to adopt and reuse a standard
“upper-level ontology”, which is not created by the book domain experts, and this
would involve significant time and labour saving effects, as well as providing
guidance for adapting categories or classifications, for more specific use.
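A sketch of the bookstore case, with a hypothetical reused upper-level category supplying temporal extent, position and physical characteristics (all class and field names here are invented for illustration, not taken from any standard upper-level ontology):

```python
from dataclasses import dataclass

# Hypothetical reused "upper-level" category: in practice this would come
# from a standard top-level ontology, not from the book domain experts.
@dataclass
class PhysicalObject:
    position: str        # e.g. a shelf location
    acquired: str        # start of the object's temporal extent (ISO date)
    dimensions: tuple    # physical characteristics (width, height in cm)

# The domain-specific class merely specialises the reused category,
# adding only what is particular to the book domain.
@dataclass
class Book(PhysicalObject):
    title: str = ""
    isbn: str = ""

b = Book(position="shelf-3A", acquired="2003-09-01",
         dimensions=(13, 20), title="Ontology 101", isbn="0-00-000000-0")
```

The time and labour saving lies in the domain experts defining only the `Book` specialisation, while the general categories are inherited ready-made.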
3.2 Methodology: “Representational Ontology Languages”
As is apparent by now, while in theory building ontologies would be possible
without aid (a purely manual process), the representational complexity and scale to
be represented call for a precise streamlining methodology and/or, more accurately,
“representational languages” --- in order to provide a structure or guide for building
the descriptive content and to help make it explicit.
For representing the shared conceptuality, “ontological languages” and
methodologies have recently been developed to guide the ontological engineering
process. However, as Michael Denny (XML, 2002) accurately sums up the situation,
a complete and standard ontological methodology and/or “ontological language” is
still lacking:
“The problem is that these procedures have not coalesced into
popular development styles or protocols…to the degree one
expects in other software practices. Further, full support for the
latest ontology languages is lacking.”
“The Laboratory for Applied Ontology” (OntoLab, 2003) is a
multidisciplinary research group, which works in collaboration with other
international ontology developers, such as the W3C “Semantic Web Activity”, to
develop languages and methodologies for “Ontological Engineering”.
Research scientist Gangemi (2003) of OntoLab addresses the nature of a
methodology which could guide the engineering process:
“For example, a domain ontology in biology may contain
definitions of ‘species’, ‘organism’, ‘pathway’, ‘anatomical
structure’, ‘biological process’, etc. Our tools help the encoder
of the ontology decide whether his/her meaning of ‘species’ is
about organisms or classes of organisms; whether the meaning
of ‘function’ is about substances or processes involving
substances. A user of that ontology (or a software agent using it)
will then be aware of the encoder’s meaning on a transparent
basis.” (p105)
3.2.1 Representational Ontology Languages
To be able to specify a “shared conceptualisation” explicitly and formally,
some representational languages have recently begun to be developed. Thus, the
“shared conceptuality” can be guided and developed with the aid of “representational
ontology languages”.
An “ontological language” provides the means of streamlining and defining
data and, thereby, structuring it into classes, attributes, instances, functions and
relations (along the lines of the process described in section 3).
These representational languages are designed to act as a “template
language”, in order to provide a more precise content framework or conceptual slot,
with the aim of increasing the consistency and logicality of the ontological building
process.
3.2.1.1 DAML+OIL and OWL
Two of the most powerful ontology languages to date, which are considered
to have a high degree of machine-processible capability, are currently being
developed: “DAML+OIL” (DAML, 2003a) and the “Web Ontology Language”
(W3C, 2002). The latter is designed especially for use in the CIS Semantic Web type
scenario.
The DAML “semantic mark-up language” (the DARPA Agent Markup
Language) was created by the Defense Advanced Research Projects Agency
(DARPA) and has recently been combined with the EU-based ontology
language OIL, the “Ontology Inference Layer” (OntoKnowledge, 2002). Together
they form the DAML+OIL ontology standard.
The recently established “Web Ontology Working Group” (W3C, 2002) is
working towards defining a “semantic markup ontology” standard for creating and
managing ontologies within Web documents. W3C’s “Web Ontology Language”
(OWL) is a “language” for defining “Web ontologies” (providing the “soul” or
“neurons” of the Semantic Web).
All three ontological languages, OWL, DAML, and OIL, whether used
together or separately, provide sets of modelling primitives and “conceptual
containers” for creating ontologies.
“In order to write an ontology that can be interpreted unambiguously and
used by software agents we require a syntax and formal semantics” (W3C, 2003c),
for which all three ontology standards provide specifications.
In general terms, “representational ontology languages”, such as DAML+OIL
and OWL, have the following modelling features:
• “Description logics”, which describe knowledge in terms of classes, or
frames. The meaning of any expression in a description can be described in a
mathematically precise way.
• Logical/formal type of “markup” framework, which allows users to operate
within a consistent “tagging” framework, in which to fit or structure their
content/data. The goal is to define a machine-readable markup knowledge
representational language in a formal semantics that clearly delineates what is
entailed in any particular content construct.
• A taxonomic structure for elaborating the codification for organising the
interaction between concepts, relations and attributes, etc. --- hierarchies of
classes and properties based on sub-class and sub-property relations which,
taken together, describe the domain. Classes are built from other classes,
using combinations of intersection (AND), union (OR) and complement
(NOT).
• The semantic markup is designed to provide a basic infrastructure that would
allow a machine to make simple inferences. One of the most important
characteristics of such ontology languages is a degree of support for inference
power. Any expression would entail a certain set of conclusions from any
information system that conforms to that ontology language. The following
is an example of the automatic taxonomic/inference parsing power of an
ontology:
“Parenthood is a more general relationship than motherhood”
and “Mary is the mother of Bill” together allow a system
conforming to DAML to conclude that “Mary is the parent of
Bill”. Accordingly, if a user poses a query such as “Who are
Bill’s parents?” to a DAML search system, the system can
respond that Mary is one of Bill's parents, even though that fact is
not explicitly stated anywhere.” (DAML, 2002)
In a formal expression, the ontological statement becomes:
(motherOf subPropertyOf parentOf) (Mary motherOf Bill)
A DAML+OIL compliant system can conclude: (Mary parentOf Bill)
based on the logical definition of "subPropertyOf", as given in the DAML+OIL
specification.
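The subPropertyOf inference just described can be sketched in a few lines of Python (a toy illustration of the entailment, not the DAML+OIL machinery itself; the names are those of the quoted example):

```python
# Facts are (subject, property, object) triples. A property entails
# every property it is declared a sub-property of, so motherOf
# entails parentOf, as in the DAML example quoted above.
sub_property_of = {"motherOf": "parentOf"}   # motherhood is a kind of parenthood
facts = {("Mary", "motherOf", "Bill")}

def holds(s, p, o):
    """Check a triple, following subPropertyOf links upward."""
    for (fs, fp, fo) in facts:
        if fs == s and fo == o:
            prop = fp
            while prop is not None:
                if prop == p:
                    return True
                prop = sub_property_of.get(prop)
    return False
```

Here `holds("Mary", "parentOf", "Bill")` succeeds even though only the motherOf triple is explicitly stated, which is exactly the behaviour of the DAML search system described above.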
In summary, the semantic markup languages OWL and DAML+OIL provide a
framework for “annotating” content/data. They structure domain statements into an
array of classes and properties and are able to represent a complex range of
relationships between them, such as subclassOf, subPropertyOf, inverseOf and, in
addition, a set of defining restrictions, such as oneOf, disjointWith and
intersectionOf.
As such, ontologies built with these ontology languages could represent
axioms describing a domain, such as “all newspapers or magazines are publications”
or “the authors of all publications are writers”. Thus content providers in a domain
would be able to “annotate” their content at successive levels of detail, as required.
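The class constructors and axioms just described can be sketched with ordinary Python sets standing in for description-logic class extensions (the titles are invented for illustration):

```python
# Hypothetical class extensions for the axiom example above.
newspapers = {"The Gazette", "Daily Echo"}
magazines = {"Science Monthly"}
everything = newspapers | magazines | {"Old Manuscript"}

# union (OR): the axiom "all newspapers or magazines are publications"
publications = newspapers | magazines
# intersection (AND): things that are both newspapers and magazines
both = newspapers & magazines
# complement (NOT): everything that is not a publication
non_publications = everything - publications

# the subclass axiom holds: the union is contained in publications
assert (newspapers | magazines) <= publications
```

This set-theoretic reading is only an analogy, but it is the same semantics that gives a description-logic reasoner its mathematically precise interpretation of class expressions.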
Inevitably, such a process relies on content providers annotating their data
with these ontology languages. However, it can be assumed that it is to the benefit of
the content providers that their content be accessed as widely as possible and that
they would, therefore, be willing to make the necessary effort to engineer the
ontology.
4. Conclusion: Towards Babel: Will Ontologies Work?
As can be concluded from the ontological projects discussed above, there is
no doubt that, if successful, an ontological approach for CIS would improve all
aspects of information management and the retrieval process and that it is, therefore,
a desirable enterprise and research endeavour. This is all the more true as the
problems that an “ontological approach” is designed to tackle are real and destined
to increase with time, severely hampering the effective use of CIS. It is evident that
it is impossible for traditional information management systems to deal with the ever
increasing problem of information overload. It is, therefore, essential for a
practicable solution to be found if the information age is to continue to flourish and it
is to this end that ontologies have been conceived.
Nevertheless, two main critical areas need to be further explored, both related
to the ontological aspect of CIS semantic interoperability: “semantic intelligence”
and “semantic integration”, if future research is to be pursued:
• to what degree is an ontology constructible?
• to what degree and extent is an ontology workable?
Machine processing: the formal aspect
An ontology must be sufficiently formal to enable it to be processed by
computers, while at the same time allowing a sufficiently complex/comprehensive
knowledge representation of the domain. Clearly this is a difficult balancing act,
which the formal ontological languages of OWL and DAML+OIL do not in any way
resolve.
It is difficult to see how a trade-off can be avoided between the two sides of
the equation: gaining in formality and losing in semantic expressibility or vice versa.
Any formal ontological representation, due to its axiomatic nature, will gain
in automatic inference power but will, simultaneously, limit the range of human
semantic expression, endogenous to natural language (where most human knowledge
resides).
In other terms, the creators of an ontology, supporting machine processing
powers, face the dilemma of reconciling two extremes when engineering the
taxonomic ontological structure. At one extreme, there is the need to capture the
web-like complexity of human semantics. However, this gives rise to the situation
that the ontological conceptual representations are too ambiguous to be “inferred” by
software applications. At the other extreme, explicit formal ontologies are too
inflexible and insufficiently rich and expressive to convey the range and intricacy of
human knowledge --- yet, paradoxically, this is the prime purpose for which an
ontology has been devised.
Even working on the assumption that a complete ontology could feasibly be
explicitly formalised with “knowledge representation languages” such as OWL and
DAML+OIL, these languages would, by virtue of their minimal expressivity, only be
capable of supporting a limited and simplistic set of inferences, which would be far
from adequate to express the human knowledge context.
In the case of the Semantic Web scenario, while it is relatively easy to
surmise its application for simple, automatic tasks, it is daunting to conceive how it
could work in contexts where more complex inferences are demanded by users of
any knowledge domain. However, Tim Berners-Lee seems to imply that, not only is
the semantic integration created by an ontology easily achievable, but that
formalisation is possible without a significant loss of meaning.
The fundamental problem for engineering an ontology is to try to encode the
semantics in a formal expression without sacrificing flexibility, i.e. the “semantic
range” of unstructured data. Basically computability (and simplicity) is, to some
degree, incompatible with semantic expressiveness (depending on the semantic range
to be captured).
Ultimately, a great deal of knowledge communication between humans is not
formally explicit but has a contextual background and is, furthermore, of a tacit
nature. An attempt could be made to make this contextual communication formally
explicit but this is a problem of infinite regress and, therefore, the question arises as
to whether it would even be feasible. The formalisation needed for intelligent
applications for CIS will always be limited and partial, due to the intrinsic social
nature of knowledge and language.
Semantic Consensus
As far as the second part of the semantic interoperability equation goes, i.e.
semantic integration at the human level, it poses the opposite problem to the limited
expressivity of the formal aspect of machine processing. The semantic complexity
and richness of any domain to be represented is boundless. It is for this reason that it
is important to understand that a conceptualisation has to be shared, not only in the
sense of making it possible for an ontology constructed by different parties to be
reused, but also in the sense of building a consensus. This process of consensus
building, which is extremely difficult to achieve in practice, is the major obstacle
encountered when engineering an ontology.
If, as some might assume, constructing an ontology is purely an objective
process of classifying representations of the knowledge domain, consensus building
would be a relatively easy matter. However, ultimately, all human
knowledge/information is based, to a large degree, on social conventions and
agreements, which relate to a particular knowledge community and its dynamics.
The “Tower of Babel” problem, within CIS knowledge domains, which the
creation of ontologies is intended to resolve, still remains, but at a higher meta-level.
The founder of “Infomis”, Barry Smith, sums the problem up as follows: “Ironically,
the very Tower of Babel conditions which the ontological project was initially
designed to address have been recreated within ontology itself” (Smith, 2002).
The “Babel” factor means that the same domain may be “semantically” sliced
in different ways when addressed from different perspectives of interest/use, as
described earlier. However, it is difficult to see, even with an “ontology-library”
approach, how this multitude of perspectives could be fully and completely
integrated by an ontological framework and, thus, how the pitfalls of previous
information management systems could be avoided.
It is thus equally hard to imagine how the issues of “semantic heterogeneity”
would be resolved to the satisfaction of all the parties concerned. The conceptual
representation of any CIS knowledge domain does not rest on a neutral
framework, from which to deduce all descriptions and fit them into an explicit array
of classes. This is due to the fact that not all conceptualisations are equal to each
other.
Every aspect of conceptual representation is highly context-dependent. This
contextual background needs to be made explicit, in order for all parties to describe
and define the terminology in a structured way. Without this explicit mapping out
process, a large amount of poorly articulated or ambiguous knowledge would
severely impede the necessary consensus building. The question then remains as to
how to distinguish between the fundamental ontological commitments, arising from
the contextual background, and the more superficial issues, which are relatively easy
to negotiate.
Based on the “Babel” factor, in practice the following consensus engineering
problems can be highlighted:
• Different authors create substantially different conceptualisations of any
domain, despite the fact that their purposes are similar. Differences in
ontological conceptualisation by members of a group do not necessarily
reflect differences in the concepts identified --- it is not so much a difference
in kind but a difference of “degree”. Conceptualisations vary in focus and the
type of emphasis and priority accorded to them by each member of the
engineering team.
• It is difficult to reach an agreement even with regard to what constitute the
most elementary building blocks of any knowledge domain, especially when
representing relationships, due to the different focus or perspectives, from
which the ontological creators generate their conceptualisations.
• Terms defined in an ontology vary in their reusability across sub-domains.
Some terms are reusable across all sub-domains, whilst others are particular
to a specialised knowledge domain (and thus not reusable).
If, for all the above reasons, ontologies are not as “completely” feasible as the
proponents of ontologies might wish, it is still essential to focus on the “degree” to
which ontologies could still be constructed and used for mapping out knowledge
domains.
While an ontological approach could prove highly useful as a consolidator of
existing specialist CIS knowledge bases, thereby creating a small-scale ontology, the
main problem would seem to arise when such specialist sub-domain ontologies need
to be integrated into the overarching representation of the domain.
In the final analysis, whatever the scale represented, the problems
encountered are no doubt intrinsic to the human knowledge domain. However, more
importantly, they are also contingent on the lack of the right “tools” and environment
for consensus building.
4.1 Mapping Complexity: The Need for Tools
As Ding and Foo (2002a:18) conclude: “It is evident that much needs to be
done in the area of ontology research before any viable large scale system can
emerge to demonstrate ontology’s promise of superior information organization,
management and understanding.”
If the ontological approach for CIS is to achieve a degree of success, and
hence usability, it is essential to develop accessible and efficient tools, that would
provide, at minimum, some “semi-automatic” help. Without such tools, ontological
engineering is an unduly complex, laborious, and time-consuming process --- this is
for some critics the major detraction of the ontological approach.
“The majority of existing ontologies have been generated
manually. Generating ontologies in this manner has been the
normal approach undertaken by most ontology engineers.
However, this process is very time-intensive, error-prone, and
poses problems in maintaining and updating ontologies. For this
reason, researchers are looking for other alternatives to
generating ontologies in a more efficient and effective way.”
(Ding and Foo, 2002a:135)
One of the main problems of manually constructed ontologies is that they are
particularly subject to significant delay in updating their content and maintaining
currency.
The more complex the ontological construction to be made of the domain, the
more essential is the necessity for tools which would ideally streamline the process,
by helping for instance to:
guide, in a consistent and coherent way, the “shared conceptualisation” and
representational (and inference) complexity required to cover the range and
appropriate formalisation, as well as providing verification and validation
throughout the engineering process.
• enhance the deductive and inference powers of the creators of the ontology
and thus facilitate consensus building amongst members of the group at each
conceptual stage of the construction process.
• capture the complexity involved in constructing a “true” ontology with a
consistent “template measure” (and formalisation guide), for all participants
throughout the ontological process. In other terms, to acquire, organise and
conceptually visualise the domain knowledge, before and during the building
of an ontology.
• enable a number of different users to create machine-readable content
without being experts in logic, which is a crucial aspect to the potential
success of the Semantic Web. In other words, “formalisation” or “semantic
mark-up” should be a by-product of normal computer use. Ultimately, much
as in the case of current Web content, a small number of tool creators and
web ontology designers will need to know the details, but most users will not
even be aware that ontologies exist.
It is of the utmost necessity to develop tools which would incorporate such
features. These tools would support and accelerate the ontological process in a
consistent and less ad-hoc manner: inspecting/browsing, codifying, modifying and
maintaining the ontology.
Recently, a range of editing software tools (XML, 2002) have been developed
to help to accomplish some aspects of ontological engineering. Michael Denny
(XML, 2002) portrays the current situation, in what is probably the first concise
survey of ontology tools, termed “ontology editors”:
“Despite the immaturity of the field, or perhaps because of it, we were able to
identify a surprising number of ontology editors -- more than 50 overall”. Although,
as the author states, some are general data modelling tools, such as “Microsoft’s
Visio”, and not specifically designed for the purpose of “ontological editing”, most
of the other editing tools are specific to a particular ontological purpose and/or
domain.
In general, one could conclude that the design of an ideal “ontology editor”
or, better still, viewer, would need to focus on supporting the most laborious and
complex parts of ontological engineering, such as assisting with the arduous task of
maintenance which needs to be done on a regular and ongoing basis, as well as
providing an “easy” mechanism for the validation process. Both tasks would need
to be provided by the “ontology editor” in a semi-automatic manner. In fact, all such
tools would need to provide a high degree of semi-automatic mechanisms which, all
combined, would help to consolidate and, therefore, streamline the ontological
engineering praxis.
At present this lack of consistent and coordinated streamlining of the
ontological process is a major drawback. Bearing in mind the scale to be
represented, it would seem impossible that, without some form of semi-automatic
mechanism, semantic consistency/coherence could be maintained throughout the
building of the ontology.
The key to the usability of an “ontological editor” or viewer would be the
ability to organise and manage the taxonomic structure of an ontology through an
accessible interface, which would give a visual representation and enable
manipulation of the ontology's framework (the interlinking concepts and relation
hierarchies, etc.). The use of a multiple tree viewer with expanding and contracting
levels could be a solution.
In effect, this would be similar to a homepage-creation tool like
Dreamweaver or FrontPage. A user could thus choose from a menu to add
information about a person and then choose a relative or a professional colleague,
etc. Users, with the support of the “ontological editor”, could build the semantic
elements of their ontological structures through a visual display.
An “ontological editor”, Protégé 2000, is being developed by Stanford
University (2002) to provide an ontological editing tool along the lines described.
“OILEd” is another “ontological editor”, which has been developed by
Manchester University (2003), and is ready for full use. OILEd allows the user to
build ontologies using the ontological language DAML+OIL.
In general, Ding and Foo (2002a/b), in their survey of tool-aided ontological
engineering, where they examine current research groups working on semi-automatic
(and even automatic) ontology generation, come to the conclusion that human input
is essential at each stage of the ontological building process and that tools can only
support, and not replace, the expertise of humans.
4.2 Ontological Building Environment
Ultimately, if ontologies are to be practicable, a new field of research into the
ontological approach for CIS could be explored beyond the development of singular
tools. Ontological tools are a useful aid but, by themselves, are limited in
effectiveness for the users in a group. Another aspect needs to be developed: the
building environment itself.
Because of the highly collaborative process that the creation of ontologies
demands, in order to cover the wide scale and semantic mapping out needed, it is
imperative for all the participants to have a propitious environment, in which to
encourage a maximum of collaboration. The members of the group need to work in
close collaboration with each other to make full use of the various streamlining tools
used for specific aspects of the ontological engineering.
In other terms, the process of ontological engineering needs to be embedded
in an environment which would foster what is essentially a collaborative, consensus
building endeavour. Without an accessible “arena” within which to perform the
ontological building tasks, the ontological project is in effect severely impaired, no
matter what tools each member of the group may use. However automated the tools
may become in the future, they cannot suffice on their own without the human
collaborative element.
Basically, the technology needs to be harnessed to create an environment
which would encourage and cultivate a synergy and synchronisation between the
participants in the building process.
The Ontolingua Server (Farquhar et al, 1997) is probably the first practical
example of what form such an “ontological environment” might take.
The “Ontolingua Server” provides an interface for users and
applications/tools to access or manipulate ontology-libraries. It stores
extensible libraries of sharable/reusable ontological structures which can be browsed
and conveniently accessed. In addition it can integrate editing tools for ontological
engineering. The WWW accessible client/server architecture forms the core of this
“ontology server” type of ontological environment.
The main function of an “ontology server” is to allow ontologies to be edited,
evaluated, published, maintained and reused within a remotely accessible environment.
Thus, the main significance of this type of technology is its ability to support and
facilitate collaborative work through the decentralised medium of the World Wide
Web.
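These functions can be made concrete with a small sketch. The class and method names below are hypothetical illustrations of the “ontology server” idea described above, not the actual Ontolingua Server API: a shared library of named ontologies that remote collaborators can browse, edit, publish and reuse.

```python
# Minimal sketch of an "ontology server": a shared library of named
# ontologies that collaborators can browse, edit, publish and reuse.
# All names here are illustrative assumptions, not the real Ontolingua API.

class OntologyServer:
    def __init__(self):
        self._library = {}       # ontology name -> {term: definition}
        self._published = set()  # names frozen for safe reuse

    def browse(self):
        """List the ontologies currently held in the library."""
        return sorted(self._library)

    def edit(self, name, term, definition):
        """Add or revise a term; published ontologies are read-only."""
        if name in self._published:
            raise PermissionError(f"{name!r} is published and read-only")
        self._library.setdefault(name, {})[term] = definition

    def publish(self, name):
        """Freeze an ontology so other projects can safely reuse it."""
        self._published.add(name)

    def reuse(self, name):
        """Return a copy of a published ontology for a new project."""
        return dict(self._library[name])


server = OntologyServer()
server.edit("fisheries", "Vessel", "A craft used for catching fish")
server.publish("fisheries")
print(server.browse())           # ['fisheries']
print(server.reuse("fisheries"))
```

The key design point, in line with the discussion above, is that the library is a single shared resource: publication freezes an ontology for reuse, while editing remains open to every collaborator until then.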
In fact, one of the most important aspects of the “ontology server” ontological
environment is that numerous collaborators, regardless of geographical location, can
contribute to the ontological engineering process in a manner analogous to the
“Open-Source” methodology for engineering software. In the ontological context,
the engineering process would be applied not to the “writing” of computer code, but
rather to the semantic representational elements of the “shared conceptualisation”.
As such, this Web-based “distributed” engineering process could overcome
some of the semantic mapping out problems previously identified. The distributed
nature of ontological engineering (from the perspective of participants accessing
the WWW), epitomised by the ontology server, should be encouraged and
decentralised even further, which is precisely the essence of open-source
methodology.
Eric S. Raymond (1998) has produced two guiding metaphors to describe the
phenomenon of open-source software, exemplified by the remarkable success of the
Linux operating system (which, like Microsoft Windows, involves millions of lines of
computer code). The “bazaar” style of software development is characterised by
decentralised cooperation, which can lead to increased productivity, reliability and
quality; it stands opposed to the centralised “cathedral” style, characteristic of
Microsoft software.
With the bazaar style of open source, the software producer relinquishes
certain intellectual property rights over its software in exchange for other engineering
benefits: “Given enough eyeballs, all bugs are shallow.” This is what Raymond
(1998) calls “Linus' Law”, after the creator of Linux. The basic advantage of open-
source methodology is extremely simple: putting a programme into open-source
mode subjects it to the ultimate code review, an ongoing “peer review” of
suggestions and code revisions.
The “bazaar” model of development and engineering has another essential
feature. The code of a programme is made available and “free”, i.e. open code, to
any interested party, thereby allowing many users to contribute to, modify and improve
it without proprietary restrictions. The philosophy of open source became so
powerful and successful because:
“Linux was the first project to make a conscious and successful
effort to use the entire world as its talent pool. I don't think it's a
coincidence that the gestation period of Linux coincided with the
birth of the World Wide Web...Linus was the first person who
learned how to play by the new rules that pervasive Internet
made possible.” (Raymond, 1998)
The supporters of the open-source paradigm argue that from this distinct
philosophy emerges a methodology responsible for producing software superior in
quality to that of proprietary applications.
To draw a final conclusion, it would seem that the wide scale and complexity
of mapping out ontologies could profit from the open-source paradigm. The
open-source approach to software engineering could be beneficially transposed to
the ontological semantic engineering of the required conceptual representations,
since building the “shared conceptualisation” (as opposed to programming computer
code) involves a similarly collaborative and “open” consensus-building process.
An “open-source” ontological environment would in effect comprise three
main components:
• Remote access for a vast number of collaborators, who would contribute to, review
and update the material at hand on a far larger scale than is currently possible.
• Online contribution by multiple parties from wherever they are located, with each
automatically informed of the others’ activities.
• An ontology that, both during the construction process and after, is made
“freely” available, i.e. “open” in the non-proprietary sense of a lack of
centralised ownership, to encourage modifications and updates from as many
contributing sources as possible.
The next section is a brief proposal of how such an open-source ontological
environment might work in practice.
4.3 OntoP2P: Consensus-Building Environment
It may be feasible to use a peer-to-peer (P2P) technology-driven
methodology to support the consensus-driven “shared conceptualisation” process, as
well as the maintenance of ontologies.
The aim of this “OntoP2P” system would be to assist the “ontological
commitment” process amongst the members of a domain (i.e. to facilitate the inter-
subjective negotiation procedure) and aid the building of more semantically complete
types of ontologies, at the “core” and “domain” levels.
This “OntoP2P” is an intensively collaborative methodology. The more
individuals (i.e. “peers”) involved in the process, the more complete the ontological
construction becomes.
In basic terms, this “OntoP2P” approach relies on building one centralised
ontology from multiple decentralised ontologies located on individual
computers (“peers”), which together form an overall “OntoP2P” network.
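The basic synthesis step can be sketched as follows. This is a minimal, hypothetical illustration of the “OntoP2P” idea, under the assumption that each peer's ontology is reduced to a set of subclass assertions; the function name and merge rule (a simple union) are not part of any specification.

```python
# Hypothetical sketch of the "OntoP2P" synthesis: each peer holds its own
# decentralised ontology, here reduced to a set of (child, parent) subclass
# assertions, and the network builds one centralised ontology from them all.
# The merge rule (plain union) is an illustrative assumption.

def merge_peers(peer_ontologies):
    """Union the subclass assertions contributed by every peer."""
    central = set()
    for assertions in peer_ontologies.values():
        central |= assertions
    return central

peers = {
    "peer-A": {("Trout", "Fish"), ("Fish", "Animal")},
    "peer-B": {("Fish", "Animal"), ("Salmon", "Fish")},
}
central = merge_peers(peers)
# The centralised ontology now covers what any single peer knew,
# including assertions contributed by only one peer.
```

In a real system the merge would of course need conflict detection rather than a blind union, but the sketch shows the core idea: the central ontology grows as more peers contribute.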
As such, the decentralised ontologies located on the various “peer” units
which make up the network could be built using ontological languages such as OWL
or DAML+OIL. The ontologies produced by all the participating “peers”
(whatever their “taxonomic levels” or degrees of completeness) would then be entered
into the “OntoP2P” network to “compete” with each other.
In addition, a “neural network” behaviour would be embedded in this
“OntoP2P” framework. That is to say, the ontological content input by each
individual “peer” entering the network would be monitored and logged. The
“OntoP2P” neural network would then output an overall index of the ontological
content of all the competing “peers”. This index could be consulted to see whether a
consistent, consensus order emerges (which conceptual terminology is common to all,
what new fundamental terminologies appear, repetitions, etc.).
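A simple way to realise such an index is to rank terms by how many peers use them. The sketch below is an illustrative assumption about how the logging-and-indexing step might work, not a specification; all names are hypothetical, and the real mechanism would operate over full ontological structures rather than bare term lists.

```python
# Illustrative sketch of the "OntoP2P" consensus index: log the terms each
# competing peer contributes, then rank terms by how many peers use them.
# Terms common to all peers head the index (candidate consensus); terms
# used by a single peer are candidate new terminologies.

from collections import Counter

def consensus_index(peer_terms):
    """Return (term, peer_count) pairs, most widely shared first."""
    counts = Counter()
    for terms in peer_terms.values():
        counts.update(set(terms))   # each peer counted once per term
    return counts.most_common()

peers = {
    "peer-A": ["Fish", "Vessel", "Quota"],
    "peer-B": ["Fish", "Vessel"],
    "peer-C": ["Fish", "Catch"],
}
index = consensus_index(peers)
# "Fish" is common to all three peers and heads the index;
# "Quota" and "Catch" appear once each, as candidate new terminologies.
```

Consulting the head of this index is what the text above calls checking whether a “consistent, consensus order emerges”: high counts signal shared conceptual terminology, low counts signal novelty.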
In short, one ontology of a particular domain would be synthesised,
maintained and continuously updated within this networked “OntoP2P”
environment, which would readily allow an iterative process.
Ontologies could thus be built and treated more in the manner of the
philosopher Wittgenstein's “Language-Games” (1953): his vision of language as
essentially not a monolithic but a multiple structure (in use and task).
Word count: 14,896
Bibliography
(AS) Applied Semantics (2003). About Applied Semantics (Online). www.appliedsemantics.com (Accessed 27 May 2003).
Ashenhurst, R.L. (1996). Ontological aspects of information modelling. Minds and Machines, 6, 287-394.
Bateman, J.A. (1995). On the relationship between ontology construction and natural language: a socio-semiotic view. International Journal of Human-Computer Studies, 43, 929-944.
Berners-Lee, T. and Miller, E. (2002). The Semantic Web lifts off (Online). http://www.ercim.org/publication/Ercim_News/enw51/berners-lee.html (Accessed 22 May 2003).
Benjamins, V.R. et al (1999). (KA)2: building ontologies for the Internet -- a mid-term report. International Journal of Human-Computer Studies, 51, 687-712.
Berners-Lee, T. et al (2001). The Semantic Web. Scientific American (Online). http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21 (Accessed 22 May 2003).
Berners-Lee, T. and Fischetti, M. (1999). Weaving the web. San Francisco: HarperSanFrancisco.
Borst, W.N. et al (1997). Engineering ontologies. International Journal of Human-Computer Studies, 46, 365-406.
Burgun, A. et al (2001). Issues in the design of medical ontologies used for knowledge sharing. Journal of Medical Systems, 25(2), 95-106.
DAML (2003a). DAML+OIL (Online). http://www.daml.org/about.html (Accessed 11 July 2003).
DAML (2003b). DAML tools (Online). http://www.daml.org/tools/#all (Accessed 11 July 2003).
DAML (2002). Why use DAML (Online). http://www.daml.org/2002/04/why.html (Accessed 13 July 2003).
DAML (2001a). DAML+OIL index (Online). http://www.daml.org/2001/03/daml+oil-index (Accessed 13 July 2003).
DAML (2001b). DAML examples (Online). http://www.daml.org/2001/03/daml+oil
Ding, Y. and Foo, S. (2002a). Ontology research and development Part 1: a review of ontology generation. Journal of Information Science, 28(2), 123-136.
Ding, Y. and Foo, S. (2002b). Ontology research and development Part 2: a review of ontology mapping and evolving. Journal of Information Science, 28(5), 375-388.
Ding, Y. and Fensel, D. (2000). Ontology library systems: the key to successful ontology re-use (Online). www.cs.vu.nl/~ying,~dieter (Accessed 30 June 2003).
Everett, J.O. et al (2002). Making ontologies work for resolving redundancies across documents. Communications of the ACM, 45(2), 57-69.
FAO (2003a). FAO FI (Online). http://www.fao.org/fi (Accessed 29 May 2003).
FAO (2003b). AOS (Online). http://www.fao.org/agris/aos/ (Accessed 29 May 2003).
FAO (2002a). FAO ASFA (Online). http://www.fao.org/asfa (Accessed 29 May 2003).
FAO (2002b). FAO Agrovoc (Online). http://www.fao.org/agrovoc (Accessed 29 May 2003).
Farquhar, A. et al (1997). The Ontolingua Server: a tool for collaborative ontology construction. International Journal of Human-Computer Studies, 46, 707-727.
Gaines, B. (1997). Using explicit ontologies in knowledge-based system development. International Journal of Human-Computer Studies, 46, 181-9.
Gangemi, A. (2003). Some tools and methodologies for domain ontology building. Comparative and Functional Genomics, 4, 104-110.
Google (2003). Google acquires Applied Semantics (Online). http://www.google.com/press/pressrel/applied.html (Accessed 24 July 2003).
GO (2003). Gene Ontology Consortium (Online). http://www.geneontology.org/ (Accessed 1 July 2003).
Gruber, T.R. (1995). Toward principles for the design of ontologies used for knowledge sharing. International Journal of Human-Computer Studies, 43(5/6), 907-928.
Gruber, T.R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5, 199-220.
Gruninger, M. and Lee, J. (2002). Ontology applications and design: introduction. Communications of the ACM, 45(2), 39-41.
Grüninger, M. and Uschold, M. (1996). Ontologies: principles, methods and applications. Knowledge Engineering Review, 11(2), 110-127.
Guarino, N. and Welty, C. (2000). A formal ontology of properties (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 30 June 2003).
Guarino, N. et al (1999). OntoSeek: content-based access to the web (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 30 June 2003).
Guarino, N. (1998). Formal ontology and information systems (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 30 June 2003).
Guarino, N. (1997). Understanding, building and using ontologies. International Journal of Human-Computer Studies, 46, 293-310.
Guarino, N. (1995). Formal ontology, conceptual analysis and knowledge representation. International Journal of Human-Computer Studies, 43(5/6), 625-640.
Harris, M. and Parkinson, H. (2002). Conference report: standards and ontologies for functional genomics: towards unified ontologies for biology and biomedicine. Comparative and Functional Genomics, 4, 116-120.
Holsapple, C.W. and Joshi, K.D. (2002). A collaborative approach to ontology design. Communications of the ACM, 45(2), 42-47.
Infomis (2003). The Institute for Formal Ontology and Medical Information Science (Online). http://www.ifomis.uni-leipzig.de/ (Accessed 2 June 2003).
Kokla, M. and Kavouras, M. (2001). Fusion of top-level and geographical domain ontologies based on context formation and complementarity. International Journal of Geographical Information Science, 15(7), 679-687.
Kwasnik, B. (1999). The role of classification in knowledge representation and discovery. Library Trends, 48(1), 22-47.
Manchester University (2003). OilEd (Online). http://oiled.man.ac.uk/ (Accessed 7 July 2003).
Marla, M. (2002). Applied Semantics: making meaning matter. E-Content Magazine, June 2002, 34-39.
McCray, A.T. (2003). An upper-level ontology for the biomedical domain. Comparative and Functional Genomics, 4, 80-84.
Nilsson, N. (1991). Logic and artificial intelligence. Journal of Artificial Intelligence, 3, 31-55.
Onefish (2003). Onefish.org (Online). http://www.onefish.org (Accessed 23 April 2003).
OntoKnowledge (2002). Welcome to OIL (Online). http://www.ontoknowledge.org/oil/ (Accessed 13 July 2003).
OntoKnowledge (2000). Harmelen, F.V. and Horrocks, I. FAQ: OIL, the Ontology Inference Layer for the Semantic Web (Online). www.ontoknowledge.oil/faq (Accessed 12 July 2003).
OntoLab (2003). Laboratory for Applied Ontology, Institute of Cognitive Sciences and Technology, National Research Council (Online). http://www.ladseb.pd.cnr.it/ (Accessed 30 June 2003).
OntoWeb (2003). About OntoWeb (Online). www.ontoweb.org (Accessed 30 May 2003).
Paling, S. and Qin, J. (2001). Converting a controlled vocabulary into an ontology: the case of GEM. Information Research, 6(2), 120-38.
Partridge, C. (2002). The role of ontology in integrating semantically heterogeneous databases (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 5 July 2003).
PC: The Plant Ontology Consortium (2002). The Plant Ontology Consortium and plant ontologies. Comparative and Functional Genomics, 3, 137-142.
Pisanelli, D.M. et al (2002). Ontologies and Information Systems: the marriage of the century? Proceedings of Lyee Workshop, Paris (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 3 July 2003).
Pisanelli, D.M. et al (2000). The role of ontologies for an effective and unambiguous dissemination of clinical guidelines (Online). http://www.ladseb.pd.cnr.it/infor/Ontology/Publications.html (Accessed 12 July 2003).
Ratsch, E. et al (2003). Developing a protein-interactions ontology. Comparative and Functional Genomics, 4, 85-89.
Raymond, E.S. (1998). The cathedral and the bazaar (Online). http://www.firstmonday.org/issues/issue3_3/raymond/index.html (Accessed 23 June 2003).
Schlosser, M. et al (2002). A scalable and ontology-based P2P infrastructure for Semantic Web Services. IEEE International Conference on Peer-to-Peer Computing (Online). http://citeseer.nj.nec.com/schlosser02scalable.html (Accessed 7 July 2003).
Smith, B. (2002). From classical metaphysics to medical informatics (Online). http://ontology.buffalo.edu/smith (Accessed 5 July 2003).
Stanford University (2002). Protégé editor (Online). http://protege.stanford.edu/ (Accessed 12 July 2003).
Udo, H. (2003). Turning informal thesauri into formal ontologies: a feasibility study on biomedical knowledge re-use. Comparative and Functional Genomics, 4, 94-97.
Uschold, M. et al (1998). The enterprise ontology. Knowledge Engineering Review, 13, 31-89.
W3C (2003a). OWL, Web Ontology Language: overview (Working Draft 31 March 2003) (Online). http://www.w3.org/TR/2003/WD-owl-features-20030331/ (Accessed 30 June 2003).
W3C (2003b). OWL, Web Ontology Language: test cases (Working Draft 28 May 2003) (Online). http://www.w3.org/TR/2003/WD-owl-test-20030528/ (Accessed 30 June 2003).
W3C (2003c). OWL Web Ontology Language guide (Working Draft 31 March 2003) (Online). http://www.w3.org/TR/2003/WD-owl-guide-20030331/ (Accessed 30 June 2003).
W3C (2002). Web-Ontology (WebOnt) Working Group (Online). http://www.w3c.org/2001/sw/WebOnt/ (Accessed 20 July 2003).
W3C (2001a). Annotated DAML+OIL ontology markup (Online). http://www.w3.org/TR/daml+oil-walkthru/ (Accessed 25 June 2003).
W3C (2001b). Semantic Web (Online). http://www.w3.org/2001/sw/ (Accessed 3 June 2003).
Weinstein, P.C. and Birmingham, P. (1998). Creating ontological metadata for digital library content and services. International Journal on Digital Libraries, 2(1), 19-36.
Weinstein, P. (1998). Ontology-based metadata: transforming the MARC legacy. In Akscyn, F. and Shipman, F.M. (eds). Digital Libraries 98: Third ACM Conference on Digital Libraries. ACM Press, 254-263.
Wittgenstein, L. (1953). Philosophical Investigations. Oxford: Blackwell (1991 edition).
XML.Com (2002). Denny, M. Ontology building: a survey of editing tools (Online). http://www.xml.com/pub/a/2002/11/06/ontologies.html (Accessed 20 July 2003).
XML.Com (2000). Dumbill, E. Tim Berners-Lee's lecture, XML 2000 Conference (Online). http://www.xml.com/pub/a/2000/12/xml2000/timbl.html (Accessed 23 June 2003).