ldk r logics for data and knowledge representation the dera methodology

34
L Logics for D Data and K Knowledge R Representation The DERA methodology

Upload: chloe-adams

Post on 13-Jan-2016

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: LDK R Logics for Data and Knowledge Representation The DERA methodology

LLogics for DData and KKnowledgeRRepresentation

The DERA methodology

Page 2: LDK R Logics for Data and Knowledge Representation The DERA methodology

Outline Overview: the semantic heterogeneity problem

Ontologies

The DERA methodology

Use cases

Projects

2

Page 3: LDK R Logics for Data and Knowledge Representation The DERA methodology

The semantic heterogeneity problem

3

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

The difficulty of establishing a certain level of connectivity between people, software agents or IT systems [Uschold & Gruninger, 2004] at the purpose of enabling each of the parties to appropriately understand the exchanged information [Pollock, 2002]

Page 4: LDK R Logics for Data and Knowledge Representation The DERA methodology

Early solutions

4

Physical connectivity relies on the presence of a stable communication channel between the parties, for instance ODBC data gateways and software adapters.

Syntactic connectivity is established by instituting a common vocabulary of terms to be used by the parties or by point-to-point bridges that translate messages written in one vocabulary in messages in the other vocabulary.

This rigidity and lack of explicit meaning causes very high maintenance costs (up to 95% of the overall ownership costs) as well as integration failure (up to 88% of the projects) [Pollock, 2002]

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 5: LDK R Logics for Data and Knowledge Representation The DERA methodology

The semantic interoperability solution

5

The solution in three points:

Semantic mediation: the usage of an ontology, providing a shared vocabulary of terms with explicit meaning.

Semantic mapping: using the ontology, the establishment of a mapping constituted by a set of correspondences between semantically similar data elements independently maintained by the parties.

Context sensitivity: the mapping has contextual validity, i.e. it has to be used by taking into account the conditions and the purposes for which it was generated.

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 6: LDK R Logics for Data and Knowledge Representation The DERA methodology

Ontologies An ontology is an explicit

specification of a shared conceptualization [Gruber, 1993]

Ontologies are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts

By providing a common formal terminology and understanding of a given domain of interest, it allows for automation (logical inference), supports reuse and favor interoperability across applications and people.

They differ according to the purpose and the semantics

6

Animal

Bird HeadMammal

Predator Herbivore

GoatTiger

Chicken

Cat

Is-a

Is-a

Is-aIs-a

Is-a

Eats

Eats

Is-aPart-of

Is-a Is-a

Eats

Body

Part-of

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 7: LDK R Logics for Data and Knowledge Representation The DERA methodology

Definition of ontology in detail An ontology is an explicit specification of a shared conceptualization

[Gruber, 1993]

Conceptualization: an abstract model of how people theorize (part of) the world in terms of basic cognitive units called concepts. Concepts represent the intention, i. e. the set of properties that distinguish the concept from others, and summarize the extension, i.e. the set of objects having such properties.

Explicit specification: the abstract model is made explicit by providing names and definitions for the concepts, i.e. the name and the definition of the concept provide a specification of its meaning in relation with other concepts.

Formal specification: when it is written in a language with formal syntax and formal semantics, i.e. in a logic-based language.

Shared conceptualization: it captures knowledge which is common to a community of people and therefore represents concretely the level of agreement reached in that community.

7

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 8: LDK R Logics for Data and Knowledge Representation The DERA methodology

Kinds of ontologies [Uschold and Gruninger, 2004]

8

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 9: LDK R Logics for Data and Knowledge Representation The DERA methodology

Classification vs. Descriptive Ontologies Classification ontologies

They are used to classify things, such as books, documents, web pages, etc.; the purpose is to provide domain specific terminology and organize individuals accordingly. Such ontologies usually take the form of classifications with (BT\NT\RT) or without explicit relations.

Descriptive ontologies

They are used to describe a piece of world, such as the Gene ontology, Industry ontology, etc.; the purpose is to offer an unambiguous description of the world. Relations are typically explicit (e.g. is-a) and can be of any kind.

See [Giunchiglia et al., 2009]9

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 10: LDK R Logics for Data and Knowledge Representation The DERA methodology

Classification ontologiesOVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 11: LDK R Logics for Data and Knowledge Representation The DERA methodology

Descriptive ontologiesOVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 12: LDK R Logics for Data and Knowledge Representation The DERA methodology

Classification vs. Real World semantics Classification ontologies are in classification semantics

In classification ontologies, the extension of each concept (label of a node) is the set of documents about the entities or individual objects described by the label of the concept. For example, the extension of the concept animal is “the set of documents about animals” of any kind.

Descriptive ontologies are in real world semantics

In descriptive ontologies, concepts represent real world entities. For example, the extension of the concept animal is the set of real world animals, which can be connected via relations of the proper kind.

12

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 13: LDK R Logics for Data and Knowledge Representation The DERA methodology

13

How to build a robust ontology?We answer to this question with DERA

[Giunchiglia et al., 2013]

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 14: LDK R Logics for Data and Knowledge Representation The DERA methodology

Domains Any area of knowledge or field of study that we are interested in

or that we are communicating about that deals with specific kinds of entities:

conventional fields of study (e.g. physics, mathematics) applications of pure disciplines (e.g. engineering,

agriculture) any aggregate of such fields (e.g. physical sciences, social

sciences) capturing knowledge about our everyday lives (e.g. music,

movie, sport, recipes, tourism)

Domains are the main means by which the diversity of the world is captured, in terms of language, knowledge and personal experience. 14

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 15: LDK R Logics for Data and Knowledge Representation The DERA methodology

How to build domains? Several methodologies have been developed for the construction

and maintenance of vocabularies in specific domains

Among them, the faceted approach [Ranganathan, 1967] is known to have great benefits in terms of quality and scalability of the developed resources

BUT the faceted approach - and Knowledge Organization (KO) in general - aims at the development of controlled vocabularies built as classification ontologies, while Knowledge Representation (KR) needs descriptive ontologies

15

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 16: LDK R Logics for Data and Knowledge Representation The DERA methodology

The faceted approach as from KO The faceted approach is a well-established methodology used in

library & information science (LIS) for the organization of knowledge in libraries [Ranganathan, 1967]

It is based on the fundamental notions of domain and facets, which allow capturing the different aspects of a domain and, at the same time, allow for an incremental growth of knowledge.

Originally facets were of 5 types (PMEST): Personality, Matter, Energy, Space, Time).

For instance, in the medicine domain the important facets are the body parts, the diseases that affect them and the treatment to cure or prevent them.

16

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 17: LDK R Logics for Data and Knowledge Representation The DERA methodology

From KO to KR: DERA How to build high quality and scalable descriptive ontologies? DERA is faceted as it is inspired to the principles and canons of

the faceted approach by Ranganathan DERA is a KR approach as it models entities of a domain (D) by

their entity classes (E), relations (R) and attributes (A)

17

R AD

FACET

E

CATEGORY

ARRAY

CONCEPT

Entity Classes Relations AttributesDomain

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 18: LDK R Logics for Data and Knowledge Representation The DERA methodology

Elements of DERAA DERA domain is a triple D = <E, R, A> where: E (for Entity) is a set of facets grouping terms denoting entity classes, whose instances (the entities) have either perceptual or conceptual existence. Terms in these hierarchies are explicitly connected by is-a or part-of relation.R (for Relation) is a set of facets grouping terms denoting relations between entities. Terms in these hierarchies are connected by is-a relation.A (for Attribute) is a set of facets grouping terms denoting qualitative/quantitative or descriptive attributes of the entities. We differentiate between attribute names and attribute values such that each attribute name is associated corresponding values. Attribute names are connected by is-a relation, while attribute values are connected to corresponding attribute names by value-of relations.

18

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 19: LDK R Logics for Data and Knowledge Representation The DERA methodology

Mapping DERA to Description Logic (DL) is-a, part-of and value-of relations form the backbone of facets,

are assumed to be transitive and asymmetric, and hence are said to be hierarchical.

Other relations, whenever defined, not having such properties, are said to be associative and connect terms in different facets.

All together facets constitute the TBox of a descriptive ontology.

The mapping of E/R/A above to DL should be obvious:o Classes correspond to conceptso Relations and Attributes to roleso is-a and part-of correspond to subsumption (⊑) between classes

and between roleso value-of correspond to value restrictions

19

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 20: LDK R Logics for Data and Knowledge Representation The DERA methodology

An example of DERA domain

20

ENTITY

Location Landform (is-a) Natural elevation (is-a) Continental elevation (is-a) Mountain (is-a) Hill (is-a) Oceanic elevation (is-a) Seamount (is-a) Submarine hill (is-a) Natural depression (is-a)Continental depression (is-a) Valley (is-a) Trough (is-a) Oceanic depression (is-a) Oceanic valley (is-a) Oceanic trough Body of water (is-a) Flowing body of water (is-a) Stream, Watercourse (is-a) River (is-a) Brook (is-a) Still body of water (is-a) Lake (is-a) Pond

RELATION

Direction (is-a) East (is-a) North (is-a) South (is-a) West Relative level (is-a) Above (is-a) Below

Containment (is-a) part-of

ATTRIBUTE

NameLatitude Longitude Altitude AreaPopulation  Depth (value-of) deep (value-of) shallow Length (value-of) long (value-of) short 

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 21: LDK R Logics for Data and Knowledge Representation The DERA methodology

Steps in the DERA approach Step 1: Identification of the atomic concepts

(E) watercourse, stream: a natural body of running water flowing on or under the earth

Step 2: Analysis a body of water a flowing body of water no fixed boundary confined within a bed and stream banks larger than a brook

21

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 22: LDK R Logics for Data and Knowledge Representation The DERA methodology

Steps in the DERA approach (cont.) Step 3: Synthesis

(E) Body of water

(is-a) Flowing body of water

(is-a) Stream

(is-a) Brook

(is-a) River

(is-a) Still body of water

(is-a) Pond

(is-a) Lake

Step 4: Standardization

(E) stream, watercourse: a natural body of running water flowing on or under the earth22

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 23: LDK R Logics for Data and Knowledge Representation The DERA methodology

Steps in the DERA approach (cont.) Step 5: Ordering

Terms and concepts in the facets are ordered

Step 6: Formalization

Descriptive ontologies are translated into Description Logic formal ontologies, e.g.,:

Lake Body-of-Water⊑

Lake (Garda Lake)

Depth (Garda Lake, deep)

23

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 24: LDK R Logics for Data and Knowledge Representation The DERA methodology

Guiding principles [Dutta et al., 2011]

24

Principle Example

Relevance breed is more realistic to classify the universe of cows instead of by grade

Ascertainability flowing body of water

Permanence spring as a natural flow of ground water

Exhaustiveness to classify the universe of people, we need both male and female

Exclusiveness age and date of birth, both produce the same divisions

Context bank, a bank of a river, OR, a building of a financial institution

Currency metro station vs. subway station

Reticence minority author, black man

Ordering stream preferred to watercourse

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 25: LDK R Logics for Data and Knowledge Representation The DERA methodology

Advantages of DERA DERA facets have explicit semantics and are modeled as

descriptive ontologies

DERA facets inherits all the important properties of the faceted approach, such as robustness and scalability

DERA allows for automated reasoning via the formalization into Description Logics ontologies. In particular, DERA allows for a very expressive search by any entity property

25

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 26: LDK R Logics for Data and Knowledge Representation The DERA methodology

26

Concrete use-case of the application of DERA principles

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 27: LDK R Logics for Data and Knowledge Representation The DERA methodology

Problems with WordNet

27

The position of nodes is driven by syntax

Glosses exhibit space and time bias

Some concepts are too similar in meaning

Some concepts are actually individuals

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 28: LDK R Logics for Data and Knowledge Representation The DERA methodology

Reorganizing WordNet with DERA

28

Educational Institution <by level of complexity>PreschoolSchool

Primary schoolSecondary schoolPost-secondary school

<by programme orientation>Training schoolVocational schoolTechnical schoolGraduate school

CollegeUniversity

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 29: LDK R Logics for Data and Knowledge Representation The DERA methodology

Reorganizing WordNet with DERA

29

Preschool (an educational institution for children too young for primary school)

Nursery school, kindergarten (a preschool where children below the age of compulsory education play and learn)

Playschool (a part time preschool where children meeting for half-day session

School (an educational institution designed for the teaching of students (or "pupils") under the direction of teachers)

Primary school (a school for children where they receive the first stage of compulsory education)

Infant school (a primary school for very young children where they learn basic reading and writing skills)

Junior school (a primary school for young children where they learn about some simple basic disciplines like mathematics, geography and history)

Secondary school (a school for students intermediate between primary school and tertiary school)

Junior high school (a secondary school where students learn about complex aspects of basic disciplines)

Senior high school (a secondary school where students can opt for their field of study)…

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 30: LDK R Logics for Data and Knowledge Representation The DERA methodology

30

Concrete projects with DERA

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 31: LDK R Logics for Data and Knowledge Representation The DERA methodology

The space ontology [Giunchiglia et al., 2012]

Objects Quantity

Entity classes (E) 845

Entities (e) 6,907,417

Relations (R) 70

Attributes (A) 31

Knowledge is extracted from GeoNames and the Getty Thesaurus of Geographic Names

Terms are collected, categorized into classes, entities, relations and attributes, and synsets are generated

Synsets are mapped to and integrated with WordNet

Synsets are analyzed and arranged into facets Terms are standardized and ordered

Landform Natural depression Oceanic depression Oceanic valley Oceanic trough Continental depression Trough Valley Natural elevation Oceanic elevation Seamount Submarine hill Continental elevation Hill Mountain

Body of water Flowing body of water Stream River Brook Stagnant body of water Lake Pond

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 32: LDK R Logics for Data and Knowledge Representation The DERA methodology

The semantic-geo catalogue [Farazi et al., 2012]

Objects QuantityFacets 5Entity classes (E) 39Entities (e) 20,162part-of relations 20,161Alternative names 7,929

Knowledge is extracted from the geographical dataset of the Province of Trento

The faceted ontology was built in English and Italian Usage of the ontology

The ontology is used in combination with S-Match within the search component of the geo-catalogue to improve search

The evaluation shows that at the price of a drop in precision of 0.16% we double recall

Body of water Lake Group of lakes Stream River Rivulet Spring Waterfall Cascade Canal   Natural elevation Highland Hill Mountain Mountain range Peak Chain of peaks GlacierNatural depression Valley Mountain pass

OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS

Page 33: LDK R Logics for Data and Knowledge Representation The DERA methodology

References Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge

Acquisition, 5 (2), 199–220. Maltese V. (2012). Dealing with semantic heterogeneity in classifications. PhD thesis:

http://eprints-phd.biblio.unitn.it/700/ (see chapters 1.1. and 2.1) Uschold, M., Gruninger, M. (2004). Ontologies and semantics for seamless connectivity. SIGMOD

Rec., 33(4), 58–64. Pollock, J. (2002). Integration’s Dirty Little Secret: It’s a Matter of Semantics. Whitepaper, The

Interoperability Company. Giunchiglia, F., Dutta, B., Maltese, V. (2009). Faceted lightweight ontologies. In “Conceptual

Modeling: Foundations and Applications”, LNCS 5600 Springer. Giunchiglia, F., Dutta, B., Maltese, V. (2013). From Knowledge Organization to Knowledge

Representation. ISKO UK conference. Ranganathan (1967). Prolegomena to library classification. Asia Publishing House. Dutta, B., Giunchiglia, F. and Maltese, V. (2011). A Facet-based Methodology for Geo-Spatial

Modeling. GEOS conference, 6631, pp 133–150. Farazi, F., Maltese, V., Dutta, B., Ivanyukovich, A., V. Rizzi (2013). A semantic geo-catalogue for a

local administration. AI Review Journal, 40 (2), 193-212. Giunchiglia, F., Dutta, B., Maltese, V., Farazi, F., 2012. A facet-based methodology for the

construction of a large-scale geospatial ontology. Journal on Data Semantics, 1 (1), pp. 57-73.

33

Page 34: LDK R Logics for Data and Knowledge Representation The DERA methodology

Thank you!Questions?