a knowledge based interface for distributed biological databases

57
A knowledge based interface for distributed biological databases Paolo Bresciani and Paolo Fontana and Paolo Busetta brescian,pfontana,busetta @itc.it. ( )ITC-irst (TRENTO) and ( )IASMAA (San Michele a.A.) with the collaboration of Giorgio Valle and Stefano Toppo CRIBI - University of Padua NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.1

Upload: others

Post on 12-Sep-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A knowledge based interface for distributed biological databases

A knowledge based interface fordistributed biological databases

Paolo Bresciani

and Paolo Fontana�

and Paolo Busetta

brescian,pfontana,busetta

�@itc.it.

(

)ITC-irst (TRENTO) and (�

)IASMAA (San Michele a.A.)

with the collaboration of Giorgio Valle and Stefano Toppo

CRIBI - University of PaduaNETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.1

Page 2: A knowledge based interface for distributed biological databases

Outline of the Talk

� Motivation for new approaches in biological DBaccess

� The current state of the art (2 examples)

� Our Knowledge Based approach:

� an example of interaction

� some technical details

� Extending to multiple DBs

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.2

Page 3: A knowledge based interface for distributed biological databases

Biological Database access

The formulation of the intended query for retrieving thedesired data is a problem for every database user.As a simple example in Biology, consider the task ofsearching for KDEL receptor :

� what does the user exactly mean with KDELreceptor? Is she looking for the description of that

functionality; or for any protein with that functionality; or for

any genomic sequence that is expressed in such a protein?

� moreover, does the user really know all theconsequences of looking for all (let’s say) theprotein having KDEL receptor functionality?

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.3

Page 4: A knowledge based interface for distributed biological databases

Biological Database access cont’

It may be very useful to know some relevant limitationson the form of the query, when already someconstraints are imposed:E.g., KDEL receptor function can NOT be exhibited byany protein in the cell nucleus.

“protein located in the nucleus and with KDELreceptor function”is inconsistent: submitting it to any biological DBresults in a useless interaction (loss of time andmoney).

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.4

Page 5: A knowledge based interface for distributed biological databases

Biological Database access cont’

It may be very useful to know some relevant limitationson the form of the query, when already someconstraints are imposed:E.g., KDEL receptor function can NOT be exhibited byany protein in the cell nucleus.

� “protein located in the nucleus and with KDELreceptor function”is inconsistent: submitting it to any biological DBresults in a useless interaction (loss of time andmoney).

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.4

Page 6: A knowledge based interface for distributed biological databases

Main sources of errors in queries

Current query systems do not provide any support toavoid (or limit) the source of conceptual errors inqueries.

Many sources of errors:

� Lack of knowledge on the domain

Limited knowledge on some parts of the domain

Terminology disagreement

Little understanding of the domain representationinside the database: terminology, taxonomy,relationships, constraints

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.5

Page 7: A knowledge based interface for distributed biological databases

Main sources of errors in queries

Current query systems do not provide any support toavoid (or limit) the source of conceptual errors inqueries.

Many sources of errors:

� Lack of knowledge on the domain

� Limited knowledge on some parts of the domain

Terminology disagreement

Little understanding of the domain representationinside the database: terminology, taxonomy,relationships, constraints

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.5

Page 8: A knowledge based interface for distributed biological databases

Main sources of errors in queries

Current query systems do not provide any support toavoid (or limit) the source of conceptual errors inqueries.

Many sources of errors:

� Lack of knowledge on the domain

� Limited knowledge on some parts of the domain

� Terminology disagreement

Little understanding of the domain representationinside the database: terminology, taxonomy,relationships, constraints

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.5

Page 9: A knowledge based interface for distributed biological databases

Main sources of errors in queries

Current query systems do not provide any support toavoid (or limit) the source of conceptual errors inqueries.

Many sources of errors:

� Lack of knowledge on the domain

� Limited knowledge on some parts of the domain

� Terminology disagreement

� Little understanding of the domain representationinside the database: terminology, taxonomy,relationships, constraints

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.5

Page 10: A knowledge based interface for distributed biological databases

The current solutions

The common way to deal with the problem is by

� being as much expert as possible in the domain

� being as much aware as possible of the designand implementation details of the DB.

This may be sometimes interesting (domain knowl-

edge), even if difficult, but also tedious (DB design and

implementation details) specially when using several

and changing DBs.

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.6

Page 11: A knowledge based interface for distributed biological databases

Our solution

We introduce a concept-demonstrator of a knowledgebased Visual Query System. It has been applied inthe context of the access to biological databases, withthe following advantages for the user:

� allows to interactively and iteratively buildconsistent queries only;� allows to interactively explore the databasesemantics by gradually browsing only theinteresting parts of the conceptual model;� uses simple, but effective, features for queryrefinement and generalization.

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.7

Page 12: A knowledge based interface for distributed biological databases

A QBE [Zloof] interface example(The SRS: Sequence Retrieval System)

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.8

Page 13: A knowledge based interface for distributed biological databases

A slightly better example(The muscle-trait DB — CRIBI-UniPD)

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.9

Page 14: A knowledge based interface for distributed biological databases

Problems and difficulties

� In the first case, matching strings must beprovided

In the second case, a more “guided” interface isavailable, but the selection still is among long listsof terms

In any case no semantic support is provided.

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.10

Page 15: A knowledge based interface for distributed biological databases

Problems and difficulties

� In the first case, matching strings must beprovided

� In the second case, a more “guided” interface isavailable, but the selection still is among long listsof terms

In any case no semantic support is provided.

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.10

Page 16: A knowledge based interface for distributed biological databases

Problems and difficulties

� In the first case, matching strings must beprovided

� In the second case, a more “guided” interface isavailable, but the selection still is among long listsof terms

In any case no semantic support is provided.

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.10

Page 17: A knowledge based interface for distributed biological databases

The “flat files” legacy problemID HSA010063 standard; DNA; HUM; 1730 BP.

AC AJ010063;

SV AJ010063.1

DT 01-OCT-1998 (Rel. 57, Created)

DT 07-JAN-2000 (Rel. 62, Last updated, Version 2)

DE Homo sapiens telethonin gene

KW telethonin gene.

OS Homo sapiens (human)

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;

OC Eutheria; Primates; Catarrhini; Hominidae; Homo.

RN [1]

RP 1-1730

RA Pallavicini A.L.;

RL Submitted (06-AUG-1998) to the EMBL/GenBank/DDBJ databases.

RL Pallavicini A.L., Complesso interdipartim. Vallisneri Dipartimento di

RL Biologia, Universita di Padova, via G.Colombo 3, 35121, ITALY.

DR SWISS-PROT; O15273; TELT_HUMAN.

FH Key Location/Qualifiers

FT source 1..1730

FT /chromosome="17"

FT /db_xref="taxon:9606"

FT /organism="Homo sapiens"

FT /map="q12"

FT promoter 504..529

FT /gene="telethonin"

FT 5’UTR 520..529

FT /evidence=EXPERIMENTAL

FT /gene="telethonin"

FT mRNA join(520..639,887..1730)

FT /gene="telethonin"

FT CDS join(530..639,887..1280)

FT /db_xref="SWISS-PROT:O15273"

FT /gene="telethonin"

FT /product="telethonin"

FT /protein_id="CAA08987.1"

FT /translation="MATSELSCEVSEENCERREAFWAEWKDLTLSTRPEEGCSLHEEDT

FT QRHETYHQQGQCQVLVQRSPWLMMRMGILGRGLQEYQLPYQRVLPLPIFTPAKMGATKE

FT EREDTPIQLQELLALETALGGQCVDRQEVAEITKQLPPVVPVSKPGALRRSLSRSMSQE

FT AQRG"

FT exon 520..639

FT /number=1

FT /gene="telethonin"

FT intron 640..886

FT /number=1

FT /gene="telethonin"

FT exon 887..1280

FT /number=2

FT /gene="telethonin"

FT TATA_signal 488..495

FT /gene="telethonin"

FT 3’UTR 1281..1730

FT /evidence=EXPERIMENTAL

FT /gene="telethonin"

FT polyA_signal 1707..1712

FT /evidence=EXPERIMENTAL

FT /gene="telethonin"

FT polyA_site 1730

FT /evidence=EXPERIMENTAL

FT /gene="telethonin"

FT STS 1451..1630

FT /note="mapped with RH GB4 on chromosome 17q12"

FT /gene="telethonin"

XX

SQ Sequence 1730 BP; 348 A; 486 C; 612 G; 280 T; 4 other;

gtggtgaggg tgactgggga ctaggcacta ggcctttggt gcaggcgcct gaggacktgg 60

ttgcactctc ccttctgggg atatgccctt gagcccaggc agaggagagc acagcccagg 120

gcaggacctg gcagccctgg tacagagccc agagggggca tcagttcctg ctggtcctgc 180

tctgtttaca gacaasctgc tgtcctccct gcaaagggga gtgggtgggg cagagggcaa 240

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.11

Page 18: A knowledge based interface for distributed biological databases

Terminology standardization

Fortunately some relevant steps ahead have beendone in the last few years.

In particular GeneOntology is one of the mostimportant efforts toward terminology standardization.It aims at providing a support for data-integration and

inter-operability among sequence data and data from functional

analyses.

This is crucial for the discovery of the functions of new sequences

by comparison with already studied and annotated sequences.

Molecular Function, Biological Process, and Cellular Component

are classified in three hierarchies.

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.12

Page 19: A knowledge based interface for distributed biological databases

Terminology standardization

Fortunately some relevant steps ahead have beendone in the last few years.In particular GeneOntology is one of the mostimportant efforts toward terminology standardization.

It aims at providing a support for data-integration and

inter-operability among sequence data and data from functional

analyses.

This is crucial for the discovery of the functions of new sequences

by comparison with already studied and annotated sequences.

Molecular Function, Biological Process, and Cellular Component

are classified in three hierarchies.

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.12

Page 20: A knowledge based interface for distributed biological databases

Terminology standardization

Fortunately some relevant steps ahead have beendone in the last few years.In particular GeneOntology is one of the mostimportant efforts toward terminology standardization.It aims at providing a support for data-integration and

inter-operability among sequence data and data from functional

analyses.

This is crucial for the discovery of the functions of new sequences

by comparison with already studied and annotated sequences.

Molecular Function, Biological Process, and Cellular Component

are classified in three hierarchies.NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.12

Page 21: A knowledge based interface for distributed biological databases

The Knowledge Based Approach

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.13

Page 22: A knowledge based interface for distributed biological databases

Intelligent Interface

Need of Intelligent Interfaces that must:

� be easy and intuitive to be used� be “natural” to be used and understood� require no knowledge of DB technologies or datarepresentation formalisms� possibly require only little or partial knowledge ofthe application domain� be capable to give some semantic advice to theusers

� reduce user’s cognitive effort.

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.14

Page 23: A knowledge based interface for distributed biological databases

Semantics based approachfor query formulation support

An approach is needed that:

� is semantically well founded

� is based on a representation of the model/schemaof the DB and of the domain (biology)

� support specific kinds of reasoning (terminologicalreasoning)

a Knowledge Based approach

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.15

Page 24: A knowledge based interface for distributed biological databases

Semantics based approachfor query formulation support

An approach is needed that:

� is semantically well founded

� is based on a representation of the model/schemaof the DB and of the domain (biology)

� support specific kinds of reasoning (terminologicalreasoning)

� a Knowledge Based approach

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.15

Page 25: A knowledge based interface for distributed biological databases

The ingredients of our system

� Visual Interface for:

� listing domain concepts� representing queries� transforming queries

� A Conceptual Model representation:an Ontology (better, a Knowledge Base)(Tambis + part of GeneOntology)� A reasoner (the Description Logics (DL) ReasoneriFaCt [Horrocks – U.Manchester] + specific code).

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.16

Page 26: A knowledge based interface for distributed biological databases

Semantic checks

Some important features of the interface:

� Only consistent actions are allowed(actions that lead to consistent queries).

� Only relevant modifications are proposed(transformations that produce queriessemantically not equivalent to the original).

� Only close modifications are proposed(not all the consistent and relevant modificationsare proposed, but only those that lead to querieswith semantics close to the original one).

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.17

Page 27: A knowledge based interface for distributed biological databases

An example of interaction

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.18

Page 28: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.19

Page 29: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.20

Page 30: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.21

Page 31: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.22

Page 32: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.23

Page 33: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.24

Page 34: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.25

Page 35: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.26

Page 36: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.27

Page 37: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.28

Page 38: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.29

Page 39: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.30

Page 40: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.31

Page 41: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.31

Page 42: A knowledge based interface for distributed biological databases

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.31

Page 43: A knowledge based interface for distributed biological databases

KRR services

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.32

Page 44: A knowledge based interface for distributed biological databases

Knowledge Based Approach means . . .

Description Logics

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.33

Page 45: A knowledge based interface for distributed biological databases

Knowledge Based Approach means . . .Description Logics

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.33

Page 46: A knowledge based interface for distributed biological databases

DL are . . .

L � �

A � C

D � C � R � �

R �C � � � � � C

D � �

R �C � R �1 � R

n �C � R

n �C � � � � �

I � �

∆I � � I �

semantically sound (and complete) automaticreasoning services as:

� consistency-checking� subsumption� classification

KB = {C D . . . }NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.34

Page 47: A knowledge based interface for distributed biological databases

DL: FOL based semantics

��� � FA

γ

��� A

γ

� � � FP

α � β � � P

α � β �Infix Prefix Semantics

C �"! #$

C

� FC

�γ

C

%

D

� �!&

C D

FC

�γ

�('FD

γ

C

)

D

� #*

C D

FC

γ

�(+

FD

γ

,

R -C � �. .

R C� ,

x -FR

γ � x � / FC

x

0

R -C �21 #34R C

� 0

x -FR

γ � x � '

FC

x

......

...

R

51 �"6 !7 4 * 1 4

R

FR

β � α �

R 8 Q �29 #3 � #1 4

R Q

� 0

x FR

α x

� '

FQ

x β

......

...

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.35

Page 48: A knowledge based interface for distributed biological databases

The KB;;;;;;;;;;;;;; DISJOINT COVERING ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(FACT::disjoint NUCLEUS CELL ORGANELLE CYTOSOL RIBOSOME SPLICEOSOME

MEMBRANE ENDOPLASMIC-RETICULUM GOLGI-COMPLEX)

(FACT::implies CELLULAR-PART (:OR NUCLEUS ORGANELLE CYTOSOL RIBOSOME

SPLICEOSOME MEMBRANE ENDOPLASMIC-RETICULUM GOLGI-COMPLEX))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;; RELATIONSHIPS RESTRICTIONS ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(implies

(:AND protein (:SOME part-of nucleus))

(:ALL has-function (:OR nucleic-acid-binding transcription-factor-binding)))

(implies

(:AND protein (:SOME has-function (:OR nucleic-acid-binding transcription-factor-binding)))

(:ALL part-of nucleus))

(implies (:AND protein (:SOME has-function (:OR cell-adhesion signal-transduction))) (:ALL part-of cell-membrane))

(implies (:AND protein (:SOME part-of cell-membrane)) (:ALL has-function (:OR cell-adhesion signal-transduction)))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;; RELATION DOMAIN AND RANGE ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(implies enzyme (:SOME has-function enzyme-function))

(implies enzyme (:ALL has-function enzyme-function))

(implies holoenzyme (:SOME has-function enzyme-function))

(implies holoenzyme (:ALL has-function enzyme-function))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;;;;;;;;;;;;;;; CONCEPT DEFINITIONS ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(FACT::defprimconcept PROTEIN (:AND (:AT-LEAST 1 HAS-FUNCTION :TOP)

(:SOME HAS-FUNCTION BIOLOGICAL-FUNCTION)

(:ALL HAS-FUNCTION BIOLOGICAL-FUNCTION)

(:AT-LEAST 1 HAS-SEQUENCE :TOP)

(:ALL HAS-SEQUENCE PROTEIN-SEQUENCE)

; (:SOME TRANSLATED-FROM (:OR DNA MESSENGER-RNA))

; (:ALL TRANSLATED-FROM (:OR DNA MESSENGER-RNA))

(:SOME TRANSLATED-FROM MESSENGER-RNA)

(:ALL TRANSLATED-FROM MESSENGER-RNA)

;modificato (:OR DNA MESSENGER-RNA) da PF 08/03/2002

(:SOME HAS-SECONDARY-STRUCTURE PROTEIN-SECONDARY-STRUCTURE)

(:ALL HAS-SECONDARY-STRUCTURE PROTEIN-SECONDARY-STRUCTURE)

(:SOME HAS-TERTIARY-STRUCTURE TERTIARY-STRUCTURE)

(:ALL HAS-TERTIARY-STRUCTURE TERTIARY-STRUCTURE)

(:ALL HAS-QUARTERNARY-STRUCTURE QUARTERNARY-STRUCTURE)

(:SOME HOMOLOGOUS-TO (:OR PROTEIN NUCLEIC-ACID))

(:ALL HOMOLOGOUS-TO (:OR PROTEIN NUCLEIC-ACID))

(:SOME PART-OF (:OR EXTRA-CELLULAR CELLULAR-PART))

(:ALL PART-OF (:OR EXTRA-CELLULAR CELLULAR-PART))

MACROMOLECULAR-COMPOUND

(:SOME POLYMER-OF AMINO-ACID)

(:ALL POLYMER-OF AMINO-ACID)

(:ALL HAS-BOUND COFACTOR)))

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.36

Page 49: A knowledge based interface for distributed biological databases

Reasoning with DL: Refinement

Refine Biological-SpaceQ

:

x

;=< Protein

:

x

;?>A@ Enzyme

:

x

;>

has-function

:

x B w ;?>

Nucleic-Acid-Binding

:

w

;?>

is-in

:

x B y ;?>

Biological-Space

:

y

;?>

is-in

:

y B z ;>

Cell

:

z

;(AND Protein (NOT Ezime)

(SOME has-function Nucleic-Acid-Binding)

(SOME is-in (AND Biological-Space (SOME is-in Cell))

Ref=

MSS

�C

X

D

Q

E�

PROTEIN

% - - - % 0is-in - �

X

% 0

is-in -Cell

� � '

Q F Q E' G 0

Q

E E�

PROTEIN

% - - - % 0is-in - �

Y

% 0

is-in -Cell

� �

s -t - Y F X'

Q F

Q

E E F Q EH �

,

and:

MGS

Equivalences

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.37

Page 50: A knowledge based interface for distributed biological databases

Architecture

Socket Interface

KB Manager

iFact Reasoner

ODBC

PostreSQLUser

Interface

Configurationfiles

(eg. translationtables)

KB

MuscleTRAIT

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.38

Page 51: A knowledge based interface for distributed biological databases

Towards accessing more DBs

DB NDB 1

Wrapper NWrapper 1 ...

Queryplanner

Answerintegrator

Localschemataknowledge

Globalschema

integrator

Registerschema

Query

Mediator

Semanticinterface

KBQuery builder Reporter

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.39

Page 52: A knowledge based interface for distributed biological databases

Towards accessing more DBs

DB NDB 1

Wrapper NWrapper 1 ...

Queryplanner

Answerintegrator

Localschemataknowledge

Globalschema

integrator

Registerschema

Query

Mediator

Semanticinterface

KBQuery builder Reporter

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.39

Page 53: A knowledge based interface for distributed biological databases

Promising opportunities fromAgent Technologies

� accessing multiple DBs with one homogeneousinterface

� architectural flexibility (easily extensible)

� robustness (in case of redundancy)

� caching and notification (multiple users, repeated“similar” queries)

standard protocol for schemata interchange andfor communication (XML; DAML+OIL)

wrappers design and maintenance

query planning

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.40

Page 54: A knowledge based interface for distributed biological databases

Promising opportunities fromAgent Technologies

� accessing multiple DBs with one homogeneousinterface

� architectural flexibility (easily extensible)

� robustness (in case of redundancy)

� caching and notification (multiple users, repeated“similar” queries)

� standard protocol for schemata interchange andfor communication (XML; DAML+OIL)

� wrappers design and maintenance

� query planningNETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.40

Page 55: A knowledge based interface for distributed biological databases

Conclusions and FutureDevelopments

A VQS prototypical system, applied to the access tobiological databases, has been introduced.The use of KR reasoning tools adds interestingsemantic services, like:

� consistency checking of the queries;� intelligent incremental browsing and exploration ofthe database semantics;� effective features for relevant query refinementand generalization.

Some preliminary ideas on an agent basedarchitecture for accessing distributed DBs have beenpresented.

NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.41

Page 56: A knowledge based interface for distributed biological databases

Conclusions and FutureDevelopments

A VQS prototypical system, applied to the access tobiological databases, has been introduced.The use of KR reasoning tools adds interestingsemantic services, like:

� consistency checking of the queries;� intelligent incremental browsing and exploration ofthe database semantics;� effective features for relevant query refinementand generalization.

Some preliminary ideas on an agent basedarchitecture for accessing distributed DBs have beenpresented.NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.41

Page 57: A knowledge based interface for distributed biological databases

References

M. M. Zloof. Query-by-example: A database language. IBM System

Journal, 16(4):324-343, 1977�

T. Cartacci, S.K. Chang, M.F. Costabile, S. Levialdi and G. Santucci. A

graph-based framework for multiparadigmatic visual access to

database. IEEE Transactions on Knowledge and Data Engineering,

8(3):455-475, 1996.�

P. Bresciani, M. Nori and N. Pedot. A Knowledge Based Paradigm for

Querying Databases. Proc. DEXA 2000, LNCS #1873. Springer.�

P. Bresciani, M. Nori and N. Pedot. QueloDB: a Knowledge Based Visual

Query System. Proc. IC-AI 2000, vol. III, June 2000. CSREA Press.�

M. Ashburner at al. Creating the gene ontology resources: design and

implementation. Genome Research, 11(8):1425-1433, Aug. 2001.�

P. G. Baker, C. A. Goble, S. Bechhofer, N. W. Paton, R. Stevens and A.

Brass. An ontology for bioinformatics applications. Bionformatics,

15(6):510-520, 1999.NETTAB 2002. Bologna, July 13th, 2002. P. Bresciani: “A knowledge based interface for distributed biological databases” – p.42