1 SRI International Bioinformatics The Ocelot Frame Knowledge Representation System Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International [email protected]

Post on 26-Mar-2015

The Ocelot Frame Knowledge Representation System

Peter D. Karp, Ph.D.

Bioinformatics Research Group

SRI International

[email protected]

Frame Knowledge Representation Systems

Long history of development in the AI knowledge representation community

Distant cousin of object-oriented databases (convergent evolution)

Background reading on frame systems:
  P. Karp, “The design space of frame knowledge representation systems”
  http://www.ai.sri.com/pubs/files/236.pdf
  P. Karp, “Distinguishing Knowledge Bases and Data Bases: Who's on First and What's on Second”
  http://www.ai.sri.com/pubs/files/1397.pdf

Ocelot Information

P.D. Karp et al, “A collaborative environment for authoring large knowledge bases,” J Intelligent Information Systems 13:155-94 1999.

http://www.ai.sri.com/pkarp/pubs/99jiis.pdf

“Ocelot User’s Guide”

http://www.ai.sri.com/pkarp/ocelot/

Ocelot Data Model

Ocelot database
  Also known as DB, Knowledge Base, KB, PGDB

An Ocelot database is a collection of frames and slots

Ocelot Frames

Two kinds of frames:
  Classes: Genes, Pathways, Biosynthetic Pathways
  Instances (objects): trpA, TCA cycle

A symbolic frame name (id, key) uniquely identifies each frame

Examples: EG10223, TRP, Proteins

Classes have superclass(es), subclass(es), and instance(s)

Instances have one or more parent classes
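The frame model above can be illustrated with a minimal Python sketch. This is not Ocelot's implementation or API: the `Frame` and `KB` classes, their method names, and the hyphenated frame names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame identified by a unique symbolic name (hypothetical layout)."""
    name: str
    is_class: bool
    parents: list = field(default_factory=list)  # superclasses, or parent classes of an instance
    slots: dict = field(default_factory=dict)    # slot name -> list of values


class KB:
    """A toy knowledge base: a collection of frames keyed by frame name."""

    def __init__(self):
        self.frames = {}

    def add(self, name, is_class, parents=()):
        # Frame names are unique keys, so reject a name that is already taken.
        if name in self.frames:
            raise ValueError(f"frame name {name!r} is already in use")
        for parent in parents:
            if not self.frames[parent].is_class:
                raise ValueError(f"parent {parent!r} must be a class frame")
        frame = Frame(name, is_class, list(parents))
        self.frames[name] = frame
        return frame

    def all_superclasses(self, name):
        """Return every class reachable by following parent links upward."""
        seen, stack = set(), list(self.frames[name].parents)
        while stack:
            cls = stack.pop()
            if cls not in seen:
                seen.add(cls)
                stack.extend(self.frames[cls].parents)
        return seen

    def instances_of(self, cls):
        """Return names of instance frames under cls, directly or via subclasses."""
        return sorted(
            f.name for f in self.frames.values()
            if not f.is_class and cls in self.all_superclasses(f.name)
        )


kb = KB()
kb.add("Pathways", is_class=True)
kb.add("Biosynthetic-Pathways", is_class=True, parents=["Pathways"])
kb.add("Genes", is_class=True)
kb.add("trpA", is_class=False, parents=["Genes"])
kb.add("TCA-cycle", is_class=False, parents=["Pathways"])
print(kb.instances_of("Pathways"))
```

Keying every frame by name in one dictionary mirrors the idea that a symbolic frame name uniquely identifies each frame, whether it is a class or an instance.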

Slots

Encode attributes and properties of a frame
  Molecular weight, gene coordinates, comments

Represent relationships between frames
  The value of a slot is the identifier of another frame

Slots

Number of values
  Single valued
  Multivalued: sets or lists

Slot values
  Integer, real, string, symbol (frame name)

Every slot is described by a “slot frame” (slotunit) in a KB that defines meta information about that slot
  Datatype, classes it pertains to, constraints
  Enumerations

Two slots are inverses if they encode opposite relationships
  Slot Product in class Genes
  Slot Gene in class Polypeptides
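A small Python sketch may help make slot frames and inverse slots concrete. It assumes that the KB checks a value against the slot's declared datatype and keeps a pair of inverse slots in sync on every write; the slides do not say how Ocelot does either. The `SlotKB` class, its methods, and the frame name `TRPA-POLYPEPTIDE` are hypothetical.

```python
class SlotKB:
    """Toy store of slot values plus per-slot metadata ("slot units")."""

    def __init__(self):
        self.slot_units = {}  # slot name -> metadata about that slot
        self.values = {}      # (frame name, slot name) -> list of values

    def define_slot(self, name, datatype, domain, inverse=None):
        # The slot unit records the datatype, the class the slot pertains to,
        # and the name of the inverse slot, if any.
        self.slot_units[name] = {
            "datatype": datatype,
            "domain": domain,
            "inverse": inverse,
        }

    def put(self, frame, slot, value, _sync_inverse=True):
        unit = self.slot_units[slot]
        if not isinstance(value, unit["datatype"]):
            raise TypeError(f"slot {slot!r} expects {unit['datatype'].__name__}")
        current = self.values.setdefault((frame, slot), [])
        if value in current:
            return
        current.append(value)
        # Assumption: writing one side of an inverse pair also writes the other.
        if unit["inverse"] and _sync_inverse:
            self.put(value, unit["inverse"], frame, _sync_inverse=False)

    def get(self, frame, slot):
        return list(self.values.get((frame, slot), []))


kb = SlotKB()
kb.define_slot("Product", str, domain="Genes", inverse="Gene")
kb.define_slot("Gene", str, domain="Polypeptides", inverse="Product")
kb.define_slot("Molecular-Weight", float, domain="Polypeptides")

kb.put("trpA", "Product", "TRPA-POLYPEPTIDE")  # hypothetical frame names
print(kb.get("TRPA-POLYPEPTIDE", "Gene"))
```

Because frame names are symbols, a relationship is stored as the name of the other frame, which is what lets the inverse slot be filled in from the same write.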

Ocelot Schema

Schema is stored within the DB

Schema is self-documenting

Slot frames define metadata about slots

Schema evolution facilitated by
  Easy addition/removal of slots, or alteration of slot datatypes
  Flexible data formats that do not require dumping/reloading of data

New versions of Pathway Tools include a schema upgrade function
  Updates schema to match that of new MetaCyc version
  Transforms data into new schema

[Figure: multiple users tapping into one MySQL server]

Ocelot Storage Subsystem

RDBMS KBs

RDBMS schema is independent of application schema

DBMS is submerged within Ocelot, invisible to users

Frames transferred from DBMS to Ocelot
  On demand
  By background prefetcher

Memory cache

Persistent disk cache speeds performance via Internet

Ocelot Frame Faulting

When a frame is referenced by Pathway Tools:
  Look in Ocelot virtual memory
  Look in disk cache
  Look in RDBMS
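The lookup order above can be sketched in Python as a three-tier read. Plain dictionaries stand in for virtual memory, the disk cache, and the RDBMS. Copying a fetched frame into the faster tiers on a miss is an assumption, and the `FrameStore` class and its `trace` attribute are hypothetical names used only for this illustration.

```python
class FrameStore:
    """Toy three-tier frame lookup: memory, then disk cache, then RDBMS."""

    def __init__(self, rdbms, disk_cache=None):
        self.memory = {}                                          # fastest tier
        self.disk_cache = disk_cache if disk_cache is not None else {}
        self.rdbms = rdbms                                        # authoritative store
        self.trace = []                                           # which tier answered each request

    def get_frame(self, name):
        if name in self.memory:
            self.trace.append(("memory", name))
            return self.memory[name]
        if name in self.disk_cache:
            self.trace.append(("disk", name))
            frame = self.disk_cache[name]
        else:
            self.trace.append(("rdbms", name))
            frame = self.rdbms[name]       # raises KeyError if the frame does not exist
            self.disk_cache[name] = frame  # assumption: persist for later sessions
        self.memory[name] = frame          # assumption: keep for the rest of this session
        return frame


rdbms = {"trpA": {"Product": ["TRPA-POLYPEPTIDE"]}}  # hypothetical frame contents
store = FrameStore(rdbms)
store.get_frame("trpA")  # first reference faults the frame in from the RDBMS
store.get_frame("trpA")  # second reference is served from memory
print(store.trace)
```

A second session that shares the same disk cache would find the frame there without contacting the database, which is the case the persistent disk cache is meant to speed up.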

Ocelot RDBMS Transaction History

RDBMS KBs store complete transaction history

Stored as sequences of GFP operations executed by the user or by Pathway Tools

Right click -> Show -> Changes in pop-up window

Used to compute gene last-curated date

Can be used to open a PGDB in an earlier state
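As a rough illustration of why a stored operation log supports both uses, the Python sketch below replays a log up to a chosen time and derives a last-modified date for a frame. The log layout, the operation names, and the functions `apply_op`, `state_at`, and `last_curated` are assumptions made for this example; they are not Ocelot's storage format or the actual GFP operation set.

```python
from datetime import datetime


def apply_op(kb, op):
    """Apply one logged operation to a KB held as {frame: {slot: values}}."""
    kind, frame = op[0], op[1]
    if kind == "create-frame":
        kb[frame] = {}
    elif kind == "delete-frame":
        kb.pop(frame, None)
    elif kind == "put-slot-values":
        slot, values = op[2], op[3]
        kb[frame][slot] = list(values)
    else:
        raise ValueError(f"unknown operation {kind!r}")


def state_at(log, when):
    """Rebuild the KB as it stood at time `when` by replaying earlier transactions."""
    kb = {}
    for tx in log:  # the log is assumed to be in chronological order
        if tx["time"] <= when:
            for op in tx["ops"]:
                apply_op(kb, op)
    return kb


def last_curated(log, frame):
    """Return the time of the most recent transaction that touched `frame`, or None."""
    times = [tx["time"] for tx in log if any(op[1] == frame for op in tx["ops"])]
    return max(times, default=None)


# Hypothetical log: protein P is created with MW 3, then its MW is changed to 4.
log = [
    {"time": datetime(2015, 1, 1),
     "ops": [("create-frame", "P"), ("put-slot-values", "P", "MW", [3])]},
    {"time": datetime(2015, 2, 1),
     "ops": [("put-slot-values", "P", "MW", [4])]},
]
print(state_at(log, datetime(2015, 1, 15)))
print(last_curated(log, "P"))
```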

Ocelot RDBMS Concurrency Control

When user A saves updates:
  Ocelot queries all transactions that occurred since A last saved or since the start of A’s session
  Ocelot compares the operations in those transactions with the updates made by A
  If conflicts are found, save does not occur and conflicts are reported to the user
  If no conflicts, save proceeds

Other user transactions are evaluated into A’s session
  “Refresh”
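The save procedure above is a form of optimistic concurrency control, and its flow can be sketched in Python. The `Server` and `Session` classes are hypothetical, a list stands in for the RDBMS transaction table, and the conflict rule is deliberately simplified to "two different values written to the same slot of the same frame"; Ocelot's actual rule is richer.

```python
class Server:
    """Stands in for the shared RDBMS: an ordered list of committed transactions."""

    def __init__(self):
        self.log = []  # each transaction is a list of (frame, slot, value) updates


class Session:
    """One user's in-memory view of the KB plus that user's unsaved updates."""

    def __init__(self, server):
        self.server = server
        self.kb = {}
        for tx in server.log:
            self._apply(tx)
        self.synced = len(server.log)  # transactions before this index were already seen
        self.pending = []

    def _apply(self, ops):
        for frame, slot, value in ops:
            self.kb[(frame, slot)] = value

    def update(self, frame, slot, value):
        self.kb[(frame, slot)] = value
        self.pending.append((frame, slot, value))

    def save(self):
        """Try to commit pending updates; return the list of conflicts (empty on success)."""
        # 1. Collect what other users committed since this session last synced.
        others = [op for tx in self.server.log[self.synced:] for op in tx]
        # 2. Compare those operations with this session's own updates.
        conflicts = [
            (mine, theirs)
            for mine in self.pending
            for theirs in others
            if mine[:2] == theirs[:2] and mine[2] != theirs[2]
        ]
        # 3. On conflict, nothing is saved and the conflicts are reported.
        if conflicts:
            return conflicts
        # 4. Otherwise fold the other users' updates into this session ("refresh")
        #    and commit this session's updates as one transaction.
        self._apply(others)
        if self.pending:
            self.server.log.append(list(self.pending))
        self.pending = []
        self.synced = len(self.server.log)
        return []


server = Server()
a, b = Session(server), Session(server)
a.update("P", "MW", 4)
b.update("P", "MW", 5)
print(a.save())  # A saves first, so there is nothing to conflict with
print(b.save())  # B's save is refused and the conflicting pair is reported
```

No locks are held while users edit. The cost is that the later of two conflicting savers has to resolve the conflict by hand.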

Ocelot Update Conflicts

Example conflicting updates:
  User A deletes frame F; User B modifies a value in a slot of F
  User A changes MW of protein P from 3 to 4; User B changes MW of protein P from 3 to 5

Examples of updates that don’t conflict:
  User A updates frame E; User B updates frame F
  User A updates the value of P.MW; User B updates the value of P.pI
  Users A and B both delete all values of P.MW
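One way to express a pairwise conflict test consistent with these examples is sketched below in Python. The operation encoding and the function `ops_conflict` are assumptions for illustration. In particular, treating two identical operations as non-conflicting generalizes the last example and is not something the slides state for every case.

```python
def ops_conflict(a, b):
    """Return True if operations a and b, made by different users, conflict.

    Operations are tuples, either ("delete-frame", frame) or
    ("set-slot", frame, slot, values), where an empty tuple of values
    means "delete all values of this slot".
    """
    if a[1] != b[1]:
        return False   # different frames never conflict
    if "delete-frame" in (a[0], b[0]):
        return a != b  # deleting a frame conflicts with modifying it
    # Both operations set a slot on the same frame: they conflict only if they
    # target the same slot and would leave it with different values.
    return a[2] == b[2] and a[3] != b[3]


# The examples above, encoded as pairs of operations.
print(ops_conflict(("delete-frame", "F"), ("set-slot", "F", "MW", (4,))))
print(ops_conflict(("set-slot", "P", "MW", (4,)), ("set-slot", "P", "MW", (5,))))
print(ops_conflict(("set-slot", "E", "MW", (4,)), ("set-slot", "F", "MW", (4,))))
print(ops_conflict(("set-slot", "P", "MW", (4,)), ("set-slot", "P", "pI", (5,))))
print(ops_conflict(("set-slot", "P", "MW", ()), ("set-slot", "P", "MW", ())))
```

The first two calls correspond to the conflicting examples and the last three to the non-conflicting ones.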

Revert KB Operation

Undoes all changes in current session

Pathway Tools / BioCyc Software/Database Bundles

Each downloadable Pathway Tools configuration contains a combination of PGDBs

Those PGDBs are loaded into Lisp virtual memory

Build process:
  Start Common Lisp
  Load in all Pathway Tools compiled Lisp code into virtual memory
  Load in all PGDBs for that configuration into virtual memory
  Save virtual memory image as binary executable file

“Full BioCyc” or Tier 1+2+3 Configuration

507 PGDBs loaded into virtual memory

BioCyc at 10,000 Genomes

Scalability of current approach is limited

New approach: For full BioCyc, store PGDBs not in virtual memory but in Franz AllegroCache

AllegroCache is a Common Lisp object-oriented database

Implementation now in hand for Ocelot

We have done extensive performance testing

Performance looks good to 10,000 PGDBs