1 SRI International Bioinformatics
The Ocelot Frame Knowledge Representation System
Peter D. Karp, Ph.D.
Bioinformatics Research Group
SRI International
Frame Knowledge Representation Systems
Long history of development in the AI knowledge representation community
Distant cousin of object-oriented databases (convergent evolution)
Background reading on frame systems:
  P. Karp, “The design space of frame knowledge representation systems”
  http://www.ai.sri.com/pubs/files/236.pdf
  P. Karp, “Distinguishing Knowledge Bases and Data Bases: Who's on First and What's on Second”
  http://www.ai.sri.com/pubs/files/1397.pdf
Ocelot Information
P.D. Karp et al., “A collaborative environment for authoring large knowledge bases,” J Intelligent Information Systems 13:155-94, 1999.
http://www.ai.sri.com/pkarp/pubs/99jiis.pdf
“Ocelot User’s Guide”
http://www.ai.sri.com/pkarp/ocelot/
Ocelot Data Model
Ocelot database (aka DB, Knowledge Base, KB, PGDB)
An Ocelot database is a collection of frames and slots
Ocelot Frames
Two kinds of frames:
  Classes: Genes, Pathways, Biosynthetic Pathways
  Instances (objects): trpA, TCA cycle
A symbolic frame name (id, key) uniquely identifies each frame
Examples: EG10223, TRP, Proteins
Classes have Superclass(es), Subclass(es), Instance(s)
Instances have one or more parent classes
Slots
Encode attributes and properties of a frame
  Molecular weight, gene coordinates, comments
Represent relationships between frames
  The value of a slot is the identifier of another frame
Slots
Number of values
  Single valued
  Multivalued: sets or lists
Slot values
  Integer, real, string, symbol (frame name)
Every slot is described by a “slot frame” (slotunit) in a KB that defines meta information about that slot
  Datatype, classes it pertains to, constraints
  Enumerations
Two slots are inverses if they encode opposite relationships
  Slot Product in class Genes
  Slot Gene in class Polypeptides
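The frame, slot, and inverse-slot ideas above can be sketched in a few lines of Python. This is only an illustration, not Ocelot's implementation or API: the `KB` class, its method names, and the frame name `EXAMPLE-POLYPEPTIDE` are hypothetical. Only `EG10223` and the slot names `Product` and `Gene` come from the slides.

```python
from collections import defaultdict


class KB:
    """Toy knowledge base: frames keyed by symbolic name, each with multivalued slots."""

    def __init__(self):
        # frame name -> slot name -> list of values
        self.frames = defaultdict(lambda: defaultdict(list))
        # slot name -> name of its inverse slot (stand-in for slot-frame metadata)
        self.inverses = {}

    def declare_inverse(self, slot, inverse_slot):
        """Record that two slots encode opposite relationships."""
        self.inverses[slot] = inverse_slot
        self.inverses[inverse_slot] = slot

    def put_slot_value(self, frame, slot, value):
        """Add a value to a slot; if the slot has an inverse, update the other frame too."""
        values = self.frames[frame][slot]
        if value not in values:
            values.append(value)
        inverse = self.inverses.get(slot)
        if inverse is not None:
            back = self.frames[value][inverse]
            if frame not in back:
                back.append(frame)

    def get_slot_values(self, frame, slot):
        """Return a copy of the values stored in a slot (empty list if none)."""
        return list(self.frames[frame][slot])


kb = KB()
kb.declare_inverse("Product", "Gene")
# Setting Product on the gene frame also fills Gene on the polypeptide frame.
kb.put_slot_value("EG10223", "Product", "EXAMPLE-POLYPEPTIDE")
```

Keeping the inverse in the same write path is one simple way to stop the two directions of a relationship from drifting apart.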
Ocelot Schema
Schema is stored within the DB
Schema is self documenting
Slot frames define metadata about slots
Schema evolution facilitated by
  Easy addition/removal of slots, or alteration of slot datatypes
  Flexible data formats that do not require dumping/reloading of data
New versions of Pathway Tools include a schema upgrade function
  Updates schema to match that of new MetaCyc version
  Transforms data into new schema
[Figure: multiple users tapping into one MySQL server]
Ocelot Storage Subsystem
RDBMS KBs
RDBMS schema is independent of application schema
DBMS is submerged within Ocelot, invisible to users
Frames transferred from DBMS to Ocelot
  On demand
  By background prefetcher
Memory cache
Persistent disk cache speeds performance via Internet
Ocelot Frame Faulting
When a frame is referenced by Pathway Tools:
  Look in Ocelot virtual memory
  Look in disk cache
  Look in RDBMS
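The lookup order above can be sketched as a three-tier read path. This is a minimal illustration and not Ocelot's code: `FrameStore` and `get_frame` are hypothetical names, plain dictionaries stand in for the disk cache and the RDBMS, and writing a fetched frame into the disk cache is an assumption about how the cache gets populated.

```python
class FrameStore:
    """Three-tier frame lookup: memory, then disk cache, then the database."""

    def __init__(self, disk_cache, rdbms):
        self.memory = {}              # frames already loaded in this session
        self.disk_cache = disk_cache  # dict-like stand-in for a persistent local cache
        self.rdbms = rdbms            # dict-like stand-in for the database server

    def get_frame(self, name):
        # 1. Already in memory: no fault needed.
        if name in self.memory:
            return self.memory[name]
        # 2. Fault the frame in from the local disk cache if it is there.
        frame = self.disk_cache.get(name)
        if frame is None:
            # 3. Fall back to the database, the slowest tier.
            frame = self.rdbms.get(name)
            if frame is None:
                raise KeyError(name)
            # Assumption: remember the frame locally so later sessions skip the DBMS.
            self.disk_cache[name] = frame
        self.memory[name] = frame
        return frame


store = FrameStore(disk_cache={}, rdbms={"TRP": {"Common-Name": "tryptophan"}})
frame = store.get_frame("TRP")  # first reference faults the frame in from the RDBMS
```

A second call to `store.get_frame("TRP")` is served from memory and never touches the slower tiers.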
Ocelot RDBMS Transaction History
RDBMS KBs store complete transaction history
Stored as sequences of GFP (Generic Frame Protocol) operations executed by the user or by Pathway Tools
Right click -> Show -> Changes in pop-up window
Used to compute gene last-curated date
Can be used to open a PGDB in an earlier state
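Opening a KB in an earlier state amounts to replaying the stored operations up to a chosen transaction. The sketch below is one way to picture that. It is not Ocelot's code: `replay` is a hypothetical function, the operation names and tuple layout are illustrative only, and the log is assumed to be ordered by transaction id.

```python
def replay(transactions, up_to):
    """Rebuild KB state by applying logged operations from transactions with id <= up_to.

    `transactions` is a list of (txn_id, operations) pairs sorted by txn_id;
    each operation is an (op, frame, slot, value) tuple.
    """
    kb = {}
    for txn_id, operations in transactions:
        if txn_id > up_to:
            break  # later transactions are ignored for an earlier-state view
        for op, frame, slot, value in operations:
            if op == "create-frame":
                kb[frame] = {}
            elif op == "put-slot-value":
                kb[frame][slot] = value
            elif op == "delete-frame":
                del kb[frame]
            else:
                raise ValueError(f"unknown operation: {op}")
    return kb


log = [
    (1, [("create-frame", "P", None, None), ("put-slot-value", "P", "MW", 3)]),
    (2, [("put-slot-value", "P", "MW", 4)]),
]
earlier = replay(log, up_to=1)  # state after the first transaction only
latest = replay(log, up_to=2)   # state after both transactions
```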
Ocelot RDBMS Concurrency Control
When user A saves updates:
  Ocelot queries all transactions that occurred since A last saved or since the start of A’s session
  Ocelot compares the operations in those transactions with the updates made by A
  If conflicts are found, save does not occur and conflicts are reported to the user
  If no conflicts, save proceeds
  Other user transactions are evaluated into A’s session
“Refresh”
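The save procedure above is a form of optimistic concurrency control, and can be sketched as follows. This is an illustration under stated assumptions, not Ocelot's implementation: `Server` and `Session` are hypothetical classes, an in-memory list stands in for the RDBMS transaction log, and the conflict rule is deliberately simplified to "another user touched the same frame and slot".

```python
class Server:
    """Stand-in for the shared RDBMS: an append-only list of committed transactions."""

    def __init__(self):
        self.log = []  # each transaction is a list of (frame, slot, value) updates


class Session:
    def __init__(self, server):
        self.server = server
        self.seen = len(server.log)  # log position at session start or last save
        self.pending = []            # this user's unsaved updates
        self.local = {}              # (frame, slot) -> value as seen in this session

    def update(self, frame, slot, value):
        self.pending.append((frame, slot, value))
        self.local[(frame, slot)] = value

    def save(self):
        """Return the list of conflicting operations; an empty list means the save succeeded."""
        # Transactions committed by others since this session last synchronized.
        newer = self.server.log[self.seen:]
        mine = {(frame, slot) for frame, slot, _ in self.pending}
        # Simplified conflict rule (an assumption): same frame and slot touched by both.
        conflicts = [op for txn in newer for op in txn if (op[0], op[1]) in mine]
        if conflicts:
            return conflicts  # save does not occur; caller reports these to the user
        # "Refresh": evaluate the other users' transactions into this session.
        for txn in newer:
            for frame, slot, value in txn:
                self.local[(frame, slot)] = value
        if self.pending:
            self.server.log.append(list(self.pending))
        self.pending = []
        self.seen = len(self.server.log)
        return []


server = Server()
a, b, c = Session(server), Session(server), Session(server)
a.update("P", "MW", 4)
b.update("P", "MW", 5)
c.update("P", "pI", 7)
result_a = a.save()  # first saver wins
result_b = b.save()  # rejected: A already changed P.MW
result_c = c.save()  # different slot, so the save proceeds and C picks up A's change
```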
Ocelot Update Conflicts
Example conflicting updates:
  User A deletes frame F; User B modifies a slot value in frame F
  User A changes MW of protein P from 3 to 4; User B changes MW of protein P from 3 to 5
Example of updates that don’t conflict:
  User A updates frame E; User B updates frame F
  User A updates the value of P.MW; User B updates the value of P.pI
  Users A and B both delete all values of P.MW
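The examples above suggest a pairwise conflict test along the following lines. This is a sketch inferred from those examples only, not Ocelot's actual rules: the `conflicts` function, the operation kinds, and the tuple layout are hypothetical. Treating two deletions of the same frame as compatible is an assumption made by analogy with the "both delete all values" case.

```python
def conflicts(op_a, op_b):
    """Decide whether two users' operations conflict.

    Each operation is a (kind, frame, slot, value) tuple, where kind is one of
    "delete-frame", "put-value", or "delete-values" (illustrative names).
    """
    kind_a, frame_a, slot_a, value_a = op_a
    kind_b, frame_b, slot_b, value_b = op_b
    # Updates to different frames never conflict.
    if frame_a != frame_b:
        return False
    # Deleting a frame conflicts with any other change to that frame.
    # Assumption: two identical frame deletions are treated as compatible.
    if "delete-frame" in (kind_a, kind_b):
        return kind_a != kind_b
    # Same frame, different slots: independent updates.
    if slot_a != slot_b:
        return False
    # Same frame and slot: a conflict unless both users made the identical change.
    return (kind_a, value_a) != (kind_b, value_b)


# The two conflicting examples from the slide.
delete_vs_modify = conflicts(("delete-frame", "F", None, None), ("put-value", "F", "MW", 1))
different_values = conflicts(("put-value", "P", "MW", 4), ("put-value", "P", "MW", 5))
```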
Revert KB Operation
Undoes all changes in current session
Pathway Tools / BioCyc Software/Database Bundles
Each downloadable Pathway Tools configuration contains a combination of PGDBs
Those PGDBs are loaded into Lisp virtual memory
Build process:
  Start Common Lisp
  Load all Pathway Tools compiled Lisp code into virtual memory
  Load all PGDBs for that configuration into virtual memory
  Save virtual memory image as binary executable file
“Full BioCyc” or Tier 1+2+3 Configuration
507 PGDBs loaded into virtual memory
BioCyc at 10,000 Genomes
Scalability of current approach is limited
New approach: For full BioCyc, store PGDBs not in virtual memory but in Franz AllegroCache
AllegroCache is a Common Lisp object-oriented database
Implementation now in hand for Ocelot
We have done extensive performance testing
Performance looks good to 10,000 PGDBs