Integration of E. coli Data (E. coli Pathway and Genomic Data from BioCyc), Jesse Walsh
Outline
• Description of BioCyc data
  – Format
  – Key Classes
• How I am retrieving and storing the data
  – SPDB schema
  – Key tables
• Recent Developments
BioCyc Data Format
• Frames are made of slots
  – Slots are made of facets
  – Slot values can have annotations
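[Figure: a frame contains slots, each slot has facets, and slot values can carry annotations. Example frame “Reaction X” with slots Common Name, EC #, and Reactants; the Reactants values are annotated with Coefficient and Compartment; example facets are :VALUE-TYPE and :DOCUMENTATION.]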
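The frame/slot/facet/annotation layering can be sketched as plain data structures. This is a minimal illustration, not the Pathway Tools implementation: the class names are hypothetical, the slot names are the slide's labels rather than real BioCyc slot identifiers, and every value is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SlotValue:
    """A value stored in a slot, with optional annotations on that value."""
    value: Any
    annotations: dict[str, Any] = field(default_factory=dict)


@dataclass
class Slot:
    """Facets describe the slot itself; values hold the data."""
    facets: dict[str, Any] = field(default_factory=dict)
    values: list[SlotValue] = field(default_factory=list)


@dataclass
class Frame:
    """A frame is a named collection of slots."""
    name: str
    slots: dict[str, Slot] = field(default_factory=dict)

    def get_values(self, slot_name: str) -> list[Any]:
        """Bare values of a slot, or an empty list if the frame lacks that slot."""
        slot = self.slots.get(slot_name)
        return [v.value for v in slot.values] if slot else []


# Mirrors the "Reaction X" figure; slot names are the slide's labels and all
# values are placeholders, not real BioCyc data.
reaction_x = Frame("Reaction X", {
    "Common Name": Slot(values=[SlotValue("reaction X")]),
    "EC #": Slot(values=[SlotValue("EC-placeholder")]),
    "Reactants": Slot(
        facets={":VALUE-TYPE": "Compounds", ":DOCUMENTATION": "Inputs of the reaction"},
        values=[SlotValue("compound A", {"Coefficient": 2, "Compartment": "cytosol"})],
    ),
})
```

Because annotations hang off each value rather than the slot, two reactants in the same slot can carry different coefficients and compartments.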
Key Classes in BioCyc
• Genes
• Proteins
• Polypeptides (a subclass of Proteins)
• Protein-Complexes (a subclass of Proteins)
• Pathways
• Reactions
• Compounds-And-Elements
• Enzymatic-Reactions
• Transcription-Units
• Promoters
http://bioinformatics.ai.sri.com/ptools/classes.html
Why not just use BioCyc?
• Advantages:
  – Fast access to individual objects
  – Logic-based assertions
• Disadvantages:
  – Hard to query
  – Difficult to understand the structures
  – Difficult to know everything that is in the database
  – Difficult to integrate other types of data
• Solution:
  – Create a relational database
Pathway
• “Central” table
• Allows organization of major pathways
• Easy to retrieve a pathway, or all reactions that share a pathway with a specified reaction
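The "all reactions that share a pathway with a specified reaction" lookup reduces to a self-join on a pathway-to-reaction link table. A minimal sketch, assuming a hypothetical schema (the table and column names are not SPDB's) and using sqlite3 in place of MySQL:

```python
import sqlite3

# Hypothetical minimal schema; sqlite3 stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pathway (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE reaction (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE pathway_reaction (
    pathway_id INTEGER REFERENCES pathway(id),
    reaction_id INTEGER REFERENCES reaction(id),
    PRIMARY KEY (pathway_id, reaction_id)
);
""")
conn.executemany("INSERT INTO pathway VALUES (?, ?)", [(1, "P1"), (2, "P2")])
conn.executemany("INSERT INTO reaction VALUES (?, ?)",
                 [(10, "R10"), (11, "R11"), (12, "R12"), (13, "R13")])
# Placeholder memberships: reaction 11 belongs to both pathways.
conn.executemany("INSERT INTO pathway_reaction VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 11), (2, 12), (2, 13)])


def reactions_sharing_pathway(conn, reaction_id):
    """IDs of other reactions found in any pathway that contains reaction_id."""
    rows = conn.execute("""
        SELECT DISTINCT other.reaction_id
        FROM pathway_reaction AS mine
        JOIN pathway_reaction AS other ON other.pathway_id = mine.pathway_id
        WHERE mine.reaction_id = ? AND other.reaction_id <> ?
        ORDER BY other.reaction_id
    """, (reaction_id, reaction_id)).fetchall()
    return [row[0] for row in rows]
```

Here `reactions_sharing_pathway(conn, 11)` collects reactions from both pathways, since reaction 11 appears in each.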
Reaction
• Reaction types include:
  – Catalysis, Spontaneous, Transcription, Translation, Promoter, Transcription Factor
• Transcription, Translation, Promoter, and TF reactions are all inferred reactions
• Reactions are the “nodes” of networks in SPDB
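One way to treat reactions as network nodes is to draw an edge from reaction A to reaction B whenever an output of A is an input of B. The sketch below assumes that linking rule and uses hypothetical reaction records; it is not necessarily how SPDB builds its networks.

```python
from collections import defaultdict

# Reaction types the slide lists as inferred.
INFERRED_TYPES = {"Transcription", "Translation", "Promoter", "Transcription Factor"}

# Hypothetical reaction records: type plus input/output entity names.
reactions = {
    "R1": {"type": "Catalysis", "inputs": {"A"}, "outputs": {"B"}},
    "R2": {"type": "Spontaneous", "inputs": {"B"}, "outputs": {"C"}},
    "R3": {"type": "Catalysis", "inputs": {"C", "D"}, "outputs": {"E"}},
}


def reaction_network(reactions):
    """Directed edges r1 -> r2 whenever an output of r1 is an input of r2."""
    consumers = defaultdict(set)  # entity -> reactions that consume it
    for rid, rxn in reactions.items():
        for entity in rxn["inputs"]:
            consumers[entity].add(rid)
    edges = defaultdict(set)
    for rid, rxn in reactions.items():
        for entity in rxn["outputs"]:
            edges[rid] |= consumers[entity] - {rid}
    return {rid: sorted(targets) for rid, targets in edges.items()}


def is_inferred(rxn):
    """True for reaction types that are inferred rather than catalytic/spontaneous."""
    return rxn["type"] in INFERRED_TYPES
```

With real data, shared cofactors such as ATP would connect almost every reaction, so a rule like this usually needs a list of excluded currency metabolites.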
Entity
• Entities include:
  – Compound, Protein (Complex/Monomer), Gene, Transcription Unit, Promoter
• Entities with multiple types are represented by the most specific type in their hierarchy
  – e.g., a protein that is also a complex is listed as “Complex”, not “Protein”
  – “Enzyme” status is stored as a participation type
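The "most specific type" rule amounts to discarding every type that is an ancestor of another type the entity holds. A minimal sketch, assuming a hypothetical parent map built from the class list (Complex and Monomer under Protein); "Enzyme" is deliberately absent because it is a participation type, not an entity type.

```python
# Hypothetical parent map: Complex and Monomer are subclasses of Protein.
PARENT = {"Complex": "Protein", "Monomer": "Protein"}


def ancestors(entity_type):
    """All ancestors of a type, nearest first."""
    result = []
    while entity_type in PARENT:
        entity_type = PARENT[entity_type]
        result.append(entity_type)
    return result


def most_specific_type(types):
    """Drop every type that is an ancestor of another type in the set."""
    types = set(types)
    redundant = {a for t in types for a in ancestors(t)}
    remaining = types - redundant
    if len(remaining) != 1:
        raise ValueError(f"ambiguous or empty type set: {sorted(remaining)}")
    return remaining.pop()
```

For example, `most_specific_type({"Protein", "Complex"})` returns `"Complex"`, matching the slide's example; two unrelated leaf types raise an error instead of being silently merged.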
Participation in Reactions
• Entities participate in reactions
• Information includes Km data
• Unsure whether condition data exists, and unsure how to access evidence data
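A participation row can carry the entity's role (including "Enzyme"), its stoichiometric coefficient, and a Km where one is known. The sketch below assumes a hypothetical table layout (storing Km on the substrate's row is one possible choice), uses sqlite3 in place of MySQL, and fills it with placeholder rows.

```python
import sqlite3

# Hypothetical participation table; sqlite3 stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE participation (
        reaction_id TEXT NOT NULL,
        entity_id TEXT NOT NULL,
        participation_type TEXT NOT NULL,  -- 'Reactant', 'Product', 'Enzyme', ...
        coefficient INTEGER,               -- stoichiometric coefficient; NULL for enzymes
        km REAL,                           -- Km for this substrate, if known
        PRIMARY KEY (reaction_id, entity_id, participation_type)
    )
""")
conn.executemany(
    "INSERT INTO participation VALUES (?, ?, ?, ?, ?)",
    [
        ("R1", "S1", "Reactant", 1, 0.5),  # placeholder values, not real data
        ("R1", "P1", "Product", 1, None),
        ("R1", "E1", "Enzyme", None, None),
    ],
)


def participants(conn, reaction_id, participation_type):
    """Entity IDs that take part in a reaction in the given role."""
    rows = conn.execute(
        "SELECT entity_id FROM participation "
        "WHERE reaction_id = ? AND participation_type = ? ORDER BY entity_id",
        (reaction_id, participation_type),
    )
    return [row[0] for row in rows]


def substrates_with_km(conn, reaction_id):
    """(entity_id, km) pairs for participants that have a recorded Km."""
    return conn.execute(
        "SELECT entity_id, km FROM participation "
        "WHERE reaction_id = ? AND km IS NOT NULL ORDER BY entity_id",
        (reaction_id,),
    ).fetchall()
```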
Data Links in BioCyc
[Figure: links among Pathway, Reaction, Reactants/Products, Enzymes/Cofactors, Genes, Transcriptional Unit, Promoter, Transcription Factor, and Sigma Factor, with relation labels Translation Reaction, Transcription Reaction, Promoter Relation, Activation/Repression, and Specificity Relation.]
Data Retrieval Strategy
[Figure: the same data-link diagram (Pathway, Reaction, Reactants/Products, Enzymes/Cofactors, Genes, Transcriptional Unit, Promoter, Transcription Factor, Sigma Factor, Translation Reaction, Transcription Reaction, Promoter Relation, Activation/Repression, Specificity Relation), annotated with retrieval steps 1, 2, and 3.]
Improvements to SPDB
• Explicitly organize pathway networks and reaction networks
• Allow recursive tracing of pathway elements
Better Way
[Figure: several reactions (Rxn) linked to one another inside a Pathway.]
Explicitly link reactions in the context of individual pathways
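Linking reactions "in the context of individual pathways" can be modeled as link rows that carry the pathway ID, so a reaction's neighbors depend on which pathway is being viewed. A minimal sketch with hypothetical link rows; the ordering helper uses Python's `graphlib.TopologicalSorter` (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical link rows: (pathway, upstream reaction, downstream reaction).
links = [
    ("P1", "R1", "R2"),
    ("P1", "R2", "R3"),
    ("P2", "R2", "R4"),
]


def successors(links, pathway, reaction):
    """Reactions directly downstream of `reaction` within one pathway only."""
    return sorted(dst for pw, src, dst in links if pw == pathway and src == reaction)


def pathway_order(links, pathway):
    """Reactions of one pathway in an upstream-to-downstream order."""
    graph = {}  # node -> set of predecessors, as TopologicalSorter expects
    for pw, src, dst in links:
        if pw == pathway:
            graph.setdefault(dst, set()).add(src)
            graph.setdefault(src, set())
    return list(TopologicalSorter(graph).static_order())
```

R2 feeds R3 in P1 but R4 in P2; a single global reaction network would merge those two contexts. Cyclic pathways would make `static_order()` raise `graphlib.CycleError`, so they would need separate handling.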
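Recursively Tracing the Data
[Figure: the data-link diagram (Pathway, Reaction, Reactants/Products, Enzymes/Cofactors, Genes, Transcriptional Unit, Promoter, Transcription Factor, Sigma Factor, Translation Reaction, Transcription Reaction, Promoter Relation, Activation/Repression, Specificity Relation), extended with “Genes of TFs” so that tracing continues from each transcription factor to the genes that encode it.]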
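Recursive tracing is a graph traversal: start at a pathway, follow every link, and keep a visited set so regulatory loops (a TF regulating its own gene) do not recurse forever. A minimal depth-first sketch over a hypothetical link map whose node IDs are placeholders.

```python
# Hypothetical link graph following the diagram; node IDs are placeholders.
LINKS = {
    "pathway:P1": ["reaction:R1"],
    "reaction:R1": ["enzyme:E1"],
    "enzyme:E1": ["gene:g1"],              # translation
    "gene:g1": ["tu:TU1"],                 # transcription
    "tu:TU1": ["promoter:pr1"],
    "promoter:pr1": ["tf:TF1", "sigma:S1"],
    "tf:TF1": ["gene:gTF1"],               # genes of TFs
    "gene:gTF1": ["tu:TU2"],
    "tu:TU2": ["promoter:pr2"],
    "promoter:pr2": ["tf:TF1"],            # autoregulation creates a cycle
}


def trace(start, links):
    """Depth-first list of everything reachable from start; visited set stops cycles."""
    visited = set()
    order = []

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        order.append(node)
        for nxt in links.get(node, []):
            visit(nxt)

    visit(start)
    return order
```

`trace("pathway:P1", LINKS)` reaches the gene encoding TF1 and stops at the autoregulatory loop instead of revisiting TF1.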
Coefficient Data for Reactions
6 ATP + 3 L-serine + 3 2,3-dihydroxybenzoate → 6 diphosphate + 6 AMP + enterobactin + 9 H+
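Coefficient data like the enterobactin example can be stored as one signed number per compound. The sketch below parses such an equation string; the sign convention (reactants negative, products positive) is the usual stoichiometric-matrix convention and an assumption here, as is the " + " / " → " text format.

```python
def parse_equation(equation, arrow="→"):
    """Split 'a A + b B → c C' into {compound: signed coefficient}.

    Reactants get negative coefficients and products positive ones.
    A term without a leading integer has coefficient 1.
    """
    left, right = equation.split(f" {arrow} ")
    stoich = {}
    for side, sign in ((left, -1), (right, 1)):
        for term in side.split(" + "):
            term = term.strip()
            first, _, rest = term.partition(" ")
            if first.isdigit() and rest:
                coeff, name = int(first), rest  # e.g. "3 2,3-dihydroxybenzoate"
            else:
                coeff, name = 1, term            # e.g. "enterobactin"
            stoich[name] = stoich.get(name, 0) + sign * coeff
    return stoich
```

Splitting on " + " with surrounding spaces keeps "H+" intact, and only the first token is treated as a coefficient, so names that begin with digits such as "2,3-dihydroxybenzoate" survive.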
Flow of Data (The Big Picture)
• Data is imported from BioCyc (EcoCyc + MetaCyc)
• Changes can be made to BioCyc via Cell Designer and are then propagated to SPDB
• BioMart is one option for viewing data in SPDB directly
[Figure: components BioCyc PGDB (Lisp-based DB), JavaCycConnection, BioCycImporter, SPDB (MySQL), Cell Designer, BioMart, and Researcher; further labels “Object Oriented DB” and “API based on JavaCyc”.]
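The import step maps BioCyc frames onto relational rows. The sketch below is hypothetical throughout: a dict stands in for frames fetched via the JavaCyc-based connection (no JavaCyc calls are shown or implied), frame IDs are placeholders, and sqlite3 stands in for MySQL.

```python
import sqlite3

# Stand-in for frames fetched through the BioCyc connection (placeholder data).
SOURCE_FRAMES = {
    "PWY-A": {"class": "Pathway", "reactions": ["RXN-1", "RXN-2"]},
    "RXN-1": {"class": "Reaction"},
    "RXN-2": {"class": "Reaction"},
}


def import_frames(frames, conn):
    """Copy pathway and reaction frames into relational tables; safe to re-run."""
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS pathway (frame_id TEXT PRIMARY KEY);
    CREATE TABLE IF NOT EXISTS reaction (frame_id TEXT PRIMARY KEY);
    CREATE TABLE IF NOT EXISTS pathway_reaction (
        pathway_id TEXT, reaction_id TEXT, PRIMARY KEY (pathway_id, reaction_id));
    """)
    with conn:  # commit all inserts as one transaction
        for frame_id, frame in frames.items():
            if frame["class"] == "Pathway":
                conn.execute("INSERT OR REPLACE INTO pathway VALUES (?)", (frame_id,))
                conn.executemany(
                    "INSERT OR IGNORE INTO pathway_reaction VALUES (?, ?)",
                    [(frame_id, rxn) for rxn in frame.get("reactions", [])],
                )
            elif frame["class"] == "Reaction":
                conn.execute("INSERT OR REPLACE INTO reaction VALUES (?)", (frame_id,))
```

Re-running the import leaves row counts unchanged, which matters if edits made through Cell Designer are re-propagated. `INSERT OR REPLACE` / `INSERT OR IGNORE` are SQLite syntax; MySQL spells these `REPLACE` / `INSERT IGNORE`.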