![Page 1: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/1.jpg)
Pathway/Genome Databases:
Concepts and Software Tools
Peter D. Karp, Ph.D.
Bioinformatics Research Group
SRI International
http://www.ai.sri.com/pkarp/
http://BioCyc.org/
![Page 2: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/2.jpg)
SRI International Bioinformatics
Overview
- Pathway/genome databases
- Pathway Tools software
- EcoCyc and MetaCyc
- Characterization of the E. coli metabolic network
![Page 3: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/3.jpg)
What to Do When Theories Become Larger than Minds Can Grasp?
- Example: E. coli genetic network
  - Control by 97 transcription factors of 1174 genes in 630 transcription units
- Example: E. coli metabolic network
  - 160 pathways involving 744 reactions and 791 substrates
- Partition theories across multiple minds
- Rely on the printed word
![Page 4: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/4.jpg)
Limitations
- Cannot effectively
  - Evaluate them for internal consistency
  - Evaluate them for consistency with new data: microarrays
  - Refine them with respect to new data
  - Integrate across them to produce system understanding
- They are too large and complex
- The printed word cannot be manipulated effectively
![Page 5: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/5.jpg)
Solution: Biological Knowledge Bases
- Store biological knowledge and theories in computers in a declarative form
  - Amenable to computational analysis and generative user interfaces
- It is accepted practice to store data in computers, but not knowledge
  - Refined, interpreted, consensus views
- Establish ongoing efforts to curate (maintain, refine, embellish) these knowledge bases
  - Such knowledge bases are an integral part of the scientific enterprise
![Page 6: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/6.jpg)
Organism-Specific Pathway/Genome Databases
- Layer functional information above the genome
- Rich ontology to encode biological information with high fidelity
  - Chromosomes, genes, operons, gene products, reactions, pathways
- Curated by experts for that organism
- Integrate literature and computational predictions
![Page 7: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/7.jpg)
Pathway/Genome Database
[Figure: a CELL described by linked object types]
Chromosomes, Plasmids
Genes
Proteins
Reactions
Pathways
Compounds
Operons, Promoters, DNA Binding Sites
![Page 8: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/8.jpg)
Pathway Tools Software
- PathoLogic
  - Prediction of metabolic network from genome
  - Computational creation of new Pathway/Genome Databases
- Pathway/Genome Editors
  - Distributed curation of genome annotations
  - Distributed object database system
  - Interactive editing tools
- Pathway/Genome Navigator
  - WWW publishing of PGDBs
  - Graphic depictions of pathways, chromosomes, operons
  - Analysis operations
    - Pathway visualization of gene-expression data
    - Global comparisons of metabolic networks
![Page 9: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/9.jpg)
Sequence Project Workflow
Raw Sequence
Phred
Phrap
CONSED
BLAST, BLOCKS
GeneMark/Glimmer
PathoLogic
P/G Navigator
P/G Editors
WWW Publishing
Analyses
![Page 10: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/10.jpg)
BioCyc Collection of Pathway/Genome DBs
Literature-based Datasets:
- MetaCyc
- Escherichia coli (EcoCyc)
Computationally Derived Datasets:
- Agrobacterium tumefaciens
- Caulobacter crescentus
- Chlamydia trachomatis
- Bacillus subtilis
- Helicobacter pylori
- Haemophilus influenzae
- Mycobacterium tuberculosis
- Mycoplasma pneumoniae
- Pseudomonas aeruginosa
- Saccharomyces cerevisiae
- Treponema pallidum
http://BioCyc.org/
![Page 11: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/11.jpg)
EcoCyc Project Overview
- E. coli Encyclopedia
  - Model-Organism Database for E. coli
  - Tracks the evolving annotation of the E. coli genome
  - Over 3500 literature citations
- Collaborative development via Internet
  - Karp (SRI) -- Bioinformatics architect
  - Riley (MBL) -- Metabolic pathways, signal transduction
  - Saier (UCSD) and Paulsen (TIGR) -- Transport
  - Collado (UNAM) -- Regulation of gene expression
- Ontology: 1000 biological classes
- Database content: 16,000 instances
![Page 12: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/12.jpg)
EcoCyc = E. coli Dataset + Pathway/Genome Navigator
Genes: 4,393
Proteins: 4,273
Reactions: 2,760
Pathways: 165
Compounds: 774
http://BioCyc.org/
Transcription Units: 684
Factors: 108
Enzymes: 914
Transporters: 162
Promoters: 781
TransFac Sites: 910
Citations: 3,508
![Page 13: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/13.jpg)
EcoCyc Pathways
- Biosynthesis of amino acids, purines, pyrimidines, fatty acids, cofactors (heme, biotin, folic acid, etc.)
- Catabolism of fatty acids, D-glucuronate, L-alanine, L-arabinose, fucose, galactonate, galactose, glucose, mannose, ribose, xylose
- Entner-Doudoroff pathway, TCA cycle, fermentation, gluconeogenesis, glycerol metabolism, glycolysis, glyoxylate cycle, pentose phosphate pathway
![Page 14: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/14.jpg)
Motivations for Understanding Schema
- Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
- When writing complex Lisp queries to PGDBs, those queries must name classes and slots within the schema
- A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity
![Page 15: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/15.jpg)
Web of Relationships for One Enzyme
Sdh-flavo Sdh-Fe-S Sdh-membrane-1 Sdh-membrane-2
sdhA sdhB sdhC sdhD
Succinate + FAD = fumarate + FADH2
Enzymatic-reaction
Succinate dehydrogenase
TCA Cycle
![Page 16: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/16.jpg)
Frames
- Entities with which facts are associated
- Kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
- Classes:
  - Superclass(es)
  - Subclass(es)
  - Instance(s)
- A symbolic frame name (id, key) uniquely identifies each frame
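The frame concepts above can be sketched in a few lines of Python. This is only an illustration of classes, instances, and unique frame names; `Frame` and `KnowledgeBase` are hypothetical names, not the Pathway Tools or Ocelot API, and `Tryptophan-biosynthesis` is added here purely as an example instance.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A class frame or an instance frame, identified by a unique symbolic name."""
    name: str
    is_class: bool
    parents: list = field(default_factory=list)  # superclasses, or the classes of an instance


class KnowledgeBase:
    """Hypothetical in-memory frame store keyed by frame name."""

    def __init__(self):
        self.frames = {}

    def add(self, name, is_class, parents=()):
        # A frame name must uniquely identify one frame.
        if name in self.frames:
            raise ValueError(f"frame name already in use: {name}")
        for parent in parents:
            if not self.frames[parent].is_class:
                raise ValueError(f"{parent} is not a class frame")
        self.frames[name] = Frame(name, is_class, list(parents))
        return self.frames[name]

    def subclasses(self, class_name):
        """Direct subclasses of a class."""
        return sorted(f.name for f in self.frames.values()
                      if f.is_class and class_name in f.parents)

    def instances(self, class_name):
        """Instances of a class, including instances of its subclasses."""
        found = [f.name for f in self.frames.values()
                 if not f.is_class and class_name in f.parents]
        for sub in self.subclasses(class_name):
            found.extend(self.instances(sub))
        return sorted(found)


kb = KnowledgeBase()
kb.add("Genes", is_class=True)
kb.add("Pathways", is_class=True)
kb.add("Biosynthetic-Pathways", is_class=True, parents=["Pathways"])
kb.add("trpA", is_class=False, parents=["Genes"])
kb.add("TCA-cycle", is_class=False, parents=["Pathways"])
kb.add("Tryptophan-biosynthesis", is_class=False, parents=["Biosynthetic-Pathways"])

print(kb.subclasses("Pathways"))
print(kb.instances("Pathways"))
```

Asking for the instances of `Pathways` also returns instances of `Biosynthetic-Pathways`, which is the inheritance behavior the class hierarchy implies.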
![Page 17: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/17.jpg)
Slots
- Encode attributes/properties of a frame
  - Integer, real number, string
- Represent relationships between frames
  - The value of a slot is the identifier of another frame
- Every slot is described by a “slot frame” in a KB that defines meta information about that slot
![Page 18: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/18.jpg)
Slot Links
Sdh-flavo Sdh-Fe-S Sdh-membrane-1 Sdh-membrane-2
sdhA sdhB sdhC sdhD
Succinate + FAD = fumarate + FADH2
Enzymatic-reaction
Succinate dehydrogenase
TCA Cycle
product
component-of
catalyzes
reaction
in-pathway
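As a rough illustration of how the slot links in this diagram connect frames, the sketch below stores each frame as a dictionary of slots and follows a chain of slot names from a gene to its pathway. The frame ids `Enzrxn-sdh` and `Rxn-sdh` and the helper `follow` are hypothetical; the slot names are the ones labeled on the slide, and the direction of each link is an assumption.

```python
# Each frame is a dict of slot name -> list of values.
# Frame-valued slots hold the ids of other frames.
kb = {
    "sdhA": {"product": ["Sdh-flavo"]},
    "sdhB": {"product": ["Sdh-Fe-S"]},
    "sdhC": {"product": ["Sdh-membrane-1"]},
    "sdhD": {"product": ["Sdh-membrane-2"]},
    "Sdh-flavo": {"component-of": ["Succinate-dehydrogenase"]},
    "Sdh-Fe-S": {"component-of": ["Succinate-dehydrogenase"]},
    "Sdh-membrane-1": {"component-of": ["Succinate-dehydrogenase"]},
    "Sdh-membrane-2": {"component-of": ["Succinate-dehydrogenase"]},
    "Succinate-dehydrogenase": {"catalyzes": ["Enzrxn-sdh"]},
    "Enzrxn-sdh": {"reaction": ["Rxn-sdh"]},
    "Rxn-sdh": {
        "equation": ["Succinate + FAD = fumarate + FADH2"],
        "in-pathway": ["TCA-Cycle"],
    },
    "TCA-Cycle": {},
}


def follow(kb, start, slots):
    """Follow a sequence of slot names from one frame, returning the frames reached."""
    frontier = [start]
    for slot in slots:
        next_frontier = []
        for frame_id in frontier:
            for value in kb[frame_id].get(slot, []):
                if value not in next_frontier:
                    next_frontier.append(value)
        frontier = next_frontier
    return frontier


GENE_TO_PATHWAY = ["product", "component-of", "catalyzes", "reaction", "in-pathway"]
print(follow(kb, "sdhA", GENE_TO_PATHWAY))
```

Because every relationship is a slot whose value names another frame, a query such as "which pathways does this gene participate in" reduces to walking a fixed list of slots.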
![Page 19: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/19.jpg)
Representation of Function
Sdh-flavo Sdh-Fe-S Sdh-membrane-1 Sdh-membrane-2
sdhA sdhB sdhC sdhD
Succinate + FAD = fumarate + FADH2
Enzymatic-reaction
Succinate dehydrogenase
TCA Cycle
EC#
Keq
Cofactors
Inhibitors
Molecular wt
pI
Left-end-position
![Page 20: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/20.jpg)
Monofunctional Monomer
Gene
Reaction
Enzymatic-reaction
Monomer
Pathway
![Page 21: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/21.jpg)
Bifunctional Monomer
Gene
Reaction
Enzymatic-reaction
Monomer
Pathway
Reaction
Enzymatic-reaction
![Page 22: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/22.jpg)
Monofunctional Multimer
Monomer Monomer Monomer Monomer
Gene Gene Gene Gene
Reaction
Enzymatic-reaction
Multimer
Pathway
![Page 23: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/23.jpg)
Pathway and Substrates
Reactant-1
Reaction
Pathway
Reaction Reaction Reaction
Reactant-2
Product-2
Product-1
in-pathway
left
right
![Page 24: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/24.jpg)
Transcriptional Regulation
site001
pro001
trpE
trpD
trpC
trpB
trpA
trpL
Int003
RpoSig70
TrpR*trp
Int001
trpLEDCBA
trp
apoTrpR
Int005
![Page 25: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/25.jpg)
Principal Classes
- Class names are capitalized, plural
- Genetic-Elements, with subclasses:
  - Chromosomes
  - Plasmids
- Genes
- Transcription-Units
- RNAs
- Proteins, with subclasses:
  - Polypeptides
  - Protein-Complexes
![Page 26: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/26.jpg)
Principal Classes
- Reactions, with subclasses:
  - Transport-Reactions
  - Enzymatic-Reactions
- Pathways
- Compounds-And-Elements
![Page 27: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/27.jpg)
Slots in Multiple Classes
- Common-Name
- Synonyms
- Names (computed as union of Common-Name, Synonyms)
- Comment
- Citations
- DB-Links
![Page 28: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/28.jpg)
Genes Slots
- Chromosome
- Left-End-Position
- Right-End-Position
- Centisome-Position
- Transcription-Direction
- Product
![Page 29: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/29.jpg)
Proteins Slots
- Molecular-Weight-Seq
- Molecular-Weight-Exp
- pI
- Locations
- Modified-Form
- Unmodified-Form
- Component-Of
![Page 30: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/30.jpg)
Polypeptides Slots
- Gene
![Page 31: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/31.jpg)
Protein-Complexes Slots
- Components
![Page 32: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/32.jpg)
Reactions Slots
- EC-Number
- Left, Right
- Substrates (computed as union of Left, Right)
- DeltaG0
- Keq
- Spontaneous?
- Species
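A computed slot such as Substrates can be pictured as a value derived on demand from stored slots rather than stored itself. The sketch below is a minimal Python illustration, assuming a hypothetical `Reaction` class; it is not how Pathway Tools implements computed slots.

```python
class Reaction:
    """Hypothetical reaction frame with stored Left/Right slots and a computed Substrates slot."""

    def __init__(self, left, right, ec_number=None):
        self.left = list(left)
        self.right = list(right)
        self.ec_number = ec_number

    @property
    def substrates(self):
        """Union of Left and Right, keeping first-seen order and dropping repeats."""
        seen = []
        for compound in self.left + self.right:
            if compound not in seen:
                seen.append(compound)
        return seen


# The succinate dehydrogenase reaction used in the earlier slides.
sdh = Reaction(left=["succinate", "FAD"], right=["fumarate", "FADH2"])
print(sdh.substrates)
```

Deriving the value from Left and Right means it cannot drift out of sync when a curator edits either side of the reaction.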
![Page 33: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/33.jpg)
Enzymatic-Reactions Slots
- Enzyme
- Reaction
- Activators
- Inhibitors
- Physiologically-Relevant
- Cofactors
- Prosthetic-Groups
- Alternative-Substrates
- Alternative-Cofactors
![Page 34: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/34.jpg)
Pathways Slots
- Reaction-List
- Predecessors
- Primaries
![Page 35: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/35.jpg)
MetaCyc Overview
- Meta Metabolic Encyclopedia
  - 445 pathways, 1115 enzymes, 4218 reactions
  - 173 E. coli pathways; 158 organisms
  - 2381 citations
- Literature-based DB with extensive references and commentary
- Pathways, reactions, enzymes, substrates
![Page 36: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/36.jpg)
MetaCyc Frequent Organisms
M. pneumoniae: 7
P. putida: 7
S. cerevisiae: 8
M. capricolum: 12
Hp. influenzae: 15
Pseudomonas: 17
Soybean: 18
B. subtilis: 18
Sf. solfataricus: 20
Ho. sapiens: 31
Sm. typhimurium: 35
E. coli: 173
![Page 37: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/37.jpg)
MetaCyc Data
- MetaCyc contains one DB object for each distinct pathway
  - Distinct in terms of reaction steps
  - Each pathway labeled with species it occurs in
- MetaCyc pathways are experimentally determined
- 4218 reactions in MetaCyc
  - 401 lack EC numbers
![Page 38: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/38.jpg)
MetaCyc Enzyme Data
- Reaction(s) catalyzed
- Alternative substrates
- Cofactors / prosthetic groups
- Activators and inhibitors
- Subunit structure
- Molecular weight, pI
- Comment, literature citations
- Species
![Page 39: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/39.jpg)
MetaCyc Super-Pathways
- Groups of pathways linked by common substrates
- Example: Super-pathway containing
  - Chorismate biosynthesis
  - Tryptophan biosynthesis
  - Phenylalanine biosynthesis
  - Tyrosine biosynthesis
- Super-pathways defined by listing their component pathways
- Multiple levels of super-pathways can be defined
- Pathway layout algorithms accommodate super-pathways
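Defining a super-pathway by listing its component pathways, with nesting allowed, suggests a simple recursive expansion. The following is a sketch under that assumption; the dictionary layout, the `expand` helper, the two super-pathway names, and all reaction ids are hypothetical placeholders rather than MetaCyc data.

```python
# Base pathways list reactions; super-pathways list component pathways.
# All reaction ids are placeholders, not real MetaCyc identifiers.
PATHWAYS = {
    "chorismate-biosynthesis": {"reactions": ["rxn-c1", "rxn-c2"]},
    "tryptophan-biosynthesis": {"reactions": ["rxn-w1", "rxn-w2"]},
    "phenylalanine-biosynthesis": {"reactions": ["rxn-f1"]},
    "tyrosine-biosynthesis": {"reactions": ["rxn-y1"]},
    "aromatic-amino-acid-superpathway": {
        "sub_pathways": [
            "chorismate-biosynthesis",
            "tryptophan-biosynthesis",
            "phenylalanine-biosynthesis",
            "tyrosine-biosynthesis",
        ]
    },
    # A second level, to show that super-pathways can nest.
    "top-level-superpathway": {"sub_pathways": ["aromatic-amino-acid-superpathway"]},
}


def expand(pathway_id, pathways, _active=()):
    """Return the reactions of a pathway, recursively expanding component pathways."""
    if pathway_id in _active:
        raise ValueError(f"super-pathway cycle at {pathway_id}")
    entry = pathways[pathway_id]
    reactions = list(entry.get("reactions", []))
    for sub in entry.get("sub_pathways", []):
        for rxn in expand(sub, pathways, _active + (pathway_id,)):
            if rxn not in reactions:
                reactions.append(rxn)
    return reactions


print(expand("top-level-superpathway", PATHWAYS))
```

Storing only the list of components keeps each reaction defined in exactly one base pathway, so an edit to a base pathway is reflected in every super-pathway that contains it.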
![Page 40: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/40.jpg)
Comparison of MetaCyc to KEGG
- Data
  - KEGG has no literature citations, no comments
  - KEGG has no detailed information about enzymes (inhibitors, subunits)
  - KEGG pathways are composites of pathways found in many organisms
    - Unclear what sub-pathways occur in what organisms
- Software tools
  - KEGG has no algorithmic visualization tools
  - KEGG has no queryable metabolic-map overview diagram
  - KEGG has no interactive editing tools
![Page 41: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/41.jpg)
EcoCyc/MetaCyc Availability
- WWW EcoCyc-Plus freely available
  - EcoCyc, MetaCyc
  - Pathway/genome DBs for 12 other organisms
- http://BioCyc.org/
- On-site EcoCyc-Plus freely available to non-profits
  - Flatfiles
  - Binary executable: Hardware requirements
    - Sun UltraSparc-170 w/ 64MB memory
    - PC, 500MHz CPU, 64MB memory, Windows-98
![Page 42: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/42.jpg)
EcoCyc and MetaCyc: Resources for Microbial Genome Analysis
- E. coli has large fraction of gene functions identified experimentally
- Assigning function by similarity to E. coli genes less likely to introduce annotation errors
- Predict metabolic pathways of other microbes using MetaCyc
![Page 43: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/43.jpg)
Applications of EcoCyc and MetaCyc
- Reference sources on E. coli and metabolism
- Sequence/pathway analysis of microbial genomes
- Analysis of gene-expression data
- Computer-aided education
- Anti-microbial drug discovery
- Pathway engineering
- Investigations of
  - Comparative metabolism
  - Global properties of E. coli metabolic network
![Page 44: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/44.jpg)
Pathway Tools Software
- PathoLogic
  - Prediction of metabolic network from genome
  - Computational creation of new Pathway/Genome Databases
- Pathway/Genome Editors
  - Distributed curation of genome annotations
  - Distributed object database system
  - Interactive editing tools
- Pathway/Genome Navigator
  - WWW publishing of PGDBs
  - Graphic depictions of pathways, chromosomes, operons
  - Analysis operations
    - Pathway visualization of gene-expression data
    - Global comparisons of metabolic networks
![Page 45: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/45.jpg)
Implementation Details
- Allegro Common Lisp
- Sun and PC platforms
- Ocelot object database
- Lisp-based WWW server at BioCyc.org
  - CWEST-based
  - Manages 14 organism DBs
![Page 46: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/46.jpg)
Pathway Tools Architecture
Object DBMS
GFP API
Pathway/Genome Navigator
WWW Server
X-Windows Graphics
Object Editor
Pathway Editor
Reaction Editor
Oracle
![Page 47: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/47.jpg)
Ocelot Knowledge Server Architecture
- Frame data model
  - Classes, instances, inheritance
  - Classes and instances both treated as data
- Persistent storage via disk files, Oracle DBMS
  - Concurrent development: Oracle
  - Single-user development: disk files
  - Read-only delivery: bundle data into binary program
- Transaction logging facility
- Optimistic concurrency-control protocol
- Schema evolution
- Local disk cache to improve Internet performance
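The slide names an optimistic concurrency-control protocol without describing it. As general background only, the sketch below shows the usual idea behind optimistic schemes: editors read without locking, and a commit succeeds only if the frame has not changed since it was read. `VersionedStore` and `ConflictError` are hypothetical names, and this is not a description of Ocelot's actual protocol.

```python
class ConflictError(Exception):
    """Raised when a commit is based on a stale read."""


class VersionedStore:
    """Hypothetical frame store that keeps a version counter per frame."""

    def __init__(self):
        self._data = {}  # frame id -> (version, value)

    def read(self, frame_id):
        """Return (version, value); an unknown frame has version 0."""
        return self._data.get(frame_id, (0, None))

    def commit(self, frame_id, read_version, new_value):
        """Write only if nobody else committed since read_version was read."""
        current_version, _ = self.read(frame_id)
        if current_version != read_version:
            raise ConflictError(
                f"{frame_id}: read version {read_version}, now {current_version}"
            )
        self._data[frame_id] = (current_version + 1, new_value)
        return current_version + 1


store = VersionedStore()
store.commit("trpA", 0, {"Common-Name": "trpA"})

# Two curators read the same frame without taking a lock.
version_a, _ = store.read("trpA")
version_b, _ = store.read("trpA")

# Curator A commits first and succeeds.
store.commit("trpA", version_a, {"Common-Name": "trpA", "Comment": "edited by A"})

# Curator B's commit is based on a stale version and is rejected.
try:
    store.commit("trpA", version_b, {"Common-Name": "trpA", "Comment": "edited by B"})
    conflict = False
except ConflictError:
    conflict = True
print(conflict)
```

The trade-off is that no curator is ever blocked while editing, at the cost of occasionally having to redo an edit when two people change the same frame.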
![Page 48: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/48.jpg)
EcoCyc WWW Server
![Page 49: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/49.jpg)
Visualization and Editing Tools
- Full Metabolic Map
- Pathways
- Reactions
- Compounds
- Enzymes, Transporters, Transcription Factors
- Genes
- Chromosomes
- Operons
![Page 50: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/50.jpg)
Inference of Metabolic Pathways
Genomic Map
Genes
Gene Products
Reactions
Pathways
Compounds
Pathway/Genome Database
PathoLogic
List of Genes/ORFs
List of Gene Products
ANNOTATED GENOME
Structured ASCII Text File
DNA Sequence
MetaCyc
![Page 51: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/51.jpg)
![Page 52: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/52.jpg)
![Page 53: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/53.jpg)
![Page 54: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/54.jpg)
PathoLogic Analysis Phases
- Trial parsing of input data files
- Automated build of initial PGDB
  - Initialize schema of new PGDB
  - Create DB objects for chromosomes, genes, proteins
  - Predict reactions and pathways present
- Define protein complexes
- Define metabolic overview diagram
![Page 55: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/55.jpg)
PathoLogic Pathway Prediction
- Create associations between enzymes and metabolic reactions
  - Reactions and substrates imported from MetaCyc
  - Automatically via EC numbers
  - Automatically via enzyme name matching
  - Manually
  - CC0092 / galE / “UDP-glucose-4-epimerase” / EC 5.1.3.2
  - UDP-D-glucose → UDP-galactose
- Import from MetaCyc all pathways associated with inferred reactions
  - UDP-D-glucose → UDP-galactose is a reaction of:
    - galactose metabolism, UDP-glucose conversion, lactose degradation 4, colanic acid building blocks biosynthesis
- Prune out pathways with insufficient evidence
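The first two steps above can be sketched as two lookups: match each annotated gene to a reaction by EC number (falling back to the enzyme name), then collect every pathway that contains an inferred reaction. The tables below are a hypothetical miniature, not real MetaCyc content; only the galE example and its four pathway names come from the slide, while the placeholder reaction ids, the second pathway entries, and the unmatched gene are invented for illustration.

```python
GALE_REACTION = "UDP-D-glucose -> UDP-galactose"

# Hypothetical miniature reference tables standing in for MetaCyc.
REACTION_BY_EC = {"5.1.3.2": GALE_REACTION}
REACTION_BY_ENZYME_NAME = {"udp-glucose-4-epimerase": GALE_REACTION}
PATHWAY_REACTIONS = {
    "galactose metabolism": [GALE_REACTION, "placeholder-rxn-1"],
    "UDP-glucose conversion": [GALE_REACTION],
    "lactose degradation 4": [GALE_REACTION, "placeholder-rxn-2"],
    "colanic acid building blocks biosynthesis": [GALE_REACTION, "placeholder-rxn-3"],
    "unrelated placeholder pathway": ["placeholder-rxn-9"],
}


def infer_reactions(genes):
    """Map gene id -> reaction, trying the EC number first and the enzyme name second."""
    assignments = {}
    for gene in genes:
        reaction = REACTION_BY_EC.get(gene.get("ec"))
        if reaction is None:
            reaction = REACTION_BY_ENZYME_NAME.get(gene["function"].lower())
        if reaction is not None:
            assignments[gene["id"]] = reaction
    return assignments


def candidate_pathways(reactions):
    """Every pathway that contains at least one inferred reaction."""
    return sorted(
        name for name, rxns in PATHWAY_REACTIONS.items()
        if any(rxn in reactions for rxn in rxns)
    )


genes = [
    {"id": "CC0092", "name": "galE", "function": "UDP-glucose-4-epimerase", "ec": "5.1.3.2"},
    # Invented gene with no usable annotation; it should be left unassigned.
    {"id": "GENE-X", "name": "geneX", "function": "hypothetical protein", "ec": None},
]

assignments = infer_reactions(genes)
candidates = candidate_pathways(set(assignments.values()))
print(assignments)
print(candidates)
```

Every pathway touched by an inferred reaction becomes a candidate, which is why a separate pruning step is needed afterwards.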
![Page 56: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/56.jpg)
PathoLogic Prunes Pathways With Insufficient Evidence
- No unique enzyme AND EITHER
  - Only 1 reaction present for a pathway of more than 2 steps
  - Set of reactions present is a subset of reactions present in another pathway
  - There exists a variant pathway with more evidence
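The pruning rule above can be written as a single predicate. The sketch below rests on several assumptions the slide does not spell out: a "unique enzyme" is taken to mean a present reaction that belongs to no other candidate pathway, "evidence" is taken to be the number of reactions present, and variant relationships are supplied in a separate mapping. The function name and the toy pathways are hypothetical.

```python
def should_prune(name, pathways, present, variants=None):
    """Decide whether a candidate pathway lacks sufficient evidence.

    pathways: pathway name -> list of reaction ids
    present:  set of reaction ids for which an enzyme was found
    variants: optional pathway name -> list of variant pathway names
    """
    variants = variants or {}
    found = {p: set(rxns) & present for p, rxns in pathways.items()}
    mine = found[name]
    others = [p for p in pathways if p != name]

    # Assumed meaning of "unique enzyme": a present reaction used by no other pathway.
    used_elsewhere = set().union(*(pathways[p] for p in others))
    if mine - used_elsewhere:
        return False

    lone_reaction = len(mine) == 1 and len(pathways[name]) > 2
    subset_of_other = any(mine <= found[p] for p in others)
    better_variant = any(len(found[v]) > len(mine) for v in variants.get(name, []))
    return lone_reaction or subset_of_other or better_variant


# Toy candidates with placeholder reaction ids.
pathways = {
    "P1": ["r1", "r2", "r3"],
    "P2": ["r2", "r4", "r5"],
    "P3": ["r6", "r7"],
}
present = {"r1", "r2", "r6", "r7"}

kept = [p for p in pathways if not should_prune(p, pathways, present)]
print(kept)
```

In the toy data, P2 is pruned because its only present reaction is shared with P1, while P1 and P3 each have a present reaction found in no other candidate.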
![Page 57: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/57.jpg)
PathoLogic: Inference of Pathway Complement
- Extends the paradigm of genome analysis
  - Predicted genes placed in their biochemical context
- Information reduction device
- Assess coherence of the set of genes in a genome
  - Identifies pathway holes and singleton enzymes
- Provides a framework for analysis of functional-genomics data
![Page 58: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/58.jpg)
![Page 59: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/59.jpg)
![Page 60: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/60.jpg)
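Pathway Comparisons

|     | Eco | Mtb | Bsu | Hin | Sce | Hpy |
|-----|-----|-----|-----|-----|-----|-----|
| Eco | 130 | 103 | 92  | 90  | 84  | 73  |
| Mtb |     | 103 | 84  | 79  | 82  | 70  |
| Bsu |     |     | 96  | 77  | 72  | 65  |
| Hin |     |     |     | 90  | 67  | 61  |
| Sce |     |     |     |     | 84  | 64  |
| Hpy |     |     |     |     |     | 74  |
| Mp  |     |     |     |     |     |     |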
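The slide does not say how the comparison matrix was computed. One plausible reading, assumed here, is that each diagonal cell counts an organism's predicted pathways and each off-diagonal cell counts the pathways two organisms share. Under that assumption the matrix is a set intersection per pair; the pathway sets below are toy placeholders, not the real predictions.

```python
from itertools import combinations_with_replacement


def shared_pathway_counts(pathways_by_organism):
    """Return {(org_a, org_b): number of pathways present in both} for the upper triangle."""
    counts = {}
    for a, b in combinations_with_replacement(sorted(pathways_by_organism), 2):
        counts[(a, b)] = len(pathways_by_organism[a] & pathways_by_organism[b])
    return counts


# Toy pathway sets for illustration only.
toy = {
    "Eco": {"glycolysis", "TCA cycle", "tryptophan biosynthesis"},
    "Hpy": {"glycolysis", "TCA cycle"},
    "Mtb": {"glycolysis", "tryptophan biosynthesis"},
}

counts = shared_pathway_counts(toy)
for pair in sorted(counts):
    print(pair, counts[pair])
```

Only the upper triangle is computed because the intersection is symmetric, which matches the triangular layout of the slide's table.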
![Page 61: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/61.jpg)
Summary
- Pathway/Genome Databases
  - 14 PGDBs available through SRI at BioCyc.org
  - Computational theories of biochemical machinery
- Pathway Tools software
  - Extract pathways from genomes
  - Distributed curation tools
  - Query, visualization, WWW publishing
  - Analysis algorithms
![Page 62: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter](https://reader031.vdocuments.us/reader031/viewer/2022021811/5cb3fd7b88c9939d228b82f1/html5/thumbnails/62.jpg)
Acknowledgements
- SRI: Suzanne Paley, Pedro Romero, John Pick
- EcoCyc Project: Milton Saier, Julio Collado, Ian Paulsen, Monica Riley
- Stanford: Harley McAdams, Lucy Shapiro, Gary Schoolnik, Russ Altman
- Funding sources:
  - NIH National Center for Research Resources
  - Department of Energy Microbial Cell Project
  - DARPA BioSpice, UPC
[email protected]
http://BioCyc.org/