How pathway databases were created and curated
Peifen Zhang
Plant Metabolic Network (PMN)
About PMN, http://plantcyc.org
PMN is
• A network of plant metabolic pathway databases and a database curation community
  – A plant reference database, PlantCyc: genes, enzymes and pathways consolidated from all plant species
  – A collection of single-species pathway databases, the Pathway Genome Databases (PGDBs): genes, enzymes and pathways in a particular species
  – A community for data curation: curators at databases (PMN, Gramene, SGN, etc.) and researchers in the plant biochemistry field
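A multi-species reference such as PlantCyc consolidates data from individual species. As a rough picture, that step is a union over single-species databases that records which species each pathway came from. The sketch below is a toy model built on that assumption. Each PGDB is reduced to a set of pathway names, and the function name and example contents are hypothetical. It does not represent PMN's actual data or merge procedure.

```python
from collections import defaultdict


def consolidate(pgdbs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Merge single-species pathway sets into one multi-species mapping.

    `pgdbs` maps a species name to the pathways recorded for it (hypothetical
    input format); the result maps each pathway to the species it occurs in.
    """
    reference: dict[str, set[str]] = defaultdict(set)
    for species, pathways in pgdbs.items():
        for pathway in pathways:
            reference[pathway].add(species)
    return dict(reference)


# Illustrative input only, not exported from real PGDBs
example = {
    "Arabidopsis": {"phenylalanine biosynthesis"},
    "maize": {"phenylalanine biosynthesis", "C4 photosynthesis"},
}
plant_reference = consolidate(example)
```

A real reference database also merges genes, enzymes and reactions and reconciles duplicate entries. This sketch only tracks which species each pathway came from.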
Prediction of PGDBs, why
• Huge amounts of sequence data are generated by genome and EST projects
• Put individual genes into a metabolic network
• Use the network to visualize and analyze large experimental data sets, discover missing enzymes, design metabolic engineering strategies, and conduct comparative and evolutionary studies
Creation of PGDBs, how
• Manual extraction of pathways from the literature, assigning genes/enzymes to pathways
• Computationally assigning genes/enzymes to reference pathways, followed by manual validation/correction and further curation
Prediction of PGDBs, how
• Annotated sequences with their molecular functions
• A reference database (such as MetaCyc and PlantCyc)
• PathoLogic (Pathway Tools software)
PathoLogic
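Figure: PathoLogic matches an annotated genome against reference pathways in MetaCyc to build a PGDB. The genome consists of DNA sequences, gene calls and gene functions (e.g., AT1G69370, chorismate mutase).
Example reference pathway: chorismate → prephenate → L-arogenate → L-phenylalanine. The three steps are catalyzed by chorismate mutase (EC 5.4.99.5), prephenate aminotransferase (EC 2.6.1.79) and arogenate dehydratase (EC 4.2.1.91).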
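Here is a toy version of the matching step on the PathoLogic slide. Annotated gene functions are mapped to EC numbers through enzyme names, and a reference pathway is predicted when enough of its reactions have an assigned gene. This is a simplified stand-in, not PathoLogic's actual algorithm, which uses richer name matching and pathway-scoring rules. The lookup tables, function names and 0.5 threshold are assumptions. Every gene except AT1G69370 is a made-up identifier.

```python
# Hypothetical reference tables, standing in for MetaCyc/PlantCyc
ENZYME_TO_EC = {
    "chorismate mutase": "5.4.99.5",
    "prephenate aminotransferase": "2.6.1.79",
    "arogenate dehydratase": "4.2.1.91",
}
REFERENCE_PATHWAYS = {
    # Each pathway is an ordered list of reactions, identified by EC number
    "phenylalanine biosynthesis": ["5.4.99.5", "2.6.1.79", "4.2.1.91"],
}


def assign_reactions(annotations: dict[str, str]) -> dict[str, list[str]]:
    """Map each EC number to the genes whose annotated function names that enzyme."""
    assigned: dict[str, list[str]] = {}
    for gene, function in annotations.items():
        ec = ENZYME_TO_EC.get(function.strip().lower())
        if ec is not None:
            assigned.setdefault(ec, []).append(gene)
    return assigned


def predict_pathways(annotations: dict[str, str], min_fraction: float = 0.5):
    """Return reaction assignments plus pathways whose reaction coverage meets the threshold."""
    reactions = assign_reactions(annotations)
    predicted = {}
    for name, ecs in REFERENCE_PATHWAYS.items():
        fraction = sum(ec in reactions for ec in ecs) / len(ecs)
        if fraction >= min_fraction:
            predicted[name] = fraction
    return reactions, predicted


# AT1G69370 comes from the slide; HYPOTHETICAL_GENE_1/2 are invented for illustration
annotations = {
    "AT1G69370": "chorismate mutase",
    "HYPOTHETICAL_GENE_1": "arogenate dehydratase",
    "HYPOTHETICAL_GENE_2": "hypothetical protein",
}
reactions, predicted = predict_pathways(annotations)
```

Here the pathway is predicted with two of its three reactions covered, so the prephenate aminotransferase step (EC 2.6.1.79) has no gene. A gap like this is how a predicted pathway can point to a candidate missing enzyme.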
A snapshot of AraCyc
• Arabidopsis genome – 27,235 protein-coding genes
• AraCyc
  – 6,158 enzyme-coding genes
  – 2,733 genes are assigned to reactions
  – 1,914 genes are assigned to pathways
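The snapshot numbers read most easily as nested fractions. The short calculation below uses only the counts on the slide. It says nothing about how AraCyc defines an enzyme-coding gene, and the variable names are this sketch's own.

```python
# Counts taken from the AraCyc snapshot above
protein_coding_genes = 27_235
enzyme_coding_genes = 6_158
genes_in_reactions = 2_733
genes_in_pathways = 1_914


def percent(part: int, whole: int) -> float:
    """Percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


coverage = {
    "enzyme-coding / protein-coding": percent(enzyme_coding_genes, protein_coding_genes),
    "assigned to reactions / enzyme-coding": percent(genes_in_reactions, enzyme_coding_genes),
    "assigned to pathways / enzyme-coding": percent(genes_in_pathways, enzyme_coding_genes),
}
```

Roughly 23% of protein-coding genes are enzyme-coding. Of those, about 44% are assigned to reactions and about 31% to pathways.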
Currently available PGDBs

| Species | Database | Status |
|---|---|---|
| Arabidopsis | TAIR | Substantial curation |
| Rice | Gramene | Some curation |
| Sorghum | Gramene | No curation |
| Medicago | Noble Foundation | Some curation |
| Tomato | SGN | Some curation |
| Potato | SGN | No curation |
| Pepper | SGN | No curation |
| Tobacco | SGN | No curation |
| Petunia | SGN | No curation |
| Coffee | SGN | No curation |
Prediction of new PGDBs by PMN
• Prioritization – available sequences, economic impact
• High priority – maize, poplar, soybean, wheat
• Second priority – cotton, grape, sugarcane, sunflower, switchgrass…
A quality database REQUIRES manual validation and curation
Validation: pruning false-positive predictions
• Pathways not operating in plants or not in the target species
  – glycogen biosynthesis
  – C4 photosynthesis
  – caffeine biosynthesis
• Pathways operating via a different route
  – phenylalanine biosynthesis in bacteria vs. in plants
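One way to organize the first kind of pruning is to record, for each reference pathway, the taxa where it is known to operate. Predictions outside that range are then flagged for a curator to review rather than deleted automatically. The range table and function below are hypothetical and deliberately incomplete. They do not describe how PMN or PathoLogic stores this information. The sketch also cannot catch the second case, where a pathway is predicted via the bacterial rather than the plant route. That case still needs a curator to compare the reactions.

```python
# Hypothetical, deliberately incomplete table of where each pathway is known to operate
KNOWN_RANGE = {
    "glycogen biosynthesis": {"bacteria", "fungi", "animals"},
    "C4 photosynthesis": {"maize", "sorghum", "sugarcane"},
    "caffeine biosynthesis": {"coffee", "tea"},
}


def prune(predicted: list[str], species: str, lineage: list[str]) -> tuple[list[str], list[str]]:
    """Split predicted pathways into (kept, flagged for manual review).

    A pathway is kept if the table has no entry for it (nothing to check
    against) or if its known range includes the species or a lineage group.
    """
    allowed = {species, *lineage}
    kept, flagged = [], []
    for pathway in predicted:
        known = KNOWN_RANGE.get(pathway)
        if known is None or known & allowed:
            kept.append(pathway)
        else:
            flagged.append(pathway)
    return kept, flagged


predictions = ["glycogen biosynthesis", "C4 photosynthesis",
               "caffeine biosynthesis", "phenylalanine biosynthesis"]
arabidopsis = prune(predictions, "arabidopsis", ["plants"])
coffee = prune(predictions, "coffee", ["plants"])
```

The same prediction list is pruned differently per species. Caffeine biosynthesis is flagged for Arabidopsis but kept for coffee.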
Validation: adding evidence and literature support
Pathways are supported by different kinds of evidence
• Pathways supported by molecular data
  – enzymes and genes
• Pathways based on radiotracer experiments
  – no enzymes or genes
• Expert hypothesis ("paper chemistry")
• Pure computational prediction
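To track evidence per pathway, the four categories above can be encoded as an ordered type, which makes the strongest support easy to pick out. This sketch assumes a ranking that follows the slide's list order, with molecular data strongest and pure prediction weakest. That ranking is an assumption, and the class, function and pathway names are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Evidence(IntEnum):
    """Evidence categories, ranked (by assumption) from weakest to strongest."""
    COMPUTATIONAL = 1      # pure computational prediction
    EXPERT_HYPOTHESIS = 2  # expert hypothesis ("paper chemistry")
    RADIOTRACER = 3        # radiotracer experiments, no enzymes or genes
    MOLECULAR = 4          # characterized enzymes and genes


def strongest(evidence: list[Evidence]) -> Optional[Evidence]:
    """Return the strongest evidence recorded for a pathway, or None if there is none."""
    return max(evidence, default=None)


def needs_literature(pathways: dict[str, list[Evidence]]) -> list[str]:
    """Pathways supported only by computational prediction, or by nothing at all."""
    return [name for name, ev in pathways.items()
            if strongest(ev) in (None, Evidence.COMPUTATIONAL)]


curated = {
    "phenylalanine biosynthesis": [Evidence.COMPUTATIONAL, Evidence.MOLECULAR],
    "hypothetical pathway X": [Evidence.COMPUTATIONAL],
}
to_review = needs_literature(curated)
```

A list like `to_review` is a natural queue for the literature-support step above.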
Correcting pathway diagrams
Curating missing pathways
• What information is curated from the literature
  – Pathway: diagram, summary, evidence, citations
  – Reaction: co-substrates, EC number
  – Compound: name and synonyms, structure
  – Enzyme: coding gene, physical/biochemical properties, evidence, comments, citations
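The checklist above maps onto four linked record types. The dataclasses below are a hypothetical model for illustration only. Their class and field names are this sketch's own and do not reflect the Pathway Tools schema. The example reuses the phenylalanine pathway and gene AT1G69370 from the PathoLogic slide, and the `diagram` helper assumes a linear pathway.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Compound:
    name: str
    synonyms: list[str] = field(default_factory=list)
    structure: Optional[str] = None  # e.g. a SMILES string; left empty here


@dataclass
class Reaction:
    substrates: list[str]
    products: list[str]
    ec_number: Optional[str] = None
    co_substrates: list[str] = field(default_factory=list)


@dataclass
class Enzyme:
    name: str
    coding_gene: Optional[str] = None
    properties: dict[str, str] = field(default_factory=dict)  # physical/biochemical
    evidence: list[str] = field(default_factory=list)
    comments: str = ""
    citations: list[str] = field(default_factory=list)


@dataclass
class Pathway:
    name: str
    reactions: list[Reaction]
    summary: str = ""
    evidence: list[str] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)

    def diagram(self) -> str:
        """Text diagram of a linear pathway: first substrate, then each step's main product."""
        if not self.reactions:
            return ""
        steps = [self.reactions[0].substrates[0]]
        steps += [r.products[0] for r in self.reactions]
        return " -> ".join(steps)


phe = Pathway(
    name="phenylalanine biosynthesis",
    reactions=[
        Reaction(["chorismate"], ["prephenate"], "5.4.99.5"),
        Reaction(["prephenate"], ["L-arogenate"], "2.6.1.79", co_substrates=["L-glutamate"]),
        Reaction(["L-arogenate"], ["L-phenylalanine"], "4.2.1.91"),
    ],
)
chorismate = Compound("chorismate", synonyms=["chorismic acid"])
chorismate_mutase = Enzyme("chorismate mutase", coding_gene="AT1G69370")
```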
Sources of literature
• PubMed, SciFinder
• Specialized journals (e.g., Phytochemistry)
• Books in specialized fields (e.g., alkaloids)
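PubMed searches can be scripted through NCBI's E-utilities. The sketch below only builds an ESearch request URL and fetches nothing. The helper name, query term, `[tiab]` field tags and `retmax` value are illustrative choices, not a recommended search strategy. SciFinder is not covered here.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(enzyme: str, organism: str, retmax: int = 20) -> str:
    """Build an ESearch URL for PubMed records mentioning an enzyme and an organism
    in the title/abstract (hypothetical helper)."""
    term = f'"{enzyme}"[tiab] AND {organism}[tiab]'
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


url = pubmed_search_url("arogenate dehydratase", "Arabidopsis")
```

The response lists matching PubMed IDs, which still have to be triaged and read by a curator.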
Curation workflow
• Identify a pathway
• Find details of reactions
  – structure of substrates
  – EC number
  – enzymes
• Find details of enzymes
  – physical & chemical properties
  – coding gene
  – reactions
  – species
• Data entry
• Draw pathway diagram
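The workflow can be tracked as an ordered checklist, so a partially curated record shows which step comes next. The record keys and function name below are hypothetical. Which details are grouped under reactions and which under enzymes is this sketch's own assumption.

```python
from typing import Optional

# Ordered curation steps and the (hypothetical) record fields each one fills in
WORKFLOW = [
    ("identify a pathway", ["pathway_name"]),
    ("find details of reactions", ["substrate_structures", "ec_numbers", "enzymes"]),
    ("find details of enzymes", ["enzyme_properties", "coding_genes", "species"]),
    ("draw pathway diagram", ["diagram"]),
]


def next_step(record: dict) -> tuple[Optional[str], list[str]]:
    """Return the first workflow step that still has missing fields, and those fields."""
    for step, fields in WORKFLOW:
        missing = [f for f in fields if not record.get(f)]
        if missing:
            return step, missing
    return None, []  # every step has its data


record = {
    "pathway_name": "phenylalanine biosynthesis",
    "ec_numbers": ["5.4.99.5", "2.6.1.79", "4.2.1.91"],
}
step, missing = next_step(record)
```

For this record, the next step is finding reaction details, with substrate structures and enzymes still missing.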
Current curation priorities
• Big economic impact
  – bio-energy production, e.g., cell wall components
  – industrial materials, e.g., rubber
  – medicinal metabolites
• Under-represented domains
  – e.g., quinones, volatiles
The importance of community contribution, why we need your help
• A mountain of information
  – 17 million citations in PubMed alone
  – 4,208 citations in PlantCyc
• Triage the most up-to-date and most relevant references
• Synthesize and extract information from individual papers
• Limited human resources
  – curators: 3 at PMN, 1 at SGN, 1 at Gramene
• Limited expertise
  – a curator, e.g., a molecular biologist, may be familiar with one particular pathway, but certainly not with all of them
How you can help
• Speed up data coverage
  – submit a pathway, an enzyme, or a set of compounds
• Enhance data accuracy
  – report errors
• Tell us your ideas and needs for new features and functionality
Data submission forms
Reporting errors
The PMN project, us and you
Figure: the reference databases MetaCyc and PlantCyc shown together with species databases (AraCyc, rice, tomato, medicago, maize, wheat, poplar, sugarcane, other…)
Types of pathway databases
• Multi-species
  – MetaCyc (universal, from microbes to plants to humans)
  – PlantCyc (plant kingdom)
  – BIACyc (a specific clade, for alkaloid biosynthesis)
• Single-species (Pathway Genome Database, PGDB)
  – AraCyc (Arabidopsis)
  – LycoCyc (tomato)
  – RiceCyc
  – etc.