kate dreher biocurator / plant molecular biologist the carnegie institution for science stanford, ca...

30
kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data and Tools for Analysis and Discovery

Upload: earl-cross

Post on 11-Jan-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

kate dreher

biocurator / plant molecular biologist

The Carnegie Institution for Science

Stanford, CA

Introduction to the Plant Metabolic Network: Data and Tools for

Analysis and Discovery

Page 2: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Free access to high quality, curated data promotes beneficial research on plant metabolism

Plants provide crucial benefits to humanity and the ecosystem

A better understanding of plant metabolism may contribute to:

More nutritious foods More pest-resistant plants More stress-tolerant crops Higher photosynthetic capacity

and higher yield in

agricultural and biofuel crops . . . many more applications

These efforts benefit from access to high quality plant metabolism data

Page 3: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

PMN provides data and analysis tools: www.plantcyc.org

Page 4: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

The Plant Metabolic Network has multiple goals

Support research, breeding, and education

Transform published results into data-rich metabolic pathways Capture information on Enzymes, Reactions, and Compounds

Create and deploy improved methods for predicting enzyme function and metabolic capacity using plant genome sequences

Provide public resources: PlantCyc AraCyc 16 additional species-specific databases

Facilitate data analysis

www.plantcyc.org

Page 5: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Numerous collaborators contribute to the Plant Metabolic Network

SRI International – BioCyc project Provide Pathway Tools Software

Maintain and update MetaCyc

Other collaborators / contributors include:

MaizeGDB GoFORSYS SoyBase Sol Genomics Network (SGN) / Boyce Thompson Institute Cosmoss Gramene PlantMetabolomics group . . . and more

SoyBase

Editorial BoardCommunity

Submissions

Page 6: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

17 PMN species are phylogenetically and "functionally" diverse

Database Species Pathways* Enzymes Reactions CompoundsPlantCyc 8.0 400+ 1050 188798 5332 4410AraCyc 11.5 Arabidopsis thaliana 597 9041 3490 2613

CassavaCyc 3.0 Manihot esculenta 491 10007 3058 2232ChineseCabbageCyc 1.0 Brassica rapa (+ spp.) 499 10976 3104 2270

GrapeCyc 3.0 Vitis vinifera 479 7572 3015 2229PapayaCyc 2.0 Carica papaya 481 5714 2999 2220PoplarCyc 6.0 Populus trichocarpa (+ spp.) 505 20822 3124 2295

SoyCyc 4.0 Glycine max 520 20317 3105 2273BarleyCyc 1.0 Hordeum vulgare 465 7572 2901 2135

BrachypodiumCyc 1.0 Brachypodium distachyon 473 8802 2915 2128CornCyc 3.5 Zea mays 484 13979 2821 2160OryzaCyc 1.0 Oryza sativa (+ spp.) 482 15677 3000 2226

SetariaCyc 1.0 Setaria italica 477 10214 2942 2145SorghumBicolorCyc 1.0 Sorghum bicolor 480 8630 2939 2141

SwitchgrassCyc 1.0 Panicum virgatum 479 17319 2985 2184MossCyc 2.0 Physcomitrella patens 416 7805 2713 1901

SelaginellaCyc 2.0 Selaginella moellendorffii 421 6462 2737 1987ChlamyCyc 3.5 Chlamydomonas reinhardtii 349 3330 2263 1514

Eudicots

Cereals

Basal land plants

Green alga

• Major crops

• Model species

• Legume

• Woody and herbaceous species

• Annuals and perennials

• C3 and C4 species

Page 7: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

PlantCyc provides access to important pathways not found in species-specific databases

PlantCyc contains over 1000 pathways and information from over 400 plant species

Database Species Pathways* Enzymes Reactions CompoundsPlantCyc 8.0 400+ 1050 188798 5332 4410AraCyc 11.5 Arabidopsis thaliana 597 9041 3490 2613

CassavaCyc 3.0 Manihot esculenta 491 10007 3058 2232ChineseCabbageCyc 1.0 Brassica rapa (+ spp.) 499 10976 3104 2270

GrapeCyc 3.0 Vitis vinifera 479 7572 3015 2229PapayaCyc 2.0 Carica papaya 481 5714 2999 2220PoplarCyc 6.0 Populus trichocarpa (+ spp.) 505 20822 3124 2295

SoyCyc 4.0 Glycine max 520 20317 3105 2273BarleyCyc 1.0 Hordeum vulgare 465 7572 2901 2135

BrachypodiumCyc 1.0 Brachypodium distachyon 473 8802 2915 2128CornCyc 3.5 Zea mays 484 13979 2821 2160OryzaCyc 1.0 Oryza sativa (+ spp.) 482 15677 3000 2226

SetariaCyc 1.0 Setaria italica 477 10214 2942 2145SorghumBicolorCyc 1.0 Sorghum bicolor 480 8630 2939 2141

SwitchgrassCyc 1.0 Panicum virgatum 479 17319 2985 2184MossCyc 2.0 Physcomitrella patens 416 7805 2713 1901

SelaginellaCyc 2.0 Selaginella moellendorffii 421 6462 2737 1987ChlamyCyc 3.5 Chlamydomonas reinhardtii 349 3330 2263 1514

1050pathways

476pathways(average)

Page 8: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

PlantCyc provides access to numerous specialized metabolic pathways

Many PlantCyc-specific metabolic pathways produce or break-down specialized ("secondary") metabolites

Many of the enzymes in these pathways have experimental evidence

Caffeine biosynthesis I

(Caffea arabica)Morphine biosynthesis(Papaver somniferum)

Taxol biosynthesis(cancer drug)

(Taxus brevifolia)

Raspberry ketone biosynthesis

(Rubus idaeus)

Vicianin bioactivation (defense compound)

(Vicia sativa)

Alliin degradation (garlic odor)

(Allium sativum)

Page 9: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

The PMN developed a new method to predict enzyme functions

A high-confidence Reference Protein Sequence Database was built RPSD 2.0 contains 34,269 enzymes and 82,216 non-enzymes

The Ensemble Enzyme Prediction Pipeline (E2P2) uses the RPSD to predict functions based on protein sequence

It assigns protein functions using standard EC (Enzyme Commission) numbers AND using specific reaction IDs

Many reactions are not associated with full EC numbers but are associated with specific reaction IDs

E2P2 can predict a higher number of enzyme functions

E2P2 may help to fine-tune predictions among large enzyme families

Page 10: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Evidence codes enhance transparency

Evidence codes clearly define when enzymes are assigned to reactions based on computational or experimental evidence

Users can check references for computational prediction methods and original experiments in journal articles

Page 11: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

A simple search connects you to a wealth of information

AT3G16785

Page 12: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Quickly move between genes, enzymes, reactions, compounds, and pathways

Page 13: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Pathways may look deceptively simple at first glance

Page 14: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

PathwayEnzyme

Gene

Compound

Evidence Codes

Regulation

Upstreampathway

The PMN provides valuable pathway context information

Reaction

Page 15: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Curated summaries provide background information, cite papers, and link to other PMN pages

Page 16: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Different search tools help meet the needs of many usersfrom high school biology students to bioinformaticians

Quick Search

Detailed search menus for for Compounds, Pathways, and Enzymes

Advanced Search (queryable with BioVelo language)

BLAST against PMN-specific enzyme sets

Comparative analysis

JavaCyc and PerlCyc

Complete downloadable files - Biopax and other file formats

Groups - for custom dataset building

Page 17: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

"Groups" facilitate data retrieval, enrichment, and sharing

Page 18: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Transform your search results into a rich data set

Page 19: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Work with groups

Page 20: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Different properties and transformations are available for enzymes, reactions, pathways, and compounds

Page 21: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Groups can be downloaded and shared

Page 22: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Omics datasets can be displayed on a metabolic map

Page 23: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

A simple tab-delimited text file should IDs/names and data value(s)

ID(Gene ID)

(enzyme name)(compound name)

etc.

Numerical data:(Relative values)(Absolute levels)

Page 24: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Results can be examined at different levels andconnect to relevant pages

Page 25: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Additional analysis tools and resources are available

Page 26: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

The PMN actively interacts with its users We answer questions!

[email protected]

We make suggested corrections!

We make public presentations and meet with individuals Booth 200 Poster 07029 Email

We ask for your help

We thank you publicly

CHOCOLATE !!(Theobroma cacao)

Page 27: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

The PMN is working on additional enhancements

Continue to improve our enzyme annotation pipeline

Increase coverage of secondary metabolism

Add transcription factors that affect enzyme expression

Predict plant metabolic pathways for additional species

Page 28: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Plant metabolic NETWORKING Please use our data

Please use our tools

Please help us to improve our databases!

Please contact us if we can be of any help!

[email protected]

www.plantcyc.org

Page 29: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

PMN AcknowledgementsCurator:

- kate dreher

Post-docs:

- Lee Chae

- Ricardo Nilo Poyanco

- Chuan Wang

Interns- Caryn Johansen

Tech Team Members:- Bob Muller (Manager)- Larry Ploetz (Sys. Admin)- Shanker Singh- Bill Nelson

Rhee Lab Members:- Flavia Bossi- Jue Fan- Hye-in Nam- Taehyong Kim- Meng Xu

Peifen Zhang (Director and curator)

Sue Rhee (PI)

Collaborators:

SRI

- Peter Karp

- Ron Caspi

- Hartmut Foerster

- Suzanne Paley

- SRI Tech Team

MaizeGDB

- Mary Schaeffer

- Lisa Harper

- Jack Gardiner

- Taner Sen

ChlamyCyc- Patrick May

- Dirk Walther

- Lukas Mueller (SGN)

- Gramene and MedicCycDOE

PMN Alumni:

- A. S. Karthikeyan (curator)

- Christophe Tissier (curator)

- Hartmut Foerster (curator)

- Eva Huala (co-PI)

- Tam Tran (intern)

- Varun Dwaraka (intern)

- Damian Priamurskiy (intern)

- Ricardo Leitao (intern)

- Michael Ahn (intern)

- Purva Karia (intern)

- Anuradha Pujar (SGN curator)

Tech Team Alumni

- Anjo Chi

- Cynthia Lee

- Tom Meyer

- Vanessa Kirkup

- Chris Wilks

- Raymond Chetty

Page 30: Kate dreher biocurator / plant molecular biologist The Carnegie Institution for Science Stanford, CA Introduction to the Plant Metabolic Network: Data

Please use our data

Please use our tools

[email protected]

www.plantcyc.org

Sue Rhee (PI)

Peifen Zhang (Director)

We're here to help . . .

Booth 200

Poster 7029

Thank you!!