Databases for Systems Biology Herbert M Sauro Keck Graduate Institute Claremont, CA, 91711


Databases for Systems Biology

Herbert M Sauro

Keck Graduate Institute, Claremont, CA, 91711

Systems Biology

• Computational Systems Biology Group (Peter Spirtes) in Pittsburgh, Pennsylvania
• Biochemical Networks Modeling Group (Pedro Mendes) at the Virginia Bioinformatics Institute
• Computational Systems Biology Group (Reinhard Laubenbacher) at the Virginia Bioinformatics Institute
• Evolution of Molecular Networks group (Andreas Wagner) at the University of New Mexico
• Systems biology group (Trey Ideker) at the Whitehead Institute for Biomedical Research, Cambridge (USA)
• Computational Cell Biology (Dennis Bray) at the University of Cambridge (UK)
• STRC Biocomputation Group (Hamid Bolouri) at the University of Hertfordshire
• Computational Molecular Biology (Ron Shamir) at the University of Tel Aviv
• Complex Systems Division (Carsten Peterson) at the University of Lund
• Design Principles of Protein Networks (Uri Alon) at the Weizmann Institute
• Design Principles of Protein Networks (Naama Barkai) at the Weizmann Institute
• Probabilistic Graphical Models (Daphne Koller) at the University of Stanford
• Molecular Biology and Probabilistic Models (Nir Friedman) at the Hebrew University of Jerusalem
• Systems Optimization Group (Eckart Zitzler) at the ETH Zürich
• Protein Interaction Group (Benno Schwikowski) at the Systems Biology Institute, Seattle
• Systems Biology Center at TU Delft
• Integrative Systems Biology at TU Denmark
• U Ghent
• Institute for Advanced Study, Center for Systems Biology
• Ron Weiss group, Princeton University
• BII Systems Biology Group (Singapore)
• UC San Francisco BioSystems Group
• Kitano Systems Biology Group
• Davidson Lab at Caltech
• Bioinformatics & Systems Biology Group at the Burnham Institute (La Jolla)
• Virtual Cell Project, U Connecticut
• UC Santa Barbara IGERT Program on Systems Biology
• UC San Diego Bioinformatics & Systems Biology Groups
• UC San Diego Systems Biodynamics Group
• Integrated Systems Biology Group at Rensselaer Polytechnic Institute

Groups World-Wide

Systems Biology

• BioSPI Project at Weizmann
• BioSPICE
• BioMaps Institute at Rutgers
• Institute for Systems Biology, Seattle
• Bauer Center for Genomics Research (CGR) at Harvard University
• Systems Biology Department at Harvard Medical School
• Computational and Systems Biology Initiative at MIT
• Bio-X at Stanford University
• Center for Studies in Physics and Biology at The Rockefeller University
• GENSCEND Initiative of the Wellcome Trust
• "Genomes to Life program" (a funding initiative of the DOE)
• "Cell Systems Initiative" (an initiative of the University of Washington)
• "Systems of Life - System Biology" (a funding initiative of the German Ministry of Education and Research, BMBF)
• SFB 618 (funded by the German Research Council DFG)
• STAGSIM - Systems Biology (an Expression of Interest (EoI) submitted to the EU Framework Program VI)
• Systems Biology in Sweden
• Institute for Computational Biomedicine at the Weill Medical College of Cornell University
• Pathways/Systems Biology Working Group at I3C

Institutes and Larger Initiatives

Though coined 40 years ago, a lot of people still ask, "What's that?" when the term systems biology comes up. "It is used in so many different contexts, nobody is really clear what you mean by it," says John Yates III, a professor at the Scripps Research Institute in La Jolla, Calif. He's not the only one stumped by the term's meaning. David Placek, president of Sausalito, Calif.-based Lexicon Branding, a company that cooks up names for pharmaceutical products such as Velcade and Meridia, says he's not so hot on the moniker. "Systems biology is just so general that it could apply to many things. When you're naming a category, the underlying principle is that if you make a statement like, 'I'm doing systems biology,' do people know what you're talking about?" …

Systems Biology Has its Backers and Attackers

Revolution or buzzword du jour, pundits ponder a pervasive term | By Mignon Fogarty

The Scientist, Volume 17 | Issue 19 | 27 | Oct. 6, 2003

Systems Biology?

High-throughput Data?

PathDB

Databases?

Understanding the principles of how physiological/phenotypic characteristics emerge from the properties of the components.

Predicting how these characteristics will change in response to alterations in the environment or system components.

What is Systems Biology?

What are we dealing with?

Mirit Aladjem et al., Sci. STKE, March 2004

Successful Models

• Trypanosoma brucei: Barbara Bakker, Westerhoff and Cornish-Bowden
• Yeast Glycolysis: Bas Teusink
• EGF Signaling Pathway: Frances Brightman et al.
• Red Blood Cell: Mulquiney, Joshi, Heinrich, …
• Calvin Cycle: Poolman and Fell
• Yeast Cell Cycle: John Tyson et al.
• E. coli Chemotaxis: Many Contributors

Level of Complexity

Molecule          # Molecules per cell   # of Types
Protein           2,360,000              1000-2000
RNA               270,000                5
Small Molecules   millions               500
Ions              millions               20-30

http://biosci191.bsd.uchicago.edu/L02/ecoli.htm
http://opbs.okstate.edu/5753/Composition%20table.html

E. coli composition

Man-made Complex Devices

Intel Pentium 4

42 million transistors

Man-made Complex Devices

• The AMD Opteron
• 105.9 million transistors
• Number of gates > 54 Million

Man-made Complex Devices

• The Intel Itanium 2
• 410 million transistors
• Number of gates > 100 Million

By 2007 both Intel and AMD are predicting dies with 1 billion transistors

Many of the new graphics chips have over 60 million transistors

AMD are working towards 45-nanometer transistors by 2007. For comparison, the sizes of proteins vary from 2 nm to 20 nm.

Man-made Complex Devices

Probably by 2010, man-made devices will have comparable complexity to bacterial cells, if not greater.

Cellular Models

Building computational models of cells seems more and more like a viable project.

Such a project would bring a much clearer understanding of how cellular systems are controlled and ultimately it should bring unprecedented predictive power.

Are Biologists Ready?

Xo and X1 fixed, all reactions reversible, assume stable steady state.

[Diagram: pathway connecting the fixed species Xo and X1 through the intermediates S1 to S6, with one step labeled v]

Are Biologists Ready?

What happens to the steady state?

[Diagram: the pathway from Xo to X1, with step v marked 50 %]

Are Biologists Ready?

Students reply:

1. Nothing happens.

2. Nothing happens unless it is the rate-limiting step.

3. The rate v goes down, but that’s all.

4. S3 goes up.

5. S4 goes down.

6. Species downstream of v go up.

7. Steady State flow changes but species levels don’t.

8. Xo and X1 change
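One way to check these answers is to compute the steady state directly. The sketch below is not from the slides: it assumes a single chain of seven reversible mass-action steps from Xo to X1, made-up rate constants, and that the perturbed step v is the one between S3 and S4, with its enzyme activity cut to 50 %. The function name `steady_state` and all parameter values are hypothetical.

```python
def steady_state(activities, xo=10.0, x1=1.0, kf=2.0, kr=1.0):
    """Steady state of Xo <-> S1 <-> ... <-> S6 <-> X1.

    Step i has rate v_i = e_i * (kf * substrate - kr * product).
    All constants are illustrative assumptions, not values from the talk.
    Returns (flux, [S1, ..., S6]).
    """
    def walk(flux):
        # At steady state every step carries the same flux, so each
        # concentration follows from the one upstream of it.
        conc = [xo]
        for e in activities:
            conc.append((kf * conc[-1] - flux / e) / kr)
        return conc

    # The final concentration is affine in the flux, so two trial
    # fluxes are enough to find the flux that ends exactly at X1.
    end0 = walk(0.0)[-1]
    end1 = walk(1.0)[-1]
    flux = (x1 - end0) / (end1 - end0)
    return flux, walk(flux)[1:-1]


reference = [1.0] * 7              # seven steps, equal enzyme activity
perturbed = list(reference)
perturbed[3] *= 0.5                # assumed position of v: S3 <-> S4

j_ref, s_ref = steady_state(reference)
j_new, s_new = steady_state(perturbed)

print(f"flux: {j_ref:.3f} -> {j_new:.3f}")
for i, (before, after) in enumerate(zip(s_ref, s_new), start=1):
    print(f"S{i}: {before:.3f} -> {after:.3f}")
```

Under these assumptions the flux falls by less than 50 %, the species upstream of v rise, the species downstream of v fall, and Xo and X1 stay fixed by construction.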

Are Biologists Ready?

If we can't understand this system, how can we hope to understand:

Functional Motif Identification

http://bms-mudshark.brookes.ac.uk/frances/fabweb5.htm

29 species

Computer simulation of EGF signal transduction in PC12 cells.

Frances Brightman, Simon Thomas and David Fell

Functional Motif Identification

27 components

Functional Motif Identification

[Figure: diagram annotated with the functional blocks Amplifier, Resonance Detector and Demodulator]

Functional Motif Identification

[Figure: diagram annotated with the functional blocks Power Amplifier, Pre-Amplifier, Amplifier, Feedback, Filter, Carrier Filter, Audio Filter, Rectifier and Demodulator]

How Intel Engineers Cope

Complex man-made devices are modeled and designed on multiple levels; each level may use different modeling techniques:

Transistor Characteristics

Basic Logic Gates

Small Gate Modules

Hierarchy of functional modules

Top Level Module

Fundamental Protein Chemistry

Basic Enzyme Rate Characteristics

Small Enzyme Motifs

Hierarchy of functional modules

Top Level Module

Functional Motif Identification

Negative Feedback in the MAPK Pathway

[Diagram: feedback loop with input yi, output yo, amplifier gain A and feedback strength k]

At high amplifier gain (A k > 1):

Linearization of the amplifier response.

[Figure: amplifier response, two panels: Without Feedback, With Feedback]
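For a linear amplifier with gain A and feedback strength k, solving yo = A (yi - k yo) gives yo = A yi / (1 + A k), which approaches yi / k when A k is large. The sketch below illustrates the linearizing effect for a nonlinear amplifier. It is an illustration only: the tanh-shaped amplifier, the gain of 1000, and k = 0.01 are assumptions, not a model of the MAPK pathway.

```python
import math


def amplifier(x, gain=1000.0):
    """Hypothetical saturating amplifier: output is limited to +/- gain."""
    return gain * math.tanh(x)


def closed_loop(yi, k, gain=1000.0):
    """Solve yo = amplifier(yi - k * yo) for yo by bisection."""
    lo, hi = -gain, gain          # the output can never leave this range
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # g(yo) = yo - amplifier(yi - k * yo) is increasing in yo
        if mid - amplifier(yi - k * mid, gain) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


k = 0.01                          # assumed feedback strength, so A * k = 10
for yi in (1.25, 2.5, 5.0):
    without = amplifier(yi)
    with_feedback = closed_loop(yi, k)
    print(f"yi={yi:5.2f}  without={without:8.1f}  with={with_feedback:7.1f}")
```

Without feedback the output saturates, so doubling the input barely changes it. With feedback, doubling the input roughly doubles the output, at the cost of a lower overall gain.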

Functional Motif Identification

E. coli Chemotaxis

Signaling network reset

Tumble

Run

Motor

Software

Tools and Resources:

Software Infrastructure

Interchange Formats

Analysis Algorithms

Model Editors

Visualization

Model Databases

Theoretical Foundation

Databases for Systems Biology

• Kinetic Data

• Network Information

Systems Biology Models

Simple first-order reaction kinetics

Power Law

Systems Biology Models

Simple irreversible Michaelis-Menten

Systems Biology Models

Reversible Michaelis-Menten

Systems Biology Models

Irreversible Allosteric Mechanism
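The equations for these rate laws did not survive in the transcript. As a rough guide, the standard textbook forms can be written as below. The function and parameter names are illustrative, and the Hill equation is used only as a common stand-in for an irreversible allosteric mechanism; the slide may have shown a different form.

```python
def first_order(s, k):
    """Simple first-order kinetics: v = k * S."""
    return k * s


def power_law(concs, k, orders):
    """Power law: v = k * product of S_i ** g_i."""
    v = k
    for s, g in zip(concs, orders):
        v *= s ** g
    return v


def michaelis_menten(s, vmax, km):
    """Irreversible Michaelis-Menten: v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def reversible_michaelis_menten(s, p, vf, vr, ks, kp):
    """Reversible Michaelis-Menten for one substrate S and one product P."""
    return (vf * s / ks - vr * p / kp) / (1.0 + s / ks + p / kp)


def hill(s, vmax, k_half, n):
    """Hill equation: v = Vmax * S**n / (K**n + S**n)."""
    return vmax * s ** n / (k_half ** n + s ** n)


# At S = Km the irreversible Michaelis-Menten rate is half of Vmax.
print(michaelis_menten(2.0, vmax=10.0, km=2.0))
```

Each law needs more parameters than the one before it, which is why the kinetic databases discussed next matter: every parameter has to come from a measurement.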

Databases for Systems Biology

The oldest known metabolic pathway is Yeast Glycolysis

http://www.utc.edu/Faculty/Becky-Bell/210-outline05.html

http://www.utoronto.ca/greenblattlab/yeast.htm

Databases for Systems Biology

Hexokinase 2.7.1.1

Glucose + ATP = G6P + ADP

Km: None available

Specific Activity: 512 M/min/mg

Databases for Systems Biology

Phosphofructokinase 2.7.1.11

ATP + F6P = ADP + FBP

Km: None available

Specific Activity: 180 M/min/mg, 148 M/min/mg, 114 M/min/mg

Databases for Systems Biology

Pyruvate Kinase 2.7.1.40

PEP + ADP = Pyruvate + ATP

Km (ADP): 0.16 mM (+ FBP)

Specific Activity: None available

Databases for Systems Biology

1. Kinetic equations

2. Values for kinetic constants plus standard errors

3. Conditions under which enzyme was characterized
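The three requirements above can be read as the fields of a database record. The sketch below is one hypothetical layout, not the schema of any existing database; the class and field names are invented for illustration. It is filled in with the pyruvate kinase entry from the earlier slide, which supplies a Km but no rate equation and no standard error.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KineticConstant:
    name: str                            # e.g. "Km (ADP)"
    value: float
    unit: str
    std_error: Optional[float] = None    # None when no error is reported


@dataclass
class EnzymeRecord:
    enzyme: str
    ec_number: str
    reaction: str
    rate_law: Optional[str] = None                   # 1. kinetic equation
    constants: list = field(default_factory=list)    # 2. values plus errors
    conditions: dict = field(default_factory=dict)   # 3. assay conditions

    def missing(self):
        """List which of the three requirements this record fails to meet."""
        gaps = []
        if self.rate_law is None:
            gaps.append("kinetic equation")
        if not self.constants or any(c.std_error is None for c in self.constants):
            gaps.append("constants with standard errors")
        if not self.conditions:
            gaps.append("assay conditions")
        return gaps


pyruvate_kinase = EnzymeRecord(
    enzyme="Pyruvate Kinase",
    ec_number="2.7.1.40",
    reaction="PEP + ADP = Pyruvate + ATP",
    constants=[KineticConstant("Km (ADP)", 0.16, "mM")],
    conditions={"effector": "FBP"},
)
print(pyruvate_kinase.missing())
```

A check like `missing()` makes the gaps explicit, so a modeler can see at a glance which entries are usable in a kinetic model.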

Networks

Network information is mainly inaccessible in convenient formats; much work has to be done by the user to extract the desired information.

The need for a model or network exchange format.

Networks

There is also the need for a network visualization standard.

Mirit I. Aladjem and Kurt Kohn

DCL: Gene Network Sciences

Model Databases

[Architecture diagram with the following labeled components:]

• SBW
• Desktop Client
• Web Services
• Translator: Matlab, XPP, FORTRAN, Berkeley Madonna, SBML, CellML, C, Java, Mathematica, etc.
• Database
• SBML/SQL translator
• Client
• Other Systems, e.g. BioSPICE
• Peer Reviewed
• Version Controller
• Scratchpad

Model Databases

• BIOSSIM (1968)

• ESSYN (1976)

• SCAMP (1983)

• SCOP (1986)

• METAMOD (1986)

• SIMFIT (1990)

• METAMODEL (1991)

• METASIM (1992)

• KINSIM (1993)

• GEPASI (1994)

• METALGEN (1994 ?)

• MIST (1995)

• METABOLIKA (1997 ?)

• METAFLUX (1997)

• SIMFLUX (1997)

• MNA (1998)

• CELLMOD (1998)

• FLUXMAP (1999)

• METATOOL (1999)

• VCELL (1999)

Modelling Tools

[Chart: modelling tools by period; x-axis "Period" with bins 65-69, 70-74, 75-79, 80-84, 85-89, 90-94, 95-99; y-axis ticks 1 to 9]

Klaus Mauch, University of Stuttgart

SBML – Systems Biology Markup Language

The Systems Biology Markup Language (SBML) is a computer-readable format for representing models of biochemical reaction networks. SBML is applicable to metabolic networks, cell-signaling pathways, genomic regulatory networks, and many other areas in systems biology.

Originally developed by Hamid Bolouri, Andrew Finney, Michael Hucka and Herbert Sauro

[Diagram: Tool 1 and Tool 2]

SBML – Systems Biology Markup Language

XML based Standard

• Simple Compartments (well stirred reactor)

• Internal/External Species

• Reaction Schemes

• Global Parameters

• Arbitrary Rate Laws

• DAEs (ODE + Algebraic functions, Constraints)

• Physical Units/Model Notes

• Annotation – extension capability

SBML – Systems Biology Markup Language

What is XML?

<?xml version="1.0" ?>
<note>
  <to> Hobbit </to>
  <from> Orc </from>
  <heading> Note to Frodo </heading>
  <body> I want to eat you </body>
</note>

SBML – Systems Biology Markup Language

XML has a hierarchical structure

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

Each node can also have optional attributes, e.g. <child name = "john">
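To make the hierarchy and the attributes concrete, the note example can be read with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch; the helper name `outline` is made up for the illustration.

```python
import xml.etree.ElementTree as ET

document = """
<note>
  <to> Hobbit </to>
  <from> Orc </from>
  <heading> Note to Frodo </heading>
  <body> I want to eat you </body>
</note>
"""


def outline(node, depth=0):
    """Return one indented line per element, showing the hierarchy."""
    text = (node.text or "").strip()
    lines = ["  " * depth + node.tag + (": " + text if text else "")]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(document)
print("\n".join(outline(root)))

# Attributes are exposed as a dictionary on each element.
child = ET.fromstring('<child name = "john"/>')
print(child.attrib["name"])
```

The parser turns the text into a tree of elements, which is what makes an XML-based format easy for different tools to read and write.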

SBML – Example

<?xml version="1.0" encoding="UTF-8"?>
<!-- Created by XMLPrettyPrinter on 11/14/2002 -->
<sbml level = "1" version = "1" xmlns = "http://www.sbml.org/sbml/level1">
  <!-- Model Starts Here -->
  <model name = "untitled">
    <listOfCompartments>
      <compartment name = "uVol" volume = "1"/>
    </listOfCompartments>
    <listOfSpecies>
      <specie boundaryCondition = "false" compartment = "uVol" initialAmount = "0" name = "Node0"/>
      <specie boundaryCondition = "false" compartment = "uVol" initialAmount = "0" name = "Node1"/>
      <specie boundaryCondition = "false" compartment = "uVol" initialAmount = "0" name = "Node2"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction name = "J0" reversible = "false">
        <listOfReactants>
          <specieReference specie = "Node0" stoichiometry = "1"/>
        </listOfReactants>
        <listOfProducts>
          <specieReference specie = "Node1" stoichiometry = "1"/>
        </listOfProducts>
        <kineticLaw formula = "v">
        </kineticLaw>
      </reaction>
      <reaction name = "J1" reversible = "false">
        <listOfReactants>
          <specieReference specie = "Node1" stoichiometry = "1"/>
        </listOfReactants>
        <listOfProducts>
          <specieReference specie = "Node2" stoichiometry = "1"/>
        </listOfProducts>
        <kineticLaw formula = "v">
        </kineticLaw>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
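Because SBML is plain XML, a tool can recover the network from a model with a generic XML parser. The sketch below reads a trimmed copy of the Level 1 example above with Python's standard library and builds the stoichiometry matrix. It is an illustration only, not how any particular SBML tool works; real applications would normally use a dedicated SBML library.

```python
import xml.etree.ElementTree as ET

SBML = """
<sbml level="1" version="1" xmlns="http://www.sbml.org/sbml/level1">
  <model name="untitled">
    <listOfSpecies>
      <specie name="Node0" compartment="uVol" initialAmount="0"/>
      <specie name="Node1" compartment="uVol" initialAmount="0"/>
      <specie name="Node2" compartment="uVol" initialAmount="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction name="J0" reversible="false">
        <listOfReactants><specieReference specie="Node0" stoichiometry="1"/></listOfReactants>
        <listOfProducts><specieReference specie="Node1" stoichiometry="1"/></listOfProducts>
      </reaction>
      <reaction name="J1" reversible="false">
        <listOfReactants><specieReference specie="Node1" stoichiometry="1"/></listOfReactants>
        <listOfProducts><specieReference specie="Node2" stoichiometry="1"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

# Every element lives in the SBML Level 1 namespace declared on <sbml>.
NS = {"s": "http://www.sbml.org/sbml/level1"}
root = ET.fromstring(SBML)

species = [sp.get("name") for sp in root.findall(".//s:specie", NS)]
reactions = [rx.get("name") for rx in root.findall(".//s:reaction", NS)]

# Stoichiometry matrix: one row per species, one column per reaction.
matrix = [[0] * len(reactions) for _ in species]
for col, rx in enumerate(root.findall(".//s:reaction", NS)):
    for ref in rx.findall("s:listOfReactants/s:specieReference", NS):
        matrix[species.index(ref.get("specie"))][col] -= int(ref.get("stoichiometry"))
    for ref in rx.findall("s:listOfProducts/s:specieReference", NS):
        matrix[species.index(ref.get("specie"))][col] += int(ref.get("stoichiometry"))

print(reactions)
for name, row in zip(species, matrix):
    print(name, row)
```

The matrix describes the chain Node0 -> Node1 -> Node2, the same information a simulator needs in order to assemble the differential equations.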

Other Related Efforts - CellML

CellML is a more comprehensive attempt at developing an exchange standard, also defined in terms of XML.

However, it is much more complex and the designers of CellML have not provided software support in the form of tools and software libraries.

Data Formats

One other area which is even more difficult to resolve is experimental data formats: microarray, proteomic, metabolomic, basically all the omics.

Two projects are attempting to put some order in the data format area: bioSPICE and particularly the DOE GTL project.

The Future

There is obviously a long, long way to go.

Kinetic data must be more carefully curated.

Standards for exchanging data and models, including visualization notations, need to be developed further.

What are we dealing with?

Reaction systems working on multiple time scales:

1. Discrete deterministic events

2. Fast reactions

3. Continuous variables (modeled by ODEs)

4. Continuous variables with additive and multiplicative noise

5. Stochastic discrete systems (Gillespie type)
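As an illustration of the last category, here is a minimal sketch of Gillespie's direct method for a reversible isomerization A <-> B. The reaction, the rate constants, and the molecule counts are assumptions chosen for the example, not taken from the talk.

```python
import random


def gillespie(a0, b0, k1, k2, t_end, rng):
    """Simulate A -> B (propensity k1 * A) and B -> A (propensity k2 * B)."""
    t, a, b = 0.0, a0, b0
    trajectory = [(t, a, b)]
    while True:
        rate_fwd = k1 * a
        rate_rev = k2 * b
        total = rate_fwd + rate_rev
        if total == 0.0:
            break                         # nothing left that can react
        # The waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick which reaction fires, in proportion to its propensity.
        if rng.random() * total < rate_fwd:
            a, b = a - 1, b + 1
        else:
            a, b = a + 1, b - 1
        trajectory.append((t, a, b))
    return trajectory


rng = random.Random(1)                    # fixed seed for repeatability
path = gillespie(a0=100, b0=0, k1=1.0, k2=0.5, t_end=10.0, rng=rng)
t_last, a_last, b_last = path[-1]

# The matching ODE model settles at A = N * k2 / (k1 + k2).
print("stochastic A at end:", a_last)
print("ODE steady state A :", 100 * 0.5 / (1.0 + 0.5))
```

Each run gives a different trajectory that fluctuates around the ODE steady state. With small molecule counts these fluctuations are large, which is why the continuous description in item 3 is not always adequate.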

Functional Motif Identification