1
Developing Standards: Case Studies
Herbert M Sauro
www.sys-bio.org, www.sbml.org, www.sbolstandards.org, blog.analogmachine.org
Dept. of Bioengineering, University of Washington, Seattle, WA
2
Importance of Standards
Imagine a world where:
Each company made its own incompatible nuts, bolts and screws.
Every town had its own way to measure time.
Every internet provider used different protocols for the ‘TCP/IP’ stack, email, web etc.
and so on
Standards are vital for the normal functioning of society
3
At least two ways to start a standard:
1. Top-down: institutionalized stick and carrot
2. Grass Roots
4
Two Examples
SBML: Systems Biology Markup Language
SBOL: Synthetic Biology Open Language
5
Simulation of Computational Models
Simulation
6
Why? Study Perturbations
Change the activity of a protein, e.g. p53, by adding an inhibitor
What effect does this have on cell death and/or proliferation?
Apoptosis
http://www.sapphirebioscience.com
There may be multiple paths or multiple effects
7
How it started: SCAMP and Gepasi, 1980s/90s
SCAMP
X
8
Exchange of Computational Models
In 1999/2000 a project was started at Caltech with initial funding from Japan to devise an interchange language:
SBML: Systems Biology Markup Language
9
SBML: Systems Biology Markup Language
Used to represent homogeneous multi-compartmental biochemical systems
10
SBML in a Nutshell
“Systems Biology Markup Language”
• A machine-readable format for representing computational models in systems biology
• Domain: systems of biochemical reactions
• Specified using XML
• Components in SBML reflect the natural conceptual constructs of the domain
• Now over 200 tools use SBML
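As a rough illustration of "machine-readable" and "specified using XML", the sketch below reads a small hand-written, SBML-like fragment with Python's standard library. The model content (`toy_model`, `cell`, `J1`) is invented for the example and the fragment has not been validated against the SBML specification; real tools would normally use a dedicated SBML library instead.

```python
import xml.etree.ElementTree as ET

# Hand-written, SBML-like fragment for illustration only (assumption:
# Level 2 Version 4 namespace; ids and values are made up).
SBML_TEXT = """
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="S" compartment="cell" initialConcentration="10"/>
      <species id="P" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="J1">
        <listOfReactants><speciesReference species="S"/></listOfReactants>
        <listOfProducts><speciesReference species="P"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
""".strip()

NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}


def summarize(sbml_text):
    """Return the species ids and a reaction -> (reactants, products) map."""
    root = ET.fromstring(sbml_text)
    model = root.find("sbml:model", NS)
    species = [s.get("id") for s in model.findall("sbml:listOfSpecies/sbml:species", NS)]
    reactions = {}
    for reaction in model.findall("sbml:listOfReactions/sbml:reaction", NS):
        reactants = [r.get("species") for r in
                     reaction.findall("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [p.get("species") for p in
                    reaction.findall("sbml:listOfProducts/sbml:speciesReference", NS)]
        reactions[reaction.get("id")] = (reactants, products)
    return species, reactions


species, reactions = summarize(SBML_TEXT)
print(species)
print(reactions)
```

The point is only that the domain concepts (compartments, species, reactions) appear directly as elements that any program can read.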
11
SBML in a Nutshell
“Systems Biology Markup Language”
• Simple Compartments (well stirred reactor)
• Internal/External Species
• Reaction Schemes
• Global Parameters
• Arbitrary Rate Laws
• DAEs (ODE + Algebraic functions, Constraints)
• Physical Units/Model Notes
• Annotation – extension capability
• Events
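To make the feature list more concrete, here is a minimal sketch of how a simulator might turn species, global parameters, a reaction scheme and an arbitrary rate law into a time course. Everything in it is an assumption made for illustration: the dictionary layout, the single well-stirred compartment, the parameter values and the forward Euler integrator are not part of SBML or of any particular tool.

```python
def simulate(species, parameters, reactions, t_end, dt=0.001):
    """Integrate the reaction network with the forward Euler method.

    species:    {name: initial concentration}
    parameters: {name: value}, shared by all rate laws
    reactions:  list of (stoichiometry dict, rate law function) pairs
    """
    state = dict(species)  # copy so the caller's initial values are untouched
    for _ in range(int(round(t_end / dt))):
        env = {**parameters, **state}
        # Evaluate every rate law at the current state before updating anything.
        fluxes = [(stoich, rate_law(env)) for stoich, rate_law in reactions]
        for stoich, flux in fluxes:
            for name, coeff in stoich.items():
                state[name] += coeff * flux * dt
    return state


# Hypothetical one-reaction model: S -> P, catalysed by enzyme E.
species = {"S": 10.0, "P": 0.0}
parameters = {"E": 1.0, "kp": 2.0, "Km": 0.5}
reactions = [
    ({"S": -1, "P": 1},
     lambda m: m["E"] * m["kp"] * m["S"] / (m["Km"] + m["S"])),
]

final = simulate(species, parameters, reactions, t_end=1.0)
print(final)
```

Because the rate law is just a function of the current state, any expression can be plugged in, which is the sense of "arbitrary rate laws" in the list above.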
12
SBML – Systems Biology Markup Language
13
Model Exchange Standards: SBML, CellML
SBML is primarily a way to describe the biology of cellular networks from which the mathematical models can be automatically derived.
CellML is a math-based description from which the underlying biology can be inferred.
14
There are many modeling software tools that use SBML
www.sbml.org
15
SBML Ecosystem
SBML
Databases
Unambiguous Model Exchange
Semantic Annotations
Simulator Comparison and Compliance
Journals
Diagrams
SEDML: Simulation Experiment Description Language
SBGN: Systems Biology Graphical Notation
16
Model repositories
BioModels.net
As of Sep 2011:
366 curated models
398 uncurated models.
http://www.ebi.ac.uk/biomodels/
Nicolas Le Novere
17
MIRIAM: Minimum Information Requested in the Annotation of biochemical Models
MIRIAM is not a file format but a minimum specification of how a model should be made available to the community:
Reference correspondence - the model is encoded in a recognized, public, standardized, machine-readable format.
Attribution annotation - the model has to provide the citation of the reference description, list its creators, and be attached to some terms of distribution.
External resource annotation - each component of a model must be annotated to allow its unambiguous identification.
18
Semantic Annotations
1. SBO: Systems Biology Ontology (Quantitative terms)
2. MIASE: The Minimum Information About a Simulation Experiment
3. TEDDY: The Terminology for the Description of Dynamics
4. KiSAO: Simulation Algorithm Ontology
5. Missing: An audit trail of a modeling process.
19
SBO: Systems Biology Ontology
1. [Term]
   id: SBO:0000002
   name: quantitative parameter
   def: "A number representing a quantity that defines certain characteristics of systems or functions. A parameter may be part of a calculation, but its value is not determined by the form of the equation itself, and may be arbitrarily assigned." []
   relationship: part of SBO:0000000 ! Systems Biology Ontology
2. [Term]
   id: SBO:0000012
   name: mass action kinetics
   def: "The Law of Mass Action, first expressed by Waage and Guldberg in 1864 (Waage, P., Guldberg, C. M. Forhandlinger: Videnskabs-Selskabet i Christiana 1864, 35) states that…..." []
   is a: SBO:0000001 ! rate law
Terms can be queried programmatically via a web service
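The term stanzas above follow a simple tag/value layout, so they are easy to process locally. The sketch below parses a shortened copy of the two terms into dictionaries. It does not call the SBO web service, whose interface is not described here, and it assumes the usual OBO tag spelling `is_a` (the slide text shows it as "is a").

```python
# Shortened copy of the two terms shown above (definitions omitted).
# Assumption: tags use the OBO spelling "is_a".
OBO_TEXT = """\
[Term]
id: SBO:0000002
name: quantitative parameter

[Term]
id: SBO:0000012
name: mass action kinetics
is_a: SBO:0000001 ! rate law
"""


def parse_obo(text):
    """Parse [Term] stanzas into {term id: {tag: [values]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {}
        elif not line:
            current = None  # a blank line closes the stanza
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ")[0]  # drop the trailing "! comment"
            current.setdefault(tag, []).append(value)
            if tag == "id":
                terms[value] = current
    return terms


terms = parse_obo(OBO_TEXT)
print(terms["SBO:0000012"])
```

Values are kept as lists because a tag such as `is_a` may appear more than once in a stanza.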
20
Systems Biology Ontology in SBML
<reaction sboTerm="SBO:0000062">
  <listOfReactants>
    <speciesReference species="S" sboTerm="SBO:0000015" />
  </listOfReactants>
  <listOfProducts>
    <speciesReference species="P" sboTerm="SBO:0000011" />
  </listOfProducts>
  <listOfModifiers>
    <speciesReference species="E" sboTerm="SBO:0000014" />
  </listOfModifiers>
  <kineticLaw sboTerm="SBO:0000031">
    <listOfParameters>
      <parameter id="Km" sboTerm="SBO:0000027" />
      <parameter id="kp" sboTerm="SBO:0000025" />
    </listOfParameters>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply>
        <divide/>
        <apply>
          <times />
          <ci>E</ci>
          <ci>kp</ci>
          <ci>S</ci>
        </apply>
        <apply>
          <plus />
          <ci>Km</ci>
          <ci>S</ci>
        </apply>
      </apply>
    </math>
  </kineticLaw>
</reaction>
SBO:0000062: continuous framework
SBO:0000015: substrate
SBO:0000011: product
SBO:0000014: enzyme
SBO:0000027: Michaelis constant
SBO:0000025: catalytic rate constant
SBO:0000031: Briggs-Haldane equation
European Bioinformatics Institute
Application: Simulator Compliance
SBML Compliance
Tools compared: BioUML, COPASI, Jarnac, JSim, MathSBML, Oscill8, roadRunner, SBML ODE Solver, SBToolbox2, VCell
[Figure: bar chart of the number of simulation results returned by each tool for 150 models]
21
The Results
22
[Figure: histogram of number of models (y-axis, 0 to 80) against % agreement of simulation results (x-axis bins: 0% to 20%, 20% to 40%, 40% to 60%, 60% to 80%, 80% to 99%, 100%)]
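The slides do not say how "% agreement" was computed. One plausible reading, sketched below purely as an assumption, is to compare the time courses returned by every pair of tools for a model, count the pairs that match within a tolerance, and bin the resulting percentage into the ranges used on the histogram. Tool names, tolerances and data are invented.

```python
from itertools import combinations


def agree(reference, candidate, rel_tol=1e-3, abs_tol=1e-6):
    """True if two time courses match point by point within tolerance."""
    if len(reference) != len(candidate):
        return False
    return all(abs(r - c) <= abs_tol + rel_tol * abs(r)
               for r, c in zip(reference, candidate))


def model_agreement(by_tool):
    """Percentage of tool pairs whose results agree for one model."""
    pairs = list(combinations(sorted(by_tool), 2))
    if not pairs:
        return 100.0
    hits = sum(agree(by_tool[a], by_tool[b]) for a, b in pairs)
    return 100.0 * hits / len(pairs)


def bin_label(pct):
    """Map a percentage onto the histogram bins used on the slide."""
    if pct >= 100.0:
        return "100%"
    if pct >= 80.0:
        return "80% to 99%"
    if pct >= 60.0:
        return "60% to 80%"
    if pct >= 40.0:
        return "40% to 60%"
    if pct >= 20.0:
        return "20% to 40%"
    return "0% to 20%"


# Made-up results for two models and three hypothetical tools.
results = {
    "model_1": {"toolA": [1.0, 0.5, 0.25],
                "toolB": [1.0, 0.5, 0.25],
                "toolC": [1.0, 0.5, 0.2500001]},
    "model_2": {"toolA": [1.0, 0.5],
                "toolB": [1.0, 0.5],
                "toolC": [1.0, 0.9]},
}

summary = {model: bin_label(model_agreement(by_tool))
           for model, by_tool in results.items()}
print(summary)
```

Other definitions (for example, agreement with a single reference solution) would give different histograms, so the criterion should be stated alongside any such result.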
23
Other Proposed Standards
Standardizing the diagrammatic notation
http://www.sbgn.org/Main_Page
24
What we all learned
25
Fact:
Developing a standard has both technical and sociological challenges.
The sociological challenges may be greater, :(
26
Rule #1:
There must be a problem (i.e., an actual need) that a particular community wants to solve.
• Clear scope
• Covers what is needed
• Doesn’t force you to deal with things that are not needed
27
Rule #2:
Building a community from day one is of the utmost importance.
• Build Trust
• Build Consensus
• Build Enthusiasm
• Build Ownership
28
Rule #3:
For a standard to succeed, the central players must provide tools and documentation to help the community use the standard.
• Easy to implement
• Low ‘buy in’ cost
29
Rule #4:
The process is long and drawn out, far beyond the normal patience of review panels and funding agencies.
30
Summary
Initial cost for the SBML development:
Initial version was funded by JST (roughly 250K direct per year for three years). Could probably get by with 150K direct. This funds a core team which is involved in:
1. Documentation
2. Organizing two workshops per year
3. Developing the initial source libraries
4. Developing a governance model
5. Following discussions on mailing lists/workshops to address the needs of the community
6. Maintaining civility during discussions!
31
Centralized development of supporting software libraries:
1) Prevented the standard from diverging
2) As extensions or modifications were agreed to by the community it was relatively easy for platform developers to incorporate the changes into their software.
3) Software developed in C/C++ to make the library cross-language (Java came later).
32
Current work of my group: Model Reproducibility
SBML
SEDML
Simulation Tool
Biology
Data
Data
SEDML: What you did with the model
33
Synthetic Biology
34
Synthetic biology
“The design and construction of new biological entities such as enzymes, genetic circuits, cells, and organs or the redesign of existing biological systems.”
Drew Endy (Stanford)
35
The Immediate Need
Take any current publication on a synthetic circuit and try to reproduce it. Let me know how you get on.
36
The long term vision: Design, Build, Test
[Figure: cycle of Specification, Design, Build and Testing/Analysis, with a plot of GFP (RFU) against time]
37
Synthetic Biology Open Language (SBOL) – SBOL Semantic
[Figure: SBOL semantic and SBOL visual. DNA component B0015 carries the sequence annotations 1-80 (Terminator), 81-88 (BioBrick Scar) and 89-129 (Terminator), with sub-components B0010 and B0012. Synthetic Biologist A describes and sends DNA components to Synthetic Biologist B. Other labels: Engineer, Fabricate, New device.]
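The B0015 example can be written down as a small data structure. The sketch below is not the SBOL data model or any SBOL library API; the class and function names are hypothetical, and only the annotation ranges come from the slide. It checks that the three annotations tile the component without gaps or overlaps.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SequenceAnnotation:
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    feature: str


@dataclass
class DnaComponent:
    name: str
    annotations: list = field(default_factory=list)


def is_contiguous(component):
    """True if the annotations cover positions 1..N with no gap or overlap."""
    expected_start = 1
    for ann in sorted(component.annotations, key=lambda a: a.start):
        if ann.start != expected_start or ann.end < ann.start:
            return False
        expected_start = ann.end + 1
    return True


def covered_length(component):
    """Total number of positions covered by the annotations."""
    return sum(a.end - a.start + 1 for a in component.annotations)


# Ranges as shown on the slide for B0015.
b0015 = DnaComponent("B0015", [
    SequenceAnnotation(1, 80, "Terminator"),
    SequenceAnnotation(81, 88, "BioBrick Scar"),
    SequenceAnnotation(89, 129, "Terminator"),
])

print(is_contiguous(b0015), covered_length(b0015))
```

A receiving tool could run a check like this before accepting a design sent by another group.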
38
Some History
The synthetic biology standardization effort was started with a grant from Microsoft in 2008 (100K). The first meeting was held in Seattle.
The first draft proposal was called PoBoL but has since been renamed to SBOL – Synthetic Biology Open Language
Since 2008 we have (somehow) managed to organize two meetings a year; the next one is in Jan 2012 in Seattle.
39
Overall Aim of the Standardization Effort
To support the synthetic biology workflow:
1. Laboratory parts management
2. Simulation/Analysis
3. Design
4. Codon optimization
5. Assembly
6. Repositories - preferably distributed
40
Overall Aim of the Standardization Effort
Specifically:
• To allow researchers to electronically exchange designs with round-tripping.
• To send designs to bio-fabrication centers for assembly.
• To allow storage of designs in repositories and for publication purposes.
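"Round-tripping" here means that a design exported by one tool can be imported and re-exported by another without losing information. The sketch below illustrates the idea with JSON as a stand-in serialization (an assumption; it is not SBOL's format) and two hypothetical importers, one that preserves fields it does not understand and one that silently drops them.

```python
import json

# Fields that the hypothetical "lossy" tool understands.
KNOWN_FIELDS = {"name", "annotations"}


def export_design(design):
    """Serialize a design deterministically so exports can be compared."""
    return json.dumps(design, sort_keys=True)


def import_preserving(text):
    """Importer that keeps every field, including ones it does not use."""
    return json.loads(text)


def import_lossy(text):
    """Importer that drops any field it does not recognize."""
    data = json.loads(text)
    return {key: value for key, value in data.items() if key in KNOWN_FIELDS}


def round_trips(design, importer):
    """True if export -> import -> export reproduces the original text."""
    sent = export_design(design)
    returned = export_design(importer(sent))
    return sent == returned


design = {
    "name": "B0015",
    "annotations": [
        {"start": 1, "end": 80, "feature": "Terminator"},
        {"start": 81, "end": 88, "feature": "BioBrick Scar"},
        {"start": 89, "end": 129, "feature": "Terminator"},
    ],
    "lab_notes": "hypothetical extension field",
}

print(round_trips(design, import_preserving))
print(round_trips(design, import_lossy))
```

The second importer fails the check because the extension field is lost, which is the kind of silent data loss a round-tripping requirement is meant to rule out.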
41
Synthetic Biology
Synthetic Biology is Engineering, i.e., it is not biology*
Design Build Test
* Beware of sending synthetic biology grant proposals to a biology panel
42
Synthetic Biology
Synthetic Biology is Engineering, i.e., it is not biology*
Design Build Test
Debugging
Verification
* Beware of sending synthetic biology grant proposals to a biology panel
44
A Real Network (E. coli)
[Figure: Experimental Data panel, Relative Fluorescence (0 to 1.2) against IPTG (mM) on a log scale from 0.001 to 1000, annotated "Increased Repression". Simulation panel, curves p1 and p3 on a log scale from 0.001 to 10, annotated "Increased Repression". Other labels: Host Context, Design/Construction.]
Entus et al, Systems and Synthetic Biology, 2007.
http://www.agricorner.com/e-coli-outbreak-german-farm-in-uelzen-likely-source/
45
Synthetic Networks
Concentration Detector
Generic Design:
[Figure: simulated response curves p1 and p3, y-axis from 0 to 1.2, log-scale x-axis from 0.001 to 10]
If we control the level of feed-forward inhibition we can tune the circuit:
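A toy model can show why a feed-forward design behaves as a concentration detector and how the inhibition arm tunes it. The sketch below multiplies a Hill-type activation by a Hill-type repression of the same input; the functional forms, the Hill coefficient and all parameter values are illustrative assumptions and are not the model from Entus et al.

```python
def hill_activation(x, k, n=2):
    """Rises from 0 to 1 as the input x passes the threshold k."""
    return x**n / (k**n + x**n)


def hill_repression(x, k, n=2):
    """Falls from 1 to 0 as the input x passes the threshold k."""
    return k**n / (k**n + x**n)


def detector_output(inp, k_act=0.01, k_rep=1.0):
    """Output is high only when the input activates but does not yet repress."""
    return hill_activation(inp, k_act) * hill_repression(inp, k_rep)


# Input concentrations on a log scale, 0.0001 to 1000 (arbitrary units).
inputs = [10.0**exponent for exponent in range(-4, 4)]


def peak_input(k_rep):
    """Input value on the grid at which the output is largest."""
    return max(inputs, key=lambda x: detector_output(x, k_rep=k_rep))


for k_rep in (1.0, 100.0):
    profile = [round(detector_output(x, k_rep=k_rep), 3) for x in inputs]
    print(k_rep, peak_input(k_rep), profile)
```

In this sketch, weakening the inhibition (a larger `k_rep`) moves the peak of the response to higher input concentrations, which is one way to read "tuning the circuit".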
46
Synthetic Networks
Input: IPTG
Output: GFP
[Figure: Relative Fluorescence (0 to 1.2) against IPTG (mM) on a log scale from 0.001 to 1000]
Concentration Detector
Generic Design:
47
CAD Software - Engineering Cycle
[Figure: engineering cycle of Design, Simulation, Fabrication and Testing. Simulation panel: curves p1 and p3 on a log scale from 0.001 to 10. Testing panels: Relative Fluorescence and Fluorescence against IPTG (mM) on a log scale from 0.001 to 1000.]
48
Computational tools and information resources support each step
Workflow steps: Specification, Design, Build, Analysis
Tools and resources: TinkerCell CAD, ApE Sequence Editor, Laboratory Information, Clotho, BIOFAB, GDice, iBioSim, Public Data, GenoCAD
49
Registry of Standard Biological Parts (BioBricks)
Endy D, 2005. Nature 438: 449-453
http://parts.mit.edu
Provides free access to an open commons of basic biological functions that can be used to program synthetic biological systems
Anybody may contribute, draw upon, or improve the parts maintained within the Registry.
50
SBOL is extensible, which allows us to form community subgroups
[Figure: Core SBOL surrounded by extension areas: Physical and Host Context, Assembly Methods, Visualization, Experimental Measurements, Computational Models. Sequence Annotation example: Sequence Feature B0015 with annotations 1-80 (Terminator), 81-88 (BioBrick Scar) and 89-129 (Terminator), features B0010 and B0012, linked by type and subClassOf relations. Sample example labels: Sample, Cell, SS002, pUW4510, MG1655, UW002, strain, DNA, Plasmid, linked by type and subClassOf relations.]
51
TinkerCell: Project to explore the potential of computer aided design in synthetic biology
First prototype called Athena developed by Bergmann and Chandran
52
Layered Architecture: Based on C++/Qt
Octave,
53
Each component in the TinkerCell diagram is associated with one or more tables
54
A TinkerCell model can be composed of sub-models
56
Availability
www.tinkercell.com (Windows, Mac and Linux, released under BSD)
Contact author for details ([email protected])
57
Challenges in building SBOL
• Gaining consensus in a growing community
  – Identifying and engaging stakeholders
• Fast pace of change in the field
  – Terminology evolution
    • “BioBricks” “Parts” “DNA components”
  – Stability of use cases
    • “Standard” and “Research needs” seem contradictory
  – Software for synthetic biology is new
• Scarcity of data sources
  – Quality “knowledge” about elements
  – Heterogeneity of existing annotations
• Funding
58
Who is the we?
Boston University: Douglas Densmore
University of Utah: Barry Moore, Nicholas Roehner, Chris J. Myers
BIOFAB: Cesar Rodriguez, Akshay Maheshwari (now UCSD), Drew Endy (Stanford)
Imperial College of London: Guy-Bart Stan
Virginia Bioinformatics Institute: Laura Adam, Matthew Lux, Mandy Wilson, Jean Peccoud
University of Washington: Deepak Chandran, John Gennari, Michal Galdzicki, Herbert Sauro
University of California, Berkeley: J. Christopher Anderson
University of Toronto: Raik Gruenberg
Joint BioEnergy Institute: Timothy Ham
Recent Commercial Interest: BBN, DNA 2.0, Agilent, Life Technologies, AutoDesk
http://www.sbolstandard.org/
iBioSim
Newcastle University (UK): Aniel
59
Acknowledgements: The People and the Support
Hamid Bolouri, Andrew Finney, Mike Hucka, Herbert Sauro
Funding in chronological order (2000 -> 2011):
Frank Bergmann, Deepak Chandran, Vijay Chickarmane, Michal Galdzicki, Lucian Smith
……
60
Textbook: Enzyme Kinetics for Systems Biology
• Available as e-book or paperback on www.analogmachine.org &
• 318 pages, 94 illustrations and 75 exercises
• E-book - $9.95
• Paperback - $39.95
• Author: H M Sauro