In silico systems biology: network reconstruction, analysis and network based modelling
EMBO practical course, 10-13 April 2010, Hinxton, UK
Integration of genomic data with biological networks: state of the art and future challenges
Laura I. Furlong
Integrative Biomedical Informatics Group, Research Unit on Biomedical Informatics (GRIB)
SNP
Phenotypic effect
Disease association
Functional effect (e.g. loss of binding site)
Network modelling
Bauer-Mehren A, Furlong L, Rautschka M, Sanz F: From SNPs to pathways: integration of functional effect of sequence variations on models of cell signalling pathways. BMC Bioinformatics 2009, 10(Suppl 8):S6.
Network visualization
Integration of SNPs and their effects with networks
Prediction of pathogenic effect of mutations and SNPs
EntrezGene
dbSNP
Cytoscape node attribute file
MySQL DB
SNP, mutagenesis information
• Association to disease
• Functional effect
Mapping to dbSNP
Mapping to NCBI Gene
Identification of GO concepts
A data integration approach
Biological network data
• More than 200 pathway repositories, over 60 of them specialized in reactions in human
• More than 200 curated models
Sequence variation data
Manually curated information on nsSNPs, mutations
• Association to disease
• Results from mutagenesis experiments
dbSNP: broad collection of SNPs and short-range sequence variants
Visualization
S->A mutation at position 218 leads to protein inactivation
Modelling the impact of sequence variation
Birtwistle MR, Hatakeyama M, Yumoto N, Ogunnaike BA, Hoek JB, Kholodenko BN (2007) Ligand-dependent responses of the ErbB signaling network: experimental and modeling analyses. Mol Syst Biol 3: 144.
Challenges
Concerning sequence variations:
• Too few have been functionally characterized
• Synonymous (“silent”) mutations can also alter function, e.g. by modulating splicing or altering protein folding
• Tools are needed to predict the impact of coding and non-coding SNPs on gene/protein function (and even on biological processes)
The IntAct project
1. Define a standard for the representation and annotation of molecular interaction data
2. Provide a public repository
3. Populate the repository with experimental data from project partners and curated literature data
4. Provide modular analysis tools
5. Provide portable versions of the software to allow installation of local IntAct nodes
IntAct goals & achievements
- Curation manual available from home page
- Member of the International Molecular interaction Exchange consortium (IMEx)
http://www.ebi.ac.uk/intact
ftp://ftp.ebi.ac.uk/pub/databases/intact
4200+ distinct publications, 209,000+ binary interactions, 63,000+ proteins imported from UniProt
Known installations: AstraZeneca, GSK, MERCK, MINT, Proteome Center of Shanghai
search & advanced search, hierarchView, pay-as-you-go, MiNe…
“Lifecycle of an Interaction”
Figure labels: Publication (full text, abstract); CVs and Curation manual; annotate; p1, p2, I, exp; curator; Super curator; check; accept or reject; Public web site; FTP site; IMEx (MatrixDB, MINT, DIP); Sanity Checks (nightly); report.
IntAct Curation
Public data
• All data is manually curated by expert curators
• Curation manual rigorously followed
• All curated data is reviewed by a senior curator
• All data is made available on FTP site:
(!) data updated every week
(!) format available:
ftp://ftp.ebi.ac.uk/pub/databases/intact
Data
Controlled vocabularies
• Why do we use them?
e.g. far too many ways to write: yeast two hybrid, Y2H, 2H, two-hybrid, …
• Full integration of PSI-MI ontology
• Over 1,500 terms, fully defined and cross-referenced
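The point about free-text method names can be illustrated with a minimal sketch. The synonym table and the function name are hypothetical; only the spelling variants come from the slide, and MI:0018 is used here as the PSI-MI identifier for "two hybrid".

```python
# Hypothetical synonym table: the free-text variants are those listed on the slide.
SYNONYMS = {
    "yeast two hybrid": "MI:0018",
    "y2h": "MI:0018",
    "2h": "MI:0018",
    "two-hybrid": "MI:0018",
    "two hybrid": "MI:0018",
}
# Preferred name for each controlled vocabulary identifier.
CV_NAMES = {"MI:0018": "two hybrid"}


def normalise_method(free_text):
    """Map a free-text detection method to a (CV id, CV name) pair, or None."""
    key = " ".join(free_text.lower().split())  # lower-case, collapse whitespace
    term_id = SYNONYMS.get(key)
    if term_id is None:
        return None
    return term_id, CV_NAMES[term_id]


for label in ["yeast two hybrid", "Y2H", "2H", "two-hybrid"]:
    print(label, normalise_method(label))
```

Once every record carries the same identifier, a search on one term finds all of them, whatever the curator's source paper wrote.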
How to deal with Complexes
• Some experimental protocols generate complex data, e.g. tandem affinity purification (TAP)
• One may want to convert these complexes into sets of binary interactions; two algorithms are available: spoke and matrix expansion
Both are somewhat wrong; spoke is said to generate three times fewer false positives (Bader et al.).
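A minimal sketch of the two expansion models, assuming the second algorithm is the usual matrix expansion (the slide names only spoke). Participant names are made up for illustration.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Pair the bait with each prey: n preys give n binary interactions."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(participants):
    """Pair every participant with every other one: n participants give n*(n-1)/2 pairs."""
    return list(combinations(participants, 2))


# Hypothetical TAP result: one bait that pulled down three preys.
members = ["bait", "p1", "p2", "p3"]
spoke = spoke_expansion(members[0], members[1:])
matrix = matrix_expansion(members)
print(len(spoke), "spoke pairs;", len(matrix), "matrix pairs")
```

Matrix expansion also pairs the preys with each other, which is one reason it produces more binary interactions (and more candidates for false positives) than spoke.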
The IntAct web site
http://www.ebi.ac.uk/intact
IntAct: Home page
UniProt Taxonomy PubMed Method (PSI-MI CV)
Interaction details
Complex ?
Interactors
IntAct: Search and results
IMEx data
Other PSICQUIC services
IntAct: Search and results
Export
Custom columns
Filters
IntAct: Browse
IntAct: Advanced search: Ontologies
IntAct > Advanced search: Fields
Filtering options
Add more filtering options
IntAct > Advanced search: MIQL
• Molecular Interaction Query Language
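MIQL queries are written as field:value terms combined with boolean operators. The sketch below only assembles such a string; the helper names are hypothetical, and the field names species and detmethod are assumed to be among the searchable MIQL fields.

```python
def miql_term(field, value):
    """Render one field:value term, quoting values that contain spaces."""
    if " " in value:
        value = '"' + value + '"'
    return field + ":" + value


def miql_and(*terms):
    """Combine terms so that all of them must match."""
    return " AND ".join(terms)


query = miql_and(
    miql_term("species", "human"),
    miql_term("detmethod", "two hybrid"),
)
print(query)  # species:human AND detmethod:"two hybrid"
```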
IntAct > Chemical search
1. Draw your compound
2. View matching molecules
3. View known interactions
IntAct > Interaction details
IntAct > Interaction details > More ..
IntAct > Interaction details > Find similar interactions
We search for similar interactions by looking for interactions that share participants with the original one. Interactions with the most participants in common are shown first.
So far all hits are shown; we will work on speeding up this view, as it can be rather slow when the original interaction has many participants.
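The ranking described here can be sketched as a set overlap count. This is an illustration under assumptions, not IntAct's implementation; the interaction identifiers and participants are invented.

```python
def rank_similar(query_participants, interactions):
    """Return (interaction_id, shared_count) pairs, most shared participants first."""
    query = set(query_participants)
    hits = []
    for interaction_id, participants in interactions.items():
        shared = len(query & set(participants))
        if shared > 0:  # keep only interactions sharing at least one participant
            hits.append((interaction_id, shared))
    # Sort by descending overlap, then by identifier for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical data set.
known = {
    "I1": ["A", "B", "C"],
    "I2": ["A", "D"],
    "I3": ["E", "F"],
    "I4": ["A", "B", "C", "D"],
}
print(rank_similar(["A", "B", "C", "D"], known))
```

Every stored interaction is compared with the query, which is consistent with the remark that the view gets slow when the original interaction has many participants.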
IntAct > List of interactors
IntAct > List of interactors > Compounds
IntAct > Graph view
IntAct > Linking to Cytoscape
Molecular Interaction Standards
Engineering 1850
• Nuts and bolts fit perfectly together, but only if they originate from the same factory
• Standardisation proposal in 1864 by William Sellers
• It was not generally accepted until after WWII, though …
Proteomics 2003
• Proteomics data are perfectly compatible, but only if they are from the same lab / database / software
• “Publish and vanish” by data producers
• Collecting all publicly available data requires huge effort
• Urgent need for standardisation
• Community standard for Molecular Interactions
• XML schema and detailed controlled vocabularies
• Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
• Version 1.0 published in February 2004
The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data. Henning Hermjakob et al., Nature Biotechnology 2004.
• Version 2.5 published in October 2007
Broadening the horizon - Level 2.5 of the HUPO-PSI format for molecular interactions. Samuel Kerrien et al., BMC Biology 2007.
PSI-MI XML format
PSIMITAB Format
Standard columns (15):
• ID(s) interactor A & B
• Alt. ID(s) interactor A & B
• Alias(es) interactor A & B
• Interaction detection method(s)
• Publication 1st author(s)
• Publication Identifier(s)
• Taxid interactor A & B
• Interaction type(s)
• Source database(s)
• Interaction identifier(s)
• Confidence value(s)
+
IntAct specific columns (+11):
• Experimental role(s) of interactors
• Biological role(s) of interactors
• Properties (CrossReference) of interactors
• Type(s) of interactors
• HostOrganism(s)
• Expansion method(s)
• Dataset name(s)
Standardization in progress!
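A minimal sketch of reading the 15 standard columns of one MITAB line. It assumes tab-separated columns, "|" between multiple values and "-" for an empty column; the dictionary keys are hypothetical names, and the sample line is invented for illustration.

```python
# Hypothetical key names for the 15 standard columns, in file order.
STANDARD_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]


def parse_mitab_line(line):
    """Split one MITAB line into a dict of value lists for the standard columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(STANDARD_COLUMNS):
        raise ValueError("expected at least 15 tab-separated columns")
    record = {}
    for name, raw in zip(STANDARD_COLUMNS, fields):
        # Naive split: does not handle a "|" inside a quoted value.
        record[name] = [] if raw == "-" else raw.split("|")
    return record


# Invented example line (identifiers are placeholders, not real records).
sample = "\t".join([
    "uniprotkb:P00001", "uniprotkb:P00002", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2009)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)", "-",
    "-", "intact:EBI-0000000|imex:IM-00000", "-",
])
record = parse_mitab_line(sample)
print(record["id_a"], record["id_b"], record["interaction_id"])
```

Extra columns, such as the IntAct specific ones, are ignored by this sketch because `zip` stops after the 15 standard names.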
PSI-MI
Data format
Data distribution
Controlled vocabulary
Data submission
Website
Search, Interactions, Interaction details, Interactors, Molecular view, Graph view
Standard format
Tools
PSICQUIC
PSI-MI CV
Reporting guideline MIMIx
Tools
PSI-MI XML, PSI-MITAB
XML Java API, MITAB Java API
XMLMaker, Flattener, Semantic Validator, RPsiXML (Bioconductor)
PSI-MI XML files, PSI Excel Sheet, PSI Web Form
Data
Servers, Registry, Clients
MIMIx
• Experiments
  • Interaction detection method (e.g. yeast two hybrid)
  • Participant detection method (e.g. mass spectrometry)
  • Host organism
• Interactions
  • Interactors
    • Identifiers from public database
    • Species of origin
    • Biological/experimental roles (e.g. enzyme, target / bait, prey)
  • Confidence
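The checklist above can be turned into a simple completeness check. This is a sketch only: the field names are hypothetical, and treating every listed item except confidence as required is an assumption, not a statement of the MIMIx guideline.

```python
# Hypothetical field names mirroring the items on the slide.
EXPERIMENT_FIELDS = (
    "interaction_detection_method",
    "participant_detection_method",
    "host_organism",
)
INTERACTOR_FIELDS = ("identifier", "species", "role")


def missing_fields(interaction):
    """List the checklist items that are absent or empty in one interaction record."""
    missing = [name for name in EXPERIMENT_FIELDS if not interaction.get(name)]
    interactors = interaction.get("interactors") or []
    if len(interactors) == 0:
        missing.append("interactors")
    for index, interactor in enumerate(interactors):
        for name in INTERACTOR_FIELDS:
            if not interactor.get(name):
                missing.append("interactors[%d].%s" % (index, name))
    return missing


# Invented record with one incomplete interactor.
example = {
    "interaction_detection_method": "two hybrid",
    "participant_detection_method": "mass spectrometry",
    "host_organism": "yeast",
    "interactors": [
        {"identifier": "uniprotkb:P00001", "species": "human", "role": "bait"},
        {"identifier": "uniprotkb:P00002", "species": "human"},
    ],
}
print(missing_fields(example))
```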
IMEx: The International Molecular Exchange Consortium
• Group of major public interaction data providers sharing curation effort: DIP, IntAct, MINT, MPact, MatrixDB, MPIDB and BioGRID
• Independent molecular interaction resources
• Common curation standards for detailed curation
• Common data formats (PSI-MI XML, PSICQUIC)
• Common accession number space
• Coordinated & non-redundant curation
• In production mode since February 2010
• Since 3/2009 supported by the European Commission under PSIMEx, contract number FP7-HEALTH-2007-223411, with additional partners Vital-IT, Nature, Wiley, BiaCore (GE), U. Maryland, CSIC, TU Munich, MIPS, SCBIT (Shanghai)
IMEx website: Imex.sf.net
Getting access to more data is easy !!
Data distribution: PSICQUIC
• Proteomics Standards Initiative Common QUery InterfaCe.
• Community effort to standardise the way to access and retrieve data from Molecular Interaction databases.
• Widely implemented by independent interaction data resources.
• Based on the PSI standard formats (PSI-MI XML and MITAB)
• Not limited to protein-protein interactions, also e.g.
  • Drug-target interactions
  • Simplified pathway data
• A registry listing resources implementing PSICQUIC
• Documentation: http://psicquic.googlecode.com
PSICQUIC implementation
Figure labels: Sample; Observation error; Publications; Annotation error; Interaction databases (PSICQUIC sources); PSICQUIC Registry; PSICQUIC client; User.
Service Oriented Architecture
Figure labels: Service broker; Service consumer; Service provider; Service contract; Publish; Find; Interact. PSICQUIC counterparts shown: PSICQUIC Registry; PSICQUIC clients (alongside DAS clients); PSICQUIC sources; PSI-MI format.
PSICQUIC implementation
• PSICQUIC Server (SOAP/REST web service)
• PSICQUIC Registry
• PSICQUIC Clients
  • PSICQUIC view
  • Cytoscape
  • Envision2
  • …
PSICQUIC Registry
• http://www.ebi.ac.uk/Tools/webservices/psicquic/registry/registry?action=STATUS
• 1,693,000 binary interactions!
What can I do with PSICQUIC
• Users can query the registry to get a list of available services
• Registry supports tagging
• Users can script against these services using pretty much any programming language (SOAP / REST)
• Easy to parse MITAB to extract data of interest
• Data can be loaded in cytoscape to visualize a network
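The scripting workflow in the bullets above can be sketched as follows. The REST base URL is an assumption about IntAct's PSICQUIC service (the registry lists the current address of each service), the function names are hypothetical, and the sketch builds the request URL and parses MITAB text without performing a network call.

```python
from urllib.parse import quote

# Assumed base URL of IntAct's PSICQUIC REST service; check the registry before use.
INTACT_BASE = (
    "http://www.ebi.ac.uk/Tools/webservices/psicquic/"
    "intact/webservices/current/search/"
)


def query_url(base, miql):
    """Build the REST URL that returns MITAB lines for a MIQL query."""
    return base + "query/" + quote(miql)


def mitab_to_edges(mitab_text):
    """Extract (interactor A, interactor B) pairs from MITAB text."""
    edges = []
    for line in mitab_text.splitlines():
        if not line.strip():
            continue
        columns = line.split("\t")
        edges.append((columns[0], columns[1]))
    return edges


def edges_to_sif(edges, relation="pp"):
    """Write edges in Cytoscape's simple interaction format (SIF)."""
    return "\n".join(a + "\t" + relation + "\t" + b for a, b in edges)


# Invented two-line MITAB excerpt (only the first columns matter here).
mitab = "uniprotkb:P00001\tuniprotkb:P00002\t-\nuniprotkb:P00001\tuniprotkb:P00003\t-\n"
print(query_url(INTACT_BASE, "brca2"))
print(edges_to_sif(mitab_to_edges(mitab)))
```

In a real script the text would come from `urllib.request.urlopen(query_url(...))`, and the SIF output could be saved to a file and imported into Cytoscape.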
PSICQUIC limitations
• Currently users can only download MITAB format
• We are planning to enable PSI-MI XML download too so users can get the original complex
• We are currently working on adding additional data formats:
• BioPax (only in IntAct’s PSICQUIC so far)
• SBML
Systems Biology Markup Language
Overview of SBML
A machine-readable format for representing computational models in systems biology
Tool-neutral exchange language for software applications
Declares model not procedure
Independent of modelling formalism
Overview of SBML
Expressed in XML
Not really meant for humans to read
SBML structure and syntax
<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  ...
  <model ...>
    <listOfXYZ>
      ...
    </listOfXYZ>
    ...
  </model>
</sbml>
For example, with a list of species:
<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  ...
  <model ...>
    <listOfSpecies>
      <species … />
    </listOfSpecies>
    ...
  </model>
</sbml>
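A minimal sketch of reading this structure with Python's standard XML parser rather than a dedicated SBML library. The document string reuses the compartment and species attributes from the slides; the function names are hypothetical, and dividing initialAmount by the compartment size to get a concentration is an illustration, not a full interpretation of SBML units.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"

SBML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model>
    <listOfCompartments>
      <compartment id="cell" spatialDimensions="3" size="2.3" units="litre" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="s" compartment="cell" initialAmount="4.6" substanceUnits="mole"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
    </listOfSpecies>
  </model>
</sbml>"""


def tag(name):
    """Qualify an element name with the SBML namespace."""
    return "{" + SBML_NS + "}" + name


def read_model(sbml_text):
    """Return the model's listOf... container names and initial concentrations."""
    root = ET.fromstring(sbml_text.encode("utf-8"))
    model = root.find(tag("model"))
    containers = [child.tag.split("}")[1] for child in model]
    sizes = {c.get("id"): float(c.get("size")) for c in model.iter(tag("compartment"))}
    concentrations = {}
    for species in model.iter(tag("species")):
        amount = float(species.get("initialAmount"))
        concentrations[species.get("id")] = amount / sizes[species.get("compartment")]
    return containers, concentrations


containers, concentrations = read_model(SBML_TEXT)
print(containers)
print(concentrations)
```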
SBML structure and syntax
Compartment: a container of finite size for well-stirred substances
<listOfCompartments>
  <compartment id="cell" spatialDimensions="3" size="2.3" units="litre" constant="true"/>
</listOfCompartments>
SBML structure and syntax
Species: a pool of a chemical substance
<listOfSpecies>
  <species id="s" compartment="cell" initialAmount="4.6" substanceUnits="mole" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
</listOfSpecies>
SBML structure and syntax
Parameter: a quantity of whatever type is appropriate
<listOfParameters>
  <parameter id="p1" value="3000" constant="false"/>
  <parameter id="p2" value="8000" constant="true"/>
</listOfParameters>
SBML structure and syntax
Reaction: a statement describing some transformation, transport or binding process that can change one or more species
Reactants R
Products P
Modifiers M
‘Kinetic law’: v = f(R, P, M, parameters)
SBML structure and syntax
Reaction S0 → S1, with S2 as modifier
rate law: k * S0 * S2
<listOfSpecies>
  <species id="S0" compartment="comp1" initialAmount="1.66057788110262e-21"/>
  <species id="S1" compartment="comp1" initialAmount="0"/>
  <species id="S2" compartment="comp1" initialAmount="2e-21"/>
</listOfSpecies>
<listOfCompartments>
  <compartment id="comp1" size="1e-16"/>
</listOfCompartments>
SBML structure and syntax
<listOfReactions>
  <reaction>
    <listOfReactants>
      <speciesReference species="S0"/>
    </listOfReactants>
    <listOfProducts>
      <speciesReference species="S1"/>
    </listOfProducts>
    <listOfModifiers>
      <modifierSpeciesReference species="S2"/>
    </listOfModifiers>
    <kineticLaw>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <apply> <times/> <ci> comp1 </ci> <ci> k </ci> <ci> S0 </ci> <ci> S2 </ci> </apply>
      </math>
      <listOfLocalParameters>
        <localParameter id="k" value="2"/>
      </listOfLocalParameters>
    </kineticLaw>
  </reaction>
</listOfReactions>
Each <ci> refers to the ‘id’ of another element: the compartment comp1, the local parameter k, and the species S0 and S2.
SBML structure and syntax
Reaction S0 → S1, with S2 as modifier
rate law: k * S0 * S2
dS0/dt = - k * S0 * S2 * comp1
dS1/dt = + k * S0 * S2 * comp1
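A minimal numerical sketch of these two equations, using forward Euler with the values from the slides (k = 2, comp1 = 1e-16 and the initial amounts of S0, S1 and S2). It assumes that S0 and S2 in the kinetic law stand for concentrations (amount divided by compartment size) and that the law gives a rate in amount per unit time; the function name, step size and step count are arbitrary choices for illustration.

```python
def simulate(k=2.0, comp1=1e-16, s0=1.66057788110262e-21, s1=0.0,
             s2=2e-21, dt=100.0, steps=1000):
    """Integrate dS0/dt = -v and dS1/dt = +v with v = comp1 * k * [S0] * [S2]."""
    for _ in range(steps):
        # Assumption: species symbols in the kinetic law are concentrations.
        rate = comp1 * k * (s0 / comp1) * (s2 / comp1)
        s0 -= rate * dt  # reactant is consumed
        s1 += rate * dt  # product is formed; modifier S2 is unchanged
    return s0, s1


initial_s0 = 1.66057788110262e-21
final_s0, final_s1 = simulate()
print("S0:", final_s0, "S1:", final_s1)
print("S0 + S1:", final_s0 + final_s1)
```

Because S2 is a modifier and stays constant, S0 decays roughly exponentially, and the sum S0 + S1 stays at its initial value up to rounding error. A real simulator would use an adaptive ODE solver instead of a fixed Euler step.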
SBML structure and syntax
Rule: a mathematical expression that is added to the model equations
assignmentRule: x = f(y)
rateRule: dx/dt = f(y)
algebraicRule: f(x,y) = 0
SBML structure and syntax
Event: at a specific point (t > 30), flag = 1
<listOfEvents>
  <event id="Turn_on_current">
    <trigger> … </trigger>
    <listOfEventAssignments>
      <eventAssignment variable="flag">
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <cn> 1 </cn> </math>
      </eventAssignment>
    </listOfEventAssignments>
  </event>
</listOfEvents>
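A minimal sketch of how a simulator might apply such an event, assuming the event fires when its trigger changes from false to true. The function name and the sampled time points are hypothetical; the trigger t > 30 and the assignment flag = 1 are taken from the slide.

```python
def run_event(times, trigger, initial_flag=0.0):
    """Return (time, flag) pairs; flag is set to 1 when the trigger turns true."""
    flag = initial_flag
    previous = trigger(times[0])  # trigger state at the first time point
    history = []
    for t in times:
        current = trigger(t)
        if current and not previous:  # false -> true transition fires the event
            flag = 1.0
        previous = current
        history.append((t, flag))
    return history


history = run_event([0, 10, 20, 30, 40, 50], lambda t: t > 30)
for t, flag in history:
    print(t, flag)
```

With these sample times the flag switches at t = 40, the first sampled point after the trigger condition becomes true.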
SBML structure and syntax
Level 1 Version 1, Level 1 Version 2: Model, Compartment, Reaction, Species, Rule, Unit, Parameter
Level 2 Version 1: Function, Event
Level 2 Version 2, Level 2 Version 3, Level 2 Version 4: InitialAssignment, Constraint, CompartmentType, SpeciesType
SBML Level 3
Figure labels: core; layout; qual; spatial; comp (Submodel 1, Submodel 2); additional information; possibly necessary; mathematically necessary for correct interpretation.
SBML Resources
181 applications (that we know about)
Online validator
Online Test Suite
MathSBML (Mathematica)
SBMLToolbox (MATLAB, Octave)
Converters: BioPAX
Conclusions
• No idea how to integrate discrete models
• No optimal solution how to fit data to the model for discrete modelling
• We are at the beginning of in silico systems biology
• New modelling, data analysis, integration approaches and tools are needed