Linking Resource Description Framework to Cheminformatics and Proteochemometrics
Egon Willighagen <http://chem-bla-ics.blogspot.com/>
Bioclipse & Proteochemometric Group (Prof. Wikberg), until 2010-09-30
Department of Pharmaceutical Biosciences
Uppsala University
2010-08-22
Problem
Building Blocks
Open Data
Application
Conclusion
Proteochemometrics
2010-08-22 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com
Data Analysis
Knowledge...
Solanum lycopersicum...
We model our world, but ...
Life is not uni- or bivariate
Knowledge is not either
Different representations: compatible?
Information loss!
Names...
benzene
3-[4-[3-(1-methyl-7-oxo-3-propyl-4H-pyrazolo[4,3-d]pyrimidin-5-yl)-4-propoxyphenyl]sulfonylpiperazin-1-yl]propanoic acid
InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)
p450 (which one?? all residues known?)
Solanum lycopersicum (well....)
... Molecular reality...
1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
... and Numbers
Main Theme
How do we navigate this multidimensional space?
How to include prior knowledge?
Minimize information loss?
With optimal knowledge extraction?
Maximizing interpretability?
Without ending up in random correlation?
OpenMolecules RDF: dereferenceable URI
http://rdf.openmolecules.net/?InChI=1/CH4/h1H4
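As a minimal sketch of how such a dereferenceable URI is put together, the snippet below appends an InChI to the resolver base shown on the slide. The helper name `openMoleculesUri` is hypothetical and not part of any Bioclipse or OpenMolecules API; it only reproduces the URI pattern above and does not fetch anything.

```javascript
// Resolver base taken from the slide; everything after "?" is the InChI itself.
const OPENMOLECULES_BASE = "http://rdf.openmolecules.net/?";

// Hypothetical helper (not a Bioclipse or OpenMolecules API):
// builds the URI for a molecule identified by its InChI string.
function openMoleculesUri(inchi) {
  if (typeof inchi !== "string" || !inchi.startsWith("InChI=")) {
    throw new Error("expected a string starting with 'InChI='");
  }
  return OPENMOLECULES_BASE + inchi;
}

// Methane, the example used on the slide.
console.log(openMoleculesUri("InChI=1/CH4/h1H4"));
```

Because the identifier is the InChI itself, anyone who can compute the InChI of a structure can construct the same URI without a lookup table.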
OpenMolecules RDF: linked data
The Chemistry Development Kit
A Family of Projects:
CDK-Taverna (chemoinformatics workflows)
JChemPaint (semantic 2D editor)
ChemoJava (GPL-ed extension)
Goals:
library of cheminformatics algorithms
educational
Usage:
CDK: 100+ times cited in scientific literature
Bioclipse, KNIME, Jumbo (CML), AMBIT, ...
C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003
C. Steinbeck et al., Curr. Pharm. Design, 2006
Bioclipse
O. Spjuth et al., BMC Bioinformatics 2007
O. Spjuth et al., BMC Bioinformatics 2010
Bioclipse-RDF
local RDF storage (memory, on disk)
read/write RDF/XML, N3
run SPARQL queries (local and remote)
extract RDF from XHTML/RDFa
Thanks to Open Source projects including Jena, SWI-Prolog, and Pellet.
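To illustrate what local storage plus querying amounts to, here is a toy in-memory triple store in plain JavaScript. It is not the Bioclipse-RDF or Jena API: `TripleStore`, `add`, and `match` are hypothetical names, and the `ex:` identifiers are made up for the example. A pattern uses `null` as a wildcard, which is far simpler than SPARQL but shows the same idea of matching triple patterns.

```javascript
// Toy in-memory triple store (illustrative only, not the Bioclipse-RDF API).
class TripleStore {
  constructor() {
    this.triples = [];
  }

  // Store one subject-predicate-object statement; returns the store for chaining.
  add(s, p, o) {
    this.triples.push({ s, p, o });
    return this;
  }

  // Return all triples matching the pattern; null acts as a wildcard.
  match(s = null, p = null, o = null) {
    return this.triples.filter(t =>
      (s === null || t.s === s) &&
      (p === null || t.p === p) &&
      (o === null || t.o === o));
  }
}

const store = new TripleStore()
  .add("ex:methane", "ex:inchi", "InChI=1/CH4/h1H4")
  .add("ex:methane", "rdfs:label", "methane")
  .add("ex:benzene", "rdfs:label", "benzene");

// All labels in the store, regardless of subject.
const labels = store.match(null, "rdfs:label", null).map(t => t.o);
console.log(labels);
```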
SPARQL end points (Open Data)
NMRShiftDB data (C. Steinbeck, EBI/UK)
ChEMBL (J. Overington, EBI/UK)
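A remote SPARQL end point is queried over HTTP; for GET requests, the SPARQL Protocol passes the query text in a URL-encoded `query` parameter. The sketch below only builds such a request URL. The end point address `http://example.org/sparql` is a placeholder, not the address of either service named above, and `sparqlGetUrl` is a hypothetical helper.

```javascript
// Hypothetical helper: builds a SPARQL Protocol GET URL for a query string.
function sparqlGetUrl(endpoint, query) {
  // Use "&" when the end point URL already carries a query string.
  const separator = endpoint.includes("?") ? "&" : "?";
  return endpoint + separator + "query=" + encodeURIComponent(query);
}

// A generic query that works against any RDF data set.
const query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5";

// Placeholder end point; substitute the real address of the service.
const url = sparqlGetUrl("http://example.org/sparql", query);
console.log(url);
```

The same query text can be run unchanged against a local store or a remote end point, which is what makes open SPARQL end points convenient as data sources.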
Proteochemometrics: simple QSAR
E. L. Willighagen et al., J. Biomed. Sem., 2010, in press
Proteochemometrics: RDF input
Proteochemometrics: Bayesian + extra priors
[Figure: scatter plots of Predicted against Actual values, panels (a) and (b)]
MyExperiment: Bioclipse Scripting Language
myexperiment.search("RDF")
myexperiment.downloadWorkflow(937)
Reasoning: Prolog and Pellet
Samuel Lampa, M.Sc. project
Semantic Wikis
Samuel Lampa, Google Summer of Code 2010
XHTML+RDFa
OpenTox: downloading
OpenTox: uploading
// requires an unspecified Bioclipse development version
ds = opentox.createDataset("http://apps.ideaconsult.net:8080/ambit2/");
opentox.addMolecule(ds, cdk.fromSMILES("CCCCC[N+](C)(C)C"));
opentox.addMolecule(ds, cdk.fromSMILES("ClC(I)Br"));
opentox.deleteDataset(ds);
Linked Data: Visualization
Substructure mining: ChEMBL
Annsofie Andersson, M.Sc. project
Substructure mining: .. and MoSS
What does this bring us?
Platform to integrate RDF with the computational world
Bioclipse as single point of access
Scripting, sharing of scripts with MyExperiment.org
Bridging Names to Numbers
Acknowledgements
Maris Lapins, Martin Eklund: statistics
Annsofie Andersson: ChEMBL + MoSS integration
Samuel Lampa: reasoning (Pellet/Prolog) and RDFIO
Nina Jeliazkova: OpenTox integration
The Details
http://www.citeulike.org/user/egonw/tag/papers
http://chem-bla-ics.blogspot.com
http://egonw.github.com
waveto: