OMPOL – Visualisation of large chemical spaces



Peter Corbett, Colin Batchelor, Alexey Pshenichnov, Valery Tkachenko

Royal Society of Chemistry

ACS Spring 2016, San Diego, CA, March 17th 2016

ChemSpider Synthetic Pages

Compounds, Reaction, Analytical Data, Text and References

Chemical space - 10^60

RSC Data Repository

Data Repository: Properties, Names and Identifiers, Spectra, Articles, Data Collections, Patents, etc.

RSC Databases

RSC Compounds, RSC Reactions, RSC Spectra, RSC Crystals, RSC Polymers, RSC Materials, RSC Assays, RSC Algorithms, RSC Models… and on…

Record labels

Visualising Chemical Space

Need to be able to see what sorts of structures are in a collection, how they relate to each other, etc.
Could use something like clustering
Dimensionality reduction – chemical structures -> fingerprints -> high-dimensional space -> low-dimensional space
Standard technique – Principal Components Analysis (PCA)

Dimensionality Reduction – First make a molecule-feature matrix

1 0 0 0 0 0 0 0 … 0
0 0 1 0 0 0 0 0 … 0
1 1 0 0 1 0 0 0 … 1
1 1 0 1 0 0 0 0 … 1
1 1 0 0 0 0 0 0 … 0
1 0 0 0 0 0 0 1 … 0
1 0 1 0 1 1 0 0 … 0
1 0 0 1 0 0 0 0 … 1
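A minimal Python sketch of building such a 0/1 molecule-feature matrix, assuming each molecule has already been reduced to a set of feature labels. In practice the features would be fingerprint bits or substructure keys from a cheminformatics toolkit; the molecules, features and the `build_feature_matrix` helper here are hypothetical toy illustrations.

```python
from typing import Dict, List, Set, Tuple


def build_feature_matrix(
    features: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (molecule labels, feature labels, 0/1 matrix), one row per molecule."""
    molecules = sorted(features)
    # One column per distinct feature seen anywhere in the collection.
    columns = sorted(set().union(*features.values()))
    column_of = {f: j for j, f in enumerate(columns)}
    matrix = []
    for mol in molecules:
        row = [0] * len(columns)
        for f in features[mol]:
            row[column_of[f]] = 1  # 1 = this molecule has this feature
        matrix.append(row)
    return molecules, columns, matrix


# Hypothetical toy input: molecule -> set of structural features it contains.
toy = {
    "mol_a": {"ring", "OH"},
    "mol_b": {"ring", "C=O"},
    "mol_c": {"OH"},
}
rows, cols, matrix = build_feature_matrix(toy)
```

Here `cols` comes out as `['C=O', 'OH', 'ring']` and `matrix` as `[[0, 1, 1], [1, 0, 1], [0, 1, 0]]`. Real fingerprints have hundreds or thousands of mostly-zero columns, which is where a sparse representation may pay off.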

PCA/SVD
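PCA can be computed either from an SVD of the column-centred matrix or, equivalently, from the eigenvectors of its covariance matrix. The dependency-free Python sketch below takes the second route using power iteration, an illustrative stand-in for a library SVD routine and not necessarily what OMPOL used. It returns each molecule's scores plus the variance captured by each component, largest first; the `pca` function and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _project_out(v: List[float], basis: Matrix) -> List[float]:
    # Remove v's components along already-found orthonormal directions.
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    return v


def _normalise(v: List[float]) -> Tuple[List[float], float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm < 1e-12:
        return v, 0.0  # numerically zero vector: leave it unchanged
    return [x / norm for x in v], norm


def pca(data: Sequence[Sequence[float]], n_components: int = 2,
        iters: int = 200) -> Tuple[Matrix, List[float]]:
    """Return (scores, variances) for the top principal components of `data`."""
    n, d = len(data), len(data[0])
    assert n >= 2 and 1 <= n_components <= d
    # Centre each column: PCA describes deviations from the mean molecule.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(0)  # fixed seed so results are repeatable
    components: Matrix = []
    variances: List[float] = []
    for _ in range(n_components):
        start = [rng.random() + 0.1 for _ in range(d)]
        v, _ = _normalise(_project_out(start, components))
        for _ in range(iters):
            # Power-iteration step, kept orthogonal to earlier components.
            w = _project_out([sum(c * x for c, x in zip(row, v)) for row in cov],
                             components)
            w, norm = _normalise(w)
            if norm == 0.0:
                break  # v already lies in a zero-variance direction
            v = w
        # Rayleigh quotient v.Cv = variance of the data along v.
        cv = [sum(c * x for c, x in zip(row, v)) for row in cov]
        variances.append(sum(a * b for a, b in zip(v, cv)))
        components.append(v)
    # Scores: coordinates of each centred molecule along each component.
    scores = [[sum(a * b for a, b in zip(r, c)) for c in components]
              for r in centred]
    return scores, variances


# Toy data lying exactly on a line: all variance should land in component 1.
scores, variances = pca([[0, 0], [1, 2], [2, 4], [3, 6]])
```

For this toy input `variances` is roughly `[25/3, 0]`: the columns of the scores come out ordered from most to least important, as in the result slide.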

The result

0.209 0.078 -0.368 …
0.030 0.297 0.174 …
0.509 0.005 0.343 …
0.514 -0.394 0.172 …
0.320 -0.034 -0.198 …
0.228 0.108 -0.791 …
0.338 0.812 0.151 …
0.403 -0.281 0.003 …
<--- Most important    Least important --->

Plot on a graph
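Plotting the first two components comes down to a linear mapping from score space to pixel space. A sketch in Python for brevity (OMPOL itself draws on an HTML5 Canvas from JavaScript); the `to_pixels` function and its `margin` parameter are illustrative. Canvas y coordinates grow downwards, so the y axis is flipped.

```python
from typing import List, Sequence, Tuple


def to_pixels(points: Sequence[Tuple[float, float]], width: int, height: int,
              margin: int = 10) -> List[Tuple[float, float]]:
    """Map (PC1, PC2) scores onto a width x height canvas, leaving a margin."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero range (all points share one coordinate).
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    sx = (width - 2 * margin) / x_span
    sy = (height - 2 * margin) / y_span
    return [(margin + (x - x_min) * sx,
             height - margin - (y - y_min) * sy)  # flip: larger y is drawn higher
            for x, y in points]


pixels = to_pixels([(0.0, 0.0), (1.0, 1.0), (0.5, 0.25)], width=110, height=110)
```

Panning and zooming then amount to changing the offsets (`x_min`, `y_min`) and scale factors (`sx`, `sy`) rather than recomputing the data.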

The problem

Need an interactive scatterplot
Web delivery => JavaScript
Need, at minimum, to click, mouseover, pan and zoom
Existing scatterplot libraries, e.g. flot.js, are plentiful and well supported…
…but do not scale well: they become slow and unresponsive with ~40,000 data points
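One common way to keep mouseover and click responsive as the point count grows is to bucket screen positions into a uniform grid, so that a hit test only examines the few cells around the cursor instead of every point. This is an illustrative Python sketch of that general technique, not OMPOL's own code; the `GridIndex` class, `cell` size and `radius` parameter are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class GridIndex:
    """Uniform-grid spatial index for nearest-point hit testing."""

    def __init__(self, points: List[Tuple[float, float]], cell: float):
        self.points = points
        self.cell = cell
        self.cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
        for i, (x, y) in enumerate(points):
            self.cells[self._key(x, y)].append(i)

    def _key(self, x: float, y: float) -> Tuple[int, int]:
        # Integer cell coordinates containing the position (x, y).
        return (math.floor(x / self.cell), math.floor(y / self.cell))

    def hit(self, x: float, y: float, radius: float) -> Optional[int]:
        """Index of the closest point within `radius` of (x, y), or None."""
        best, best_d2 = None, radius * radius
        cx0, cy0 = self._key(x - radius, y - radius)
        cx1, cy1 = self._key(x + radius, y + radius)
        # Only the cells overlapping the search square need checking.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for i in self.cells.get((cx, cy), ()):
                    px, py = self.points[i]
                    d2 = (px - x) ** 2 + (py - y) ** 2
                    if d2 <= best_d2:
                        best, best_d2 = i, d2
        return best


index = GridIndex([(10, 10), (50, 50), (52, 49)], cell=20)
nearest = index.hit(51, 50, radius=5)
```

Here `nearest` is `1`, the point at (50, 50); a cursor far from every point gets `None`. The cost of a lookup depends on how many points share the nearby cells, not on the total number of points.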

The solution

Make your own graph-plotting tool
OMPOL – One Million Points Of Light – an aspiration for scalability
HTML5 Canvas
“Google Maps”-style drawing
Divide the graph into panels
Draw panels as they come onto the screen
Assemble the display from pre-drawn panels
Opportunity for better ways of exploring the data
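The panel idea can be sketched as follows: work out which fixed-size panels overlap the viewport, and draw each panel only the first time it is needed, keeping recently used panels in a small least-recently-used cache. This Python illustration rests on assumed conventions (square panels keyed by zoom level, column and row, and a caller-supplied `draw` function standing in for Canvas rendering); OMPOL's actual panel scheme may differ, and `visible_panels` and `PanelCache` are hypothetical names.

```python
import math
from collections import OrderedDict
from typing import Callable, List, Tuple

PanelKey = Tuple[int, int, int]  # (zoom level, column, row): an assumed scheme


def visible_panels(left: float, top: float, width: float, height: float,
                   panel_size: int, zoom: int) -> List[PanelKey]:
    """Keys of all panels overlapping the viewport (coordinates in pixels)."""
    c0 = math.floor(left / panel_size)
    r0 = math.floor(top / panel_size)
    c1 = math.ceil((left + width) / panel_size) - 1  # viewport is half-open
    r1 = math.ceil((top + height) / panel_size) - 1
    return [(zoom, c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


class PanelCache:
    """Draw each panel once and reuse it while it stays in the LRU cache."""

    def __init__(self, draw: Callable[[PanelKey], object], capacity: int = 64):
        self.draw = draw
        self.capacity = capacity
        self.panels: "OrderedDict[PanelKey, object]" = OrderedDict()

    def get(self, key: PanelKey) -> object:
        if key in self.panels:
            self.panels.move_to_end(key)  # mark as most recently used
            return self.panels[key]
        image = self.draw(key)  # panel has just come onto the screen
        self.panels[key] = image
        if len(self.panels) > self.capacity:
            self.panels.popitem(last=False)  # evict the least recently used
        return image


drawn: List[PanelKey] = []


def fake_draw(key: PanelKey) -> str:
    drawn.append(key)  # record each real draw
    return f"panel {key}"  # stand-in for an off-screen canvas


cache = PanelCache(fake_draw, capacity=16)
for view_left, view_top in [(100, 0), (120, 10)]:  # a small pan right and down
    for key in visible_panels(view_left, view_top, 256, 256, panel_size=256, zoom=0):
        cache.get(key)
```

After the pan, only the two newly exposed panels are drawn; the two already on screen are reused from the cache, so `drawn` holds four keys rather than six.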

Example data

ChEBI
~50,000 compounds of “Biological Interest”
Has an ontology of compound types
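Selecting groups of points through an ontology such as ChEBI's amounts to collecting every term that is, directly or transitively, a subclass of the chosen class, then selecting the points whose compounds fall in that set. A Python sketch with a tiny hand-made is_a hierarchy; the terms and the `descendants` helper are illustrative and not taken from ChEBI's actual files or identifiers.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def descendants(is_a: Dict[str, List[str]], root: str) -> Set[str]:
    """`root` plus every term that is (transitively) a kind of `root`."""
    # Invert child -> parents links into parent -> children.
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].append(child)
    # Breadth-first walk down the hierarchy; `seen` prevents revisiting
    # terms that are reachable through several parents.
    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical toy hierarchy: term -> list of parent terms.
is_a = {
    "primary alcohol": ["alcohol"],
    "ethanol": ["primary alcohol"],
    "methanol": ["primary alcohol"],
    "benzene": ["aromatic hydrocarbon"],
}
# Compound represented by each plotted point, by point index.
point_compound = ["ethanol", "benzene", "methanol"]

wanted = descendants(is_a, "alcohol")
selected = [i for i, name in enumerate(point_compound) if name in wanted]
```

Choosing the class "alcohol" selects points 0 and 2 (ethanol and methanol) even though neither is linked to "alcohol" directly.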

What we’re going to show

Display data from dimensionality reduction
Selecting data points and sets of data points
“Narrowing down” a cluster of compounds based on its distribution in multiple dimensions
Exporting data
Using name and ontology information to select groups of points
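“Narrowing down” a cluster across several dimensions can be expressed as a conjunction of ranges, one per principal component, with the surviving points then exported. A Python sketch assuming range-style selections (OMPOL's own selection tools may work differently); it reuses the first three score columns from the PCA result slide with hypothetical molecule IDs, and `narrow` and `export_csv` are illustrative names.

```python
import csv
import io
from typing import Dict, List, Sequence, Tuple


def narrow(scores: Sequence[Sequence[float]],
           ranges: Dict[int, Tuple[float, float]]) -> List[int]:
    """Indices of points inside [lo, hi] on every constrained component."""
    return [i for i, s in enumerate(scores)
            if all(lo <= s[dim] <= hi for dim, (lo, hi) in ranges.items())]


def export_csv(ids: Sequence[str], scores: Sequence[Sequence[float]],
               selected: Sequence[int]) -> str:
    """Write the selected points and their scores as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    n_dims = len(scores[0])
    writer.writerow(["id"] + [f"PC{k + 1}" for k in range(n_dims)])
    for i in selected:
        writer.writerow([ids[i]] + list(scores[i]))
    return buf.getvalue()


# First three score columns from the PCA result slide.
scores = [
    [0.209, 0.078, -0.368], [0.030, 0.297, 0.174], [0.509, 0.005, 0.343],
    [0.514, -0.394, 0.172], [0.320, -0.034, -0.198], [0.228, 0.108, -0.791],
    [0.338, 0.812, 0.151], [0.403, -0.281, 0.003],
]
ids = [f"mol_{k + 1}" for k in range(len(scores))]  # hypothetical IDs

# Narrow step by step: constrain PC1, then add PC2, then add PC3.
step1 = narrow(scores, {0: (0.3, 0.6)})
step2 = narrow(scores, {0: (0.3, 0.6), 1: (-0.5, 0.0)})
step3 = narrow(scores, {0: (0.3, 0.6), 1: (-0.5, 0.0), 2: (0.0, 1.0)})
csv_text = export_csv(ids, scores, step3)
```

Each added constraint shrinks the selection (five points, then three, then two), and the final two rows are what ends up in the exported CSV.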

How scalable?

Works very nicely with ~50,000 data points and all features
During development, was able to work with 1M and on occasion 10M data points
(only in 2D, and without all features enabled)

Conclusion

Interacting with large (tens of thousands to millions of data points) multidimensional data sets is now a definite possibility

Thank you

Slides: http://www.slideshare.net/valerytkachenko16