Representation of chemical data in QSAR and crystallography Egon Willighagen, Lunteren 2005

Upload: egon-willighagen

Post on 27-Jan-2015

DESCRIPTION

My presentation at the Annual meeting NWO-CW section Analytical Chemistry, Lunteren, The Netherlands, 2005

TRANSCRIPT

Page 1: Representation of chemical data in QSAR and Crystallography

Representation of chemical data in QSAR and crystallography

–Egon Willighagen, Lunteren 2005

Page 2: Representation of chemical data in QSAR and Crystallography

Computer representation of molecular structures

•Connection Table

•Dietz Representation

•Physical Properties

•Molecular Invariants

•Schrödinger Equation

•Spectra (NMR, IR, ...)

Page 3: Representation of chemical data in QSAR and Crystallography

Computer representations of molecular structures

•Connection Table

•Dietz Representation

•Physical Properties

•Molecular Invariants

•Schrödinger Equation

•Spectra (NMR, IR, ...)

Page 4: Representation of chemical data in QSAR and Crystallography

Representing relations between descriptors

• Descriptor ontology: explicit definition of descriptor types and descriptor properties

• C. Steinbeck, C. Hoppe, S. Kuhn, M. Floris, R. Guha, E.L. Willighagen, Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics, Current Pharmaceutical Design, accepted

Page 5: Representation of chemical data in QSAR and Crystallography

Representation does make a difference

• Use of NMR spectra in Quantitative Structure-Activity Relationship (QSAR) modeling

• Three representations: simulated 1H NMR, 13C NMR spectra and theoretical descriptors

• Three data sets:

– water solubility of 431 compounds (WS)

– boiling points of 277 compounds (BP)

– LogP values of 154 compounds (LogP)

• E.L. Willighagen, H.M.G.W. Denissen, R. Wehrens, L.M.C. Buydens, On the use of 1H and 13C NMR spectra as QSAR descriptors, submitted

Page 6: Representation of chemical data in QSAR and Crystallography

How the experiment is performed...

• Partial Least Squares

– 220 NMR bins

– 220 randomly selected theoretical descriptors (Dragon)

– five random divisions into training and test sets

– number of latent variables chosen with leave-one-out cross-validation
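The protocol on this slide (PLS regression, five random training/test divisions, number of latent variables picked by leave-one-out cross-validation) can be sketched roughly as follows. This is a minimal standard-library illustration, not the code used for the study: the NIPALS PLS1 variant, the function names, the 25% test fraction and the maximum of five latent variables are all assumptions, since the slide does not state them.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_lv):
    """Fit a single-response PLS model (NIPALS). X is a list of rows."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    components = []
    for _ in range(n_lv):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]                              # weights
        t = [dot(row, w) for row in E]                         # scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(f, t) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x, n_lv):
    """Predict one sample; entry k-1 of the result uses k latent variables."""
    x_mean, y_mean, components = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat, predictions = y_mean, []
    for k in range(n_lv):
        if k < len(components):
            w, p, q = components[k]
            t = dot(e, w)
            y_hat += q * t
            e = [ej - t * pj for ej, pj in zip(e, p)]
        predictions.append(y_hat)
    return predictions


def choose_n_lv(X, y, max_lv):
    """Leave-one-out cross-validation: number of LVs with the lowest PRESS."""
    press = [0.0] * max_lv
    for i in range(len(X)):
        # One fit per left-out sample: NIPALS components are nested, so the
        # first k components give the k-component model's prediction.
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], max_lv)
        for k, y_hat in enumerate(pls1_predict(model, X[i], max_lv)):
            press[k] += (y[i] - y_hat) ** 2
    return press.index(min(press)) + 1


def evaluate(X, y, max_lv=5, n_splits=5, test_fraction=0.25, seed=0):
    """Return (n_lv, test-set RMSE) for each random training/test division."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))
    results = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        n_lv = choose_n_lv(X_train, y_train, max_lv)
        model = pls1_fit(X_train, y_train, n_lv)
        errors = [y[i] - pls1_predict(model, X[i], n_lv)[-1] for i in test]
        results.append((n_lv, math.sqrt(dot(errors, errors) / len(errors))))
    return results
```

With X holding the 220 bin intensities (or 220 descriptor values) per compound as a list of rows and y the measured property, evaluate(X, y) would return one (number of latent variables, test-set error) pair per division, so the representations can be compared on equal terms.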

Page 7: Representation of chemical data in QSAR and Crystallography

Number of latent variables

Page 8: Representation of chemical data in QSAR and Crystallography

1H and 13C NMR versus Dragon Descriptors

Page 9: Representation of chemical data in QSAR and Crystallography

Prediction Errors

Page 10: Representation of chemical data in QSAR and Crystallography

Model interpretation?

Page 11: Representation of chemical data in QSAR and Crystallography

What can we conclude?

• Representation has a large effect

• 1H NMR models have no predictive power

• 13C NMR models have some predictive power ... but no advantages

Page 12: Representation of chemical data in QSAR and Crystallography

Finding a representation for crystal structures

Page 13: Representation of chemical data in QSAR and Crystallography

Electronic Radial Distribution Function (ReDF)

• Describes patterns in atom interactions in and around the unit cell

• E.L. Willighagen, R. Wehrens, P. Verwer, R. de Gelder, L.M.C. Buydens, A Method for the Computational Comparison of Crystal Structures, Acta Cryst., 2005, B61, 29-36
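The slide does not spell out the formula behind the descriptor. As a rough illustration only, the sketch below assumes a radial distribution function in which every interatomic distance between an atom in the unit cell and an atom in the same or a neighbouring cell contributes a Gaussian peak weighted by the product of the two partial charges. The function name redf and the parameters r_max, n_bins, sigma and n_images are hypothetical, and the caller is assumed to choose n_images large enough to cover r_max.

```python
import math
from itertools import product


def redf(positions, charges, lattice, r_max=5.0, n_bins=50, sigma=0.1,
         n_images=1):
    """Charge-weighted radial distribution profile (illustrative sketch).

    positions: Cartesian coordinates (in Å) of the atoms in one unit cell
    charges:   partial atomic charges, one per atom
    lattice:   the three cell vectors as Cartesian triples (in Å)
    """
    step = r_max / n_bins
    grid = [(k + 0.5) * step for k in range(n_bins)]  # bin centres in Å
    profile = [0.0] * n_bins
    # Loop over the unit cell itself and its periodic images.
    for shift in product(range(-n_images, n_images + 1), repeat=3):
        offset = [sum(shift[a] * lattice[a][d] for a in range(3))
                  for d in range(3)]
        for i, (pi, qi) in enumerate(zip(positions, charges)):
            for j, (pj, qj) in enumerate(zip(positions, charges)):
                if i == j and shift == (0, 0, 0):
                    continue  # skip the zero distance of an atom to itself
                r = math.dist(pi, [pj[d] + offset[d] for d in range(3)])
                if r > r_max:
                    continue
                # Like charges add positive peaks, opposite charges negative.
                for k, x in enumerate(grid):
                    profile[k] += qi * qj * math.exp(
                        -((x - r) ** 2) / (2.0 * sigma ** 2))
    return grid, profile
```

Under these assumptions the profile carries the sign of the interaction: distances between oppositely charged atoms show up as negative peaks and distances between like charges as positive peaks, which is one way a single curve could encode "patterns in atom interactions".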

Page 14: Representation of chemical data in QSAR and Crystallography

Quantifying Similarities

Page 15: Representation of chemical data in QSAR and Crystallography

Quantifying Similarities

Diagram: ReDF 1 + ReDF 2 → Weighted Cross Correlation → Similarity [0,1]
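The step from two ReDF profiles to a single similarity value can be illustrated with a small sketch of a weighted cross correlation. It assumes both profiles are sampled on the same grid and uses a triangular weight over neighbouring shifts; the function names and the width parameter (in bins) are hypothetical, and the slide does not state the weighting function that was actually used.

```python
import math


def _weighted_overlap(a, b, width):
    """Sum of cross-correlation terms, weighted by a triangle over |shift| < width."""
    n = len(a)
    total = 0.0
    for shift in range(-width + 1, width):
        weight = 1.0 - abs(shift) / width
        lo, hi = max(0, -shift), min(n, n - shift)  # keep i + shift in range
        total += weight * sum(a[i] * b[i + shift] for i in range(lo, hi))
    return total


def wcc_similarity(f, g, width):
    """Weighted cross correlation of two equally sampled profiles.

    Returns 1 for identical profiles; for non-negative profiles the value
    lies in [0, 1].
    """
    if len(f) != len(g):
        raise ValueError("profiles must have the same length")
    norm = _weighted_overlap(f, f, width) * _weighted_overlap(g, g, width)
    if norm <= 0.0:
        raise ValueError("profiles must not be all zero")
    return _weighted_overlap(f, g, width) / math.sqrt(norm)
```

For example, two single peaks that sit one bin apart have a plain (width 1) correlation of zero, but a similarity of about 0.67 with width 3, so small peak shifts are penalised gradually rather than counted as a complete mismatch.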

Page 16: Representation of chemical data in QSAR and Crystallography

Cephalosporin crystal structures

Page 17: Representation of chemical data in QSAR and Crystallography

Polymorph Prediction

• Polymorphs: different crystal structures for the same molecular compound

• Polymorph Prediction: computational method to predict the polymorphs given a molecular structure

Page 18: Representation of chemical data in QSAR and Crystallography

Polymorphic estrone crystal structures

• Better trend in similarity going from identical to different structures:

[Figure panels: ReDF + WCC; Cerius2]

Page 19: Representation of chemical data in QSAR and Crystallography

Conclusions

• It is important to pick a proper representation

• 1H and 13C NMR spectra are not good representations for QSAR models

• A new crystal structure descriptor gives similarities that are chemically easier to interpret

Page 20: Representation of chemical data in QSAR and Crystallography

Acknowledgments

● Ron Wehrens, Lutgarde Buydens (supervisors)

● René de Gelder, Paul Verwer (crystal structures)

● Harm Denissen (QSAR)

● Peter Murray-Rust (Cambridge University, UK), Christoph Steinbeck (Cologne University, DE)

● NWO (for financial support)
