computational chemistry robots


Upload: university-of-cambridge

Post on 15-Dec-2014

DESCRIPTION

Describes how to design and implement a protocol for high-throughput computation.

TRANSCRIPT

Page 1: Computational Chemistry Robots

Computational Chemistry Robots

ACS Sep 2005

J. A. Townsend, P. Murray-Rust, S. M. Tyrrell, Y. Zhang

[email protected]

Page 2: Computational Chemistry Robots

• Can high-throughput computation provide a reliable “experimental” resource for molecular properties?

• Can protocols be automated?

• Can we believe the results?

Page 3: Computational Chemistry Robots

Aspects of complete automation

• Humans must validate protocols rather than individual data

• Low rates of error must be addressed

• Users should know the rates of error and degree of conformance

Page 4: Computational Chemistry Robots

Approaches to conformance

• Explore limits of job behaviour (times, convergence, etc.)

• Analyse reproducibility

• Vary and analyse effects of parameters and algorithms

• Compare output with other “measurements” of same quantity

Page 5: Computational Chemistry Robots

The overall view

[Diagram: molecules, computation, dissemination]

Page 6: Computational Chemistry Robots

The overall view

[Diagram: molecules, computation, dissemination, with an added “Check results” step]

Page 7: Computational Chemistry Robots

Components of System

• Workflow for management of jobs (Taverna)

• Natural Language Processing based parsing of outputs (JUMBOMarker)

• Pairwise comparison of data sets (R)

• Analysis of mean and variance

• Detection and analysis of outliers

Page 8: Computational Chemistry Robots

Computing the NCI database

MOPAC PM5 (a)

(a) MOPAC PM5: collaboration with J.J.P. Stewart

Page 9: Computational Chemistry Robots

Protocol

[Flow diagram; node labels: Log Files, Parse, System Crashes, Science Errors, Analysis, Pathological Behaviour, Statistics, Other Science, Disseminate Results, Unsuitable Data, Program Crashes, Inform Developer]

Page 10: Computational Chemistry Robots

Taverna

• Workflow programs allow a series of small tasks to be linked together to develop more complex tasks

• Open Source

• myGRID, eScience

• European Bioinformatics Institute

• University of Manchester
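
The idea of linking small tasks into a larger one can be sketched in a few lines of Python. This is only an illustration of the concept, not Taverna itself: `make_workflow` and the three task functions are hypothetical names, and the "calculation" is a stand-in string.

```python
from functools import reduce
from typing import Callable, Iterable

Task = Callable[[object], object]


def make_workflow(tasks: Iterable[Task]) -> Task:
    """Return one callable that feeds each task's output into the next."""
    task_list = list(tasks)

    def run(value: object) -> object:
        return reduce(lambda acc, task: task(acc), task_list, value)

    return run


# Hypothetical small tasks, standing in for real workflow steps.
def read_molecule(name: str) -> dict:
    return {"name": name}


def run_calculation(mol: dict) -> dict:
    # A real step would launch a program; here we only fake a log line.
    return {**mol, "log": f"JOB {mol['name']} FINISHED"}


def parse_log(job: dict) -> dict:
    return {**job, "ok": job["log"].endswith("FINISHED")}


pipeline = make_workflow([read_molecule, run_calculation, parse_log])
result = pipeline("divinyl ether")
print(result)
```

Each task only needs to agree with its neighbours on input and output, which is what lets a workflow tool rearrange or replace individual steps.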

Page 11: Computational Chemistry Robots

An Example Taverna Workflow

Page 12: Computational Chemistry Robots

Parsing Log Files to CML

[Diagram: Computational Chemistry Log Files parsed into Coordinates, Molecular Formula, Calculation Type, Point Group, Dipole, Total Energy]
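
A minimal sketch of pulling such properties out of a log file is given below. It uses plain regular expressions rather than the NLP-based approach of JUMBOMarker, the log text is made up (real program output is laid out differently), and the XML it writes is only CML-like, with illustrative element and attribute names that are not checked against the CML schema.

```python
import re
import xml.etree.ElementTree as ET

# Made-up log text for illustration; not the format of any real program.
SAMPLE_LOG = """\
CALCULATION TYPE: GEOMETRY OPTIMISATION
MOLECULAR FORMULA: C4 H6 O
POINT GROUP: Cs
DIPOLE: 1.234 DEBYE
TOTAL ENERGY: -987.654 EV
"""

# One pattern per property; the keys are illustrative names only.
PATTERNS = {
    "calculationType": re.compile(r"^CALCULATION TYPE:\s*(.+)$", re.M),
    "formula": re.compile(r"^MOLECULAR FORMULA:\s*(.+)$", re.M),
    "pointGroup": re.compile(r"^POINT GROUP:\s*(\S+)", re.M),
    "dipole": re.compile(r"^DIPOLE:\s*(-?\d+\.\d+)", re.M),
    "totalEnergy": re.compile(r"^TOTAL ENERGY:\s*(-?\d+\.\d+)", re.M),
}


def parse_log(text: str) -> dict:
    """Extract each property that matches; missing ones are simply absent."""
    found = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[key] = match.group(1).strip()
    return found


def to_xml(properties: dict) -> str:
    """Serialise the properties as simple XML (not schema-valid CML)."""
    root = ET.Element("molecule")
    for key, value in properties.items():
        prop = ET.SubElement(root, "property", {"title": key})
        prop.text = value
    return ET.tostring(root, encoding="unicode")


parsed = parse_log(SAMPLE_LOG)
xml_text = to_xml(parsed)
print(xml_text)
```

A property that fails to match is left out rather than guessed, so a later step can count how many logs parsed completely.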

Page 13: Computational Chemistry Robots

[Diagram: CompChem Output (Coordinates, Energy Levels, Vibrations) passes through Parsers into a CML File (Coordinates, Energy Level, Vibration); other labels: CMLCore, CMLComp, CMLSpect, Input/jobControl, General]

Page 14: Computational Chemistry Robots

Dissemination of results

[Diagram; labels: LOG FILE, CML FILE, HUMAN DISPLAY, WWMM* Server and DSpace, Outside world]

JUMBOMarker: NLP-based log file parser

* World Wide Molecular Matrix

Page 15: Computational Chemistry Robots

InChI: IUPAC International Chemical Identifier

A non-proprietary unique identifier for the representation of chemical structures.

A normalised, canonicalised and serialised form of a chemical connection table.

InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq/

Page 16: Computational Chemistry Robots

Proteus molecules*

[Structure diagrams; labels: Calculation, JUNK, Cured by MOPAC]

* Proteus was a shape-changing ocean deity

Page 17: Computational Chemistry Robots

Proteus molecules

Calculation

Input JUNK

Page 18: Computational Chemistry Robots

How do we know our results are valid?

[Diagram: comparison of Computational Method 1, Computational Method 2 and Experiment]

Page 19: Computational Chemistry Robots

J.J.P. Stewart’s example

Calculated Hf – Expt Hf

Page 20: Computational Chemistry Robots

GAMESS

[Diagram; labels: MOPAC results, GAMESS (a) at 6-31G*, B3LYP, Log Files]

(a) Project with Kim Baldridge and Wibke Sudholt

Page 21: Computational Chemistry Robots

Protocol

[Flow diagram; node labels: Log Files, Parse, System Crashes, Science Errors, Analysis, Pathological Behaviour, Statistics, Other Science, Disseminate Results, Unsuitable Data, Program Crashes, Inform Developer]

Page 22: Computational Chemistry Robots

Repeat runs, different methods

Multiple runs give same final structure from same input

Changing memory allocation doesn’t make a difference
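
One way such a reproducibility check could be automated is sketched below. It assumes each run yields Cartesian coordinates in Å with atoms in the same order and the same reference frame; the function names, the tolerance and the coordinates are illustrative, not taken from the slides.

```python
import math

Coords = list[tuple[float, float, float]]


def max_atom_shift(a: Coords, b: Coords) -> float:
    """Largest distance between corresponding atoms in two geometries."""
    if len(a) != len(b):
        raise ValueError("geometries have different numbers of atoms")
    return max(math.dist(p, q) for p, q in zip(a, b))


def runs_agree(runs: list[Coords], tol: float = 1e-3) -> bool:
    """True if every run matches the first to within tol (in Å)."""
    reference = runs[0]
    return all(max_atom_shift(reference, run) <= tol for run in runs[1:])


# Made-up two-atom geometries for illustration only.
run_a = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
run_b = [(0.0, 0.0, 0.0), (1.2001, 0.0, 0.0)]
run_c = [(0.0, 0.0, 0.0), (1.25, 0.0, 0.0)]

print(runs_agree([run_a, run_b]))
print(runs_agree([run_a, run_c]))
```

If runs may come back in different orientations, the geometries would need aligning first, or the comparison could be made on bond lengths instead.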

Page 23: Computational Chemistry Robots

Pathological behaviour - Early detection

[Figure: job times at 6-31G*, B3LYP for divinyl ether and trans-crotonaldehyde; times shown: 100 min, 200 min, 15 min, 10080 min; label: Z matrix]

Page 24: Computational Chemistry Robots

Times to run jobs

[Plot: time / s (axis 0 to 120,000) against (n basis functions)^4 (axis 0 to 1.E+09)]
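
If run time grows roughly as the fourth power of the number of basis functions, jobs that take far longer than that trend predicts can be flagged automatically. The sketch below assumes that scaling and uses the median of time / n^4 as the "typical" constant; the function name, the factor of 5 and the timings are all made up for illustration.

```python
from statistics import median


def flag_slow_jobs(jobs: dict[str, tuple[int, float]], factor: float = 5.0) -> list[str]:
    """jobs maps a job name to (number of basis functions, run time in s)."""
    ratios = {name: t / n**4 for name, (n, t) in jobs.items()}
    typical = median(ratios.values())
    # A job is suspicious when its time per n^4 is far above the typical value.
    return sorted(name for name, r in ratios.items() if r > factor * typical)


# Made-up timings for illustration only.
jobs = {
    "job_a": (50, 625.0),
    "job_b": (100, 10_000.0),
    "job_c": (80, 4_300.0),
    "job_d": (60, 60_000.0),  # runs far longer than its size suggests
}
slow = flag_slow_jobs(jobs)
print(slow)
```

The median is used instead of a least-squares slope so that one runaway job does not distort the baseline it is compared against.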

Page 25: Computational Chemistry Robots

Analysis of different computational methods

Mean - Overall difference

Normality - Distribution of values

Outliers - Unusual molecules?

Variance - Spread of the data, depends on both distributions (standard deviation)
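
These quantities can be computed from paired values of the same property. The slides use R for this; the sketch below is a Python equivalent under the assumption that outliers are differences more than three standard deviations from the mean, which is a common convention and not necessarily the authors' criterion. All numbers are made up.

```python
from statistics import mean, stdev


def compare_methods(a: list[float], b: list[float], n_sd: float = 3.0):
    """Return mean, standard deviation and outlier indices of a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    m = mean(diffs)
    sd = stdev(diffs)
    outliers = [i for i, d in enumerate(diffs) if abs(d - m) > n_sd * sd]
    return m, sd, outliers


# Made-up bond lengths in Å: 19 small differences and one large one.
pattern = [0.01, -0.01] * 9 + [0.0, 0.2]
method_a = [1.50 + d for d in pattern]
method_b = [1.50] * len(pattern)

m, sd, outliers = compare_methods(method_a, method_b)
print(round(m, 3), round(sd, 3), outliers)
```

With very small samples a single extreme value inflates the standard deviation enough to hide itself, so a check like this is only meaningful on reasonably large data sets.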

Page 26: Computational Chemistry Robots

Probability Plot (Normal QQ plot)

Page 27: Computational Chemistry Robots

Probability Plot (Normal QQ plot)

[Plot annotations: mean of distribution (approx. - 0.03 Å); range over which sample distribution is approximately normal; outliers; S.D. 0.020 Å]
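
The points of a normal QQ plot can be generated as sketched below: each sorted sample value is paired with the quantile a normal distribution of the same mean and standard deviation would give at that rank. The plotting position (i + 0.5) / n is one common choice and an assumption here; the sample values are made up.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample: list[float]) -> list[tuple[float, float]]:
    """Pair each sorted sample value with its theoretical normal quantile."""
    n = len(sample)
    normal = NormalDist(mean(sample), stdev(sample))
    ordered = sorted(sample)
    # (i + 0.5) / n keeps every probability strictly between 0 and 1.
    return [(normal.inv_cdf((i + 0.5) / n), value) for i, value in enumerate(ordered)]


# Made-up differences in Å for illustration only.
sample = [0.02, -0.01, 0.0, 0.01, -0.02]
points = qq_points(sample)
for theoretical, observed in points:
    print(f"{theoretical:+.4f}  {observed:+.4f}")
```

Points close to the line y = x indicate a nearly normal sample; values that peel away at either end are the candidates for outliers.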

Page 28: Computational Chemistry Robots

All bonds* r (MOPAC – GAMESS) / Å

* Excludes bonds to hydrogen

Page 29: Computational Chemistry Robots

All bonds* r (MOPAC – GAMESS) / Å

Good agreement

Nearly normal

Outliers

S.D. 0.005 Å

* Excludes bonds to hydrogen

Page 30: Computational Chemistry Robots

Bad molecules and data usually cause outliers

[Structure diagrams of example molecules; atom labels include N, O2-, Na, P, O, OH, H]

Page 31: Computational Chemistry Robots

Mean r (M - G) / Å (upper value in each row) and Standard Error of the Mean / Å (lower value)

      C        N        O        F        S        Cl
C    -0.006    0.020   -0.010   -0.014   -0.040   -0.037
      0.000    0.000    0.000    0.001    0.001    0.001
N              0.006   -0.037            -0.055
               0.001    0.001             0.009
O                      -0.087            -0.070
                        0.004             0.014

All values given to 3 decimal places
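
For reference, the standard error of the mean is conventionally defined as below. The slides do not state how it was computed, so this is the textbook definition, with n assumed to be the number of bonds of a given type and Δr_i the difference (M - G) for bond i.

```latex
\[
  \mathrm{SEM} = \frac{s}{\sqrt{n}},
  \qquad
  s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(\Delta r_i - \overline{\Delta r}\right)^{2}}
\]
```

Because the SEM shrinks as n grows, the very small values for bonds to carbon are consistent with those bond types being by far the most numerous.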

Page 32: Computational Chemistry Robots

r CC bonds (M - G) / Å

Page 33: Computational Chemistry Robots

r CC bonds (M - G) / Å

Good agreement

Nearly normal

Outliers

S.D. 0.013 Å

JUNK

Page 34: Computational Chemistry Robots

Selection of molecules with CC r (M - G) > 0.05 Å

[Structure diagrams; substituent labels include CF3, CHF2, OH, NH2, F, O, N, H]

Page 35: Computational Chemistry Robots

Non-aromatic CC bonds adjacent to CFn

[Plot with fitted line: Y = 0.0277 X – 0.0061]

Page 36: Computational Chemistry Robots

r NN bonds (M - G) / Å

Page 37: Computational Chemistry Robots

r NN bonds (M - G) / Å

Good agreement

Nearly normal

Kink

S.D. 0.022 Å

Page 38: Computational Chemistry Robots

Density plot of r NN bonds (M - G) / Å

Page 39: Computational Chemistry Robots

Density plot of r NN bonds (M - G) / Å

[Plot regions labelled LEFT and RIGHT]

Page 40: Computational Chemistry Robots

Most common fragments found in Left set but not Right set

[Fragment diagrams; atom labels include N, C(sp3), C(sp2), S(sp2), N(ar); alternatives joined by “Or”]
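
Finding fragments that occur in one set of molecules but not the other amounts to a set difference followed by a count. The sketch below assumes each molecule has already been reduced to a set of fragment labels; the function name and the label strings are hypothetical, and how fragments were actually generated is not described in the transcript.

```python
from collections import Counter


def fragments_only_in_left(left: list[set[str]], right: list[set[str]], top: int = 3):
    """Count fragments seen in left-set molecules but in no right-set molecule."""
    seen_right = set().union(*right)
    counts = Counter(frag for mol in left for frag in mol if frag not in seen_right)
    return counts.most_common(top)


# Made-up fragment labels for illustration only.
left_set = [{"N-N(ar)", "C(sp2)"}, {"N-N(ar)", "S(sp2)"}, {"N-C(sp3)"}]
right_set = [{"C(sp2)"}, {"N-C(sp3)", "C(sp2)"}]

common = fragments_only_in_left(left_set, right_set)
print(common)
```

Each fragment is counted once per molecule, so the result ranks fragments by how many left-set molecules contain them.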

Page 41: Computational Chemistry Robots

Comparison of theory and experiment

[Diagram; labels: GAMESS, Log Files, CIF* (shown five times), CIF 2 CML]

* CIF: Crystallographic Information File

Page 42: Computational Chemistry Robots

Reading Acta Crystallographica Section E

Page 43: Computational Chemistry Robots

All bonds* r (Cryst. – GAMESS) / Å

Single molecules, no disorder

* Excludes bonds to hydrogen

Page 44: Computational Chemistry Robots

All bonds* r (Cryst. – GAMESS) / Å

Single molecules, no disorder

Mean r - 0.011 Å

Nearly normal

Outliers

S.D. 0.014 Å

* Excludes bonds to hydrogen

Page 45: Computational Chemistry Robots

r CC bonds (C – G) / Å

Page 46: Computational Chemistry Robots

r CC bonds (C – G) / Å

Mean r - 0.01 Å

Nearly normal

S.D. 0.009 Å

Page 47: Computational Chemistry Robots

r CO bonds (C – G) / Å

Page 48: Computational Chemistry Robots

r CO bonds (C – G) / Å

Good agreement

Nearly normal

Outliers?

S.D. 0.011 Å

Page 49: Computational Chemistry Robots

r = +0.08 Å

Chemistry can cause outliers

H movement

Page 50: Computational Chemistry Robots

Conclusions

• Protocols can be automated

• Machines can highlight unusual behaviour, geometries and distribution of results for humans to consider

• Computational programs can provide high quality “experimental” molecular properties

Page 51: Computational Chemistry Robots

Thanks

J.J.P. Stewart

Kim Baldridge

Wibke Sudholt

Simon Tyrrell

Yong Zhang

Peter Murray-Rust

Unilever

Page 52: Computational Chemistry Robots

Questions

Homepage: http://wwmm.ch.cam.ac.uk

InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq

R: http://www.r-project.org

Taverna: http://taverna.sourceforge.net/

MOPAC 2002: http://www.cachesoftware.com/mopac/

GAMESS: http://www.msg.ameslab.gov/GAMESS/GAMESS.html