computational chemistry robots

Post on 15-Dec-2014

DESCRIPTION

Describes how to design and implement a protocol for high-throughput computation.

TRANSCRIPT

Computational Chemistry Robots

ACS Sep 2005

J. A. Townsend, P. Murray-Rust, S. M. Tyrrell, Y. Zhang

jat45@cam.ac.uk

• Can high-throughput computation provide a reliable “experimental” resource for molecular properties?

• Can protocols be automated?

• Can we believe the results?

Aspects of complete automation

• Humans must validate protocols rather than individual data

• Low rates of error must be addressed

• Users should know the rates of error and degree of conformance

Approaches to conformance

• Explore limits of job behaviour (times, convergence, etc.)

• Analyse reproducibility

• Vary and analyse effects of parameters and algorithms

• Compare output with other “measurements” of same quantity

The overall view

molecules → computation → dissemination

Check results

Components of System

• Workflow for management of jobs (Taverna)

• Natural Language Processing based parsing of outputs (JUMBOMarker)

• Pairwise comparison of data sets (R)

• Analysis of mean and variance

• Detection and analysis of outliers

Computing the NCI database

MOPAC PM5 (a)

(a) MOPAC PM5 – collaboration with J.J.P. Stewart

Protocol

[Flow diagram. Box labels: Log Files; Parse; System Crashes; Science Errors; Analysis; Pathological Behaviour; Statistics; Other Science; Disseminate Results; Unsuitable Data; Program Crashes; Inform Developer]

Taverna

• Workflow programs allow a series of small tasks to be linked together to develop more complex tasks

• Open Source

• myGRID, eScience

• European Bioinformatics Institute

• University of Manchester
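As a rough illustration of the workflow idea on this slide (not Taverna itself, which builds workflows in its own tooling), small tasks can be chained so that each task's output feeds the next. The task names `compute`, `parse` and `check`, and the log line format, are hypothetical stand-ins for the stages named in this talk.

```python
from functools import reduce
from typing import Callable, Iterable

Task = Callable[[dict], dict]


def run_workflow(tasks: Iterable[Task], job: dict) -> dict:
    """Apply each task in order, passing the job record along the chain."""
    return reduce(lambda record, task: task(record), tasks, job)


# Hypothetical tasks; real ones would launch programs and read files.
def compute(job: dict) -> dict:
    return {**job, "log": f"TOTAL ENERGY = -1.0 for {job['molecule']}"}


def parse(job: dict) -> dict:
    energy = float(job["log"].split("=")[1].split()[0])
    return {**job, "energy": energy}


def check(job: dict) -> dict:
    return {**job, "ok": job["energy"] < 0.0}


result = run_workflow([compute, parse, check], {"molecule": "ethanol"})
print(result["energy"], result["ok"])
```

Keeping every task to the same "record in, record out" shape is one way to let stages be reordered or replaced without touching the others.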

An Example Taverna Workflow

Parsing Log Files to CML

[Figure: annotated computational chemistry log files. Labelled fields: Coordinates; Molecular Formula; Calculation Type; Point Group; Dipole; Total Energy]
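A minimal sketch of the parsing step, assuming a made-up log layout: it pulls coordinates, dipole and total energy out of a log excerpt with regular expressions and writes them as simplified CML-style XML. The `SAMPLE_LOG` text, its numbers and the regular expressions are hypothetical; real MOPAC or GAMESS output is laid out differently, real CML uses a namespace and dictionary references, and this is not how JUMBOMarker works internally.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical log excerpt, invented for illustration only.
SAMPLE_LOG = """\
CALCULATION TYPE: PM5
POINT GROUP: C2v
DIPOLE = 1.85 DEBYE
TOTAL ENERGY = -348.56 EV
ATOM O 0.000 0.000 0.117
ATOM H 0.000 0.757 -0.469
ATOM H 0.000 -0.757 -0.469
"""

ATOM_RE = re.compile(r"^ATOM\s+(\w+)\s+(\S+)\s+(\S+)\s+(\S+)$", re.M)
SCALAR_RE = re.compile(r"^(DIPOLE|TOTAL ENERGY)\s*=\s*(\S+)\s+(\w+)$", re.M)


def log_to_cml(log_text: str) -> ET.Element:
    """Build a simplified CML-style <molecule> element from a log excerpt."""
    molecule = ET.Element("molecule")
    atom_array = ET.SubElement(molecule, "atomArray")
    for i, (element, x, y, z) in enumerate(ATOM_RE.findall(log_text), start=1):
        ET.SubElement(atom_array, "atom", id=f"a{i}", elementType=element,
                      x3=x, y3=y, z3=z)
    for name, value, units in SCALAR_RE.findall(log_text):
        prop = ET.SubElement(molecule, "property", title=name.lower())
        scalar = ET.SubElement(prop, "scalar", units=units.lower())
        scalar.text = value
    return molecule


cml = log_to_cml(SAMPLE_LOG)
print(ET.tostring(cml, encoding="unicode"))
```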

[Diagram: CompChem Output (Coordinates, Energy Levels, Vibrations) passes through Parsers to a CML File (Coordinates, Energy Level, Vibration). Other labels: CMLCore; CMLComp; CMLSpect; Input/jobControl; General]

Dissemination of results

[Diagram labels: LOG FILE; CML FILE; HUMAN DISPLAY; JUMBOMarker (NLP-based log file parser); WWMM* Server and DSpace; Outside world]

* World Wide Molecular Matrix

InChI: IUPAC International Chemical Identifier

A non-proprietary unique identifier for the representation of chemical structures.

A normalised, canonicalised and serialised form of a chemical connection table.

InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq/
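To make the "serialised" part concrete, the sketch below splits an InChI string into its slash-separated layers and uses the whole string as a lookup key. It is a simplification that assumes a formula layer is present and that each layer tag occurs once; generating an InChI from a structure needs the IUPAC software and is not attempted here. The ethanol string uses the later standard prefix `1S`, whereas identifiers from the time of this talk began `InChI=1/`. The helper name `split_inchi` is made up for this example.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into its version, formula and tagged layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *layers = inchi[len(prefix):].split("/")
    parsed = {"version": version, "formula": formula}
    for layer in layers:
        # Each later layer starts with a one-letter tag, e.g. c = connections.
        parsed[layer[0]] = layer[1:]
    return parsed


# Ethanol; the full string serves as a key for de-duplicating results.
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
layers = split_inchi(ethanol)
results_by_inchi = {ethanol: {"name": "ethanol"}}
print(layers["formula"], layers["c"], layers["h"])
```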

Proteus molecules*

[Figure labels: Calculation; JUNK; Cured by MOPAC]

* Proteus was a shape-changing ocean deity

Proteus molecules

[Figure labels: Calculation; Input; JUNK]

How do we know our results are valid?

[Diagram labels: Computational Method 1; Computational Method 2; Experiment]

J.J.P. Stewart’s example

Calculated Hf – Expt Hf

GAMESS

[Diagram labels: MOPAC results; GAMESS (a); 6-31G*, B3LYP; Log Files]

(a) Project with Kim Baldridge and Wibke Sudholt

Repeat runs, different methods

Multiple runs give same final structure from same input

Changing memory allocation doesn’t make a difference
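A reproducibility check of this kind could be sketched as below, assuming each run yields final Cartesian coordinates with the same atom order and the same frame of reference (no alignment is attempted). The function names, the tolerance of 0.001 and the example coordinates are all hypothetical.

```python
import math

Coords = list[tuple[float, float, float]]


def max_atom_shift(run_a: Coords, run_b: Coords) -> float:
    """Largest per-atom distance between two runs with the same atom order."""
    if len(run_a) != len(run_b):
        raise ValueError("runs have different numbers of atoms")
    return max(math.dist(a, b) for a, b in zip(run_a, run_b))


def is_reproducible(runs: list[Coords], tolerance: float = 1e-3) -> bool:
    """True if every repeat run matches the first within the tolerance."""
    reference = runs[0]
    return all(max_atom_shift(reference, run) <= tolerance for run in runs[1:])


# Invented two-atom geometries standing in for repeat runs of one input.
run_1 = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.10)]
run_2 = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.10)]
run_3 = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.11)]
print(is_reproducible([run_1, run_2]), is_reproducible([run_1, run_3]))
```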

Pathological behaviour - Early detection

100 min 6-31G*, B3LYP 200 min

15 min 6-31G*, B3LYP 10080 min

divinyl ether trans-crotonaldehyde

Z matrix

Times to run jobs

[Plot: time / s (axis 0 to 120,000) against (n basis functions)^4 (axis 0 to 1×10^9)]
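The plot relates run time to the fourth power of the number of basis functions. Assuming that relationship is roughly linear, a job that runs far longer than the trend predicts could be flagged for inspection, as in this sketch. The through-origin fit, the threshold `factor=3.0` and the example numbers are assumptions made for illustration, not values from the talk.

```python
def fit_through_origin(x: list[float], y: list[float]) -> float:
    """Least-squares slope k for the model y = k * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def flag_slow_jobs(n_basis: list[int], seconds: list[float],
                   factor: float = 3.0) -> list[int]:
    """Indices of jobs slower than `factor` times the fitted n**4 trend."""
    x = [float(n) ** 4 for n in n_basis]
    k = fit_through_origin(x, seconds)
    return [i for i, (xi, t) in enumerate(zip(x, seconds)) if t > factor * k * xi]


# Invented data: four jobs on the trend and one that ran far too long.
n_basis = [50, 80, 100, 120, 60]
seconds = [625.0, 4096.0, 10000.0, 20736.0, 50000.0]
print(flag_slow_jobs(n_basis, seconds))
```

Because the slow job also pulls the fitted slope upwards, a robust fit or a fit on previously validated jobs would be a safer baseline in practice.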

Analysis of different computational methods

Mean - Overall difference

Normality - Distribution of values

Outliers - Unusual molecules?

Variance (standard deviation) - Spread of the data, depends on both distributions.
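The quantities on this slide can be computed for a list of paired differences (method A minus method B) as in the sketch below. The talk used R for this analysis; Python is used here only for illustration, and the three-standard-deviation outlier cut-off and the sample values are assumptions.

```python
import statistics


def summarise(differences: list[float], z_cut: float = 3.0) -> dict:
    """Mean, sample standard deviation and outliers of paired differences."""
    mean = statistics.fmean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation
    outliers = [d for d in differences if abs(d - mean) > z_cut * sd]
    return {"mean": mean, "sd": sd, "outliers": outliers}


# Invented differences in angstroms: twenty ordinary values and one outlier.
differences = [-0.01] * 10 + [-0.02] * 10 + [0.20]
summary = summarise(differences)
print(summary["mean"], summary["sd"], summary["outliers"])
```

A fixed cut-off in standard deviations is itself distorted by the outliers it is looking for, which is one reason to inspect a probability plot as well.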

Probability Plot (Normal QQ plot)

Mean of distribution (Approx - 0.03 Å)

S.D. 0.020 Å

Range over which sample distribution is approximately normal

Outliers
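The points of a normal QQ plot can be generated as sketched below: each sorted sample value is paired with the standard-normal quantile at its plotting position. The plotting position (i - 0.5) / n is one common convention and an assumption here; other tools may use a slightly different formula. A sample that is close to normal gives points near a straight line, and outliers show up as points that leave the line at either end.

```python
from statistics import NormalDist


def normal_qq_points(sample: list[float]) -> list[tuple[float, float]]:
    """Pair each sorted value with its theoretical standard-normal quantile."""
    n = len(sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n; other conventions exist.
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(sample)))


# Invented differences in angstroms.
points = normal_qq_points([0.3, -0.1, 0.0])
for theoretical, observed in points:
    print(f"{theoretical:+.3f} {observed:+.3f}")
```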

All bonds* r (MOPAC – GAMESS) / Å

Good agreement

Nearly normal

Outliers

S.D. 0.005 Å

* Excludes bonds to hydrogen

Bad molecules and data usually cause outliers

[Chemical structure drawings; atom labels in the extracted text: N, N, O2-, Na, P, O, OH, H]

Mean r (M - G) / Å, with Standard Error of the Mean (SEM) / Å, where M = MOPAC and G = GAMESS

            C       N       O       F       S       Cl
C  mean  -0.006   0.020  -0.010  -0.014  -0.040  -0.037
   SEM    0.000   0.000   0.000   0.001   0.001   0.001
N  mean           0.006  -0.037          -0.055
   SEM            0.001   0.001           0.009
O  mean                  -0.087          -0.070
   SEM                    0.004           0.014

All values given to 3 decimal places
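A table of this shape can be produced by grouping bond-length differences by the pair of elements involved and computing the mean and standard error of the mean (SEM = sample standard deviation / √n) for each group. The sketch below assumes each record is a hypothetical `(element_1, element_2, difference)` tuple; the example values are invented and are not the data behind the table.

```python
import math
import statistics
from collections import defaultdict


def mean_and_sem_by_bond(records):
    """Group (element_1, element_2, difference) records and summarise each group."""
    groups = defaultdict(list)
    for el1, el2, difference in records:
        key = tuple(sorted((el1, el2)))  # treat C-N and N-C as the same bond type
        groups[key].append(difference)
    summary = {}
    for key, values in groups.items():
        if len(values) > 1:
            sem = statistics.stdev(values) / math.sqrt(len(values))
        else:
            sem = float("nan")  # SEM is undefined for a single value
        summary[key] = (statistics.fmean(values), sem)
    return summary


# Invented records for illustration.
records = [("C", "N", 0.02), ("N", "C", 0.04), ("C", "C", -0.01)]
table = mean_and_sem_by_bond(records)
print(table)
```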

r CC bonds (M - G) / Å

Good agreement

Nearly normal

Outliers

S.D. 0.013 Å

JUNK

Selection of molecules with CC r (M - G) > 0.05 Å

[Chemical structure drawings; substituent labels in the extracted text: CF3, CHF2, OH, NH2, F, N, O, H]

Non-aromatic CC bonds adjacent to CFn

[Plot with fitted line: Y = 0.0277 X – 0.0061]

r NN bonds (M - G) / Å

Good agreement

Nearly normal

Kink

S.D. 0.022 Å

Density plot of r NN bonds (M - G) / Å

[Plot regions labelled LEFT and RIGHT]

Most common fragments found in Left set but not Right set

[Fragment drawings; atom labels in the extracted text: N, C(sp3), S(sp2), N(ar), C(sp2); two alternative fragments are joined by "Or"]
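One way to find fragments that occur in the Left set but not the Right set is sketched below: molecules are split on a threshold in the bond-length difference, and fragments seen only on the left are counted. The record layout, the fragment label strings and the threshold value are hypothetical; how fragments were actually generated for the talk is not stated on the slide.

```python
from collections import Counter


def split_by_threshold(molecules, threshold):
    """Split per-molecule fragment lists into left (< threshold) and right sets."""
    left = [m["fragments"] for m in molecules if m["difference"] < threshold]
    right = [m["fragments"] for m in molecules if m["difference"] >= threshold]
    return left, right


def fragments_only_in_left(left, right):
    """Count fragments seen in the left set that never occur in the right set."""
    right_fragments = {frag for frags in right for frag in frags}
    counts = Counter(frag for frags in left for frag in frags
                     if frag not in right_fragments)
    return counts.most_common()


# Invented molecules with made-up fragment labels.
molecules = [
    {"difference": -0.06, "fragments": ["N(ar)-N(ar)", "C(sp2)"]},
    {"difference": -0.05, "fragments": ["N(ar)-N(ar)"]},
    {"difference": 0.01, "fragments": ["C(sp2)", "N-N"]},
]
left, right = split_by_threshold(molecules, threshold=-0.03)
print(fragments_only_in_left(left, right))
```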

Comparison of theory and experiment

[Diagram labels: GAMESS; Log Files; CIF* (five files); CIF 2 CML]

* CIF: Crystallographic Information File

Reading Acta Crystallographica Section E

All bonds* r (Cryst. – GAMESS) / Å

Single molecules, no disorder

Mean r - 0.011 Å

Nearly normal

Outliers

S.D. 0.014 Å

* Excludes bonds to hydrogen

r CC bonds (C – G) / Å

Mean r - 0.01 Å

Nearly normal

S.D. 0.009 Å

r CO bonds (C – G) / Å

Good agreement

Nearly normal

Outliers?

S.D. 0.011 Å

Chemistry can cause outliers

r = +0.08 Å

H movement

Conclusions

• Protocols can be automated

• Machines can highlight unusual behaviour, geometries and distribution of results for humans to consider

• Computational programs can provide high quality “experimental” molecular properties

Thanks

J.J.P. Stewart

Kim Baldridge

Wibke Sudholt

Simon Tyrrell

Yong Zhang

Peter Murray-Rust

Unilever

Questions

Homepage: http://wwmm.ch.cam.ac.uk

InChI FAQ: http://wwmm.ch.cam.ac.uk/inchifaq

R: http://www.r-project.org

Taverna: http://taverna.sourceforge.net/

MOPAC 2002: http://www.cachesoftware.com/mopac/

GAMESS: http://www.msg.ameslab.gov/GAMESS/GAMESS.html
