RM World 2014: Big data for predicting chemical behavior

RapidMiner World 2014 | Boston, MA USA | August 19, 2014 AMAKURO Services | www.amakuro.com


Page 1: RM World 2014: Big data for predicting chemical behavior


Page 2: RM World 2014: Big data for predicting chemical behavior


1 Introduction

1.1 Multimedia environmental models

1.2 Quantitative Structure-Activity Relationships (QSARs)

1.3 Big Data required

2 Predicting chemical behavior

2.1 Case study

2.2 Machine-learning methods

3 Concluding remarks

Page 3: RM World 2014: Big data for predicting chemical behavior

The Bhopal gas tragedy - methyl isocyanate - Bhopal, India, 1984

BP’s deepwater oil spill - oil - Gulf of Mexico, US, 2010

2011 Tōhoku earthquake & tsunami - nuclear material - Tōhoku, Japan, 2011

Page 4: RM World 2014: Big data for predicting chemical behavior

The problem:

“To predict the behavior of chemicals WHEN their properties and degradation data are unavailable or noisy.”

“Exposure Science: A View of the Past and Milestones for the Future.” Lioy PJ. Environmental Health Perspectives. Aug 2010. Vol. 118, Issue 8, p1081-1090.

Page 5: RM World 2014: Big data for predicting chemical behavior

Multimedia Environmental Models estimate the environmental distribution of chemical pollutants.

“Multimedia environmental models: the fugacity approach.” Mackay D. 2nd Edition. CRC Press. Feb 2001.
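As a rough illustration of what such a model computes, the simplest case in the fugacity approach (a closed system at equilibrium, usually called Level I) can be written as below. The symbols are introduced here for illustration and do not appear on the slides: M is the total amount of chemical (mol), V_i the volume of compartment i (m^3), Z_i its fugacity capacity (mol/(m^3·Pa)), f the common fugacity (Pa), and H the Henry's law constant (Pa·m^3/mol).

```latex
f = \frac{M}{\sum_j V_j Z_j}, \qquad
m_i = f \, V_i Z_i, \qquad
\phi_i = \frac{m_i}{M} = \frac{V_i Z_i}{\sum_j V_j Z_j},
\qquad
Z_{\mathrm{air}} = \frac{1}{RT}, \quad Z_{\mathrm{water}} = \frac{1}{H}
```

Here \phi_i is the fraction of the chemical found in compartment i. The physicochemical properties of the chemical enter through the Z values, which is why the model needs property data as input.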

Page 6: RM World 2014: Big data for predicting chemical behavior

Emission of chemical pollutants in air

“Multimedia environmental models: the fugacity approach.” Mackay D. 2nd Edition. CRC Press. Feb 2001.

Page 7: RM World 2014: Big data for predicting chemical behavior

Emission of chemical pollutants in water

“Multimedia environmental models: the fugacity approach.” Mackay D. 2nd Edition. CRC Press. Feb 2001.

Page 8: RM World 2014: Big data for predicting chemical behavior

“Multimedia environmental models: the fugacity approach.” Mackay D. 2nd Edition. CRC Press. Feb 2001.

Page 9: RM World 2014: Big data for predicting chemical behavior

“Multimedia environmental models: the fugacity approach.” Mackay D. 2nd Edition. CRC Press. Feb 2001.

Page 10: RM World 2014: Big data for predicting chemical behavior

In the EU, the REACH regulation:

a) requires the compilation of physicochemical and environmental data.

b) suggests the use of QSARs.

“…Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH)…” European Parliament and Council. Official Journal of the European Union. Dec 2006. L396.

Page 11: RM World 2014: Big data for predicting chemical behavior

“QSAR Modeling: Where Have You Been? Where Are You Going To?” Cherkasov A, et al. Journal of Medicinal Chemistry. 18 Dec 2013.

Page 12: RM World 2014: Big data for predicting chemical behavior

y = f(x1, x2, ..., xn)

[Diagram labels: attributes, targets]

“QSAR Modeling: Where Have You Been? Where Are You Going To?” Cherkasov A, et al. Journal of Medicinal Chemistry. 18 Dec 2013.

QSARs correlate chemical activity to molecular attributes.
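To make y = f(x1, x2, ..., xn) concrete, here is a minimal sketch of one possible f: a k-nearest-neighbour regressor that predicts a target as the mean target of the most similar training chemicals. The slides do not name a learning method at this point, so the choice of k-NN, the function name `knn_predict`, and the attribute values are all illustrative assumptions.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a target as the mean target of the k nearest training chemicals.

    train_x: list of attribute tuples (x1, ..., xn), one per chemical
    train_y: list of measured targets, aligned with train_x
    query:   attribute tuple of the chemical to predict
    """
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical, made-up attribute vectors and targets (not real chemical data).
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0]
prediction = knn_predict(train_x, train_y, query=(0.1, 0.1), k=3)
print(prediction)
```

In practice the attributes would usually be scaled before distances are computed, since raw molecular descriptors can differ by orders of magnitude.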

Page 13: RM World 2014: Big data for predicting chemical behavior

[Diagram labels: attributes, predictions]

y = f(x1, x2, ..., xn)

“Finding and estimating chemical property data for environmental assessment.” Boethling RS, et al. Environ. Toxicol. Chem. Oct 2004. Vol. 23, Issue 10.

“Estimating biodegradation half-lives for use in chemical screening.” Aronson D, et al. Chemosphere. June 2006. Vol. 63, Issue 11.

Both properties and degradation data are usually predicted via QSARs.

Page 14: RM World 2014: Big data for predicting chemical behavior

[Diagram labels: attributes, predictions + errors]

“Multimedia environmental chemical partitioning from molecular information.” Martínez I, et al. STOTEN. Dec 2010. Vol. 409, Issue 2, p412–422.

[Figure: MEM output (using predicted properties) versus MEM output (using experimental data)]

Errors in predicted properties and degradation data distort environmental assessments.
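A small numerical sketch can show how such an error travels through a model. It assumes the simplest fugacity calculation (closed air-water system at equilibrium) rather than the model used in the cited study. The compartment volumes, the Henry's law constant, and the size of the error are illustrative assumptions, not values from the slides.

```python
import math

R = 8.314  # gas constant, Pa·m^3/(mol·K)

def air_fraction(henry, temp_k=298.15, v_air=1e14, v_water=2e11):
    """Equilibrium mass fraction in air for a two-compartment air-water system.

    henry: Henry's law constant in Pa·m^3/mol.
    Volumes (m^3) are illustrative assumptions, not values from the slides.
    """
    capacity_air = v_air / (R * temp_k)  # V_air * Z_air, with Z_air = 1/(R*T)
    capacity_water = v_water / henry     # V_water * Z_water, with Z_water = 1/H
    return capacity_air / (capacity_air + capacity_water)

def log_fraction_shift(true_henry, log10_error):
    """Change in log10(air mass fraction) caused by an error in log10(H)."""
    predicted_henry = true_henry * 10 ** log10_error
    return (math.log10(air_fraction(predicted_henry))
            - math.log10(air_fraction(true_henry)))

# Hypothetical chemical with H = 0.001 Pa·m^3/mol and a one-log-unit QSAR error.
shift = log_fraction_shift(true_henry=1e-3, log10_error=1.0)
print(round(shift, 3))
```

Under these assumptions the chemical sits almost entirely in water, so an error in the predicted Henry's law constant passes nearly one-to-one into the logarithmic mass fraction in air.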

Page 15: RM World 2014: Big data for predicting chemical behavior

[Diagram labels: attributes, targets]

“Multimedia environmental chemical partitioning from molecular information.” Martínez I, et al. STOTEN. Dec 2010. Vol. 409, Issue 2, p412–422.

Alternatively, Multimedia Environmental Models can be emulated too.

Page 16: RM World 2014: Big data for predicting chemical behavior

“Quantitative structure fate relationships for multimedia environmental analysis.” Martínez I. PhD thesis, Rovira i Virgili University.

[Figure: logarithmic mass fractions in air for a geographical scenario]

Predictions are OK when relevant training data are available!

Page 17: RM World 2014: Big data for predicting chemical behavior

“Quantitative structure fate relationships for multimedia environmental analysis.” Martínez I. PhD thesis, Rovira i Virgili University.

Predictions are OK when relevant training data are available!

Page 18: RM World 2014: Big data for predicting chemical behavior

QSARs can be deceiving when used outside of their domain of applicability*

*set by training chemicals

“The importance of the domain of applicability in QSAR modeling.” Weaver S, Gleeson MP. Journal of Molecular Graphics & Modelling. June 2008. Vol. 26, Issue 8.
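One simple way to check whether a query chemical falls inside the domain set by the training chemicals is a per-attribute range (bounding-box) test, sketched below. This is only one of several approaches and is not necessarily the one discussed in the cited paper. The function names and attribute values are hypothetical.

```python
def fit_domain(train_x):
    """Return the (min, max) range of each attribute over the training chemicals."""
    return [(min(column), max(column)) for column in zip(*train_x)]

def in_domain(bounds, query):
    """True if every attribute of the query lies within the training range."""
    return all(low <= value <= high
               for (low, high), value in zip(bounds, query))

# Hypothetical attribute vectors (not real molecular descriptors).
training = [(1.0, 10.0), (2.0, 30.0), (3.0, 20.0)]
bounds = fit_domain(training)
inside = in_domain(bounds, (2.5, 15.0))   # both attributes within range
outside = in_domain(bounds, (4.0, 15.0))  # first attribute exceeds the range
print(inside, outside)
```

A range test is deliberately crude: it flags extrapolation along single attributes but not gaps inside the box, so predictions that pass it can still be unreliable.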

Page 19: RM World 2014: Big data for predicting chemical behavior

A FEW HUNDRED CHEMICALS have experimentally measured degradation data.

“Estimating biodegradation half-lives for use in chemical screening.” Aronson D, et al. Chemosphere. June 2006. Vol. 63, Issue 11.

A FEW THOUSAND CHEMICALS have experimentally measured physicochemical properties.

“Finding and estimating chemical property data for environmental assessment.” Boethling RS, et al. Environ. Toxicol. Chem. Oct 2004. Vol. 23, Issue 10.

50,000,000 CHEMICALS had been registered by 2009.

“A scientific milestone.” Toussant M. Chemical & Engineering News. 14 Sept 2009. Vol. 87, Issue 3, p3.

Page 20: RM World 2014: Big data for predicting chemical behavior

“CHEMICAL SPACE” diagram regions: unknown properties and degradation data; known properties; known degradation data; chemicals for which all properties and degradation data are known.

“Quantitative structure fate relationships for multimedia environmental analysis.” Martínez I. PhD thesis, Rovira i Virgili University.

Page 21: RM World 2014: Big data for predicting chemical behavior
Page 22: RM World 2014: Big data for predicting chemical behavior
Page 23: RM World 2014: Big data for predicting chemical behavior
Page 24: RM World 2014: Big data for predicting chemical behavior
Page 25: RM World 2014: Big data for predicting chemical behavior

IMPORTANT REMARKS

Multimedia Environmental Models and QSARs are approximations of nature.

They should be updated as new data become available.

“Wanted: Big Data for predicting chemical behavior.” Martínez I. RapidMiner World 2014. Boston, MA USA. August 19, 2014.

Page 26: RM World 2014: Big data for predicting chemical behavior

Using

it is possible to:

1) query chemical databases

2) train QSAR models on the fly

3) assess QSAR domains of applicability

4) select appropriate models
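The training and selection steps in this list can be sketched in plain Python as follows. This is not the tool or workflow shown in the talk. It assumes two toy candidate models (a mean predictor and a 1-nearest-neighbour predictor) and picks the one with the lower leave-one-out error. All names and data are hypothetical.

```python
import math
from statistics import fmean

def fit_mean(xs, ys):
    """Baseline model: always predict the mean training target."""
    average = fmean(ys)
    return lambda query: average

def fit_nearest(xs, ys):
    """1-nearest-neighbour model: predict the target of the closest chemical."""
    def predict(query):
        closest = min(range(len(xs)), key=lambda j: math.dist(xs[j], query))
        return ys[closest]
    return predict

def loo_mae(fit, xs, ys):
    """Leave-one-out mean absolute error of a model-fitting function."""
    errors = []
    for i in range(len(xs)):
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(abs(predict(xs[i]) - ys[i]))
    return fmean(errors)

def select_model(candidates, xs, ys):
    """Return the name of the candidate with the lowest error, plus all scores."""
    scores = {name: loo_mae(fit, xs, ys) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical one-attribute training set (not real chemical data).
xs = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
ys = [0.0, 1.0, 2.0, 3.0, 4.0]
best, scores = select_model({"mean": fit_mean, "nearest": fit_nearest}, xs, ys)
print(best, scores)
```

Because each candidate is refit inside the loop, the same function also covers training "on the fly" whenever a database query returns a new set of training chemicals.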

“Wanted: Big Data for predicting chemical behavior.” Martínez I. RapidMiner World 2014. Boston, MA USA. August 19, 2014.

Page 27: RM World 2014: Big data for predicting chemical behavior
