RapidMiner World 2014: Big Data for predicting chemical behavior
RapidMiner World 2014 | Boston, MA USA | August 19, 2014
AMAKURO Services | www.amakuro.com
1 Introduction
1.1 Multimedia environmental models
1.2 Quantitative Structure-Activity Relationships (QSARs)
1.3 Big Data required
2 Predicting chemical behavior
2.1 Case study
2.2 Machine-learning methods
3 Concluding remarks
The Bhopal gas tragedy
- methyl isocyanate -
Bhopal, India
1984
BP’s deepwater oil spill
- oil -
Gulf of Mexico, US
2010
2011 Tōhoku
earthquake & tsunami
- nuclear material -
Tōhoku, Japan
2011
The problem:
“To predict the behavior of chemicals
WHEN
their properties and degradation data
are unavailable or noisy”.
“Exposure Science: A View of the Past and Milestones for the Future.”
Lioy PJ.
Environmental Health Perspectives. Aug 2010. Vol. 118, Issue 8, p1081-1090
Multimedia Environmental Models
estimate
the environmental distribution
of chemical pollutants.
“Multimedia environmental models—the fugacity approach.”
Mackay D.
2nd Edition. CRC Press. Feb 2001.
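As an illustration of what such a model computes, here is a minimal sketch of an equilibrium (Level I) fugacity calculation for two compartments only, air and water. It is a simplified stand-in, not the model used in this talk. The function name `level1_mass_fractions`, the compartment volumes and the Henry's law constants are hypothetical values chosen for the example.

```python
R = 8.314  # gas constant, Pa·m³/(mol·K)


def level1_mass_fractions(henry, volumes, temperature=298.15):
    """Equilibrium split of one chemical between air and water.

    henry: Henry's law constant in Pa·m³/mol (assumed input)
    volumes: compartment volumes in m³, keyed by "air" and "water"
    """
    # Fugacity capacities Z in mol/(m³·Pa): 1/(RT) for air, 1/H for water
    z = {"air": 1.0 / (R * temperature), "water": 1.0 / henry}
    # At equilibrium all compartments share one fugacity, so the amount
    # held by each compartment is proportional to V * Z.
    capacity = {name: volumes[name] * z[name] for name in z}
    total = sum(capacity.values())
    return {name: capacity[name] / total for name in z}


# Hypothetical scenario: far more air than water
volumes = {"air": 1.0e9, "water": 1.0e6}
volatile = level1_mass_fractions(henry=500.0, volumes=volumes)
soluble = level1_mass_fractions(henry=0.01, volumes=volumes)
print("volatile chemical:", volatile)
print("soluble chemical:", soluble)
```

The only chemical-specific input here is the Henry's law constant, which is the kind of property that has to be measured or predicted before the model can run.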
emission
of chemical
pollutants
in air
emission
of chemical
pollutants
in water
In the EU, the REACH regulation:
a) requires the compilation of
physicochemical and
environmental data.
b) suggests the use of QSARs.
“…Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH)…”
European Parliament and Council.
Official Journal of the European Union. Dec 2006. L396.
“QSAR Modeling: Where Have You Been? Where Are You Going To?”
Cherkasov A, et al.
Journal of Medicinal Chemistry. 18 Dec 2013.
[Figure: attributes and targets related by y = f(x1, x2, ..., xn)]
QSARs correlate
chemical activity
to molecular
attributes.
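A minimal sketch of the idea, assuming the simplest possible form of f: a linear model fitted by ordinary least squares. The data are synthetic (random attributes and a made-up linear relationship), so the coefficients have no chemical meaning; real QSARs use measured activities and computed molecular descriptors, and often non-linear learners.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a training set: 50 chemicals, 3 molecular attributes
X = rng.normal(size=(50, 3))
true_coefs = np.array([1.5, -0.7, 0.3])  # made up for the example
y = 2.0 + X @ true_coefs + rng.normal(scale=0.05, size=50)

# Fit y = b0 + b1*x1 + b2*x2 + b3*x3 by ordinary least squares
A = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict(attributes):
    """Predict the target for one or more attribute vectors."""
    attributes = np.atleast_2d(attributes)
    return coefs[0] + attributes @ coefs[1:]


print("fitted coefficients:", coefs)
print("prediction for a new chemical:", predict([0.2, -1.0, 0.5]))
```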
[Figure: predictions obtained from attributes]
“Finding and estimating chemical property data for
environmental assessment.”
Boethling RS, et al.
Environ. Toxicol. Chem. Oct 2004. Vol. 23, Issue 10.
y = f(x1, x2, ..., xn)
“Estimating biodegradation half-lives for
use in chemical screening.”
Aronson D, et al.
Chemosphere. June 2006. Vol. 63, Issue 11
Both properties and
degradation data
are usually
predicted
via QSARs.
[Figure: predictions + errors obtained from attributes]
“Multimedia environmental chemical partitioning
from molecular information.”
Martínez I, et al.
STOTEN. Dec 2010. Vol. 409, Issue 2, p412–422.
[Figure: MEM output (using predicted properties) plotted against MEM output (using experimental data)]
Errors
in predicted
properties and degradation data
distort
environmental assessments.
Alternatively,
Multimedia Environmental
Models can be emulated
too.
“Quantitative structure fate relationships for multimedia environmental analysis.”
Martínez I.
PhD thesis, Rovira i Virgili University.
[Figure: logarithmic mass fractions in air for a geographical scenario]
Predictions
are
OK
when
relevant
training data
are
available!
QSARs
can be deceiving
when used
outside of their
domain of applicability*
*set by training chemicals
“The importance of the domain of applicability in QSAR modeling”
Weaver S, Gleeson MP.
Journal of Molecular Graphics & Modelling. June 2008. Vol. 26, Issue 8.
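One common way to check whether a chemical falls inside the domain set by the training chemicals is a leverage test. The sketch below assumes a linear model with an intercept and uses the frequently quoted warning threshold of 3(p + 1)/n, where p is the number of attributes and n the number of training chemicals. The training data are synthetic and the function name `leverage` is introduced here for illustration; other definitions of the domain (ranges, distances, densities) are equally possible.

```python
import numpy as np


def leverage(train_X, query_X):
    """Leverage of each query chemical with respect to the training set."""
    A = np.column_stack([np.ones(len(train_X)), train_X])
    Q = np.column_stack([np.ones(len(query_X)), query_X])
    inv = np.linalg.pinv(A.T @ A)
    # Row-wise quadratic form q^T (A^T A)^-1 q
    return np.einsum("ij,jk,ik->i", Q, inv, Q)


rng = np.random.default_rng(1)
train_X = rng.normal(size=(100, 2))  # synthetic training attributes

# Warning threshold: 3 * (number of attributes + 1) / number of chemicals
threshold = 3 * (train_X.shape[1] + 1) / len(train_X)

queries = np.array([
    [0.1, -0.2],  # similar to the training chemicals
    [8.0, 8.0],   # far from anything seen in training
])
inside = leverage(train_X, queries) <= threshold
print("inside domain of applicability:", inside)
```

A prediction for the second chemical would still be a number, which is exactly why it can be deceiving: nothing in the model output itself signals the extrapolation.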
FEW
HUNDRED
CHEMICALS
have
degradation data
experimentally measured.
“Estimating biodegradation half-lives for
use in chemical screening.”
Aronson D, et al.
Chemosphere. June 2006. Vol. 63, Issue 11
FEW
THOUSAND
CHEMICALS
have
physicochemical properties
experimentally measured.
“Finding and estimating chemical property data for
environmental assessment.”
Boethling RS, et al.
Environ. Toxicol. Chem. Oct 2004. Vol. 23, Issue 10.
50,000,000 CHEMICALS
had been
registered
by 2009
“A scientific milestone”
Toussant M.
Chemical & Engineering News. 14 Sept 2009. Vol. 87, Issue 3, p3.
chemicals
for which all
properties
and
degradation data
are known
“Quantitative structure fate relationships for multimedia environmental analysis.”
Martínez I.
PhD thesis, Rovira i Virgili University.
[Figure: “CHEMICAL SPACE” with regions labeled “unknown properties and degradation data”, “known properties” and “known degradation data”]
IMPORTANT REMARKS
Multimedia Environmental Models
and QSARs
are approximations of nature.
They should be updated as new data
become available.
“Wanted: Big Data for predicting chemical behavior”
Martínez I.
RapidMiner World 2014. Boston, MA USA. August 19, 2014
Using
it is possible to:
1) query chemical databases
2) train QSAR models on the fly
3) assess QSAR domains of applicability
4) select appropriate models
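The four steps above can be sketched in plain Python as follows. This is not the workflow shown in the talk: the "database" is a synthetic in-memory generator, the QSARs are ordinary least-squares fits, the domain check is a leverage test, and all names (`query_database`, `train`, `in_domain`, `select_and_predict`, `class_a`, `class_b`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)


# 1) "Query" a chemical database. A synthetic table stands in here:
#    two molecular attributes and one measured property per chemical.
def query_database(center, n=60):
    X = rng.normal(loc=center, scale=1.0, size=(n, 2))
    y = 1.0 + X @ np.array([0.8, -0.4]) + rng.normal(scale=0.05, size=n)
    return X, y


def design(X):
    """Add an intercept column to the attribute matrix."""
    return np.column_stack([np.ones(len(X)), X])


# 2) Train a QSAR on the fly (ordinary least squares).
def train(X, y):
    A = design(X)
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    rmse = float(np.sqrt(np.mean((A @ coefs - y) ** 2)))
    return {"coefs": coefs, "inv": np.linalg.pinv(A.T @ A),
            "threshold": 3 * A.shape[1] / len(A), "rmse": rmse}


# 3) Assess the domain of applicability with a leverage test.
def in_domain(model, x):
    q = design(np.atleast_2d(x))[0]
    return float(q @ model["inv"] @ q) <= model["threshold"]


# 4) Select the lowest-error model whose domain covers the query chemical.
def select_and_predict(models, x):
    usable = {k: m for k, m in models.items() if in_domain(m, x)}
    if not usable:
        return None, None  # no model is applicable: refuse to predict
    name = min(usable, key=lambda k: usable[k]["rmse"])
    q = design(np.atleast_2d(x))[0]
    return name, float(q @ usable[name]["coefs"])


models = {
    "class_a": train(*query_database(center=[0.0, 0.0])),
    "class_b": train(*query_database(center=[6.0, 6.0])),
}
chosen, value = select_and_predict(models, [5.8, 6.3])
rejected = select_and_predict(models, [30.0, -30.0])
print(chosen, value)
print(rejected)
```

Returning no prediction for a chemical outside every domain is a design choice made for this sketch; a real workflow might instead flag the prediction as unreliable.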