chapter open source tools for read across and category - ambit
AMBIT
AMBIT is open source software for chemoinformatics data management, developed with industry
funding via a CEFIC LRI project. The AMBIT2 software consists of a database and functional
modules that allow a variety of queries and data mining of the information stored in the database, and
is distributed under xxx licence. AMBIT XT is a user-friendly application with a graphical user
interface, based on AMBIT2 modules, and is distributed under the LGPL licence. AMBIT XT provides a set
of functionalities to facilitate the evaluation and registration of chemicals for REACH. AMBIT XT
introduces the concept of workflows, which guide users step by step towards a
particular goal, and provides workflows for analogue identification and PBT assessment. The
software is a standalone application, with an option to install the database on a server.
Modules
AMBIT is organised in several modules with well-defined dependencies; they are listed in Table 1.
Database
The AMBIT database is a relational database consisting of several repositories for compounds,
properties, QSAR models, users and references, as well as several tables containing pre-processed
information that speeds up substructure and similarity queries. The current
implementation is based on MySQL. Database functionality is provided by the ambit2-db module.
The database tables are listed in Table 2.
Chemical compounds
Chemical compounds are stored in the table chemicals and each is assigned a unique number. If
connectivity is available, a unique SMILES, an InChI and a molecular formula are generated and
stored. The database supports multiple 3D structures per compound, either coming from different
inventories or generated by external programs and imported into the database. The chemical
structures are stored in the table structure as compressed text; the supported formats are SDF,
MOL and CML. The choice of a text format makes the database transparent and easy to use from
external software. Multiple formats are supported in order to keep the data in its
original format. If the original format is not one of the above, it is converted to MOL.
Support for internal formats will be extended in future releases.
Data provenance
The database provides means to identify the origin of the data, i.e. the specific inventory a
compound originated from. An inventory is identified by its name and reference (table src_dataset).
Each compound may belong to multiple inventories (table struc_dataset), which allows users to
select the compounds of interest for specific regulatory purposes. Moreover, the data provenance
indicator can distinguish between different conformations, for example when one
conformation of a compound comes from one inventory and a different conformation comes from
another.
Updates of chemical structures are recorded, and earlier versions are kept in the history
table. When structures are imported from a file, they are stored in their original format in the structure
table. If a structure is subsequently updated as a result of a calculation (e.g. 3D
conversion) or another import step (e.g. an updated version of the original file), the new
version is stored and becomes the current one, while the previous version is moved to
the history table.
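The versioning rule described above can be illustrated with a small in-memory sketch. It is not the AMBIT implementation: the class StructureStore, its fields and the example records are hypothetical stand-ins for the structure and history tables, and compression is left out.

```python
from dataclasses import dataclass, field


@dataclass
class StructureStore:
    """Toy in-memory stand-in for the structure and history tables (hypothetical)."""
    current: dict = field(default_factory=dict)   # structure id -> (format, text)
    history: list = field(default_factory=list)   # (structure id, format, text), oldest first

    def save(self, structure_id, fmt, text):
        """Store a new version; the version it replaces moves to history."""
        if structure_id in self.current:
            old_fmt, old_text = self.current[structure_id]
            self.history.append((structure_id, old_fmt, old_text))
        self.current[structure_id] = (fmt, text)


store = StructureStore()
store.save(1, "SDF", "original record from file")      # first import
store.save(1, "MOL", "record after 3D conversion")     # later update
```

After the second call the MOL record is the current version and the SDF record sits in the history list.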
Quality Assurance
Discrepancies between the structures available in different chemical databases present a challenge for AMBIT
as a data integration platform. To raise awareness of possibly incorrect structures that
might be imported from external sources, AMBIT allows quality labels to be assigned to each chemical,
as follows:
Manual verification by expert(s). Any user can assign quality labels and explain the
reason of the assignment (table quality_structure). The reasons can include
discrepancies between registry numbers, names and structure, expert knowledge,
manual comparison with external sources, etc.
o 'OK' – The structure is correct
o 'ProbablyOK' – Most probably the structure is correct, but some issues still need
to be verified.
o 'Unknown' – Not possible to assign a definite label.
o 'ProbablyERROR' – Most probably there is an error
o 'ERROR' – The structure is definitely wrong.
Automatically verified, by comparing the structures available under the same chemical
compound entry (e.g. imported from different sources) – table quality_chemicals
o 'Consensus' – all structures under the same chemical compound entry are the
same
o 'Majority' – the majority of structures under the same chemical compound entry are
the same, but a small number of structures differ from the
majority
o 'Ambiguous' – There is no majority of equal structures under the same chemical
compound entry (e.g. structures come from 3 different sources and all the three
structures are different)
o 'Unconfirmed' – The structure comes from a single source and it is impossible to
make a comparison.
o 'Unknown' – No information about the structure (e.g. no connectivity)
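The automatic labels listed above can be sketched as a small function. This is an illustration only, under two assumptions that are not stated in the text: structures are compared through a canonical string (for example a unique SMILES), and 'Majority' means that strictly more than half of the structures agree. The function name consensus_label is hypothetical.

```python
from collections import Counter


def consensus_label(structures):
    """Return a quality label for the structures filed under one compound entry.

    `structures` holds one canonical string per source (assumed, e.g. a unique
    SMILES); None or an empty string marks a record without connectivity.
    """
    known = [s for s in structures if s]
    if not known:
        return "Unknown"          # no connectivity at all
    if len(known) == 1:
        return "Unconfirmed"      # a single source, nothing to compare with
    top_count = Counter(known).most_common(1)[0][1]
    if top_count == len(known):
        return "Consensus"        # all structures are the same
    if top_count > len(known) / 2:
        return "Majority"         # assumed threshold: strictly more than half
    return "Ambiguous"            # no majority of equal structures
```

For instance, three sources that all give different structures yield 'Ambiguous', which matches the example in the list above.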
Potential examples of QA in Ambit
Automatic comparison of chemical structures from different sources may reveal discrepancies between them, as illustrated below. The first example is the chemical with CAS 55-55-0: the structure provided in the set is incomplete.
In the second example, the structure with CAS 39236-46-9 provided in one of the datasets is erroneous: the ethyl substitution is on the wrong nitrogen in a ring.
Identifiers, Descriptors and Properties
The database schema is designed to provide unified storage for an arbitrary number of text properties (e.g.
registry numbers or names) and numerical properties (e.g. descriptors, experimental data). The
properties are not predefined, but are stored in the database on demand, i.e. the AMBIT database can
incorporate any number of chemical compounds, identifiers, descriptors and experimental data.
A property (table properties) is identified by a name and a reference, so that properties with
coinciding names but originating from different sources can be distinguished (e.g. LogP calculated
internally by different methods and LogP imported from an external file). Every newly added
property or descriptor is added to the properties table, with information about its
name, units, alias and reference. The reference for a property imported from a file is the name of
the file itself, while the reference for a descriptor contains the name of the software used for the
calculation. The alias usually contains a copy of the name, except when the property is
recognised as a specific type of registry number or a chemical name. In that case, the alias is assigned
a fixed value (e.g. CasRN or Names).
Fields with the same meaning but different names can be assigned the same alias in order to
facilitate queries (e.g. a species field given the same alias across all endpoints, so that it is possible to search by
species).
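A minimal sketch of this naming scheme is given below. It only illustrates the idea that name plus reference identifies a property while the alias groups fields with the same meaning. The class Property, the helper by_alias, the file names and the reference strings are all hypothetical and do not reproduce the database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """Hypothetical in-memory counterpart of one row of the properties table."""
    name: str
    reference: str      # file name, or the software that calculated the value
    units: str = ""
    alias: str = ""

    def key(self):
        """Name plus reference identifies a property."""
        return (self.name, self.reference)


def by_alias(properties, alias):
    """Select all properties sharing an alias, whatever their names."""
    return [p for p in properties if p.alias == alias]


props = [
    Property("LogP", "calculated by software X", alias="LogP"),   # made-up reference
    Property("LogP", "example_import.sdf", alias="LogP"),         # made-up file name
    Property("CAS", "example_import.sdf", alias="CasRN"),
    Property("CAS No.", "other_import.sdf", alias="CasRN"),
]
```

The two LogP entries stay distinct because their references differ, while a query on the alias CasRN returns both registry number fields despite their different names.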
The flat list of properties provides flexible storage, but presenting a long list of properties and
descriptors in the user interface might be confusing. Templates (tables template and template_def)
allow properties to be organized in groups (Table 3).
Templates themselves can be organized hierarchically, with the help of the table dictionary. The
database is distributed with a set of default templates, including the top level templates Endpoints,
Identifiers, Datasets and Descriptors, and a number of endpoints according to the ECHA endpoints
classification [1]. The convenience view ontology combines the templates with their hierarchical
organisation. An excerpt of this view is shown in Table 4.
The user interface navigator allows properties and templates to be viewed and organized in a convenient way.
By default, properties imported from a file with chemical compounds belong to the dataset of origin,
but can be moved to any user selected group.
Quality labels can also be assigned to any property value, stored in the database (table
quality_labels):
'OK' – The value assigned to property is correct
'ProbablyOK' – Most probably the value is correct, but some issues still need to be
verified.
'Unknown' – Not possible to assign a definite label.
'ProbablyERROR' – Most probably there is an error
'ERROR' – The value assigned to this property is definitely wrong.
Queries
The results of the searches a user performs are stored in the query and query_results tables. Besides
making it possible to record user actions, this enables browsing query results at a later time and
combining queries with arbitrary logic.
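Combining stored query results with arbitrary logic amounts to set operations on the stored hit lists. The sketch below shows this with plain Python sets. The query names, the compound identifiers and the function combine are made up for illustration, and the real database would express the same logic in SQL.

```python
# Hypothetical stored results: query name -> set of compound identifiers.
stored_results = {
    "similar_to_target": {1, 2, 3, 5, 8},
    "contains_fragment": {2, 3, 5, 13},
    "labelled_error": {5},
}


def combine(results, include_all=(), include_any=(), exclude=()):
    """Combine hit lists: AND over include_all, OR over include_any, NOT over exclude."""
    hits = None
    for name in include_all:
        hits = set(results[name]) if hits is None else hits & results[name]
    any_hits = set()
    for name in include_any:
        any_hits |= results[name]
    if include_any:
        hits = any_hits if hits is None else hits & any_hits
    if hits is None:
        hits = set()
    for name in exclude:
        hits = hits - results[name]
    return hits


good_analogues = combine(
    stored_results,
    include_all=["similar_to_target", "contains_fragment"],
    exclude=["labelled_error"],
)
```

Here good_analogues holds the compounds that are similar to the target and contain the fragment, minus those labelled as erroneous.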
Search methods
Exact structure, fixed sub-structures, similarity, SMARTS
The core substructure search functionality (graph isomorphism) is provided either by the
CDK cheminformatics library or by a faster algorithm implemented in AMBIT. Substructure
search is a computationally intensive (NP-hard; see footnote 1) problem, which means that the cost
of the algorithm increases rapidly with the size of the molecule. To speed up substructure
searching in large datasets, pre-calculated fingerprints are used to identify structures
that potentially contain the substructure. The AMBIT database and software combine this
technique with fast relational database queries, which results in very fast substructure
searching in large datasets. In addition, fingerprints are a standard tool for representing
chemical structures when assessing structural similarity, by calculating the Tanimoto coefficient
between two fingerprints.
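The pre-screening idea can be shown with fingerprints held as Python integers used as bit sets. This is a generic sketch of fingerprint screening, not AMBIT code: the function names, the 4-bit toy fingerprints and the stand-in exact matcher are assumptions. The screen relies on the fact that every bit set by the query substructure must also be set in any molecule that contains it.

```python
def may_contain(query_fp, candidate_fp):
    """Screen: every bit set in the query must also be set in the candidate."""
    return (query_fp & candidate_fp) == query_fp


def substructure_search(query_fp, database, exact_match):
    """Run the costly exact matcher only on candidates that pass the screen."""
    return [
        key for key, fp in database.items()
        if may_contain(query_fp, fp) and exact_match(key)
    ]


# Toy data: 4-bit fingerprints instead of 1024-bit ones.
database = {"mol_a": 0b1110, "mol_b": 0b0110, "mol_c": 0b1001}
checked = []


def toy_exact_match(key):
    checked.append(key)       # record which candidates reach the costly step
    return key == "mol_b"     # stand-in for a graph isomorphism test


hits = substructure_search(0b0110, database, toy_exact_match)
```

In this toy run mol_c is rejected by the screen alone, so the exact matcher is called only for mol_a and mol_b. A passed screen does not prove a match (mol_a passes but is rejected afterwards), which is why the exact test is still needed.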
Similarity
Fingerprint generation is based on the fingerprint implementation in the open source
cheminformatics library The Chemistry Development Kit (CDK) and follows the ideas of Daylight
fingerprint theory [2]: (1) for a given molecule, all possible paths up to a predefined length
(default is 7) are generated, (2) each path is submitted to a hash function, which uses it as a seed to a
pseudo-random generator, (3) the hash function outputs a set of bits, and (4) the set of bits thus
produced is added (with a logical OR) to the fingerprint. AMBIT uses 1024-bit fingerprints by default.
The Tanimoto coefficient is calculated as Tanimoto = NA∩NB/(NA+NB-NA∩NB), where NA is the number
of bits 'on' in fingerprint A, NB is the number of bits 'on' in fingerprint B and NA∩NB is the number
of bits 'on' in both fingerprints. Since the Tanimoto coefficient is a pair-wise measure, and here the
objective is to assess the similarity to a set of molecules, we generate a consensus fingerprint,
which is again a 1024-bit fingerprint where each bit is set.
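The four fingerprinting steps and the Tanimoto formula above can be sketched on a toy molecular graph. This is a simplified illustration, not the CDK or AMBIT implementation: molecules are given as a list of atom symbols plus a list of bonds, bond orders and aromaticity are ignored, a path and its reverse are hashed separately, and the use of MD5 to seed Python's random generator is an arbitrary choice made here for reproducibility.

```python
import hashlib
import random


def enumerate_paths(atoms, bonds, max_atoms=7):
    """Yield every simple path of up to max_atoms atoms as a string of symbols."""
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    def extend(path):
        yield "-".join(atoms[i] for i in path)
        if len(path) < max_atoms:
            for nxt in sorted(neighbours[path[-1]]):
                if nxt not in path:
                    yield from extend(path + [nxt])

    for start in range(len(atoms)):
        yield from extend([start])


def fingerprint(atoms, bonds, n_bits=1024, bits_per_path=1):
    """Hash each path to bit positions and OR them into one integer bit set."""
    fp = 0
    for path in enumerate_paths(atoms, bonds):
        # The path seeds a pseudo-random generator, which picks the bits to set.
        seed = int.from_bytes(hashlib.md5(path.encode()).digest()[:8], "big")
        rng = random.Random(seed)
        for _ in range(bits_per_path):
            fp |= 1 << rng.randrange(n_bits)
    return fp


def tanimoto(fp_a, fp_b):
    """Shared 'on' bits divided by 'on' bits in either fingerprint."""
    n_a, n_b = bin(fp_a).count("1"), bin(fp_b).count("1")
    n_ab = bin(fp_a & fp_b).count("1")
    union = n_a + n_b - n_ab
    return n_ab / union if union else 0.0


ethane = fingerprint(["C", "C"], [(0, 1)])
ethanol = fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
similarity = tanimoto(ethane, ethanol)
```

Because every path of ethane also occurs in ethanol, all bits of the ethane fingerprint are set in the ethanol fingerprint, which is the property that substructure pre-screening relies on.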
Footnote 1: NP stands for non-deterministic polynomial-time.
Atom environments (AE) can be regarded as fragments [3,4] surrounding each atom in a molecule, up
to a predefined level. The calculation procedure is as follows. First, the atom types to be included in the
generation of AEs are selected. We use 34 atom types, listed in Table 5, which are very similar to the
Sybyl atom types recommended by Bender et al. [4]. The choice is based on the
atom type parameterization available in the CDK library. Next, a vector of length (34*L+1) is constructed for each
atom, where L is the maximum level for generating atom environments and L=3 by default. Third, for
each atom, the neighbours at levels 1, 2 and 3 are identified and the corresponding counts are stored in the vector. An
example of a string representation of the result for a single atom (C.sp2) is:
C.sp2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1
Note that if there are several C.sp2 atoms with the same neighbours up to the 3rd level in the molecule,
they will have the same string representation. We will refer to this representation as a 'fragment'.
AEs can be compared by the Tanimoto coefficient (see above), where NA is the number of fragments in
molecule A, NB is the number of fragments in molecule B and NA∩NB is the number of common
fragments between the two molecules. Here, we take the average Tanimoto coefficient for the nearest
neighbours instead of defining a consensus fingerprint. For each molecule, the similarity measure is
the averaged Tanimoto coefficient between the molecule and its 5 nearest molecules.
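The fragment-based measure can be sketched as follows. The sketch assumes that each molecule is represented by the set of its distinct fragment strings, which is one reading of 'number of fragments' given that identical atoms share a string. The function names and the short placeholder fragment strings are hypothetical.

```python
def fragment_tanimoto(frags_a, frags_b):
    """Tanimoto coefficient on sets of fragment strings."""
    common = len(frags_a & frags_b)
    union = len(frags_a) + len(frags_b) - common
    return common / union if union else 0.0


def mean_nearest_similarity(query, reference_set, k=5):
    """Average the k highest Tanimoto values between the query and the reference molecules."""
    scores = sorted(
        (fragment_tanimoto(query, ref) for ref in reference_set),
        reverse=True,
    )
    top = scores[:k]
    return sum(top) / len(top) if top else 0.0


# Placeholder fragment strings stand in for the long atom environment strings.
query = {"f1", "f2"}
references = [{"f1", "f2"}, {"f1"}, {"f9"}]
score = mean_nearest_similarity(query, references, k=2)
```

With k=2 the two nearest reference molecules have coefficients 1.0 and 0.5, so the averaged similarity is 0.75.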
Table 5
Substructure search
The implementation of substructure searching allows only fixed structures (i.e. no wild cards for
atoms and bonds). The structure is drawn using a structure diagram editor (JChemPaint [5]) and
submitted as a hydrogen-depleted structure, which might present difficulties in distinguishing
certain types of functional groups (e.g. aldehydes vs. carbonyls, amines vs. nitro groups).
AMBIT also allows querying the database with the SMiles ARbitrary Target Specification (SMARTS)
language [6], accelerated with pre-processed information on structural features stored in the
database. The SMARTS specification was originally developed and is maintained by Daylight Inc. [7], but is
supported by many more commercial and open source software suites [8, 9, 10]. A list of predefined
functional groups and their SMARTS definitions is available, but formulating more complex queries
requires knowledge of SMARTS. The SMARTS line notation allows extremely precise and
transparent substructural specification and atom typing. SMARTS expressions for atoms and
bonds can be combined by logical operators to form more complex queries. Recursive
SMARTS allow detailed specification of an atom's environment. For example, the more
reactive (with respect to electrophilic aromatic substitution) ortho and para carbon atoms of
phenol can be defined as [$(c1c([OH])cccc1),$(c1ccc([OH])cc1)], and atoms that are in an
environment where the atom is connected to an aliphatic oxygen and where the atom is
connected to two sequential aliphatic carbons can be defined as [$(*O);$(*CC)].
Various queries on properties, inventories and quality labels, and combinations of them, are available. A
query can be restricted to search for compounds within a specified dataset or within the results of another query.
3D structure generation
The AMBIT module ambit2-smi23d integrates the open source 3D coordinate generator smi23d [11], which
generates an initial 3D structure from the connectivity matrix. The initial structure is further
optimized by OpenMOPAC 7.1 [12], which is embedded into the ambit2-mopac module.
Molecular descriptors
AMBIT provides facilities to calculate and store descriptors for all chemical structures in the database, as well as to specify search criteria based on descriptor values. The CDK library based descriptors [13] are shown in Table 6. Descriptors implemented in the AMBIT descriptors module are listed in Table 7.
Workflow engine
A workflow engine is a software application that manages and executes modelled business
processes. In general, the models can be edited by non-programmers using workflow editors.
Workflow models might be as simple as a series of sequential steps, but can also be complex, including
many conditions and loops. The Workflow Management Coalition provides standards for defining
workflows in an XML-based format [14].
Following the recognized importance of workflow support in AMBIT, a number of existing open
source workflow engines were evaluated for suitability to be embedded into the AMBIT XT application.
The final decision to embed micro-workflow is based on a trade-off between simplicity and
available functionality [15]. AMBIT XT is entirely based on the micro-workflow engine, which provides an
extensible platform for workflow-based wizards and facilitates the recording of user actions.
Workflow for analogue identification
The workflow consists of the following steps:
1. Definition of the starting structure or set of structures. The structure(s) can be defined as:
Identifiers (e.g. CAS, EINECS number, name).
Structure, represented as SMILES, MOL or SDF, drawn manually with the structure diagram
editor available in AMBIT, or drawn using externally installed ISIS/Draw software, copied
to the system clipboard and then pasted into the AMBIT user interface.
2. Basic analogue search, consisting of a similarity search (by default, hashed fingerprints compared by
the Tanimoto coefficient).
3. The results are displayed in the Structure browser. The user can decide to restrict the
forthcoming queries within the set of selected structures.
4. Substructure search by user-defined fragment.
5. The results are displayed in the Structure browser. The user can decide to restrict
forthcoming queries within the set of selected structures.
6. Further filtering of the results by conducting additional compound profiling based on
experimental and calculated data (LogKow, Dmax, other 2D and 3D descriptors chosen by
the user).
7. The results are displayed in the Structure browser. The user can decide to restrict the
forthcoming queries within the set of selected structures.
8. The selected structures are grouped into typical chemical classes or by clustering, allowing
the user to inspect small groups of analogues and reach a final decision on the query
compound(s).
9. The system proposes to calculate the final value by average, min-max or Euclidean distance to
user-selected properties.
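The last step can be illustrated with a short sketch. How AMBIT XT combines these options is not specified in the text, so the sketch shows only one possible reading: the analogue values are summarised by their average and min-max range, and the analogues are ranked by Euclidean distance to the query compound in the space of user-selected properties. The function names and all numbers are made up.

```python
import math
from statistics import mean


def read_across(values):
    """Summarise analogue values as an average and a min-max range."""
    return {"average": mean(values), "min": min(values), "max": max(values)}


def rank_by_distance(query_props, analogues):
    """Order analogue names by Euclidean distance to the query in property space."""
    return sorted(analogues, key=lambda name: math.dist(query_props, analogues[name]))


# Made-up endpoint values of three analogues and made-up property vectors.
summary = read_across([2.0, 3.0, 4.0])
ranking = rank_by_distance((1.0, 1.0), {"x": (4.0, 5.0), "y": (1.0, 2.0)})
```

In this toy case analogue y lies at distance 1 from the query and analogue x at distance 5, so y is ranked first.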
Workflow for REACH PBT and vPvB (Persistence, Bioaccumulation and Toxicity) assessment
REACH requires a PBT & vPvB assessment for every substance that is to be registered and is not
exempted, if the tonnage exceeds 10 tons/year. The REACH PBT & vPvB assessment workflow allows
a straightforward, user-friendly and quick assessment if the necessary information is
available. An important goal is to rapidly identify those REACH substances that are not PBT
or vPvB. In addition, substances identified as potential PBT or vPvB can immediately be
investigated in a higher tier assessment to find out what is necessary as a next step. Such
higher tier assessments are very often time-consuming and costly, and an ongoing PBT
assessment must not cause the strict registration deadlines to be missed. As
the assessment is done transparently and always in the same way, it allows a standardized
PBT & vPvB assessment throughout a company, independent of the personal judgment
of an assessor. Printing the result sheet, e.g. as a PDF file, allows proper documentation of
the PBT & vPvB assessment.
Only organic substances can be assessed. This workflow should not be applied to inorganic
or organometallic substances, polymers or mixtures. The PBT assessment is visually organized
in five pages: definition of the substance, persistence check, bioaccumulation check, toxicity check
and presentation of the final results.
Population of AMBIT database with data
The following datasets are imported and distributed with the AMBIT database:
1. EINECS list.
2. Bioconcentration factor dataset [16]
3. ECETOC Aquatic Toxicity data [17]
4. Local Lymph Node Assay (LLNA) data [18]
5. ECETOC Skin irritation data [19]
The data are imported using the standard data import functionality. The EINECS list is publicly
available at the ECB site and consists of 100204 chemicals [20]. Extensive verification of the EINECS structures
was performed in order to improve their reliability, based on comparison of structures with matching registry
numbers available from public sources. Quality labels have been assigned, as explained above.
The bioconcentration factor dataset is distributed without structural information, and chemical
compounds are identified only by CAS numbers and chemical names. Structures have been retrieved
from publicly available sources and imported into the database. Datasets 3-5 consist of a relatively small
number of compounds and presumably contain high quality structures, manually checked by experts
before being made publicly available.
WWW- REST services
Web services that make AMBIT functionality available to web applications are under development.
Similarity example:
Acknowledgements: AMBIT software was developed within the framework of CEFIC LRI project EEM-9
“Building blocks for a future (Q)SAR decision support system: databases, applicability domain,
similarity assessment and structure conversions” and extended under a subsequent CEFIC LRI contract
for developing AmbitXT.
Table 1. Modules
Module | Description
AmbitXT | GUI application
AmbitXT plugin: Database search and Analogue identification | AmbitXT plugin allowing various database queries and analogue identification
AmbitXT plugin: Category building | AmbitXT plugin for analogue identification
AmbitXT plugin: Database tools | AmbitXT plugin for database import and management
AmbitXT plugin: Database administration | AmbitXT plugin for database administration activities
AmbitXT plugin: REACH PBT assessment | AmbitXT plugin implementing a workflow for REACH compliant Persistence, Bioaccumulation and Toxicity (PBT) assessment
ambit2-base | Base classes, without cheminformatics functionality
ambit2-core | Core classes, with cheminformatics functionality
ambit2-hashcode | Hashcodes
ambit2-smarts | SMARTS parser
ambit2-db | Database functionality
ambit2-smi23d | Wrapper for Smi23d executables, http://www.chembiogrid.org/cheminfo/smi23d/
ambit2-mopac | Wrapper for OpenMopac
ambit2-ui | User interface
ambit2-dbui | Database user interface
ambit2-workflow | Workflow module
ambit2-namestructure | Chemical name to structure convertor, based on the OPSIN package, http://sourceforge.net/projects/oscar3-chem/files/
ambit2-model | Similarity calculation, feature selection and QSAR model development
ambit2-taglibs | JSP tags
Pubchem utilities | Pubchem access utilities
Ambit2 REST web services | Allows querying the AMBIT database via REST style web services
Table 2. Tables in the AMBIT2 database
Table | Description
Chemical structures
chemicals | Chemical compounds
structure | Chemical structures, conformers
history | Previous versions of chemical structures
Inventories
src_dataset | Datasets
struc_dataset | Lookup table for structures belonging to a dataset
Identifiers, Descriptors, Properties
catalog_references | References
properties | Property definition (name, reference, units)
property_values | Numerical property values or links to string values
property_string | String values
property_tuples | Tuples of properties
tuples | Tuples per dataset
template | Templates
template_def | Template definition (which properties belong to a template)
dictionary | Templates hierarchy
Queries
query | Queries
query_results | Structures per query
sessions | Sessions
Users support
user_roles | Roles assigned to users
roles | User roles
users | Users
Quality assessment support
quality_chemicals, quality_labels, quality_pair, quality_structure | Quality labels of structures and properties
Pre-processed data for substructure, similarity and SMARTS queries
fp1024, fp1024_struc | Pre-processed fingerprints for pre-screening and similarity search
sk1024 | Pre-processed fragments for accelerating SMARTS searches
atom_distance, atom_structure | Pre-processed data for atom environments similarity
Schema version
version | Database version
Table 3. Templates
Template | Relationship | Parent template
Endpoints | | (top level template)
Identifiers | | (top level template)
Dataset | | (top level template)
Descriptors | | (top level template)
Other | is_a | Endpoint
Ecotoxic effects | is_a | Endpoint
Toxicokinetics | is_a | Endpoint
Environmental fate parameters | is_a | Endpoint
Human health effects | is_a | Endpoint
Physicochemical effects | is_a | Endpoint
Short-term toxicity to algae (inhibition of the exponential growth rate) | is_a | Ecotoxic effects
Toxicity to birds | is_a | Ecotoxic effects
Direct photolysis | is_a | Environmental fate parameters
Oxidation | is_a | Environmental fate parameters
BAF fish | is_a | Bioaccumulation
BAF other organisms | is_a | Bioaccumulation
BCF fish | is_a | Bioconcentration
BCF other organisms | is_a | Bioconcentration
CAS number | is_a | Identifier
RSCBook_Skinsens_dataset.sdf | is_a | Dataset
org.openscience.cdk.qsar.descriptors.molecular.HBondAcceptorCountDescriptor | is_a | Descriptor
org.openscience.cdk.qsar.descriptors.molecular.HBondDonorCountDescriptor | is_a | Descriptor
Verhaar scheme | is_a | Descriptor
Table 4. An excerpt of the ontology view
Template | Relationship | Parent template
Endpoints | | (top level template)
Identifiers | | (top level template)
Dataset | | (top level template)
Descriptors | | (top level template)
Other | is_a | Endpoint
Ecotoxic effects | is_a | Endpoint
Toxicokinetics | is_a | Endpoint
Environmental fate parameters | is_a | Endpoint
Human health effects | is_a | Endpoint
Physicochemical effects | is_a | Endpoint
Short-term toxicity to algae (inhibition of the exponential growth rate) | is_a | Ecotoxic effects
Toxicity to birds | is_a | Ecotoxic effects
Direct photolysis | is_a | Environmental fate parameters
Oxidation | is_a | Environmental fate parameters
BAF fish | is_a | Bioaccumulation
BAF other organisms | is_a | Bioaccumulation
BCF fish | is_a | Bioconcentration
BCF other organisms | is_a | Bioconcentration
CAS number | is_a | Identifier
RSCBook_Skinsens_dataset.sdf | is_a | Dataset
org.openscience.cdk.qsar.descriptors.molecular.HBondAcceptorCountDescriptor | is_a | Descriptor
org.openscience.cdk.qsar.descriptors.molecular.HBondDonorCountDescriptor | is_a | Descriptor
Verhaar scheme | is_a | Descriptor
Table 5. Atom types used to generate atom environments
H C.default N.sp2 P3 F I
Hplus Cplus.sp2 Nplus P4 F- I-
Hminus Cminus.sp2 Nplus.sp3 S2 Cl Misc
C.sp3 Caromatic.sp2 O.sp2 S2- Cl-
C.sp2 Cminus Oplus S4 Br
C.sp N Ominus S Br-
Table 6. The CDK library based descriptors
ALogP and Molar Refractivity | Largest Chain
Atomic Polarizabilities | Largest Pi System
Amino Acids Count | Largest Aliphatic Chain
Aromatic Atoms Count | Moments of Inertia
Aromatic Bonds Count | Petitjean Number
Atom count | Petitjean Shape Indices
BCUT | Rotatable Bonds Count
Bond Polarizabilities | Lipinski's Rule of Five
Bond Count | Topological Polar Surface Area
Charged Partial Surface Area | Vertex adjacency information magnitude
Gravitational Index | WHIM
Hydrogen Bond Acceptors | Wiener Numbers
Hydrogen Bond Donors | XLogP
Kier and Hall kappa molecular shape indices | Zagreb Index
Table 7. AMBIT descriptors
Common functional groups | ToxTree classification schemes [21]:
pKa [22] | Cramer rules
Molecule Size (3D) | Extended Cramer rules
Molecular weight | Verhaar scheme
Electronic descriptors, calculated by OpenMopac: | Eye irritation rules
EHOMO | Skin irritation rules
ELUMO | Benigni/Bossa rulebase for mutagenicity and carcinogenicity
TOTAL ENERGY | Structural rules for Michael acceptors
FINAL HEAT OF FORMATION | Structure Alerts for the in vivo micronucleus assay in rodents
IONIZATION POTENTIAL |
ELECTRONIC ENERGY |
CORE-CORE REPULSION |
MOLECULAR WEIGHT |
References
[1] http://guidance.echa.europa.eu/docs/guidance_document/information_requirements_r6_en.pdf?%20vers=20_08_08, accessed on June 13, 2009.
[2] http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
[3] L. Xing, R.C. Glen. J. Chem. Inf. Comput. Sci. 42, 2002, p 796.
[4] A. Bender, H.Y. Mussa, R.C. Glen, S. Reiling. J. Chem. Inf. Comput. Sci. 44, 2004, p 170.
[5] http://sourceforge.net/apps/mediawiki/cdk/index.php?title=JChemPaint, accessed on June 13, 2009.
[6] Daylight SMARTS theory. http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html, accessed on June 8, 2008.
[7] Daylight Inc. http://www.daylight.com/, accessed on June 8, 2008.
[8] Open Babel SMARTS implementation. http://openbabel.sourceforge.net/wiki/SMARTS, accessed on June 8, 2008.
[9] The Chemistry Development Kit (CDK) SMARTS implementation. http://cdk.sourceforge.net/, accessed on June 8, 2008.
[10] JOELIB. http://www-ra.informatik.uni-tuebingen.de/software/joelib/, accessed on June 8, 2008.
[11] smi23d - 3D Coordinate Generation. http://www.chembiogrid.org/cheminfo/smi23d, accessed on June 8, 2008.
[12] OpenMopac 7.1. http://www.openmopac.net/, accessed on June 8, 2008.
[13] A subset of descriptors, listed at http://qsar.sourceforge.net/dicts/qsar-descriptors/index.xhtml, accessed on July 13, 2009.
[14] http://www.wfmc.org/standards/docs.htm, accessed on June 8, 2008.
[15] http://sourceforge.net/projects/micro-workflow/, accessed on June 14, 2009.
[16] Bioconcentration factor (BCF) Gold Standard Database. http://www.euras.be/eng/project.asp?ProjectId=92, accessed on June 8, 2008.
[17] ECETOC Aquatic Toxicity (EAT) Database. Supplement to ECETOC Aquatic Hazard Assessment II, Technical Report No. 91, European Centre for Ecotoxicology and Toxicology of Chemicals, Brussels, 2003.
[18] Gerberick GF, Ryan CA, Kern PS, Schlatter H, Dearman RJ, Kimber I, Patlewicz G, Basketter DA. Compilation of historical local lymph node assay data for the evaluation of skin sensitization alternatives. Dermatitis 16(4), 2005, pp 157-202.
[19] ECETOC Technical Report No. 66, Skin irritation and corrosion: Reference Chemicals data base, 1995.
[20] http://ecb.jrc.it/qsar/information-sources/, accessed on June 8, 2008.
[21] http://toxtree.sourceforge.net, accessed on July 13, 2009.
[22] Adam C. Lee, Jing-yu Yu, and Gordon M. Crippen. pKa Prediction of Monoprotic Small Molecules the SMARTS Way. J. Chem. Inf. Model. 48(10), 2008, pp 2042–2053.