

Endpoint       Alerts training set                             Global QSAR training set
               # Compounds (POS / NEG)   # Alerts (PPV range)  # Compounds (POS / NEG)   PPV / NPV*
Ames           9,697 (5,116 / 4,581)     50 (0.65-0.94)        2,825 (935 / 1,890)       0.72 / 0.78
ivtCA          1,928 (849 / 1,079)       13 (0.70-0.81)        1,389 (671 / 718)         0.67 / 0.62
ivvMN          556 (122 / 434)           7 (0.60-0.75)         540 (119 / 421)           0.60 / 0.65
Cleft palate   1,419 (83 / 1,336)        24 (0.03-0.78)        270 (81 / 189)            0.53 / 0.79
SkinSensHaz    602 (212 / 390)           5 (0.71-0.87)         602 (212 / 390)           0.67 / 0.69

Evaluation of the chemical inventories in the US FDA’s Office of Food Additive Safety for human health endpoints using a toxicity prediction system

Arvidson K (1), Rathman J (2,4), Volarath P (1), Mostrag A (2), Tarkhov A (3), Bienfait B (3), Vitcheva V (2), Yang C (2,3,4)

(1) U.S. FDA CFSAN, College Park, MD
(2) Altamira LLC, Columbus OH, USA
(3) Molecular Networks GmbH, Erlangen, Germany
(4) Ohio State University, Columbus OH, USA

Abstract #2467 / P533

U.S. FDA CFSAN INVENTORY

PREDICTION ACCURACY FOR BACTERIAL MUTAGENESIS MODEL

CONCLUSIONS AND FURTHER STEPS

VALIDATION OF BACTERIAL MUTAGENESIS MODEL

• Validation set: 115 common InChI keys (computational forms representing 157 CRS-IDs) with experimental data.

• Of the 2,372 CRS-IDs in the inventory, only 157 compounds (115 InChI keys) currently have data in CERES.

• The validation data came mostly from U.S. FDA PAFA “C” studies, which are not used in the QSAR training set.

• QSAR training set: 33% POS; validation set: 5% predicted POS (the model is not biased towards POS).

QSAR Training Set
• Subset of ToxGPS vetted for data quality and balanced structure space
• 2,825 InChI keys: 1,890 non-mutagenic and 935 mutagenic structures

ToxGPS Knowledgebase
• Large collection of public Ames data
• Over 10,000 test substances
• 7,722 InChI keys (computational form)

U.S. FDA CFSAN inventory
• 2,372 CRS-IDs
• 2,275 InChI keys

Common InChI keys with experimental data: 115

Only 20 structures were common to the QSAR training set and the U.S. FDA CFSAN test list. Most of these compounds were non-mutagens.
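The overlap counts above amount to set intersections over InChI keys. The following is a minimal sketch, assuming each collection is available as a set of InChIKey strings; the keys and the function name `overlap_report` are placeholders for illustration, not identifiers from CERES or ToxGPS.

```python
def overlap_report(training_keys, inventory_keys):
    """Return the keys shared by both sets and the share of the inventory they cover."""
    shared = training_keys & inventory_keys
    fraction = len(shared) / len(inventory_keys) if inventory_keys else 0.0
    return shared, fraction


# Hypothetical placeholder keys, not real InChIKeys.
training = {"KEY-A", "KEY-B", "KEY-C", "KEY-D"}
inventory = {"KEY-C", "KEY-D", "KEY-E", "KEY-F", "KEY-G"}

shared, fraction = overlap_report(training, inventory)
print(sorted(shared), f"{fraction:.0%} of the inventory")
```

A small overlap, as reported here, means the inventory largely tests the model on structures it was not trained on.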

2 false positives

            Exp. POS   Exp. NEG   Total
Pred. POS   6          2          8
Pred. NEG   3          145        147
Not Pred.   0          2          2

Quinoline

Michael acceptor
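The usual performance statistics can be recomputed from the confusion matrix above. The sketch below takes the four cells as printed; note that the printed Pred. NEG row total (147) differs from 3 + 145 = 148, so the results may not match the poster's rounded percentages exactly. How the two not-predicted compounds were treated is not stated, so both conventions are shown; counting them only in the denominators is an assumption.

```python
def classification_metrics(tp, fp, fn, tn, unpredicted_pos=0, unpredicted_neg=0):
    """Sensitivity, specificity and concordance from confusion-matrix cells.

    Unpredicted compounds, if supplied, are added to the denominators only,
    i.e. they are treated as not correctly classified (an assumption).
    """
    pos = tp + fn + unpredicted_pos   # all experimentally positive compounds
    neg = tn + fp + unpredicted_neg   # all experimentally negative compounds
    return {
        "sensitivity": tp / pos,
        "specificity": tn / neg,
        "concordance": (tp + tn) / (pos + neg),
    }


# Cell values as printed in the table above.
strict = classification_metrics(tp=6, fp=2, fn=3, tn=145)
with_unpredicted = classification_metrics(tp=6, fp=2, fn=3, tn=145, unpredicted_neg=2)

for label, result in (("predicted only", strict), ("incl. not predicted", with_unpredicted)):
    print(label, {name: round(value, 3) for name, value in result.items()})
```

With only 9 experimental positives, one additional hit or miss moves sensitivity by about 11 percentage points, which is why the estimate is fragile.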

ABSTRACT

The Chemical Evaluation and Risk Estimation System (CERES) at the U.S. FDA’s Office of Food Additive Safety (OFAS) has implemented the ChemTunes prediction system within its workflows for pre-market review and post-market monitoring of food ingredients and packaging materials. The present work demonstrates how post-market evaluation, based on a profiling analysis of the historical inventories, provides opportunities to enhance the current workflow by applying the advanced methods in the ChemTunes knowledgebase within CERES.

FDA PROGRAMS

OFAS operates under the U.S. FDA’s Center for Food Safety and Applied Nutrition (CFSAN) to ensure the safety of all food additives and ingredients used in the U.S. The office comprises three divisions:

(1) DPR (Division of Petition Review): Premarket reviews of direct food and color additive petitions

(2) DFCN (Division of Food Contact Substance Notification Review): Reviews of food contact substances or indirect food additives (e.g., food packaging)

(3) DBGNR (Division of Biotechnology and GRAS Notice Review): Consultations with industry on bioengineered food products and Generally Recognized as Safe (GRAS) substances

CERES

• CERES is a substance-centric database that houses OFAS’s food additive data and a cheminformatics platform for data analysis (Abstract #2617/P115) and toxicity prediction.

• ChemTunes models are implemented in CERES for genetic toxicity (Ames; in vitro chromosome aberrations (ivtCA); in vivo micronucleus (ivvMN)), tumorigenicity (mouse and rat), developmental toxicity (cleft palate), and skin sensitization.

CERES MODELS FOR TOXICITY PREDICTION

CERES ChemTunes models are based on the ToxGPS knowledgebase of in vivo and in vitro toxicity data compiled from regulatory, primary, and literature sources. Predictions combine chemotype alerts and mode-of-action (MoA) informed QSAR models through a quantitative weight-of-evidence (WOE) method. Prediction accuracies were validated against experimental data. The domains of applicability of the models and chemotype alerts were systematically addressed.

Structure selection
• Total structures: 10,884
• Removed IOM(s), natural products, mixtures, polymers, and other ill-defined compounds

Structure QC
• All structures reviewed by chemists using the CORINA CLEAN workflow in CORINA Symphony (Molecular Networks)
• Processing options: remove small fragments, neutralize, generate 3-D structure, flag duplicates for user decision

Final computational form used for predictions:
• 2,372 unique CRS-IDs
• 2,275 InChI keys
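The gap between 2,372 CRS-IDs and 2,275 InChI keys reflects several registry records sharing one computational structure. The following is a minimal sketch of flagging such duplicates for review, assuming records are available as (CRS-ID, InChIKey) pairs; the identifiers and function names are invented placeholders, and this is not the CORINA Symphony workflow itself.

```python
from collections import defaultdict


def group_by_key(records):
    """Map each InChIKey to the list of CRS-IDs that share it."""
    groups = defaultdict(list)
    for crs_id, inchikey in records:
        groups[inchikey].append(crs_id)
    return dict(groups)


def flag_duplicates(groups):
    """Keep only keys represented by more than one CRS-ID."""
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical placeholder records.
records = [
    ("CRS-001", "KEY-A"),
    ("CRS-002", "KEY-B"),
    ("CRS-003", "KEY-A"),  # same computational form as CRS-001
    ("CRS-004", "KEY-C"),
]

groups = group_by_key(records)
duplicates = flag_duplicates(groups)
print(len(records), "records ->", len(groups), "unique keys")
print("flagged for review:", duplicates)
```

Flagging rather than silently merging keeps the final decision with the reviewing chemist, in line with the "flag duplicates for user decision" option.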

Predicted endpoint profile of inventories

• Concordance: 95%
• Sensitivity: 67%
• Specificity: 97%

INVENTORY PROFILING OF CHEMICAL SPACE – PCA projection

Principal component (PC) projections based on the ToxPrint chemotype space (www.toxprint.org). High-loading ToxPrint chemotypes include:
• PC1: carboxylic esters; alcohols; sulfhydrides; sulfonates; alkanes: branched, cyclic, oxy-, and linear C2-C16; alkenes: branched; aromatic rings with heteroatoms
• PC2: carboxylic acids; alkanes: linear C2-C4 and aromatic; alkenes: linear; aromatic rings: benzene, phenyl
• PC3: Michael acceptors; aromatic amines; alkenes: linear and aromatic (styrene); aromatic rings: benzene, phenyl, biphenyl

PC projections based on the CORINA Symphony physicochemical property space. High-loading physicochemical properties include:
• PC1: number of atoms, bonds, and rotatable bonds; MW; complexity; McGowan volume; polarizability
• PC2: number of H-donors; TPSA; XLogP
• PC3: ring complexity
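A PCA projection of this kind can be sketched with NumPy. The example assumes each structure is encoded as a row of a descriptor matrix (for instance binary chemotype flags); the data here are random toy values and the function name `pca_project` is illustrative, so this shows the general technique rather than the CORINA Symphony implementation.

```python
import numpy as np


def pca_project(X, n_components=3):
    """Project the rows of X onto the leading principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # centre each descriptor column
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components]               # rows: PCs, columns: descriptors
    scores = Xc @ loadings.T                   # coordinates of each structure
    explained = (S ** 2) / np.sum(S ** 2)      # fraction of variance per PC
    return scores, loadings, explained[:n_components]


# Toy data: 50 hypothetical structures x 12 hypothetical binary chemotype flags.
rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, size=(50, 12))

scores, loadings, explained = pca_project(fingerprints)

# Descriptors with the largest absolute loading on PC1 ("high loading" features).
top_pc1 = np.argsort(-np.abs(loadings[0]))[:3]
print(scores.shape, explained.round(3), top_pc1)
```

Ranking descriptors by absolute loading, as in the last step, is one common way to arrive at lists of high-loading features per component.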

• The advanced methods in the ToxGPS knowledgebase within CERES are used to profile historical inventories.

• Fewer than 5% of the structures in the combined inventory are predicted positive in more than one mutagenicity or clastogenicity endpoint.

• Next step: prioritize harvesting of CFSAN regulatory submissions to increase the availability of toxicity data in CERES.

IOM: inorganic, organometallic, metal complexes and metals

Final counts and overlap

Food Ingredient Inventory                               # REC    # Structure
EAFUS: Everything Added to Food in the U.S.             3,968    2,443
FCS: Food Contact Substances                            1,155    391
FEMA: Flavor and Extract Manufacturers Association      2,758    1,742
GRAS: Generally Recognized As Safe                      572      40
INDIRECT: Indirect Food Additives                       3,237    1,790
PAFA: Priority-based Assessment of Food Additives       7,202    4,341
SCOGS: Select Committee on GRAS Substances              373      137
TOTAL                                                   19,265   10,884
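The TOTAL row can be checked against the individual inventories with a few lines of arithmetic. The sketch below uses the counts from the table above and assumes "# REC" is the number of records and "# Structure" the number of those records with a defined structure; the per-inventory ratio is an illustration, not a figure reported on the poster.

```python
# (records, structures) per inventory, copied from the table above.
inventory_counts = {
    "EAFUS": (3968, 2443),
    "FCS": (1155, 391),
    "FEMA": (2758, 1742),
    "GRAS": (572, 40),
    "INDIRECT": (3237, 1790),
    "PAFA": (7202, 4341),
    "SCOGS": (373, 137),
}

total_records = sum(rec for rec, _ in inventory_counts.values())
total_structures = sum(struct for _, struct in inventory_counts.values())
print("TOTAL", total_records, total_structures)

# Share of records with a structure, per inventory (assumed interpretation).
for name, (rec, struct) in inventory_counts.items():
    print(f"{name:9s} {struct / rec:.0%}")
```

The totals are plain sums across inventories, so a substance listed in more than one inventory is counted once per listing.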

Physicochemical properties ToxPrint chemotypes

bond:C(=O)O_carboxylicAcidEster_generic
bond:C=O_aldehyde_generic
bond:CC(=O)C_ketone_generic
bond:CN_amine_aromatic_generic
bond:CN_amine_generic
bond:CN_amine_pri-NH2_aromatic
bond:COC_ether_aliphatic
bond:COH_alcohol_aromatic_phenol
bond:COH_alcohol_generic
bond:CS_halide_aliphatic
bond:CX_halide_aromatic-X_generic
bond:CS_sulfide
bond:N=N_azo_generic
bond:NC=O_urea_generic
bond:S(=O)O_sulfonate
chain:alkaneLinear_dodedyl_C12 (>=12)
chain:alkaneLinear_hexadecyl_C16 (>=C16)
chain:alkaneLinear_octyl_C8
group:carbohydrate
ring:hetero_[5]_O_furan_oxolane
ring:hetero_[5]_Z_1_3-Z
ring:hetero_[6]_N_pyridine_generic
ring:hetero_[6]_O_pyran_generic

Disclaimer: References to ChemTunes or CORINA Symphony do not constitute endorsement by the U.S. FDA.

CHEMICAL SPACE COMPARISON OF MODEL AND FDA INVENTORY

• High specificity indicates accurate classification of negatives.

• Because of the low %POS in the validation set, sensitivity cannot be reliably estimated. However, the low number of false positives shows that the model is not biased towards POS, even though the %POS in the training set (33%) was higher than in the validation set.

Ratio of the % frequency

ToxPrint Name    QSAR set    FDA Inventory
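The column headers above suggest a per-chemotype comparison of how often each ToxPrint feature occurs in the QSAR training set and in the FDA inventory. The following is a sketch of that calculation, assuming each set is a list of per-structure chemotype-name sets; the structures are toy data, the function names are illustrative, and the direction of the ratio (QSAR set over FDA inventory) is an assumption.

```python
def percent_frequency(structures, chemotype):
    """Percentage of structures that contain the given chemotype."""
    hits = sum(1 for features in structures if chemotype in features)
    return 100.0 * hits / len(structures)


def frequency_ratio(set_a, set_b, chemotype):
    """Ratio of % frequency in set_a to % frequency in set_b (None if absent in set_b)."""
    freq_b = percent_frequency(set_b, chemotype)
    if freq_b == 0:
        return None
    return percent_frequency(set_a, chemotype) / freq_b


# Toy data: each structure is the set of chemotype names it matches.
qsar_set = [
    {"bond:N=N_azo_generic"},
    {"bond:COH_alcohol_generic"},
    {"bond:N=N_azo_generic", "bond:CS_sulfide"},
    set(),
]
fda_inventory = [
    {"bond:COH_alcohol_generic"},
    {"bond:COH_alcohol_generic"},
    {"bond:N=N_azo_generic"},
    set(),
    {"bond:C=O_aldehyde_generic"},
]

for name in ("bond:N=N_azo_generic", "bond:COH_alcohol_generic", "bond:CS_sulfide"):
    print(name, frequency_ratio(qsar_set, fda_inventory, name))
```

A ratio well above 1 would point to a chemotype that is over-represented in the training set relative to the inventory, and a ratio well below 1 to one the model has seen comparatively rarely.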