ARTICLE IN PRESS

Journal of Theoretical Biology 254 (2008) 308– 318

journal homepage: www.elsevier.com/locate/yjtbi

Assessing chronic liver toxicity based on relative gene expression data

Kedar Kulkarni a, Peter Larsen b, Andreas A. Linninger a,*

a Laboratory for Product and Process Design, Department of Chemical and Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA

b Core Genomics Laboratory, University of Illinois at Chicago, Chicago, IL 60607, USA

Article info

Article history:

Received 12 December 2007

Received in revised form 16 May 2008

Accepted 19 May 2008

Available online 25 July 2008

Keywords:

Toxicology

Gene interaction networks

Micro-array analysis

Problem inversion

0022-5193/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.

doi:10.1016/j.jtbi.2008.05.032

* Corresponding author. Tel.: +1 312 413 7743; fax: +312 413 7803.

E-mail address: [email protected] (A.A. Linninger).

Abstract

The risk associated with exposure to hepatotoxic drugs is difficult to quantify. Animal experiments to assess their chronic toxicological impact are time consuming. New quantitative approaches to correlate gene expression changes caused by drug exposure to chronic toxicity are required. This article proposes a mathematical model entitled Toxicologic Prediction Network (TPN) to assess chronic hepatotoxicity based on subchronic hepatic gene expression data in rats. A directed graph accounts for the interactions between the drugs, differentially expressed genes and chronic hepatotoxicity. A knowledge-based mathematical model estimates phenotypical exposure risk such as toxic hepatopathy, diffuse fatty change and hepatocellular adenoma for rats. The network's edges encoding the interaction strength are determined by solving an inversion problem that minimizes the difference between the observed and the predicted relative gene expressions as well as the chronic toxicity data. A realistic case study demonstrates how chronic health risk of three halogenated aromatic hydrocarbons can be inferred from subchronic gene expression data. The advantages of the TPN are further demonstrated through two novel applications: estimation of toxicological impact of new drugs and drug mixtures as well as rigorous determination of the optimal drug formulation to achieve maximum potency with minimum side-effects. Prediction of animal toxicity may be relevant for assessing risk for humans in the future.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Many therapeutic drugs for the treatment of severe illnesses have strong side-effects. Often only qualitative relationships between the drugs and the harmful effects are known. Experiments demonstrating the risk of chronic exposure to these drugs require several years for tumors or hepatotoxic effects to become evident. Although long-term animal testing could provide information about toxicity risk, this procedure for each and every drug is intractable due to time, money and animal welfare constraints. Moreover, chronic experimental toxicity studies yield statistics of animal responses establishing dose–toxicity relationships only after several years. On the other hand, genome-scale micro-array analysis can detect the toxicological impact on the gene expression of living cells even after short exposure. Differential gene expression can already be detected after several weeks of exposure. The alteration of relative gene expression visible in subchronic experiments precedes chronic toxicological effects of long-term drug exposure. This work proposes to relate subchronic relative gene expression to chronic hepatotoxicity.


Many factors make modeling and prediction of toxicity a very challenging problem. On the one hand, experimental and biological data are subject to assay variability and individual uncertainty in biological responses to toxicity. On the other hand, information on gene interactions and signaling pathways is often incomplete, thus making gene expression information difficult to incorporate consistently. Several approaches have been proposed to predict the toxicological effects of drugs. Holistic methods aim at identifying all relevant components in a toxicologically significant system. For example, genome-scale metabolic flux balance analysis can be used to analyze, interpret and predict cellular phenotypes from differential gene expression in response to drug exposure (Varma and Palsson, 1994a; Gombert and Nielsen, 2000; Covert et al., 2001). Despite successful reconstruction of entire genome-scale metabolic networks for simple systems like bacteria, fungi, yeast and animal cells, these methods require determination of a massive number of variables (Edwards and Palsson, 2000; Schilling et al., 2002; Forster et al., 2003; Oliveira et al., 2005; Sheikh et al., 2005). Reductionist methods relate toxic agents directly to their biological consequences without specifically determining all causal agents. Quantitative Structure–Activity Relationships (QSAR) identify statistically significant relationships between molecular structure and biological effects, but cannot explain gene interactions and signaling pathways (Simmon-Hettich et al., 2006; White et al., 2003; Katritzsky and Tatham, 2001; Barratt and Rodford, 2001; Visco et al., 2002; Weis et al., 2005). QSAR can also predict biological consequences of previously uncharacterized chemical structures. However, QSAR is hard to apply for very complex and large molecules.

Nomenclature

$B$: chronic toxicological effect (biological outcome)
$\bar{B}$: set of all chronic toxicological effects (biological outcomes)
$\bar{B}_d$: set of desired outcomes
$\bar{B}_u$: set of side-effects (undesired effects)
$d$: number of drug dosages used for inversion
$\bar{d}$: set containing numbers of drug dosages used for inversion
$D$: equivalent drug dosage
$\bar{D}$: set of all equivalent drug dosages
$m$: number of drugs in the optimal drug formulation
$n$: number of genes in the regulatory network for optimal drug formulation
$p$: number of biological outcomes in optimal drug formulation
$R$: gene expression change relative to the control state
$\bar{R}$: set of relative gene expression changes for all genes in the network

Greek symbols:

$\alpha$: biological risk parameter
$\beta$: drug potencies
$\nu$: gene–gene interaction strengths
$\rho$: maximum allowable risk for the undesired outcome
$\omega$: pre-determined weights for chronic toxicity risk in optimal drug formulation
$\Phi$: function used to quantify gene up- or down-regulation risk

Superscripts:

exp: experimental data

This article proposes a rigorous mathematical programming approach to correlate chronic toxicity of chemical substances to subchronic relative gene expression changes caused by exposure. The proposed methodology has the following features:

(i) quantification of toxicity risk as a function of a single drug dosage or a cocktail of several active substances administered orally to rats;
(ii) assessment of likely toxicity risk of new drugs with similar mode of action represented in identical gene expression profiles;
(iii) optimal drug formulation of drug mixtures to minimize undesirable effects.

The paper is organized as follows. Section 2 introduces the novel methodology of Toxicologic Prediction Networks (TPN). It also explains how the parameters of this knowledge-based model are obtained as a solution to a rigorous inversion problem. Section 3 demonstrates a successful application of TPN to predict chronic toxicity risks of hepatotoxic halogenated aromatic hydrocarbons (HAHs) in rats. Section 4 presents techniques to estimate the likely toxicological impact of a new drug or drug mixtures. Based on the trained TPN, a novel optimal drug mixing problem is formulated to determine the optimal composition of a drug cocktail with minimum side-effects, while maximizing drug potency. Section 5 closes the paper with conclusions and future work.

Fig. 1. Schematic representation of a toxicologic prediction network (TPN) composed of three layers: Layer I consists of drug nodes, Layer II contains the gene nodes, Layer III represents the biological outcomes. The interactions between the nodes are of three types: Type A represent drug–gene interactions, $\beta_{i,j}$, Type B are gene–gene interactions, $\nu_{i,j}$, and Type C refer to gene–biological outcome interactions, $\alpha_{i,j}$.

2. Methodology

This section introduces the novel modeling methodology entitled TPN. The TPN aims at estimating chronic biological outcomes as a function of drug dosage with the help of subchronic gene expression data. A rigorous mathematical programming approach will be proposed to determine the network parameters such as drug potencies, gene–gene interaction strengths and biological risk parameters.

2.1. The toxicologic prediction network (TPN)

Gene-regulatory networks represent genetic control mechanisms as directed graphs, in which genes are the nodes and the connecting edges signify regulatory interactions (Weaver et al., 1999). In the present study, we propose a novel TPN to relate drug dosage to the corresponding chronic toxicological effects via a gene-regulatory network as depicted in Fig. 1. The novel TPN has three layers:

Layer I (Drugs): This layer contains drug nodes representing the toxicologically equivalent drug dosages, $D_i \in \bar{D}$. Equivalent dosages are typically measured in daily quantities of drug administered orally to test animals. Specifically, this study will consider in vivo drug administration to rats. We will attempt to estimate toxicity risk given only drug dosage, without accounting for maximum or average plasma concentrations, $c_{\max}$, $c_{\text{average}}$, or area under the curve. Fig. 1 depicts two drugs altering the gene expression through direct action on enzymes, membranes or receptors.

Layer II (Genes): This layer incorporates a specific gene-regulatory network. These networks can be generated by several techniques. Biochemical analysis at the cellular level yields genome-scale expression flux pathways. Alternatively, lexicographic methods can be used to extract gene–gene interactions on the basis of their statistical significance from literature search engines (Draghici, 2003). The gene-regulatory network serves as an input to the TPN, and is therefore not a result of the proposed methodology. However, it will be shown that the TPN is able to detect incomplete gene-regulatory networks as discussed in Section 4.

Each node in this layer, $R_i \in \bar{R}$, represents changes in relative gene expression due to drug exposure of rats. We use subchronic micro-array data collected after 13 weeks of exposing female Harlan Sprague–Dawley rats to drugs (Vezina et al., 2004) as well as chronic data for the same drug dosages available from a 2-year study published by the National Toxicology Program (2006a–c). Gene nodes show the changes in gene expression after administering a certain drug dose relative to gene expression of control animals that received no drug. A relative gene expression of unity means that the drug administration causes no change. Values below and above unity signify under- and over-expression of the genes, respectively. Pavlidis template matching (PTM) with $r^2 = 0.9$ was used to identify genes co-expressed in livers of rats exposed to the toxins. The time course of gene expression was not accounted for, since all gene expression data were collected after a fixed amount of time. The inclusion of time-dependent experimental values was limited by the availability of experimental data. The methodology can be modified to incorporate time-dependent gene expression data. Toxicity modeling of dynamic gene expression trajectories can be addressed with dynamic inversion techniques as described in Tang et al. (2005), but is beyond the scope of the current work. Fig. 1 depicts a schematic of gene interaction between eight significant genes.

Layer III (Chronic toxicological effects): The third layer contains the chronic biological outcome nodes, $B_i \in \bar{B}$. Their values measure the fraction of the rat population affected with a specific pathological condition. Biological outcome nodes take scalar values between zero and unity representing the fraction of affected rats. The example of Fig. 1 depicts two chronic biological effects influenced by the terminal genes 8, 9 and 10.

2.2. Network connectivity and information

The edges joining the nodes represent interaction strength between the nodes. These interactions are of three types: (i) Drug–gene interactions signify the quantitative impact of the toxicologically equivalent drug dosage on the relative gene expression. (ii) Gene interactions measure the differential gene expression change of genes that depend on control genes or belong to the signaling pathway. The control genes are receptors that display structural affinity for the toxic ligands. (iii) Gene–biological outcome interactions quantify the effect of the terminal genes on the chronic toxicity.

The aim of the proposed TPN is to determine the network edge strengths with data from different sources and time scales. Thus, the TPN aims at quantifying the gene dependency and signaling pathways responsible for hepatotoxicity. The heterogeneity of information necessitates a set of three logical connectivity rules, described next:

Rule 1. Drug–gene interactions: In the top layer, control genes with structural affinity to the ligands respond to drug dosages. For the example in Fig. 1, genes 3 and 4 are affected by drugs 1 and 2. Their relative expression, $R_j$, depends on drug dosage, $D_i$, and the drug potency parameter, $\beta_{i,j}$, as shown in

$$R_j = \prod_i (1 + D_i)^{\beta_{i,j}} \quad \forall R_j \ \text{linked\_to}\ D_i \tag{1}$$

This expression safeguards the positivity of $R_j$ as well as a value of unity for the control dosage, $D_i = 0$, meaning that no drug is administered to control animals. The unknown drug potency appears in the exponent, making the TPN a nonlinear function of dosage, in general.
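Rule 1 can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `entry_level_expression` and the potency values are hypothetical and are not fitted parameters from this study.

```python
import math

def entry_level_expression(doses, potencies):
    """Eq. (1): R_j = prod_i (1 + D_i) ** beta_ij over the drugs linked to gene j."""
    if len(doses) != len(potencies):
        raise ValueError("one potency per linked drug is required")
    return math.prod((1.0 + d) ** b for d, b in zip(doses, potencies))

# Hypothetical potencies for a gene linked to two drugs (illustration only).
print(entry_level_expression([0.0, 0.0], [0.8, -0.3]))  # control state -> 1.0
print(entry_level_expression([1.0, 0.0], [0.8, -0.3]))  # 2 ** 0.8, i.e. over-expression
```

A positive potency yields over-expression and a negative potency yields under-expression, while the result stays positive for any non-negative dosage.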

Rule 2. Gene interactions: Most genes belong to the gene-regulatory cascade in which their expression levels are controlled by upstream regulatory genes. The relative expression of these nodes varies as a function of the differential expression of the regulatory genes controlling them. For example, in Fig. 1, genes 7, 8, 9 and 10 belong to the signaling pathway of control gene 4. The gain of the regulatory genes is postulated to be additive and linear. Regulatory genes can either up- or down-regulate gene expression, depending on the sign of the gene–gene interaction parameter, $\nu_{i,j}$. The gene–gene interaction is modeled by evaluating relative expressions of the target genes as linear combinations of relative expressions of the regulating genes. The gene–gene interactions, $\nu_{i,j}$, are described mathematically in

$$R_j = \sum_i \nu_{i,j} R_i \quad \forall R_j \ \text{linked\_to}\ R_i \tag{2}$$

Mixed drug and gene interactions: The relative expression of some genes may also be receptive to both drug dosage and regulatory genes. The relative expressions of these hybrid genes are functions of both drug dosage and expression level of the upstream regulatory genes, as for example for gene 5 of Fig. 1. Thus, Eq. (3) models these mixed dependencies as a combination of drug dosage and gene regulation

$$R_i = \prod_j (1 + D_j)^{\beta_{j,i}} \sum_k \nu_{k,i} R_k \quad \forall R_i : (R_i \ \text{linked\_to}\ D_j) \wedge (R_i \ \text{linked\_to}\ R_k) \tag{3}$$
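The additive gene–gene rule and the hybrid rule can be sketched in the same way. The helper names `regulated_expression` and `hybrid_expression` are assumptions made for this illustration, and the numbers in the usage lines are arbitrary.

```python
def regulated_expression(weights, upstream):
    """Eq. (2): R_j = sum_i nu_ij * R_i over the regulating genes of gene j."""
    return sum(v * r for v, r in zip(weights, upstream))

def hybrid_expression(doses, potencies, weights, upstream):
    """Eq. (3): drug factor prod_j (1 + D_j) ** beta_ji times sum_k nu_ki * R_k."""
    drug_factor = 1.0
    for d, b in zip(doses, potencies):
        drug_factor *= (1.0 + d) ** b
    return drug_factor * regulated_expression(weights, upstream)

# Arbitrary illustrative values: two regulators with weights summing to one.
print(regulated_expression([0.25, 0.75], [1.0, 1.0]))            # control state -> 1.0
print(hybrid_expression([1.0], [1.0], [0.5, 0.5], [2.0, 4.0]))   # 2 * 3 -> 6.0
```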

Rule 3. Gene–biological outcome interactions: Chronic toxicity is reported as incidences of test animals affected by various neoplastic and non-neoplastic responses. By subchronic exposure we mean drug exposure of rats over a period of 13 weeks. Typically, chronic toxicity measures the percentage of rats affected with specific toxicological outcomes after 2 years. The long-term effects imply systemic toxicity because the entire organism of the test rats is affected. These fractions can also be interpreted as the risk associated with a drug at the current dosage. The biological exposure risk is causally correlated to over- or under-expression of differentially expressed hepatic genes. We propose to use the risk function in Eq. (4) to quantify the risk associated with up- or down-regulation of a connected gene:

$$\Phi(R_j, \alpha) = \begin{cases} R_j^{\alpha}, & 0 < R_j \le 1,\ \alpha \ge 0 \quad \text{(gene } j \text{ down-regulated)} \\ R_j^{-\alpha}, & R_j > 1,\ \alpha \ge 0 \quad \text{(gene } j \text{ up-regulated)} \end{cases} \tag{4}$$

The input, $R_j$, is the relative expression value of the upstream gene $j$ and hence is always positive. As long as the biological risk parameter $\alpha$ is non-negative, the function in Eq. (4) will always be bounded between zero and unity. For biological outcomes controlled by multiple terminal genes, the product of risk functions has to be taken, as in

$$B_j = 1 - \prod_i \Phi(R_i, \alpha_{i,j}) \quad \forall B_j \ \text{linked\_to}\ R_i \tag{5}$$
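The risk function of Eq. (4) and the outcome rule of Eq. (5) translate directly into code. The sketch below uses hypothetical function names (`risk_function`, `outcome_risk`) and arbitrary inputs; it only illustrates that the outcome stays between zero and unity and vanishes in the control state.

```python
import math

def risk_function(r, alpha):
    """Eq. (4): r ** alpha for 0 < r <= 1 and r ** (-alpha) for r > 1."""
    if r <= 0.0 or alpha < 0.0:
        raise ValueError("requires r > 0 and alpha >= 0")
    return r ** alpha if r <= 1.0 else r ** (-alpha)

def outcome_risk(expressions, alphas):
    """Eq. (5): B_j = 1 - prod_i Phi(R_i, alpha_ij) over the linked terminal genes."""
    return 1.0 - math.prod(risk_function(r, a) for r, a in zip(expressions, alphas))

print(outcome_risk([1.0, 1.0], [2.0, 5.0]))  # control state -> 0.0
print(outcome_risk([2.0, 0.5], [1.0, 1.0]))  # 1 - 0.5 * 0.5 -> 0.75
```

Because both branches of Eq. (4) map a deviation from unity to a value below one, two-fold up-regulation and two-fold down-regulation contribute the same risk for a given $\alpha$.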

2.3. The control state

The gene expression level of control animals not exposed to the drugs sets the reference state. This reference state is needed to define differential gene expression of animals to which the drug dosages are administered. For consistency, the predicted gene expressions must be unity for zero drug dosage. This condition enforces additional constraints on the gene–gene interactions, $\nu_{i,j}$, for the no-drug condition of all drugs. The expression level of $R_j = 1$ for $D_i = 0$ is enforced mathematically by the equality constraint in

$$\sum_i \nu_{i,j} = 1 \quad \forall \nu_{i,j} \ \text{linking}\ R_j \ \text{with}\ R_i \tag{6}$$
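As a short consistency check, the control state can be propagated through the network by hand. The derivation below is a sketch that assumes the regulatory cascade is acyclic, so that all upstream genes are already at unity when a downstream gene is evaluated. Setting every dosage to zero in Eqs. (1)–(3) and (5) gives

$$
\begin{aligned}
\text{entry-level genes:} \quad & R_j = \prod_i (1 + 0)^{\beta_{i,j}} = 1, \\
\text{regulated genes:} \quad & R_j = \sum_i \nu_{i,j} \cdot 1 = \sum_i \nu_{i,j} = 1 \quad \text{by Eq. (6)}, \\
\text{hybrid genes:} \quad & R_i = \prod_j (1 + 0)^{\beta_{j,i}} \sum_k \nu_{k,i} \cdot 1 = 1, \\
\text{outcomes:} \quad & B_j = 1 - \prod_i \Phi(1, \alpha_{i,j}) = 1 - 1 = 0 .
\end{aligned}
$$

Under this assumption the single constraint of Eq. (6) is enough to make all predicted expressions unity and all predicted risks zero for control animals.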

2.4. Determination of optimal drug potencies, gene–gene interaction strengths and biological risk parameters

The optimal drug potencies, $\beta_{i,j}$, the gene–gene interaction strengths, $\nu_{i,j}$, as well as the biological risk parameters, $\alpha_{i,j}$, of the TPN in Fig. 1 can be determined by solving an inversion problem. This inversion problem minimizes the least-squares error between the predicted, $B_i$, and observed chronic biological outcomes, $B_i^{\exp}$, as well as subchronic relative gene expression changes, $R_i$ and $R_i^{\exp}$, for different toxic drug dosages. The solution of the inversion problem yields the desired network model parameters, $\beta_{i,j}$, $\nu_{i,j}$, $\alpha_{i,j}$; the gene expressions and toxicity risks, $R_i$ and $B_i$, are simultaneously computed.

The nonlinear mathematical program in Eqs. (7)–(21) representing the network of Fig. 1 can be obtained by imposing the logical connectivity Rules 1–3 and the control condition described previously:

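$$\min_{\nu_{i,j},\,\alpha_{i,j},\,R_i,\,B_i} \ \sum_i \left(B_i - B_i^{\exp}\right)^2 + \sum_i \left(R_i - R_i^{\exp}\right)^2 \tag{7}$$

s.t.

Drug–gene interactions

$$R_3 = (1 + D_1)^{\beta_{1,3}} \tag{8}$$

$$R_4 = (1 + D_1)^{\beta_{1,4}} \tag{9}$$

Gene–gene interactions

$$R_6 = \nu_{3,6} R_3 + \nu_{4,6} R_4 \tag{10}$$

$$R_7 = \nu_{4,7} R_4 + \nu_{5,7} R_5 \tag{11}$$

$$R_8 = \nu_{6,8} R_6 + \nu_{7,8} R_7 \tag{12}$$

$$R_9 = \nu_{7,9} R_7 \tag{13}$$

$$R_{10} = \nu_{7,10} R_7 \tag{14}$$

Hybrid drug and gene interactions

$$R_5 = (1 + D_2)^{\beta_{2,5}} \left(\nu_{4,5} R_4\right) \tag{15}$$

Gene–biological outcome interactions

$$B_{11} = 1 - \Phi(R_8, \alpha_{8,11})\, \Phi(R_9, \alpha_{9,11}) \tag{16}$$

$$B_{12} = 1 - \Phi(R_{10}, \alpha_{10,12}) \tag{17}$$

Control condition

$$\nu_{3,6} + \nu_{4,6} = 1 \tag{18}$$

$$\nu_{4,7} + \nu_{5,7} = 1 \tag{19}$$

$$\nu_{6,8} + \nu_{7,8} = 1 \tag{20}$$

$$\nu_{4,5} = \nu_{7,9} = \nu_{7,10} = 1 \tag{21}$$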

For this case, the least-squares inversion problem in Eqs. (7)–(21) is a nonlinear programming problem (NLP) with 10 state variables (8 gene expression ratios and 2 biological outcomes) and 15 parameters (3 drug–gene interactions, 9 gene–gene interactions and 3 gene–biological outcome interactions). Here, we solved the TPN inversion with a globally convergent trust region method (Zhang et al., 2007). Robust methods to solve highly nonlinear inversion problems are described elsewhere (Kulkarni et al., 2006). Problems with solution multiplicity and methods to overcome them are beyond the scope of this article, but are discussed elsewhere (Lucia and Feng, 2003; Zhang et al., 2007).
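To make the structure of Eqs. (7)–(17) concrete, the forward model of the Fig. 1 network and its least-squares objective can be written as below. This is a sketch only: the parameter values are hypothetical, the dictionary layout keyed by (source, target) index pairs is an assumption of this example, and the trust region solver used for the actual inversion is not reproduced here.

```python
def phi(r, alpha):
    """Risk function of Eq. (4)."""
    return r ** alpha if r <= 1.0 else r ** (-alpha)

def fig1_forward(d1, d2, beta, nu, alpha):
    """Evaluate Eqs. (8)-(17) in dependency order for dosages d1 and d2."""
    R = {}
    R[3] = (1 + d1) ** beta[1, 3]                        # Eq. (8)
    R[4] = (1 + d1) ** beta[1, 4]                        # Eq. (9)
    R[5] = (1 + d2) ** beta[2, 5] * (nu[4, 5] * R[4])    # Eq. (15)
    R[6] = nu[3, 6] * R[3] + nu[4, 6] * R[4]             # Eq. (10)
    R[7] = nu[4, 7] * R[4] + nu[5, 7] * R[5]             # Eq. (11)
    R[8] = nu[6, 8] * R[6] + nu[7, 8] * R[7]             # Eq. (12)
    R[9] = nu[7, 9] * R[7]                               # Eq. (13)
    R[10] = nu[7, 10] * R[7]                             # Eq. (14)
    B = {
        11: 1 - phi(R[8], alpha[8, 11]) * phi(R[9], alpha[9, 11]),  # Eq. (16)
        12: 1 - phi(R[10], alpha[10, 12]),                          # Eq. (17)
    }
    return R, B

def objective(R, B, R_exp, B_exp):
    """Eq. (7): squared residuals over the observed outcomes and expressions."""
    return (sum((B[k] - B_exp[k]) ** 2 for k in B_exp)
            + sum((R[k] - R_exp[k]) ** 2 for k in R_exp))

# Hypothetical parameters; the nu values satisfy the control conditions (18)-(21).
beta = {(1, 3): 0.5, (1, 4): 0.2, (2, 5): 0.3}
nu = {(3, 6): 0.5, (4, 6): 0.5, (4, 7): 0.25, (5, 7): 0.75, (6, 8): 0.5,
      (7, 8): 0.5, (4, 5): 1.0, (7, 9): 1.0, (7, 10): 1.0}
alpha = {(8, 11): 1.0, (9, 11): 2.0, (10, 12): 0.5}

R0, B0 = fig1_forward(0.0, 0.0, beta, nu, alpha)
print(R0[8], B0[11])  # control state: expression 1.0, risk 0.0
```

An optimizer would vary `beta`, `nu` and `alpha` to minimize `objective` summed over all training dosages, subject to the control conditions.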

3. Application: prediction of chronic hepatotoxicity

In order to demonstrate the efficacy of our modeling method, we constructed a TPN to assess the hepatotoxic risk of three HAHs. Necessary gene expression data were taken from a previous toxicological study by Vezina et al. (2004). About 8799 gene probe sets contained on Affymetrix RGU34A GeneChips were used. In order to narrow down the gene selection to eight, we used a tool called 'Pathway Studio V5' that uses lexicographical search methods to determine statistically significant genes. In the study by Vezina et al., female Harlan Sprague–Dawley rats were treated for 13 weeks with toxicologically equivalent doses of three HAHs: 3,3′,4,4′,5-pentachlorobiphenyl (PCB 126), 2,2′,4,4′,5,5′-hexachlorobiphenyl (PCB 153) and 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). PCB 126 and TCDD are known to have similar aryl hydrocarbon receptor (AhR) gene-dependent toxicity mechanisms. PCB 153 is a non-AhR ligand.

Chronic toxicity data were collected from the National Toxicology Program (2006a–c) study that involved exposing female Harlan Sprague–Dawley rats over a period of 2 years to a dose range of the same three drugs: PCB 126, PCB 153 and TCDD. The drug dosage ranges were 0–1000 ng/kg/day for PCB 126, 0–3000 μg/kg/day for PCB 153 and 0–100 ng/kg/day for TCDD. Chronic liver damage to the rats was reported as fractions of the rat population affected. The NTP studies observed a dose-related increase in hepatic toxicity. Neoplastic effects examined in the liver were cholangiocarcinoma, hepatocellular adenoma and hepatocholangioma. In addition, increased incidence of non-neoplastic effects in the liver was observed, such as hepatocyte hypertrophy, diffuse fatty change, bile duct hyperplasia, necrosis, pigmentation, cholangiofibrosis, toxic hepatopathy and several other pathological changes. Similar effects were examined in the lung (metaplasia), the pancreas, the kidney, the heart, thyroid glands, thymus and spleen.

For our prediction method we chose the following non-neoplastic effects (NTP, 2006a–c): diffuse fatty change in the liver (fatty change) and toxic hepatopathy in the liver (liver toxicity). The neoplastic effect we aimed at predicting was hepatocellular adenoma in the liver (oncogenesis). It is also possible to correlate differential gene expression to the other toxicological effects listed above, but such analysis is beyond the scope of this initial study focusing on the new methodology.

Modeling the TPN for the liver toxicity. There are several tools for deriving putative gene interaction networks from micro-array data. We used Pathway Studio V5 (Ariadne Genomics), which uses an automated natural language search algorithm to draw facts from published scientific literature (Pandey et al., 2004). Pathway Studio includes a database of molecular networks automatically assembled from scientific abstracts on PubMed. It also features the lexicographical search engine MedScan to efficiently pre-process textual input to extract relevant published information (Novichkova et al., 2003). The final result includes custom-defined gene-regulatory pathways represented as directed graphs with nodes and arcs for gene–ligand or protein–protein interactions.

An experimental toxicity study (Vezina et al., 2004) investigated gene expression changes in response to three toxins. The set of genes with significant over- or under-expression common to all three toxic ligands was selected as the base set. Co-expressed genes with significant relative expression change were identified through a PTM method with $r^2 = 0.9$. The set of co-expressed genes, the names of toxic compounds and the observed toxicological effects served as input to the Pathway Studio software.

Fig. 3. Symbolic representation of the TPN parameters. The drug nodes are represented by the respective dosages $D_1$, $D_2$ and $D_3$. The gene nodes hold the relative changes in gene expression, $R_i$ ($i = 4, \ldots, 11$). The biological outcomes record the fractions of the rat population affected by three specific toxicity responses: bilirubin ($B_{12}$), oncogenesis ($B_{13}$) and fatty change ($B_{14}$).

Lexicographical search was used for identifying gene–gene interactions. The Pathway Assist feature helped build a network model for the regulatory and signaling pathway for the interaction between the three toxins and receptors. Entry-level genes for the network were defined as genes not regulated by other genes with structural affinity to the ligands. Thus, the top level comprises regulatory genes assumed to be directly affected by the toxic ligands. Interactions of type 'regulation' defined by Pathway Studio were used to draw possible interactions between genes belonging to the base set obtained by statistically relevant relative gene expression changes. Genes directly implicated with the biological outcomes were categorized with lexicographical methods to form the set of genes in the third layer. The entire regulation and signaling pathway linked toxic dosage nodes (ligand nodes) through differentially expressed genes to chronic toxicological effects. We performed a further network simplification step. Genes without connections to the third layer (dangling genes) were eliminated and the shortest path between each drug and the biological outcomes through the gene network was selected. This step drastically reduced the number of genes to be considered. Alternative statistical methods (Werner, 2008) could be used to construct an input pathway network model for our estimation methodology without impacting its applicability.
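The simplification step can be pictured as a graph search. The sketch below is one possible reading of it, not the procedure implemented in Pathway Studio: it runs a breadth-first search from each drug to each outcome and keeps only the nodes lying on a shortest path found, together with the edges among them. All node names are hypothetical.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search on a directed graph given as {node: [successors]}."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal is not reachable from start

def simplify_network(edges, drugs, outcomes):
    """Keep nodes on a shortest drug-to-outcome path; dangling genes drop out."""
    kept = set()
    for drug in drugs:
        for outcome in outcomes:
            path = shortest_path(edges, drug, outcome)
            if path:
                kept.update(path)
    return {n: [m for m in succ if m in kept]
            for n, succ in edges.items() if n in kept}

# Hypothetical network: gene "g2" has no connection to the outcome layer.
edges = {"drug": ["g1", "g2"], "g1": ["g3"], "g2": [], "g3": ["outcome"]}
print(simplify_network(edges, ["drug"], ["outcome"]))
# {'drug': ['g1'], 'g1': ['g3'], 'g3': ['outcome']}
```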

The resulting interaction network is shown in Fig. 2. The drug nodes represent the three toxic drugs: PCB 126, PCB 153 and TCDD. For the current study, eight statistically significant genes were identified. The biological outcome nodes represent fractions of rat population affected with the three pathological conditions.

The TPN parameters of this network are symbolically represented in Fig. 3. The drug nodes take values represented by the drug dosages: $D_1$ for PCB 126 (1 unit = 1 μg/kg/day), $D_2$ for PCB 153 (1 unit = 1 mg/kg/day) and $D_3$ for TCDD (1 unit = 100 ng/kg/day). The relative expressions of the entry-level genes (Cycs, Bcl2, Il2 and Cyp1a2) are functions of the toxicologically equivalent dosages of the three drugs. The relative expressions of the middle-level genes (Mapk14, Ets1, Bmp4 and Cav1) are functions of differential expressions of their regulator genes. Differential gene expression data for each drug dosage are listed in Table 1. The fractions of rat population affected by the three chronic toxicological effects of liver toxicity, fatty change and oncogenesis are functions of the relative expressions of the last layer of genes, and are listed in Table 2 for different drug dosages.

Fig. 2. Schematic representation of the toxicologic prediction network (TPN) for the liver toxicity case study. The network depicted in this figure relates toxicologically equivalent dosage of the three toxins (PCB 126, PCB 153 and TCDD) and three observable biological outcomes of liver toxicity, oncogenesis and fatty change through relative changes in expression of eight significant genes.

Table 1
Relative expression data for the eight genes from Vezina et al. (2004) corresponding to one specific dosage of each drug

| Gene | PCB 126 (dosage = 1 ng/kg/day) | PCB 153 (dosage = 1 mg/kg/day) | TCDD (dosage = 100 ng/kg/day) |
|---|---|---|---|
| Cycs | 4.4 | 1 | 1.2 |
| Bcl2 | 1.2 | 0.1 | 0.6 |
| Il2 | 2.3 | 9.2 | 3.2 |
| Cyp1a2 | 2.9 | 1.6 | 3.7 |
| Ets1 | 3.3 | 1 | 2.7 |
| Mapk14 | 0.7 | 0.2 | 0.8 |
| Bmp4 | 2.3 | 1 | 3 |
| Cav1 | 15.3 | 1 | 1.3 |

Table 2
Training dataset summarizing biological responses for 6 dosages of PCB 126, 5 dosages of PCB 153 and 5 dosages of TCDD

| Drug | Dosage | Bilirubin (%) | Oncogenesis (%) | Fatty change (%) |
|---|---|---|---|---|
| PCB 126 (ng/kg/day) | 0 | 2 | 0 | 0 |
| | 30 | 11 | 4 | 13 |
| | 100 | 42 | 4 | 26 |
| | 175 | 51 | 0 | 42 |
| | 550 | 100 | 19 | 88 |
| | 1000 | 92 | 60 | 89 |
| PCB 153 (μg/kg/day) | 0 | 0 | 0 | |
| | 10 | 0 | 0 | 13 |
| | 300 | 0 | 0 | 21 |
| | 1000 | 0 | 0 | 40 |
| | 3000 | 0 | 0 | 33 |
| TCDD (ng/kg/day) | 0 | 0 | 0 | 0 |
| | 3 | 4 | 0 | 4 |
| | 10 | 15 | 0 | 23 |
| | 46 | 85 | 9 | 57 |
| | 100 | 100 | 72 | 91 |

These data were used as an input to the inversion problem in Eqs. (22)–(37).

Problem inversion for liver toxicity. The gene expression and biological outcome data for different drug dosages were used to formulate the inversion problem in Eqs. (22)–(37). The objective function (22) minimizes the least-squares error between the observed and the predicted biological outcomes and relative gene expressions for specific dosages, $d_i \in \bar{d}$. The constraints of the inversion problem, Eqs. (23)–(37), represent the TPN mathematical model constructed with the three network connectivity rules. The control conditions for this network are enforced by Eqs. (34)–(37). An edge-weight value of unity (e.g. $\nu_{6,8} = 1$) does not imply that the corresponding genes do not contribute to the chronic toxicological outcomes, but only that the gain of the regulatory gene expression over the controlled gene expression is unity. The outcomes of this inversion problem are the drug potencies, $\beta_{i,j}$, the gene–gene interaction strengths of the gene signaling pathway, $\nu_{i,j}$, as well as the biological risk parameters, $\alpha_{i,j}$:

$$\min_{\nu_{i,j},\,\alpha_{i,j},\,R_i,\,B_i} \ \sum_{i=1}^{3} \sum_{j=1}^{d_i} \left\{ \left(B_{12}^{ij} - B_{12}^{\exp,ij}\right)^2 + \left(B_{13}^{ij} - B_{13}^{\exp,ij}\right)^2 + \left(B_{14}^{ij} - B_{14}^{\exp,ij}\right)^2 + \sum_{k=4}^{11} \left(R_k^{i} - \bar{R}_k^{i}\right)^2 \right\} \tag{22}$$

s.t.

Entry-level nodes

$$R_4 = (1 + D_1)^{\beta_{1,4}} (1 + D_2)^{\beta_{2,4}} \tag{23}$$

$$R_5 = (1 + D_1)^{\beta_{1,5}} (1 + D_2)^{\beta_{2,5}} \left(\nu_{4,5} R_4 + \nu_{6,5} R_6 + \nu_{8,5} R_8\right) \tag{24}$$

$$R_6 = (1 + D_2)^{\beta_{2,6}}\, \nu_{7,6} R_7 \tag{25}$$

$$R_7 = (1 + D_3)^{\beta_{3,7}} \tag{26}$$

Middle-level nodes

$$R_8 = \nu_{6,8} R_6 \tag{27}$$

$$R_9 = \nu_{5,9} R_5 + \nu_{7,9} R_7 + \nu_{11,9} R_{11} \tag{28}$$

$$R_{10} = \nu_{9,10} R_9 + \nu_{8,10} R_8 \tag{29}$$

$$R_{11} = \nu_{6,11} R_6 \tag{30}$$

Biological outcome nodes

$$B_{12} = 1 - \Phi(R_4, \alpha_{4,12})\, \Phi(R_9, \alpha_{9,12}) \tag{31}$$

$$B_{13} = 1 - \Phi(R_9, \alpha_{9,13})\, \Phi(R_5, \alpha_{5,13}) \tag{32}$$

$$B_{14} = 1 - \Phi(R_{10}, \alpha_{10,14})\, \Phi(R_{11}, \alpha_{11,14}) \tag{33}$$

Control conditions

$$\nu_{4,5} + \nu_{6,5} + \nu_{8,5} = 1 \tag{34}$$

$$\nu_{5,9} + \nu_{7,9} + \nu_{11,9} = 1 \tag{35}$$

$$\nu_{9,10} + \nu_{8,10} = 1 \tag{36}$$

$$\nu_{7,6} = \nu_{6,8} = \nu_{6,11} = 1 \tag{37}$$

Table 3
The parameters obtained by problem inversion of Eqs. (22)–(37) with data from Vezina et al. (2004) and National Toxicology Program (2006a–c)

| Drug potencies | Value | Gene–gene interactions | Value | Biological risk parameters | Value |
|---|---|---|---|---|---|
| $\beta_{1,4}$ | 0.8244 | $\nu_{4,5}$ | 1 | $\alpha_{4,12}$ | 0.0123 |
| $\beta_{1,5}$ | 0.9283 | $\nu_{5,9}$ | 0.0751 | $\alpha_{9,12}$ | 31.6201 |
| $\beta_{2,4}$ | 0 | $\nu_{6,5}$ | 0 | $\alpha_{9,13}$ | 5.1660 |
| $\beta_{2,5}$ | 0 | $\nu_{7,9}$ | 0.9249 | $\alpha_{5,13}$ | 0 |
| $\beta_{2,6}$ | 0.2065 | $\nu_{8,10}$ | 0 | $\alpha_{10,14}$ | 18.0959 |
| $\beta_{3,7}$ | 0.1403 | $\nu_{8,5}$ | 0 | $\alpha_{11,14}$ | 1.9375 |
| | | $\nu_{9,10}$ | 1 | | |
| | | $\nu_{11,9}$ | 0 | | |

The inversion problem in Eqs. (22)–(37) is a nonlinear mathematical programming problem (NLP) with 11 state variables (8 gene expression ratios and 3 biological outcomes) and 24 parameters (6 drug–gene interactions, 11 gene–gene interactions and 7 gene–biological outcome interactions). The inversion converged in 162 iterations with a CPU time of 8.03 s on a Pentium IV machine with 1 GB RAM. Table 3 summarizes the optimal network parameters, $\beta_{i,j}$, $\nu_{i,j}$, $\alpha_{i,j}$, obtained by solving the inversion problem. The model also predicted the chronic toxicity effects for the jth dosage of the ith drug. In addition, the relative expressions at different drug dosages were estimated simultaneously.
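Once the parameters are known, the trained network can be evaluated for any dosage. The sketch below codes Eqs. (23)–(33) with the values reported in Table 3; the three weights fixed by Eq. (37) are set to one because Table 3 does not list them. The function name `predict` and the dictionary layout are assumptions of this example, and dosages are expected in the scaled units defined for $D_1$, $D_2$ and $D_3$.

```python
# Parameters from Table 3, keyed by (source, target) node indices.
BETA = {(1, 4): 0.8244, (1, 5): 0.9283, (2, 4): 0.0,
        (2, 5): 0.0, (2, 6): 0.2065, (3, 7): 0.1403}
NU = {(4, 5): 1.0, (6, 5): 0.0, (8, 5): 0.0, (5, 9): 0.0751, (7, 9): 0.9249,
      (11, 9): 0.0, (9, 10): 1.0, (8, 10): 0.0,
      (7, 6): 1.0, (6, 8): 1.0, (6, 11): 1.0}  # last three fixed by Eq. (37)
ALPHA = {(4, 12): 0.0123, (9, 12): 31.6201, (9, 13): 5.1660,
         (5, 13): 0.0, (10, 14): 18.0959, (11, 14): 1.9375}

def phi(r, alpha):
    """Risk function of Eq. (4)."""
    return r ** alpha if r <= 1.0 else r ** (-alpha)

def predict(d1, d2, d3):
    """Evaluate Eqs. (23)-(33) in dependency order for scaled dosages d1, d2, d3."""
    R = {}
    R[4] = (1 + d1) ** BETA[1, 4] * (1 + d2) ** BETA[2, 4]
    R[7] = (1 + d3) ** BETA[3, 7]
    R[6] = (1 + d2) ** BETA[2, 6] * NU[7, 6] * R[7]
    R[8] = NU[6, 8] * R[6]
    R[11] = NU[6, 11] * R[6]
    R[5] = ((1 + d1) ** BETA[1, 5] * (1 + d2) ** BETA[2, 5]
            * (NU[4, 5] * R[4] + NU[6, 5] * R[6] + NU[8, 5] * R[8]))
    R[9] = NU[5, 9] * R[5] + NU[7, 9] * R[7] + NU[11, 9] * R[11]
    R[10] = NU[9, 10] * R[9] + NU[8, 10] * R[8]
    B = {
        12: 1 - phi(R[4], ALPHA[4, 12]) * phi(R[9], ALPHA[9, 12]),
        13: 1 - phi(R[9], ALPHA[9, 13]) * phi(R[5], ALPHA[5, 13]),
        14: 1 - phi(R[10], ALPHA[10, 14]) * phi(R[11], ALPHA[11, 14]),
    }
    return R, B

R, B = predict(1.0, 0.0, 0.0)  # one unit of PCB 126 only
print({k: round(v, 3) for k, v in B.items()})
```

Calling `predict(0.0, 0.0, 0.0)` recovers the control state, with all expressions at unity and all risks at zero up to rounding.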

4. Results and discussion

The predicted and the experimental biological outcomes are summarized in Fig. 4. The mean deviations of the predicted chronic toxicity risk from the experimental biological outcome data are on the order of 4–8%; one datum is 11% off as given in Table 3. These results show a satisfactory match between the experimentally obtained fractions of affected rat population and the model-predicted outcomes for different dosages of the three toxic drugs.

Fig. 5 compares the predicted relative gene expression change against the experimentally obtained gene expression data for three drug dosages (Vezina et al., 2004). While most relative gene expressions are adequately matched, the predictions for two genes, Il2 and Cav1, exhibit significant deviations. This mismatch is an indicator that the original network input might not contain all significant interactions of the signaling pathway for these two genes. Other significant genes interacting with these two genes in the gene-regulatory network may have gone unidentified by the lexicographic search. Since the gene-regulatory network is an input based on statistically significant genes implicated and studied in the past, the TPN cannot guarantee its completeness. However, the TPN approach can be used as a hypothesis testing tool to detect possible problem sources in the underlying gene-regulatory network and signaling pathways. In hypothesis testing, the experimental data are not used to determine parameters but serve as data source to validate the model integrity and accuracy of the model predictions (Table 4).

Fig. 4. Predicted chronic biological responses of bilirubin, oncogenesis and fatty change versus experimental results for various dosages of each drug (experimental data: ● liver toxicity, ■ oncogenesis, ▲ fatty change). These predictions (thick lines: model) were computed using a knowledge-based model with parameters obtained by problem inversion of Eqs. (22)–(37). (a) PCB 126, (b) PCB 153 and (c) TCDD.

4.1. Validation of chronic toxicity predictions

In the earlier section we used the relative gene expressions in Table 1 and the biological outcomes in Table 2 as training data sets to solve the inversion problem (22)–(37). Some biological outcomes for specific drug dosages not included in the training set were used to validate the TPN predictions. The results of the validation data are summarized in Table 5 and indicate that the mean difference between the validation data set and the model-predicted chronic toxicities is about 4%. This outcome is satisfactory given the wide range of uncertainties in experiments and data analysis.
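The summary statistic can be recomputed directly from the experimental and predicted values listed in Table 5. The sketch below assumes that "mean difference" refers to the mean absolute deviation in percentage points over all nine entries; the text does not state which averaging was used, so this definition is an assumption.

```python
# (experimental %, TPN-predicted %) pairs copied from Table 5.
validation = {
    ("PCB 126", "bilirubin"): (74, 75.72),
    ("PCB 126", "oncogenesis"): (13, 20.07),
    ("PCB 126", "fatty change"): (57, 55.96),
    ("PCB 153", "bilirubin"): (0, 0.11),
    ("PCB 153", "oncogenesis"): (0, 0.024),
    ("PCB 153", "fatty change"): (4, 4.09),
    ("TCDD", "bilirubin"): (57, 52.88),
    ("TCDD", "oncogenesis"): (2, 15.10),
    ("TCDD", "fatty change"): (32, 45.68),
}

# Absolute deviation per entry, in percentage points.
abs_dev = {key: abs(exp - pred) for key, (exp, pred) in validation.items()}
mean_abs_dev = sum(abs_dev.values()) / len(abs_dev)
worst = max(abs_dev, key=abs_dev.get)

print(f"mean absolute deviation: {mean_abs_dev:.2f} percentage points")
print(f"largest deviation: {worst} = {abs_dev[worst]:.2f}")
```

Under this definition the mean comes out near 4.6 percentage points, the same order as the value quoted above, with the two largest deviations both occurring for the TCDD validation dose.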

The proposed mathematical TPN method also enables quantitative risk assessment as a function of drug dosages, administration of drug mixtures as well as the optimal formulation of drug cocktails. Three such techniques will be introduced:

(i) prediction of chronic toxicity risk of drug cocktails with known composition,
(ii) estimation of the likely toxicity risk of new drugs with gene dependency and signaling pathways similar to the drugs already studied,
(iii) reduction of chronic toxicity side-effects by determining the optimal drug-cocktail composition.

Each method will be introduced in the next subsections.


Fig. 5. Model prediction of subchronic relative gene expression ratios for the three drugs at a specific dosage. These bar graphs indicate model-predicted subchronic gene expression (blue) and experimentally observed gene expression ratios (red) by Vezina et al. (2004). (a) PCB 126, (b) PCB 153 and (c) TCDD.

Table 4. Average deviation of the predicted chronic toxicity risk from the experimental toxicity data

| Drug | Biological outcome | Average prediction deviation from experimental data (%) |
|---|---|---|
| PCB 126 | Bilirubin | 4 |
| PCB 126 | Oncogenesis | 6 |
| PCB 126 | Fatty change | 5 |
| PCB 153 | Bilirubin | 0 |
| PCB 153 | Oncogenesis | 0 |
| PCB 153 | Fatty change | 8 |
| TCDD | Bilirubin | 6 |
| TCDD | Oncogenesis | 11 |
| TCDD | Fatty change | 4 |


4.2. Chronic toxicity for drug mixtures

The model proposed in Eqs. (23)–(33) along with the optimal parameters, as summarized in Table 3, can be used to predict biological outcomes for mixtures of the three toxic chemical compounds. The predictions are summarized in Table 6. The biological risks were estimated for the co-administration of two or three toxins in various dosages as shown in Table 6. The first three columns of the table represent the predicted biological outcomes for the pure drug dosages: 700 ng/kg for PCB 126, 2.3 mg/kg for PCB 153 and 200 ng/kg for TCDD. Using this method we can predict beforehand the risk associated with drug mixtures without new animal testing data. For instance, which dosage of drugs incurs a 50% cancer risk (oncogenesis) in rats? The network of the liver toxicity case study in Fig. 3 suggests that the equivalent 50% cancer risk dosages are 840 ng/kg of pure PCB 126, and 180 ng/kg of pure TCDD. The third drug, PCB 153, is not carcinogenic. The TPN can also predict the 50% cancer risk of drug mixtures. For example, a mixture of 400 ng/kg of PCB 126, co-administered with 85 ng/kg of TCDD is estimated to cause a 50% chronic toxicity risk of cancer in rats. Another combination of 248 ng/kg of PCB 126, co-administered with 120 ng/kg of TCDD is equally carcinogenic. The risk assessment of drug mixtures is not arithmetically additive. The risk of drug co-administration depends on gene expression rather than simple algebraic mixing rules.
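The non-additivity can be checked against the numbers in Table 6. The sketch below compares the TPN mixture predictions with a naive sum of the pure-drug risks. The values are copied from Table 6; the helper `naive_sum` is introduced only for this comparison and is not part of the TPN model. Only the mixtures whose doses match the pure-drug columns are included.

```python
# Pure-drug risks (%) and TPN mixture predictions (%) copied from Table 6.
PURE = {
    "liver toxicity": {"D1": 96.8, "D2": 0.0, "D3": 52.7},
    "oncogenesis": {"D1": 43.0, "D2": 0.0, "D3": 11.5},
    "fatty change": {"D1": 86.1, "D2": 37.9, "D3": 38.0},
}
MIXTURE = {
    "liver toxicity": {"D1+D2": 96.8, "D1+D3": 98.3, "D1+D2+D3": 98.3},
    "oncogenesis": {"D1+D2": 43.0, "D1+D3": 49.0, "D1+D2+D3": 49.0},
    "fatty change": {"D1+D2": 91.3, "D1+D3": 91.0, "D1+D2+D3": 94.4},
}

def naive_sum(outcome, combo):
    """Risk obtained by simply adding the pure-drug risks (may exceed 100%)."""
    return sum(PURE[outcome][drug] for drug in combo.split("+"))

for outcome, combos in MIXTURE.items():
    for combo, predicted in combos.items():
        total = naive_sum(outcome, combo)
        print(f"{outcome:15s} {combo:9s} naive sum = {total:6.1f}  TPN = {predicted:5.1f}")
```

In every listed case the TPN prediction is at most the naive sum, and for fatty change the naive sum even exceeds 100%, which a probability cannot do.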


Table 5. Chronic toxicity predictions for validation data via the TPN with drug potencies, gene–gene interaction strengths and biological risk parameters obtained by inversion

| Drug | Dosage | Bilirubin (%): experimental toxicity risk | Bilirubin (%): TPN prediction | Bilirubin: deviation | Oncogenesis (%): experimental toxicity risk | Oncogenesis (%): TPN prediction | Oncogenesis: deviation | Fatty change (%): experimental toxicity risk | Fatty change (%): TPN prediction | Fatty change: deviation |
|---|---|---|---|---|---|---|---|---|---|---|
| PCB 126 (ng/kg/day) | 300 | 74 | 75.72 | −1.72 | 13 | 20.07 | −7.07 | 57 | 55.96 | −1.04 |
| PCB 153 (mg/kg/day) | 100 | 0 | 0.11 | −0.11 | 0 | 0.024 | −0.02 | 4 | 4.09 | −0.09 |
| TCDD (ng/kg/day) | 22 | 57 | 52.88 | 4.12 | 2 | 15.10 | −13.1 | 32 | 45.68 | −13.68 |

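Table 6. Prediction of biological outcomes for toxic drug mixtures (columns D1 to D3: pure drug dosages; remaining columns: drug mixtures)

| | D1 | D2 | D3 | D1+D2 | D1+D3 | D2+D3 | D1+D2+D3 |
|---|---|---|---|---|---|---|---|
| Drugs | | | | | | | |
| PCB 126 (D1) (ng/kg) | 700 | – | – | 700 | 700 | 0 | 700 |
| PCB 153 (D2) (mg/kg) | – | 2.3 | – | 2.3 | 0 | 1 | 2.3 |
| TCDD (D3) (ng/kg) | – | – | 200 | 0 | 200 | 1 | 200 |
| Biological outcomes | | | | | | | |
| Liver toxicity (%) | 96.8 | 0 | 52.7 | 96.8 | 98.3 | 52.7 | 98.3 |
| Oncogenesis (%) | 43.0 | 0 | 11.5 | 43.0 | 49.0 | 11.5 | 49.0 |
| Fatty change (%) | 86.1 | 37.9 | 38.0 | 91.3 | 91.0 | 61.5 | 94.4 |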


4.3. Estimation of the biological outcomes of a new drug

The proposed methodology offers the possibility of estimating chronic toxicological effects of new drugs. The estimation may be valid under the following assumptions:

(i) subchronic relative expression data for the new drug on the entry genes are available,
(ii) the new drug has the same gene dependency and signaling pathways as the ligands for which the TPN has been generated; no significant gene interactions not incorporated in the TPN occur,
(iii) the signaling pathways of downstream genes are regulated independently of how the upstream relative gene expression is induced.
Under these circumstances, the mathematical model in Eqs. (38)–(44) can provide an initial estimate of the chronic impact of the new drug based only on differential gene expression data of the entry genes. Subchronic data can be gathered in a small number of subchronic experiments without the need for extensive chronic toxicological studies over several years.

The potential toxicity risk of a new drug dosage can be estimated through TPN. Consider the network shown in Fig. 6, which is similar to the network in Fig. 1. The relative gene expressions of the entry genes, $R_4$, $R_5$ and $R_6$, are determined through micro-array analysis of liver tissue of rats exposed to the new drug in subchronic experiments. The drug potency parameters, $b_{3,4}$, $b_{3,5}$ and $b_{3,6}$, of the new drug can be determined using a reduced inversion problem with the gene interactions, $v_{i,j}$, and the biological risk parameters, $a_{i,j}$, determined previously for similar ligands left unchanged. Thus, the chronic toxicity of the new drug can be written as a function of the new drug dosages as in Eqs. (38)–(44). This system of algebraic equations can be solved with numerical methods:

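$$R_7 = v_{4,7} R_4 + v_{5,7} R_5 \tag{38}$$

$$R_8 = v_{5,8} R_5 + v_{6,8} R_6 \tag{39}$$

$$R_9 = v_{7,9} R_7 + v_{8,9} R_8 \tag{40}$$

$$R_{10} = v_{8,10} R_8 \tag{41}$$

$$R_{11} = v_{8,11} R_8 \tag{42}$$

$$B_{12} = \mathrm{Risk}(D_{\mathrm{new}}) = 1 - F(R_9, a_{9,12})\, F(R_{10}, a_{10,12}) \tag{43}$$

$$B_{13} = \mathrm{Risk}(D_{\mathrm{new}}) = 1 - F(R_{11}, a_{11,13}) \tag{44}$$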

Thus, the TPN method enables the initial estimation of biological outcomes for new drugs in a short amount of time, usually the several weeks needed for subchronic experiments, which show differential gene expression before tumors or other hepatotoxic effects can be detected.
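The forward evaluation of Eqs. (38)–(44) amounts to a few lines of arithmetic once the entry-gene ratios are known. The sketch below is an illustration under stated assumptions. The function `survival` is a hypothetical placeholder for F(R, a), whose actual form is defined in Section 2 and is not reproduced here. All numerical values of `v`, `a` and the entry-gene ratios are invented for illustration and are not the fitted parameters of Table 3.

```python
import math

def survival(ratio, a):
    """HYPOTHETICAL placeholder for F(R, a); the real F is defined in Section 2."""
    return math.exp(-a * abs(ratio - 1.0))

def predict_new_drug(R4, R5, R6, v, a, F=survival):
    """Propagate entry-gene ratios through Eqs. (38)-(44)."""
    R7 = v[4, 7] * R4 + v[5, 7] * R5                    # Eq. (38)
    R8 = v[5, 8] * R5 + v[6, 8] * R6                    # Eq. (39)
    R9 = v[7, 9] * R7 + v[8, 9] * R8                    # Eq. (40)
    R10 = v[8, 10] * R8                                 # Eq. (41)
    R11 = v[8, 11] * R8                                 # Eq. (42)
    B12 = 1.0 - F(R9, a[9, 12]) * F(R10, a[10, 12])     # Eq. (43)
    B13 = 1.0 - F(R11, a[11, 13])                       # Eq. (44)
    return {"B12": B12, "B13": B13}

# Illustrative numbers only (NOT the fitted values of Table 3).
v = {(4, 7): 0.5, (5, 7): 0.5, (5, 8): 0.4, (6, 8): 0.6,
     (7, 9): 0.5, (8, 9): 0.5, (8, 10): 1.0, (8, 11): 1.0}
a = {(9, 12): 0.8, (10, 12): 0.3, (11, 13): 0.5}

baseline = predict_new_drug(1.0, 1.0, 1.0, v, a)   # no expression change
exposed = predict_new_drug(2.0, 1.5, 3.0, v, a)    # hypothetical micro-array ratios
print(baseline)
print(exposed)
```

With the placeholder F, unchanged entry-gene expression (all ratios equal to 1) yields zero predicted risk, and the risk rises as the ratios move away from 1. Only the structure of the propagation, not the numbers, carries over to the actual model.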

4.4. Reducing side-effects by choosing optimal drug-cocktail composition

Consider chemotherapy for destroying tumor cells, while minimizing undesired effects like liver toxicity. TPN can be used to find optimal drug mixtures to achieve the desired outcomes, such as destroying cancer cells, while at the same time remaining below a certain acceptable risk. In this demonstration, it is assumed that the relationship between drug dosage and pharmacokinetic effects is known and expressed in a mathematical form relating dosage to toxicity as introduced in Section 2. The TPN enables a rigorous design problem formulation to determine the optimal composition of the drug mixture that ensures treatment potency, without exceeding the tolerable risk threshold of unavoidable side-effects. To solve this problem with TPN, biological outcomes are classified into desired, $\bar{B}^d$, and undesired outcomes, $\bar{B}^u$. A desired outcome would be tumor cell death; side-effects are classified as undesired biological outcomes like liver damage.


Fig. 6. Prediction of the chronic toxicity risk of a novel drug with similar genetic effects. The proposed methodology can be used to predict the chronic biological outcomes of a new drug under the following assumptions: (i) Subchronic relative expression data for the new drug on the entry genes are available. (ii) The new drug affects the same genes for which the toxicologic prediction network has been generated; no significant gene interactions outside the existing TPN occur. (iii) The gene expressions are regulated independently of how the upstream relative gene expressions occur.

Fig. 7. This figure depicts the TPN network for m drugs affecting n genes leading to p biological responses. $p_1$ denotes the number of side-effects; desired outcomes ($p - p_1$) ensure drug potency. The objective is to find the optimal composition of the m-drug cocktail that maximizes the desired effects, $B_i^d \in \bar{B}^d$, while not exceeding tolerable risk, $r_i^u$ $(i = 1, \ldots, p_1)$, of side-effects, $B_i^u \in \bar{B}^u$.


The network in Fig. 7 depicts the effect of m drugs through n genes resulting in p biological outcomes with model parameters, $b_{i,j}$, $v_{i,j}$, $a_{i,j}$, determined previously. The parameters obtained from the inversion problem will now be used to predict the impact of mixtures whose compositions will be determined so as to achieve a desired combination of effects. The ability to target desired ranges of effects by adjusting exactly the composition of drug mixtures is another advantage of the mathematical approach proposed in this work. The objective is to find the composition of the drug cocktail that maximizes the desired effects, while minimizing the undesired effects. The optimal drug-mixing problem for this network is formulated in Eqs. (45)–(49), which is a classical optimization approach to multi-objective design. The desired outcomes, $B_i^d \in \bar{B}^d$, are maximized, while the undesired outcomes or the side-effects, $B_i^u \in \bar{B}^u$, should not exceed the tolerable thresholds, $r_i^u$ $(i = 1, \ldots, p_1)$.

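$$\max_{D_i,\, R_j,\, B_i} \; \sum_i \omega_i B_i^d \tag{45}$$

s.t.

$$0 \le B_i^u \le r_i^u, \quad B_i^u \in \bar{B}^u \tag{46}$$

$$R_j = \prod_i (1 + D_i)^{b_{i,j}}, \quad \forall R_j \ \text{linked\_to}\ D_i \tag{47}$$

$$R_j = \sum_i v_{i,j} R_i, \quad \forall R_j \ \text{linked\_to}\ R_i \tag{48}$$

$$B_l = 1 - \prod_j F(R_j, a_{j,l}), \quad \forall B_l \ \text{linked\_to}\ R_j \tag{49}$$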

This NLP can be readily solved with standard optimization algorithms. More details on solving trade-off problems with mathematical programming can be found elsewhere (Chakraborty and Linninger, 2002, 2003; Chakraborty et al., 2003).
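The structure of the mixing problem can be illustrated on a toy network. The sketch below replaces the NLP solver with a brute-force grid search over two drug doses, which is a deliberate simplification and not the algorithm used in this work. It assumes two entry genes (one driving a desired outcome, one driving a side-effect), omits the gene–gene layer of Eq. (48), uses a single desired outcome with unit weight in Eq. (45), and again uses a hypothetical placeholder for F(R, a). All parameter values and names (`B_POT`, `A_RISK`, `R_MAX`) are invented for illustration.

```python
import math
from itertools import product

def F(ratio, a):
    """HYPOTHETICAL placeholder for F(R, a); the real F is defined in Section 2."""
    return math.exp(-a * abs(ratio - 1.0))

# Illustrative parameters only (NOT the fitted values of Table 3).
B_POT = {("D1", "Rd"): 0.30, ("D2", "Rd"): 0.10,   # drug potencies b_ij
         ("D1", "Ru"): 0.05, ("D2", "Ru"): 0.25}
A_RISK = {"Rd": 1.5, "Ru": 1.0}                    # risk parameters a_jl
R_MAX = 0.10                                       # tolerable side-effect risk r^u

def outcomes(d1, d2):
    """Eq. (47) for the two entry genes, then Eq. (49) for each outcome."""
    doses = {"D1": d1, "D2": d2}
    ratio = {g: math.prod((1.0 + doses[d]) ** B_POT[d, g] for d in doses)
             for g in ("Rd", "Ru")}
    desired = 1.0 - F(ratio["Rd"], A_RISK["Rd"])
    undesired = 1.0 - F(ratio["Ru"], A_RISK["Ru"])
    return desired, undesired

def best_mixture(grid):
    """Maximize the desired outcome subject to Eq. (46) by exhaustive search."""
    best = None
    for d1, d2 in product(grid, repeat=2):
        desired, undesired = outcomes(d1, d2)
        if undesired <= R_MAX and (best is None or desired > best[1]):
            best = ((d1, d2), desired, undesired)
    return best

grid = [0.5 * k for k in range(21)]    # candidate doses 0.0 .. 10.0 (arbitrary units)
doses, desired, undesired = best_mixture(grid)
print(doses, round(desired, 3), round(undesired, 3))
```

A grid search scales poorly with the number of drugs, which is why a gradient-based NLP solver is the natural choice for the full problem. The sketch only shows how the objective and the side-effect constraint interact.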

5. Conclusions

This study proposes a rigorous method inspired by expression flux analysis to predict chronic toxicity risk of halogenated chemical compounds. The method uses subchronic micro-array genomic expression data to construct a TPN. The TPN correlates drug dosages with chronic toxicity based on changes in subchronic differential gene expressions. The model parameters like the drug potencies, gene–gene interaction strengths and the biological risk parameters are determined through the solution of an inversion problem which optimally aligns drug dosages, gene expressions and chronic toxicity data.

The rigorous TPN method has several advantages. It can predict toxicity levels to assess the health risk caused by different drugs. It also quantifies the toxicity risk of co-administration of multiple chemical species. The proposed method can also be used to estimate the chronic toxicological impact of novel drugs, based on relative gene expression data only. Another novel and interesting application of the TPN methodology is the optimal drug mixing problem. Using the knowledge derived from the predictive model, the optimal formulation of a drug cocktail to maximize the drug potency while minimizing side-effects can be determined rigorously.

The proposed method correlated gene expression data from a fixed time point to predict chronic liver toxicity after 2 years in rats. Our paper does not provide means to analyze dynamic data. Moreover, the 13-week time point at which the rat relative gene expression data were collected may not be the best one for assessing systemic toxicity in the general case. However, given relative gene expression at different time points, the method can be used to test which time point leads to the strongest correlations. It is also conceivable to incorporate dynamic relative gene expression changes in the analysis. The proposed mathematical framework in combination with dynamic inversion techniques may still apply (Tang et al., 2005).

Another limitation of the proposed method is the omission of chemical drug–drug interactions. The method, however, accounts for the combined effect of drugs through the proposed mathematical rules 1–3. A direct chemical interaction between ligand functional groups is not accounted for.

Acknowledgements

The authors gratefully acknowledge the NSF Grant CBET-0730048 for providing partial support to perform the research presented in this article.

References

Barratt, M.D., Rodford, R.A., 2001. The computational prediction of toxicity. Curr. Opin. Chem. Biol. 5, 383–388.

Chakraborty, A., Linninger, A.A., 2002. Plant-wide waste management. 1. Synthesis and multi-objective design. Ind. Eng. Chem. Res. 41, 4591–4604.

Chakraborty, A., Linninger, A.A., 2003. Plant-wide waste management. 2. Decision making under uncertainty. Ind. Eng. Chem. Res. 42, 357–369.

Chakraborty, A., Colberg, R.D., Linninger, A.A., 2003. Plant-wide waste management. 3. Long-term operation and investment planning under uncertainty. Ind. Eng. Chem. Res. 42, 4772–4788.

Covert, M.W., Schilling, C.H., Palsson, B., 2001. Regulation of gene expression in flux balance models of metabolism. J. Theor. Biol. 213 (1), 73–88.

Draghici, S., 2003. Data analysis tools for DNA microarrays. Chapman & Hall/CRC Press, Boca Raton, FL, 437–441.

Edwards, J.S., Palsson, B.O., 2000. Metabolic flux balance analysis and the in silico analysis of Escherichia coli K-12 gene deletions. BMC Bioinform. 1, 1.

Forster, L., Famili, I., Fu, P., Palsson, B.O., Nielsen, J., 2003. Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Res. 13, 244–253.

Gombert, A.K., Nielsen, J., 2000. Mathematical modeling of metabolism. Curr. Opin. Biotechnol. 11, 180–186.

Katritzky, A.R., Tatham, D.B., 2001. Theoretical descriptors for the correlation of aquatic toxicity of environmental pollutants by quantitative structure–toxicity relationships. J. Chem. Inf. Comp. Sci. 41, 1173–1176.

Kulkarni, K., Zhang, L., Linninger, A.A., 2006. Model and parameter uncertainty in distributed systems. Ind. Eng. Chem. Res. 45, 7832–7840.

Lucia, A., Feng, Y., 2003. Multivariable terrain methods. AIChE J. 49, 2553–2563.

National Toxicology Program, 2006a. NTP technical report on the toxicology and carcinogenesis studies of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) (CAS no. 1746-01-6) in female Harlan Sprague–Dawley rats (Gavage studies). National Toxicological Program Technical Report Series 521, pp. 4–232.

National Toxicology Program, 2006b. NTP technical report on toxicology and carcinogenesis studies of 3,3′,4,4′,5-pentachlorobiphenyl (PCB 126) (CAS no. 57465-28-8) in female Harlan Sprague–Dawley rats (Gavage studies). National Toxicological Program Technical Report Series 520, pp. 4–246.

National Toxicology Program, 2006c. NTP technical report on toxicology and carcinogenesis studies of 2,2′,4,4′,5,5′-hexachlorobiphenyl (PCB 153) (CAS no. 35065-27-1) in female Harlan Sprague–Dawley rats (Gavage studies). National Toxicological Program Technical Report Series 529, pp. 4–168.

Novichkova, S., Egorov, S., Daraselia, N., 2003. MedScan, a natural language processing for MEDLINE abstracts. Bioinformatics 19 (13), 1699–1706.

Oliveira, A.P., Nielsen, J., Forster, J., 2005. Modeling Lactococcus lactis using a genome-scale flux model. BMC Microbiol. 5, 39.

Pandey, R., Guru, R.K., Mont, D.W., 2004. Pathway miner: extracting gene associating networks from molecular pathways for predicting the biological significance of gene expression microarray data. Bioinformatics 20 (13), 2156–2158.

Schilling, C.H., Covert, M.W., Famili, I., Church, G.M., Edwards, J.S., Palsson, B.O., 2002. Genome-scale metabolic model of Helicobacter pylori 26695. J. Bacteriology 184 (16), 4582–4593.

Sheikh, K., Forster, J., Nielsen, J., 2005. Modeling hybridoma cell-metabolism using a generic genome-scale metabolic model of Mus musculus. Biotechnol. Prog. 21, 112–121.

Simmon-Hettich, B., Rothfuss, A., Steger-Hartmann, T., 2006. Use of computer-assisted prediction of toxic effects of chemical substances. Toxicology 224, 156–162.

Tang, W., Zhang, L., Linninger, A.A., Tranter, R.S., 2005. Solving kinetic inversion problems via a physically bounded Gauss–Newton method. Ind. Eng. Chem. Res. 44 (10), 3626–3637.

Varma, A., Palsson, B.O., 1994. Metabolic flux balancing: basic concepts, scientific and practical use. Bio/Technology 12, 994–998.

Vezina, C.M., Walker, N.J., Olson, J.R., 2004. Subchronic exposure to TCDD, PeCDF, PCB126 and PCB153: effect on hepatic gene expression. Environ. Health Perspect. 112, 1636–1644.

Visco, D., Pophale, R., Rintoul, M., Faulon, J., 2002. Developing a methodology for an inverse quantitative structure–activity relationship using the signature molecular descriptor. J. Mol. Graph. Model. 20, 429–438.

Weaver, D., Workman, C., Stormo, G., 1999. Modeling regulatory networks with weight matrices. Pacific Symp. Biocomp. 99 (4), 112–123.

Weis, D., Faulon, J., LeBorne, R., Visco, D., 2005. The signature molecular descriptor. 5. The design of hydrofluoroether foam blowing agents using inverse-QSAR. Ind. Eng. Chem. Res. 44, 8883–8891.

Werner, T., 2008. Bioinformatics applications for pathway analysis of microarray data. Curr. Opin. Biotechnol. 19 (1), 50–54.

White, A.C., Mueller, R.A., Gallavan, R.H., Aaron, S., Wilson, A.G., 2003. A multiple in silico program approach for the prediction of mutagenicity from chemical structure. Mutat. Res. 539, 77–89.

Zhang, L., Kulkarni, K., Somayaji, M.R., Xenos, M., Linninger, A.A., 2007. Discovery of transport and reaction properties in distributed systems. AIChE J. 53 (2), 381–396.
