Discovering contextual connections between biological processes using high-throughput data

Christopher D. Lasher

Dissertation submitted to the Faculty of the

Virginia Polytechnic Institute and State University

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in

Genetics, Bioinformatics, and Computational Biology

T. M. Murali, Co-Chair

Padmavathy Rajagopalan, Co-Chair

Richard F. Helm

Madhav V. Marathe

Naren Ramakrishnan

September 12, 2011

Blacksburg, Virginia

Keywords: computational systems biology, liver, Markov chain Monte Carlo, molecular interactions, gene expression

Copyright 2011, Christopher D. Lasher

Discovering contextual connections between biological processes using high-throughput data

Christopher D. Lasher

ABSTRACT

Hearkening to calls from life scientists for aid in interpreting rapidly-growing repositories of data, the fields of bioinformatics and computational systems biology continue to bear increasingly sophisticated methods capable of summarizing and distilling pertinent phenomena captured by high-throughput experiments. Techniques in analysis of genome-wide gene expression (e.g., microarray) data, for example, have moved beyond simply detecting individual genes perturbed in treatment-control experiments to reporting the collective perturbation of biologically-related collections of genes, or “processes”. Recent expression analysis methods have focused on improving comprehensibility of results by reporting concise, non-redundant sets of processes by leveraging statistical modeling techniques such as Bayesian networks.

Simultaneously, integrating gene expression measurements with gene interaction networks has led to computation of response networks: subgraphs of interaction networks in which genes exhibit strong collective perturbation or co-expression. Methods that integrate process annotations of genes with interaction networks identify high-level connections between biological processes themselves. To identify context-specific changes in these inter-process connections, however, techniques proved necessary beyond process-based expression analysis, which reports only perturbed processes and not their relationships; response networks, which are composed of interactions between genes rather than processes; and existing techniques in process connection detection, which do not incorporate specific biological context.

We present two novel methods which take inspiration from the latest techniques in process-based gene expression analysis, computation of response networks, and computation of inter-process connections. We motivate the need for detecting inter-process connections by identifying a collection of processes exhibiting significant differences in collective expression in two liver tissue culture systems widely used in toxicological and pharmaceutical assays. Next, we identify perturbed connections between these processes via a novel method that integrates gene expression, interaction, and annotation data. Finally, we present another novel method that computes non-redundant sets of perturbed inter-process connections, and apply it to several additional liver-related data sets. These applications demonstrate the ability of our methods to capture and report biologically relevant high-level trends.

This work was supported by NSF CBET #0933225, the ICTAS ISBET at Virginia Tech, and the GBCB Interdisciplinary Ph.D. Program of Virginia Tech.

Acknowledgments

The author thanks the following people and organizations for their contributions to this dissertation: Prof. T.M. Murali, Prof. Padma Rajagopalan, Prof. Rich Helm, Prof. Madhav Marathe, and Prof. Naren Ramakrishnan for their continued guidance through the course of the author’s graduate research; the past and present members of the Rajagopalan group, especially Dr. Yeonhee Kim, Dr. Christopher Detzel, and Mr. Adam Larkin for their tremendous effort in obtaining gene expression data from hepatocyte cultures used in this work, and for their helpful discussions and collegial support; the past and present members of the Murali group, including Dr. Arjun Krishnan, Mr. Christopher Poirel, Mr. Yared Kidane, Mr. Naveed Massjouni, Ms. Danielle Choi and Mr. Ahsanur Raman for helpful discussions and collegial support, and Mr. Phillip J. Whisenhunt for assistance in creating plots for Chapter 4; the staff at the Virginia Bioinformatics Institute (VBI) Core Laboratory facility for their assistance with obtaining gene expression measurements for the liver tissue cultures; Dr. Bryan Lewis and Dr. Keith Bissett of the Network Dynamics and Simulation Science Laboratory at the Virginia Bioinformatics Institute at Virginia Tech for helpfully providing access to computational resources.

Financial support for this work was generously provided by National Science Foundation (NSF) Chemical, Bioengineering, Environmental, and Transport Systems (CBET) #0933225 “Transcriptional Signatures of 3D Liver Mimetic Architectures”, the Institute for Critical Technology and Applied Sciences (ICTAS) Center for Systems Biology of Engineered Tissues (ISBET) at Virginia Tech, and the Genetics, Bioinformatics, and Computational Biology (GBCB) Interdisciplinary Ph.D. Program of Virginia Tech.

The author thanks the Python Software Foundation, particularly Mr. Jesse Noller, Mr. Steve Holden, Mr. Van Lindberg, Mr. Jacob Kaplan-Moss, and the Python community, in general, for their maintenance of not only an arguably superior programming language, but a fantastic programming community as well. The author thanks the Stack Overflow community for their collective support in answering many, many questions raised through the process of programming the software used throughout this dissertation.

The author gives his heartfelt thanks to the following people who made completion of this work possible: his family (Mom, Dad, Richard, Bammee, Pop Pop, Grandma, Grandpa, Aunt Nell, Uncle Jim, and all the cousins) for their continuing love and support; Mrs. Dennie Munson, who plays second-mom to so many poor, wistful graduate students; the Executive Board (Dr. Tsai-Tien Tseng, Dr. Marcus Chibucos, Dr. Bryan Lewis, Dr. Andrea Apolloni, Mr. Tim Driscoll, and Mr. Andrew Warren) for making the rules around here; the Girlfriend Advisory Board (Mrs. Katie Younger Gehrt, Ms. Rachel DeLauder, and Dr. Charley Kelly) for their stalwart efforts in the face of dire chances; The Amelia Earharts (Mr. Ian Firkin, Ms. Joelle Hackney, and Ms. Megan Tiller) for sharing wonderful musical and life experiences; Mr. Patrick Butler and Dr. Kiran Pashikanti for much needed company and entertaining conversations; the Lovely Ladies of Clay Street (Ms. Phoebe Williams and Ms. Kristi Steiner) for providing lodging during the final push; Mr. Wes Smith for the much-needed breaks away from Blacksburg and computers; Mr. Kyle Parker and Ms. Jessica Frisch for sending sunny Florida vibes to Blacksburg; Ms. Judy Kuhn for being a great finder of excellent music, an all-around wonderful person, and a dear friend; Dr. Mary Ann Moran and Dr. Barny Whitman for their encouragement to pursue a higher degree; Dr. James Henriksen and Dr. Emily DeCrescenzo Henriksen for their encouragement on finishing the degree; Mrs. Carol Zank-Rehwaldt for caring enough to teach us all to write clear, coherent prose; and Dr. Kate Janean Steklachich for her adoration, affection, and irresistible loveliness.

Dedication

Dedicated to Mr. Hanks, who spread not only the joy of the study of life, but also the joy of life, itself.

Contents

1 Introduction

1.1 Significant contributions of this dissertation

1.2 Prior work

1.2.1 Process-based analysis of gene expression

1.2.2 Computing response networks from treatment-control data

1.2.3 Computation of inter-process connections

1.3 Overview of Chapters

2 Discovering temporal patterns of expression in hepatocyte cultures

2.1 Attribution

2.2 Abstract

2.3 Introduction

2.4 Materials and Methods

2.4.1 Materials

2.4.2 Hepatocyte Isolation and Culture

2.4.3 RNA Extraction and Gene Chip Hybridization

2.4.4 Microarray Data Analysis

2.4.5 Gene Set Enrichment Analysis

2.5 Results

2.5.1 The transcriptional program in CS cultures steadily and comprehensively diverges from that in HMs

2.5.2 Liver-specific gene sets are up-regulated starting on day 1 or day 2 in CS cultures

2.5.3 Gene sets involved in cholesterol, fatty-acid, alcohol, and carbohydrate metabolism are significantly up-regulated starting on day 1 or day 2 in CS cultures

2.5.4 Mono-oxygenases are initially not differentially expressed but recover after day 3 in CS cultures

2.5.5 Cell-cycle activity decreases significantly in CS cultures

2.6 Discussion

3 Discovering Networks of Perturbed Biological Processes in Hepatocyte Cultures

3.1 Attribution

3.2 Abstract

3.3 Introduction

3.4.1 Measuring perturbation from gene expression data

3.4.2 Scoring a link between a pair of processes

3.4.3 Extending the score to include transcriptional data and interaction weights

3.4.4 Assessing the statistical significance of links

3.5 Results

3.5.1 Input Data

3.5.2 Overview of Results

3.5.3 Liver Specific Genes

3.5.4 Liver Specific Gene Sets Regulated by HNF1

3.5.5 Lipid Homeostasis and Bile Acid Synthesis

3.5.6 Interpretation of Links in CBPLNs

3.6 Conclusions

4 Discovering descriptive networks of processes from gene expression and molecular interaction network data

4.1 Abstract

4.2 Introduction

4.3.1 Computing gene expression perturbation

4.3.2 Selection of processes for computation of BPNs

4.3.3 The MCMC-BPN algorithm

4.3.4 Computation of BPLNs

4.3.5 Measuring redundancy within a BPN

4.3.6 Data Sources

4.4 Results

4.4.1 CS versus HM

4.4.2 Acetaminophen Exposure

4.4.3 Cirrhosis

4.4.4 Very Advanced HCC

4.4.5 Behavior of the MCMC

4.5 Discussion

5 Conclusion

5.1 Summary of presented work

5.2 Future work

5.3 Closing remarks

5.4 Publication List

Bibliography

List of Abbreviations

BP biological process

BPLN Biological Process Linkage Network

BPN Biological Process Network

CBPLN Contextual Biological Process Linkage Network

CC cellular component

CORUM Comprehensive Resource of Mammalian protein complexes

CS collagen sandwich

EBI European Bioinformatics Institute

GenMAPP Gene Map Annotator and Pathway Profiler

GEO Gene Expression Omnibus

GSEA Gene Set Enrichment Analysis

HCC hepatocellular carcinoma

HCV hepatitis C virus

HM hepatocyte monolayer

JI Jaccard Index

KEGG Kyoto Encyclopedia of Genes and Genomes

MCMC Markov chain Monte Carlo

MF molecular function

MiMI Michigan Molecular Interactions

MSigDB Molecular Signatures Database

NCBI National Center for Biotechnology Information

NCI PID National Cancer Institute Pathway Interaction Database

STRING the Search Tool for the Retrieval of Interacting Genes/Proteins

List of Figures

1.1 Bipartite graph model of GenGO.

2.1 Schematics of two popular liver cell culture systems.

2.2 Liver-specific up-regulated gene sets.

2.3 Up-regulated gene sets involved in cholesterol, fatty-acid, alcohol, and carbohydrate metabolism.

2.4 Pathway for cholesterol metabolism that shows gene sets involved in this process.

2.5 Pathway for alcohol metabolism that shows gene sets involved in this process.

2.6 Up-regulated gene sets involved in urea production.

2.7 Pathway for urea production that shows gene sets involved in this process.

2.8 Gene sets that show recovery after day 3.

2.9 Down-regulated gene sets.

2.10 Gene set level network of up-regulated processes in CS cultures.

3.1 Calculating the links score σ(a, b) in an example network.

3.2 Scatter plots of link p-values for links found to be significant.

3.3 CS vs. HM CBPLN on day 1.

3.7 Context-free BPLN.

3.8 Network of functional interactions resulting in the link between V$HNF1 Q6 and HSIAO LIVER SPECIFIC GENES on day 8.

3.9 The liver regulates two tightly coupled pathways: bile acid synthesis and fatty acid metabolism.

3.10 Subgraphs of the CBPLNs involving nuclear receptors and the PPAR signaling, bile acid biosynthesis, and fatty acid metabolism pathways.

3.11 Network of functional interactions resulting in the link between NUCLEAR RECEPTORS and HSA03320 PPAR SIGNALING PATHWAY on day 8.

3.12 Network of functional interactions resulting in the link between HSA03320 PPAR SIGNALING PATHWAY and HSA00071 FATTY ACID METABOLISM on day 8.

4.1 An example interaction network and Bayesian network model.

4.2 Pairwise overlaps for CS vs. HM BPNs.

4.3 Redundancy of processes and links in CS vs. HM BPNs.

4.4 A BPN computed for the CS vs. HM contrast.

4.5 Interactions explained by the link between two PPAR-related processes.

4.6 Redundancy of processes and links in the CS vs. HM BPLN at significance threshold of 0.0001.

4.7 Overlap of interactions explained by the Acetaminophen BPNs.

4.8 Redundancy of processes and links in Acetaminophen BPNs.

4.9 A BPN computed for the Acetaminophen contrast.

4.10 Interactions explained by the link between KEGG VALINE LEUCINE AND ISOLEUCINE DEGRADATION and REACTOME METABOLISM OF LIPIDS AND LIPOPROTEINS.

4.11 Redundancy of processes and links in the Acetaminophen BPLN at significance threshold 0.0001.

4.12 Overlap of interactions explained by the Cirrhosis BPNs.

4.13 Redundancy of processes and links in Cirrhosis BPNs.

4.14 A BPN computed for the Cirrhosis contrast.

4.15 Redundancy of processes and links in the Cirrhosis BPLN at significance threshold 0.0001.

4.16 Overlap of interactions explained by the Very Advanced HCC BPNs.

4.17 Redundancy of processes and links in Very Advanced HCC BPNs.

4.18 A BPN computed for the Very Advanced HCC contrast.

4.19 Interactions explained by the link between REACTOME INNATE IMMUNITY SIGNALING and REGULATION OF MITOTIC CELL CYCLE.

4.20 Distributions of states and visitation frequencies.

List of Tables

2.1 Contrasts analyzed using GSEA.

2.2 The number of differentially-expressed genes in each of the four CS vs. HM contrasts at different p-value cutoffs.

3.1 Contrasts analyzed for contextual BPLNs.

3.2 Gene sets from MSigDB selected for our analyses.

3.3 Comparison of the properties of the CBPLNs computed by using each hypothesis test.

3.4 Comparison of the number of links in the BPLN to the number of links in the CBPLNs, computed without and with normalization.

4.1 Data sources for each contrast.

4.2 Statistics on inputs by contrast.

4.3 BPLN statistics for the CS vs. HM contrast.

4.4 BPLN statistics for the Acetaminophen contrast.

4.5 BPLN statistics for the Cirrhosis contrast.

4.6 BPLN statistics for the Very Advanced HCC contrast.

Chapter 1

Introduction

Thanks to the continuous improvement in high-throughput experimental techniques in the life sciences, this past decade has seen a tremendous explosion in the quantity of publicly available biological data (from whole genome sequences [1], to genome-wide gene expression measurements [2], to gene and protein interactions [3–17]), propelling biology, out of necessity, from a reductionist science to a study of systems. This trend has been most obvious in the evolution of analysis of gene expression data, as exemplified by the increase in the number of data sets available in public repositories such as the Gene Expression Omnibus (GEO) [2]. Much gene expression data derives from treatment-control experiments, where RNA is taken from samples exposed to some condition and contrasted with RNA from samples not exposed to that condition, or exposed to another condition, e.g., cancerous versus non-cancerous biopsies, treatment with a drug versus no treatment, or liver versus a background mix of tissues.

Early approaches to analysis of treatment-control gene expression experiments reported lists of genes that exhibited strong perturbation in expression using the t-test and other univariate test statistics [18, 19], and later, using more sophisticated techniques, such as fitting linear models to the expression measurements [20]. These methods may report hundreds or even thousands of genes as significantly perturbed for a given contrast [21], making interpretation of results difficult. Such voluminous results motivated the development of methods and tools to analyze gene expression on the basis of coherent collections of genes, which we refer to in this dissertation as “biological processes” (also known in the literature as gene sets or pathways). Initial methods focused on detecting over-representation of genes belonging to a process among the list of perturbed genes [22–25]. Later techniques [26–28] instead sought to identify significant differences in collective expression of genes belonging to a process, allowing all genes, even those with insignificant perturbation when measured individually, to contribute to the analysis. These methods provide a higher-level view of the biological phenomena underlying the gene expression data; however, due to overlap in gene membership in processes, the lists of perturbed processes can suffer from a large amount of redundancy [29]. The most recent techniques in process-based expression analysis have thus emphasized computing concise sets of perturbed processes that nonetheless explain much of the overall perturbation among the genes [30–33].

Systems biologists have also sought methods to integrate high-throughput data from multiple sources as a means of gaining biological insights that each source, alone, could not provide [34]. Among these efforts is the integration of gene or protein interaction networks with gene expression data to compute response networks: the subnetworks of genes and interactions that display a large amount of perturbation or activity in response to some condition or stimulus [35, 36], or for which the genes exhibit significant co-expression [37–40]. Response networks may contain many thousands of genes and interactions, in much the same way that gene expression analysis could reveal overwhelmingly large lists of differentially expressed genes. In an approach similar to that of process-based expression analysis, researchers turned to summarizing response networks in terms of processes. For each process, one can assess the enrichment of genes belonging to that process among the genes in the response network, then report the processes significantly over-represented in the response network [40, 41].

Recently, the integration of gene interaction networks with process annotations for each gene has led to novel insights as to how pairs of processes themselves interact on the basis of the interactions between their respective genes [42–45]. These methods detect inter-process connections on the basis of static data, which are more representative of the potential for interaction than of relevance to any specific biological condition. We reason that as the phenotypes of cells or tissues change, so too do the connections between the processes perturbed in the response to change. Surprisingly, none of these existing methods considered integrating gene expression perturbation to give context to the process interaction network à la response networks. This gap in methodology provided the motivation for the work we present in this dissertation.

Inspired by methods for computing inter-process connections and response networks, we developed the first published method that integrates genome-wide gene perturbation measurements, gene annotations to biological processes, and a gene interaction network to identify the connections between processes perturbed under a particular biological context. We then drew further inspiration from state-of-the-art methods in process-based expression analysis to develop another novel method that computes a concise, non-redundant set of perturbed inter-process connections that explains the perturbation of the underlying gene-gene interactions that serve as the interfaces between the processes.

In this dissertation, we have applied these methods, as well as an existing process-based expression analysis method [26], to liver-related data sets. In our first study, we analyzed gene expression data collected from two rat hepatocyte in vitro cultures, the hepatocyte monolayer (HM) and collagen sandwich (CS) (Chapter 2) [46]. The liver maintains a variety of physiological functions including lipid, carbohydrate, and amino acid metabolism, immune response, and detoxification of xenobiotics; hepatocytes, which comprise 60–70% of the liver, perform the bulk of its metabolic functions [47, 48]. While Dunn et al. first compared CS and HM cultures over two decades ago [49, 50], our study represents the first systems-level analysis of differences in gene expression of hepatocytes in these cultures.

In our second study, we developed our method for computing context-specific connections between processes, and used it to elucidate the relationships between the processes we found perturbed in our first study (Chapter 3) [51]. In the final study (Chapter 4), we developed our method for computing non-redundant sets of context-specific inter-process connections, and applied it to the CS and HM data, to data from a study on the effects of acetaminophen [52], a model hepatotoxicant [53], and to data collected from liver biopsies of human patients suffering from infection with hepatitis C virus [54], a major cause of liver damage and hepatocellular carcinoma, one of the most deadly forms of cancer [55].

1.1 Significant contributions of this dissertation

This dissertation presents methodological advances in computational systems biology capable of reporting high-level trends from high-throughput biological experiments. We summarize the contributions of this dissertation as follows:

(i) We performed the first systems-level comparison of two important and common hepatocyte culture systems, identifying biological processes perturbed between the two conditions.

(ii) We developed a novel method to compute perturbed connections between processes, and applied it to the processes identified in (i).

(iii) We developed a novel method to compute a concise, non-redundant set of perturbed connections between processes that best summarizes the perturbation of the gene-gene interactions that lie at the interfaces of the processes.

1.2 Prior work

In this section, we discuss the techniques of historical importance to those we have developed. These techniques fall into three categories, each with a respective subsection below: process-based analysis of gene expression experiments, computation of response networks, and computation of inter-process connections.

1.2.1 Process-based analysis of gene expression

A very common experimental design in genome-wide gene expression experiments (e.g., microarrays) partitions the set of biological samples into two subsets, with one subset corresponding to an experimental treatment and another subset corresponding to a control. We call such designs treatment-control experimental designs. Early methods for investigating gene expression perturbation in treatment-control microarray experiments computed lists of differentially expressed genes [18, 19]. These lists may contain thousands of perturbed genes [21], giving rise to methods, known as enrichment analysis, for summarizing the results in terms of biological processes. Enrichment analysis asks whether more genes belonging to a biological process appear in the list of perturbed genes than expected by chance (i.e., the process is “over-represented”); a process over-represented in the list of perturbed genes may be interpreted as perturbed, itself. For each process, one tests for over-representation using a statistical test of significance such as Fisher’s Exact Test, then reports those processes that meet a significance threshold [22–25].
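As an illustration of the over-representation test described above, the following minimal Python sketch computes a one-sided Fisher's Exact Test p-value as the upper tail of the hypergeometric distribution. The function name `enrichment_p_value` and its parameter names are hypothetical, chosen only for this example; the cited tools [22–25] differ in their details.

```python
from math import comb


def enrichment_p_value(n_genes, n_perturbed, process_size, overlap):
    """One-sided Fisher's exact test for over-representation.

    Returns the probability of seeing at least `overlap` genes from a
    process of `process_size` genes in a list of `n_perturbed` genes
    drawn without replacement from `n_genes` measured genes.
    (All names are illustrative assumptions, not from the cited tools.)
    """
    total = comb(n_genes, n_perturbed)
    largest_overlap = min(process_size, n_perturbed)
    tail = sum(
        comb(process_size, k) * comb(n_genes - process_size, n_perturbed - k)
        for k in range(overlap, largest_overlap + 1)
    )
    return tail / total


# Toy example: 10 genes measured, 3 perturbed, a process with 4 genes,
# 2 of which appear in the perturbed list.
p = enrichment_p_value(n_genes=10, n_perturbed=3, process_size=4, overlap=2)
print(f"p = {p:.4f}")
```

In practice one would compute such a p-value for every process and then apply a multiple testing correction before reporting the significant ones.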

Enrichment analysis methods have several shortcomings that lead to loss of sensitivity. For one, the user must choose the significance threshold at which to declare a gene significantly perturbed, which can change the results of enrichment analysis [21, 29]. Additionally, they limit computation of process perturbation by considering differences in expression only in the most perturbed genes [21, 29]. To address these issues, Subramanian et al. developed a method called Gene Set Enrichment Analysis (GSEA) [26]. GSEA takes a holistic view of finding significant collective perturbation of processes. GSEA considers the difference in expression of all genes in a treatment-control experiment, and so each gene belonging to a process may contribute to the cumulative perturbation of that process, regardless of the magnitude of difference for that individual gene. First, Subramanian et al. sort the genes into a ranked list on the basis of the difference in expression between the treatment condition versus the control. They then score each process based on the concentration of its genes either towards the top or bottom of the sorted list; processes composed of genes that consistently show strong correlation with the treatment or with the control receive high scores. They assess the significance of the score by constructing an empirical distribution of scores after shuffling the treatment and control labels of the samples and re-calculating scores, then ranking the original score among the empirical distribution of scores from the randomized data.
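The rank, score, and permute steps described above can be sketched as follows. This is a deliberately simplified illustration, not the GSEA implementation: it ranks genes by a plain difference of means and uses an unweighted running-sum score, whereas GSEA [26] uses its own ranking metrics and a weighted statistic. All function names are hypothetical, and the sketch assumes the process contains at least one, but not all, of the measured genes.

```python
import random


def mean_difference(values, labels):
    """Mean expression in treatment samples minus mean in control samples."""
    treated = [v for v, is_treated in zip(values, labels) if is_treated]
    control = [v for v, is_treated in zip(values, labels) if not is_treated]
    return sum(treated) / len(treated) - sum(control) / len(control)


def enrichment_score(ranked_genes, process):
    """Unweighted running-sum score: the largest deviation from zero."""
    n_in = sum(1 for gene in ranked_genes if gene in process)
    n_out = len(ranked_genes) - n_in
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / n_in if gene in process else -1.0 / n_out
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(expression, labels, process, n_perm=1000, seed=0):
    """Score a process, then compare against label-shuffled data."""
    def score(current_labels):
        ranked = sorted(
            expression,
            key=lambda gene: mean_difference(expression[gene], current_labels),
            reverse=True,
        )
        return enrichment_score(ranked, process)

    observed = score(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(score(shuffled)) >= abs(observed):
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: four genes, three treatment and three control samples.
expression = {
    "g1": [5, 6, 5, 1, 1, 2],
    "g2": [4, 5, 6, 1, 2, 1],
    "g3": [1, 1, 2, 5, 6, 5],
    "g4": [2, 1, 1, 4, 5, 6],
}
labels = [True, True, True, False, False, False]
score, p_value = permutation_p_value(expression, labels, {"g1", "g2"}, n_perm=200)
print(score, p_value)
```

With only six samples there are few distinct label assignments, so the attainable p-values in this toy example are coarse; real analyses use many more samples and permutations.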

GSEA and other methods which test significance on a process-by-process basis suffer

from several drawbacks. For one, they must employ multiple testing correction. For

GSEA and related methods that rely on empirical distributions, the number of random-

8

ized samples required to give the precision necessary to declare significance after cor-

rection can increase rapidly in comparison to the number of tests, which in return can

greatly increase computational time. Additionally, in cases where two or more processes

have many common genes, they may both be reported as significant in the results, while

providing little additional information. Sources of process annotation that organize pro-

cesses as a hierarchy, such as the Gene Ontology (GO) [56], exacerbate the redundancy,

as the more specific processes are composed of subsets of genes in the less specific pro-

cesses [32].
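
To make the cost of multiple testing correction concrete, the short calculation below estimates how many randomizations are needed before an empirical p-value can fall below a corrected threshold. It assumes a Bonferroni correction and the common estimator p = (r + 1) / (n + 1), where r is the number of randomized scores at least as extreme as the observed one; both choices, and the function name, are assumptions for illustration rather than details taken from the methods cited above.

```python
import math
from fractions import Fraction

def min_permutations(n_tests, alpha="0.05"):
    """Smallest n for which the smallest attainable empirical p-value,
    1 / (n + 1), is at or below the Bonferroni threshold alpha / n_tests."""
    threshold = Fraction(alpha) / n_tests
    return math.ceil(1 / threshold) - 1

# Exact rational arithmetic avoids floating-point rounding at the boundary.
needed = {m: min_permutations(m) for m in (1, 100, 1000)}
```

Under these assumptions a single test needs 19 randomizations, while 1000 tests need 19,999 randomizations per process, each of which requires re-scoring every process.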

Recent efforts in gene set enrichment have tackled both the challenge of reducing the

number of tests and also reporting a non-redundant set of significant processes. In 2008, Lu et

al. proposed one such method called GENerative GO Analysis (GenGO) [32], where they

considered the perturbation of individual genes as a noisy observation, generated by a

specific collection of processes that cells activated in response to a particular condition or

stress. The objective of GenGO is to identify a non-redundant set of processes that best

explains the gene expression perturbations observed. To do this, Lu et al. conceptualized

the biological processes and genes forming a bipartite graph, where edges connect genes

to processes in which they participate (see Figure 1.1).

This bipartite graph serves as a generative model: a set of processes is proposed as the set

of perturbed processes, which have generated the perturbation observed at the level of the

individual genes. Thus, genes connected to processes chosen as part of the generating set

are expected to be observed as perturbed, while genes connected to no activated processes

Figure 1.1: Bipartite graph model of GenGO. Each node on the left, representing a process (GO Nodes), is connected by edges to nodes representing genes that belong to

that process (Gene Nodes). Activation edges are drawn from an included process to its genes. [Modified from Lu et al. [32] under the Creative Commons Attribution

License v2.5.]

are expected to be observed as unperturbed. Lu et al. constructed a likelihood function

to calculate how well the observed perturbations fit a selection of processes proposed as

activated. High scores are given to selections of processes that are connected to many perturbed

genes, are connected to few unperturbed genes, and have few genes in common. Lu et al. use

a greedy algorithm to find selections of processes of high likelihood scores.

Bauer et al. proposed a method similar to GenGO, which they called model-based gene set

analysis (MGSA) [33]. Like GenGO, they considered unobservable activation of processes

as generating observable but noisy perturbations in gene expression. In MGSA connec-

tions from the processes to the genes are modeled as a Bayesian network. Bauer et al.

calculated the likelihood that a given set of perturbed processes generated the observed gene

expression perturbations as a combination of Bernoulli distributions based on several prior

probabilities: (i) the probability a gene connected to a perturbed process will not be

observed as perturbed, (ii) the probability a gene connected to no perturbed processes will

be observed as perturbed, and (iii) the prior probability of observing any given process

as perturbed. Bauer et al. used a Markov chain Monte Carlo (MCMC) approach to find

collections of processes of high likelihood, and reported the probability a process should

be considered perturbed as the fraction of steps in the MCMC in which the process was

selected.
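
The model just described can be sketched in a few lines. The code below is an illustration under stated assumptions, not the MGSA implementation: `alpha` denotes probability (ii), `beta` denotes probability (i), and `p` denotes the prior (iii); the parameter values, the process and gene names, and the single-toggle proposal are hypothetical choices for the example. Each process's marginal is estimated here as the fraction of sampled states in which it is active.

```python
import math
import random

def log_posterior(active, annotations, observed, alpha, beta, p):
    """Unnormalized log posterior of a set of active (perturbed) processes."""
    hidden_on = set()
    for proc in active:
        hidden_on |= annotations[proc]
    all_genes = set().union(*annotations.values())
    logp = 0.0
    for gene in all_genes:
        seen = gene in observed
        if gene in hidden_on:   # gene belongs to some active process
            logp += math.log(1 - beta) if seen else math.log(beta)
        else:                   # gene belongs to no active process
            logp += math.log(alpha) if seen else math.log(1 - alpha)
    n_on = len(active)
    logp += n_on * math.log(p) + (len(annotations) - n_on) * math.log(1 - p)
    return logp

def mcmc_marginals(annotations, observed, alpha=0.1, beta=0.25, p=0.3,
                   steps=20000, seed=0):
    """Metropolis sampler that toggles one randomly chosen process per step."""
    rng = random.Random(seed)
    procs = sorted(annotations)
    active = set()
    current = log_posterior(active, annotations, observed, alpha, beta, p)
    counts = {proc: 0 for proc in procs}
    for _ in range(steps):
        proposal = active ^ {rng.choice(procs)}
        cand = log_posterior(proposal, annotations, observed, alpha, beta, p)
        # The toggle proposal is symmetric, so the plain Metropolis rule applies.
        if cand >= current or rng.random() < math.exp(cand - current):
            active, current = proposal, cand
        for proc in active:
            counts[proc] += 1
    return {proc: counts[proc] / steps for proc in procs}

# Hypothetical annotations and observations.
annotations = {"P1": {"a", "b", "c"}, "P2": {"c", "d"}, "P3": {"e", "f"}}
observed = {"a", "b", "c"}
marginals = mcmc_marginals(annotations, observed)
```

In this toy input P1 explains all three perturbed genes without implicating any unperturbed gene, so its estimated marginal is close to 1, whereas P2 and P3 receive low marginals.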

1.2.2 Computing response networks from treatment-control data

Ideker et al. published the first method capable of computing response networks from

treatment-control contrasts, called ActiveModules [35]. ActiveModules discovers subgraphs

composed of genes with large differential expression. They define the score for a

subgraph as a Liptak-Stouffer z-score, computed as the sum of the gene expression per-

turbation measurements divided by the square root of the number of genes within the

subgraph. They then calculate the significance of a subgraph’s score by its deviation from

a mean, estimated from an empirical distribution of scores of subgraphs created by sam-

pling uniformly at random a number of genes equal to the number in the original sub-

graph. Since the problem of finding the highest scoring subgraph is NP-complete [35],

Ideker et al. employ simulated annealing and report the highest scoring connected com-

ponent during its execution.
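
As a sketch of the scoring just described, the code below computes the aggregate z-score of a gene set and calibrates it against randomly sampled gene sets of the same size. The function names and the toy z-scores are hypothetical, and dividing the deviation from the empirical mean by the empirical standard deviation is an assumption about how that deviation is expressed. The simulated annealing search is not shown.

```python
import math
import random
import statistics

def aggregate_z(z_scores):
    """Liptak-Stouffer aggregate: sum of z-scores over sqrt(number of genes)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def calibrated_score(subgraph_genes, gene_z, n_samples=1000, seed=0):
    """Deviation of the subgraph score from the mean score of random gene
    sets of equal size, in units of their standard deviation (assumes the
    subgraph is smaller than the full gene list, so the null is not constant)."""
    rng = random.Random(seed)
    k = len(subgraph_genes)
    observed = aggregate_z([gene_z[g] for g in subgraph_genes])
    all_genes = sorted(gene_z)
    null = [
        aggregate_z([gene_z[g] for g in rng.sample(all_genes, k)])
        for _ in range(n_samples)
    ]
    return (observed - statistics.mean(null)) / statistics.stdev(null)

# Hypothetical per-gene z-scores ranging from -2.0 to 1.8.
gene_z = {f"g{i}": (i - 10) / 5 for i in range(20)}
top_module = ["g16", "g17", "g18", "g19"]
score = calibrated_score(top_module, gene_z)
```

Calibrating against size-matched random sets removes the dependence of the raw aggregate score on the number of genes in the subgraph.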

Later, Dittrich et al. extended ActiveModules, using a different method to score subgraphs

and an alternative search algorithm to identify the highest scoring subgraphs [36]. Rather

than simply use the p-values of the perturbation of expression, they model the perturba-

tion as coming from both a “signal component” and a “noise component”, each with its

own distribution. Genes with a large signal component have large positive scores, while

those with a large noise component have negative scores. The authors defined the sub-

graph score as the sum of the scores of the genes within the subgraph. Dittrich et al. de-

fined the problem of finding the connected subgraph of largest score as a Prize-Collecting

Steiner Tree (PCST), which they then solved using integer linear programming [57].

1.2.3 Computation of inter-process connections

As interpretation of gene expression data has moved from the level of individual genes

to processes composed of these genes, so too has the level of analysis of molecular inter-

actions begun to move from gene-gene interactions to discovering interactions between

processes.

Pandey et al. proposed a method to detect chains of connected processes of a specified

length k using gene regulatory networks and gene annotations as the inputs [42]. In this

method, a chain of processes of length k is formed if, for each process in the chain, there

is a corresponding gene in a path of genes in the regulatory network. Each chain of pro-

cesses has a frequency corresponding to how many such pathways in the gene regulatory

network exist. Pandey et al. proposed a statistical model based on the coupling of hyper-

geometric distributions for each process in the path, and used this model to compute the

significance of each process chain based on its frequency [58].

Li, Agarwal, and Rajagopalan developed a method that computes “crosstalk” based on

the number of interactions between the genes of each process [43]. They began by count-

ing the number of gene-gene interactions between each pair of processes. To determine

whether this number of interactions is greater than expected by chance, they proceeded

to build empirical distributions of counts. For each process, they create a new random-

ized process by sampling genes uniformly at random such that each gene originally in

the process is replaced by one with an equal number of gene-gene interactions. They then

re-compute the interactions between genes of each pair of processes. Li et al. repeat these

steps enough times to build distributions of sufficient sizes, and then compute the mean

number of interactions between pairs of processes for the pair’s distribution, as well as

the mean number of interactions for all pairs of processes over all randomizations. Finally,

for each pair of processes, they compute the significance of the number of their

interactions using Fisher’s Exact Test, where the terms include the number of interactions

between the processes observed in the original data, the mean number of interactions for

that pair’s distribution, the mean for all distributions, and the total number of interactions

overall. If the p-value is found significant after multiple testing correction, they report the

two processes as having crosstalk.
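
The counting and randomization steps of this procedure can be sketched as below. Only those two steps are shown; the final significance test is omitted. The function names and the toy network are hypothetical, and treating a process as a set (so that two genes replaced by the same random gene collapse into one) is a simplification made for the example.

```python
import random
from collections import defaultdict

def count_cross_interactions(edges, proc_a, proc_b):
    """Number of gene-gene interactions with one end in each process."""
    return sum(
        1 for u, v in edges
        if (u in proc_a and v in proc_b) or (u in proc_b and v in proc_a)
    )

def degree_matched_process(process, degree, by_degree, rng):
    """Replace each gene by a uniformly chosen gene with the same degree."""
    return {rng.choice(by_degree[degree[g]]) for g in process if g in degree}

def null_counts(edges, proc_a, proc_b, n_random=200, seed=0):
    """Cross-interaction counts for degree-matched randomized processes."""
    rng = random.Random(seed)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    by_degree = defaultdict(list)
    for gene in sorted(degree):
        by_degree[degree[gene]].append(gene)
    counts = []
    for _ in range(n_random):
        rand_a = degree_matched_process(proc_a, degree, by_degree, rng)
        rand_b = degree_matched_process(proc_b, degree, by_degree, rng)
        counts.append(count_cross_interactions(edges, rand_a, rand_b))
    return counts

# Hypothetical interaction network and process memberships.
edges = [("a", "x"), ("b", "y"), ("a", "b"), ("c", "z"), ("x", "y")]
proc_a, proc_b = {"a", "b", "c"}, {"x", "y"}
observed_count = count_cross_interactions(edges, proc_a, proc_b)
null = null_counts(edges, proc_a, proc_b)
```

Matching on degree keeps highly connected genes from inflating the null counts simply because they have more interactions overall.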

Dotan-Cohen et al. proposed another method for finding connections between processes

called “Biological Process Linkage Networks” (BPLN) [44]. Similar to the approach of

Li et al. [43], BPLN declares one process linked to another process if genes in the first

process have significantly more interactions with genes in the second process than would

be expected by chance. BPLN computes this significance directly using Fisher’s Exact

Test, however, rather than building empirical distributions as described in the approach

by Li et al.

Wang et al. developed a method which calculates a type of connection between two pro-

cesses that they refer to as “functional similarity” [45]. Like BPLN, they consider interac-

tions between genes of two processes in their measure of the strength of the inter-process

connection; however, rather than include only immediate neighboring connections, they

score the connection by the sum of the distances of all genes in the two processes. They

then compare this score to an empirical distribution computed by re-sampling.

1.3 Overview of Chapters

We present here a brief description of the remainder of this document. Chapter 2 describes

a process-level analysis of gene expression data for hepatocytes in HM versus CS

cultures. We originally published this work in 2010 in the journal Tissue Engineering Part

C: Methods [46]. Chapter 3 describes a method which integrates gene expression data

with gene interaction networks to detect connections between the processes we detected

as significantly perturbed in Chapter 2. We originally published this work in the journal

PLoS ONE in 2011 [51]. Finally, Chapter 4 presents advances which allow computation

of comprehensible, non-redundant sets of perturbed inter-process connections that best

explain the underlying perturbation of expression in the gene interaction network.

Chapter 2

Discovering temporal patterns of expression in hepatocyte cultures

2.1 Attribution

This chapter contains material originally published as Yeonhee Kim, Christopher D Lasher,

Logan M Milford, T.M. Murali, Padmavathy Rajagopalan (2010) A comparative study of

Genome-Wide transcriptional profiles of primary hepatocytes in collagen sandwich and

monolayer cultures. Tissue Engineering Part C: Methods 16: 1449–1460 [46].

Dr. Yeonhee Kim performed the work described in Sections 2.4.1–2.4.3, while Mr. Lasher

performed the work described in Section 2.4.5.

Christopher D. Lasher Chapter 2. Patterns of expression in hepatocyte cultures 16

2.2 Abstract

Two commonly used culture systems in hepatic tissue engineering are the collagen sand-

wich (CS) and monolayers of cells. In this study, genome-wide gene expression profiles

of primary hepatocytes were measured over an 8-day period for each cell culture sys-

tem using Affymetrix GeneChips and compared via gene set enrichment analysis to elicit

biologically meaningful information at the level of gene sets. Our results demonstrate

that gene expression in hepatocytes in CS cultures steadily and comprehensively diverges

from that in monolayer cultures. Gene sets up-regulated in CS cultures include several

associated with liver metabolic and synthesis functions, such as metabolism of lipids,

amino acids, carbohydrates, and alcohol, and synthesis of bile acids. Monooxygenases

such as Cytochrome-P450 enzymes do not show any change between the culture systems

after 1 day, but exhibit significant up-regulation in CS cultures after 3 days in comparison

to hepatocyte monolayers. These data provide insights into the up- and down-regulation

of several liver-critical gene sets and their subsequent effects on liver-specific functions.

These results provide a baseline for further explorations into the systems biology of engi-

neered liver mimics.


2.3 Introduction

As one of the important organs in our bodies, the liver performs many essential functions

such as metabolism, synthesis, secretion, and detoxification [48]. Hepatocytes are the

principal cells in the liver, comprising over 80% of its mass. Hepatocytes perform sev-

eral characteristic functions of the liver, such as lipid metabolism, glucose homeostasis,

regulation of urea, production of plasma proteins, alcohol clearance, and biotransforma-

tion of xenobiotics [48]. In hepatic tissue engineering, two widely used culture systems

are hepatocyte monolayers (HMs) (Figure 2.1a) and the collagen sandwich (CS) (Figure

2.1b) [49, 50]. In HMs, hepatocytes are cultured on a single collagen gel. Such cells progressively

lose their phenotypic characteristics over time. In CS cultures, hepatocytes

are maintained between two collagen gels and remain stable over extended periods of

time [59, 60]. Studies have indicated that CS cultures exhibit the preservation of differ-

entiated functions including secretion of urea, expression of plasma proteins such as al-

bumin and fibrinogen, polygonal morphology, the presence of bile canaliculi, as well as

the synthesis of gap junction and tight junction proteins [59,60]. Although morphological

and physiological characteristics of hepatocytes in CS cultures have been studied exten-

sively, comprehensive evaluations of temporal genome-wide gene expression programs

in these culture systems have not been reported. Global gene expression of human hep-

atocellular carcinoma cells (HepG2) in monolayer and spheroidal cultures revealed up-

regulated metabolic functions in spheroids but not in monolayer cultures [61]. Since these

data were taken at a single time point, they did not reveal temporal variations. Another


study that monitored temporal gene expression in hepatocyte monolayers cultured over a

three-day time period revealed the down-regulation of cytochrome-P450 expression [62].

However, this study neither investigated longer time points nor compared monolayers

to other, more stable culture conditions. DNA microarray measurements have also

been used to study specific pathways through which toxicity was conferred in human

hepatoblastoma cells [63] and to understand the effects of non-parenchymal cells in 2D

cocultures of hepatocytes with fibroblasts or sinusoidal endothelial cells [64, 65].

Figure 2.1: Schematics of two popular liver cell culture systems. (a) In a hepatocyte monolayer (HM), hepatocytes reside on a layer of collagen. The hepatocytes show a progressive loss of phenotype, including cell shape. (b) In a collagen sandwich (CS), an additional layer of collagen overlays the hepatocytes. Hepatocytes retain their phenotype for several weeks. [Panel labels: (a) Hepatocyte Monolayer (HM): collagen gel, hepatocyte; (b) Collagen Sandwich (CS): collagen gel, hepatocyte, bile canaliculi.]

We hypothesized that the enhanced in vivo liver-like phenotypes in CS cultures were a

result of the underlying differences in the transcriptional program between hepatocytes

cultured in CS and HMs. Accordingly, genome-wide gene expression profiles of primary

hepatocytes were measured at four different time points over an 8-day period for each

cell culture system using Affymetrix GeneChips. Among the wide range of techniques

that are available to analyze DNA microarray data, a method was desired that would

summarize, at the level of predefined biological pathways, the differences between the


culture conditions at each time point. Gene set enrichment analysis (GSEA) [26] was se-

lected since it satisfies this criterion. GSEA is one among a family of techniques that can

summarize differential expression at the level of gene sets [66]. GSEA is widely used,

generates detailed information on the results, and has shown very good performance in

a comparison of methods that compute enrichment at the level of gene sets [67]. Fur-

thermore, GSEA has been used to identify pathways involved in liver toxicity in human

hepatoblastoma cells [63]. GSEA is designed to identify predefined gene sets that are dif-

ferentially expressed in a treatment and a control. All the genes expressed on each gene

chip are ranked based upon their differential expression in CS and HM cultures. There-

fore, a gene set could be important if its members are clustered within the ranked gene

list. GSEA measures the statistical significance of the distribution of ranks within the gene

set against the background of the ranks of all the genes.

Over the 8-day culture period, the gene expression program of hepatocytes in CS cul-

tures monotonically diverged from cells cultured as a monolayer. Gene sets that were

up-regulated to a statistically significant extent in CS cultures included those associated

with liver-specific functions such as bile acid synthesis and lipid, amino acid, carbohy-

drate, and alcohol metabolism. Nuclear receptors, which play a key role in control-

ling the transcriptional activation of target proteins, were up-regulated in CS cultures

on day 1 in culture. Sets containing genes whose expression is mediated by nuclear

receptors were up-regulated in CS systems after 1 day. Gene sets related to xenobiotic

metabolism and monooxygenase activity were not differentially expressed after 1 or 2


days, but showed highly significant up-regulation after 3 days, suggesting a recovery

in expression of the genes in these sets. Numerous gene sets related to the cell cycle

were down-regulated, suggesting that the cell cycle was arrested in hepatocytes main-

tained in CS culture systems in comparison to HMs. These findings recapitulated well-

known aspects of liver function, thereby suggesting that DNA microarrays are a powerful

tool for shedding light on the transcriptional signatures that underlie differences between

these two culture systems. The DNA microarray data generated in this study are available

at NCBI’s Gene Expression Omnibus under accession number GSE20659 at

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20659. All our results are available

at the following supplementary website:

http://bioinformatics.cs.vt.edu/~murali/supplements/2010-kim-tissue-engineering.

2.4 Materials and Methods

2.4.1 Materials

Dulbecco’s Modified Eagle Medium (DMEM) containing 4.5 g/L glucose, phosphate-

buffered saline (PBS), penicillin, streptomycin, and trypsin-EDTA were obtained from In-

vitrogen Life Technologies (Carlsbad, CA). Type IV collagenase, HEPES [4-(2-hydroxyethyl)

piperazine-1-ethanesulfonic acid], glucagon, and hydrocortisone were obtained from Sigma-

Aldrich. Unless otherwise noted, all chemicals were used as received from Fisher Scien-


tific.

2.4.2 Hepatocyte Isolation and Culture

Primary rat hepatocytes were harvested from female Lewis rats (Harlan) that weighed

between 170 and 200g. Animal care and surgical procedures were conducted as per pro-

cedures approved by Virginia Polytechnic Institute and State University’s Institutional

Animal Care and Use Committee. A two-step in situ collagenase perfusion method was

utilized to excise the liver [59, 60]. Briefly, animals were anesthetized with 3L/min of

a gas mixture of 3% (v/v) isoflurane/97% oxygen (Veterinary Anesthesia Systems Co.).

The liver was perfused through the portal vein with Krebs Ringer Buffer (7.13g/L sodium

chloride, 2.1g/L sodium bicarbonate, 1g/L glucose, 4.76g/L HEPES, and 0.42g/L potas-

sium chloride) that contained 1mM ethylenediaminetetraacetic acid, followed by serial

perfusion with a 0.075% w/v and a 0.1% w/v collagenase (Type IV; Sigma-Aldrich) in

Krebs Ringer Buffer containing 5mM calcium chloride. Cell suspensions were filtered

through nylon meshes with porosity ranging from 250 to 62µm (Small Parts, Inc.). Hepa-

tocytes were separated using a Percoll (Sigma-Aldrich) density centrifugation technique.

Cell viability was determined by trypan blue exclusion. Hepatocytes were cultured on

collagen-coated 6-well sterile tissue culture plates (Becton Dickinson Labware) and were

maintained in a culture medium that consisted of DMEM supplemented with 10% heat-

inactivated fetal bovine serum (Hyclone), 200U/mL penicillin, 200µg/mL streptomycin,

20ng/mL epidermal growth factor (BD Biosciences), 0.5U/mL insulin (USP), 14ng/mL


glucagon, and 7.5µg/mL hydrocortisone. A collagen gelling solution was prepared by

mixing nine parts of type I collagen (BD Biosciences) solution and one part of 10× DMEM.

Sterile 6-well tissue culture plates were coated with 0.5mL of the gelling solution and incubated

at 37°C for 1h to promote gel formation. Isolated hepatocytes were suspended

in hepatocyte culture medium at a concentration of 1 × 10^6 cells/mL and seeded on the

collagen-coated wells at a density of 1 million cells/well. CS cultures were formed by the

deposition of a second layer of collagen 1 day after the hepatocytes were seeded [59, 60].

Hepatocytes maintained in stable CS and in unstable confluent HM cultures served as

positive and negative controls, respectively. Hepatocyte cultures were maintained at 37°C

in a humidified gas mixture of 90% air/10% CO2. The culture medium was replaced ev-

ery 24 h.

2.4.3 RNA Extraction and Gene Chip Hybridization

Primary rat hepatocytes cultured in CS and HM cultures were maintained for an 8-day

culture period. The samples were analyzed at four time points: days 1, 2, 3, and 8 after de-

position of the second layer of collagen gel on hepatocytes. Total RNA was extracted and

purified from cells for each culture system using an RNeasy mini kit (Qiagen) following

the manufacturer’s protocol. Isolated RNA samples in triplicate at each time point were

labeled according to the Affymetrix Standard Target labeling process, hybridized to the

GeneChip Rat Genome 230 2.0 array (Affymetrix), and scanned as described by the manu-

facturer. Complementary RNA (cRNA) synthesis, hybridization, and scanning were per-


Contrast name    Treatment                  Control

Collagen sandwich vs. monolayer cultures
CS vs. HM 1d     Collagen sandwich 1 day    Hepatocyte monolayer 1 day
CS vs. HM 2d     Collagen sandwich 2 days   Hepatocyte monolayer 2 days
CS vs. HM 3d     Collagen sandwich 3 days   Hepatocyte monolayer 3 days
CS vs. HM 8d     Collagen sandwich 8 days   Hepatocyte monolayer 8 days

Within collagen sandwich
CS 8d vs. 1d     Collagen sandwich 8 days   Collagen sandwich 1 day
CS 8d vs. 2d     Collagen sandwich 8 days   Collagen sandwich 2 days
CS 8d vs. 3d     Collagen sandwich 8 days   Collagen sandwich 3 days

Table 2.1: Contrasts analyzed using GSEA.

formed at the Virginia Bioinformatics Institute Core Laboratory facility as follows. Briefly,

total RNA was converted into double-stranded complementary DNA using a T7-oligo

(dT) primer (5′–GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG(dT)24–3′)

and reverse transcription. Synthesized cDNA was converted into biotinylated cRNA by

transcription using T7 RNA polymerase. Randomly fragmented cRNA was hybridized to

GeneChip and the arrays were washed and stained according to Affymetrix’s protocols.

The arrays were scanned using an Affymetrix 7G scanner.

2.4.4 Microarray Data Analysis

The BioConductor package [68] was used to perform initial statistical analysis of the DNA

microarray data. The data from 24 chips (2 culture conditions × 4 time points × 3 repli-


cates) were normalized using the Robust Multichip Average method for further analysis.

The affylmGUI interface to Linear Models for Microarray Data (LIMMA) [20] was used to

perform differential gene expression analysis for the contrasts shown in Table 2.1. Specif-

ically, for each contrast, LIMMA was used to compute a p-value for each probe set that

indicated the statistical significance of the difference of the expression levels of that probe

set between the two conditions in the contrast.

2.4.5 Gene Set Enrichment Analysis

The normalized gene expression data were analyzed using Gene Set Enrichment Analysis

(GSEA) [26]. Given replicate gene expression measurements for a control phenotype (e.g.,

HM at 1 day) and for a treatment phenotype (e.g., CS at 1 day), GSEA starts by ranking

all genes by the extent of their differential expression in the two phenotypes. Thus, the

lower the rank of a gene, the more up-regulated it is in the treatment, when compared

to the control. Next, given a gene set of interest (e.g., the genes involved in metabolism

of xenobiotics), GSEA uses a modified Kolmogorov-Smirnov test [69] to determine if the

genes in this set have surprisingly high or low ranks, summarizing the result as an enrichment score. This

score has the following interpretation: the more positive the score, the more up-regulated

the genes in the gene set are in the treatment (compared to the control), and the more

negative the score, the more down-regulated the genes in that gene set are. Since the size

of a gene set may influence its enrichment score, GSEA controls this bias by performing

a permutation test and calculating a p-value that represents the statistical significance of


the enrichment score. Finally, GSEA converts the p-value into a q-value that measures the

false discovery rate, after adjusting for multiple hypothesis testing. Note that the q-value

is unsigned but the enrichment score is signed (positive for overall up-regulation and

negative for overall down-regulation). We applied GSEA using the following criteria:

1. Sort genes in decreasing order of the signal-to-noise measure.

2. Compute p-values using 10,000 permutations of the sample-to-phenotype associa-

tions.

3. Report all gene sets with q-value (false discovery rate) at most 0.2. Note that with

this cut-off, we expect about one out of five reported gene sets to be a false discovery.
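
The first and third criteria can be sketched as follows. The sketch assumes the commonly documented form of the signal-to-noise measure, the difference of the class means divided by the sum of the class standard deviations; the function names, expression values, and q-values are hypothetical, and any special handling of very small standard deviations is omitted.

```python
import statistics

def signal_to_noise(treatment, control):
    """(mean_t - mean_c) / (sd_t + sd_c); assumes the two sds are not both 0."""
    return (statistics.mean(treatment) - statistics.mean(control)) / (
        statistics.stdev(treatment) + statistics.stdev(control)
    )

def rank_by_signal_to_noise(expr, n_treatment):
    """Genes in decreasing order of signal-to-noise; treatment samples first."""
    scores = {
        gene: signal_to_noise(values[:n_treatment], values[n_treatment:])
        for gene, values in expr.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

def report_gene_sets(q_values, cutoff=0.2):
    """Names of gene sets whose q-value is at most the cutoff."""
    return sorted(name for name, q in q_values.items() if q <= cutoff)

# Hypothetical triplicate measurements: three treatment, then three control.
expr = {
    "up_gene": [4.0, 5.0, 6.0, 1.0, 2.0, 3.0],
    "flat_gene": [2.0, 3.0, 4.0, 2.0, 3.0, 4.0],
    "down_gene": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
}
ranking = rank_by_signal_to_noise(expr, n_treatment=3)
reported = report_gene_sets({"SET_A": 0.01, "SET_B": 0.2, "SET_C": 0.35})
```

Because the ranking is in decreasing order, genes most up-regulated in the treatment appear first and those most down-regulated appear last.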

2.5 Results

LIMMA and GSEA were applied to compare the two culture conditions, as shown in Ta-

ble 2.1. The first set of four contrasts compared the hepatocyte transcriptional program in

CS cultures to that in HMs, at each of the four time-points analyzed. These contrasts were

expected to reveal time-dependent differences between these two culture conditions. The

second set of three contrasts compared CS samples to each other: 8 days to 1 day, 8 days

to 2 days, and 8 days to 3 days. Such contrasts were expected to provide information on

how transcriptional programs may vary within the CS culture condition over time.


p-value cut-off   CS vs. HM 1d   CS vs. HM 2d   CS vs. HM 3d   CS vs. HM 8d
10^-5             31             224            1046           2242
0.0001            61             362            1535           3092
0.001             118            569            2277           4287
0.01              276            1095           3497           6185
0.05              552            1812           5134           8551

Table 2.2: The number of differentially-expressed genes in each of the four CS vs. HM contrasts at different p-value cutoffs.

2.5.1 The transcriptional program in CS cultures steadily and compre-

hensively diverges from that in HMs

For each of the first four contrasts in Table 2.1, the number of differentially-expressed

probe sets was counted after applying different cutoffs on the p-values computed by

LIMMA. The first column in Table 2.2 indicates the p-value cutoff, while each of the other

four columns show the number of probe sets whose p-value meets the cutoff specified in

each row. An important feature revealed by these data is the monotonic divergence be-

tween the transcriptional programs of CS and HM samples over the 8-day culture period.

For each cutoff, the number of differentially-expressed probe sets increased steadily from

day 1 to day 8. Furthermore, this trend was maintained even over a variation of four

orders of magnitude in the p-value cutoff. On day 8, as many as 6185 probe sets had a p-value

of at most 0.01 (2242 had a p-value of at most 10^-5). Since the Affymetrix Rat230_2

GeneChip has 31,099 probe sets, these results suggest widespread transcriptional pertur-

bation in CS cultures compared to HM cultures.
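
The tallying behind these counts, and the proportion of the array they represent, can be reproduced with a few lines. The helper name and the example p-values are hypothetical; the day-8 counts and the probe set total are the figures quoted above.

```python
def count_at_cutoffs(p_values, cutoffs):
    """Count how many p-values are at or below each cutoff."""
    return {c: sum(1 for p in p_values if p <= c) for c in cutoffs}

# Hypothetical per-probe-set p-values for one contrast.
example_p = [0.00001, 0.0004, 0.003, 0.02, 0.04, 0.3, 0.8]
counts = count_at_cutoffs(example_p, [1e-5, 1e-4, 1e-3, 1e-2, 0.05])

# Proportion of the array that is differentially expressed on day 8.
TOTAL_PROBE_SETS = 31099
fraction_at_0_01 = 6185 / TOTAL_PROBE_SETS   # p-value at most 0.01
fraction_at_1e_5 = 2242 / TOTAL_PROBE_SETS   # p-value at most 10^-5
```

On day 8, roughly 20% of all probe sets meet the 0.01 cutoff and about 7% meet even the strictest cutoff.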


Upon the identification of the global trends, GSEA was employed to study patterns of

differential expression in specific gene sets. GSEA was applied to the gene expression

data obtained through our experiments and to the following gene sets in the Molecular

Signature DataBase (MSigDB): 1892 curated gene sets from various sources such as on-

line pathway databases, publications in PubMed, and knowledge of domain experts; 837

motif gene sets containing genes that share a cis-regulatory motif that is conserved across

the human, mouse, rat, and dog genomes; and 1454 gene sets corresponding to genes

annotated by different Gene Ontology (GO) terms. For each contrast in Table 2.1 and for

each of these gene sets, GSEA was used to compute a q-value.

Gene sets were filtered to those that exhibited a monotonic up-regulation in the CS-HM

comparison. Specifically, gene sets were restricted to those whose q-values decreased

monotonically from 1 day to 8 days and whose enrichment scores were positive in all

four CS-HM contrasts (the first four contrasts in Table 2.1). Gene sets that were monoton-

ically down-regulated in collagen sandwiches over the 8 day period were also identified

(q-values decreasing monotonically and negative enrichment scores in all four CS-HM

contrasts). Since MSigDB collates data from several sources, many gene sets in it have

high degrees of overlap. When an overlapping group of gene sets is up-regulated or

down-regulated in our data, only one gene set per group is discussed below. The com-

plete sets of results are available on our supplementary web page.
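
The filter described above can be expressed compactly. In this sketch, "decreasing monotonically" is read as non-increasing from one time point to the next, which is an assumption; the function names and the q-values and enrichment scores in the example are hypothetical.

```python
def is_non_increasing(values):
    """True if each value is no larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))

def is_monotonically_up(q_values, scores):
    """q-values shrink over time and every enrichment score is positive."""
    return is_non_increasing(q_values) and all(s > 0 for s in scores)

def is_monotonically_down(q_values, scores):
    """q-values shrink over time and every enrichment score is negative."""
    return is_non_increasing(q_values) and all(s < 0 for s in scores)

# Hypothetical results for the 1d, 2d, 3d, and 8d CS vs. HM contrasts.
results = {
    "SET_UP": ([0.30, 0.10, 0.01, 0.001], [0.4, 0.5, 0.6, 0.7]),
    "SET_DOWN": ([0.25, 0.05, 0.02, 0.001], [-0.3, -0.5, -0.6, -0.8]),
    "SET_MIXED": ([0.10, 0.20, 0.05, 0.01], [0.4, 0.5, 0.6, 0.7]),
}
up_sets = [n for n, (q, s) in results.items() if is_monotonically_up(q, s)]
down_sets = [n for n, (q, s) in results.items() if is_monotonically_down(q, s)]
```

A gene set such as SET_MIXED, whose q-value rises between two consecutive time points, is excluded from both lists even though its final q-value is small.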


2.5.2 Liver-specific gene sets are up-regulated starting on day 1 or day

2 in CS cultures

Figure 2.2: Liver-specific up-regulated gene sets. The legend below shows the q-value ranges for each color. The color scheme used in these figures is RdYlGn from Color Brewer (http://colorbrewer2.org). CS, collagen sandwich; HM, hepatocyte monolayer. [Color legend for up-regulated q-values; thresholds: 1, 0.2, 0.05, 0.01, 0.001, 0.0001.]

Hsiao et al. [70] created a compendium of gene expression in normal human tissues with the

goal of defining a reference for basic organ systems biology. They identified 251 genes

expressed selectively in the liver, which are included in MSigDB in the HSIAO_LIVER_SPECIFIC_GENES

gene set. In a similar study, Su et al. [71] profiled gene expression from

91 human and mouse samples across a diverse array of tissues, organs and cell lines. They

identified 37 genes that were expressed specifically in human liver tissue samples; these

genes belong to the HUMAN_TISSUE_LIVER gene set. The gene sets HSIAO_LIVER_SPECIFIC_GENES

and LIVER_SPECIFIC_GENES were up-regulated significantly at day

1 and day 2, respectively. They were monotonically up-regulated on subsequent days.

Both these gene sets had insignificant q-values in the CS-CS contrasts, suggesting that

liver-specific genes are up-regulated on day 1 in CS cultures, and that they continue to


be monotonically up-regulated on subsequent days (Figure 2.2). The presence and concentration

of albumin are often used as a marker of phenotypic function of in vitro hepatic

models [59, 60]. The albumin gene (ALB) was included in several gene sets, such

as HSIAO_LIVER_SPECIFIC_GENES and V$HNF1_Q6, that were up-regulated over the 8-day

culture period (Figure 2.2). The promoter regions of genes in the set V$HNF1_Q6 contain

binding sites for hepatic nuclear factor (HNF1), a transcription factor that activates

gene expression of albumin [72,73]. This gene set has an overlap of 25 genes with the gene

set HSIAO_LIVER_SPECIFIC_GENES, indicating that HNF1 monotonically up-regulates

the expression of albumin and other liver-specific genes in CS cultures but not in HMs.

These observations support the conclusion that transcriptional programs that have been

identified in other datasets to be liver-specific are active through the 8-day period in CS

cultures but are not active in HMs.

2.5.3 Gene sets involved in cholesterol, fatty-acid, alcohol, and carbo-

hydrate metabolism are significantly up-regulated starting on day

1 or day 2 in CS cultures

Cholesterol metabolism

Cholesterol metabolism is an important component of hepatic phenotypic function [48].

The trends exhibited by gene sets linked to cholesterol metabolism in our data were in-

vestigated. Multiple gene sets involved in cholesterol metabolism were up-regulated in


[Heat map of q-values for the CS vs. HM contrasts at 1d, 2d, 3d, and 8d; the per-cell colors are not recoverable. Gene sets shown:]

Set name                              Description

Cholesterol metabolism
HSA00120_BILE_ACID_BIOSYNTHESIS       KEGG bile acid synthesis genes
MONOOXYGENASE_ACTIVITY                (GO MF) integration of one oxygen atom into a compound
CELLULAR_LIPID_METABOLIC_PROCESS      (GO BP) lipid reactions and pathways
CARBOXYLIC_ACID_TRANSM_TRANSP         (GO MF) transfer of carboxylic acid across a membrane
LIPID_TRANSPORT                       (GO BP) transport into, out of, or between cells
NUCLEAR_RECEPTORS                     GenMAPP nuclear receptor genes

Fatty-acid metabolism
HSA00071_FATTY_ACID_METABOLISM        KEGG fatty acid metabolism pathways
MITOCHONDRIAL_FATTY_ACID_BETA         GenMAPP fatty acid oxidation in mitochondria
PEROXISOME                            (GO CC) associated with peroxisome
HSA03320_PPAR_SIGNALING_PATHWAY       KEGG PPAR signaling pathway

Alcohol metabolism
ALCOHOL_METABOLIC_PROCESS             (GO BP) reactions and pathways involving alcohols
HSA00071_FATTY_ACID_METABOLISM        KEGG fatty acid metabolism pathways
HSA00980_METABOLISM_OF_XENOBIOTICS    metabolism of xenobiotics by cytochrome P450

Carbohydrate metabolism
GLUCOSE_METABOLIC_PROCESS             (GO BP) pathways involving glucose
HSA00010_GLYCOLYSIS_AND_GLUCON        KEGG glycolysis and gluconeogenesis pathways

[Color legend for up-regulated q-values; thresholds: 1, 0.2, 0.05, 0.01, 0.001, 0.0001.]

Figure 2.3: Up-regulated gene sets involved in cholesterol, fatty-acid, alcohol, and carbohydrate metabolism. The legend below shows the q-value ranges for each color. (Abbreviations: CARBOXYLIC_ACID_TRANSM_TRANSP for CARBOXYLIC_ACID_TRANSMEMBRANE_TRANSPORTER_ACTIVITY; MITOCHONDRIAL_FATTY_ACID_BETA for MITOCHONDRIAL_FATTY_ACID_BETAOXIDATION; HSA00980_METABOLISM_OF_XENOBIOTICS for HSA00980_METABOLISM_OF_XENOBIOTICS_BY_CYTOCHROME_P450; HSA00010_GLYCOLYSIS_AND_GLUCON for HSA00010_GLYCOLYSIS_AND_GLUCONEOGENESIS.)


Acetyl-Coenzyme A

Cholesterol

Nuclear ReceptorsLiver X receptor (LXR)

Retinoid X receptor (RXR)Cholesterol: LXR ligand; 9-cis retinoic acid: RXR ligand

Cytochrome P450CYP7A1, 8B1, 27A1

Bile AcidsCholic Acid (CA)

Chenodeoxycholic Acid (CDCA)

Nuclear ReceptorsFarnesoid X receptor (FXR)Retinoid X receptor (RXR)

Bile acids: FXR ligand; 9-cis retinoic acid: RXR ligand

ATP-Binding Cassette Transporter

ABCB11

Nuclear ReceptorSmall heterodimer partner

(SHP)

Activation

Inhibition

(A)

(B)

(C-D)

(D)

(A-B)

(E) (A)

No.

CARBOXYLIC_ACID_TRANSMEMBRANE_TRANSPORTER_ACTIVITY

CELLULAR_LIPID_METABOLIC_PROCESSANUCLEAR_RECEPTORSBMONOOXYGENASE_ACTIVITYCHSA00120_BILE_ACID_BIOSYNTHESISD

E

Name of gene set

[21-25]

[21-26]

[27]

[21-25, 28]

[21-24]

[21-25]

[28-29]

[28]

Figure 2.4: Pathway for cholesterol metabolism that shows gene sets involved in thisprocess.


CS cultures compared to HMs. These gene sets include HSA00120 BILE ACID BIOSYN-

THESIS, MONOOXYGENASE ACTIVITY, CELLULAR LIPID METABOLIC PROCESS,

NUCLEAR RECEPTORS, and CARBOXYLIC ACID TRANSMEMBRANE TRANSPORTER -

ACTIVITY (Figure 2.3). Bile acids mediate cholesterol metabolism and the synthesis of

bile acids is initiated through the activity of CYP7A1, CYP8B1 and CYP27A1 enzymes

[74–76] (Figure 2.4). In CS samples, the three CYP enzymes mentioned above are present

either in the gene set HSA00120 BILE ACID BIOSYNTHESIS or in the gene set MONOOXY-

GENASE ACTIVITY, both of which are up-regulated in CS cultures. The gene expression

of CYP enzymes is activated by nuclear receptors—specifically, the retinoid X receptor

(RXR) and the liver X receptor (LXR) [77–80]. The gene set NUCLEAR RECEPTORS,

which contains nuclear receptors involved in the activation of hepatic functions, behaved

similarly to the liver-specific gene sets discussed earlier: it had insignificant q-values in

the CS-CS contrasts, suggesting that nuclear receptors were up regulated on day 1 in CS

cultures and remained up-regulated on subsequent days. The nuclear receptor Farnesoid

X receptor (FXR) plays a critical role in liver functioning. FXR is responsible for regulating

the concentration of bile acids [74–78,81,82]. Bile acid-mediated activation of FXR leads to

the transcriptional activation of the ATP-binding cassette transporter B11 (ABCB11, also

known as bile salt export pump), a process that is crucial for cholesterol secretion into

the bile canaliculi [80, 81]. In CS samples, the transcription of ABCB11, present in gene

set CARBOXYLIC ACID TRANSMEMBRANE TRANSPORTER ACTIVITY, was shown

to be up-regulated over the culture period in comparison to HMs. This gene set contains


genes annotated with the Gene Ontology molecular function that involves the catalysis of

the transfer of carboxylic acids from one side of the membrane to the other. These trends

and data indicate that genes responsible for the formation, transformation, and transport

of bile acids are up-regulated in CS cultures, thereby promoting cholesterol metabolism.

Fatty Acid Metabolism (PPARα-mediated metabolism)

Peroxisome proliferator-activated receptor α (PPARα) is a nuclear receptor that activates gene expression of enzymes linked to fatty acid metabolism [83–86]. PPARα-mediated fatty acid metabolism initiates transcriptional activation of liver fatty acid-binding protein (L-FABP or FABP1), which delivers fatty acids to its cognate nuclear receptor, PPARα, and promotes expression of two transporters, ABCD2 and ABCD3, which are necessary to transport fatty acids into peroxisomes, where target enzymes catalyze the clearance of fatty acids [74, 83, 85, 86]. PPARα, being dependent on intracellular FABP concentrations, regulates expression of acyl-CoA oxidases (ACOXs), short/branched-, long-, and very long-chain acyl-CoA dehydrogenases (ACADs), and mitochondrial enzymes involved in β-oxidation [87–91]. In our data, the gene sets involved in PPARα-mediated fatty acid metabolism are HSA00071 FATTY ACID METABOLISM, PEROXISOME, HSA03320 PPAR SIGNALING PATHWAY, and MITOCHONDRIAL FATTY ACID BETAOXIDATION (Figure 2.3). All these gene sets were monotonically up-regulated in CS samples over the 8-day period in contrast to HMs. The gene FABP1, which belongs to the gene set HSA03320 PPAR SIGNALING PATHWAY, was up-regulated in CS cultures and its expression increased over time. In response to the expression of FABP1, the gene PPARα is expressed [83]. The PPARα signaling pathway promotes the transcriptional activation of fatty acid metabolic enzymes [ACOX (acyl-CoA oxidase), ACAD (acyl-CoA dehydrogenase), CAT (carnitine palmitoyltransferase), LPL (lipoprotein lipase), and ACAT (acetyl-CoA acetyltransferase)] [83, 87–89]. These genes are members of the HSA00071 FATTY ACID METABOLISM, PEROXISOME, and MITOCHONDRIAL FATTY ACID BETAOXIDATION gene sets. The combination of the expression of key enzymes responsible for fatty acid metabolism and the expression of two members of the ABC transporter family (ABCD2 and ABCD3) indicates that PPARα-mediated metabolism was up-regulated in CS samples.

Alcohol Metabolism

Alcohol, specifically ethanol, is metabolized in the liver by several enzymes present in the subcellular compartments of hepatocytes. Alcohol dehydrogenase (ADH), a key cytoplasmic enzyme, plays an important role in converting ethanol to acetaldehyde [92–94]. Acetaldehyde, a toxic molecule, is subsequently converted to nontoxic acetate by mitochondrial aldehyde dehydrogenase (ALDH) [92, 94]. Additionally, CYP2E1 enables the clearance of ethanol through an oxidative reaction [94–96] (Figure 2.5). The gene sets ALCOHOL METABOLIC PROCESS, HSA00980 METABOLISM OF XENOBIOTICS BY CYTOCHROME P450, and HSA00071 FATTY ACID METABOLISM were up-regulated over time in CS cultures in comparison to HMs (Figure 2.3). These gene sets include three alcohol dehydrogenase genes, ADH, ADH1A, and ADH7, that mediate the transformation of alcohol.

Figure 2.5: Pathway for alcohol metabolism that shows gene sets involved in this process.
Diagram components: ethanol; acetaldehyde; acetate; CO2 + H2O; enzymes alcohol dehydrogenase, cytochrome P450 2E1, and aldehyde dehydrogenase; cofactors NAD+ (nicotinamide adenine dinucleotide) and NADH (nicotinamide adenine dinucleotide, reduced form).
Gene sets in the key (labeled A to D in the diagram): HSIAO_LIVER_SPECIFIC_GENES; HSA00980_METABOLISM_OF_XENOBIOTICS_BY_CYTOCHROME_P450; HSA00071_FATTY_ACID_METABOLISM; ALCOHOL_METABOLIC_PROCESS.

Carbohydrate Metabolism

Gluconeogenesis and glycolysis are essential to maintain glucose homeostasis [97–100]. The maintenance of a healthy glucose level is dependent upon the presence and concentration of insulin and glucagon [97–99]. In our gene expression data, genes for key enzymes involved in the formation and metabolism of glucose are up-regulated in CS cultures in contrast to HMs. The relevant gene sets include HSA00010 GLYCOLYSIS AND GLUCONEOGENESIS and GLUCOSE METABOLIC PROCESS (Figure 2.3). Specifically, the enzymes implicated in glycolysis are hexokinase (HK), glucose phosphate isomerase (or phosphoglucose isomerase, GPI or PGI), phosphofructokinase (PFK), aldolase (ALDOB and ALDOC), triosephosphate isomerase (TPI1), phosphoglycerate kinase 1 (PGK1), phosphoglycerate mutase (PGAM), pyruvate kinase (PKLR), and lactate dehydrogenase C (LDH) [100]. PKLR catalyzes the transphosphorylation of phosphoenolpyruvate into pyruvate and ATP, which is the rate-limiting step of glycolysis. LDH catalyzes the terminal step in glycolysis. The enzymes involved in gluconeogenesis are phosphoenolpyruvate carboxykinase 1 (PCK or PEPCK), glucose-6-phosphatase (G6PC), pyruvate carboxylase (PC), and fructose-1,6-bisphosphatase 1 (FBP1) [99–101]. In the gene sets related to carbohydrate metabolism, genes such as ALDOB, ALDOC, PKLR, PFK, PGK1, and GPI, which are involved in glycolysis, and genes such as G6PC, FBP1, PCK1, and PC, which are involved in gluconeogenesis, were up-regulated in CS samples.
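The comparison between the enzymes of each pathway and the genes reported as up-regulated can be tabulated with a short script. The following is a minimal sketch that uses only the gene symbols named in the paragraph above; the function name `coverage` and the data layout are illustrative assumptions. Because the text introduces the up-regulated genes with "such as", a gene listed as unnamed here is merely not mentioned in the text, which does not imply that it was unchanged.

```python
# Enzyme genes named in the text for each pathway (symbols as written there;
# PCK1 is used for phosphoenolpyruvate carboxykinase 1).
PATHWAY_GENES = {
    "glycolysis": ["HK", "GPI", "PFK", "ALDOB", "ALDOC", "TPI1",
                   "PGK1", "PGAM", "PKLR", "LDH"],
    "gluconeogenesis": ["PCK1", "G6PC", "PC", "FBP1"],
}

# Genes the text names as up-regulated in CS samples.
UP_REGULATED = {"ALDOB", "ALDOC", "PKLR", "PFK", "PGK1", "GPI",
                "G6PC", "FBP1", "PCK1", "PC"}


def coverage(pathway_genes, up_regulated):
    """Split pathway genes into those named as up-regulated and the rest."""
    named = [g for g in pathway_genes if g in up_regulated]
    unnamed = [g for g in pathway_genes if g not in up_regulated]
    return named, unnamed


for pathway, genes in PATHWAY_GENES.items():
    named, unnamed = coverage(genes, UP_REGULATED)
    print(f"{pathway}: {len(named)}/{len(genes)} named as up-regulated; "
          f"not named: {unnamed}")
```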

Urea Production

Figure 2.6: Up-regulated gene sets involved in urea production. The legend below shows the q-value ranges for each color. Abbreviations: GO, gene ontology; BP, biological process; HSA00220 UREA CYCLE for HSA00220 UREA CYCLE AND METABOLISM OF AMINO GROUPS; KEGG, Kyoto Encyclopedia of Genes and Genomes; GenMAPP, Gene Map Annotator and Pathway Profiler.
Color legend for up-regulated q-values: 1, 0.2, 0.05, 0.01, 0.001, 0.0001.

In the liver, the formation of urea is a critical step in ammonia clearance. The metabolism of amino acids results in the formation of urea through the conversion of glutamate, an intermediate metabolite [48]. Gene sets involved in glutamate metabolism, such as HSA00251 GLUTAMATE METABOLISM, are gradually up-regulated over time in CS cultures (Figure 2.6). Urea is formed as a result of the action of five enzymes: carbamoyl phosphate synthetase-1 (CPS-1), ornithine transcarbamoylase (OTC), argininosuccinate synthase (ASS), argininosuccinate lyase (ASL), and arginase (ARG) [102–107] (Figure 2.7). The genes encoding these five enzymes are present in the gene set HSA00220 UREA CYCLE AND METABOLISM OF AMINO GROUPS. This gene set is up-regulated in CS cultures over the 8-day period. In addition, the gene sets NITROGEN COMPOUND CATABOLIC PROCESS and NITROGEN COMPOUND METABOLIC PROCESS include genes such as ASL, ARG, and ASS. Both gene sets are also monotonically up-regulated in CS cultures. The nuclear receptor HNF-4α (a member of the up-regulated gene set NUCLEAR RECEPTORS) plays an important role in triggering the transcription of key enzymes for urea production [102]. Together, these data help explain why urea production is stable in CS cultures but not in HMs.

Figure 2.7: Pathway for urea production that shows gene sets involved in this process.
Diagram components: glutamine; glutamate; NH4+; carbamoyl phosphate; aspartate; urea cycle intermediates citrulline, argininosuccinate, arginine, and ornithine; urea; enzymes glutaminase, glutamate dehydrogenase, carbamoyl phosphate synthase-1, ornithine transcarbamoylase, argininosuccinate synthase, argininosuccinate lyase, and arginase.
Gene sets in the key (labeled A to D in the diagram): NITROGEN_COMPOUND_METABOLIC_PROCESS; NITROGEN_COMPOUND_CATABOLIC_PROCESS; HSA00251_GLUTAMATE_METABOLISM; HSA00220_UREA_CYCLE_AND_METABOLISM_OF_AMINO_GROUPS.


2.5.4 Mono-oxygenases are initially not differentially expressed but recover after day 3 in CS cultures

Figure 2.8: Gene sets that show recovery after day 3. We show the q-values to underscore the recovery. MF, molecular function.
CS vs. HM q-values:
  1d    2d    3d     8d     Set name: description
  0.03  0.01  6e-03  7e-05  MONOOXYGENASE_ACTIVITY (GO MF): integration of one oxygen atom into a compound
  1.00  0.20  2e-03  1e-03  HSA00980_METABOLISM_OF_XENOBIOTICS: metabolism of xenobiotics by cytochrome P450
Color legend for up-regulated q-values: 1, 0.2, 0.05, 0.01, 0.001, 0.0001.

Xenobiotic metabolism in the liver is mediated through cytochrome P450 enzymes [108–111]. Expression of these enzymes has been shown to decrease upon the isolation of hepatocytes from the liver [110, 111]. The gene set MONOOXYGENASE ACTIVITY contains several cytochrome P450 and flavin-containing monooxygenase genes. This gene set had q-values of 0.03, 0.01, 6 × 10^-3, and 7 × 10^-5 at days 1, 2, 3, and 8, respectively (Figure 2.8). The q-values at day 1 and day 2 were of the same order of magnitude, fell below 0.01 at day 3, and decreased by roughly two further orders of magnitude by day 8. We examined these trends further by computing the q-values for this gene set in the CS-CS contrasts. This gene set was up-regulated with q-value 0.13 for the day 8 vs. day 1 contrast, 0.11 for the day 8 vs. day 2 contrast, and 1.00 for the day 8 vs. day 3 contrast. We also computed the statistical significance values for the corresponding contrasts among the HM samples. This gene set was down-regulated in all three contrasts, with q-values of 0.13, 0.06, and 0.13, respectively. Thus, the variation in expression of this gene set arises from a combination of up-regulation in CS cultures and down-regulation in HM cultures. Taken together, these trends indicate that the genes in this set recovered or became up-regulated at day 3 and later within CS cultures, in comparison to HM cultures. This trend is significant because genes in this set include those that encode CYP3A, CYP4A, CYP1A, and CYP2C enzymes. CYP3A and CYP4A enzymes metabolize a wide range of pharmaceuticals and drugs, and the CYP2C and CYP1A enzymes break down toxins and xenobiotics. The gene set HSA00980 METABOLISM OF XENOBIOTICS BY CYTOCHROME P450, which contains cytochrome P450s and phase II metabolizing enzymes such as UDP-glucuronosyltransferase (UDP-GT) isoforms and glutathione S-transferase (GST), exhibited a similar trend. It showed no significant regulation at day 1, had a q-value of 0.20 at day 2, and had q-values about two orders of magnitude lower at days 3 and 8 (2 × 10^-3 and 1 × 10^-3, respectively). In the three CS-CS contrasts, the q-values for this gene set were 0.12, 0.05, and 0.13 (all for up-regulation), while they were 0.1, 0.12, and 1 in the HM-HM contrasts (all for down-regulation). In a previous study, the expression of a single CYP enzyme, CYP1A1, was monitored and was shown to "recover" on day 3 [112]. However, neither additional time points nor other CYP enzymes were investigated. The results in this work point to a more widespread recovery phenomenon among the CYP gene family.
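The size of each change in significance between time points can be expressed as a drop in log10(q). The following is a minimal sketch that uses the CS vs. HM q-values listed in Figure 2.8; the function name `log10_drops` and the data layout are illustrative assumptions rather than part of the analysis pipeline used in this work.

```python
import math

# Up-regulated q-values for the CS vs. HM contrasts, as listed in Figure 2.8.
DAYS = [1, 2, 3, 8]
Q_VALUES = {
    "MONOOXYGENASE_ACTIVITY": [0.03, 0.01, 6e-3, 7e-5],
    "HSA00980_METABOLISM_OF_XENOBIOTICS": [1.00, 0.20, 2e-3, 1e-3],
}


def log10_drops(q_values):
    """Return the decrease in log10(q) between consecutive time points.

    A value of 1.0 means the q-value fell by one order of magnitude.
    """
    logs = [math.log10(q) for q in q_values]
    return [earlier - later for earlier, later in zip(logs, logs[1:])]


for name, qs in Q_VALUES.items():
    intervals = zip(DAYS, DAYS[1:])
    for (start, end), drop in zip(intervals, log10_drops(qs)):
        print(f"{name}: day {start} -> day {end}: "
              f"{drop:.2f} orders of magnitude")
```

Working on the log scale makes the intervals directly comparable: a drop near 1 corresponds to a tenfold decrease in q-value, and a drop near 2 to a hundredfold decrease.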


Figure 2.9: Down-regulated gene sets. The legend below the table shows the q-value ranges for each color. CC, cellular component.
Columns: CS vs. HM at 1d, 2d, 3d, and 8d.
Cell Cycle
  MITOTIC_CELL_CYCLE (GO BP): participation in eukaryotic cell cycle events
  CELL_CYCLE_KEGG (KEGG): cell cycle pathway
Nuclear Transport
  NUCLEOCYTOPLASMIC_TRANSPORT (GO BP): movement of molecules between nucleus and cytoplasm
  PROTEIN_IMPORT_INTO_NUCLEUS (GO BP): protein transport into nucleus
Cell Replication
  MICROTUBULE_CYTOSKELETON (GO CC): microtubules of the cytoskeleton
  MICROTUBULE_ORGANIZING_CENTER (GO CC): region of microtubule growth
  SPINDLE (GO CC): microtubule array for segregating duplicated chromosomes
  CENTROSOME (GO CC): centriole and spindles
Color legend for down-regulated q-values: 1, 0.2, 0.05, 0.01, 0.001, 0.0001.

2.5.5 Cell-cycle activity decreases significantly in CS cultures

Analysis of the down-regulated gene sets provided insights into the cellular response within the CS and HM systems. Our results suggested a significant difference in cell cycle activity between the HM and CS samples over the 8-day culture period. The monotonic down-regulation of the gene sets MITOTIC CELL CYCLE and CELL CYCLE KEGG, coupled with insignificant q-values in the CS-versus-CS comparisons (data not shown), suggested decreasing cell cycle activity within the CS cultures (Figure 2.9). Nuclear transport and import functions show decreased activity within CS samples, as indicated by the monotonically down-regulated gene sets NUCLEOCYTOPLASMIC TRANSPORT a
