Discovering contextual connections between biological
processes using high-throughput data
Christopher D. Lasher
Dissertation submitted to the Faculty of the
Virginia Polytechnic Institute and State University
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
in
Genetics, Bioinformatics, and Computational Biology
T. M. Murali, Co-Chair
Padmavathy Rajagopalan, Co-Chair
Richard F. Helm
Madhav V. Marathe
Naren Ramakrishnan
September 12, 2011
Blacksburg, Virginia
Keywords: computational systems biology, liver, Markov chain Monte Carlo, molecular
interactions, gene expression
Copyright 2011, Christopher D. Lasher
Discovering contextual connections between biological processes using high-throughput data
Christopher D. Lasher
ABSTRACT
Hearkening to calls from life scientists for aid in interpreting rapidly growing repositories of data, the fields of bioinformatics and computational systems biology continue to bear increasingly sophisticated methods capable of summarizing and distilling pertinent phenomena captured by high-throughput experiments. Techniques in the analysis of genome-wide gene expression (e.g., microarray) data, for example, have moved beyond simply detecting individual genes perturbed in treatment-control experiments to reporting the collective perturbation of biologically related collections of genes, or “processes”. Recent expression analysis methods have focused on improving the comprehensibility of results, reporting concise, non-redundant sets of processes by leveraging statistical modeling techniques such as Bayesian networks.
Simultaneously, integrating gene expression measurements with gene interaction networks has led to the computation of response networks, subgraphs of interaction networks in which genes exhibit strong collective perturbation or co-expression. Methods that integrate process annotations of genes with interaction networks identify high-level connections between biological processes themselves. Identifying context-specific changes in these inter-process connections, however, requires techniques beyond existing ones: process-based expression analysis reports only perturbed processes and not their relationships; response networks are composed of interactions between genes rather than processes; and existing techniques for detecting process connections do not incorporate specific biological context.
We present two novel methods that take inspiration from the latest techniques in process-based gene expression analysis, computation of response networks, and computation of inter-process connections. We motivate the need for detecting inter-process connections by identifying a collection of processes exhibiting significant differences in collective expression in two liver tissue culture systems widely used in toxicological and pharmaceutical assays. Next, we identify perturbed connections between these processes via a novel method that integrates gene expression, interaction, and annotation data. Finally, we present another novel method that computes non-redundant sets of perturbed inter-process connections, and apply it to several additional liver-related data sets. These applications demonstrate the ability of our methods to capture and report biologically relevant high-level trends.
This work was supported by NSF CBET #0933225, the ICTAS ISBET at Virginia Tech, and the GBCB Interdisciplinary Ph.D. Program of Virginia Tech.
Acknowledgments
The author thanks the following people and organizations for their contributions to this dissertation: Prof. T.M. Murali, Prof. Padma Rajagopalan, Prof. Rich Helm, Prof. Madhav Marathe, and Prof. Naren Ramakrishnan for their continued guidance through the course of the author’s graduate research; the past and present members of the Rajagopalan group, especially Dr. Yeonhee Kim, Dr. Christopher Detzel, and Mr. Adam Larkin, for their tremendous effort in obtaining gene expression data from hepatocyte cultures used in this work, and for their helpful discussions and collegial support; the past and present members of the Murali group, including Dr. Arjun Krishnan, Mr. Christopher Poirel, Mr. Yared Kidane, Mr. Naveed Massjouni, Ms. Danielle Choi and Mr. Ahsanur Raman for helpful discussions and collegial support, and Mr. Phillip J. Whisenhunt for assistance in creating plots for Chapter 4; the staff at the Virginia Bioinformatics Institute (VBI) Core Laboratory facility for their assistance with obtaining gene expression measurements for the liver tissue cultures; Dr. Bryan Lewis and Dr. Keith Bissett of the Network Dynamics and Simulation Science Laboratory at the Virginia Bioinformatics Institute at Virginia Tech for helpfully providing access to computational resources.
Financial support for this work was generously provided by the National Science Foundation (NSF) Chemical, Bioengineering, Environmental, and Transport Systems (CBET) #0933225 “Transcriptional Signatures of 3D Liver Mimetic Architectures”, the Institute for Critical Technology and Applied Sciences (ICTAS) Center for Systems Biology of Engineered Tissues (ISBET) at Virginia Tech, and the Genetics, Bioinformatics, and Computational Biology (GBCB) Interdisciplinary Ph.D. Program of Virginia Tech.
The author thanks the Python Software Foundation, particularly Mr. Jesse Noller, Mr. Steve Holden, Mr. Van Lindberg, Mr. Jacob Kaplan-Moss, and the Python community, in general, for their maintenance of not only an arguably superior programming language, but a fantastic programming community as well. The author thanks the Stack Overflow community for their collective support in answering many, many questions raised through the process of programming the software used throughout this dissertation.
The author gives his heartfelt thanks to the following people who made completion of this work possible: his family (Mom, Dad, Richard, Bammee, Pop Pop, Grandma, Grandpa, Aunt Nell, Uncle Jim, and all the cousins) for their continuing love and support; Mrs. Dennie Munson, who plays second mom to so many poor, wistful graduate students; the Executive Board (Dr. Tsai-Tien Tseng, Dr. Marcus Chibucos, Dr. Bryan Lewis, Dr. Andrea Apolloni, Mr. Tim Driscoll, and Mr. Andrew Warren) for making the rules around here; the Girlfriend Advisory Board (Mrs. Katie Younger Gehrt, Ms. Rachel DeLauder, and Dr. Charley Kelly) for their stalwart efforts in the face of dire chances; The Amelia Earharts (Mr. Ian Firkin, Ms. Joelle Hackney, and Ms. Megan Tiller) for sharing wonderful musical and life experiences; Mr. Patrick Butler and Dr. Kiran Pashikanti for much-needed company and entertaining conversations; the Lovely Ladies of Clay Street (Ms. Phoebe Williams and Ms. Kristi Steiner) for providing lodging during the final push; Mr. Wes Smith for the much-needed breaks away from Blacksburg and computers; Mr. Kyle Parker and Ms. Jessica Frisch for sending sunny Florida vibes to Blacksburg; Ms. Judy Kuhn for being a great finder of excellent music, an all-around wonderful person, and a dear friend; Dr. Mary Ann Moran and Dr. Barny Whitman for their encouragement to pursue a higher degree; Dr. James Henriksen and Dr. Emily DeCrescenzo Henriksen for their encouragement on finishing the degree; Mrs. Carol Zank-Rehwaldt for caring enough to teach us all to write clear, coherent prose; and Dr. Kate Janean Steklachich for her adoration, affection, and irresistible loveliness.
Dedication
Dedicated to Mr. Hanks, who spread not only the joy of the study of life, but also the joy
of life, itself.
Contents
1 Introduction
1.1 Significant contributions of this dissertation
1.2 Prior work
1.2.1 Process-based analysis of gene expression
1.2.2 Computing response networks from treatment-control data
1.2.3 Computation of inter-process connections
1.3 Overview of Chapters
2 Discovering temporal patterns of expression in hepatocyte cultures
2.1 Attribution
2.2 Abstract
2.3 Introduction
2.4 Materials and Methods
2.4.1 Materials
2.4.2 Hepatocyte Isolation and Culture
2.4.3 RNA Extraction and Gene Chip Hybridization
2.4.4 Microarray Data Analysis
2.4.5 Gene Set Enrichment Analysis
2.5 Results
2.5.1 The transcriptional program in CS cultures steadily and comprehensively diverges from that in HMs
2.5.2 Liver-specific gene sets are up-regulated starting on day 1 or day 2 in CS cultures
2.5.3 Gene sets involved in cholesterol, fatty-acid, alcohol, and carbohydrate metabolism are significantly up-regulated starting on day 1 or day 2 in CS cultures
2.5.4 Mono-oxygenases are initially not differentially expressed but recover after day 3 in CS cultures
2.5.5 Cell-cycle activity decreases significantly in CS cultures
2.6 Discussion
3 Discovering Networks of Perturbed Biological Processes in Hepatocyte Cultures
3.1 Attribution
3.2 Abstract
3.3 Introduction
3.4 Materials and Methods
3.4.1 Measuring perturbation from gene expression data
3.4.2 Scoring a link between a pair of processes
3.4.3 Extending the score to include transcriptional data and interaction weights
3.4.4 Assessing the statistical significance of links
3.5 Results
3.5.1 Input Data
3.5.2 Overview of Results
3.5.3 Liver Specific Genes
3.5.4 Liver Specific Gene Sets Regulated by HNF1
3.5.5 Lipid Homeostasis and Bile Acid Synthesis
3.5.6 Interpretation of Links in CBPLNs
3.6 Conclusions
4 Discovering descriptive networks of processes from gene expression and molecular interaction network data
4.1 Abstract
4.2 Introduction
4.3 Materials and Methods
4.3.1 Computing gene expression perturbation
4.3.2 Selection of processes for computation of BPNs
4.3.3 The MCMC-BPN algorithm
4.3.4 Computation of BPLNs
4.3.5 Measuring redundancy within a BPN
4.3.6 Data Sources
4.4 Results
4.4.1 CS versus HM
4.4.2 Acetaminophen Exposure
4.4.3 Cirrhosis
4.4.4 Very Advanced HCC
4.4.5 Behavior of the MCMC
4.5 Discussion
5 Conclusion
5.1 Summary of presented work
5.2 Future work
5.3 Closing remarks
5.4 Publication List
Bibliography
List of Abbreviations
BP biological process
BPLN Biological Process Linkage Network
BPN Biological Process Network
CBPLN Contextual Biological Process Linkage Network
CC cellular component
CORUM Comprehensive Resource of Mammalian protein complexes
CS collagen sandwich
EBI European Bioinformatics Institute
GenMAPP Gene Map Annotator and Pathway Profiler
GEO Gene Expression Omnibus
GSEA Gene Set Enrichment Analysis
HCC hepatocellular carcinoma
HCV hepatitis C virus
HM hepatocyte monolayer
JI Jaccard Index
KEGG Kyoto Encyclopedia of Genes and Genomes
MCMC Markov chain Monte Carlo
MF molecular function
MiMI Michigan Molecular Interactions
MSigDB Molecular Signatures Database
NCBI National Center for Biotechnology Information
NCI PID National Cancer Institute Pathway Interaction Database
STRING Search Tool for the Retrieval of Interacting Genes/Proteins
List of Figures
1.1 Bipartite graph model of GenGO.
2.1 Schematics of two popular liver cell culture systems.
2.2 Liver-specific up-regulated gene sets.
2.3 Up-regulated gene sets involved in cholesterol, fatty-acid, alcohol, and carbohydrate metabolism.
2.4 Pathway for cholesterol metabolism that shows gene sets involved in this process.
2.5 Pathway for alcohol metabolism that shows gene sets involved in this process.
2.6 Up-regulated gene sets involved in urea production.
2.7 Pathway for urea production that shows gene sets involved in this process.
2.8 Gene sets that show recovery after day 3.
2.9 Down-regulated gene sets.
2.10 Gene set level network of up-regulated processes in CS cultures.
3.1 Calculating the links score σ(a, b) in an example network.
3.2 Scatter plots of link p-values for links found to be significant.
3.3 CS vs. HM CBPLN on day 1.
3.4 CS vs. HM CBPLN on day 2.
3.5 CS vs. HM CBPLN on day 3.
3.6 CS vs. HM CBPLN on day 8.
3.7 Context-free BPLN.
3.8 Network of functional interactions resulting in the link between V$HNF1 Q6 and HSIAO LIVER SPECIFIC GENES on day 8.
3.9 The liver regulates two tightly coupled pathways: bile acid synthesis and fatty acid metabolism.
3.10 Subgraphs of the CBPLNs involving nuclear receptors and the PPAR signaling, bile acid biosynthesis, and fatty acid metabolism pathways.
3.11 Network of functional interactions resulting in the link between NUCLEAR RECEPTORS and HSA03320 PPAR SIGNALING PATHWAY on day 8.
3.12 Network of functional interactions resulting in the link between HSA03320 PPAR SIGNALING PATHWAY and HSA00071 FATTY ACID METABOLISM on day 8.
4.1 An example interaction network and Bayesian network model.
4.2 Pairwise overlaps for CS vs. HM BPNs.
4.3 Redundancy of processes and links in CS vs. HM BPNs.
4.4 A BPN computed for the CS vs. HM contrast.
4.5 Interactions explained by the link between two PPAR-related processes.
4.6 Redundancy of processes and links in the CS vs. HM BPLN at significance threshold of 0.0001.
4.7 Overlap of interactions explained by the Acetaminophen BPNs.
4.8 Redundancy of processes and links in Acetaminophen BPNs.
4.9 A BPN computed for the Acetaminophen contrast.
4.10 Interactions explained by the link between KEGG VALINE LEUCINE AND ISOLEUCINE DEGRADATION and REACTOME METABOLISM OF LIPIDS AND LIPOPROTEINS.
4.11 Redundancy of processes and links in the Acetaminophen BPLN at significance threshold 0.0001.
4.12 Overlap of interactions explained by the Cirrhosis BPNs.
4.13 Redundancy of processes and links in Cirrhosis BPNs.
4.14 A BPN computed for the Cirrhosis contrast.
4.15 Redundancy of processes and links in the Cirrhosis BPLN at significance threshold 0.0001.
4.16 Overlap of interactions explained by the Very Advanced HCC BPNs.
4.17 Redundancy of processes and links in Very Advanced HCC BPNs.
4.18 A BPN computed for the Very Advanced HCC contrast.
4.19 Interactions explained by the link between REACTOME INNATE IMMUNITY SIGNALING and REGULATION OF MITOTIC CELL CYCLE.
4.20 Distributions of states and visitation frequencies.
List of Tables
2.1 Contrasts analyzed using GSEA.
2.2 The number of differentially-expressed genes in each of the four CS vs. HM contrasts at different p-value cutoffs.
3.1 Contrasts analyzed for contextual BPLNs.
3.2 Gene sets from MSigDB selected for our analyses.
3.3 Comparison of the properties of the CBPLNs computed by using each hypothesis test.
3.4 Comparison of the number of links in the BPLN to the number of links in the CBPLNs, computed without and with normalization.
4.1 Data sources for each contrast.
4.2 Statistics on inputs by contrast.
4.3 BPLN statistics for the CS vs. HM contrast.
4.4 BPLN statistics for the Acetaminophen contrast.
4.5 BPLN statistics for the Cirrhosis contrast.
4.6 BPLN statistics for the Very Advanced HCC contrast.
Chapter 1
Introduction
Thanks to continuous improvements in high-throughput experimental techniques in the life sciences, the past decade has seen a tremendous explosion in the quantity of publicly available biological data, from whole genome sequences [1] to genome-wide gene expression measurements [2] and gene and protein interactions [3–17]. Out of necessity, this flood of data has propelled biology from a reductionist science toward a study of systems. This trend has been most evident for gene expression data, as exemplified by the growing number of data sets available in public repositories such as the Gene Expression Omnibus (GEO) [2]. Much gene expression data derives from treatment-control experiments, in which RNA from samples exposed to some condition is contrasted with RNA from samples not exposed to that condition, or exposed to another condition, e.g., cancerous versus non-cancerous biopsies, treatment with a drug versus no treatment, or liver versus a background mix of tissues.
Early approaches to analyzing treatment-control gene expression experiments reported lists of genes that exhibited strong perturbation in expression, first using the t-test and other univariate test statistics [18, 19] and later using more sophisticated techniques, such as fitting linear models to the expression measurements [20]. These methods may report hundreds or even thousands of genes as significantly perturbed for a given contrast [21], making interpretation of results difficult. Such voluminous results motivated the development of methods and tools to analyze gene expression on the basis of coherent collections of genes, which we refer to in this dissertation as “biological processes” (also known in the literature as gene sets or pathways). Initial methods focused on detecting over-representation of genes belonging to a process among the list of perturbed genes [22–25]. Later techniques [26–28] instead sought to identify significant differences in the collective expression of genes belonging to a process, allowing all genes, even those with insignificant perturbation when measured individually, to contribute to the analysis. These methods provide a higher-level view of the biological phenomena underlying the gene expression data; however, because processes overlap in gene membership, the lists of perturbed processes can suffer from a large amount of redundancy [29]. The most recent techniques in process-based expression analysis have thus emphasized computing concise sets of perturbed processes that nonetheless explain much of the overall perturbation among the genes [30–33].
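The gene-level approach described above can be illustrated with a minimal sketch that ranks genes by Welch's t-statistic. The gene names and expression values are hypothetical, and the sketch stops at the statistic itself; converting it to a p-value requires the t distribution (for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False`), which is left out here to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(treatment, control):
    """Welch's t-statistic for two lists (each of length >= 2) of expression values."""
    se = sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(treatment) - mean(control)) / se


def rank_genes(expression, treatment_cols, control_cols):
    """Return (gene, t) pairs sorted from most up- to most down-regulated."""
    scores = {}
    for gene, values in expression.items():
        treated = [values[i] for i in treatment_cols]
        controls = [values[i] for i in control_cols]
        scores[gene] = welch_t(treated, controls)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical log2 expression values: columns 0-2 are treatment, 3-5 control.
expression = {
    "GENE_A": [8.1, 8.3, 8.0, 6.0, 6.2, 5.9],
    "GENE_B": [5.0, 5.2, 4.9, 5.1, 5.0, 5.2],
    "GENE_C": [3.0, 3.1, 2.8, 4.9, 5.1, 5.0],
}
for gene, t in rank_genes(expression, [0, 1, 2], [3, 4, 5]):
    print(f"{gene}\t{t:+.2f}")
```

Applying a cutoff to such statistics yields the gene lists whose length motivated process-based analysis.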
Systems biologists have also sought methods to integrate high-throughput data from multiple sources as a means of gaining biological insights that no single source could provide alone [34]. Among these efforts is the integration of gene or protein interaction networks with gene expression data to compute response networks: the subnetworks of genes and interactions that display a large amount of perturbation or activity in response to some condition or stimulus [35, 36], or whose genes exhibit significant co-expression [37–40]. Response networks may contain many thousands of genes and interactions, in much the same way that gene expression analysis can yield overwhelmingly large lists of differentially expressed genes. In an approach similar to process-based expression analysis, researchers turned to summarizing response networks in terms of processes: for each process, one can assess the enrichment of its genes among the genes in the response network, then report the processes significantly over-represented in the response network [40, 41].
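A deliberately simplified sketch of a response network follows. The published methods cited above [35–40] use more elaborate scoring and search procedures; here, as a hypothetical stand-in, we keep only interactions between genes whose perturbation score meets a fixed cutoff and split the result into connected components. The genes, scores, cutoff, and function names are all made up for illustration.

```python
from collections import defaultdict


def response_network(edges, scores, cutoff):
    """Keep interactions whose two endpoints both have |score| >= cutoff."""
    perturbed = {gene for gene, score in scores.items() if abs(score) >= cutoff}
    return [(u, v) for u, v in edges if u in perturbed and v in perturbed]


def connected_components(edges):
    """Connected components (as sets of genes) of an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical interactions and per-gene perturbation scores.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
scores = {"A": 3.0, "B": -2.5, "C": 0.1, "D": 2.2, "E": 4.0, "F": -3.1}
subnetwork = response_network(edges, scores, cutoff=2.0)
print(subnetwork)
print(connected_components(subnetwork))
```

Each component could then be summarized by testing processes for over-representation among its genes, as described in the text.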
Recently, integrating gene interaction networks with process annotations for each gene has led to novel insights into how pairs of processes themselves interact, on the basis of the interactions between their respective genes [42–45]. These methods detect inter-process connections from static data, which represent the potential for interaction rather than relevance to any specific biological condition. We reason that as the phenotypes of cells or tissues change, so too do the connections between the processes perturbed in response to that change. Surprisingly, none of these existing methods integrated gene expression perturbation to give biological context to the process interaction network, as response networks do for gene interaction networks. This gap in methodology provided the motivation for the work we present in this dissertation.
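Static inter-process connections of this kind rest on counting interactions between the genes of two processes. The sketch below is a generic, hypothetical illustration of that idea, not a reproduction of any of the cited methods [42–45]: it counts interactions linking two gene sets and estimates an empirical p-value by drawing random, disjoint gene sets of the same sizes from the genes in the network. The function names and toy network are illustrative only.

```python
import random


def cross_links(edges, genes_a, genes_b):
    """Count interactions with one endpoint in set A and the other in set B."""
    return sum(
        1
        for u, v in edges
        if (u in genes_a and v in genes_b) or (u in genes_b and v in genes_a)
    )


def link_p_value(edges, genes_a, genes_b, n_perm=1000, seed=0):
    """Observed cross-link count and its empirical (permutation) p-value.

    Random gene sets are drawn disjointly from the genes in the network,
    with the same sizes as A and B; overlap between the real processes is
    ignored in this simplified null model.
    """
    nodes = sorted({gene for edge in edges for gene in edge})
    in_network = set(nodes)
    size_a = len(genes_a & in_network)
    size_b = len(genes_b & in_network)
    observed = cross_links(edges, genes_a, genes_b)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_perm):
        shuffled = rng.sample(nodes, len(nodes))
        rand_a = set(shuffled[:size_a])
        rand_b = set(shuffled[size_a:size_a + size_b])
        if cross_links(edges, rand_a, rand_b) >= observed:
            as_extreme += 1
    # The +1 terms count the observed configuration itself, so p is never 0.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Toy network: processes A and B are densely interconnected.
edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"),
         ("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c1")]
print(link_p_value(edges, {"a1", "a2"}, {"b1", "b2"}, n_perm=200, seed=1))
```

Nothing in this score depends on a particular experiment, which is exactly the limitation the paragraph above identifies.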
Inspired by methods for computing inter-process connections and response networks, we developed the first published method that integrates genome-wide gene perturbation measurements, annotations of genes to biological processes, and a gene interaction network to identify the connections between processes perturbed under a particular biological context. We then drew further inspiration from state-of-the-art methods in process-based expression analysis to develop another novel method that computes a concise, non-redundant set of perturbed inter-process connections that explains the perturbation of the underlying gene-gene interactions serving as the interfaces between the processes.
In this dissertation, we have applied these methods, as well as an existing process-based expression analysis method [26], to liver-related data sets. In our first study, we analyzed gene expression data collected from two in vitro rat hepatocyte cultures, the hepatocyte monolayer (HM) and the collagen sandwich (CS) (Chapter 2) [46]. The liver maintains a variety of physiological functions, including lipid, carbohydrate, and amino acid metabolism, immune response, and detoxification of xenobiotics; hepatocytes, which comprise 60–70% of the liver, perform the bulk of its metabolic functions [47, 48]. Although Dunn et al. first compared CS and HM cultures over two decades ago [49, 50], our study represents the first systems-level analysis of differences in gene expression of hepatocytes in these cultures.
In our second study, we developed our method for computing context-specific connections between processes, and used it to elucidate the relationships between the processes we found perturbed in our first study (Chapter 3) [51]. In the final study (Chapter 4), we developed our method for computing non-redundant sets of context-specific inter-process connections, and applied it to the CS and HM data; to data from a study on the effects of acetaminophen [52], a model hepatotoxicant [53]; and to data collected from liver biopsies of human patients infected with hepatitis C virus [54], a major cause of liver damage and of hepatocellular carcinoma, one of the deadliest forms of cancer [55].
1.1 Significant contributions of this dissertation
This dissertation presents methodological advances in computational systems biology capable of reporting high-level trends from high-throughput biological experiments. We summarize the contributions of this dissertation as follows:
(i) We performed the first systems-level comparison of two important and common hepatocyte culture systems, identifying biological processes perturbed between the two conditions.
(ii) We developed a novel method to compute perturbed connections between processes, and applied it to the processes identified in (i).
(iii) We developed a novel method to compute a concise, non-redundant set of perturbed connections between processes that best summarizes the perturbation of the gene-gene interactions that lie at the interfaces of the processes.
1.2 Prior work
In this section, we discuss the prior techniques of greatest importance to those we have developed. These techniques fall into three categories, each with a respective subsection below: process-based analysis of gene expression experiments, computation of response networks, and computation of inter-process connections.
1.2.1 Process-based analysis of gene expression
A very common experimental design in genome-wide gene expression experiments (e.g., microarrays) partitions the set of biological samples into two subsets, with one subset corresponding to an experimental treatment and the other to a control. We call such designs treatment-control experimental designs. Early methods for investigating gene expression perturbation in treatment-control microarray experiments computed lists of differentially expressed genes [18, 19]. These lists may contain thousands of perturbed genes [21], giving rise to methods, known as enrichment analysis, that summarize the results in terms of biological processes. Enrichment analysis asks whether more genes belonging to a biological process appear in the list of perturbed genes than expected by chance (i.e., whether the process is “over-represented”); a process over-represented in the list of perturbed genes may itself be interpreted as perturbed. One tests the over-representation of each process using a statistical test of significance such as Fisher’s exact test, then reports the processes that meet a significance threshold [22–25].
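The over-representation test described above is commonly computed from the hypergeometric distribution, whose upper tail equals the one-sided Fisher's exact test p-value for the corresponding 2×2 table. The following is a minimal sketch under that formulation; the function names are hypothetical. With SciPy installed, `scipy.stats.fisher_exact(table, alternative="greater")` on the same 2×2 table should give the same p-value.

```python
from math import comb


def overrepresentation_p(n_universe, n_process, n_perturbed, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(N, K, n).

    N genes were measured, K of them belong to the process, and n of them
    are in the perturbed list; n_overlap genes are in both.
    """
    upper = min(n_process, n_perturbed)
    total = comb(n_universe, n_perturbed)
    tail = sum(
        comb(n_process, k) * comb(n_universe - n_process, n_perturbed - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrichment(universe, process_genes, perturbed_genes):
    """Over-representation p-value of one process among the perturbed genes."""
    process = process_genes & universe
    perturbed = perturbed_genes & universe
    return overrepresentation_p(
        len(universe), len(process), len(perturbed), len(process & perturbed)
    )


# Hypothetical numbers: 10,000 measured genes, a 50-gene process,
# 500 perturbed genes, 10 of which belong to the process (2.5 expected).
print(overrepresentation_p(10_000, 50, 500, 10))
```

In practice this p-value is computed for every process and then corrected for multiple testing before applying the significance threshold.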
Enrichment analysis methods have several shortcomings that lead to a loss of sensitivity. For one, the user must choose the significance threshold at which to declare a gene significantly perturbed, and this choice can change the results of enrichment analysis [21, 29]. Additionally, these methods compute process perturbation from only the most perturbed genes, ignoring differences in expression among the remaining genes [21, 29]. To address these issues, Subramanian et al. developed a method called Gene Set Enrichment Analysis (GSEA) [26]. GSEA takes a holistic view of finding significant collective perturbation of processes: it considers the difference in expression of all genes in a treatment-control experiment, so each gene belonging to a process may contribute to the cumulative perturbation of that process, regardless of the magnitude of the difference for that individual gene. First, Subramanian et al. sort the genes into a ranked list on the basis of the difference in expression between the treatment and control conditions. They then score each process based on the concentration of its genes towards either the top or the bottom of the sorted list; processes composed of genes that consistently show strong correlation with the treatment or with the control receive high scores. They assess the significance of each score by constructing an empirical distribution of scores after shuffling the treatment and control labels of the samples and re-calculating scores, then ranking the original score within the empirical distribution of scores from the randomized data.
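The rank, score, and permute procedure can be sketched as follows. This is a simplified illustration, not the GSEA software: genes are ranked by the difference in mean expression (GSEA also offers other ranking metrics, such as signal-to-noise), the enrichment score follows the weighted running-sum idea of Subramanian et al. with weight 1, and the permutation p-value compares magnitudes rather than using GSEA's sign-specific, normalized null distribution. The gene names, values, and function names are hypothetical.

```python
import random
from statistics import mean


def rank_by_difference(expression, labels):
    """Sort genes by mean(treatment) - mean(control), largest first.

    labels[i] is 1 for a treatment sample and 0 for a control sample.
    """
    ranking = []
    for gene, values in expression.items():
        treated = [v for v, lab in zip(values, labels) if lab == 1]
        controls = [v for v, lab in zip(values, labels) if lab == 0]
        ranking.append((gene, mean(treated) - mean(controls)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


def enrichment_score(ranking, gene_set, weight=1.0):
    """Signed maximum deviation from zero of the weighted running sum."""
    hits = [gene in gene_set for gene, _ in ranking]
    n_miss = len(ranking) - sum(hits)
    norm = sum(abs(score) ** weight for (_, score), hit in zip(ranking, hits) if hit)
    if norm == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for (_, score), hit in zip(ranking, hits):
        # Genes in the set step the sum up by their weighted score;
        # genes outside the set step it down by a constant amount.
        running += abs(score) ** weight / norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(expression, labels, gene_set, n_perm=1000, seed=0):
    """Observed score and the fraction of label permutations at least as extreme."""
    observed = enrichment_score(rank_by_difference(expression, labels), gene_set)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        score = enrichment_score(rank_by_difference(expression, shuffled), gene_set)
        # Simplification: compare magnitudes instead of GSEA's sign-specific,
        # normalized null distribution.
        if abs(score) >= abs(observed):
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical data: three treatment samples followed by three controls.
expression = {
    "S1": [5.0, 5.1, 4.9, 1.0, 1.1, 0.9],
    "S2": [4.0, 4.2, 3.8, 1.0, 1.0, 1.0],
    "N1": [2.0, 2.0, 2.0, 2.1, 2.1, 2.1],
    "N2": [3.0, 3.1, 2.9, 3.0, 3.0, 3.0],
    "N3": [1.0, 1.0, 1.0, 1.2, 1.2, 1.2],
    "N4": [2.0, 2.2, 2.0, 2.0, 2.0, 2.0],
}
labels = [1, 1, 1, 0, 0, 0]
print(permutation_p_value(expression, labels, {"S1", "S2"}, n_perm=100))
```

With only three samples per group there are just 20 distinct ways to split the labels, so an empirical p-value from such a small design is necessarily coarse.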
GSEA and other methods which test significance on a process-by-process basis suffer
from several drawbacks. For one, they must employ multiple testing correction. For
GSEA and related methods that rely on empirical distributions, the number of random-
-
8
ized samples required to give the precision necessary to declare significance after cor-
rection can increase rapidly in comparison to the number of tests, which in return can
greatly increase computational time. Additionally, in cases where two or more processes
have many common genes, they may both be reported as significant in the results, while
providing little additional information. Sources of process annotation that organize pro-
cesses as a hierarchy, such as the Gene Ontology (GO) [56], exacerbate the redundancy,
as the more specific processes are composed of subsets of genes in the less specific pro-
cesses [32].
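The cost of empirical p-values under multiple testing can be made concrete with a small calculation. Assuming the common estimator p = (b + 1)/(N + 1), where b of N permutations score at least as extremely as the observed data, the smallest attainable p-value is 1/(N + 1); to pass a Bonferroni threshold α/m over m processes, N must be at least m/α − 1. The helper below is our own illustration, and GSEA itself reports false discovery rates rather than Bonferroni-corrected p-values.

```python
import math


def min_permutations(n_tests, alpha=0.05):
    """Fewest permutations N for which the smallest empirical p-value,
    1 / (N + 1), can fall at or below the Bonferroni threshold alpha / n_tests."""
    return max(0, math.ceil(n_tests / alpha) - 1)
```

With m = 1000 processes and α = 0.05, for example, `min_permutations(1000)` returns 19,999, and every additional process raises the requirement linearly.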
Recent efforts in gene set enrichment have tackled both the challenge of reducing the
number of tests and that of reporting a non-redundant set of significant processes. In 2008, Lu et
al. proposed one such method called GENerative GO Analysis (GenGO) [32], where they
considered the perturbation of individual genes as a noisy observation, generated by a
specific collection of processes that cells activated in response to a particular condition or
stress. The objective of GenGO is to identify a non-redundant set of processes that best
explains the gene expression perturbations observed. To do this, Lu et al. conceptualized
the biological processes and genes forming a bipartite graph, where edges connect genes
to processes in which they participate (see Figure 1.1).
This bipartite graph serves as a generative model: a set of processes is proposed as the set
of perturbed processes, which have generated the perturbation observed at the level of the
individual genes. Thus, genes connected to processes chosen as part of the generating set
are expected to be observed as perturbed, while genes connected to no activated processes
are expected to be observed as unperturbed. Lu et al. constructed a likelihood function
to calculate how well the observed perturbations fit a selection of processes proposed as
activated. High scores are given to selections of processes that are connected to many
perturbed genes and to few unperturbed genes, and that have few genes in common. Lu et al.
use a greedy algorithm to find selections of processes with high likelihood scores.
Figure 1.1: Bipartite graph model of GenGO. Each node on the left, representing a
process (GO Nodes), is connected by edges to nodes representing genes that belong to
that process (Gene Nodes). Activation edges are drawn from an included process to its
genes. [Modified from Lu et al. [32] under the Creative Commons Attribution
License v2.5.]
Bauer et al. proposed a method similar to GenGO, which they called model-based gene set
analysis (MGSA) [33]. Like GenGO, they considered unobservable activation of processes
as generating observable but noisy perturbations in gene expression. In MGSA, connections
from the processes to the genes are modeled as a Bayesian network. Bauer et al.
calculated the likelihood that a given set of perturbed processes generated the observed gene
expression perturbations as a combination of Bernoulli distributions based on several prior
probabilities: (i) the probability that a gene connected to a perturbed process will not be
observed as perturbed, (ii) the probability that a gene connected to no perturbed processes
will be observed as perturbed, and (iii) the prior probability of observing any given process
as perturbed. Bauer et al. used a Markov chain Monte Carlo (MCMC) approach to find
collections of processes of high likelihood, and reported the probability that a process should
be considered perturbed as the fraction of MCMC steps in which the process was
selected.
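A minimal sketch of this kind of model follows, under simplifying assumptions that are ours rather than MGSA's: the three probabilities are fixed instead of being sampled, the Metropolis chain only toggles one process per step, and all names are hypothetical. In the code, `beta` is probability (i), `alpha` is probability (ii), and `p` is the prior (iii).

```python
import math
import random


def log_posterior(active, annotations, observed, alpha, beta, p):
    """Unnormalized log posterior of a set of active processes.

    annotations: process -> set of genes; observed: genes seen as perturbed.
    beta  = P(gene not observed perturbed | connected to an active process)
    alpha = P(gene observed perturbed | connected to no active process)
    p     = prior probability that any given process is active
    """
    genes = set().union(*annotations.values())
    on = set().union(*(annotations[t] for t in active))
    ll = 0.0
    for g in genes:
        if g in on:
            ll += math.log(1 - beta if g in observed else beta)
        else:
            ll += math.log(alpha if g in observed else 1 - alpha)
    k = len(active)
    return ll + k * math.log(p) + (len(annotations) - k) * math.log(1 - p)


def mcmc_marginals(annotations, observed, alpha=0.1, beta=0.2, p=0.1,
                   steps=20000, seed=0):
    """Fraction of Metropolis steps in which each process is in the active set."""
    rng = random.Random(seed)
    terms = sorted(annotations)
    active = frozenset()
    current = log_posterior(active, annotations, observed, alpha, beta, p)
    counts = dict.fromkeys(terms, 0)
    for _ in range(steps):
        proposal = active ^ {rng.choice(terms)}  # toggle one process (symmetric move)
        new = log_posterior(proposal, annotations, observed, alpha, beta, p)
        if new >= current or rng.random() < math.exp(new - current):
            active, current = proposal, new
        for t in active:
            counts[t] += 1
    return {t: counts[t] / steps for t in terms}
```

On a toy annotation in which only the genes of one process are perturbed, `mcmc_marginals` should give that process a marginal near 1 and unrelated processes marginals near 0.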
1.2.2 Computing response networks from treatment-control data
Ideker et al. published the first method capable of computing response networks from
treatment-control contrasts, called ActiveModules [35]. ActiveModules discovers subgraphs
composed of genes with large differential expression. They define the score for a
subgraph as a Liptak-Stouffer z-score, computed as the sum of the gene expression per-
turbation measurements divided by the square root of the number of genes within the
subgraph. They then calculate the significance of a subgraph’s score by its deviation from
a mean, estimated from an empirical distribution of scores of subgraphs created by sampling
uniformly at random a number of genes equal to the number in the original subgraph.
Since the problem of finding the highest-scoring subgraph is NP-complete [35],
Ideker et al. employ simulated annealing and report the highest-scoring connected
component encountered during the search.
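The Liptak-Stouffer subgraph score and its correction against random gene sets of the same size can be sketched as follows. We assume per-gene z-scores are already available and that the correction standardizes by the mean and standard deviation of the sampled scores; the simulated-annealing search itself is not shown, and the function names are hypothetical.

```python
import math
import random
import statistics


def liptak_stouffer(z_scores, genes):
    """Aggregate z-score of a gene set: sum of z-scores divided by sqrt(k)."""
    return sum(z_scores[g] for g in genes) / math.sqrt(len(genes))


def corrected_score(z_scores, genes, n_samples=1000, seed=0):
    """Standardize a subgraph's score against random gene sets of equal size.

    The null mean and standard deviation come from sampling k genes uniformly
    at random (without replacement) n_samples times.
    """
    rng = random.Random(seed)
    universe = sorted(z_scores)
    k = len(genes)
    null = [liptak_stouffer(z_scores, rng.sample(universe, k))
            for _ in range(n_samples)]
    return (liptak_stouffer(z_scores, genes) - statistics.mean(null)) / statistics.stdev(null)
```

Dividing by the square root of the subgraph size keeps the aggregate on the same scale as a single z-score when gene scores are independent standard normals, which is why sampling same-size random sets is still needed to correct for any residual size effect.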
Later, Dittrich et al. extended ActiveModules, using a different method to score subgraphs
and an alternative search algorithm to identify the highest scoring subgraphs [36]. Rather
than simply using the p-values of the perturbation of expression, they model the perturbation
as coming from both a “signal component” and a “noise component”, each with its
own distribution. Genes with a large signal component have large positive scores, while
those with a large noise component have negative scores. The authors defined the subgraph
score as the sum of the scores of the genes within the subgraph. Dittrich et al.
formulated the problem of finding the connected subgraph of largest score as a Prize-Collecting
Steiner Tree (PCST) problem, which they then solved using integer linear programming [57].
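The signal/noise idea can be sketched with a beta-uniform mixture, in which the p-values of signal genes follow Beta(a, 1) with 0 < a < 1 and noise p-values are uniform. As we understand Dittrich et al.'s scoring, a gene's score is a log-likelihood ratio of the form (a − 1)(log p − log τ), where τ is a p-value threshold derived from a chosen false discovery rate. In this sketch `a` and `tau` are simply supplied as inputs (fitting them is omitted), and the function names are our own.

```python
import math


def bum_node_score(p_value, a, tau):
    """Node score under an assumed beta-uniform mixture with shape a in (0, 1).

    Genes with p < tau receive positive scores; genes with p > tau negative ones.
    """
    if not 0 < a < 1:
        raise ValueError("a must lie in (0, 1)")
    return (a - 1) * (math.log(p_value) - math.log(tau))


def subgraph_score(p_values, genes, a, tau):
    """Subgraph score as the sum of its genes' node scores."""
    return sum(bum_node_score(p_values[g], a, tau) for g in genes)
```

Because noise genes contribute negative scores, adding them to a subgraph only pays off when they connect high-scoring genes, which is what makes the prize-collecting formulation a natural fit.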
1.2.3 Computation of inter-process connections
As interpretation of gene expression data has moved from the level of individual genes
to processes composed of these genes, so too has the level of analysis of molecular inter-
actions begun to move from gene-gene interactions to discovering interactions between
processes.
Pandey et al. proposed a method to detect chains of connected processes of a specified
length k using gene regulatory networks and gene annotations as the inputs [42]. In this
method, a chain of processes of length k is formed if, for each process in the chain, there
is a corresponding gene in a path of genes in the regulatory network. Each chain of processes
has a frequency equal to the number of such paths in the gene regulatory
network. Pandey et al. proposed a statistical model based on the coupling of hypergeometric
distributions for each process in the path, and used this model to compute the
significance of each process chain based on its frequency [58].
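The frequency of a process chain can be computed by enumerating gene paths and mapping each gene to its annotated processes. This sketch assumes a directed network stored as an adjacency dictionary, counts only simple paths (no repeated genes) with k ≥ 2, and omits Pandey et al.'s hypergeometric significance model; the names are hypothetical.

```python
from collections import Counter
from itertools import product


def gene_paths(network, k):
    """All simple directed paths containing k genes (k >= 2 assumed).

    network: gene -> iterable of target genes.
    """
    paths = [[g] for g in network]
    for _ in range(k - 1):
        paths = [path + [t] for path in paths
                 for t in network.get(path[-1], ()) if t not in path]
    return paths


def chain_frequencies(network, annotations, k):
    """Frequency of each length-k process chain: the number of gene paths in
    which the i-th gene is annotated to the i-th process of the chain."""
    counts = Counter()
    for path in gene_paths(network, k):
        for chain in product(*(sorted(annotations.get(g, ())) for g in path)):
            counts[chain] += 1
    return counts
```

A gene annotated to several processes contributes one chain per combination, so a single gene path can support more than one process chain.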
Li, Agarwal, and Rajagopalan developed a method that computes “crosstalk” based on
the number of interactions between the genes of each process [43]. They began by count-
ing the number of gene-gene interactions between each pair of processes. To determine
whether this number of interactions is greater than expected by chance, they proceeded
to build empirical distributions of counts. For each process, they created a new randomized
process by sampling genes uniformly at random such that each gene originally in
the process is replaced by one with an equal number of gene-gene interactions. They then
re-computed the interactions between genes of each pair of processes. Li et al. repeated these
steps enough times to build distributions of sufficient sizes, and then computed the mean
number of interactions between each pair of processes from that pair's distribution, as well as
the mean number of interactions over all pairs of processes and all randomizations. Finally,
for each pair of processes, they computed the significance of the number of their
interactions using Fisher’s Exact Test, where the terms include the number of interactions
between the processes observed in the original data, the mean number of interactions for
that pair’s distribution, the mean for all distributions, and the total number of interactions
overall. If the p-value was significant after multiple testing correction, they reported the
two processes as having crosstalk.
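The counting and degree-matched randomization steps can be sketched as follows. Assumptions of this illustration: interactions are undirected pairs, a gene with no interactions is dropped from the randomized process (it cannot contribute links), sampling is with replacement so a randomized process may end up smaller, and the final Fisher's Exact Test is omitted because its exact contingency table is not spelled out here. Function names are hypothetical.

```python
import random
from collections import defaultdict


def count_links(edges, set_a, set_b):
    """Number of undirected interactions with one endpoint in each gene set."""
    return sum((u in set_a and v in set_b) or (u in set_b and v in set_a)
               for u, v in edges)


def randomized_counts(edges, set_a, set_b, n_rand=1000, seed=0):
    """Link counts between degree-matched random versions of two gene sets."""
    rng = random.Random(seed)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    by_degree = defaultdict(list)
    for g in sorted(degree):
        by_degree[degree[g]].append(g)

    def randomize(genes):
        # swap each interacting gene for a random gene with the same degree;
        # genes without interactions are dropped, duplicates collapse
        return {rng.choice(by_degree[degree[g]]) for g in genes if degree.get(g)}

    return [count_links(edges, randomize(set_a), randomize(set_b))
            for _ in range(n_rand)]
```

The pair-specific mean described above is then simply `statistics.mean(randomized_counts(...))` for that pair of processes.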
Dotan-Cohen et al. proposed another method for finding connections between processes
called “Biological Process Linkage Networks” (BPLN) [44]. Similar to the approach of
Li et al. [43], BPLN declares one process linked to another process if genes in the first
process have significantly more interactions with genes in the second process than would
be expected by chance. BPLN computes this significance directly using Fisher’s Exact
Test, however, rather than building empirical distributions as described in the approach
by Li et al.
Wang et al. developed a method that calculates a type of connection between two processes
that they refer to as “functional similarity” [45]. Like BPLN, they consider interactions
between genes of two processes in their measure of the strength of the inter-process
connection; however, rather than include only immediate neighboring connections, they
score the connection by the sum of the network distances between the genes of the two
processes. They then compare this score to an empirical distribution computed by re-sampling.
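A simplified sketch of a distance-based connection score and its resampling test follows. We assume an undirected, connected interaction network stored as an adjacency dictionary, use the sum of shortest-path lengths between every gene of one process and every gene of the other (smaller meaning more tightly connected), and compare against random gene sets of the same sizes. Wang et al.'s actual functional similarity measure may weight distances differently; all names here are hypothetical.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_sum(adj, set_a, set_b):
    """Sum of shortest-path lengths over all gene pairs (a in A, b in B).

    Unreachable pairs are skipped; in a connected network there are none.
    """
    total = 0
    for a in set_a:
        dist = bfs_distances(adj, a)
        total += sum(dist[b] for b in set_b if b in dist)
    return total


def resampling_pvalue(adj, set_a, set_b, n_rand=200, seed=0):
    """Share of random same-size set pairs at least as close as the observed pair."""
    rng = random.Random(seed)
    genes = sorted(adj)
    observed = distance_sum(adj, set_a, set_b)
    hits = sum(
        distance_sum(adj, set(rng.sample(genes, len(set_a))),
                     set(rng.sample(genes, len(set_b)))) <= observed
        for _ in range(n_rand))
    return (hits + 1) / (n_rand + 1)
```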
1.3 Overview of Chapters
We present here a brief description of the remainder of this document. Chapter 2 describes
a process-level analysis of gene expression data for hepatocytes in HM versus CS
cultures. We originally published this work in 2010 in the journal Tissue Engineering Part
C: Methods [46]. Chapter 3 describes a method that integrates gene expression data
with gene interaction networks to detect connections between the processes we detected
as significantly perturbed in Chapter 2. We originally published this work in the journal
PLoS ONE in 2011 [51]. Finally, Chapter 4 presents advances which allow computation
of comprehensible, non-redundant sets of perturbed inter-process connections that best
explain the underlying perturbation of expression in the gene interaction network.
Chapter 2
Discovering temporal patterns of expression in hepatocyte cultures
2.1 Attribution
This chapter contains material originally published as Yeonhee Kim, Christopher D Lasher,
Logan M Milford, T.M. Murali, Padmavathy Rajagopalan (2010) A comparative study of
Genome-Wide transcriptional profiles of primary hepatocytes in collagen sandwich and
monolayer cultures. Tissue Engineering Part C: Methods 16: 1449–1460. [46].
Dr. Yeonhee Kim performed the work described in Sections 2.4.1–2.4.3, while Mr. Lasher
performed the work described in Section 2.4.5.
Christopher D. Lasher Chapter 2. Patterns of expression in hepatocyte cultures 16
2.2 Abstract
Two commonly used culture systems in hepatic tissue engineering are the collagen sand-
wich (CS) and monolayers of cells. In this study, genome-wide gene expression profiles
of primary hepatocytes were measured over an 8-day period for each cell culture sys-
tem using Affymetrix GeneChips and compared via gene set enrichment analysis to elicit
biologically meaningful information at the level of gene sets. Our results demonstrate
that gene expression in hepatocytes in CS cultures steadily and comprehensively diverges
from that in monolayer cultures. Gene sets up-regulated in CS cultures include several
associated with liver metabolic and synthesis functions, such as metabolism of lipids,
amino acids, carbohydrates, and alcohol, and synthesis of bile acids. Monooxygenases
such as Cytochrome-P450 enzymes do not show any change between the culture systems
after 1 day, but exhibit significant up-regulation in CS cultures after 3 days in comparison
to hepatocyte monolayers. These data provide insights into the up- and down-regulation
of several liver-critical gene sets and their subsequent effects on liver-specific functions.
These results provide a baseline for further explorations into the systems biology of engi-
neered liver mimics.
2.3 Introduction
As one of the most important organs in the body, the liver performs many essential functions
such as metabolism, synthesis, secretion, and detoxification [48]. Hepatocytes are the
principal cells in the liver, comprising over 80% of its mass. Hepatocytes perform sev-
eral characteristic functions of the liver, such as lipid metabolism, glucose homeostasis,
regulation of urea, production of plasma proteins, alcohol clearance, and biotransforma-
tion of xenobiotics [48]. In hepatic tissue engineering, two widely used culture systems
are hepatocyte monolayers (HMs) (Figure 2.1a) and the collagen sandwich (CS) (Figure
2.1b) [49, 50]. In HMs, hepatocytes are cultured on a single collagen gel. Such cells progressively
lose their phenotypic characteristics over time. In CS cultures, hepatocytes
are maintained between two collagen gels and remain stable over extended periods of
time [59, 60]. Studies have indicated that CS cultures exhibit the preservation of differ-
entiated functions including secretion of urea, expression of plasma proteins such as al-
bumin and fibrinogen, polygonal morphology, the presence of bile canaliculi, as well as
the synthesis of gap junction and tight junction proteins [59,60]. Although morphological
and physiological characteristics of hepatocytes in CS cultures have been studied exten-
sively, comprehensive evaluations of temporal genome-wide gene expression programs
in these culture systems have not been reported. Global gene expression of human hep-
atocellular carcinoma cells (HepG2) in monolayer and spheroidal cultures revealed up-
regulated metabolic functions in spheroids but not in monolayer cultures [61]. Since these
data were taken at a single time point, they did not reveal temporal variations. Another
study that monitored temporal gene expression in hepatocyte monolayers cultured over a
three-day period revealed the down-regulation of cytochrome-P450 expression [62].
However, this study neither investigated longer time points nor compared
monolayers to other, more stable culture conditions. DNA microarray measurements have also
been used to study specific pathways through which toxicity was conferred in human
hepatoblastoma cells [63] and to understand the effects of non-parenchymal cells in 2D
cocultures of hepatocytes with fibroblasts or sinusoidal endothelial cells [64, 65].
[Figure 2.1 panels: (a) Hepatocyte Monolayer (HM): collagen gel, hepatocyte. (b) Collagen Sandwich (CS): collagen gel, bile canaliculi, hepatocyte.]
Figure 2.1: Schematics of two popular liver cell culture systems. (a) In a hepatocyte
monolayer (HM), hepatocytes reside on a layer of collagen. The hepatocytes show a
progressive loss of phenotype, including cell shape. (b) In a collagen sandwich (CS), an
additional layer of collagen overlays the hepatocytes. Hepatocytes retain their
phenotype for several weeks.
We hypothesized that the enhanced in vivo liver-like phenotypes in CS cultures were a
result of the underlying differences in the transcriptional program between hepatocytes
cultured in CS and HMs. Accordingly, genome-wide gene expression profiles of primary
hepatocytes were measured at four different time points over an 8-day period for each
cell culture system using Affymetrix GeneChips. Among the wide range of techniques
that are available to analyze DNA microarray data, a method was desired that would
summarize, at the level of predefined biological pathways, the differences between the
culture conditions at each time point. Gene set enrichment analysis (GSEA) [26] was se-
lected since it satisfies this criterion. GSEA is one among a family of techniques that can
summarize differential expression at the level of gene sets [66]. GSEA is widely used,
generates detailed information on the results, and has shown very good performance in
a comparison of methods that compute enrichment at the level of gene sets [67]. Fur-
thermore, GSEA has been used to identify pathways involved in liver toxicity in human
hepatoblastoma cells [63]. GSEA is designed to identify predefined gene sets that are
differentially expressed between a treatment and a control. All the genes expressed on each
gene chip are ranked based upon their differential expression in CS and HM cultures. A gene
set is then considered important if its members are clustered near the top or bottom of the
ranked gene list. GSEA measures the statistical significance of the distribution of ranks
within the gene set against the background of the ranks of all the genes.
Over the 8-day culture period, the gene expression program of hepatocytes in CS cul-
tures monotonically diverged from cells cultured as a monolayer. Gene sets that were
up-regulated to a statistically significant extent in CS cultures included those associated
with liver-specific functions such as bile acid synthesis and lipid, amino acid, carbohydrate,
and alcohol metabolism. Nuclear receptors, which play a key role in controlling
the transcriptional activation of target genes, were up-regulated in CS cultures
on day 1 in culture. Sets containing genes whose expression is mediated by nuclear
receptors were up-regulated in CS systems after 1 day. Gene sets related to xenobiotic
metabolism and monooxygenase activity were not differentially expressed after 1 or 2
days, but showed highly significant up-regulation after 3 days, suggesting a recovery
in expression of the genes in these sets. Numerous gene sets related to the cell cycle
were down-regulated, suggesting that the cell cycle was arrested in hepatocytes main-
tained in CS culture systems in comparison to HMs. These findings recapitulated well-
known aspects of liver function, thereby suggesting that DNA microarrays are a powerful
tool for shedding light on the transcriptional signatures that underlie differences between
these two culture systems. The DNA microarray data generated in this study are available
at NCBI’s Gene Expression Omnibus under accession number GSE20659 at
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20659. All our results are available
at the following supplementary website:
http://bioinformatics.cs.vt.edu/~murali/supplements/2010-kim-tissue-engineering.
2.4 Materials and Methods
2.4.1 Materials
Dulbecco’s Modified Eagle Medium (DMEM) containing 4.5 g/L glucose, phosphate-buffered
saline (PBS), penicillin, streptomycin, and trypsin-EDTA were obtained from Invitrogen
Life Technologies (Carlsbad, CA). Type IV collagenase, HEPES
[4-(2-hydroxyethyl)piperazine-1-ethanesulfonic acid], glucagon, and hydrocortisone were
obtained from Sigma-Aldrich. Unless otherwise noted, all chemicals were used as received
from Fisher Scientific.
2.4.2 Hepatocyte Isolation and Culture
Primary rat hepatocytes were harvested from female Lewis rats (Harlan) that weighed
between 170 and 200 g. Animal care and surgical procedures were conducted as per procedures
approved by Virginia Polytechnic Institute and State University’s Institutional
Animal Care and Use Committee. A two-step in situ collagenase perfusion method was
utilized to excise the liver [59, 60]. Briefly, animals were anesthetized with 3 L/min of
a gas mixture of 3% (v/v) isoflurane/97% oxygen (Veterinary Anesthesia Systems Co.).
The liver was perfused through the portal vein with Krebs Ringer Buffer (7.13 g/L sodium
chloride, 2.1 g/L sodium bicarbonate, 1 g/L glucose, 4.76 g/L HEPES, and 0.42 g/L potassium
chloride) that contained 1 mM ethylenediaminetetraacetic acid, followed by serial
perfusion with a 0.075% w/v and a 0.1% w/v collagenase (Type IV; Sigma-Aldrich) in
Krebs Ringer Buffer containing 5 mM calcium chloride. Cell suspensions were filtered
through nylon meshes with porosity ranging from 250 to 62 µm (Small Parts, Inc.). Hepatocytes
were separated using a Percoll (Sigma-Aldrich) density centrifugation technique.
Cell viability was determined by trypan blue exclusion. Hepatocytes were cultured on
collagen-coated 6-well sterile tissue culture plates (Becton Dickinson Labware) and were
maintained in a culture medium that consisted of DMEM supplemented with 10% heat-inactivated
fetal bovine serum (Hyclone), 200 U/mL penicillin, 200 µg/mL streptomycin,
20 ng/mL epidermal growth factor (BD Biosciences), 0.5 U/mL insulin (USP), 14 ng/mL
glucagon, and 7.5 µg/mL hydrocortisone. A collagen gelling solution was prepared by
mixing nine parts of type I collagen (BD Biosciences) solution and one part of 10× DMEM.
Sterile 6-well tissue culture plates were coated with 0.5 mL of the gelling solution and
incubated at 37 °C for 1 h to promote gel formation. Isolated hepatocytes were suspended
in hepatocyte culture medium at a concentration of 1 × 10⁶ cells/mL and seeded on the
collagen-coated wells at a density of 1 million cells/well. CS cultures were formed by the
deposition of a second layer of collagen 1 day after the hepatocytes were seeded [59, 60].
Hepatocytes maintained in stable CS and in unstable confluent HM cultures served as
positive and negative controls, respectively. Hepatocyte cultures were maintained at 37 °C
in a humidified gas mixture of 90% air/10% CO₂. The culture medium was replaced every
24 h.
2.4.3 RNA Extraction and Gene Chip Hybridization
Primary rat hepatocytes in CS and HM cultures were maintained for an 8-day
culture period. The samples were analyzed at four time points: days 1, 2, 3, and 8 after
deposition of the second layer of collagen gel on hepatocytes. Total RNA was extracted and
purified from cells for each culture system using an RNeasy mini kit (Qiagen) following
the manufacturer’s protocol. Isolated RNA samples in triplicate at each time point were
labeled according to the Affymetrix Standard Target labeling process, hybridized to the
GeneChip Rat Genome 230 2.0 array (Affymetrix), and scanned as described by the
manufacturer. Complementary RNA (cRNA) synthesis, hybridization, and scanning were
performed at the Virginia Bioinformatics Institute Core Laboratory facility as follows. Briefly,
total RNA was converted into double-stranded complementary DNA using a T7-oligo
(dT) primer (5′–GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG(dT)24–3′)
and reverse transcription. Synthesized cDNA was converted into biotinylated cRNA by
transcription using T7 RNA polymerase. Randomly fragmented cRNA was hybridized to
the GeneChip arrays, and the arrays were washed and stained according to Affymetrix’s protocols.
The arrays were scanned using an Affymetrix 7G scanner.

Contrast name | Treatment | Control
Collagen sandwich vs. Monolayer cultures
CS vs. HM 1d | Collagen sandwich, 1 day | Hepatocyte monolayer, 1 day
CS vs. HM 2d | Collagen sandwich, 2 days | Hepatocyte monolayer, 2 days
CS vs. HM 3d | Collagen sandwich, 3 days | Hepatocyte monolayer, 3 days
CS vs. HM 8d | Collagen sandwich, 8 days | Hepatocyte monolayer, 8 days
Within Collagen sandwich
CS 8d vs. 1d | Collagen sandwich, 8 days | Collagen sandwich, 1 day
CS 8d vs. 2d | Collagen sandwich, 8 days | Collagen sandwich, 2 days
CS 8d vs. 3d | Collagen sandwich, 8 days | Collagen sandwich, 3 days
Table 2.1: Contrasts analyzed using GSEA.
2.4.4 Microarray Data Analysis
The BioConductor package [68] was used to perform initial statistical analysis of the DNA
microarray data. The data from 24 chips (2 culture conditions × 4 time points × 3 replicates)
were normalized using the Robust Multichip Average method for further analysis.
The affylmGUI interface to Linear Models for Microarray Data (LIMMA) [20] was used to
perform differential gene expression analysis for the contrasts shown in Table 2.1. Specif-
ically, for each contrast, LIMMA was used to compute a p-value for each probe set that
indicated the statistical significance of the difference of the expression levels of that probe
set between the two conditions in the contrast.
2.4.5 Gene Set Enrichment Analysis
The normalized gene expression data were analyzed using Gene Set Enrichment Analysis
(GSEA) [26]. Given replicate gene expression measurements for a control phenotype (e.g.,
HM at 1 day) and for a treatment phenotype (e.g., CS at 1 day), GSEA starts by ranking
all genes by the extent of their differential expression in the two phenotypes, so that the
closer a gene is to the top of the ranked list (i.e., the smaller its rank), the more
up-regulated it is in the treatment when compared to the control. Next, given a gene set of
interest (e.g., the genes involved in metabolism of xenobiotics), GSEA uses a modified
Kolmogorov-Smirnov test [69] to determine whether the genes in this set have surprisingly
high or low ranks, summarizing the ranks of genes in the gene set as an enrichment score. This
score has the following interpretation: the more positive the score, the more up-regulated
the genes in the gene set are in the treatment (compared to the control), and the more
negative the score, the more down-regulated the genes in that gene set are. Since the size
of a gene set may influence its enrichment score, GSEA controls this bias by performing
a permutation test and calculating a p-value that represents the statistical significance of
the enrichment score. Finally, GSEA converts the p-value into a q-value that measures the
false discovery rate, after adjusting for multiple hypothesis testing. Note that the q-value
is unsigned but the enrichment score is signed (positive for overall up-regulation and
negative for overall down-regulation). We applied GSEA using the following criteria:
1. Sort genes in decreasing order of the signal-to-noise measure.
2. Compute p-values using 10,000 permutations of the sample-to-phenotype associa-
tions.
3. Report all gene sets with q-value (false discovery rate) at most 0.2. Note that with
this cut-off, we expect up to one in five reported gene sets to be a false discovery.
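Criteria 1 and 3 can be sketched as follows. The signal-to-noise measure is (μ_T − μ_C)/(σ_T + σ_C), computed from the treatment and control replicates of each gene. The GSEA software additionally bounds σ from below, which this sketch does not reproduce, so a gene with identical replicates in both groups would raise a division error here. The function names and the data layout (treatment samples first) are assumptions.

```python
import statistics


def signal_to_noise(treat, ctrl):
    """(mean_T - mean_C) / (sd_T + sd_C); needs at least two replicates per group."""
    return (statistics.mean(treat) - statistics.mean(ctrl)) / (
        statistics.stdev(treat) + statistics.stdev(ctrl))


def rank_genes(expr, n_treat):
    """Criterion 1: genes in decreasing order of signal-to-noise.

    expr: gene -> expression values, with the n_treat treatment samples first.
    """
    s2n = {g: signal_to_noise(v[:n_treat], v[n_treat:]) for g, v in expr.items()}
    return sorted(s2n, key=s2n.get, reverse=True)


def reported_sets(q_values, cutoff=0.2):
    """Criterion 3: gene sets whose q-value (FDR) is at most the cut-off."""
    return sorted(name for name, q in q_values.items() if q <= cutoff)
```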
2.5 Results
LIMMA and GSEA were applied to compare the two culture conditions, as shown in Ta-
ble 2.1. The first set of four contrasts compared the hepatocyte transcriptional program in
CS cultures to that in HMs, at each of the four time-points analyzed. These contrasts were
expected to reveal time-dependent differences between these two culture conditions. The
second set of three contrasts compared CS samples to each other: 8 days to 1 day, 8 days
to 2 days, and 8 days to 3 days. Such contrasts were expected to provide information on
how transcriptional programs may vary within CS cultures over time.
p-value cutoff | CS vs. HM 1d | CS vs. HM 2d | CS vs. HM 3d | CS vs. HM 8d
10⁻⁵ | 31 | 224 | 1046 | 2242
0.0001 | 61 | 362 | 1535 | 3092
0.001 | 118 | 569 | 2277 | 4287
0.01 | 276 | 1095 | 3497 | 6185
0.05 | 552 | 1812 | 5134 | 8551
Table 2.2: The number of differentially-expressed probe sets in each of the four CS vs. HM contrasts at different p-value cutoffs.
2.5.1 The transcriptional program in CS cultures steadily and comprehensively diverges from that in HMs
For each of the first four contrasts in Table 2.1, the number of differentially-expressed
probe sets was counted after applying different cutoffs on the p-values computed by
LIMMA. The first column in Table 2.2 indicates the p-value cutoff, while each of the other
four columns shows the number of probe sets whose p-value meets the cutoff specified in
each row. An important feature revealed by these data is the monotonic divergence between
the transcriptional programs of CS and HM samples over the 8-day culture period.
For each cutoff, the number of differentially-expressed probe sets increased steadily from
day 1 to day 8. Furthermore, this trend was maintained even over a variation of four
orders of magnitude in the p-value cutoff. On day 8, as many as 6185 probe sets had a
p-value of at most 0.01 (2242 had a p-value of at most 10⁻⁵). Since the Affymetrix Rat230_2
GeneChip has 31,099 probe sets, these results suggest widespread transcriptional perturbation
in CS cultures compared to HM cultures.
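The tallies in Table 2.2 amount to counting LIMMA p-values at or below each cut-off. A minimal sketch follows (the function names are hypothetical, and p-values are assumed to be given as a plain list per contrast), together with the arithmetic behind the last sentence: 6185 of 31,099 probe sets is roughly one fifth of the array.

```python
def count_at_cutoffs(p_values, cutoffs=(1e-5, 1e-4, 1e-3, 0.01, 0.05)):
    """Number of probe sets whose p-value is at most each cut-off."""
    return {c: sum(p <= c for p in p_values) for c in cutoffs}


def percent_of_array(n_probe_sets, total=31099):
    """Share of the GeneChip's 31,099 probe sets, in percent."""
    return 100 * n_probe_sets / total


# Day 8 contrast at p <= 0.01 (Table 2.2)
share = percent_of_array(6185)
```

Because each cut-off includes all stricter ones, the counts in every column can only grow down the table, which is why the monotonic trend is read across columns rather than rows.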
Upon the identification of the global trends, GSEA was employed to study patterns of
differential expression in specific gene sets. GSEA was applied to the gene expression
data obtained through our experiments and to the following gene sets in the Molecular
Signatures Database (MSigDB): 1892 curated gene sets from various sources such as online
pathway databases, publications in PubMed, and knowledge of domain experts; 837
motif gene sets containing genes that share a cis-regulatory motif that is conserved across
the human, mouse, rat, and dog genomes; and 1454 gene sets corresponding to genes
annotated by different Gene Ontology (GO) terms. For each contrast in Table 2.1 and for
each of these gene sets, GSEA was used to compute a q-value.
Gene sets were filtered to retain those that exhibited monotonic up-regulation in the CS-HM
comparison. Specifically, gene sets were restricted to those whose q-values decreased
monotonically from 1 day to 8 days and whose enrichment scores were positive in all
four CS-HM contrasts (the first four contrasts in Table 2.1). Gene sets that were monoton-
ically down-regulated in collagen sandwiches over the 8 day period were also identified
(q-values decreasing monotonically and negative enrichment scores in all four CS-HM
contrasts). Since MSigDB collates data from several sources, many gene sets in it have
high degrees of overlap. When an overlapping group of gene sets is up-regulated or
down-regulated in our data, only one gene set per group is discussed below. The com-
plete sets of results are available on our supplementary web page.
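The monotonicity filter can be sketched as below. We read "decreased monotonically" as non-increasing (ties allowed), which is an assumption; the input layout, one (q-value, enrichment score) pair per CS vs. HM contrast in the order 1d, 2d, 3d, 8d, and the function name are hypothetical.

```python
def monotonic_sets(results):
    """Split gene sets into monotonically up- and down-regulated groups.

    results: gene set -> [(q_value, enrichment_score), ...] for the CS vs. HM
    contrasts ordered 1d, 2d, 3d, 8d.
    """
    up, down = [], []
    for name, rows in results.items():
        qs = [q for q, _ in rows]
        es = [e for _, e in rows]
        non_increasing = all(a >= b for a, b in zip(qs, qs[1:]))
        if non_increasing and all(e > 0 for e in es):
            up.append(name)
        elif non_increasing and all(e < 0 for e in es):
            down.append(name)
    return sorted(up), sorted(down)
```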
2.5.2 Liver-specific gene sets are up-regulated starting on day 1 or day 2 in CS cultures
[Figure 2.2: liver-specific gene sets colored by q-value. Color legend for up-regulated q-values: 1, 0.2, 0.05, 0.01, 0.001, 0.0001.]
Figure 2.2: Liver-specific up-regulated gene sets. The legend below shows the q-value
ranges for each color. The color scheme used in these figures is RdYlGn from Color
Brewer (http://colorbrewer2.org). CS, collagen sandwich; HM, hepatocyte monolayer.
Hsiao et al. [70] created a compendium of gene expression in normal human tissues with the
goal of defining a reference for basic organ systems biology. They identified 251 genes
expressed selectively in the liver, which are included in MSigDB in the
HSIAO_LIVER_SPECIFIC_GENES gene set. In a similar study, Su et al. [71] profiled gene
expression from 91 human and mouse samples across a diverse array of tissues, organs, and
cell lines. They identified 37 genes that were expressed specifically in human liver tissue
samples; these genes belong to the HUMAN_TISSUE_LIVER gene set. The gene sets
HSIAO_LIVER_SPECIFIC_GENES and LIVER_SPECIFIC_GENES were up-regulated significantly at day
1 and day 2, respectively. They were monotonically up-regulated on subsequent days.
Both these gene sets had insignificant q-values in the CS-CS contrasts, suggesting that
liver-specific genes are up-regulated on day 1 in CS cultures, and that they continue to
be monotonically up-regulated on subsequent days (Figure 2.2). The presence and concentration
of albumin is often used as a marker of phenotypic function of in vitro hepatic
models [59, 60]. The albumin gene (ALB) is a member of several gene sets, such
as HSIAO_LIVER_SPECIFIC_GENES and V$HNF1_Q6, that were up-regulated over the 8-day
culture period (Figure 2.2). The promoter regions of genes in the set V$HNF1_Q6 contain
binding sites for hepatocyte nuclear factor 1 (HNF1), a transcription factor that activates
gene expression of albumin [72, 73]. This gene set has an overlap of 25 genes with the gene
set HSIAO_LIVER_SPECIFIC_GENES, suggesting that HNF1 monotonically up-regulates
the expression of albumin and other liver-specific genes in CS cultures but not in HMs.
These observations support the conclusion that transcriptional programs that have been
identified in other datasets to be liver-specific are active through the 8-day period in CS
cultures but are not active in HMs.
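The reasoning used above combines two kinds of evidence: a gene set that becomes significantly up-regulated in the CS-versus-HM contrast and stays significant, while showing insignificant q-values in the CS-CS contrasts, is read as switched on early and then maintained. The sketch below expresses that reading as a small decision rule. It is an illustration only: the 0.05 threshold, the function name `describe_trend`, and the input layout are assumptions, not the procedure used to produce Figure 2.2.

```python
from typing import Dict, Optional

# Illustrative significance threshold (an assumption; not stated in the chapter).
ALPHA = 0.05


def describe_trend(cs_vs_hm: Dict[int, float],
                   cs_vs_cs: Dict[str, float],
                   alpha: float = ALPHA) -> str:
    """Summarize a gene set's temporal pattern from up-regulation q-values.

    cs_vs_hm: culture day -> q-value for up-regulation in CS vs. HM.
    cs_vs_cs: contrast label (e.g. "8d-1d") -> q-value within CS cultures.
    """
    days = sorted(cs_vs_hm)
    # Earliest day on which CS-vs-HM up-regulation is significant.
    onset: Optional[int] = next((d for d in days if cs_vs_hm[d] <= alpha), None)
    if onset is None:
        return "not significantly up-regulated"
    # Significance must persist on every later day.
    sustained = all(cs_vs_hm[d] <= alpha for d in days if d >= onset)
    # Insignificant CS-CS contrasts: no further change within CS cultures.
    stable_within_cs = all(q > alpha for q in cs_vs_cs.values())
    if sustained and stable_within_cs:
        return f"up-regulated from day {onset} and maintained"
    if sustained:
        return f"up-regulated from day {onset}, still changing within CS"
    return f"up-regulated at day {onset} but not sustained"


# Hypothetical q-values for a liver-specific gene set (not data from this study).
print(describe_trend({1: 0.01, 2: 0.001, 3: 0.001, 8: 1e-4},
                     {"8d-1d": 0.6, "8d-2d": 0.9, "8d-3d": 1.0}))
```

The rule only checks whether significance starts and persists; it does not test whether the size of the up-regulation grows monotonically over time.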
2.5.3 Gene sets involved in cholesterol, fatty-acid, alcohol, and carbohydrate metabolism are significantly up-regulated starting on day 1 or day 2 in CS cultures
Cholesterol metabolism
CS vs. HM q-values at 1d, 2d, 3d, and 8d (shown as colors)
Cholesterol metabolism
  HSA00120_BILE_ACID_BIOSYNTHESIS (KEGG): bile acid synthesis genes
  MONOOXYGENASE_ACTIVITY (GO MF): integration of one oxygen atom into a compound
  CELLULAR_LIPID_METABOLIC_PROCESS (GO BP): lipid reactions and pathways
  CARBOXYLIC_ACID_TRANSM_TRANSP (GO MF): transfer of carboxylic acid across a membrane
  LIPID_TRANSPORT (GO BP): transport into, out of, or between cells
  NUCLEAR_RECEPTORS (GenMAPP): nuclear receptor genes
Fatty-acid metabolism
  HSA00071_FATTY_ACID_METABOLISM (KEGG): fatty acid metabolism pathways
  MITOCHONDRIAL_FATTY_ACID_BETA (GenMAPP): fatty acid oxidation in mitochondria
  PEROXISOME (GO CC): associated with peroxisome
  HSA03320_PPAR_SIGNALING_PATHWAY (KEGG): PPAR signaling pathway
Alcohol metabolism
  ALCOHOL_METABOLIC_PROCESS (GO BP): reactions and pathways involving alcohols
  HSA00071_FATTY_ACID_METABOLISM (KEGG): fatty acid metabolism pathways
  HSA00980_METABOLISM_OF_XENOBIOTICS: metabolism of xenobiotics by cytochrome P450
Carbohydrate metabolism
  GLUCOSE_METABOLIC_PROCESS (GO BP): pathways involving glucose
  HSA00010_GLYCOLYSIS_AND_GLUCON (KEGG): glycolysis and gluconeogenesis pathways
Color legend for up-regulated q-values: 1, 0.2, 0.05, 0.01, 0.001, 0.0001
Figure 2.3: Up-regulated gene sets involved in cholesterol, fatty-acid, alcohol, and carbohydrate metabolism. The legend below shows the q-value ranges for each color. (Abbreviations: CARBOXYLIC_ACID_TRANSM_TRANSP for CARBOXYLIC_ACID_TRANSMEMBRANE_TRANSPORTER_ACTIVITY; MITOCHONDRIAL_FATTY_ACID_BETA for MITOCHONDRIAL_FATTY_ACID_BETAOXIDATION; HSA00980_METABOLISM_OF_XENOBIOTICS for HSA00980_METABOLISM_OF_XENOBIOTICS_BY_CYTOCHROME_P450; HSA00010_GLYCOLYSIS_AND_GLUCON for HSA00010_GLYCOLYSIS_AND_GLUCONEOGENESIS.)
[Figure 2.4 diagram: acetyl-coenzyme A → cholesterol; cholesterol (LXR ligand) and 9-cis retinoic acid (RXR ligand) act on the LXR/RXR nuclear receptors, which induce CYP7A1, CYP8B1, and CYP27A1; these produce bile acids (cholic acid, CA; chenodeoxycholic acid, CDCA); bile acids (FXR ligands) act on the FXR/RXR nuclear receptors, linked to ABCB11 and the small heterodimer partner (SHP); arrows denote activation or inhibition. Annotated gene sets: CELLULAR_LIPID_METABOLIC_PROCESS, NUCLEAR_RECEPTORS, MONOOXYGENASE_ACTIVITY, HSA00120_BILE_ACID_BIOSYNTHESIS, CARBOXYLIC_ACID_TRANSMEMBRANE_TRANSPORTER_ACTIVITY.]
Figure 2.4: Pathway for cholesterol metabolism that shows gene sets involved in this process.
Cholesterol metabolism is an important component of hepatic phenotypic function [48]. We investigated the trends exhibited by gene sets linked to cholesterol metabolism in our data. Multiple gene sets involved in cholesterol metabolism were up-regulated in
CS cultures compared to HMs. These gene sets include HSA00120_BILE_ACID_BIOSYNTHESIS, MONOOXYGENASE_ACTIVITY, CELLULAR_LIPID_METABOLIC_PROCESS, NUCLEAR_RECEPTORS, and CARBOXYLIC_ACID_TRANSMEMBRANE_TRANSPORTER_ACTIVITY (Figure 2.3). Bile acids mediate cholesterol metabolism, and bile acid synthesis is initiated by the CYP7A1, CYP8B1, and CYP27A1 enzymes [74–76] (Figure 2.4). Each of these three CYP enzymes belongs to either HSA00120_BILE_ACID_BIOSYNTHESIS or MONOOXYGENASE_ACTIVITY, both of which are up-regulated in CS cultures. Expression of these CYP genes is activated by nuclear receptors, specifically the retinoid X receptor (RXR) and the liver X receptor (LXR) [77–80]. The gene set NUCLEAR_RECEPTORS, which contains nuclear receptors involved in activating hepatic functions, behaved similarly to the liver-specific gene sets discussed earlier: it had insignificant q-values in the CS-CS contrasts, suggesting that nuclear receptors were up-regulated on day 1 in CS cultures and remained up-regulated on subsequent days. Another nuclear receptor, the farnesoid X receptor (FXR), plays a critical role in liver function by regulating the concentration of bile acids [74–78, 81, 82]. Bile acid-mediated activation of FXR leads to the transcriptional activation of the ATP-binding cassette transporter B11 (ABCB11, also known as the bile salt export pump), a process that is crucial for secreting bile salts into the bile canaliculi [80, 81]. In CS samples, ABCB11, a member of CARBOXYLIC_ACID_TRANSMEMBRANE_TRANSPORTER_ACTIVITY, was up-regulated over the culture period in comparison to HMs. This gene set contains genes annotated with the Gene Ontology molecular function describing catalysis of the transfer of carboxylic acids from one side of a membrane to the other. These trends suggest that genes responsible for the formation, transformation, and transport of bile acids are up-regulated in CS cultures, thereby promoting cholesterol metabolism.
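The regulatory relationships in Figure 2.4 can be treated as a signed directed graph, where the sign of a path (the product of its edge signs) says whether an upstream change should raise or lower a downstream one. The sketch below encodes the nodes named in the text and figure; the edge list and the helper `signed_paths` are illustrative, and the inhibitory SHP ⊣ CYP7A1 edge is an assumption about where the figure's inhibition arrow points.

```python
from typing import Dict, List, Tuple

# Signed regulatory edges: +1 = activation, -1 = inhibition.
# Nodes follow the text and Figure 2.4. The SHP -| CYP7A1 edge is an
# assumption about where the figure's inhibition arrow points.
EDGES: Dict[str, List[Tuple[str, int]]] = {
    "cholesterol": [("LXR/RXR", +1)],  # cholesterol acts as an LXR ligand
    "LXR/RXR": [("CYP7A1", +1), ("CYP8B1", +1), ("CYP27A1", +1)],
    "CYP7A1": [("bile acids", +1)],
    "CYP8B1": [("bile acids", +1)],
    "CYP27A1": [("bile acids", +1)],
    "bile acids": [("FXR/RXR", +1)],  # bile acids act as FXR ligands
    "FXR/RXR": [("ABCB11", +1), ("SHP", +1)],
    "SHP": [("CYP7A1", -1)],
}


def signed_paths(src: str, dst: str) -> List[Tuple[List[str], int]]:
    """Enumerate simple paths src -> dst with the product of their edge signs."""
    found: List[Tuple[List[str], int]] = []

    def dfs(node: str, path: List[str], sign: int) -> None:
        if node == dst:
            found.append((path, sign))
            return
        for nxt, s in EDGES.get(node, []):
            if nxt not in path:  # never revisit a node: simple paths only
                dfs(nxt, path + [nxt], sign * s)

    dfs(src, [src], +1)
    return found


for path, sign in signed_paths("bile acids", "CYP7A1"):
    print(" -> ".join(path), "net sign:", sign)
```

Under these assumptions, the only path from bile acids back to CYP7A1 has sign −1, which is how a signed graph represents bile acids feeding back negatively on their own synthesis, while every path from cholesterol to ABCB11 has sign +1.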
Fatty Acid Metabolism (PPARα-mediated metabolism)
Peroxisome proliferator-activated receptor α (PPARα) is a nuclear receptor that activates the expression of enzymes linked to fatty acid metabolism [83–86]. PPARα-mediated fatty acid metabolism initiates the transcriptional activation of liver fatty acid-binding protein (L-FABP or FABP1), which delivers fatty acids to its cognate nuclear receptor, PPARα, and promotes the expression of two transporters, ABCD2 and ABCD3, that are necessary to transport fatty acids into peroxisomes, where target enzymes catalyze the clearance of fatty acids [74, 83, 85, 86]. PPARα, being dependent on intracellular FABP concentrations, regulates the expression of acyl-CoA oxidases (ACOXs), short/branched-, long-, and very-long-chain acyl-CoA dehydrogenases (ACADs), and mitochondrial enzymes involved in β-oxidation [87–91]. In our data, the gene sets involved in PPARα-mediated fatty acid metabolism are HSA00071_FATTY_ACID_METABOLISM, PEROXISOME, HSA03320_PPAR_SIGNALING_PATHWAY, and MITOCHONDRIAL_FATTY_ACID_BETAOXIDATION (Figure 2.3). All of these gene sets were monotonically up-regulated in CS samples over the 8-day period in contrast to HMs. The gene FABP1, which belongs to HSA03320_PPAR_SIGNALING_PATHWAY, was up-regulated in CS cultures, and its expression increased over time. PPARα is expressed in response to FABP1 expression [83]. The PPARα signaling pathway promotes the transcriptional activation of fatty acid metabolic enzymes [ACOX (acyl-CoA oxidase), ACAD (acyl-CoA dehydrogenase), CAT (carnitine palmitoyltransferase), LPL (lipoprotein lipase), and ACAT (acetyl-CoA acetyltransferase)] [83, 87–89]. These genes are members of the HSA00071_FATTY_ACID_METABOLISM, PEROXISOME, and MITOCHONDRIAL_FATTY_ACID_BETAOXIDATION gene sets. Together, the expression of key enzymes responsible for fatty acid metabolism and of two members of the ABC transporter family (ABCD2 and ABCD3) indicates that PPARα-mediated metabolism was up-regulated in CS samples.
Alcohol Metabolism
Alcohol, specifically ethanol, is metabolized in the liver by several enzymes present in the subcellular compartments of hepatocytes. Alcohol dehydrogenase (ADH), a key cytoplasmic enzyme, converts ethanol to acetaldehyde [92–94]. Acetaldehyde, a toxic molecule, is subsequently converted to nontoxic acetate by mitochondrial acetaldehyde dehydrogenase (ALDH) [92, 94]. Additionally, CYP2E1 enables the clearance of ethanol through an oxidative reaction [94–96] (Figure 2.5). The gene sets ALCOHOL_METABOLIC_PROCESS, HSA00980_METABOLISM_OF_XENOBIOTICS_BY_CYTOCHROME_P450, and HSA00071_FATTY_ACID_METABOLISM were up-regulated over time in CS cultures in comparison to HMs (Figure 2.3). These gene sets include three alcohol dehydrogenase genes, ADH, ADH1A, and ADH7, that mediate the transformation of alcohol.
[Figure 2.5 diagram: ethanol → acetaldehyde via alcohol dehydrogenase (NAD+ → NADH) or cytochrome P450 2E1; acetaldehyde → acetate via aldehyde dehydrogenase (NAD+ → NADH); acetate → CO2 + H2O. Annotated gene sets: HSIAO_LIVER_SPECIFIC_GENES, HSA00980_METABOLISM_OF_XENOBIOTICS_BY_CYTOCHROME_P450, HSA00071_FATTY_ACID_METABOLISM, ALCOHOL_METABOLIC_PROCESS. NAD+, nicotinamide adenine dinucleotide; NADH, nicotinamide adenine dinucleotide (reduced form).]
Figure 2.5: Pathway for alcohol metabolism that shows gene sets involved in this process.
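The two routes from ethanol to acetate described above can be encoded as a small reaction table that checks each enzyme acts on the previous step's product and tallies the cofactors consumed or produced. This is a sketch: the table layout and `run_route` are illustrative, and the cofactor stoichiometry is simplified textbook biochemistry (protons and water omitted), not data from this study.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Simplified textbook stoichiometry (protons and water omitted); not data from
# this study. Each enzyme maps one substrate to one product.
STEPS: Dict[str, Dict] = {
    "ADH": {"from": "ethanol", "to": "acetaldehyde",
            "cofactors": {"NAD+": -1, "NADH": +1}},
    "CYP2E1": {"from": "ethanol", "to": "acetaldehyde",
               "cofactors": {"NADPH": -1, "NADP+": +1, "O2": -1}},
    "ALDH": {"from": "acetaldehyde", "to": "acetate",
             "cofactors": {"NAD+": -1, "NADH": +1}},
}


def run_route(enzymes: List[str], start: str = "ethanol") -> Tuple[Counter, str]:
    """Walk a route, check that each enzyme acts on the previous product,
    and return (net cofactor change, final product)."""
    current = start
    net: Counter = Counter()
    for enzyme in enzymes:
        step = STEPS[enzyme]
        if step["from"] != current:
            raise ValueError(f"{enzyme} acts on {step['from']}, not {current}")
        net.update(step["cofactors"])  # update() adds counts, negatives included
        current = step["to"]
    return net, current


net, product = run_route(["ADH", "ALDH"])
print(product, dict(net))
```

The ADH → ALDH route yields two NADH per ethanol, whereas the CYP2E1 → ALDH route consumes NADPH and O2 in its first step.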
Carbohydrate Metabolism
Gluconeogenesis and glycolysis are essential to maintain glucose homeostasis [97–100]. The maintenance of a healthy glucose level depends on the presence and concentration of insulin and glucagon [97–99]. In our gene expression data, genes encoding key enzymes involved in the formation and metabolism of glucose are up-regulated in CS cultures in contrast to HMs. The relevant gene sets include HSA00010_GLYCOLYSIS_AND_GLUCONEOGENESIS and GLUCOSE_METABOLIC_PROCESS (Figure 2.3). Specifically, the genes encoding glycolytic enzymes are hexokinase (HK), glucose-6-phosphate isomerase (also called phosphoglucose isomerase; GPI or PGI), phosphofructokinase (PFK), aldolase (ALDOB and ALDOC), triosephosphate isomerase (TPI1), phosphoglycerate kinase 1 (PGK1), phosphoglycerate mutase (PGAM), pyruvate kinase (PKLR), and, at the terminal step, lactate dehydrogenase C (LDH) [100]. PKLR catalyzes the transfer of a phosphate group from phosphoenolpyruvate to ADP, yielding pyruvate and ATP; this is one of the irreversible, regulated steps of glycolysis. LDH catalyzes the terminal step of anaerobic glycolysis, the reduction of pyruvate to lactate. The genes encoding gluconeogenic enzymes are phosphoenolpyruvate carboxykinase 1 (PCK1 or PEPCK), glucose-6-phosphatase (G6PC), pyruvate carboxylase (PC), and fructose-1,6-bisphosphatase 1 (FBP1) [99–101]. In the gene sets related to carbohydrate metabolism, glycolytic genes such as ALDOB, ALDOC, PKLR, PFK, PGK1, and GPI and gluconeogenic genes such as G6PC, FBP1, PCK1, and PC were up-regulated in CS samples.
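The gluconeogenic genes listed above are the ones that bypass the irreversible steps of glycolysis: G6PC reverses the hexokinase step, FBP1 reverses the phosphofructokinase step, and PC together with PCK1 reverses the pyruvate kinase step. The sketch below records that textbook mapping and checks it against the gene symbols the text reports as up-regulated; the dictionary and the helper `bypass_coverage` are illustrative, and "HK" and "PFK" stand for enzyme families rather than single genes.

```python
from typing import Dict, Iterable

# Irreversible glycolytic steps and the gluconeogenic enzymes that bypass them
# (textbook biochemistry). Gene symbols follow the text; "HK" and "PFK" stand
# for enzyme families rather than single genes.
BYPASSES: Dict[str, Dict] = {
    "HK": {"reaction": "glucose -> glucose-6-phosphate",
           "bypass": ["G6PC"]},
    "PFK": {"reaction": "fructose-6-phosphate -> fructose-1,6-bisphosphate",
            "bypass": ["FBP1"]},
    "PKLR": {"reaction": "phosphoenolpyruvate -> pyruvate",
             "bypass": ["PC", "PCK1"]},  # pyruvate -> oxaloacetate -> PEP
}


def bypass_coverage(up_genes: Iterable[str]) -> Dict[str, bool]:
    """True for a step when its glycolytic enzyme and every bypass enzyme
    appear among the up-regulated genes."""
    up = set(up_genes)
    return {step: step in up and all(g in up for g in info["bypass"])
            for step, info in BYPASSES.items()}


# Gene symbols the text reports as up-regulated in CS samples.
reported_up = ["ALDOB", "ALDOC", "PKLR", "PFK", "PGK1", "GPI",
               "G6PC", "FBP1", "PCK1", "PC"]
print(bypass_coverage(reported_up))
```

With the genes named in the text, the PFK/FBP1 and PKLR/PC/PCK1 pairs are complete; the HK/G6PC pair is not, only because HK is absent from the list of genes reported as up-regulated.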
Urea Production
Color legend for up-regulated q-values: 1, 0.2, 0.05, 0.01, 0.001, 0.0001
Figure 2.6: Up-regulated gene sets involved in urea production. The legend below shows the q-value ranges for each color. Abbreviations: GO, Gene Ontology; BP, biological process; HSA00220_UREA_CYCLE for HSA00220_UREA_CYCLE_AND_METABOLISM_OF_AMINO_GROUPS; KEGG, Kyoto Encyclopedia of Genes and Genomes; GenMAPP, Gene Map Annotator and Pathway Profiler.
[Figure 2.7 diagram: glutaminase converts glutamine to glutamate, and glutamate dehydrogenase releases NH4+; carbamoyl phosphate synthase-1 forms carbamoyl phosphate, which enters the urea cycle (ornithine → citrulline → argininosuccinate → arginine → urea + ornithine) through ornithine transcarbamoylase, argininosuccinate synthase (with aspartate), argininosuccinate lyase, and arginase. Annotated gene sets: HSA00220_UREA_CYCLE_AND_METABOLISM_OF_AMINO_GROUPS, HSA00251_GLUTAMATE_METABOLISM, NITROGEN_COMPOUND_CATABOLIC_PROCESS, NITROGEN_COMPOUND_METABOLIC_PROCESS.]
Figure 2.7: Pathway for urea production that shows gene sets involved in this process.
In the liver, the formation of urea is a critical step in ammonia clearance. The metabolism of amino acids results in the formation of urea through the conversion of glutamate, an intermediate metabolite [48]. Gene sets involved in glutamate metabolism, such as HSA00251_GLUTAMATE_METABOLISM, are gradually up-regulated over time in CS cultures (Figure 2.6). Urea is formed through the action of five enzymes: carbamoyl phosphate synthetase-1 (CPS-1), ornithine transcarbamoylase (OTC), argininosuccinate synthase (ASS), argininosuccinate lyase (ASL), and arginase (ARG) [102–107] (Figure 2.7).
These five genes are present in the gene set HSA00220_UREA_CYCLE_AND_METABOLISM_OF_AMINO_GROUPS, which is up-regulated in CS cultures over the 8-day period. In addition, the gene sets NITROGEN_COMPOUND_CATABOLIC_PROCESS and NITROGEN_COMPOUND_METABOLIC_PROCESS include genes such as ASL, ARG, and ASS, and both are also monotonically up-regulated in CS cultures. The nuclear receptor HNF-4α (a member of the up-regulated gene set NUCLEAR_RECEPTORS) plays an important role in triggering the transcription of key enzymes for urea production [102]. Together, these data help explain why urea production is stable in CS cultures but not in HMs.
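The five urea-cycle reactions can be checked for mass balance by counting each metabolite as consumed (−1) or produced (+1): intermediates that are both made and used cancel, leaving only the net inputs and outputs. The sketch below uses the enzyme abbreviations from the text; the reaction list is simplified textbook biochemistry (ATP, AMP, phosphate, and water omitted) and `net_balance` is an illustrative helper.

```python
from collections import Counter
from typing import List, Set, Tuple

# Urea cycle as (enzyme, substrates, products), using the abbreviations in the
# text. Simplified: ATP, AMP, phosphate, and water are omitted.
REACTIONS: List[Tuple[str, Set[str], Set[str]]] = [
    ("CPS-1", {"NH4+", "HCO3-"}, {"carbamoyl phosphate"}),
    ("OTC", {"carbamoyl phosphate", "ornithine"}, {"citrulline"}),
    ("ASS", {"citrulline", "aspartate"}, {"argininosuccinate"}),
    ("ASL", {"argininosuccinate"}, {"arginine", "fumarate"}),
    ("ARG", {"arginine"}, {"urea", "ornithine"}),
]


def net_balance(reactions) -> Tuple[List[str], List[str]]:
    """Return (net inputs, net outputs); intermediates that are both made and
    used, such as ornithine, cancel out."""
    balance: Counter = Counter()
    for _enzyme, substrates, products in reactions:
        for s in substrates:
            balance[s] -= 1
        for p in products:
            balance[p] += 1
    inputs = sorted(m for m, n in balance.items() if n < 0)
    outputs = sorted(m for m, n in balance.items() if n > 0)
    return inputs, outputs


print(net_balance(REACTIONS))
```

Ornithine consumed by OTC is regenerated by ARG, so the five steps close into a cycle, and the two nitrogen atoms of urea enter as NH4+ (through CPS-1) and as aspartate (through ASS).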
2.5.4 Mono-oxygenases are initially not differentially expressed but recover after day 3 in CS cultures
CS vs. HM q-values at 1d, 2d, 3d, and 8d
MONOOXYGENASE_ACTIVITY (GO MF), integration of one oxygen atom into a compound: 0.03, 0.01, 6e-03, 7e-05
HSA00980_METABOLISM_OF_XENOBIOTICS, metabolism of xenobiotics by cytochrome P450: 1.00, 0.20, 2e-03, 1e-03
Color legend for up-regulated q-values: 1, 0.2, 0.05, 0.01, 0.001, 0.0001
Figure 2.8: Gene sets that show recovery after day 3. We show the q-values to underscore the recovery. MF, molecular function.
Xenobiotic metabolism in the liver is mediated by cytochrome P450 enzymes [108–111]. Expression of these enzymes has been shown to decrease upon the isolation of hepatocytes from the liver [110, 111]. The gene set MONOOXYGENASE_ACTIVITY contains several cytochrome P450 genes and flavin-containing monooxygenases. This gene set had q-values of 0.03, 0.01, 6 × 10⁻³, and 7 × 10⁻⁵ at days 1, 2, 3, and 8, respectively (Figure 2.8). The q-values at days 1 and 2 were of similar magnitude; by day 3 the q-value was five-fold smaller than at day 1, and by day 8 it had fallen by nearly two further orders of magnitude. We examined these trends further by computing q-values for this gene set in the CS-CS contrasts. The gene set was up-regulated with q-value 0.13 for the day 8-day 1 contrast, 0.11 for the day 8-day 2 contrast, and 1.00 for the day 8-day 3 contrast. We also computed the corresponding contrasts among the HM samples; in all three, the gene set was down-regulated, with q-values of 0.13, 0.06, and 0.13, respectively. Thus, the variation in expression of this gene set arises from a combination of up-regulation in CS cultures and down-regulation in HM cultures. Taken together, these trends indicate that the genes in this set recovered, or became up-regulated, at day 3 and later within CS cultures in comparison to HM cultures. This trend is important because the set includes genes encoding CYP3A, CYP4A, CYP1A, and CYP2C enzymes. CYP3A and CYP4A enzymes metabolize a wide range of pharmaceuticals and drugs, and CYP2C and CYP1A enzymes break down toxins and xenobiotics. The gene set HSA00980_METABOLISM_OF_XENOBIOTICS_BY_CYTOCHROME_P450, which contains cytochrome P450s and phase II metabolizing enzymes such as UDP-glucuronosyltransferase (UDP-GT) isoforms and glutathione S-transferase (GST), exhibited a similar trend. It showed no significant regulation at day 1 and had a q-value of 0.20 at day 2, but its q-values at days 3 and 8 were about two orders of magnitude smaller (2 × 10⁻³ and 1 × 10⁻³, respectively). In the three CS-CS contrasts, the q-values for this gene set were 0.12, 0.05, and 0.13 (all for up-regulation), while in the HM-HM contrasts they were 0.1, 0.12, and 1 (all for down-regulation). In a previous study, the expression of a single CYP enzyme, CYP1A1, was monitored and shown to “recover” on day 3 [112]. However, that study did not examine additional time points or other CYP enzymes. The results in this work point to a more widespread recovery phenomenon across the CYP gene family.
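The order-of-magnitude statements in this section can be checked directly against Figure 2.8 by comparing q-values on a log10 scale, where a drop of 1.0 between two days is one order of magnitude. The sketch below only re-reads the figure's numbers; the dictionary layout and the function name `log10_drops` are illustrative.

```python
import math
from typing import Dict

# CS-vs-HM q-values reported in Figure 2.8 (days 1, 2, 3, 8).
Q: Dict[str, Dict[int, float]] = {
    "MONOOXYGENASE_ACTIVITY": {1: 0.03, 2: 0.01, 3: 6e-3, 8: 7e-5},
    "HSA00980_METABOLISM_OF_XENOBIOTICS": {1: 1.00, 2: 0.20, 3: 2e-3, 8: 1e-3},
}


def log10_drops(qvals: Dict[int, float]) -> Dict[str, float]:
    """Orders of magnitude by which the q-value falls between consecutive days."""
    days = sorted(qvals)
    return {f"{a}d->{b}d": round(math.log10(qvals[a] / qvals[b]), 2)
            for a, b in zip(days, days[1:])}


for name, qvals in Q.items():
    print(name, log10_drops(qvals))
```

On these numbers, MONOOXYGENASE_ACTIVITY falls by less than one order of magnitude between days 1 and 3 and by almost two between days 3 and 8, while HSA00980 falls by two orders of magnitude between days 2 and 3.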
CS vs. HM q-values at 1d, 2d, 3d, and 8d (shown as colors)
Cell cycle
  MITOTIC_CELL_CYCLE (GO BP): participation in eukaryotic cell cycle events
  CELL_CYCLE_KEGG (KEGG): cell cycle pathway
Nuclear transport
  NUCLEOCYTOPLASMIC_TRANSPORT (GO BP): movement of molecules between nucleus and cytoplasm
  PROTEIN_IMPORT_INTO_NUCLEUS (GO BP): protein transport into nucleus
Cell replication
  MICROTUBULE_CYTOSKELETON (GO CC): microtubules of the cytoskeleton
  MICROTUBULE_ORGANIZING_CENTER (GO CC): region of microtubule growth
  SPINDLE (GO CC): microtubule array for segregating duplicated chromosomes
  CENTROSOME (GO CC): centriole and spindles
Color legend for down-regulated q-values: 1, 0.2, 0.05, 0.01, 0.001, 0.0001
Figure 2.9: Down-regulated gene sets. The legend below the table shows the q-value ranges for each color. CC, cellular component.
2.5.5 Cell-cycle activity decreases significantly in CS cultures
Analysis of the down-regulated gene sets provided insights into the cellular response within the CS and HM systems. Our results suggest a significant difference in cell cycle activity between the HM and CS samples over the 8-day culture period. The monotonic down-regulation of the gene sets MITOTIC_CELL_CYCLE and CELL_CYCLE_KEGG, coupled with insignificant q-values in the CS-versus-CS comparisons (data not shown), suggests decreasing cell cycle activity within the CS cultures (Figure 2.9). Nuclear transport and import functions show decreased activity within CS samples, as indicated by the monotonically down-regulated gene sets NUCLEOCYTOPLASMIC_TRANSPORT a