  • Discovering contextual connections between biological

    processes using high-throughput data

    Christopher D. Lasher

    Dissertation submitted to the Faculty of the

    Virginia Polytechnic Institute and State University

    in partial fulfillment of the requirements for the degree of

    Doctor of Philosophy

    in

    Genetics, Bioinformatics, and Computational Biology

    T. M. Murali, Co-Chair

    Padmavathy Rajagopalan, Co-Chair

    Richard F. Helm

    Madhav V. Marathe

    Naren Ramakrishnan

    September 12, 2011

    Blacksburg, Virginia

    Keywords: computational systems biology, liver, Markov chain Monte Carlo, molecular

    interactions, gene expression

    Copyright 2011, Christopher D. Lasher

  • Discovering contextual connections between biological processes using high-throughput data

    Christopher D. Lasher

    ABSTRACT

    Hearkening to calls from life scientists for aid in interpreting rapidly-growing repositories of data, the fields of bioinformatics and computational systems biology continue to bear increasingly sophisticated methods capable of summarizing and distilling pertinent phenomena captured by high-throughput experiments. Techniques in analysis of genome-wide gene expression (e.g., microarray) data, for example, have moved beyond simply detecting individual genes perturbed in treatment-control experiments to reporting the collective perturbation of biologically-related collections of genes, or “processes”. Recent expression analysis methods have focused on improving comprehensibility of results by reporting concise, non-redundant sets of processes by leveraging statistical modeling techniques such as Bayesian networks.

    Simultaneously, integrating gene expression measurements with gene interaction networks has led to computation of response networks: subgraphs of interaction networks in which genes exhibit strong collective perturbation or co-expression. Methods that integrate process annotations of genes with interaction networks identify high-level connections between biological processes, themselves. To identify context-specific changes in these inter-process connections, however, techniques beyond process-based expression analysis, which reports only perturbed processes and not their relationships; response networks, composed of interactions between genes rather than processes; and existing techniques in process connection detection, which do not incorporate specific biological context, proved necessary.

    We present two novel methods which take inspiration from the latest techniques in process-based gene expression analysis, computation of response networks, and computation of inter-process connections. We motivate the need for detecting inter-process connections by identifying a collection of processes exhibiting significant differences in collective expression in two liver tissue culture systems widely used in toxicological and pharmaceutical assays. Next, we identify perturbed connections between these processes via a novel method that integrates gene expression, interaction, and annotation data. Finally, we present another novel method that computes non-redundant sets of perturbed inter-process connections, and apply it to several additional liver-related data sets. These applications demonstrate the ability of our methods to capture and report biologically relevant high-level trends.

    This work was supported by NSF CBET #0933225, the ICTAS ISBET at Virginia Tech, and the GBCB Interdisciplinary Ph.D. Program of Virginia Tech.

  • Acknowledgments

    The author thanks the following people and organizations for their contributions to this dissertation: Prof. T.M. Murali, Prof. Padma Rajagopalan, Prof. Rich Helm, Prof. Madhav Marathe, and Prof. Naren Ramakrishnan for their continued guidance through the course of the author’s graduate research; the past and present members of the Rajagopalan group, especially Dr. Yeonhee Kim, Dr. Christopher Detzel, and Mr. Adam Larkin for their tremendous effort in obtaining gene expression data from hepatocyte cultures used in this work, and for their helpful discussions and collegial support; the past and present members of the Murali group, including Dr. Arjun Krishnan, Mr. Christopher Poirel, Mr. Yared Kidane, Mr. Naveed Massjouni, Ms. Danielle Choi and Mr. Ahsanur Raman for helpful discussions and collegial support, and Mr. Phillip J. Whisenhunt for assistance in creating plots for Chapter 4; the staff at the Virginia Bioinformatics Institute (VBI) Core Laboratory facility for their assistance with obtaining gene expression measurements for the liver tissue cultures; Dr. Bryan Lewis and Dr. Keith Bissett of the Network Dynamics and Simulation Science Laboratory at the Virginia Bioinformatics Institute at Virginia Tech for helpfully providing access to computational resources.

    Financial support for this work was generously provided by National Science Foundation (NSF) Chemical, Bioengineering, Environmental, and Transport Systems (CBET) #0933225 “Transcriptional Signatures of 3D Liver Mimetic Architectures”, the Institute for Critical Technology and Applied Sciences (ICTAS) Center for Systems Biology of Engineered Tissues (ISBET) at Virginia Tech, and the Genetics, Bioinformatics, and Computational Biology (GBCB) Interdisciplinary Ph.D. Program of Virginia Tech.

    The author thanks the Python Software Foundation, particularly Mr. Jesse Noller, Mr. Steve Holden, Mr. Van Lindberg, Mr. Jacob Kaplan-Moss, and the Python community, in general, for their maintenance of not only an arguably superior programming language, but a fantastic programming community as well. The author thanks the Stack Overflow community for their collective support in answering many, many questions raised through the process of programming the software used throughout this dissertation.

    The author gives his heartfelt thanks to the following people who made completion of this work possible: his family (Mom, Dad, Richard, Bammee, Pop Pop, Grandma, Grandpa, Aunt Nell, Uncle Jim, and all the cousins) for their continuing love and support; Mrs. Dennie Munson, who plays second-mom to so many poor, wistful graduate students; the Executive Board (Dr. Tsai-Tien Tseng, Dr. Marcus Chibucos, Dr. Bryan Lewis, Dr. Andrea Apolloni, Mr. Tim Driscoll, and Mr. Andrew Warren) for making the rules around here; the Girlfriend Advisory Board (Mrs. Katie Younger Gehrt, Ms. Rachel DeLauder, and Dr. Charley Kelly) for their stalwart efforts in the face of dire chances; The Amelia Earharts (Mr. Ian Firkin, Ms. Joelle Hackney, and Ms. Megan Tiller) for sharing wonderful musical and life experiences; Mr. Patrick Butler and Dr. Kiran Pashikanti for much needed company and entertaining conversations; the Lovely Ladies of Clay Street (Ms. Phoebe Williams and Ms. Kristi Steiner) for providing lodging during the final push; Mr. Wes Smith for the much-needed breaks away from Blacksburg and computers; Mr. Kyle Parker and Ms. Jessica Frisch for sending sunny Florida vibes to Blacksburg; Ms. Judy Kuhn for being a great finder of excellent music, an all-around wonderful person, and a dear friend; Dr. Mary Ann Moran and Dr. Barny Whitman for their encouragement to pursue a higher degree; Dr. James Henriksen and Dr. Emily DeCrescenzo Henriksen for their encouragement on finishing the degree; Mrs. Carol Zank-Rehwaldt for caring enough to teach us all to write clear, coherent prose; and Dr. Kate Janean Steklachich for her adoration, affection, and irresistible loveliness.

  • Dedication

    Dedicated to Mr. Hanks, who spread not only the joy of the study of life, but also the joy

    of life, itself.

  • Contents

    1 Introduction
    1.1 Significant contributions of this dissertation
    1.2 Prior work
    1.2.1 Process-based analysis of gene expression
    1.2.2 Computing response networks from treatment-control data
    1.2.3 Computation of inter-process connections
    1.3 Overview of Chapters
    2 Discovering temporal patterns of expression in hepatocyte cultures
    2.1 Attribution
    2.2 Abstract
    2.3 Introduction
    2.4 Materials and Methods
    2.4.1 Materials
    2.4.2 Hepatocyte Isolation and Culture
    2.4.3 RNA Extraction and Gene Chip Hybridization
    2.4.4 Microarray Data Analysis
    2.4.5 Gene Set Enrichment Analysis
    2.5 Results
    2.5.1 The transcriptional program in CS cultures steadily and comprehensively diverges from that in HMs
    2.5.2 Liver-specific gene sets are up-regulated starting on day 1 or day 2 in CS cultures
    2.5.3 Gene sets involved in cholesterol, fatty-acid, alcohol, and carbohydrate metabolism are significantly up-regulated starting on day 1 or day 2 in CS cultures
    2.5.4 Mono-oxygenases are initially not differentially expressed but recover after day 3 in CS cultures
    2.5.5 Cell-cycle activity decreases significantly in CS cultures
    2.6 Discussion
    3 Discovering Networks of Perturbed Biological Processes in Hepatocyte Cultures
    3.1 Attribution
    3.2 Abstract
    3.3 Introduction
    3.4 Materials and Methods
    3.4.1 Measuring perturbation from gene expression data
    3.4.2 Scoring a link between a pair of processes
    3.4.3 Extending the score to include transcriptional data and interaction weights
    3.4.4 Assessing the statistical significance of links
    3.5 Results
    3.5.1 Input Data
    3.5.2 Overview of Results
    3.5.3 Liver Specific Genes
    3.5.4 Liver Specific Gene Sets Regulated by HNF1
    3.5.5 Lipid Homeostasis and Bile Acid Synthesis
    3.5.6 Interpretation of Links in CBPLNs
    3.6 Conclusions
    4 Discovering descriptive networks of processes from gene expression and molecular interaction network data
    4.1 Abstract
    4.2 Introduction
    4.3 Materials and Methods
    4.3.1 Computing gene expression perturbation
    4.3.2 Selection of processes for computation of BPNs
    4.3.3 The MCMC-BPN algorithm
    4.3.4 Computation of BPLNs
    4.3.5 Measuring redundancy within a BPN
    4.3.6 Data Sources
    4.4 Results
    4.4.1 CS versus HM
    4.4.2 Acetaminophen Exposure
    4.4.3 Cirrhosis
    4.4.4 Very Advanced HCC
    4.4.5 Behavior of the MCMC
    4.5 Discussion
    5 Conclusion
    5.1 Summary of presented work
    5.2 Future work
    5.3 Closing remarks
    5.4 Publication List
    Bibliography

  • List of Abbreviations

    BP biological process

    BPLN Biological Process Linkage Network

    BPN Biological Process Network

    CBPLN Contextual Biological Process Linkage Network

    CC cellular component

    CORUM Comprehensive Resource of Mammalian protein complexes

    CS collagen sandwich

    EBI European Bioinformatics Institute

    GenMAPP Gene Map Annotator and Pathway Profiler

    GEO Gene Expression Omnibus

    GSEA Gene Set Enrichment Analysis

    HCC hepatocellular carcinoma

    HCV hepatitis C virus

    HM hepatocyte monolayer

    JI Jaccard Index

    KEGG Kyoto Encyclopedia of Genes and Genomes

    MCMC Markov chain Monte Carlo

    MF molecular function

    MiMI Michigan Molecular Interactions

    MSigDB Molecular Signatures Database

    NCBI National Center for Biotechnology Information

    NCI PID National Cancer Institute Pathway Interaction Database

    STRING the Search Tool for the Retrieval of Interacting Genes/Proteins

  • List of Figures

    1.1 Bipartite graph model of GenGO.
    2.1 Schematics of two popular liver cell culture systems.
    2.2 Liver-specific up-regulated gene sets.
    2.3 Up-regulated gene sets involved in cholesterol, fatty-acid, alcohol, and carbohydrate metabolism.
    2.4 Pathway for cholesterol metabolism that shows gene sets involved in this process.
    2.5 Pathway for alcohol metabolism that shows gene sets involved in this process.
    2.6 Up-regulated gene sets involved in urea production.
    2.7 Pathway for urea production that shows gene sets involved in this process.
    2.8 Gene sets that show recovery after day 3.
    2.9 Down-regulated gene sets.
    2.10 Gene set level network of up-regulated processes in CS cultures.
    3.1 Calculating the links score σ(a, b) in an example network.
    3.2 Scatter plots of link p-values for links found to be significant.
    3.3 CS vs. HM CBPLN on day 1.
    3.4 CS vs. HM CBPLN on day 2.
    3.5 CS vs. HM CBPLN on day 3.
    3.6 CS vs. HM CBPLN on day 8.
    3.7 Context-free BPLN.
    3.8 Network of functional interactions resulting in the link between V$HNF1 Q6 and HSIAO LIVER SPECIFIC GENES on day 8.
    3.9 The liver regulates two tightly coupled pathways: bile acid synthesis and fatty acid metabolism.
    3.10 Subgraphs of the CBPLNs involving nuclear receptors and the PPAR signaling, bile acid biosynthesis, and fatty acid metabolism pathways.
    3.11 Network of functional interactions resulting in the link between NUCLEAR RECEPTORS and HSA03320 PPAR SIGNALING PATHWAY on day 8.
    3.12 Network of functional interactions resulting in the link between HSA03320 PPAR SIGNALING PATHWAY and HSA00071 FATTY ACID METABOLISM on day 8.
    4.1 An example interaction network and Bayesian network model.
    4.2 Pairwise overlaps for CS vs. HM BPNs.
    4.3 Redundancy of processes and links in CS vs. HM BPNs.
    4.4 A BPN computed for the CS vs. HM contrast.
    4.5 Interactions explained by the link between two PPAR-related processes.
    4.6 Redundancy of processes and links in the CS vs. HM BPLN at significance threshold of 0.0001.
    4.7 Overlap of interactions explained by the Acetaminophen BPNs.
    4.8 Redundancy of processes and links in Acetaminophen BPNs.
    4.9 A BPN computed for the Acetaminophen contrast.
    4.10 Interactions explained by the link between KEGG VALINE LEUCINE AND ISOLEUCINE DEGRADATION and REACTOME METABOLISM OF LIPIDS AND LIPOPROTEINS.
    4.11 Redundancy of processes and links in the Acetaminophen BPLN at significance threshold 0.0001.
    4.12 Overlap of interactions explained by the Cirrhosis BPNs.
    4.13 Redundancy of processes and links in Cirrhosis BPNs.
    4.14 A BPN computed for the Cirrhosis contrast.
    4.15 Redundancy of processes and links in the Cirrhosis BPLN at significance threshold 0.0001.
    4.16 Overlap of interactions explained by the Very Advanced HCC BPNs.
    4.17 Redundancy of processes and links in Very Advanced HCC BPNs.
    4.18 A BPN computed for the Very Advanced HCC contrast.
    4.19 Interactions explained by the link between REACTOME INNATE IMMUNITY SIGNALING and REGULATION OF MITOTIC CELL CYCLE.
    4.20 Distributions of states and visitation frequencies.

  • List of Tables

    2.1 Contrasts analyzed using GSEA.
    2.2 The number of differentially-expressed genes in each of the four CS vs. HM contrasts at different p-value cutoffs.
    3.1 Contrasts analyzed for contextual BPLNs.
    3.2 Gene sets from MSigDB selected for our analyses.
    3.3 Comparison of the properties of the CBPLNs computed by using each hypothesis test.
    3.4 Comparison of the number of links in the BPLN to the number of links in the CBPLNs, computed without and with normalization.
    4.1 Data sources for each contrast.
    4.2 Statistics on inputs by contrast.
    4.3 BPLN statistics for the CS vs. HM contrast.
    4.4 BPLN statistics for the Acetaminophen contrast.
    4.5 BPLN statistics for the Cirrhosis contrast.
    4.6 BPLN statistics for the Very Advanced HCC contrast.

  • Chapter 1

    Introduction

    Thanks to the continuous improvement in high-throughput experimental techniques in the life sciences, this past decade has seen a tremendous explosion in the quantity of publicly available biological data, from whole genome sequences [1], to genome-wide gene expression measurements [2], to gene and protein interactions [3–17], propelling biology, out of necessity, from a reductionist science to a study of systems. This trend has been most obvious in the evolution of analysis of gene expression data, as exemplified by the increase in the number of data sets available in public repositories such as the Gene Expression Omnibus (GEO) [2]. Much gene expression data derives from treatment-control experiments, where RNA is taken from samples exposed to some condition and contrasted with RNA from samples not exposed to that condition, or taken from another condition, e.g., cancerous versus non-cancerous biopsies, treatment with a drug versus no treatment, or liver versus a background mix of tissues.

    Early approaches to analysis of treatment-control gene expression experiments reported lists of genes that exhibited strong perturbation in expression using the t-test and other univariate test statistics [18, 19], and later, using more sophisticated techniques, such as fitting linear models to the expression measurements [20]. These methods may report hundreds or even thousands of genes as significantly perturbed for a given contrast [21], making interpretation of results difficult. Such voluminous results motivated the development of methods and tools to analyze gene expression on the basis of coherent collections of genes which we refer to in this dissertation as “biological processes” (also known in the literature as gene sets or pathways). Initial methods focused on detecting over-representation of genes belonging to a process among the list of perturbed genes [22–25]. Later techniques [26–28] instead sought to identify significant differences in collective expression of genes belonging to a process, allowing all genes, even those with insignificant perturbation when measured individually, to contribute to the analysis. These methods provide a higher-level view of the biological phenomena underlying the gene expression data; however, due to overlap in gene membership in processes, the lists of perturbed processes can suffer from a large amount of redundancy [29]. The most recent techniques in process-based expression analysis have thus emphasized computing concise sets of perturbed processes that nonetheless explain much of the overall perturbation among the genes [30–33].

    Systems biologists have also sought methods to integrate high-throughput data from multiple sources as a means of gaining biological insights that each source, alone, could not provide [34]. Among these efforts are integration of gene or protein interaction networks with gene expression data to compute response networks: the subnetworks of genes and interactions that display a large amount of perturbation or activity in response to some condition or stimulus [35, 36], or for which the genes exhibit significant co-expression [37–40]. Response networks may contain many thousands of genes and interactions, in much the same way that gene expression analysis could reveal overwhelmingly large lists of differentially expressed genes. In a similar approach to that of process-based expression analysis, researchers turned to summarizing response networks in terms of processes. For each process, one can assess the enrichment of genes belonging to that process among the genes in the response network, then report the processes significantly overrepresented in the response network [40, 41].

    Recently, the integration of gene interaction networks with process annotations for each gene has led to novel insights as to how pairs of processes themselves interact on the basis of the interactions between their respective genes [42–45]. These methods detect inter-process connections on the basis of static data, more representative of the potential for interaction rather than relevance to any specific biological condition. We reason that as the phenotypes of cells or tissues change, so too do the connections between the processes perturbed in the response to change. Surprisingly, none of these existing methods also considered integrating gene expression perturbation to give context to the process interaction network à la response networks. This gap in methodology provided the motivation for the work we present in this dissertation.
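
    As a rough illustration of the distinction drawn above, the following Python sketch contrasts a static, context-free view of an inter-process connection (every interaction joining a gene of one process to a gene of the other) with a context-aware view that keeps only those interactions whose two genes are both perturbed. The function names, the toy data, and the thresholding rule are assumptions made for illustration only; this is not the scoring method developed later in this dissertation.

```python
def cross_process_interactions(interactions, process_a, process_b):
    """Return the interactions that join a gene of one process to a gene of the other."""
    links = set()
    for g, h in interactions:
        if (g in process_a and h in process_b) or (g in process_b and h in process_a):
            links.add(frozenset((g, h)))  # undirected: order of g and h is irrelevant
    return links


def perturbed_cross_process_interactions(interactions, process_a, process_b,
                                         perturbation, threshold):
    """Keep only cross-process interactions whose genes all meet the threshold.

    `perturbation` maps gene -> perturbation score (hypothetical units);
    genes without a score are treated as unperturbed.
    """
    return {
        pair
        for pair in cross_process_interactions(interactions, process_a, process_b)
        if all(abs(perturbation.get(gene, 0.0)) >= threshold for gene in pair)
    }


# Toy data (hypothetical gene and process names).
interactions = [("A1", "B1"), ("A2", "B2"), ("A1", "A2"), ("B1", "C1")]
process_a = {"A1", "A2"}
process_b = {"B1", "B2"}
perturbation = {"A1": 2.5, "B1": 3.0, "A2": 0.1, "B2": 2.0}

static_links = cross_process_interactions(interactions, process_a, process_b)
context_links = perturbed_cross_process_interactions(
    interactions, process_a, process_b, perturbation, threshold=1.0
)
print(len(static_links), len(context_links))
```

    In this toy network the static view counts two interactions between the processes, while only one survives once perturbation is taken into account, which is the kind of context-dependent change the paragraph above argues for.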

    Inspired by methods for computing inter-process connections and response networks, we developed the first published method which integrates genome-wide gene perturbation measurements, gene annotations to biological processes, and a gene interaction network to identify the connections between processes perturbed under a particular biological context. We then drew further inspiration from state-of-the-art methods in process-based expression analysis to develop another novel method that computes a concise, non-redundant set of perturbed inter-process connections that explains the perturbation of the underlying gene-gene interactions that serve as the interfaces between the processes.

    In this dissertation, we have applied these methods, as well as an existing process-based expression analysis method [26], to liver-related data sets. In our first study, we analyzed gene expression data collected from two rat hepatocyte in vitro cultures, the hepatocyte monolayer (HM) and collagen sandwich (CS) (Chapter 2) [46]. The liver maintains a variety of physiological functions including lipid, carbohydrate, and amino acid metabolism, immune response, and detoxification of xenobiotics; hepatocytes, which comprise 60–70% of the liver, perform the bulk of its metabolic functions [47, 48]. While Dunn et al. first compared CS and HM cultures over two decades ago [49, 50], our study represents the first systems-level analysis of differences in gene expression of hepatocytes in these cultures.

    In our second study, we developed our method for computing context-specific connections between processes, and used it to elucidate the relationships between the processes we found perturbed in our first study (Chapter 3) [51]. In the final study (Chapter 4), we developed our method for computing non-redundant sets of context-specific inter-process connections, and applied it to the CS and HM data, to data from a study on the effects of acetaminophen [52], a model hepatotoxicant [53], and to data collected from liver biopsies of human patients suffering from infection with Hepatitis C Virus [54], a major cause of liver damage and hepatocellular carcinoma, one of the most deadly forms of cancer [55].

    1.1 Significant contributions of this dissertation

    This dissertation presents methodological advances in computational systems biology capable of reporting high-level trends from high-throughput biological experiments. We summarize the contributions of this dissertation as follows:

    (i) We performed the first systems-level comparison of two important and common hepatocyte culture systems, identifying biological processes perturbed between the two conditions.
    (ii) We developed a novel method to compute perturbed connections between processes, and applied it to the processes identified in (i).
    (iii) We developed a novel method to compute a concise, non-redundant set of perturbed connections between processes that best summarizes the perturbation of the gene-gene interactions that lie at the interfaces of the processes.

    1.2 Prior work

    In this section, we discuss the techniques of historical importance to those we have developed. These techniques fall into three categories, each with a respective subsection below: process-based analysis of gene expression experiments, computation of response networks, and computation of inter-process connections.

    1.2.1 Process-based analysis of gene expression

    A very common experimental design in genome-wide gene expression experiments (e.g.,

    microarrays) partitions the set of biological samples into two subsets, with one subset cor-

    responding to an experimental treatment and another subset corresponding to a control.

    We call such designs treatment-control experimental designs. Early methods for investi-

    gating gene expression perturbation in treatment-control microarray experiments com-

    puted lists of differentially expressed genes [18, 19]. These lists may contain thousands

    of perturbed genes [21], giving rise to methods for summarizing the results in terms of

    biological processes known as enrichment analysis. Enrichment analysis asks whether

    more genes belonging to a biological process appear in the list of perturbed genes than

    expected by chance (i.e., the process is “over-represented”); a process over-represented in

    the list of perturbed genes may itself be interpreted as perturbed. One tests the

    over-representation of each process using a statistical test of significance such as

    Fisher’s Exact Test, then reports those processes that meet a significance threshold [22–25].
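    As an illustration of the over-representation test described above, the following is a minimal sketch of a one-sided Fisher's Exact Test, computed as the upper tail of the hypergeometric distribution. The function name, argument names, and the toy counts are assumptions made for this example and do not come from any of the cited tools.

```python
from math import comb


def enrichment_p_value(n_genes, n_perturbed, process_size, overlap):
    """One-sided Fisher's exact (hypergeometric tail) p-value.

    Returns the probability of observing at least `overlap` perturbed genes
    in a process of `process_size` genes, when `n_perturbed` of the
    `n_genes` measured genes are perturbed overall.
    """
    total = comb(n_genes, process_size)
    max_overlap = min(process_size, n_perturbed)
    tail = sum(
        comb(n_perturbed, k) * comb(n_genes - n_perturbed, process_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 1000 genes measured, 100 perturbed,
    # a process of 20 genes of which 8 are perturbed.
    print(enrichment_p_value(1000, 100, 20, 8))
```

    In practice this test is repeated for every process, so the resulting p-values require multiple testing correction before a significance threshold is applied.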

    Enrichment analysis methods have several shortcomings that lead to loss of sensitivity.

    For one, the user must choose the significance threshold at which to declare a gene sig-

    nificantly perturbed, which can change the results of enrichment analysis [21, 29]. Ad-

    ditionally, they limit computation of process perturbation by considering differences in

    expression only in the most perturbed genes [21, 29]. To address these issues, Subramanian

    et al. developed a method called Gene Set Enrichment Analysis (GSEA) [26]. GSEA

    takes a holistic view of finding significant collective perturbation of processes. GSEA con-

    siders the difference in expression of all genes in a treatment-control experiment, and so

    each gene belonging to a process may contribute to the cumulative perturbation of that

    process, regardless of the magnitude of difference for that individual gene. First, Subra-

    manian et al. sort the genes into a ranked list on the basis of the difference in expression

    between the treatment condition versus the control. They then score each process based

    on the concentration of its genes either towards the top or bottom of the sorted list; pro-

    cesses composed of genes that consistently show strong correlation with the treatment or

    with the control receive high scores. They assess the significance of the score by construct-

    ing an empirical distribution of scores after shuffling the treatment and control labels of

    the samples and re-calculating scores, then ranking the original score among the empiri-

    cal distribution of scores from the randomized data.
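    The three steps just described (ranking, scoring, and label permutation) can be sketched as follows. This is a simplified illustration, not the GSEA implementation: it ranks genes by the difference in mean expression and uses an unweighted running-sum score, whereas GSEA itself uses a weighted statistic. All function names and the toy data layout (a dictionary from gene to a list of per-sample values) are assumptions for this example.

```python
import random
from statistics import mean


def rank_genes(expr, labels):
    """Sort genes by mean(treatment) - mean(control), largest first."""
    def diff(values):
        t = [v for v, lab in zip(values, labels) if lab == "treatment"]
        c = [v for v, lab in zip(values, labels) if lab == "control"]
        return mean(t) - mean(c)
    return sorted(expr, key=lambda g: diff(expr[g]), reverse=True)


def enrichment_score(ranked, gene_set):
    """Unweighted running sum; returns the largest deviation from zero.

    Assumes the gene set contains at least one, but not all, ranked genes.
    """
    hits = [g in gene_set for g in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Compare the observed score to scores after shuffling sample labels."""
    rng = random.Random(seed)
    observed = enrichment_score(rank_genes(expr, labels), gene_set)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null = enrichment_score(rank_genes(expr, shuffled), gene_set)
        if abs(null) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)
```

    Because the smallest attainable empirical p-value is 1/(n_perm + 1), the number of permutations bounds the precision of the significance estimate.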

    GSEA and other methods which test significance on a process-by-process basis suffer

    from several drawbacks. For one, they must employ multiple testing correction. For

    GSEA and related methods that rely on empirical distributions, the number of randomized

    samples required to give the precision necessary to declare significance after correction

    can increase rapidly in comparison to the number of tests, which in turn can

    greatly increase computational time. Additionally, in cases where two or more processes

    have many common genes, they may both be reported as significant in the results, while

    providing little additional information. Sources of process annotation that organize pro-

    cesses as a hierarchy, such as the Gene Ontology (GO) [56], exacerbate the redundancy,

    as the more specific processes are composed of subsets of genes in the less specific pro-

    cesses [32].

    Recent efforts in gene set enrichment have tackled both the challenge of reducing the

    number of tests and also reporting a non-redundant set of significant processes. In 2008, Lu et

    al. proposed one such method called GENerative GO Analysis (GenGO) [32], where they

    considered the perturbation of individual genes as a noisy observation, generated by a

    specific collection of processes that cells activated in response to a particular condition or

    stress. The objective of GenGO is to identify a non-redundant set of processes that best

    explains the gene expression perturbations observed. To do this, Lu et al. conceptualized

    the biological processes and genes forming a bipartite graph, where edges connect genes

    to processes in which they participate (see Figure 1.1).

    This bipartite graph serves as a generative model: a set of processes is proposed as the set

    of perturbed processes, which have generated the perturbation observed at the level of the

    individual genes. Thus, genes connected to processes chosen as part of the generating set

    are expected to be observed as perturbed, while genes connected to no activated processes

    Figure 1.1: Bipartite graph model of GenGO. Each node on the left, representing a process (GO Nodes), is connected by edges to nodes representing genes that belong to that process (Gene Nodes). Activation edges are drawn from an included process to its genes. [Modified from Lu et al. [32] under the Creative Commons Attribution License v2.5.]

    are expected to be observed as unperturbed. Lu et al. constructed a likelihood function

    to calculate how well the observed perturbations fit a selection of processes proposed as

    activated. High scores are given to selections of processes that are connected to many perturbed

    genes, are connected to few unperturbed genes, and have few genes in common. Lu et al. use

    a greedy algorithm to find selections of processes of high likelihood scores.

    Bauer et al. proposed a method similar to GenGO, which they called model-based gene set

    analysis (MGSA) [33]. Like GenGO, they considered unobservable activation of processes

    as generating observable but noisy perturbations in gene expression. In MGSA connec-

    tions from the processes to the genes are modeled as a Bayesian network. Bauer et al.

    calculated the likelihood that a given set of perturbed processes generated the observed gene

    expression perturbations as a combination of Bernoulli distributions based on several prior

    probabilities: (i) the probability a gene connected to a perturbed process will not be observed

    as perturbed, (ii) the probability a gene connected to no perturbed processes will

    be observed as perturbed, and (iii) the prior probability of observing any given process

    as perturbed. Bauer et al. used a Markov chain Monte Carlo (MCMC) approach to find

    collections of processes of high likelihood, and reported the probability a process should

    be considered perturbed as the fraction of steps in the MCMC in which the process was

    selected.
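    To make the role of the three probabilities concrete, the following is a minimal sketch of a log-score in the spirit of the model described above; it is not the MGSA implementation. The parameter names `alpha` (probability that a gene connected to no perturbed process is observed as perturbed), `beta` (probability that a gene connected to a perturbed process is not observed as perturbed), and `prior`, along with their default values, are assumptions chosen for illustration.

```python
from math import log


def log_score(active, processes, observed, all_genes,
              alpha=0.1, beta=0.25, prior=0.05):
    """Log of (Bernoulli likelihood of observations) x (prior on processes).

    active    -- set of process names proposed as perturbed
    processes -- dict mapping process name to its set of genes
    observed  -- set of genes observed as perturbed
    all_genes -- iterable of every gene measured
    """
    # A gene is expected to be perturbed if any active process contains it.
    explained = set().union(*(processes[p] for p in active))
    score = 0.0
    for gene in all_genes:
        if gene in explained:
            score += log(1 - beta) if gene in observed else log(beta)
        else:
            score += log(alpha) if gene in observed else log(1 - alpha)
    # Independent Bernoulli prior on each process being perturbed.
    n_on = len(active)
    score += n_on * log(prior) + (len(processes) - n_on) * log(1 - prior)
    return score
```

    An MCMC search would repeatedly propose adding or removing a process from `active` and accept or reject the proposal according to the change in this score.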

    1.2.2 Computing response networks from treatment-control data

    Ideker et al. published the first method capable of computing response networks from

    treatment-control contrasts, called ActiveModules [35]. ActiveModules discovers subgraphs

    composed of genes with large differential expression. They define the score of a

    subgraph as a Liptak-Stouffer z-score, computed as the sum of the gene expression per-

    turbation measurements divided by the square root of the number of genes within the

    subgraph. They then calculate the significance of a subgraph’s score by its deviation from

    a mean, estimated from an empirical distribution of scores of subgraphs created by sam-

    pling uniformly at random a number of genes equal to the number in the original sub-

    graph. Since the problem of finding the highest scoring subgraph is NP-complete [35],

    Ideker et al. employ simulated annealing and report the highest scoring connected com-

    ponent during its execution.
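    The subgraph score and its calibration against randomly sampled gene sets can be sketched as follows. This is an illustrative sketch under stated assumptions, not the ActiveModules code: it takes per-gene z-scores as given, ignores network connectivity and the simulated annealing search, and standardizes the score by the mean and standard deviation of the sampled scores. The function names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def aggregate_z(z_scores, genes):
    """Liptak-Stouffer aggregate: sum of z-scores over sqrt(number of genes)."""
    return sum(z_scores[g] for g in genes) / sqrt(len(genes))


def calibrated_score(z_scores, genes, n_samples=1000, seed=0):
    """Deviation of a gene set's score from random sets of the same size.

    Assumes the gene set is smaller than the full gene list, so that the
    sampled scores are not all identical.
    """
    rng = random.Random(seed)
    pool = sorted(z_scores)
    k = len(genes)
    null = [aggregate_z(z_scores, rng.sample(pool, k))
            for _ in range(n_samples)]
    return (aggregate_z(z_scores, genes) - mean(null)) / pstdev(null)
```

    Calibrating against sets of equal size keeps scores comparable between subgraphs with different numbers of genes.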

    Later, Dittrich et al. extended ActiveModules, using a different method to score subgraphs

    and an alternative search algorithm to identify the highest scoring subgraphs [36]. Rather

    than simply use the p-values of the perturbation of expression, they model the perturba-

    tion as coming from both a “signal component” and a “noise component”, each with its

    own distribution. Genes with a large signal component have large positive scores, while

    those with a large noise component have negative scores. The authors defined the sub-

    graph score as the sum of the scores of the genes within the subgraph. Dittrich et al. de-

    fined the problem of finding the connected subgraph of largest score as a Prize-Collecting

    Steiner Tree (PCST), which they then solved using integer linear programming [57].

    1.2.3 Computation of inter-process connections

    As interpretation of gene expression data has moved from the level of individual genes

    to processes composed of these genes, so too has the level of analysis of molecular inter-

    actions begun to move from gene-gene interactions to discovering interactions between

    processes.

    Pandey et al. proposed a method to detect chains of connected processes of a specified

    length k using gene regulatory networks and gene annotations as the inputs [42]. In this

    method, a chain of processes of length k is formed if, for each process in the chain, there

    is a corresponding gene in a path of genes in the regulatory network. Each chain of pro-

    cesses has a frequency corresponding to how many such pathways in the gene regulatory

    network exist. Pandey et al. proposed a statistical model based on the coupling of hyper-

    geometric distributions for each process in the path, and used this model to compute the

    significance of each process chain based on its frequency [58].

    Li, Agarwal, and Rajagopalan developed a method that computes “crosstalk” based on

    the number of interactions between the genes of each process [43]. They began by count-

    ing the number of gene-gene interactions between each pair of processes. To determine

    whether this number of interactions is greater than expected by chance, they proceeded

    to build empirical distributions of counts. For each process, they create a new random-

    ized process by sampling genes uniformly at random such that each gene originally in

    the process is replaced by one with an equal number of gene-gene interactions. They then

    re-compute the interactions between genes of each pair of processes. Li et al. repeat these

    steps enough times to build distributions of sufficient sizes, and then compute the mean

    number of interactions between pairs of processes for the pair’s distribution, as well as

    the mean number of interactions for all pairs of processes over all randomizations. Finally,

    for each pair of processes, they compute the significance of the number of their

    interactions using Fisher’s Exact Test, where the terms include the number of interactions

    between the processes observed in the original data, the mean number of interactions for

    that pair’s distribution, the mean for all distributions, and the total number of interactions

    overall. If the p-value is found significant after multiple testing correction, they report the

    two processes as having crosstalk.
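    The counting and degree-preserving randomization steps can be sketched as follows. This is a simplified illustration rather than the method of Li et al.: instead of their final Fisher's Exact Test on observed and mean counts, it reports a plain empirical p-value from the randomized counts. The function names and the edge-list representation are assumptions for this example.

```python
import random
from collections import defaultdict


def count_cross_interactions(edges, proc_a, proc_b):
    """Number of interactions with one endpoint in each process."""
    return sum(
        1 for u, v in edges
        if (u in proc_a and v in proc_b) or (u in proc_b and v in proc_a)
    )


def degree_matched_copy(process, degree, by_degree, rng):
    """Replace each networked gene with a random gene of equal degree.

    Genes absent from the network are dropped; sampling is with
    replacement, so the randomized process may be slightly smaller.
    """
    return {rng.choice(by_degree[degree[g]])
            for g in sorted(process) if g in degree}


def empirical_p_value(edges, proc_a, proc_b, n_rand=1000, seed=0):
    """Observed cross-process count and its empirical p-value."""
    rng = random.Random(seed)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    by_degree = defaultdict(list)
    for gene, d in sorted(degree.items()):
        by_degree[d].append(gene)
    observed = count_cross_interactions(edges, proc_a, proc_b)
    at_least = 0
    for _ in range(n_rand):
        rand_a = degree_matched_copy(proc_a, degree, by_degree, rng)
        rand_b = degree_matched_copy(proc_b, degree, by_degree, rng)
        if count_cross_interactions(edges, rand_a, rand_b) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_rand + 1)
```

    Matching on degree prevents highly connected genes from making a pair of processes appear linked merely because their genes have many interactions overall.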

    Dotan-Cohen et al. proposed another method for finding connections between processes

    called “Biological Process Linkage Networks” (BPLN) [44]. Similar to the approach of

    Li et al. [43], BPLN declares one process linked to another process if genes in the first

    process have significantly more interactions with genes in the second process than would

    be expected by chance. BPLN computes this significance directly using Fisher’s Exact

    Test, however, rather than building empirical distributions as described in the approach

    by Li et al.

    Wang et al. developed a method which calculates a type of connection between two pro-

    cesses that they refer to as “functional similarity” [45]. Like BPLN, they consider interac-

    tions between genes of two processes in their measure of the strength of the inter-process

    connection; however, rather than include only immediate neighboring connections, they

    score the connection by the sum of the distances of all genes in the two processes. They

    then compare this score to an empirical distribution computed by re-sampling.

    1.3 Overview of Chapters

    We present here a brief description of the remainder of this document. Chapter 2 describes

    a process-level analysis of gene expression data for hepatocytes in HM versus CS

    cultures. We originally published this work in 2010 in the journal Tissue Engineering Part

    C: Methods [46]. Chapter 3 describes a method which integrates gene expression data

    with gene interaction networks to detect connections between the processes we detected

    as significantly perturbed in Chapter 2. We originally published this work in the journal

    PLoS ONE in 2011 [51]. Finally, Chapter 4 presents advances which allow computation

    of comprehensible, non-redundant sets of perturbed inter-process connections that best

    explain the underlying perturbation of expression in the gene interaction network.

  • Chapter 2

    Discovering temporal patterns of expression in hepatocyte cultures

    2.1 Attribution

    This chapter contains material originally published as Yeonhee Kim, Christopher D Lasher,

    Logan M Milford, T.M. Murali, Padmavathy Rajagopalan (2010) A comparative study of

    Genome-Wide transcriptional profiles of primary hepatocytes in collagen sandwich and

    monolayer cultures. Tissue Engineering Part C: Methods 16: 1449–1460. [46].

    Dr. Yeonhee Kim performed the work described in Sections 2.4.1–2.4.3, while Mr. Lasher

    performed the work described in Section 2.4.5.

  • Christopher D. Lasher Chapter 2. Patterns of expression in hepatocyte cultures 16

    2.2 Abstract

    Two commonly used culture systems in hepatic tissue engineering are the collagen sand-

    wich (CS) and monolayers of cells. In this study, genome-wide gene expression profiles

    of primary hepatocytes were measured over an 8-day period for each cell culture sys-

    tem using Affymetrix GeneChips and compared via gene set enrichment analysis to elicit

    biologically meaningful information at the level of gene sets. Our results demonstrate

    that gene expression in hepatocytes in CS cultures steadily and comprehensively diverges

    from that in monolayer cultures. Gene sets up-regulated in CS cultures include several

    associated with liver metabolic and synthesis functions, such as metabolism of lipids,

    amino acids, carbohydrates, and alcohol, and synthesis of bile acids. Monooxygenases

    such as Cytochrome-P450 enzymes do not show any change between the culture systems

    after 1 day, but exhibit significant up-regulation in CS cultures after 3 days in comparison

    to hepatocyte monolayers. These data provide insights into the up- and down-regulation

    of several liver-critical gene sets and their subsequent effects on liver-specific functions.

    These results provide a baseline for further explorations into the systems biology of engi-

    neered liver mimics.

    2.3 Introduction

    As one of the important organs in our bodies, the liver performs many essential functions

    such as metabolism, synthesis, secretion, and detoxification [48]. Hepatocytes are the

    principal cells in the liver, comprising over 80% of its mass. Hepatocytes perform sev-

    eral characteristic functions of the liver, such as lipid metabolism, glucose homeostasis,

    regulation of urea, production of plasma proteins, alcohol clearance, and biotransforma-

    tion of xenobiotics [48]. In hepatic tissue engineering, two widely used culture systems

    are hepatocyte monolayers (HMs) (Figure 2.1a) and the collagen sandwich (CS) (Figure

    2.1b) [49, 50]. In HMs, hepatocytes are cultured on a single collagen gel. Such cells progressively

    lose their phenotypic characteristics over time. In CS cultures, hepatocytes

    are maintained between two collagen gels and remain stable over extended periods of

    time [59, 60]. Studies have indicated that CS cultures exhibit the preservation of differ-

    entiated functions including secretion of urea, expression of plasma proteins such as al-

    bumin and fibrinogen, polygonal morphology, the presence of bile canaliculi, as well as

    the synthesis of gap junction and tight junction proteins [59,60]. Although morphological

    and physiological characteristics of hepatocytes in CS cultures have been studied exten-

    sively, comprehensive evaluations of temporal genome-wide gene expression programs

    in these culture systems have not been reported. Global gene expression of human hep-

    atocellular carcinoma cells (HepG2) in monolayer and spheroidal cultures revealed up-

    regulated metabolic functions in spheroids but not in monolayer cultures [61]. Since these

    data were taken at a single time point, they did not reveal temporal variations. Another

    study that monitored temporal gene expression in hepatocyte monolayers cultured over a

    three day time period revealed the down-regulation of cytochrome-P450 expression [62].

    However, this study neither investigated longer time points nor compared monolayers

    to other, more stable culture conditions. DNA microarray measurements have also

    been used to study specific pathways through which toxicity was conferred in human

    hepatoblastoma cells [63] and to understand the effects of non-parenchymal cells in 2D

    cocultures of hepatocytes with fibroblasts or sinusoidal endothelial cells [64, 65].

    [Figure 2.1 panel labels: (a) Hepatocyte Monolayer (HM): collagen gel, hepatocyte. (b) Collagen Sandwich (CS): collagen gel, bile canaliculi, hepatocyte.]

    Figure 2.1: Schematics of two popular liver cell culture systems. (a) In a hepatocyte monolayer (HM), hepatocytes reside on a layer of collagen. The hepatocytes show a progressive loss of phenotype, including cell shape. (b) In a collagen sandwich (CS), an additional layer of collagen overlays the hepatocytes. Hepatocytes retain their phenotype for several weeks.

    We hypothesized that the enhanced in vivo liver-like phenotypes in CS cultures were a

    result of the underlying differences in the transcriptional program between hepatocytes

    cultured in CS and HMs. Accordingly, genome-wide gene expression profiles of primary

    hepatocytes were measured at four different time points over an 8-day period for each

    cell culture system using Affymetrix GeneChips. Among the wide range of techniques

    that are available to analyze DNA microarray data, a method was desired that would

    summarize, at the level of predefined biological pathways, the differences between the

    culture conditions at each time point. Gene set enrichment analysis (GSEA) [26] was se-

    lected since it satisfies this criterion. GSEA is one among a family of techniques that can

    summarize differential expression at the level of gene sets [66]. GSEA is widely used,

    generates detailed information on the results, and has shown very good performance in

    a comparison of methods that compute enrichment at the level of gene sets [67]. Fur-

    thermore, GSEA has been used to identify pathways involved in liver toxicity in human

    hepatoblastoma cells [63]. GSEA is designed to identify predefined gene sets that are dif-

    ferentially expressed in a treatment and a control. All the genes expressed on each gene

    chip are ranked based upon their differential expression in CS and HM cultures. A gene

    set could therefore be important if its members cluster toward the top or bottom of the ranked gene

    list. GSEA measures the statistical significance of the distribution of ranks within the gene

    set against the background of the ranks of all the genes.

    Over the 8-day culture period, the gene expression program of hepatocytes in CS cul-

    tures monotonically diverged from cells cultured as a monolayer. Gene sets that were

    up-regulated to a statistically significant extent in CS cultures included those associated

    with liver-specific functions such as bile acid synthesis and lipid, amino acid, carbohy-

    drate, and alcohol metabolism. Nuclear receptors, which play a key role in control-

    ling the transcriptional activation of target proteins, were up-regulated in CS cultures

    on day 1 in culture. Sets containing genes whose expression is mediated by nuclear

    receptors were up-regulated in CS systems after 1 day. Gene sets related to xenobiotic

    metabolism and monooxygenase activity were not differentially expressed after 1 or 2

    days, but showed highly significant up-regulation after 3 days, suggesting a recovery

    in expression of the genes in these sets. Numerous gene sets related to the cell cycle

    were down-regulated, suggesting that the cell cycle was arrested in hepatocytes main-

    tained in CS culture systems in comparison to HMs. These findings recapitulated well-

    known aspects of liver function, thereby suggesting that DNA microarrays are a powerful

    tool for shedding light on the transcriptional signatures that underlie differences between

    these two culture systems. The DNA microarray data generated in this study are available

    at NCBI’s Gene Expression Omnibus under accession number GSE20659 at

    http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20659. All our results are available

    at the following supplementary website:

    http://bioinformatics.cs.vt.edu/~murali/supplements/2010-kim-tissue-engineering.

    2.4 Materials and Methods

    2.4.1 Materials

    Dulbecco’s Modified Eagle Medium (DMEM) containing 4.5 g/L glucose, phosphate-

    buffered saline (PBS), penicillin, streptomycin, and trypsin-EDTA were obtained from In-

    vitrogen Life Technologies (Carlsbad, CA). Type IV collagenase, HEPES [4-(2-hydroxyethyl)

    piperazine-1-ethanesulfonic acid], glucagon, and hydrocortisone were obtained from Sigma-

    Aldrich. Unless otherwise noted, all chemicals were used as received from Fisher Scien-

    tific.

    2.4.2 Hepatocyte Isolation and Culture

    Primary rat hepatocytes were harvested from female Lewis rats (Harlan) that weighed

    between 170 and 200g. Animal care and surgical procedures were conducted as per pro-

    cedures approved by Virginia Polytechnic Institute and State University’s Institutional

    Animal Care and Use Committee. A two-step in situ collagenase perfusion method was

    utilized to excise the liver [59, 60]. Briefly, animals were anesthetized with 3L/min of

    a gas mixture of 3% (v/v) isofluorane/97% oxygen (Veterinary Anesthesia Systems Co.).

    The liver was perfused through the portal vein with Krebs Ringer Buffer (7.13g/L sodium

    chloride, 2.1g/L sodium bicarbonate, 1g/L glucose, 4.76g/L HEPES, and 0.42g/L potas-

    sium chloride) that contained 1mM ethylenediaminetetraacetic acid, followed by serial

    perfusion with a 0.075% w/v and a 0.1% w/v collagenase (Type IV; Sigma-Aldrich) in

    Krebs Ringer Buffer containing 5mM calcium chloride. Cell suspensions were filtered

    through nylon meshes with porosity ranging from 250 to 62µm (Small Parts, Inc.). Hepa-

    tocytes were separated using a Percoll (Sigma-Aldrich) density centrifugation technique.

    Cell viability was determined by trypan blue exclusion. Hepatocytes were cultured on

    collagen-coated 6-well sterile tissue culture plates (Becton Dickinson Labware) and were

    maintained in a culture medium that consisted of DMEM supplemented with 10% heat-

    inactivated fetal bovine serum (Hyclone), 200U/mL penicillin, 200µg/mL streptomycin,

    20ng/mL epidermal growth factor (BD Biosciences), 0.5U/mL insulin (USP), 14ng/mL

    glucagon, and 7.5µg/mL hydrocortisone. A collagen gelling solution was prepared by

    mixing nine parts of type I collagen (BD Biosciences) solution and one part of 10× DMEM.

    Sterile 6-well tissue culture plates were coated with 0.5mL of the gelling solution and incubated

    at 37°C for 1h to promote gel formation. Isolated hepatocytes were suspended

    in hepatocyte culture medium at a concentration of 1 × 10^6 cells/mL and seeded on the

    collagen-coated wells at a density of 1 million cells/well. CS cultures were formed by the

    deposition of a second layer of collagen 1 day after the hepatocytes were seeded [59, 60].

    Hepatocytes maintained in stable CS and in unstable confluent HM cultures served as

    positive and negative controls, respectively. Hepatocyte cultures were maintained at 37°C

    in a humidified gas mixture of 90% air/10% CO2. The culture medium was replaced ev-

    ery 24 h.

    2.4.3 RNA Extraction and Gene Chip Hybridization

    Primary rat hepatocytes cultured in CS and HM cultures were maintained for an 8-day

    culture period. The samples were analyzed at four time points: days 1, 2, 3, and 8 after de-

    position of the second layer of collagen gel on hepatocytes. Total RNA was extracted and

    purified from cells for each culture system using an RNeasy mini kit (Qiagen) following

    the manufacturer’s protocol. Isolated RNA samples in triplicate at each time point were

    labeled according to the Affymetrix Standard Target labeling process, hybridized to the

    GeneChip Rat Genome 230 2.0 array (Affymetrix), and scanned as described by the manu-

    facturer. Complementary RNA (cRNA) synthesis, hybridization, and scanning were per-

    Contrast name     Treatment                   Control
    ------------------------------------------------------------------------
    Collagen sandwich vs. Monolayer cultures
    CS vs. HM 1d      Collagen sandwich 1 day     Hepatocyte monolayer 1 day
    CS vs. HM 2d      Collagen sandwich 2 days    Hepatocyte monolayer 2 days
    CS vs. HM 3d      Collagen sandwich 3 days    Hepatocyte monolayer 3 days
    CS vs. HM 8d      Collagen sandwich 8 days    Hepatocyte monolayer 8 days
    Within Collagen sandwich
    CS 8d vs. 1d      Collagen sandwich 8 days    Collagen sandwich 1 day
    CS 8d vs. 2d      Collagen sandwich 8 days    Collagen sandwich 2 days
    CS 8d vs. 3d      Collagen sandwich 8 days    Collagen sandwich 3 days

    Table 2.1: Contrasts analyzed using GSEA.

    formed at the Virginia Bioinformatics Institute Core Laboratory facility as follows. Briefly,

    total RNA was converted into double-stranded complementary DNA using a T7-oligo

    (dT) primer (5′–GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG(dT)24–3′)

    and reverse transcription. Synthesized cDNA was converted into biotinylated cRNA by

    transcription using T7 RNA polymerase. Randomly fragmented cRNA was hybridized to

    the GeneChip, and the arrays were washed and stained according to Affymetrix’s protocols.

    The arrays were scanned using an Affymetrix 7G scanner.

    2.4.4 Microarray Data Analysis

    Bioconductor [68] was used to perform initial statistical analysis of the DNA

    microarray data. The data from 24 chips (2 culture conditions × 4 time points × 3 repli-

    cates) were normalized using the Robust Multichip Average method for further analysis.

    The affylmGUI interface to Linear Models for Microarray Data (LIMMA) [20] was used to

    perform differential gene expression analysis for the contrasts shown in Table 2.1. Specif-

    ically, for each contrast, LIMMA was used to compute a p-value for each probe set that

    indicated the statistical significance of the difference of the expression levels of that probe

    set between the two conditions in the contrast.

    2.4.5 Gene Set Enrichment Analysis

    The normalized gene expression data were analyzed using Gene Set Enrichment Analysis

    (GSEA) [26]. Given replicate gene expression measurements for a control phenotype (e.g.,

    HM at 1 day) and for a treatment phenotype (e.g., CS at 1 day), GSEA starts by ranking

    all genes by the extent of their differential expression in the two phenotypes. Thus, the

    lower the rank of a gene, the more up-regulated it is in the treatment, when compared

    to the control. Next, given a gene set of interest (e.g., the genes involved in metabolism

    of xenobiotics), GSEA uses a modified Kolmogorov-Smirnov test [69] to determine if the

    genes in this set have surprisingly high or low ranks, summarized as an enrichment score. This

    score has the following interpretation: the more positive the score, the more up-regulated

    the genes in the gene set are in the treatment (compared to the control), and the more

    negative the score, the more down-regulated the genes in that gene set are. Since the size

    of a gene set may influence its enrichment score, GSEA controls this bias by performing

    a permutation test and calculating a p-value that represents the statistical significance of

    the enrichment score. Finally, GSEA converts the p-value into a q-value that measures the

    false discovery rate, after adjusting for multiple hypothesis testing. Note that the q-value

    is unsigned but the enrichment score is signed (positive for overall up-regulation and

    negative for overall down-regulation). We applied GSEA using the following criteria:

    1. Sort genes in decreasing order of the signal-to-noise measure.

    2. Compute p-values using 10,000 permutations of the sample-to-phenotype associa-

    tions.

    3. Report all gene sets with q-value (false discovery rate) at most 0.2. Note that with

    this cut-off, we expect up to one out of five reported gene sets to be a false discovery.
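    As an illustration of the first criterion, the following is a minimal sketch of ranking genes by a signal-to-noise measure, taken here as the difference in class means divided by the sum of class standard deviations. It omits any adjustments that the GSEA software applies to the standard deviations, and the function names and data layout (a dictionary from gene to a list of per-sample values, with column indices for each class) are assumptions for this example.

```python
from statistics import mean, stdev


def signal_to_noise(treatment, control):
    """(mean_t - mean_c) / (sd_t + sd_c) using sample standard deviations.

    Assumes at least two replicates per class and a non-zero combined
    standard deviation.
    """
    return (mean(treatment) - mean(control)) / (stdev(treatment) + stdev(control))


def rank_by_signal_to_noise(expr, treatment_cols, control_cols):
    """Return gene names in decreasing order of the signal-to-noise measure."""
    scores = {
        gene: signal_to_noise(
            [values[i] for i in treatment_cols],
            [values[i] for i in control_cols],
        )
        for gene, values in expr.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

    With three replicates per condition, as in this study, the standard deviations are estimated from very few values, so this sketch should be read only as a description of the ordering step.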

    2.5 Results

    LIMMA and GSEA were applied to compare the two culture conditions, as shown in Ta-

    ble 2.1. The first set of four contrasts compared the hepatocyte transcriptional program in

    CS cultures to that in HMs, at each of the four time-points analyzed. These contrasts were

    expected to reveal time-dependent differences between these two culture conditions. The

    second set of three contrasts compared CS samples to each other: 8 days to 1 day, 8 days

    to 2 days, and 8 days to 3 days. Such contrasts were expected to provide information on

    how transcriptional programs may vary within CS cultures over time.

    p-value cut-off   CS vs. HM 1d   CS vs. HM 2d   CS vs. HM 3d   CS vs. HM 8d
    ---------------------------------------------------------------------------
    10^-5             31             224            1046           2242
    0.0001            61             362            1535           3092
    0.001             118            569            2277           4287
    0.01              276            1095           3497           6185
    0.05              552            1812           5134           8551

    Table 2.2: The number of differentially-expressed genes in each of the four CS vs. HM contrasts at different p-value cutoffs.

    2.5.1 The transcriptional program in CS cultures steadily and compre-

    hensively diverges from that in HMs

    For each of the first four contrasts in Table 2.1, the number of differentially-expressed

    probe sets was counted after applying different cutoffs on the p-values computed by

    LIMMA. The first column in Table 2.2 indicates the p-value cutoff, while each of the other

    four columns shows the number of probe sets whose p-value meets the cutoff specified in

    each row. An important feature revealed by these data is the monotonic divergence be-

    tween the transcriptional programs of CS and HM samples over the 8-day culture period.

    For each cutoff, the number of differentially-expressed probe sets increased steadily from

    day 1 to day 8. Furthermore, this trend was maintained even over a variation of four

    orders of magnitude in the p-value cutoff. On day 8, as many as 6185 probe sets had a

    p-value of at most 0.01 (2242 had a p-value of at most 10^-5). Since the Affymetrix Rat230_2

    GeneChip has 31,099 probe sets, these results suggest widespread transcriptional pertur-

    bation in CS cultures compared to HM cultures.
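
    The trend described above can be checked directly from the counts in Table 2.2. The sketch below is illustrative only: the helper names are hypothetical, and the counting function stands in for whatever tabulation was applied to the LIMMA output.

```python
# Counts from Table 2.2, keyed by p-value cutoff; each list holds the
# CS vs. HM contrasts at days 1, 2, 3, and 8.
TABLE_2_2 = {
    1e-5: [31, 224, 1046, 2242],
    1e-4: [61, 362, 1535, 3092],
    1e-3: [118, 569, 2277, 4287],
    1e-2: [276, 1095, 3497, 6185],
    5e-2: [552, 1812, 5134, 8551],
}
TOTAL_PROBE_SETS = 31099  # probe sets on the GeneChip, as stated in the text


def count_below(p_values, cutoff):
    """Count the p-values that are at most the cutoff."""
    return sum(1 for p in p_values if p <= cutoff)


def strictly_increasing(counts):
    """True if every count is larger than the one before it."""
    return all(a < b for a, b in zip(counts, counts[1:]))


# The counts rise from day 1 to day 8 at every cutoff.
all_increasing = all(strictly_increasing(c) for c in TABLE_2_2.values())

# Fraction of all probe sets with p-value at most 0.01 on day 8.
fraction_day8 = TABLE_2_2[1e-2][-1] / TOTAL_PROBE_SETS

print(all_increasing, round(fraction_day8, 3))
```

    At the 0.01 cutoff, roughly one fifth of all probe sets on the array are differentially expressed on day 8.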

    Upon the identification of the global trends, GSEA was employed to study patterns of

    differential expression in specific gene sets. GSEA was applied to the gene expression

    data obtained through our experiments and to the following gene sets in the Molecular

    Signature DataBase (MSigDB): 1892 curated gene sets from various sources such as on-

    line pathway databases, publications in PubMed, and knowledge of domain experts; 837

    motif gene sets containing genes that share a cis-regulatory motif that is conserved across

    the human, mouse, rat, and dog genomes; and 1454 gene sets corresponding to genes

    annotated by different Gene Ontology (GO) terms. For each contrast in Table 2.1 and for

    each of these gene sets, GSEA was used to compute a q-value.

    Gene sets were filtered to those that exhibited a monotonic up-regulation in the CS-HM

    comparison. Specifically, gene sets were restricted to those whose q-values decreased

    monotonically from 1 day to 8 days and whose enrichment scores were positive in all

    four CS-HM contrasts (the first four contrasts in Table 2.1). Gene sets that were monoton-

    ically down-regulated in collagen sandwiches over the 8 day period were also identified

    (q-values decreasing monotonically and negative enrichment scores in all four CS-HM

    contrasts). Since MSigDB collates data from several sources, many gene sets in it have

    high degrees of overlap. When an overlapping group of gene sets is up-regulated or

    down-regulated in our data, only one gene set per group is discussed below. The com-

    plete sets of results are available on our supplementary web page.
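
    The filter described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the analysis: the function name is hypothetical, "decreasing monotonically" is read as non-increasing, and the enrichment scores in the example are placeholders. The q-values in the example are those reported for MONOOXYGENASE_ACTIVITY in Figure 2.8.

```python
from typing import Optional, Sequence


def classify_trend(
    q_values: Sequence[float], enrichment_scores: Sequence[float]
) -> Optional[str]:
    """Classify a gene set as 'up', 'down', or None.

    Both sequences are ordered by time point (the CS-HM contrasts at
    days 1, 2, 3, and 8).
    """
    # Assumption: monotonic decrease means each q-value is no larger
    # than the one before it.
    q_decreasing = all(a >= b for a, b in zip(q_values, q_values[1:]))
    if not q_decreasing:
        return None
    if all(es > 0 for es in enrichment_scores):
        return "up"
    if all(es < 0 for es in enrichment_scores):
        return "down"
    return None  # enrichment scores change sign across the contrasts


# q-values from Figure 2.8; the enrichment scores are hypothetical.
mono_q = [0.03, 0.01, 6e-3, 7e-5]
print(classify_trend(mono_q, [0.4, 0.5, 0.6, 0.7]))
print(classify_trend(mono_q, [0.4, -0.5, 0.6, 0.7]))
print(classify_trend([0.2, 0.3, 0.01, 0.001], [0.4, 0.5, 0.6, 0.7]))
```

    Only the first call satisfies both conditions; the second fails on the sign of the enrichment scores and the third on the q-value trend.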

    2.5.2 Liver-specific gene sets are up-regulated starting on day 1 or day

    2 in CS cultures

    [Color legend for up-regulated q-values, with color boundaries at q = 1, 0.2, 0.05, 0.01, 0.001, and 0.0001.]

    Figure 2.2: Liver-specific up-regulated gene sets. The legend shows the q-value ranges for each color. The color scheme used in these figures is RdYlGn from ColorBrewer

    (http://colorbrewer2.org). CS, collagen sandwich; HM, hepatocyte monolayer.

    Hsiao et al. [70] created a compendium of gene expression in normal human tissues with the

    goal of defining a reference for basic organ systems biology. They identified 251 genes

    expressed selectively in the liver, which are included in MSigDB in the HSIAO LIVER

    SPECIFIC GENES gene set. In a similar study, Su et al. [71] profiled gene expression from

    91 human and mouse samples across a diverse array of tissues, organs and cell lines. They

    identified 37 genes that were expressed specifically in human liver tissue samples; these

    genes belong to the HUMAN TISSUE LIVER gene set. The gene sets HSIAO LIVER

    SPECIFIC GENES and LIVER SPECIFIC GENES were up-regulated significantly at day

    1 and day 2, respectively. They were monotonically up-regulated on subsequent days.

    Both these gene sets had insignificant q-values in the CS-CS contrasts, suggesting that

    liver-specific genes are up-regulated on day 1 in CS cultures, and that they continue to

    be monotonically up-regulated on subsequent days (Figure 2.2). The presence and

    concentration of albumin are often used as markers of phenotypic function of in vitro

    hepatic models [59, 60]. The albumin gene (ALB) was present in several gene sets such

    as HSIAO LIVER SPECIFIC GENES and V$HNF1 Q6 that were up-regulated over the 8

    day culture period (Figure 2.2). The promoter regions of genes in the set V$HNF1 Q6 contain

    binding sites for hepatocyte nuclear factor 1 (HNF1), a transcription factor that activates

    gene expression of albumin [72,73]. This gene set has an overlap of 25 genes with the gene

    set HSIAO LIVER SPECIFIC GENES, indicating that HNF1 monotonically up-regulates

    the expression of albumin and other liver-specific genes in CS cultures but not in HMs.

    These observations support the conclusion that transcriptional programs that have been

    identified in other datasets to be liver-specific are active through the 8-day period in CS

    cultures but are not active in HMs.
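
    Overlap between two gene sets, such as the 25 genes shared by V$HNF1 Q6 and HSIAO LIVER SPECIFIC GENES, amounts to a set intersection. The sketch below uses toy memberships: only the presence of ALB in both sets comes from the text, and every other gene symbol is a hypothetical placeholder.

```python
def shared_genes(set_a, set_b):
    """Return the genes that belong to both gene sets."""
    return set(set_a) & set(set_b)


# Toy memberships; only ALB is taken from the text, the rest are placeholders.
hsiao_liver_specific = {"ALB", "GENE_1", "GENE_2", "GENE_3"}
hnf1_motif_targets = {"ALB", "GENE_2", "GENE_4"}

shared = shared_genes(hsiao_liver_specific, hnf1_motif_targets)
print(sorted(shared), len(shared))
```

    The same intersection, applied to the full memberships from MSigDB, would give the overlap counts quoted above.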

    2.5.3 Gene sets involved in cholesterol, fatty-acid, alcohol, and carbo-

    hydrate metabolism are significantly up-regulated starting on day

    1 or day 2 in CS cultures

    Cholesterol metabolism

    Cholesterol metabolism is an important component of hepatic phenotypic function [48].

    The trends exhibited by gene sets linked to cholesterol metabolism in our data were in-

    vestigated. Multiple gene sets involved in cholesterol metabolism were up-regulated in

    [Heat map of q-values for the CS vs. HM contrasts at 1d, 2d, 3d, and 8d; the rows are listed below.]

    Category                  Set name                             Description
    Cholesterol metabolism    HSA00120_BILE_ACID_BIOSYNTHESIS      KEGG bile acid synthesis genes
                              MONOOXYGENASE_ACTIVITY               (GO MF) integration of one oxygen atom into a compound
                              CELLULAR_LIPID_METABOLIC_PROCESS     (GO BP) lipid reactions and pathways
                              CARBOXYLIC_ACID_TRANSM_TRANSP        (GO MF) transfer of carboxylic acid across a membrane
                              LIPID_TRANSPORT                      (GO BP) transport into, out of, or between cells
                              NUCLEAR_RECEPTORS                    GenMAPP nuclear receptor genes
    Fatty-acid metabolism     HSA00071_FATTY_ACID_METABOLISM       KEGG fatty acid metabolism pathways
                              MITOCHONDRIAL_FATTY_ACID_BETA        GenMAPP fatty acid oxidation in mitochondria
                              PEROXISOME                           (GO CC) associated with peroxisome
                              HSA03320_PPAR_SIGNALING_PATHWAY      KEGG PPAR signaling pathway
    Alcohol metabolism        ALCOHOL_METABOLIC_PROCESS            (GO BP) reactions and pathways involving alcohols
                              HSA00071_FATTY_ACID_METABOLISM       KEGG fatty acid metabolism pathways
                              HSA00980_METABOLISM_OF_XENOBIOTICS   metabolism of xenobiotics by cytochrome P450
    Carbohydrate metabolism   GLUCOSE_METABOLIC_PROCESS            (GO BP) pathways involving glucose
                              HSA00010_GLYCOLYSIS_AND_GLUCON       KEGG glycolysis and gluconeogenesis pathways

    [Color legend for up-regulated q-values, with color boundaries at q = 1, 0.2, 0.05, 0.01, 0.001, and 0.0001.]

    Figure 2.3: Up-regulated gene sets involved in cholesterol, fatty-acid, alcohol, and carbohydrate metabolism. The legend shows the q-value ranges for each color.

    (Abbreviations: CARBOXYLIC_ACID_TRANSM_TRANSP for CARBOXYLIC_ACID_TRANSMEMBRANE_TRANSPORTER_ACTIVITY;

    MITOCHONDRIAL_FATTY_ACID_BETA for MITOCHONDRIAL_FATTY_ACID_BETAOXIDATION;

    HSA00980_METABOLISM_OF_XENOBIOTICS for HSA00980_METABOLISM_OF_XENOBIOTICS_BY_CYTOCHROME_P450;

    HSA00010_GLYCOLYSIS_AND_GLUCON for HSA00010_GLYCOLYSIS_AND_GLUCONEOGENESIS.)

    [Pathway diagram. Elements shown: acetyl-coenzyme A; cholesterol; the nuclear receptors liver X receptor (LXR) and retinoid X receptor (RXR), with cholesterol as the LXR ligand and 9-cis retinoic acid as the RXR ligand; the cytochrome P450 enzymes CYP7A1, CYP8B1, and CYP27A1; the bile acids cholic acid (CA) and chenodeoxycholic acid (CDCA); the nuclear receptors farnesoid X receptor (FXR) and RXR, with bile acids as the FXR ligand; the ATP-binding cassette transporter ABCB11; and the nuclear receptor small heterodimer partner (SHP). Arrows denote activation or inhibition. Gene set key: A, CELLULAR_LIPID_METABOLIC_PROCESS; B, NUCLEAR_RECEPTORS; C, MONOOXYGENASE_ACTIVITY; D, HSA00120_BILE_ACID_BIOSYNTHESIS; E, CARBOXYLIC_ACID_TRANSMEMBRANE_TRANSPORTER_ACTIVITY.]

    Figure 2.4: Pathway for cholesterol metabolism that shows gene sets involved in this process.

    CS cultures compared to HMs. These gene sets include HSA00120 BILE ACID BIOSYNTHESIS,

    MONOOXYGENASE ACTIVITY, CELLULAR LIPID METABOLIC PROCESS,

    NUCLEAR RECEPTORS, and CARBOXYLIC ACID TRANSMEMBRANE TRANSPORTER

    ACTIVITY (Figure 2.3). Bile acids mediate cholesterol metabolism, and the synthesis of

    bile acids is initiated through the activity of CYP7A1, CYP8B1 and CYP27A1 enzymes

    [74–76] (Figure 2.4). In CS samples, the genes encoding the three CYP enzymes mentioned above are present

    either in the gene set HSA00120 BILE ACID BIOSYNTHESIS or in the gene set MONOOXYGENASE

    ACTIVITY, both of which are up-regulated in CS cultures. The gene expression

    of CYP enzymes is activated by nuclear receptors—specifically, the retinoid X receptor

    (RXR) and the liver X receptor (LXR) [77–80]. The gene set NUCLEAR RECEPTORS,

    which contains nuclear receptors involved in the activation of hepatic functions, behaved

    similarly to the liver-specific gene sets discussed earlier: it had insignificant q-values in

    the CS-CS contrasts, suggesting that nuclear receptors were up-regulated on day 1 in CS

    cultures and remained up-regulated on subsequent days. The nuclear receptor Farnesoid

    X receptor (FXR) plays a critical role in liver functioning. FXR is responsible for regulating

    the concentration of bile acids [74–78,81,82]. Bile acid-mediated activation of FXR leads to

    the transcriptional activation of the ATP-binding cassette transporter B11 (ABCB11, also

    known as bile salt export pump), a process that is crucial for cholesterol secretion into

    the bile canaliculi [80, 81]. In CS samples, the transcription of ABCB11, present in gene

    set CARBOXYLIC ACID TRANSMEMBRANE TRANSPORTER ACTIVITY, was shown

    to be up-regulated over the culture period in comparison to HMs. This gene set contains

    genes annotated with the Gene Ontology molecular function that involves the catalysis of

    the transfer of carboxylic acids from one side of the membrane to the other. These trends

    and data indicate that genes responsible for the formation, transformation, and transport

    of bile acids are up-regulated in CS cultures, thereby promoting cholesterol metabolism.

    Fatty Acid Metabolism (PPARα-mediated metabolism)

    Peroxisome proliferator-activated receptor α (PPARα) is a nuclear receptor that activates

    gene expression of enzymes linked to fatty acid metabolism [83–86]. PPARα-mediated

    fatty acid metabolism initiates transcriptional activation of liver fatty acid-binding protein

    (L-FABP or FABP1), which delivers fatty acids to its cognate nuclear receptor, PPARα,

    and promotes expression of two transporters, ABCD2 and ABCD3, which are necessary

    to transport fatty acids into peroxisomes, where target enzymes catalyze the clearance of

    fatty acids [74, 83, 85, 86]. PPARα, being dependent on intracellular FABP concentrations,

    regulates expression of acyl-CoA oxidases (ACOXs), short/branched-, long-, and very

    long-chain acyl-CoA dehydrogenases (ACADs), and mitochondrial enzymes involved in

    β-oxidation [87–91]. In our data, the gene sets involved in PPARα-mediated fatty acid

    metabolism are HSA00071 FATTY ACID METABOLISM, PEROXISOME, HSA03320 PPAR SIGNALING

    PATHWAY, and MITOCHONDRIAL FATTY ACID BETAOXIDATION (Figure 2.3).

    All these gene sets were monotonically up-regulated in CS samples over the 8-day

    period in contrast to HMs. The gene FABP1, which belongs to the gene set HSA03320 PPAR

    SIGNALING PATHWAY, was up-regulated in CS cultures and its expression increased

    over time. In response to the expression of FABP1, the gene PPARα is expressed

    [83]. The PPARα signaling pathway promotes the transcriptional activation of fatty acid

    metabolic enzymes [ACOX (acyl-CoA oxidase), ACAD (acyl-CoA dehydrogenase), CAT

    (carnitine palmitoyltransferase), LPL (lipoprotein lipase) and ACAT (acetyl-CoA

    acetyltransferase)] [83, 87–89]. These genes are members of the HSA00071 FATTY ACID

    METABOLISM, PEROXISOME, and MITOCHONDRIAL FATTY ACID BETAOXIDATION

    gene sets. The combination of the expression of key enzymes responsible for fatty acid

    metabolism as well as the expression of two members of the ABC transporter family

    (ABCD2 and ABCD3) indicates that PPARα-mediated metabolism was up-regulated in CS

    samples.

    Alcohol Metabolism

    Alcohol, specifically ethanol, is metabolized in the liver by several enzymes present in

    the subcellular compartments of hepatocytes. Alcohol dehydrogenase (ADH), a key

    cytoplasmic enzyme, plays an important role in converting ethanol to acetaldehyde [92–94].

    Acetaldehyde, a toxic molecule, is subsequently converted to nontoxic acetate by

    mitochondrial aldehyde dehydrogenase (ALDH) [92, 94]. Additionally, CYP2E1

    enables the clearance of ethanol through an oxidative reaction [94–96] (Figure 2.5). The

    gene sets ALCOHOL METABOLIC PROCESS, HSA00980 METABOLISM OF XENOBI-

    OTICS BY CYTOCHROME P450, and HSA00071 FATTY ACID METABOLISM were up-

    regulated over time in CS cultures in comparison to HMs (Figure 2.3). These gene sets

    [Pathway diagram. Elements shown: ethanol, acetaldehyde, acetate, and CO2 + H2O; the enzymes alcohol dehydrogenase, cytochrome P450 2E1, and aldehyde dehydrogenase; and the cofactors NAD+ (nicotinamide adenine dinucleotide) and NADH (nicotinamide adenine dinucleotide, reduced form). Gene sets keyed in the diagram: HSIAO_LIVER_SPECIFIC_GENES, HSA00980_METABOLISM_OF_XENOBIOTICS_BY_CYTOCHROME_P450, HSA00071_FATTY_ACID_METABOLISM, and ALCOHOL_METABOLIC_PROCESS.]

    Figure 2.5: Pathway for alcohol metabolism that shows gene sets involved in this process.

    include three alcohol dehydrogenase genes ADH, ADH1A, and ADH7 that mediate the

    transformation of alcohol.

    Carbohydrate Metabolism

    Gluconeogenesis and glycolysis are essential to maintain glucose homeostasis [97–100].

    The maintenance of a healthy glucose level is dependent upon the presence and con-

    centration of insulin and glucagon [97–99]. In our gene expression data, genes for key

    enzymes involved in the formation and metabolism of glucose are up-regulated in CS

    cultures in contrast to HMs. The relevant gene sets include HSA00010 GLYCOLYSIS AND

    GLUCONEOGENESIS and GLUCOSE METABOLIC PROCESS (Figure 2.3). Specifically,

    genes corresponding to enzymes implicated in glycolysis are hexokinase (HK), glucose

    phosphate isomerase (or phosphoglucose isomerase, GPI or PGI), phosphofructokinase

    (PFK), aldolase (ALDOB and ALDOC), triosephosphate isomerase (TPI1), phosphoglyc-

    erate kinase 1 (PGK1), phosphoglycerate mutase (PGAM), pyruvate kinase (PKLR), and

    lactate dehydrogenase C (LDH) (terminal) [100]. PKLR catalyzes the transphosphory-

    lation of phosphoenolpyruvate into pyruvate and ATP, which is the rate-limiting step

    of glycolysis. LDH catalyzes the terminal step in glycolysis. Genes that code for en-

    zymes involved in gluconeogenesis are phosphoenolpyruvate carboxykinase 1 (PCK or

    PEPCK), glucose-6-phosphatase (G6PC), pyruvate carboxylase (PC) and

    fructose-1,6-bisphosphatase 1 (FBP1) [99–101]. In the gene sets related to carbohydrate metabolism,

    genes such as ALDOB, ALDOC, PKLR, PFK, PGK1, and GPI that are involved in glycoly-

    sis and genes such as G6PC, FBP1, PCK1, and PC, which are involved in gluconeogenesis,

    were up-regulated in CS samples.

    Urea Production

    [Color legend for up-regulated q-values, with color boundaries at q = 1, 0.2, 0.05, 0.01, 0.001, and 0.0001.]

    Figure 2.6: Up-regulated gene sets involved in urea production. The legend shows the q-value ranges for each color. Abbreviations: GO, Gene Ontology; BP,

    biological process; HSA00220 UREA CYCLE for HSA00220 UREA CYCLE AND METABOLISM OF AMINO GROUPS; KEGG, Kyoto Encyclopedia of Genes and Genomes; GenMAPP, Gene Map Annotator and Pathway

    Profiler.

    In the liver, the formation of urea is a critical step in ammonia clearance. The metabolism

    of amino acids results in the formation of urea through the conversion of glutamate,

    an intermediate metabolite [48]. Gene sets involved in glutamate metabolism such as

    HSA00251 GLUTAMATE METABOLISM are gradually up-regulated over time in CS cul-

    tures (Figure 2.6). Urea is formed as a result of the action of five enzymes: carbamoyl

    phosphate synthetase-1 (CPS-1), ornithine transcarbamoylase (OTC), argininosuccinate

    [Pathway diagram of the urea cycle. Metabolites shown: glutamine, glutamate, NH4+, carbamoyl phosphate, citrulline, aspartate, argininosuccinate, arginine, ornithine, and urea. Enzymes shown: glutaminase, glutamate dehydrogenase, carbamoyl phosphate synthase-1, ornithine transcarbamoylase, argininosuccinate synthase, argininosuccinate lyase, and arginase. Gene sets keyed in the diagram: NITROGEN_COMPOUND_METABOLIC_PROCESS, NITROGEN_COMPOUND_CATABOLIC_PROCESS, HSA00251_GLUTAMATE_METABOLISM, and HSA00220_UREA_CYCLE_AND_METABOLISM_OF_AMINO_GROUPS.]

    Figure 2.7: Pathway for urea production that shows gene sets involved in this process.

    synthase (ASS), argininosuccinate lyase (ASL), and arginase (ARG) [102–107] (Figure 2.7).

    The genes encoding these five enzymes are present in the gene set HSA00220 UREA CYCLE AND METABOLISM

    OF AMINO GROUPS. This gene set is up-regulated in CS cultures over the 8-day period.

    In addition, the gene sets NITROGEN COMPOUND CATABOLIC PROCESS and NI-

    TROGEN COMPOUND METABOLIC PROCESS include genes such as ASL, ARG and

    ASS. Both gene sets are also monotonically up-regulated in CS cultures. The nuclear re-

    ceptor HNF-4α (a member of the up-regulated gene set NUCLEAR RECEPTORS) plays

    an important role in triggering the transcription of key enzymes for urea production [102].

    Together, these data help explain why urea production is stable in CS cultures

    but not in HMs.

    2.5.4 Mono-oxygenases are initially not differentially expressed but re-

    cover after day 3 in CS cultures

    CS vs. HM q-values
    1d      2d      3d      8d      Set name                             Description
    0.03    0.01    6e-03   7e-05   MONOOXYGENASE_ACTIVITY               (GO MF) integration of one oxygen atom into a compound
    1.00    0.20    2e-03   1e-03   HSA00980_METABOLISM_OF_XENOBIOTICS   metabolism of xenobiotics by cytochrome P450

    [Color legend for up-regulated q-values, with color boundaries at q = 1, 0.2, 0.05, 0.01, 0.001, and 0.0001.]

    Figure 2.8: Gene sets that show recovery after day 3. We show the q-values to underscore the recovery. MF, molecular function.

    Xenobiotic metabolism in the liver is mediated through cytochrome P450 enzymes [108–

    111]. Expression of these enzymes has been shown to decrease upon the isolation of

    hepatocytes from the liver [110, 111]. The gene set MONOOXYGENASE ACTIVITY contains

    several cytochrome P450 genes and flavin-containing monooxygenase. This gene

    set had q-values of 0.03, 0.01, 6 × 10^-3, and 7 × 10^-5 at days 1, 2, 3, and 8, respectively

    (Figure 2.8). The q-values at day 1 and day 2 were nearly identical but decreased by an

    order of magnitude at day 3 and by another order of magnitude at day 8. These trends

    were examined further by computing the q-values for this gene set in the CS-CS con-

    trasts. This gene set was up-regulated with q-value 0.13 for the day 8-day 1 contrast,

    0.11 for the day 8-day 2 contrast, and 1.00 for the day 8-day 3 contrast. The statistical

    significance values for corresponding contrasts among the HM samples were also com-

    puted. This gene set was down-regulated in all three contrasts. The q-values were 0.13,

    0.06, and 0.13, respectively. Thus, the variation in expression of this gene set arises from a

    combination of up-regulation in CS cultures and down-regulation in HM cultures. Taken

    together, these trends indicate that the genes in this set recovered or became up-regulated

    at day 3 and later within CS cultures, in comparison to HM cultures. This trend is of

    significance since genes in this set include those that encode the CYP3A, CYP4A, CYP1A,

    and CYP2C enzymes. The CYP3A and CYP4A enzymes metabolize a wide range of pharmaceuticals

    and drugs, and the CYP2C and CYP1A enzymes break down toxins and xenobiotics.

    The gene set HSA00980 METABOLISM OF XENOBIOTICS BY CYTOCHROME P450,

    which contains cytochrome P450s, phase II metabolizing enzymes such as

    UDP-glucuronosyltransferase (UDP-GT) isoforms, and glutathione S-transferase (GST), also

    exhibited a similar trend. It showed no significant regulation at day 1, had a q-value of

    0.20 at day 2, but had q-values two orders of magnitude lower at days 3 and 8 (2 × 10^-3 and

    1 × 10^-3, respectively). In the three CS-CS contrasts, the q-values for this gene set were

    0.12, 0.05, and 0.13 (all for up-regulation), while they were 0.1, 0.12, and 1 in the HM-HM

    contrasts (all for down-regulation). In a previous study, the expression of a single CYP

    enzyme, specifically CYP1A1, was monitored and was shown to “recover” on day 3 [112].

    However, additional time points or CYP enzymes were not investigated. The results in

    this work point to a more widespread recovery phenomenon among the CYP gene family.

    [Heat map of q-values for the CS vs. HM contrasts at 1d, 2d, 3d, and 8d; the rows are listed below.]

    Category            Set name                          Description
    Cell cycle          MITOTIC_CELL_CYCLE                (GO BP) participation in eukaryotic cell cycle events
                        CELL_CYCLE_KEGG                   KEGG cell cycle pathway
    Nuclear transport   NUCLEOCYTOPLASMIC_TRANSPORT       (GO BP) movement of molecules between nucleus and cytoplasm
                        PROTEIN_IMPORT_INTO_NUCLEUS       (GO BP) protein transport into nucleus
    Cell replication    MICROTUBULE_CYTOSKELETON          (GO CC) microtubules of the cytoskeleton
                        MICROTUBULE_ORGANIZING_CENTER     (GO CC) region of microtubule growth
                        SPINDLE                           (GO CC) microtubule array for segregating duplicated chromosomes
                        CENTROSOME                        (GO CC) centriole and spindles

    [Color legend for down-regulated q-values, with color boundaries at q = 1, 0.2, 0.05, 0.01, 0.001, and 0.0001.]

    Figure 2.9: Down-regulated gene sets. The legend shows the q-value ranges for each color. CC, cellular component.

    2.5.5 Cell-cycle activity decreases significantly in CS cultures

    Analysis of the down-regulated gene sets offered insights into the cellular response

    within CS and HM systems. Our results suggested a significant difference in cell

    cycle activity between the HM and CS samples over the 8-day culture period. The monotonic

    down-regulation of the gene sets MITOTIC CELL CYCLE and CELL CYCLE KEGG

    coupled with insignificant q-values in the CS-versus-CS comparisons (data not shown)

    suggested decreasing cell cycle activity within the CS cultures (Figure 2.9). Nuclear trans-

    port and import functions show decreased activity within CS samples as indicated by the

    monotonically down regulated gene sets NUCLEOCYTOPLASMIC TRANSPORT a