GENOMIC CONVERGENCE ASSOCIATION STUDIES OF
EXPRESSION AND AGING IN THE HUMAN KIDNEY
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF GENETICS
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Heather Elizabeth Wheeler
March 2010
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/ss495bw5478
© 2010 by Heather Elizabeth Wheeler. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Stuart Kim, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Gregory Barsh
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Anne Brunet
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Hua Tang
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
Although family studies have shown that genes play a role in longevity and
tissue aging, it has proven difficult to identify the specific genetic variants involved.
Kidneys age at different rates, such that some people show little or no effects of aging
whereas others show rapid functional decline. We developed a sequential
transcriptional profiling and expression quantitative trait loci (eQTL) mapping
approach known as genomic convergence to find genes associated with aging in the
kidney. We first performed whole-genome transcriptional profiling to find 630 genes
that change expression with age in the kidney. Next, we used two methods to
determine which of these age-regulated genes are eQTLs, which means they contain
SNPs whose alleles associate with expression level. We found that 101 of the age-
regulated genes are eQTLs. We also found that the allele-specific eQTL detection
method, which compares the mRNA levels of the two alleles within heterozygous
individuals, was more sensitive than the total expression method in detecting allelic
expression differences. We tested the eQTLs for association with kidney aging,
measured by glomerular filtration rate (GFR) using combined data from the Baltimore
Longitudinal Study of Aging (BLSA) and the InCHIANTI study. We found a SNP
association (rs1711437 in MMP20) with kidney aging (uncorrected p = 3.6 × 10⁻⁵,
empirical p = 0.01) that explains 1-2% of the variance in GFR among individuals.
The results of this sequential analysis may provide the first evidence for a gene
association with kidney aging in humans. Our approach of combining both expression
and genotype data can be applied to any phenotype of interest to increase the power to
find genetic associations.
Acknowledgements
I am thankful for the opportunity to pursue my Ph.D. in genetics at Stanford
University. I am grateful to my advisor, Dr. Stuart Kim, for allowing me to join his
lab and study the genetics of human aging. His encouragement and guidance helped
me complete this dissertation. I appreciate all of his work securing collaborations in
order for me to gather the data I needed for this project. Whether discussing my
project, the latest discovery in the aging field or whether the Twins and/or Red Sox
were going to make the playoffs, my talks with Stuart were always entertaining, and
often educational.
This dissertation would not have been possible without the help of multiple
collaborators. Thanks to the donors for providing kidney tissue. John Higgins of the
Stanford Pathology Department and Janet Bueno of the Stanford Tissue Bank helped
me obtain kidney samples. Rick Myers, Devin Absher, Jun Li, Shannon Brady and
Amita Aggarwal from the Stanford Human Genome Center assisted with Illumina
genotyping assays. Ron Davis and Julie Wilhelmy from the Stanford Genome
Technology Center provided technical assistance for the Affymetrix expression
microarrays. Luigi Ferrucci, Jeff Metter and Toshiko Tanaka from the National
Institute on Aging provided genotype and phenotype data from the Baltimore
Longitudinal Study of Aging and the InCHIANTI Study. I also thank the members of
my thesis committee for their invaluable guidance and support: Greg Barsh, Anne
Brunet and Hua Tang. Thank you to John Boothroyd for serving as the chair of my
thesis committee.
I would like to thank the members of the Kim Lab who have worked alongside
me. I have learned much from such a diverse group of people who work on such
diverse projects. First, I thank Jacob Zahn, Graham Rodwell, Rebecca Sonu and
Emily Crane, who performed the initial transcriptional studies from which my
dissertation stems. I also thank two undergraduate researchers who assisted me with
my project, Kaeley Anderson and Mandy Kovach. In addition, I thank Jesse
Karmazin, who will be extending my project. I thank the lab’s “computational
people” for useful programming and statistical tips: Lucy Southworth, Sarah
Kummerfeld and Sarah Aerni. I am grateful to Flo Pauli and Yelena Budovskaya for
making life inside and outside of the lab more fun. And I thank the rest of the “worm
people”: Kendall Wu, Min Jiang, Xiao Liu, Adolfo Sanchez-Blanco, Xiao Xu, Eric
Van Nostrand, Dror Sagi, Cindie Slightham, and Biff Mann. Thanks to you, I feel like
I have earned a second Ph.D. in C. elegans aging. Someday, I will look back fondly
on our marathon practice talks and smile about irregular balls and sheep clouds.
I must thank the following graduate school friends for their support both in lab
and life matters: Tovi Anderson, Jason Hoyt, Alayne Brunner, Rayka Yokoo, Yuya
Kobayashi, Max Banko, Alyssa Wright, Matt Hill and Dasha Glazer. I also would like
to thank my fellow members of the departmental softball team, the Lethal Yellows, for
some good times spent outside the lab over the years. I feel fortunate to have been a
member of the Genetics Department at Stanford, which provided me a friendly and
supportive environment to grow as a scientist.
I am also grateful for several teaching experiences I received while at Stanford.
I designed and taught two semesters of genetics at the University of California
Berkeley Extension. Through this endeavor I gained valuable experience teaching a
class on my own. I thank Jim Ford for the opportunity to be a TA in his Human
Genetics class for medical students at Stanford. I also thank Katherine Moser and
Stuart Kim for allowing me to help design and implement four laboratory projects in
the AP Biology class at Gunn High School in Palo Alto.
Without my family, I would not have been able to accomplish so much. To my
parents, Terry and Tracy Wheeler, thank you for your love and support. To my sister,
Jamie Wheeler, thank you for your love and friendship. And to Marty Gabel, thank
you for being my loving partner and continuing this journey with me.
Table of Contents
Abstract
Acknowledgements
List of Tables
List of Figures
Chapter 1: Introduction
Genetics of Human Longevity
Genetics of Kidney Aging
Genomic Convergence
Genome-wide Transcriptional Profile of Kidney Aging
Significance and Dissertation Content
Chapter 2: Identification of eQTLs by Total Expression Analysis
Background
Results
Ancestry Analysis
Total Expression QTL Analysis
Discussion
Methods
Stanford Kidney Samples
RNA and DNA Preparation
SNP Selection
Genotyping
Total Expression Quantification
Ancestry Analysis
Total Expression Regression Models
Chapter 3: Identification of eQTLs by Allele-Specific Expression Analysis
Background
Results
Discussion
Methods
Stanford Kidney Samples
RNA and DNA Preparation
SNP Selection
Genotyping
Allele-Specific Expression Quantification
Chapter 4: Genetic Association with Kidney Aging
Background
Results
Discussion
Methods
BLSA Samples
InCHIANTI Samples
Glomerular Filtration Rate Regression Models
Testing for Evidence of SNP Association with GFR in Both Datasets
Permutation Analysis
Chapter 5: Conclusions
Summary and Discussion of Findings
Future Directions for Human Aging Genomics
References
List of Tables
Chapter 2
Table 2.1: SNPs that associate with total gene expression level
Table 2.2: Common total expression associations across studies
Table 2.3: The probability of the genotype data at K = 1-7
Table 2.4: Mean variance of the percent ancestry in each cluster
Chapter 3
Table 3.1: eQTLs identified by allele-specific expression analysis
Table 3.2: Common allele-specific expression across studies
Chapter 4
Table 4.1: Characteristics of kidney aging study samples
Table 4.2: Top SNPs that show association with kidney aging
List of Figures
Chapter 1
Figure 1.1: The genomic convergence approach
Figure 1.2: A transcriptional profile of kidney aging
Figure 1.3: Chronicity index examples
Figure 1.4: Common signature for aging
Chapter 2
Figure 2.1: Total expression analysis
Figure 2.2: Estimated genetic ancestry
Chapter 3
Figure 3.1: Assay for allele-specific expression
Figure 3.2: Distribution of allele-specific expression
Figure 3.3: Distribution of mean allelic fold changes
Figure 3.4: Allele-specific expression analysis
Figure 3.5: Allele-specific expression eQTL characteristics
Figure 3.6: Comparison of eQTL methods at one locus
Chapter 4
Figure 4.1: A SNP in MMP20 associates with a kidney aging phenotype
Figure 4.2: Linkage disequilibrium pattern of MMP20
Chapter 1: Introduction
Portions of this chapter will be submitted for publication in
Philosophical Transactions of the Royal Society B: Biological Sciences (2010) with the following authors:
Heather E. Wheeler¹ and Stuart K. Kim¹,²
¹Department of Genetics, Stanford University Medical Center, Stanford, CA, USA
²Department of Developmental Biology, Stanford University Medical Center, Stanford, CA, USA
Genetics of Human Longevity
Aging trajectories vary among individuals. Both the age at which
physiological function begins to decline and the rate of such decline vary among
individuals. The heritability of human longevity ranges from 0.23-0.33, but little is
known about specific genes that affect the rate of aging or human lifespan (Herskind
et al., 1996; McGue et al., 1993; Mitchell et al., 2001). Because most organisms do
not escape predation and infection to reach old age in the wild, aging itself is not under
strong natural selection (Kirkwood, 1997). Instead, aging is probably an unregulated
side effect caused by the failure of natural selection to maintain function at the later
ages that few individuals reach in the wild (Partridge and Gems, 2002). If mutations
arise that cause deleterious effects late in life, whether they are neutral or beneficial
early in life, there is little or no selection to eliminate them from the population
(Hamilton, 1966; Kirkwood, 1997; Partridge, 2010; Williams, 1957). Because such
mutations will accumulate over time, aging is likely a highly polygenic trait and the
mechanisms involved may not be conserved among species.
Despite these evolutionary predictions, studies in model organisms have
revealed that mutations in single genes can extend lifespan. For example, mutations in
insulin or insulin-like signaling pathway genes have been shown to extend lifespan in
Caenorhabditis elegans (Kenyon et al., 1993), Drosophila melanogaster (Clancy et
al., 2001; Tatar et al., 2001) and mice (Bluher et al., 2003; Holzenberger et al., 2003).
Also, overexpression of the SIR2 deacetylase extends lifespan in Saccharomyces
cerevisiae (Kaeberlein et al., 1999), C. elegans (Tissenbaum and Guarente, 2001), and
Drosophila (Rogina and Helfand, 2004). These examples demonstrate that
evolutionary conservation is present in some molecular pathways of aging. Therefore,
aging genes found across model organisms are good candidates to test for association
with lifespan in humans with the caveat that any pathways unique to human aging will
be missed.
The average lifespan in the US is 77 years, and only one in 10,000 individuals
survives to age 100 (Atzmon et al., 2006). Thus, enrichment in the frequency of certain
alleles in centenarians probably reflects a selection effect that increases the likelihood
of survival. The genetic influence on reaching extreme old age may be even greater
than its influence on average lifespan. Siblings of centenarians have an 8- to 17-fold
greater relative risk of surviving to age 100 (Perls et al., 2002). Most of the work
comparing the genotypes of long-lived individuals to average-aged individuals has
taken place in candidate genes. For example, the ε4 allele of apolipoprotein E
(APOE), which is well known to increase risk of Alzheimer’s disease and
cardiovascular disease, is found in significantly lower proportions of nonagenarians
and centenarians (Corder et al., 1993; Kervinen et al., 1994; Schachter et al., 1994).
Conversely, the ε2 allele is enriched in long-lived individuals and may offer a
protective effect (Kervinen et al., 1994; Schachter et al., 1994). A study of 35
additional genes related to cardiovascular disease found that an allele in apolipoprotein
C3 (APOC3) is enriched in centenarians and their offspring compared to average-aged
controls in an Ashkenazi Jewish population (Atzmon et al., 2006).
In addition to testing genes known to be associated with age-related diseases
for association with longevity, genes known to promote longevity in model organisms
have also been examined in human populations. Mutations in insulin or insulin-like
signaling pathway genes have been shown to extend lifespan from C. elegans (Kenyon
et al., 1993) to mice (Bluher et al., 2003; Holzenberger et al., 2003). An
overrepresentation of rare, functionally significant insulin-like growth factor I receptor
(IGF1R) mutations has been observed in centenarians (Suh et al., 2008). More
convincingly, the forkhead box O3A (FOXO3A) transcription factor, which is
regulated by the insulin/IGF1 signaling pathway, contains alleles associated with
longevity in both Asian and European populations (Flachsbart et al., 2009; Li et al.,
2009; Pawlikowska et al., 2009; Willcox et al., 2008). In a male population of
Japanese descent, the odds ratio for homozygous minor vs. homozygous major alleles
for SNP rs2802292 in FOXO3A between the long-lived individuals (≥95 years) and
controls was 2.75 (Willcox et al., 2008). In two replication studies including both
sexes, the odds ratios of the corresponding alleles were 1.26 for a German population
and 1.36 for a Chinese population (Flachsbart et al., 2009; Li et al., 2009). These
enriched alleles may promote better health and contribute toward extended lifespan.
Other aging genes found to affect model organism longevity have been
examined in humans. SIRT3 is a human homolog of SIR2, the deacetylase that, when
overexpressed, extends lifespan in yeast, worms and flies (Kaeberlein et al., 1999;
Rogina and Helfand, 2004; Tissenbaum and Guarente, 2001). An intronic VNTR
allele in SIRT3 was significantly depleted in a population of male Italian
nonagenarians and centenarians (Bellizzi et al., 2005). Mice with mutations in the klotho
gene exhibit a syndrome resembling human aging, including atherosclerosis,
osteoporosis, emphysema, and infertility (Kuro-o et al., 1997). In humans, a
heterozygous advantage for longevity was observed at the KLOTHO locus in
Ashkenazi Jews (Arking et al., 2005). Unfortunately, attempts to replicate the SIRT3,
KLOTHO and APOC3 findings in additional long-lived populations have been
unsuccessful (Lescai et al., 2009; Novelli et al., 2008). The only two genes associated
with human longevity that have been replicated in multiple populations are APOE and
FOXO3A (Corder et al., 1993; Flachsbart et al., 2009; Kervinen et al., 1994; Li et al.,
2009; Pawlikowska et al., 2009; Schachter et al., 1994; Willcox et al., 2008). The
effect sizes of these two genes are small (odds ratios ranging from 1.26 to 1.45 in
replicate studies) and thus much of the heritability of longevity remains to be
explained (Flachsbart et al., 2009; Li et al., 2009; Nebel et al., 2005).
A portion of the mechanisms that contribute to longevity in humans over an
~80 year lifespan may be very different from those that affect longevity in mice that
have a 2-3 year lifespan. Unlike most other biological processes, the genetic factors
that influence aging may not be evolutionarily conserved because wild animals usually
die from predation and infection, not aging (Kirkwood and Austad, 2000). Since
animals do not usually live long enough to grow old in the wild, mutations that cause
damage in old age would not be selected against and would thus accumulate over time
(Harman, 1956). Since processes that occur late in life are under little or no natural
selection, some mutations that affect physiology late in life may be species-specific
and there is little reason to expect that all mutations that cause aging should be
conserved from rodents to humans. Thus, a troubling aspect of aging research is that animal
models may have limited relevance to human aging. Therefore, an unbiased
approach to searching for genes that influence human aging is also necessary.
A genome-wide linkage scan of human longevity using long-lived siblings
identified a locus on chromosome 4 (Puca et al., 2001). Fine mapping of this 12 Mb
locus revealed an association between microsomal transfer protein (MTP) and human
lifespan (Geesaman et al., 2003). However, this finding could not be replicated in
additional populations, which highlighted the population structure problems that can
arise when the case-control design is used as a means to map longevity genes in
humans (Nebel et al., 2005). Genome-wide association studies usually test hundreds of
thousands to millions of single nucleotide polymorphisms (SNPs) across the genome
for association with a particular trait (Kottgen et al., 2009; Weedon et al., 2008;
WTCCC, 2007). Currently, no genome-wide association studies comparing
centenarians or other long-lived individuals to average-aged individuals have been
published. This is likely because datasets of centenarians and well-matched controls
have not yet been large enough to detect the expected associations, which probably involve
either small effect sizes or rare alleles. Another important concern is that while
case-control studies of centenarians may find global contributors to aging, they may
miss specific contributors to aging of a particular organ or tissue.
Genetics of Kidney Aging
We chose to identify genes that associate with a focused phenotype of aging
rather than the nonspecific phenotype of living to age 100. Important human aging
molecular pathways may be more easily found by examining physiological aging in
particular organs or tissues. Because tissues age at different rates and because the
presence of disease varies immensely among individuals, humans become increasingly
different from each other with age. Thus, chronological age fails to provide an
accurate indicator of the aging process. More important than simply reaching age 100
is the health of the individual. Physiological age serves as an indicator of an
individual’s general health status in a particular organ or tissue and may also serve as
an indicator of remaining healthy lifespan (Borkan and Norris, 1980). Measuring how
a tissue changes with chronological age in a population can identify biomarkers that
can be used as an index of physiological age (Karasik et al., 2005). The biomarker can
then be used to determine if an individual is physiologically younger or older than his
or her chronological age. Determining genes and pathways that associate with the
measures of physiological age can reveal molecular processes important for the aging
of particular tissues. Studies in different tissues can be compared to find specific and
common regulators of aging. Only common regulators are likely to be revealed by
centenarian studies.
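The biomarker idea above can be made concrete with a small Python sketch. It assumes, purely for illustration, that a biomarker changes linearly with chronological age across a population; the fitted trend is then inverted to convert one individual's biomarker value into a "physiological age" that can be compared with chronological age. The function names and all data values are hypothetical and do not come from any study cited here.

```python
# Hypothetical sketch: turn a biomarker value into a "physiological age" by
# inverting a population-level linear trend of the biomarker on chronological age.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def physiological_age(value, intercept, slope):
    """Age at which the population trend predicts this biomarker value."""
    return (value - intercept) / slope


# Illustrative population in which the biomarker falls by 1 unit per year of age.
ages = [30, 40, 50, 60, 70, 80]
biomarker = [100, 90, 80, 70, 60, 50]
intercept, slope = fit_line(ages, biomarker)

# A 60-year-old whose biomarker (55) matches the population trend at age 75.
phys_age = physiological_age(55, intercept, slope)
age_gap = phys_age - 60  # positive: physiologically older than chronological age
print(f"physiological age = {phys_age:.1f}, gap = {age_gap:+.1f} years")
# prints: physiological age = 75.0, gap = +15.0 years
```

Inverting a single trend line is only meaningful when the slope is clearly nonzero; combining several biomarkers would call for a multivariate model instead of one line.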
Specifically, we examined aging in the kidney, an organ that shows an
objectively quantifiable decline in function with age. Overall, the kidney gets smaller,
particularly in the cortex, and kidney function begins to decline after age 40-50
(Gourtsoyiannis et al., 1990; Lindeman and Goldman, 1986). Old kidneys show an
involution and thinning of the renal cortical cells, increased renal vascular resistance,
reduced renal plasma flow and increased filtration fraction (Fliser et al., 1993;
Lindeman and Goldman, 1986). Furthermore, there is relatively little cell turnover
compared to other organs, such as the bone marrow, which continuously generates blood
cells, so that kidney aging reflects post-mitotic tissue changes rather than changes in
cell proliferation capacity (Lindeman et al., 1985; Silva, 2005a, b). These age-related
changes in the kidney can be assessed and quantified with relative ease, which makes
the study of kidney aging particularly tractable for quantitative genetic analysis.
The four main compartments of the kidney (arteries, tubules, interstitium and
glomeruli) each show changes in morphology and function with age. Changes that
occur in one part of the kidney affect other parts of the kidney in a complex and
dynamic fashion. The renal arteries become rigid with age due to fibrous thickening of
the arterial interior. Aging causes structural changes in the tubules and interstitium.
The tubules begin to atrophy with age, and decrease in length and number. Aging
results in an increase in interstitial volume and interstitial fibrosis. These changes in
tubules and interstitium decrease the ability of the kidney to conserve or excrete NaCl
(Epstein and Hollenberg, 1976; Schmidt et al., 2001), excrete an ammonium acid load
(Adler et al., 1968) and maintain other electrolyte balances (Faubert, 1998).
The glomeruli are ball-shaped structures in the kidney composed of capillary
blood vessels actively involved in the filtration of the blood to form urine. An
increasing fraction of glomeruli show global sclerosis with age (Kaplan et al., 1975;
Kasiske, 1987; Kincaid-Smith, 1991; Li et al., 2002; Marcantoni et al., 2002;
Neugarten et al., 2002). The remaining glomeruli show compensatory enlargement of
their capillary tufts (Goyal, 1982; Newbold et al., 1992). The rate at which blood is
filtered through all of the glomeruli, and thus the measure of the overall renal function,
is the glomerular filtration rate (GFR). The major aging phenotype in the kidney is a
25% decline in GFR starting at age 40 (Hoang et al., 2003; Lindeman et al., 1985;
Lindeman et al., 1984; Rowe et al., 1976a; Rowe et al., 1976b).
Individuals show variable rates of kidney aging. In one longitudinal study, one
third of individuals showed no decrease in GFR measured over a 20-year period,
whereas the remainder of the population showed a distinct decline (Lindeman et al.,
1985). For those individuals who showed a significant decline in GFR, the slope of
the decrease varied widely (Lindeman et al., 1985). The heritability of GFR is
estimated to be 0.40-0.46 (Fox et al., 2004; Hunt et al., 2004). One possible method to
search for genes that associate with kidney aging is a genome-wide association study
of GFR. This approach is unbiased, but when hundreds of thousands of SNPs are
tested, the multiple hypothesis testing penalty is high. We chose a more focused
approach that combines multiple types of genomic information, including gene
expression data, in order to limit our GFR association test to SNPs more likely to be
functional. This allowed us to perform our analysis of kidney aging using a smaller
sample size than is required for a genome-wide association study. Our approach is
called genomic convergence.
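To make the multiple-testing argument concrete, the sketch below computes Bonferroni-corrected per-test significance thresholds. The SNP counts (500,000 for a genome-wide panel and 500 for a focused candidate set) are hypothetical round numbers chosen for illustration, not the counts tested in this dissertation, and Bonferroni is used only as the simplest correction to show the scaling.

```python
# Bonferroni arithmetic with hypothetical SNP counts (illustration only).
ALPHA = 0.05  # target family-wise error rate


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


genome_wide = bonferroni_threshold(500_000)  # assumed genome-wide SNP panel size
focused = bonferroni_threshold(500)          # assumed expression-prioritized SNP set

print(f"genome-wide cutoff: {genome_wide:.1e}")  # 1.0e-07
print(f"focused cutoff:     {focused:.1e}")      # 1.0e-04
print(f"focused cutoff is {focused / genome_wide:.0f}x more lenient")
```

Under these assumed counts, the focused set tolerates a per-test p-value 1,000 times larger than the genome-wide panel at the same family-wise error rate, which is the power gain that motivates restricting the association test to SNPs more likely to be functional.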
Genomic Convergence
In genome-wide association studies the penalty for multiple hypothesis testing
is a large obstacle to overcome. A powerful alternative to genome-wide association
studies is genomic convergence, which selects candidate genes for a specific
phenotype based on genome-wide expression studies (Hauser et al., 2003; Le-
Niculescu et al., 2007; Liang et al., 2009; Mudge et al., 2008; Noureddine et al., 2005;
Oliveira et al., 2005). Differential expression between cases and controls may indicate
that the gene is functionally involved in disease pathogenesis. Gene expression
microarrays can be used to identify expression increases or decreases in affected
individuals compared to controls, and then SNPs within the genes that change
expression can be used as candidates in genetic association studies. This approach
scans the entire genome for expression changes associated with a disease in order to
prioritize genes with a greater chance of contributing to the disease phenotype.
Genomic convergence has previously been used to identify genes associated with Parkinson’s
disease, schizophrenia, and Alzheimer’s disease (Hauser et al., 2003; Le-Niculescu et
al., 2007; Liang et al., 2009; Mudge et al., 2008; Noureddine et al., 2005; Oliveira et
al., 2005).
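The core filtering step of genomic convergence, keeping only the SNPs that fall in genes flagged by an expression screen, amounts to a set-membership test. The sketch below assumes a precomputed SNP-to-gene annotation and uses made-up gene and SNP identifiers; in practice, the rule for assigning a SNP to a gene (for example, how much flanking sequence to include) is a separate design choice not modeled here.

```python
# Hypothetical sketch of the genomic convergence filter: keep only SNPs that lie
# in genes flagged by an expression screen. All identifiers are made up.
from typing import Dict, List, Set


def convergent_snps(
    differentially_expressed: Set[str],
    snp_to_gene: Dict[str, str],
) -> List[str]:
    """Return SNP IDs whose annotated gene is in the expression-selected set."""
    return sorted(
        snp for snp, gene in snp_to_gene.items() if gene in differentially_expressed
    )


de_genes = {"GENE_A", "GENE_C"}  # genes whose expression differs between groups
snp_annotation = {               # SNP -> gene it falls in (hypothetical map)
    "snp1": "GENE_A",
    "snp2": "GENE_B",
    "snp3": "GENE_C",
    "snp4": "GENE_C",
    "snp5": "GENE_D",
}

candidates = convergent_snps(de_genes, snp_annotation)
print(candidates)  # ['snp1', 'snp3', 'snp4']
```

Only the surviving candidates would then be carried forward to the association test, which is what shrinks the multiple-testing burden.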
We have extended the genomic convergence approach to find genes associated
with kidney aging by adding expression quantitative trait loci (eQTL) mapping after
the initial genome-wide transcriptional analysis. The eQTL analyses tested SNPs for
association with gene expression level. If a gene is functionally involved in kidney
aging and if DNA differences in the gene cause variation in expression among
individuals, then there may be an association between the specific allele carried by an
individual and that individual’s physiological aging trajectory. Finally, we tested the
set of eQTLs for association with kidney aging in two studies of normal aging, the
Baltimore Longitudinal Study of Aging and the InCHIANTI study. A schematic of
our genomic convergence method is shown in Figure 1.1.
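One common way to test a SNP for association with total expression level is to regress expression on allele dosage (0, 1 or 2 copies of one allele). The Python sketch below assumes a simple additive model with no covariates, which is a simplification (an analysis of real samples would typically adjust for factors such as ancestry). The two-sided p-value uses a standard closed-form series for the Student's t distribution with an integer number of degrees of freedom. The genotypes and expression values are invented.

```python
import math


def t_two_sided_p(t, df):
    """Exact two-sided p-value for Student's t with an integer number of degrees of freedom."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    acc = 0.0
    if df % 2 == 1:
        term = c
        for j in range((df - 1) // 2):
            acc += term
            term *= c * c * (2 * j + 2) / (2 * j + 3)
        inside = (2 / math.pi) * (theta + s * acc)  # P(|T| < |t|)
    else:
        term = 1.0
        for j in range(df // 2):
            acc += term
            term *= c * c * (2 * j + 1) / (2 * j + 2)
        inside = s * acc  # P(|T| < |t|)
    return max(0.0, 1.0 - inside)


def eqtl_test(dosage, expression):
    """Regress expression on allele dosage (0, 1 or 2 copies); return slope, r^2, p."""
    n = len(dosage)
    mg, me = sum(dosage) / n, sum(expression) / n
    sxx = sum((g - mg) ** 2 for g in dosage)
    sxy = sum((g - mg) * (e - me) for g, e in zip(dosage, expression))
    syy = sum((e - me) ** 2 for e in expression)
    slope = sxy / sxx          # expression change per extra copy of the allele
    sse = syy - slope * sxy    # residual sum of squares
    r2 = 1.0 - sse / syy       # fraction of expression variance explained
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, r2, t_two_sided_p(slope / se, n - 2)


# Hypothetical SNP: 12 individuals, 4 of each genotype, expression rising with dosage.
dosage = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
expression = [5.0, 5.2, 4.9, 5.1, 5.6, 5.4, 5.5, 5.7, 6.0, 6.2, 5.9, 6.1]
slope, r2, p = eqtl_test(dosage, expression)
print(f"slope = {slope:.2f} per allele, r^2 = {r2:.2f}, p = {p:.1e}")
```

The additive dosage coding is one modeling choice among several; dominant or recessive codings would replace the 0/1/2 values with 0/1 indicators.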
Genome-wide Transcriptional Profile of Kidney Aging
Our genomic convergence study to find genes associated with kidney aging
began with genome-wide transcriptional profiling of normal kidney tissue from 74
individuals, aged 27-92 years (Rodwell et al., 2004). This study stands out because it
includes one of the largest sample sizes of any study of human aging
performed to date, which gives it high sensitivity for detecting age-related expression changes. Gene expression
levels were measured in both kidney cortex and medulla samples using Affymetrix
microarrays. Linear regression was used to determine whether the expression of each gene
increases or decreases with age (i.e., has a significantly positive or negative slope;
p<0.001), and 447 age-regulated genes in the kidney were identified (Rodwell et al.,
2004; Figure 1.2).
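The age-regulation test described above can be sketched in Python as follows. This is a minimal, single-predictor version that assumes one gene's log2 expression values and the donors' ages as plain lists. It omits any covariates the original analysis may have included, and the helper names (`age_slope_test`, `classify_gene`) are hypothetical. To avoid external dependencies, the two-sided p-value of the slope's t statistic is computed from the regularized incomplete beta function with the standard continued-fraction method.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t: I_{df/(df+t^2)}(df/2, 1/2)."""
    return _betai(0.5 * df, 0.5, df / (df + t * t))


def age_slope_test(ages, log2_expr):
    """Least-squares fit of log2 expression on age for one gene.

    Returns (slope, t statistic, two-sided p-value) with n - 2 df.
    """
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(log2_expr) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, log2_expr))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(ages, log2_expr))
    if rss == 0.0:  # exact fit: infinite t unless the slope itself is zero
        return slope, (math.copysign(math.inf, slope) if slope else 0.0), (0.0 if slope else 1.0)
    se = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se
    return slope, t, t_two_sided_p(t, n - 2)


def classify_gene(ages, log2_expr, alpha=1e-3):
    """Label a gene 'up', 'down', or None (not age-regulated at alpha)."""
    slope, _, p = age_slope_test(ages, log2_expr)
    if p >= alpha:
        return None
    return "up" if slope > 0 else "down"
```

Applied to every probe set, genes with p < 0.001 would be labeled "up" or "down" according to the sign of the fitted slope.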
In addition to marking chronological age, these 447 genes were also shown to
mark physiological age. Some people age slowly and retain kidney function into their
70s whereas others age rapidly and show a marked decline in renal function. To
measure the relative physiology of the kidney, a histological score called the
chronicity index was developed (Rodwell et al., 2004). Three scores were given to
each kidney section corresponding to the appearance of the glomeruli, the tubules, and
the arteries. Scores ranged from zero, for the normal appearance typical of young patients, to
four, for an advanced state of glomerular sclerosis, tubular atrophy/interstitial fibrosis,
or arterial intimal fibrosis (Figure 1.3). The glomerular, tubular, and arteriolar scores
were then added together to form the chronicity index ranging from zero (best) to 12
(worst).
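The chronicity index is a simple sum, which the following sketch makes explicit. It assumes integer scores from 0 to 4 for each compartment (whether intermediate scores were allowed is not stated), and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SectionScores:
    """Histological scores for one kidney section (hypothetical container)."""
    glomerular: int  # 0 = normal ... 4 = advanced glomerular sclerosis
    tubular: int     # 0 = normal ... 4 = advanced tubular atrophy/interstitial fibrosis
    arterial: int    # 0 = normal ... 4 = advanced arterial intimal fibrosis

    def chronicity_index(self) -> int:
        """Sum of the three compartment scores: 0 (best) to 12 (worst)."""
        for name in ("glomerular", "tubular", "arterial"):
            value = getattr(self, name)
            if not 0 <= value <= 4:
                raise ValueError(f"{name} score must be between 0 and 4, got {value}")
        return self.glomerular + self.tubular + self.arterial
```

For example, `SectionScores(3, 4, 3).chronicity_index()` returns 10, while a section scored zero in every compartment has a chronicity index of 0.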
For the 74 kidney samples, the physiological state of the organ was compared
to its respective gene expression profile (Figure 1.2). The gene expression profiles
were found to correlate well with chronicity index (Rodwell et al., 2004). Patients
with poor organ function for their age also had expression profiles that looked like
those from people who are much older. For example, the chronicity index of
individual number 81 was high for her age (78 years), and her kidney expression
profile was similar to those from patients who were 10 to 20 years older. Conversely,
patients with good organ function for their age tended to have expression profiles
normally associated with younger people. For instance, individual number 95 was 81
years old but had a relatively low chronicity index and an expression profile similar to those of patients 30-40 years
younger. Although the age-regulated genes were
selected solely on the basis of their change with chronological age, these results
indicate that the expression profiles also correlate with physiological aging. Thus,
some of the age-regulated genes may be functioning in the aging process, rather than
simply being markers of aging. The 447 age-regulated genes were used as candidates
in the next step of our genomic convergence approach, eQTL mapping.
In addition to the 447 age-regulated kidney genes, we included an additional
183 genes in our candidate set for eQTL mapping that stem from a study of common
gene regulation of aging across human tissues (Zahn et al., 2006). Expression profiles
that are common to aging in all tissues could reveal core mechanisms that underlie
cellular aging. Gene expression data from aging kidney (Rodwell et al., 2004), aging
muscle (Zahn et al., 2006) and aging brain (Lu et al., 2004) were compared by
analyzing the behavior of entire genetic pathways using an approach called Gene Set
Enrichment Analysis (Subramanian et al., 2005). With this approach, age-regulation
for every gene in a pathway (defined by the Gene Ontology Consortium) is combined
to generate an overall effect on regulation of the entire pathway (Ashburner et al.,
2000). This approach is more sensitive than examining genes one at a time because
significant results can be obtained from the accumulation of small changes in many
genes in a pathway. This systems biology method is especially powerful in studies of
aging because of the polygenic nature of the phenotype. Furthermore, the specific
biological processes associated with each genetic pathway provide insights into
mechanisms of aging. Of 624 gene sets defined by the Gene Ontology Consortium, the extracellular
matrix and cytosolic ribosomal gene sets were found to increase expression with age in all three
human tissues, whereas the chloride transport and electron transport gene sets significantly
decreased expression with age in those same tissues (Zahn et al., 2006; Figure 1.4). These commonly
age-regulated pathways include 152 extracellular matrix genes, 85 ribosomal genes, 35
chloride transport genes and 95 electron transport chain genes (Zahn et al., 2006). Because
some genes appear in more than one of these lists, combining the 447 age-regulated genes with the
age-regulated pathways added 183 genes, yielding a set
of 630 genes that were used as candidates in our eQTL analyses.
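To illustrate how small, coordinated changes can add up at the pathway level, the following sketch computes the weighted running-sum enrichment score described by Subramanian et al. (2005), assuming genes are ranked by their age slope with the most increased first. It is not necessarily the exact statistic or weighting used by Zahn et al. (2006), it omits the permutation test that GSEA uses to assess significance, and the function name is hypothetical.

```python
def enrichment_score(ranked_genes, ranked_slopes, gene_set, weight=1.0):
    """Weighted running-sum enrichment score (after Subramanian et al., 2005).

    ranked_genes: gene names sorted from most increased to most decreased
        with age; ranked_slopes: their age slopes in the same order.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Hits step up in proportion to |slope|^weight; misses step down equally.
    hit_norm = sum(abs(s) ** weight for s, hit in zip(ranked_slopes, hits) if hit)
    if hit_norm == 0:
        raise ValueError("all genes in the set have zero slope")
    running, best = 0.0, 0.0
    for slope, hit in zip(ranked_slopes, hits):
        running += abs(slope) ** weight / hit_norm if hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score indicates that the gene set is concentrated among genes whose expression rises with age (as for the extracellular matrix and ribosomal sets), and a negative score that it is concentrated among genes whose expression declines.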
Significance and Dissertation Content
In this dissertation, I describe our genomic convergence approach to find genes
associated with kidney aging. This is the first time such a sequence of genomic
approaches has been used to study aging in humans. We began with a genome-wide
transcriptional profile of aging in the kidney. Graham Rodwell and Jacob Zahn led
this profiling study before I arrived in the Stuart Kim laboratory. Importantly, not
only did the age-regulated genes mark chronological age, but their gene expression
profiles also correlated with kidney physiology, making the genes excellent candidates
for functioning in the aging process. This led us to our genomic convergence
approach, developed by Stuart Kim and myself. Rather than simply testing
polymorphisms in the age-regulated genes for association with kidney aging, we first
performed eQTL analyses on these candidate genes. We reasoned that if a gene is
functionally involved in kidney aging and if DNA differences in the gene cause
variation in expression among individuals, then there may be an association between
the specific allele carried by an individual and that individual’s physiological aging
trajectory. Portions of this dissertation have been published (Wheeler et al., 2009).
In Chapter 2, I describe our first method of eQTL analysis, the total expression
method. In this method, SNP genotypes are tested for association with total gene
expression level, taken from Affymetrix microarrays. In addition to the microarray
data available from Rodwell et al. (2004), I collected 26 new kidney tissue samples
and prepared the RNA for microarray analysis. I also extracted DNA from 96 kidney
samples and performed the genotyping using Illumina BeadChips. I performed all of
the statistical analyses. The total expression analysis led to the identification of 12
eQTLs in the kidney.
The subject of Chapter 3 is our second method of eQTL analysis, the allele-
specific expression method. Here, we examined heterozygotes for SNPs within a
transcript for differential expression of each allele. The cDNAs of heterozygotes were
examined for allelic transcript levels that differ from each other, using genomic DNA
allelic ratios, which reflect a true 1:1 allele ratio, as controls for hybridization intensity. I made cDNA from each of the
96 kidney samples and hybridized both cDNA and genomic DNA to separate Illumina
genotyping BeadChips. I also performed all of the statistical analyses. Using the
allele-specific expression method, we found 93 kidney eQTLs. In this chapter, I also
discuss general trends observed in the data and compare the allele-specific expression
results to the total expression results.
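The logic of the allele-specific expression comparison can be sketched as follows. This is a hypothetical illustration, not the Chapter 3 procedure: it assumes raw intensities for the two alleles in each heterozygote, expresses them as log2 allelic ratios, and compares cDNA ratios with genomic DNA ratios (which reflect a true 1:1 allele ratio plus any assay bias) using Welch's two-sample t statistic. Function names are hypothetical, and converting the statistic to a p-value would require the t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def log2_allelic_ratios(pairs):
    """Convert (allele A, allele B) intensities to log2(A/B) ratios."""
    return [math.log2(a / b) for a, b in pairs]


def allelic_imbalance(cdna_pairs, gdna_pairs):
    """Compare heterozygotes' cDNA allelic ratios with genomic DNA ratios.

    Returns (shift, welch_t, welch_df), where shift is the mean cDNA log2 ratio
    minus the mean gDNA log2 ratio (0 means no allele-specific expression).
    """
    c = log2_allelic_ratios(cdna_pairs)
    g = log2_allelic_ratios(gdna_pairs)
    shift = mean(c) - mean(g)
    vc = variance(c) / len(c)  # squared standard error of the cDNA mean
    vg = variance(g) / len(g)  # squared standard error of the gDNA mean
    t = shift / math.sqrt(vc + vg)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vc + vg) ** 2 / (vc ** 2 / (len(c) - 1) + vg ** 2 / (len(g) - 1))
    return shift, t, df
```

A shift near 1 means the A allele is expressed at roughly twice the level of the B allele relative to the genomic DNA baseline.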
Chapter 4 presents our genetic association study for kidney aging. We tested
the set of 101 eQTLs for association with kidney aging (GFR) in two studies of
normal aging, the Baltimore Longitudinal Study of Aging and the InCHIANTI study.
The SNP genotype and GFR data from these two studies were obtained from our
collaborators at the National Institute on Aging. I performed the statistical analyses
with some assistance from E. Jeffrey Metter and Toshiko Tanaka. Using this
sequential approach of genomic convergence, we were able to find SNPs in the matrix
metalloproteinase gene MMP20 that are significantly associated with kidney aging.
In Chapter 5, I present conclusions and future directions. The results of the
sequential genomic analyses presented in the previous four chapters may provide the
first evidence for a gene association with kidney aging in humans. I explain how our
method of genomic convergence can be applied to any phenotype of interest to
increase the power to find genetic associations. I also describe possible future
methods to find more genes that specify and control human aging.
Figure 1.1 The genomic convergence approach. Beginning with a genome-wide transcriptional profile, genes are filtered at each step for those most likely to be functional. In the final step, expression quantitative trait loci (eQTLs) are tested for association with glomerular filtration rate (GFR), a phenotype of kidney aging.
Figure 1.2 A transcriptional profile of kidney aging. 447 genes in the kidney were significantly age-regulated (p<0.001). Rows correspond to age-regulated genes, ordered from most highly induced to most highly repressed. Columns correspond to individual patients, ordered from youngest to oldest. The age of certain patients is shown for reference. Left panel refers to data from cortex samples, and right panel depicts data from medulla samples. The first row shows the chronicity index (ChI; morphological appearance and physiological state of the kidney), from blue (healthiest) to yellow (least healthy) as indicated in the scale bar. Scale shows log2 of the expression level (Exp). Figure adapted from (Rodwell et al., 2004).
Figure 1.3 Chronicity index examples. Histology from patient aged 29 yrs is shown on the left, demonstrating a normal glomerulus (G), tubules and interstitial space (T), and artery (A), respectively (chronicity index of zero). Histology from patient aged 84 yrs is shown on the right, demonstrating glomerulosclerosis (g), tubular atrophy and interstitial fibrosis (t), and arterial intimal hyalinosis (a), respectively (chronicity index of ten). Hematoxylin and eosin staining of formalin-fixed, paraffin-embedded sections. Figure from (Rodwell et al., 2004).
Figure 1.4 Common signature for aging. Shown are aging signatures for four genetic pathways. Rows are human tissues. Columns correspond to individual genes in each gene set. Scale represents the slope of the change in log2 expression level with age. Gray indicates genes were not present in the dataset. Figure adapted from (Zahn et al., 2006).
Chapter 2: Identification of eQTLs by Total Expression Analysis
Portions of this chapter were previously published in PLoS Genetics (2009), 5(10): e1000685 with the following authors:
Heather E. Wheeler1, E. Jeffrey Metter2,3, Toshiko Tanaka2,3, Devin Absher4, John Higgins5, Jacob M. Zahn6, Julie Wilhelmy6, Ronald W. Davis6, Andrew Singleton7,
Richard M. Myers4, Luigi Ferrucci2,3, Stuart K. Kim1,8
1Department of Genetics, Stanford University Medical Center, Stanford, CA, USA, 2Longitudinal Studies Section, Clinical Research Branch, National Institute on Aging,
Baltimore, MD, USA, 3Medstar Research Institute, Baltimore, MD, USA, 4HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA, 5Department of
Pathology, Stanford University Medical Center, Stanford, CA, USA, 6Stanford Genome Technology Center, Palo Alto, CA, USA, 7Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA, 8Department of Developmental
Biology, Stanford University Medical Center, Stanford, CA, USA
Background
As discussed in the previous chapter, the expression profiles of the age-
regulated genes correlated with kidney physiology, making the genes excellent
candidates for functioning in the aging process. For example, a gene that decreases
expression with age may contribute to poor renal function because it is expressed at
levels below a physiological threshold in the elderly. If age-regulated genes are
important for kidney function, then variation in gene expression may correlate with
variation in kidney function. As the second step in our genomic convergence
approach (Figure 1.1), we performed expression quantitative trait locus (eQTL) mapping of
the age-regulated genes. We focused on finding expression-associated SNPs (eSNPs)
using two methods. The first method, known as total expression analysis, is the
subject of this chapter.
We searched for eQTLs by pooling individuals that have the same genotype
for a particular SNP, and then determining whether the different SNP genotypes are
associated with expression of the corresponding gene. The candidate genes were
chosen from two studies of gene expression changes that occur with age. We obtained
a set of 447 age-regulated genes from a genome-wide transcriptional profile of aging
in the human kidney (Rodwell et al., 2004). In addition, a previous gene set
enrichment analysis identified four genetic pathways that were coordinately age-
regulated in each of three human tissues (kidney, muscle and brain). These pathways
include 152 extracellular matrix genes, 85 ribosomal genes, 35 chloride transport
genes and 95 electron transport chain genes (Zahn et al., 2006). We combined the
age-regulated genes with the age-regulated pathways and obtained a set of 630 genes
that change expression with age.
We selected 1041 SNPs in the promoter regions and 386 SNPs in the coding
and untranslated regions of the 630 age-regulated genes. We first searched for
suitable coding and untranslated region SNPs because they can also be tested for
allele-specific expression (Chapter 3). For genes without suitable mRNA SNPs, we
chose SNPs in their promoter regions (defined as within 5 kb upstream or downstream of the
transcription start site). Chosen SNPs had a minor allele frequency greater than 0.05
in the HapMap CEU (Utah residents with northern and western European ancestry)
population (Altshuler et al., 2005). A list of SNPs tested for association with
expression is available at http://www.plosgenetics.org/doi/pgen.1000685 (Table S1).
We genotyped these SNPs, corrected for population structure, and then tested the
alleles for association with gene expression in 96 kidney samples.
Results
Ancestry Analysis
We first used a custom Illumina GoldenGate assay to genotype these 1427
SNPs using DNA from 96 frozen kidney tissue samples and 197 formalin-fixed,
paraffin-embedded kidney tissue samples. These normal kidney tissue samples were
obtained from Stanford University Medical Center with informed consent either from
biopsies of kidneys from transplantation donors or from nephrectomy patients with
localized pathology. Because our kidney tissue samples were from individuals living
in the diverse San Francisco Bay Area, we chose to control for population structure by
including a covariate for ancestry in our regression analysis. Most of the individuals
in our total expression association study self-reported their ancestry (84/96 frozen
kidney samples). Genetic clustering has been shown to correlate highly with
self-identified ancestry (Tang et al., 2005). To determine the ancestry of the 12
individuals without self-reported ancestry, we used the clustering program STRUCTURE (Pritchard et al.,
2000). We included DNA from the 96 individuals who had frozen kidney tissue as
well as the 197 individuals with formalin-fixed, paraffin-embedded tissue. We used
the genotypes of 839 unlinked SNPs from our 293 samples and from the CEU, YRI,
and JPT+CHB HapMap populations in our analysis (Altshuler et al., 2005). The YRI
population contains individuals from the Yoruba people of Ibadan, Nigeria. The
JPT+CHB population contains individuals from Tokyo, Japan and Beijing, China. We
determined that our Stanford samples most probably cluster into three
populations, each clustering with one of the HapMap populations (see Methods for a
detailed analysis). Because most of the Stanford samples were predominantly of
Caucasian genetic ancestry and because it is simplest to use a Boolean covariate value
in regression analysis when chronological significance of the state (genetic ancestry in
this case) is unknown, we chose to divide the individuals into two groups. In the first
group we included individuals with an average percent CEU ancestry >75%. This
group included 211 individuals. The second group contained the other 82 individuals.
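The Boolean ancestry covariate can be written down directly. The sketch below assumes that each individual's CEU membership fraction is averaged over STRUCTURE runs at K=3 (the averaging scheme is an assumption), and it codes the covariate the same way as anc_i in equation 2.1. Note that an individual with exactly 75% CEU ancestry falls into the second group, because the first group requires more than 75%. Function names are hypothetical.

```python
def mean_ceu_fraction(run_estimates):
    """Average the CEU membership fraction over several STRUCTURE runs."""
    return sum(run_estimates) / len(run_estimates)


def ancestry_covariate(ceu_fraction):
    """anc_i: 0 if average CEU ancestry is above 75%, otherwise 1."""
    if not 0.0 <= ceu_fraction <= 1.0:
        raise ValueError("ancestry fraction must be in [0, 1]")
    return 0 if ceu_fraction > 0.75 else 1
```

For example, runs estimating 70%, 82% and 79% CEU ancestry average to 77%, so that individual would be coded 0.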
Total Expression QTL Analysis
RNA extraction of sufficient quantity and quality was only possible for our
frozen tissue samples, not the formalin-fixed, paraffin-embedded tissues. Total
expression data for 96 kidney tissue samples were obtained from whole-genome
microarrays of 70 kidneys from Rodwell et al. (2004) and new expression data from
26 kidney samples. The kidney samples were dissected into cortex (94 samples) and
medulla (59 samples). Kidney samples were from normal tissue from patients aged 29
to 92 years. Expression levels of each gene in the genome were determined using
Affymetrix HG-U133A and HG-U133B microarrays. DNA was also extracted from
these 96 samples and 1427 SNPs from our candidate genes were genotyped on custom
Illumina GoldenGate arrays.
We compared the genotypes from our chosen SNPs to their corresponding
gene expression levels using linear regression. Our model corrected for age, sex,
tissue type (cortex or medulla), and ancestry group. The CEU>75% ancestry group
included 72 individuals. The second group contained the other 24 individuals. We
found 16 SNPs in 12 genes associated with total expression level (Linear Regression,
p<0.001, Table 2.1). The percent of variance in gene expression explained by these
SNPs ranged from 6-43% (Table 2.1). Four of the genes have two significant SNPs;
in two cases, the SNPs are in different linkage disequilibrium blocks, indicating that
the eSNPs are independent, and in two cases, the SNPs are linked to each other (r² > 0.8
in the HapMap CEU population) and thus represent only one significant association
(Altshuler et al., 2005).
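The linkage criterion (r² > 0.8) refers to the standard pairwise LD measure r² = D²/(p_A(1 - p_A) p_B(1 - p_B)), where D = p_AB - p_A p_B. A minimal sketch, assuming phased two-SNP haplotypes are available (as in HapMap), is shown below; the function name is hypothetical.

```python
def ld_r2(haplotypes):
    """Pairwise LD r^2 from phased two-SNP haplotypes.

    haplotypes: sequence of (allele at SNP 1, allele at SNP 2) tuples, one per
    chromosome. r^2 does not depend on which allele is used as reference, so
    the alleles of the first haplotype are used.
    """
    n = len(haplotypes)
    ref1, ref2 = haplotypes[0]
    p_a = sum(h[0] == ref1 for h in haplotypes) / n
    p_b = sum(h[1] == ref2 for h in haplotypes) / n
    p_ab = sum(h[0] == ref1 and h[1] == ref2 for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom
```

Two SNPs whose alleles always travel together give r² = 1, so two such eSNPs in the same gene would count as a single association under the r² > 0.8 rule.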
One example of a promoter region SNP that showed strong association with
total expression is rs705704, which is 274 base pairs upstream of the transcription start
site of ribosomal protein S26 (RPS26, p = 1.2 x 10^-20, Figure 2.1A). Individuals with
the AA genotype have the highest expression, heterozygotes have intermediate expression,
and GG homozygotes have the lowest expression of RPS26. RPS26 has been
identified as an eQTL in other studies (Figure 2.1B; Cheung et al., 2005; Dixon et al.,
2007; Myers et al., 2007; Schadt et al., 2008; Webster et al., 2009). The 12 kidney
eQTLs found by our total expression analysis were used as candidate genes in our
kidney aging association study, which is step three in our genomic convergence
approach (Figure 1.1) and the subject of Chapter 4. A second type of eQTL analysis,
allele-specific expression analysis, is the subject of Chapter 3.
Discussion
We tested 1427 SNPs in 630 age-regulated genes for association with gene
expression in kidney tissue from 96 individuals. Our goal was to find genes that
associate with kidney aging, and as an intermediate step, we performed this eQTL
analysis in hopes of converging on genes most likely to be functional. Genes that
show allele associations with expression level may also show allele associations with a
biological function, such as glomerular filtration rate, our chosen phenotype of kidney
aging. We found 16 SNPs in 12 genes that associate with total expression level.
Of these 12 genes, five have been shown to be cis-acting eQTLs in
other studies (Table 2.2; Myers et al., 2007; Schadt et al., 2008; Stranger et al., 2007;
Veyrieras et al., 2008). Three of our kidney eQTLs (RPS26, COX7A2L, RPS18) were
also eQTLs in the liver (Schadt et al., 2008). Two kidney eQTLs (RPL12, RPS9) were
also found in lymphoblastoid cell lines (Stranger et al., 2007; Veyrieras et al., 2008)
and one (RPS26) in brain cortex (Myers et al., 2007). In the liver and brain studies,
8% and 21% of tested genes were found to be cis-acting eQTLs, respectively (Myers
et al., 2007; Schadt et al., 2008). Here, just 2% of the genes we tested were found to
be cis-acting in the kidney. However, both the liver and brain studies had more
samples, 427 in liver and 193 in brain (Myers et al., 2007; Schadt et al., 2008).
Therefore, these studies had more power to detect expression associations.
Although our power was limited to detect eQTLs using this total expression
analysis, we did find 12, and each eSNP explained a large proportion of the
variance in gene expression (0.06-0.43, Table 2.1). We were able to compare the
results of this total expression analysis to that of the allele-specific expression analysis
in the coding and untranslated region SNPs (Chapter 3). Also, for the 354 genes that
did not have assayable mRNA SNPs, we were able to test for eQTLs using this
method. SNPs in the 12 kidney eQTLs were tested for association with kidney aging
as the final step in our genomic convergence approach (Chapter 4).
Methods
Ethics Statement
Ethical approval for the study was obtained from the Stanford University
Institutional Review Board (IRB). All subjects provided written informed consent for
the collection of samples and subsequent analysis. This study was conducted
according to the principles expressed in the Declaration of Helsinki.
Stanford Kidney Samples
Normal kidney tissue was obtained from Stanford University Medical Center
with informed consent either from biopsies of kidneys from transplantation donors or
from nephrectomy patients with localized pathology. Kidney tissue from nephrectomy
patients was harvested meticulously with the intention of gathering normal tissue
uninvolved by the tumor. Samples that showed evidence of pathological involvement
or in which there was only tissue in close proximity to the tumor were not used.
Kidney sections were either immediately frozen on dry ice and stored at −80°C until
use or formalin-fixed and paraffin-embedded.
RNA and DNA Preparation
Frozen kidney samples were weighed (25-50 mg), cut into small pieces on dry
ice, and then placed in 1 ml of TRIzol Reagent (Invitrogen, Carlsbad, California,
United States) for RNA extraction or 600 µl of Buffer RLT Plus (Qiagen, Valencia,
California, United States) for DNA extraction. The tissue was homogenized using a
PowerGen700 homogenizer (Fisher Scientific, Pittsburgh, Pennsylvania, United
States). Total RNA was isolated according to the TRIzol Reagent protocol and
genomic DNA was isolated according to the Qiagen AllPrep DNA/RNA Mini Kit
protocol.
Normal kidney tissue (25-35 mg) from formalin-fixed, paraffin-embedded
blocks was cut out with a scalpel and crushed in liquid nitrogen with a mortar and
pestle. The samples were then treated with 1 ml xylene to remove the paraffin. DNA
was extracted from the tissue according to the protocol of the RecoverAll Total Nucleic Acid
Isolation Kit for FFPE (Ambion, Austin, Texas, United States).
SNP Selection
Candidate aging genes were chosen from previous transcriptional profiling
studies and include 447 age-regulated kidney genes (Rodwell et al., 2004) as well as
the genes in the four pathways that are commonly age-regulated in the kidney, muscle
and brain: extracellular matrix, ribosome, chloride transport and electron transport
chain (Zahn et al., 2006). The candidate kidney aging genes were first searched for
mRNA SNPs that could be used in an allele-specific expression assay. In addition to
being within the transcript on an autosome, the SNPs had to have a minor allele
frequency greater than 0.05 in the HapMap CEU population, an Illumina SNP score
greater than 0.4, and be greater than 30 bp from an exon boundary (NCBI Build 36.1)
to ensure the Illumina genotyping assay would work properly for both genomic DNA
and cDNA. For genes that had multiple assayable mRNA SNPs, those closest to the
5’ end of the gene were chosen, with a maximum of two SNPs per gene. These
criteria were met for 386 SNPs in 276 genes. For candidate aging genes that did not
have an appropriate mRNA SNP, promoter region (defined as 5kb upstream or
downstream of the transcription start site) SNPs meeting the same minor allele
frequency (>0.05) and SNP score (>0.4) criteria were chosen. One to four SNPs were
chosen per gene for analysis, totaling 1041 promoter SNPs in 354 candidate aging
genes.
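The mRNA SNP selection rules above can be expressed as a filter followed by a per-gene ranking. In this sketch the record fields and function names are hypothetical, and genes absent from the result are the ones that would fall back to promoter SNPs; the rule for choosing among several qualifying promoter SNPs is not specified here and is not reproduced.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TranscriptSNP:
    gene: str
    snp_id: str
    autosomal: bool
    maf_ceu: float            # minor allele frequency in HapMap CEU
    illumina_score: float     # Illumina assay design score
    bp_to_exon_boundary: int  # distance to the nearest exon boundary (bp)
    bp_from_5prime_end: int   # position along the transcript from its 5' end


def passes_filters(snp: TranscriptSNP) -> bool:
    """Autosomal, MAF > 0.05, score > 0.4, and > 30 bp from an exon boundary."""
    return (snp.autosomal and snp.maf_ceu > 0.05
            and snp.illumina_score > 0.4 and snp.bp_to_exon_boundary > 30)


def choose_mrna_snps(snps, max_per_gene=2):
    """Return {gene: up to max_per_gene passing SNPs nearest the 5' end}."""
    by_gene = defaultdict(list)
    for snp in snps:
        if passes_filters(snp):
            by_gene[snp.gene].append(snp)
    return {gene: sorted(cands, key=lambda s: s.bp_from_5prime_end)[:max_per_gene]
            for gene, cands in by_gene.items()}
```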
Genotyping
The candidate aging SNPs were genotyped using a GoldenGate Custom Panel
from Illumina (San Diego, California, United States). Oligonucleotides specific for
each allele of each SNP were designed for use in a multiplex PCR. A standard
protocol designed by Illumina and implemented at the Stanford Human Genome
Center was used to determine the genotypes of the 96 individuals for whom we had
kidney tissue. Samples were hybridized to custom Sentrix Array Matrices and
scanned on the Illumina BeadStation 500GX. Allele calls were determined using the
Illumina BeadStudio clustering software. The genotyping was successful (>90% call
rate, Hardy-Weinberg equilibrium (HWE) p > 0.001) at 1341/1427 of the SNP loci in 599/630 genes (95%). A list of
the 1341 SNPs is available at http://www.plosgenetics.org/doi/pgen.1000685 (Table
S1).
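The genotyping quality filter can be sketched as follows. The Hardy-Weinberg test shown is the common one-degree-of-freedom chi-square goodness-of-fit test; the text does not say which HWE test was used (an exact test is another common choice), and the genotype encoding and function names are hypothetical.

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no departure can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Upper tail of a 1-df chi-square equals P(|Z| > sqrt(chi2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(calls, min_call_rate=0.90, hwe_alpha=0.001):
    """calls: genotype strings 'AA', 'AB', 'BB', or None for a failed call."""
    called = [c for c in calls if c is not None]
    if not called:
        return False
    call_rate = len(called) / len(calls)
    counts = (called.count("AA"), called.count("AB"), called.count("BB"))
    return call_rate > min_call_rate and hwe_chisq_p(*counts) > hwe_alpha
```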
Total Expression Quantification
Most of the microarrays (68 cortex and 59 medulla samples) used in our total
expression association study were previously analyzed (Rodwell et al., 2004). The
same Affymetrix (Santa Clara, California, United States) HG-U133A and HG-U133B
high-density oligonucleotide arrays used in Rodwell et al. were used here to measure
total expression levels in 26 additional cortex samples. The samples were processed at
the Stanford Genome Technology Center using their standard protocol (Rodwell et al.,
2004). Eight micrograms of total RNA was used to synthesize cRNA for each sample,
and 15 µg of cRNA was hybridized to each microarray. Using the dChip program
(Zhong et al., 2003), microarray data (.cel files) were normalized according to the
stable invariant set, and gene expression values were calculated using a perfect match
model. All arrays passed the quality controls set by dChip. The raw microarray data
are available at the Stanford Microarray Database (http://smd.stanford.edu).
Ancestry Analysis
Because the samples come from the diverse San Francisco Bay Area
population, we needed to control for population structure. We chose to use the
program STRUCTURE to determine the genetic ancestry of the individuals in our
sample (Pritchard et al., 2000). We had self-reported ancestry for 84 of the 293
individuals in our sample and could compare this data to the results of STRUCTURE.
One key to successful use of the STRUCTURE program is that the markers used
should not be closely linked. Some of our markers are partially linked because they lie within
the same gene region. We searched the HapMap CEU population and found no
pairwise linkage disequilibrium (LD) data for 839 of our SNPs; we therefore treated them as
unlinked and chose them for the STRUCTURE analysis. We used the
CEU LD data because the blocks of LD are known to be larger than in the YRI
population and of the samples for which we have self-reported ancestry, ~80% are
Caucasian. We included genotype data at our 839 SNPs from three HapMap
populations (CEU, JPT+CHB, YRI) in our analysis to verify that our SNPs can
distinguish genetic ancestry (Altshuler et al., 2005). Because the CEU and YRI
populations contain family trios, we only included the parents in our analysis.
We performed 3 runs of STRUCTURE at each K from 1-2 and 10 runs at each
K from 3-7 using the admixture ancestry model. K is the number of populations the
program assumes. The admixture model allows individuals to have mixed ancestry,
and is thus flexible enough to handle the complexity of the Bay Area population. A burn-in
length of 10,000 and a run length of 10,000 were used for each run. The estimated
natural log probability of the data is shown in Table 2.3. The probabilities are similar
for K=3,4,6,7. In the documentation for the STRUCTURE software, the authors note
that P(K) is often very small for K less than the appropriate value (in our case, K=1
and 2), and then plateaus for larger K, as is observed in Table 2.3. In this situation,
where several values of K give similar estimates of Ln P(Data), it seems that the
smallest of these is often “correct” (Pritchard et al., 2007).
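The rule of thumb for choosing K can be made concrete with a small helper. It averages Ln P(Data) over the runs at each K and returns the smallest K whose mean lies within a tolerance of the best mean. The tolerance is an assumption (no fixed cutoff is given), and the function name is hypothetical.

```python
from statistics import mean


def smallest_plateau_k(ln_p_by_k, tolerance):
    """Pick the smallest K whose mean Ln P(Data) is within `tolerance` of the best.

    ln_p_by_k: dict mapping K to a list of Ln P(Data) estimates from repeated runs.
    tolerance: allowed shortfall in log-likelihood units (an assumed cutoff).
    """
    means = {k: mean(runs) for k, runs in ln_p_by_k.items()}
    best = max(means.values())
    return min(k for k, m in means.items() if best - m <= tolerance)
```

With synthetic (not Table 2.3) means of -46,015 at K=3 and -46,000 at K=4, a tolerance of 50 selects K=3, whereas a tolerance of 10 selects K=4.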
Figure 2.2 shows the clustering of genetic ancestry of one run at each value of
K from 3 to 7. At each value of K, we see the three HapMap populations cluster
perfectly, with the exception of a few individuals in the CEU population at K=6 and 7.
Most of the Stanford patients cluster with the CEU population at K=3, indicating they
are Caucasian. However, we do see some patients clustering with the JPT+CHB
population and a few with the YRI population, as well as some admixed individuals. Of
the 84 individuals with self-reported ancestry information, 78 matched the genetic
information. One individual self-reported as African American but, in all runs at K=3, was
estimated to be greater than 99% Caucasian. This may be a data collection error.
Three individuals were estimated as admixed for Asian and Caucasian ancestry at
K=3, while two of them reported being Asian and one reported being Caucasian. The
two individuals who self-reported as Hispanic show a mix of Asian and Caucasian
ancestry at K=3, which is historically plausible: with only three reference populations, the Native American component of Hispanic ancestry would likely be assigned to the East Asian (JPT+CHB) cluster. At K=4, we see a fourth cluster
emerging from the Stanford patients, made up of individuals who were admixed for
Asian and Caucasian ancestry at K=3. This cluster (yellow on Figure 2.2) includes the
two self-reported Hispanic individuals, so this may be a Hispanic cluster. At K=5, we
see a subset of the Stanford patients who previously clustered with the CEU
population breaking into a second Caucasian population (purple on Figure 2.2). A
third Caucasian population emerges at K=6 (white on Figure 2.2) and a fourth at K=7
(gray on Figure 2.2). The mean variance of the percent ancestry in each cluster of
these additional Caucasian populations increases as K increases (Table 2.4). The
clustering of these Caucasian individuals is inconsistent as K increases and therefore
the best clustering to use in additional analyses is that at K=3. Because most of the
Stanford samples were predominantly of Caucasian genetic ancestry and because it is
simplest to use a Boolean covariate value in regression analysis when chronological
significance of the state (genetic ancestry in this case) is unknown, we chose to divide
the individuals into two groups for our total expression QTL analysis. In the first
group we included individuals with an average percent CEU ancestry >75%. This
group included 211 individuals. The second group contained the other 82 individuals.
Total Expression Regression Models
We used a linear regression model to determine which SNP genotypes showed
a statistically significant association with gene total expression levels:
$$Y_{ij} = \beta_{0j} + \beta_{1j}\, g_{ij} + \beta_{2j}\, \mathrm{age}_i + \beta_{3j}\, t_i + \beta_{4j}\, \mathrm{anc}_i + \beta_{5j}\, s_i + \varepsilon_{ij} \qquad (2.1)$$
In equation 2.1, $Y_{ij}$ is the base-2 logarithm of the expression level for the gene of SNP $j$
in kidney sample $i$, $g_{ij}$ is the genotype (0, 1, 2 for AA, AB, BB) of individual $i$ at SNP $j$,
$\mathrm{age}_i$ is the age in years of individual $i$, $t_i$ is 0 if sample $i$ was from kidney cortex
and 1 if sample $i$ was from kidney medulla, $\mathrm{anc}_i$ is 0 if the individual contributing
sample $i$ has >75% CEU ancestry and 1 for other ancestry proportions, $s_i$ is 0 for males
and 1 for females, and $\varepsilon_{ij}$ is a random error term. The coefficients $\beta_{kj}$ for $k = 0, \dots, 5$ were
estimated by least squares from the data. Our primary interest was $\beta_{1j}$ values that
significantly differed from zero, indicating that SNP $j$ associates with total expression
level. Because our microarrays were processed on two different scanners three years
apart, we analyzed the two sets of data separately. The first set comprised the 127
samples previously analyzed in Rodwell et al. and the second set comprised the 26
additional samples processed here. We combined the results from the two regression
33
analyses using Fisher’s combined probability test (Fisher, 1948). The $\beta_{1j}$ p-values from the two analyses were combined into a single test statistic, $\chi^2$, which under the null hypothesis follows a chi-square distribution with four degrees of freedom:
$$\chi^2 = -2 \sum_{i=1}^{2} \ln p_i \qquad (2.2)$$
where $p_i$ is the $\beta_{1j}$ p-value from data set $i$.
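Equation 2.2 is Fisher's combined probability test. The Python sketch below assumes only the standard library and uses a hypothetical function name. With k independent p-values the statistic has 2k degrees of freedom (four here, for the two scanner batches). For an even number of degrees of freedom the chi-square upper tail has a closed form, so no statistics package is needed.

```python
import math


def fisher_combined(pvalues):
    """Fisher's combined probability test (equation 2.2).

    Returns (chi2, df, combined_p), where chi2 = -2 * sum(ln p_i) follows a
    chi-square distribution with df = 2k under the null hypothesis that all
    k independent tests are null.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    chi2 = -2.0 * sum(math.log(p) for p in pvalues)
    half = chi2 / 2.0
    if half == 0.0:
        return chi2, 2 * k, 1.0
    # For even df = 2k the chi-square upper tail has a closed form:
    #   P(X > chi2) = sum_{i=0}^{k-1} exp(-half) * half**i / i!
    # Each term is evaluated in log space to avoid overflow and underflow.
    combined_p = sum(
        math.exp(-half + i * math.log(half) - math.lgamma(i + 1)) for i in range(k)
    )
    return chi2, 2 * k, min(combined_p, 1.0)
```

For example, `fisher_combined([0.01, 0.02])` gives four degrees of freedom and a combined p-value of about 0.0019.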
Using Fisher’s method, we found 11 promoter SNPs in seven genes and five mRNA
SNPs in five genes that associated with total expression level (p < 0.001).
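The per-batch regression of equation 2.1 can be sketched as follows. This is a minimal NumPy sketch for one SNP within one scanner batch, and the function and argument names are hypothetical. It codes the covariates as defined above, including the dichotomized ancestry covariate. It estimates the six coefficients by ordinary least squares and returns $\beta_{1j}$ with its standard error and t statistic. The two-sided p-value would then come from a t distribution with n - 6 degrees of freedom (for example via SciPy). That step is omitted here to keep the example dependency-light.

```python
import numpy as np


def code_ancestry(ceu_fraction):
    """anc_i = 0 if the individual has >75% CEU ancestry, otherwise 1."""
    return np.where(np.asarray(ceu_fraction, dtype=float) > 0.75, 0, 1)


def fit_total_expression_model(log2_expr, genotype, age, medulla, anc, female):
    """Ordinary least-squares fit of equation 2.1 for one SNP in one batch.

    Returns (beta, se_beta1, t_beta1); beta[1] estimates beta_1j.
    """
    y = np.asarray(log2_expr, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                    # beta_0j: intercept
        np.asarray(genotype, dtype=float),  # g_ij coded 0, 1, 2 for AA, AB, BB
        np.asarray(age, dtype=float),       # age_i in years
        np.asarray(medulla, dtype=float),   # t_i: 0 = cortex, 1 = medulla
        np.asarray(anc, dtype=float),       # anc_i: 0 = >75% CEU, 1 = other
        np.asarray(female, dtype=float),    # s_i: 0 = male, 1 = female
    ])
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    n, p = X.shape
    if rank < p or n <= p:
        raise ValueError("design matrix must have full column rank and n > p")
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / (n - p)       # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the estimates
    se_beta1 = float(np.sqrt(cov[1, 1]))
    return beta, se_beta1, float(beta[1]) / se_beta1
```

The $\beta_{1j}$ p-values obtained this way, one per batch, are the inputs to equation 2.2.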
Table 2.1 SNPs that associate with total gene expression level.
Gene     SNP         P-value (β1j)   R²*    High expression allele  Low expression allele
RPS26    rs705704    1.2 × 10^-20    0.430  A                       G
POLR1D   rs10492487  6.2 × 10^-6     0.143  T                       A
COX7A2L  rs1997      8.9 × 10^-6     0.133  T                       A
ZNF6     rs1006629   1.2 × 10^-5     0.101  C                       T
RPL12    rs2247322   3.7 × 10^-5     0.103  A                       T
TXNDC5   rs8643      1.2 × 10^-4     0.100  G                       A
RPS9     rs2304524   1.3 × 10^-4     0.083  G                       C
RPS18    rs213204    2.5 × 10^-4     0.091  C                       A
CFB      rs641153    2.6 × 10^-4     0.094  T                       C
ANTXR1   rs7584385   2.8 × 10^-4     0.086  G                       A
POLR1D   rs7097      3.0 × 10^-4     0.095  G                       A
RPL12    rs1139400   3.7 × 10^-4     0.078  G                       A
COL15A1  rs1051105   7.0 × 10^-4     0.083  A                       G
ATP5F1   rs1264899   7.4 × 10^-4     0.076  A                       G
RPS18    rs213199    8.3 × 10^-4     0.077  C                       T
RPS9     rs3810229   9.5 × 10^-4     0.063  A                       C
*Proportion of variance in gene expression explained by the SNP.
Figure 2.1 Total expression analysis. Genotypic associations with total expression level. (A) Boxplot of RPS26 expression according to genotype at the promoter SNP rs705704 (p = 1.2 × 10^-20). The boxes define the interquartile range and the thick line is the median. Open dots are possible outliers. (B) Haploview (Barrett et al., 2005) linkage disequilibrium (LD) plot of the RPS26 region. The SNP rs705704 is 274 bp upstream of the RPS26 transcription start site. Values in boxes correspond to the pairwise r² LD values (darker boxes correspond to higher r² values) for the HapMap CEU population. rs705704 (red) is partially linked to three SNPs (black) previously shown to associate with RPS26 expression levels (Cheung et al., 2005; Dixon et al., 2007; Myers et al., 2007; Schadt et al., 2008; Webster et al., 2009).
Table 2.2 Common total expression associations across studies.
Gene     SNP         Study          Tissue        Same?*
RPS26    rs705704    Myers, Schadt  brain, liver  snp
COX7A2L  rs2041354   Schadt         liver         gene
RPL12    rs2247322   Veyrieras      LCLs          snp
RPS9     rs17273267  Stranger       LCLs          gene
RPS18    rs1810472   Schadt         liver         gene
*Those labeled "snp" showed association with expression at the same SNP in our dataset and the respective study (p < 0.001). Those labeled "gene" showed association with expression at the same gene in our dataset and at the shown SNP in the respective study (p < 0.001). LCLs = lymphoblastoid cell lines. Table data compiled from the eQTL Genome Browser (http://eqtl.uchicago.edu). Studies: Myers, A.J. et al. A survey of genetic human cortical gene expression. Nat Genet 39, 1494-9 (2007). Schadt, E.E. et al. Mapping the genetic architecture of gene expression in human liver. PLoS Biol 6, e107 (2008). Stranger, B.E. et al. Population genomics of human gene expression. Nat Genet 39, 1217-24 (2007). Veyrieras, J.B. et al. High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet 4, e1000214 (2008).
Table 2.3 The probability of the genotype data at K = 1-7.
Runs  K   Mean Ln P(Data)   Mean Var[Ln P(Data)]
3     1   -427220           414
3     2   -412674           1563
10    3   -397937           2455
10    4   -397893           3878
10    5   -399198           7467
10    6   -397796           5895
10    7   -397551           6108
K = number of populations the program STRUCTURE assumes.
Figure 2.2 Estimated genetic ancestry. Each individual is represented by a thin vertical line, which is partitioned into K colored segments that represent the individual's estimated membership fractions in K clusters. Black lines separate individuals of different populations. Populations are labeled above the figure. Made using the program DISTRUCT (Rosenberg, 2004).
Table 2.4 Mean variance of the percent ancestry in each cluster (10 runs at each K).
Cluster YRI           JPT+CHB       CEU           Asian/Cauc. Admix.  Cauc. 2       Cauc. 3       Cauc. 4
K = 3   5.0 × 10^-6   1.3 × 10^-5   1.5 × 10^-5
K = 4   1.4 × 10^-5   3.2 × 10^-4   8.0 × 10^-3   8.7 × 10^-3
K = 5   3.0 × 10^-5   3.7 × 10^-6   4.4 × 10^-2   6.2 × 10^-4         1.9 × 10^-1
K = 6   2.1 × 10^-5   2.0 × 10^-4   6.1 × 10^-2   9.8 × 10^-4         6.1 × 10^-2   2.4 × 10^-2
K = 7   1.1 × 10^-5   4.3 × 10^-5   9.4 × 10^-3   1.3 × 10^-2         2.7 × 10^-2   4.5 × 10^-2   2.2 × 10^-2
K = number of populations the program STRUCTURE assumes.
Chapter 3: Identification of eQTLs by Allele-Specific Expression Analysis
Portions of this chapter were previously published in PLoS Genetics (2009), 5(10): e1000685 with the following authors:
Heather E. Wheeler¹, E. Jeffrey Metter²,³, Toshiko Tanaka²,³, Devin Absher⁴, John Higgins⁵, Jacob M. Zahn⁶, Julie Wilhelmy⁶, Ronald W. Davis⁶, Andrew Singleton⁷, Richard M. Myers⁴, Luigi Ferrucci²,³, Stuart K. Kim¹,⁸
¹Department of Genetics, Stanford University Medical Center, Stanford, CA, USA
²Longitudinal Studies Section, Clinical Research Branch, National Institute on Aging, Baltimore, MD, USA
³Medstar Research Institute, Baltimore, MD, USA
⁴HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
⁵Department of Pathology, Stanford University Medical Center, Stanford, CA, USA
⁶Stanford Genome Technology Center, Palo Alto, CA, USA
⁷Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
⁸Department of Developmental Biology, Stanford University Medical Center, Stanford, CA, USA
Background
If age-regulated genes are important for kidney function, then variation in their expression may correlate with variation in kidney function. As the second step in our genomic convergence approach (Figure 1.1), we performed expression quantitative trait locus (eQTL) mapping of the age-regulated genes. We focused on finding expression-associated SNPs (eSNPs) using two methods. The results of the first method, total expression analysis, were presented in the previous chapter. The second method, allele-specific expression analysis, is presented here.
This second method identified differential allelic expression within individuals heterozygous for a specific SNP. The expression level of each allele was measured directly by assaying SNPs within the mRNA transcript, and the cDNAs of heterozygotes were examined for allelic transcript levels that differed from each other. Genomic DNA allelic ratios served as controls for 1:1 hybridization intensity. Because differential expression was examined within heterozygotes, the mRNA levels of the two alleles were measured within the same genetic background and cellular environment.
We combined the 447 kidney age-regulated genes (Rodwell et al., 2004) with
the genes in the four commonly age-regulated pathways (Zahn et al., 2006) and
obtained a set of 630 genes that change expression with age. Of these 630 candidate genes, 276 had assayable SNPs within the coding or untranslated
regions. We genotyped both cDNA derived from kidney tissue and genomic DNA in
96 individuals and tested 386 SNPs in these 276 genes for allele-specific expression.
Results
Allele-specific expression analysis was used to test all of the age-regulated
genes that had SNPs in their mRNAs for differential allelic expression. We assayed
the relative expression levels of 386 mRNA SNPs in 276 age-regulated genes in 96
individuals. Most of the mRNA SNPs were in the 3’ untranslated regions of genes
(249), some were in coding regions (115), and a few were in the 5’ untranslated
regions (22).
Oligonucleotides specific for each allele of each SNP were designed for use in
the Illumina GoldenGate multiplex PCR assay. Kidney cortex mRNA was reverse
transcribed into cDNA prior to the start of the GoldenGate assay. In the assay, the
PCR products for each allele were labeled with a different fluorophore and the
intensities of each allele were compared to determine if one allele was expressed
higher than the other. The cDNA allelic intensities for each SNP were compared
within heterozygotes to test for differential allelic expression. Because the intensities
from each fluorophore (Cy3 and Cy5) can differ, the genomic DNA allelic intensities
of heterozygotes were used as a control to define a 1:1 allelic ratio for each SNP. A
schematic of the allele-specific expression assay is shown in Figure 3.1. The cDNA
allelic ratio for each heterozygote was compared to the 95% confidence interval
surrounding the mean genomic DNA allele intensity ratio for each SNP. At least five
heterozygotes were tested per SNP. If the cDNA allele intensity ratio for more than 50% of individual heterozygotes fell outside the 95% confidence interval and the combined p-value was less than 10^-6, the SNP was considered an eSNP.
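The eSNP call rule described above can be sketched in Python for a single SNP. The interval is the mean of the heterozygotes' genomic DNA A/B ratios plus or minus two standard deviations, as in the Methods. The text does not state how the per-individual p-values entering Fisher's method were computed. This sketch therefore assumes, hypothetically, a two-sided normal test of each cDNA ratio against the genomic DNA mean and standard deviation. Function and argument names are illustrative.

```python
import math
import statistics


def call_esnp(gdna_ratios, cdna_ratios, min_fraction=0.5, p_cutoff=1e-6, min_het=5):
    """Sketch of the eSNP call for one SNP.

    gdna_ratios: genomic DNA A/B intensity ratios of heterozygotes (1:1 control).
    cdna_ratios: cDNA A/B intensity ratios of the expressing heterozygotes.
    Returns (is_esnp, fraction_outside_interval, combined_p).
    """
    if len(cdna_ratios) < min_het:
        raise ValueError("at least %d heterozygotes are required" % min_het)
    mu = statistics.mean(gdna_ratios)
    sd = statistics.stdev(gdna_ratios)
    if sd == 0:
        raise ValueError("genomic DNA ratios have zero spread")
    lower, upper = mu - 2 * sd, mu + 2 * sd       # ~95% interval around the 1:1 control
    n_outside = sum(1 for r in cdna_ratios if r < lower or r > upper)
    frac_outside = n_outside / len(cdna_ratios)

    # ASSUMPTION: per-heterozygote two-sided p-values from a normal model of the
    # genomic DNA ratios; the source does not state how these were derived.
    pvals = [max(math.erfc(abs(r - mu) / sd / math.sqrt(2)), 1e-300) for r in cdna_ratios]

    # Fisher's method: chi2 = -2 * sum(ln p) with 2k df; closed-form upper tail.
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)        # chi2 / 2
    if half == 0.0:
        combined_p = 1.0
    else:
        combined_p = min(1.0, sum(
            math.exp(-half + i * math.log(half) - math.lgamma(i + 1)) for i in range(k)))

    is_esnp = frac_outside > min_fraction and combined_p < p_cutoff
    return is_esnp, frac_outside, combined_p
```

The two criteria are deliberately combined with `and`. A SNP needs both a majority of heterozygotes outside the interval and a very small combined p-value.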
In total, 105 eSNPs in 93 age-regulated genes were detected (Table 3.1, Figure 3.2). The median fold change of the higher-expressed allele relative to the lower-expressed allele was 2.1. The level of overexpression of one allele varied widely among genes, from 1.4-fold to apparently monoallelic (>10-fold) expression (Table 3.1, Figure 3.3). Two genes (SPP1 and TIMP3) had two linked eSNPs (r² > 0.8 in the HapMap CEU population) that both showed allele-specific differences in expression. Ten genes contained two unlinked eSNPs that independently showed differences in expression.
For most of these eSNPs (96/105), the higher-expressed allele was usually the same across heterozygotes. For example, the A allele is expressed higher than the C allele in 11 of 12 heterozygotes tested at rs2245803 in the gene matrix metalloproteinase 20 (MMP20; Figure 3.4B), and the A allele is expressed higher than the C allele in 12 of 13 heterozygotes tested at rs2296292 in LAMC1 (Figure 3.5A). For these eSNPs, the functional SNP causing the expression difference is likely linked to the SNP we measured. For a smaller subset (9/105 eSNPs), each allele was the higher-expressed allele in different heterozygotes. One explanation is that the functional SNP causing the expression difference is not closely linked to the SNP we measured in the transcript. Another is that epigenetic effects such as imprinting cause the differences in expression between the two homologs. For example, one of the genes in which either allele could be the more highly expressed is PEG3 (paternally expressed 3), a known imprinted gene (Figure 3.5B; Murphy et al., 2001; Van den Veyver et al., 2001). Presumably, the higher-expressed allele in our studies is from the paternal homolog.
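The split between eSNPs with a consistent higher-expressed allele and those in which either allele can be higher can be sketched from the allele counts in Table 3.1. The caption of Figure 3.5 phrases the cutoff as "greater than 75%". This sketch uses a share of at least 0.75, because with that boundary the counts in Table 3.1 give 96 consistent and 9 bidirectional eSNPs, the split reported above. Two eSNPs, SMPD2 rs1476387 and IGF1R rs3743262, sit exactly at 75%. The function name is hypothetical.

```python
def classify_direction(n_major_higher, n_minor_higher, cutoff=0.75):
    """Label an eSNP by how consistently one allele is the higher-expressed one.

    The counts are the numbers of heterozygotes in which the major or the
    minor allele was expressed higher (the parenthesized counts in Table 3.1).
    """
    n = n_major_higher + n_minor_higher
    if n == 0:
        raise ValueError("no heterozygotes with allele-specific expression")
    share = max(n_major_higher, n_minor_higher) / n
    return "consistent" if share >= cutoff else "bidirectional"


# A few rows from Table 3.1: (major-allele count, minor-allele count).
examples = {
    "MMP20 rs2245803": (0, 11),
    "LAMC1 rs2296292": (12, 1),
    "PEG3 rs1055359": (12, 20),
    "LPL rs3208305": (9, 21),
}
for snp, (n_major, n_minor) in examples.items():
    print(snp, classify_direction(n_major, n_minor))
```

MMP20 and LAMC1 come out as consistent, while PEG3 (20 of 32) and LPL (21 of 30) come out as bidirectional.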
The same 386 SNPs were tested for association with expression by both the allele-specific method and the total expression method. While 105 eSNPs were identified by the allele-specific method, only five eSNPs were identified by the total expression method. Of these five, four were also found by the allele-specific expression method (bold in Table 3.1). One example is rs8643 in the gene TXNDC5, for which both methods found that the G allele is associated with higher expression than the A allele (Figure 3.6). These results indicate that the allele-specific assay identified many more eSNPs and is likely more sensitive in detecting expression differences than the total expression assay. A probable reason is that in the allele-specific assay, expression is measured from both alleles within each heterozygote, so variability due to genetic background and environmental effects is reduced or eliminated.
Discussion
We tested 386 mRNA SNPs in 276 age-regulated genes for allele-specific
expression in kidney tissue from 96 individuals. Our goal was to find genes that associate with kidney aging, and as an intermediate step we performed this eQTL analysis in hopes of converging on the genes most likely to be functional. Genes that
show allele associations with expression level may also show allele associations with a
biological function, such as glomerular filtration rate, our chosen phenotype of kidney
aging. By comparing the cDNA of heterozygotes to their genomic DNA, we found
105 SNPs in 93 age-regulated genes that are allele-specifically expressed.
Other groups have used the allele-specific expression approach to identify
differentially-expressed genes in lymphoblastoid cell lines (Pastinen et al., 2005;
Pastinen et al., 2004; Serre et al., 2008; Yan et al., 2002), brain (Bray et al., 2003),
white blood cells (Pant et al., 2006), fetal kidney and fetal liver (Lo et al., 2003).
These studies found that 20-50% of the genes in the genome show differential allelic expression. Sixteen of the allele-specifically expressed genes found in our study
were also found in previous studies (Table 3.2; Lo et al., 2003; Milani et al., 2009;
Pant et al., 2006; Serre et al., 2008). Thus, 77 of the 93 allele-specifically expressed
genes identified in this work represent novel findings. Our finding that 41% of tested
genes showed allele-specific expression is similar to the percentage found in previous
studies (Bray et al., 2003; Lo et al., 2003; Pant et al., 2006; Pastinen et al., 2005;
Pastinen et al., 2004; Serre et al., 2008; Yan et al., 2002).
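The figures in this paragraph follow from the counts reported in this chapter. The calculation assumes, from the Methods, that the denominator for the 41% is the 225 genes whose SNPs were genotyped successfully and expressed above background in at least five heterozygotes:

$$\frac{93}{225} \approx 0.413 \approx 41\%, \qquad 93 - 16 = 77 \text{ allele-specifically expressed genes not reported in the studies of Table 3.2}$$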
We were able to compare the results of this allele-specific expression analysis with those of the total expression analysis in Chapter 2. Specifically, 41% of the genes assayed contained eSNPs detected by the allele-specific expression method, whereas only 2% contained eSNPs detected by the total expression method. The statistical cutoff for calling eSNPs was more stringent for the allele-specific method (p < 10^-6) than for the total expression method (p < 0.001). Thus, our results may underestimate the gain in sensitivity of the allele-specific method over the total expression method. Unlike the total expression method, the allele-specific method compares the two alleles within heterozygous individuals, so both alleles share the same cellular environment and genetic background, which maximizes the sensitivity of the assay. The implications of the greater sensitivity of the allele-specific method are discussed in Chapter 5. SNPs in the 93 kidney eQTLs found by the allele-specific expression method were tested for association with kidney aging as the final step in our genomic convergence approach (Chapter 4).
Methods
Ethics Statement
Ethical approval for the study was obtained from the Stanford University
Institutional Review Board (IRB). All subjects provided written informed consent for
the collection of samples and subsequent analysis. This study was conducted
according to the principles expressed in the Declaration of Helsinki.
Stanford Kidney Samples
Normal kidney tissue was obtained from Stanford University Medical Center
with informed consent either from biopsies of kidneys from transplantation donors or
from nephrectomy patients with localized pathology. Kidney tissue from nephrectomy patients was harvested meticulously to obtain normal tissue not involved by the tumor. Samples that showed evidence of pathological involvement, or that consisted only of tissue in close proximity to the tumor, were not used.
Kidney sections were immediately frozen on dry ice and stored at −80°C until use.
RNA and DNA Preparation
Frozen kidney samples were weighed (25-50 mg), cut into small pieces on dry
ice, and then placed in 1 ml of TRIzol Reagent (Invitrogen, Carlsbad, California,
United States) for RNA extraction or 600 µl of Buffer RLT Plus (Qiagen, Valencia,
California, United States) for DNA extraction. The tissue was homogenized using a
PowerGen700 homogenizer (Fisher Scientific, Pittsburgh, Pennsylvania, United
States). Total RNA was isolated according to the TRIzol Reagent protocol and
genomic DNA was isolated according to the Qiagen AllPrep DNA/RNA Mini Kit
protocol.
SNP Selection
Candidate aging genes were chosen from previous transcriptional profiling
studies and include 447 age-regulated kidney genes (Rodwell et al., 2004) as well as
the genes in the four pathways that are commonly age-regulated in the kidney, muscle
and brain: extracellular matrix, ribosome, chloride transport and electron transport
chain (Zahn et al., 2006). The candidate kidney aging genes were first searched for
mRNA SNPs that could be used in an allele-specific expression assay. In addition to
being within the transcript on an autosome, the SNPs had to have a minor allele frequency greater than 0.05 in the HapMap CEU population and an Illumina SNP score greater than 0.4, and had to lie more than 30 bp from an exon boundary (NCBI Build 36.1)
to ensure the Illumina genotyping assay would work properly for both genomic DNA
and cDNA. For genes that had multiple assayable mRNA SNPs, those closest to the
5’ end of the gene were chosen, with a maximum of two SNPs per gene. These
criteria were met for 386 SNPs in 276 genes.
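The selection criteria above amount to a filter followed by a per-gene ranking. In the minimal Python sketch below, the record fields are hypothetical stand-ins for whatever annotation was actually used. For example, `tx_position` is assumed to be the distance from the transcript's 5' end. The rsids in any usage are illustrative only.

```python
from dataclasses import dataclass

AUTOSOMES = {str(c) for c in range(1, 23)}


@dataclass
class CandidateSnp:
    gene: str
    rsid: str
    chrom: str                  # chromosome name, e.g. "7" or "X"
    in_transcript: bool         # SNP lies within the mature mRNA
    maf_ceu: float              # minor allele frequency in HapMap CEU
    illumina_score: float       # Illumina assay design score
    dist_to_exon_boundary: int  # bp to the nearest exon boundary
    tx_position: int            # distance from the transcript's 5' end (hypothetical field)


def select_assay_snps(candidates, max_per_gene=2):
    """Apply the stated filters, then keep up to two SNPs per gene nearest the 5' end."""
    passing = [
        s for s in candidates
        if s.in_transcript
        and s.chrom in AUTOSOMES
        and s.maf_ceu > 0.05
        and s.illumina_score > 0.4
        and s.dist_to_exon_boundary > 30
    ]
    by_gene = {}
    for s in passing:
        by_gene.setdefault(s.gene, []).append(s)
    chosen = []
    for snps in by_gene.values():
        snps.sort(key=lambda s: s.tx_position)   # closest to the 5' end first
        chosen.extend(snps[:max_per_gene])
    return chosen
```

Ranking within each gene only after filtering means that a failing SNP near the 5' end never displaces a passing SNP further downstream.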
Genotyping
The candidate aging SNPs were genotyped using a GoldenGate Custom Panel
from Illumina (San Diego, California, United States). Oligonucleotides specific for
each allele of each SNP were designed for use in a multiplex PCR. A standard
protocol designed by Illumina and implemented at the Stanford Human Genome
Center was used to determine the genotypes of the 96 individuals for whom we had
kidney tissue. Samples were hybridized to custom Sentrix Array Matrices and
scanned on the Illumina BeadStation 500GX. Allele calls were determined using the
Illumina BeadStudio clustering software. The genotyping was successful (>90% call
rate, HWE p > 0.001) at 95% of the genes.
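The genotyping quality filter (call rate above 90% and HWE p > 0.001) can be sketched in Python. The text does not say which Hardy-Weinberg test was used. This sketch assumes a Pearson chi-square test with one degree of freedom; an exact test is a common alternative for small samples. Function names are hypothetical.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg equilibrium (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotype calls")
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0                            # monomorphic SNP: test is uninformative
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_genotype_qc(n_aa, n_ab, n_bb, n_samples, min_call_rate=0.90, min_hwe_p=0.001):
    """Call rate above 90% and HWE p above 0.001, as stated in the text."""
    call_rate = (n_aa + n_ab + n_bb) / n_samples
    return call_rate > min_call_rate and hwe_chi2_p(n_aa, n_ab, n_bb) > min_hwe_p
```

With 96 individuals, for example, genotype counts of 24/48/24 pass, while 48/0/48 (no heterozygotes) fails the HWE criterion.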
Allele-Specific Expression Quantification
Total RNA was reverse transcribed into cDNA using the SuperScript Double-
Stranded cDNA Synthesis Kit (Invitrogen, Carlsbad, California, United States). The
same Illumina GoldenGate Custom Panel used for genotyping was used to measure
cDNA levels according to which allele of the SNP is present in the transcript. Only
SNPs for which the genomic DNA genotyping was successful were analyzed. After
the cDNA PCR products were hybridized and scanned, the raw allelic intensities were
first used to determine which transcripts were expressed. The expression threshold was defined using the absent allele in homozygotes; for example, in an AA homozygote the intensity of the B allele was taken as background. For each SNP, the expression threshold was calculated as the mean background intensity plus two standard deviations. SNPs with five or more heterozygotes expressing at least one of the two alleles above this threshold were carried through the rest of the analysis. Of the SNPs measured, 309 SNPs in 225 genes were genotyped correctly (call rate >90%, HWE p > 0.001) and expressed above the background threshold in at least five heterozygotes. To
determine which alleles were associated with expression level, a confidence interval
was calculated for each SNP using the genomic DNA allele intensities of
heterozygotes. The confidence interval for each SNP was defined as the mean of the
normalized genomic DNA allele A/B raw intensity ratios plus or minus two standard
deviations. If the cDNA allele intensity ratio for more than 50% of individual heterozygotes fell outside the 95% confidence interval and the meta p-value (Fisher, 1948) was less than 10^-6, the SNP was considered an eSNP. The eSNP calls did not simply reflect low, noisy transcript levels: for each gene, the relative abundance in the total cDNA sample (calculated from whole-genome microarray data) was greater than its relative abundance in the genomic DNA sample.
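The expression filter described above can be sketched in Python. The sketch pools the absent-allele intensities from both homozygote classes into one background per SNP: B intensity in AA individuals and A intensity in BB individuals. It counts a heterozygote as expressing the transcript when either allele exceeds the threshold. Both choices are assumptions, and all names are hypothetical.

```python
import statistics


def expression_threshold(background_intensities):
    """Mean + 2 SD of the absent-allele (background) intensities for one SNP."""
    mean = statistics.mean(background_intensities)
    return mean + 2 * statistics.stdev(background_intensities)


def snp_is_expressed(aa_b_intensities, bb_a_intensities, het_a, het_b, min_expressing_hets=5):
    """True if enough heterozygotes express at least one allele above background.

    aa_b_intensities: B-allele intensities in AA homozygotes (background).
    bb_a_intensities: A-allele intensities in BB homozygotes (background).
    het_a, het_b: A- and B-allele cDNA intensities of the heterozygotes.
    """
    # ASSUMPTION: both homozygote classes are pooled into one background per SNP.
    threshold = expression_threshold(list(aa_b_intensities) + list(bb_a_intensities))
    n_expressing = sum(1 for a, b in zip(het_a, het_b) if a > threshold or b > threshold)
    return n_expressing >= min_expressing_hets
```

Only SNPs passing this filter (and the genotyping QC) would go on to the confidence-interval test of allelic ratios.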
Figure 3.1 Assay for allele-specific expression. (A) Example: the G/A regulatory SNP in the upstream region causes the top transcript to be expressed higher than the bottom transcript. The C/G SNP is in the mRNA and linked to the regulatory SNP. (B) cDNA levels are measured by using a SNP genotyping assay designed to measure SNPs located in the mRNA of a gene (the C/G SNP in this example). The cDNA allelic ratio is compared to the genomic DNA allelic ratio (1:1 reference) to determine if the two alleles are expressed at significantly different levels (see Methods section in Chapter 3).
Table 3.1 Expression QTLs identified by allele-specific expression analysis in heterozygotes. Bold SNPs (rs8643, rs1997, rs641153 and rs1264899) also significantly associated with total expression levels.
Het. = number of heterozygotes; ASE = ASE proportion (fraction of heterozygotes showing allele-specific expression); Major, Minor = major and minor allele, with the number of heterozygotes in which that allele was expressed higher in parentheses; FC = mean fold change (higher allele / lower allele); last column = Fisher's meta p-value.
Gene      SNP         Het.  ASE   Major    Minor    FC    Fisher's meta p-value
PEG3      rs1055359   32    1     A (12)   G (20)   11.7  <10^-100
COL17A1   rs805701    23    1     A (23)   G (0)    2.2   <10^-100
LAMC1     rs2296292   13    1     A (12)   C (1)    2.2   10^-44
MMP8      rs1276282   12    1     C (0)    T (12)   2.6   <10^-100
CLCA1     rs1882753   10    1     T (10)   C (0)    2.4   <10^-100
PAPPA2    rs2294654   9     1     G (0)    A (9)    3.2   <10^-100
GABRG2    rs211037    8     1     C (1)    T (7)    2.1   <10^-100
SLC16A7   rs10506399  36    0.97  G (0)    A (35)   2     <10^-100
PDIA4     rs1052549   31    0.97  T (0)    G (30)   1.7   10^-55
ATHL1     rs2242565   37    0.95  A (34)   G (1)    3.1   <10^-100
FAM83F    rs17406386  33    0.94  A (31)   G (0)    4.1   <10^-100
BRP44L    rs3728      43    0.93  T (0)    G (40)   1.6   <10^-100
LAMA3     rs1154226   29    0.93  C (0)    G (27)   2     <10^-100
KERA      rs1990548   27    0.93  A (24)   C (1)    1.9   <10^-100
TXNDC5    rs8643      15    0.93  G (14)   A (0)    3.1   10^-40
FLJ38725  rs7992315   14    0.93  T (13)   C (0)    1.6   10^-46
COX7A2L   rs1997      52    0.92  A (0)    T (48)   1.6   10^-91
MMP20     rs2245803   12    0.92  C (0)    A (11)   2.1   <10^-100
COL17A1   rs9425      12    0.92  G (0)    A (11)   1.8   <10^-100
RGS6      rs3291      23    0.91  A (20)   G (1)    2.6   <10^-100
CXCL14    rs1046092   11    0.91  A (0)    G (10)   1.8   10^-65
SPP1      rs4754      40    0.9   T (34)   C (2)    4.8   <10^-100
PHCA      rs591043    31    0.9   A (0)    G (28)   2.3   10^-99
MATN2     rs3088121   21    0.9   A (1)    G (18)   1.6   10^-62
GPC6      rs1535692   20    0.9   G (0)    A (18)   2.2   <10^-100
GPC5      rs553717    10    0.9   G (0)    A (9)    2.1   <10^-100
GABRA4    rs7678338   45    0.89  T (40)   C (0)    4.5   <10^-100
GPR61     rs17575798  9     0.89  G (0)    A (8)    1.9   10^-23
ADRA2A    rs3750625   9     0.89  C (0)    A (8)    1.8   <10^-100
TPD52     rs10098470  8     0.88  G (7)    A (0)    1.4   10^-43
TIMP3     rs1065314   36    0.86  T (31)   C (0)    3.1   <10^-100
DSPP      rs2615489   22    0.86  A (19)   G (0)    2.8   <10^-100
SPP1      rs9138      40    0.85  A (28)   C (6)    6.2   <10^-100
TMEM92    rs2254177   12    0.83  C (0)    T (10)   1.8   <10^-100
CHRNA3    rs660652    11    0.82  G (0)    A (9)    2.8   <10^-100
OSMR      rs1239344   36    0.81  G (0)    A (29)   1.7   <10^-100
TIMP3     rs1427384   36    0.81  A (1)    G (28)   2.9   <10^-100
GABRP     rs929762    15    0.8   T (12)   C (0)    2.4   <10^-100
GPNMB     rs5850      42    0.79  G (32)   A (1)    1.8   10^-59
C3        rs2230199   28    0.79  C (1)    G (21)   2.8   <10^-100
C7        rs14190     37    0.78  A (6)    G (23)   1.7   10^-77
GOT2      rs6993      32    0.78  C (0)    T (25)   1.5   <10^-100
RAFTLIN   rs6900      13    0.77  C (10)   T (0)    1.6   10^-16
NDUFC2    rs499799    29    0.76  G (21)   C (1)    4.6   10^-54
THBS4     rs423906    20    0.75  G (0)    A (15)   2.1   <10^-100
MMP9      rs13969     42    0.74  A (0)    C (31)   3     10^-67
RPL15     rs1133926   27    0.74  A (1)    G (19)   3.6   <10^-100
LPL       rs3208305   41    0.73  A (9)    T (21)   2.3   <10^-100
MMP25     rs1043298   37    0.73  T (1)    A (26)   1.8   <10^-100
NDUFAF1   rs1899      33    0.73  G (24)   A (0)    2     10^-50
CLCA1     rs1321694   15    0.73  A (0)    T (11)   4     <10^-100
LOC387758 rs7111860   15    0.73  T (11)   G (0)    3.4   <10^-100
SOHLH2    rs2296967   11    0.73  G (0)    A (8)    3     <10^-100
SPARCL1   rs9933      45    0.71  C (32)   T (0)    1.6   10^-69
PLK2      rs15915     35    0.71  G (1)    A (24)   1.7   10^-67
THBS4     rs1866389   28    0.71  C (19)   G (1)    1.8   <10^-100
MMP3      rs602128    24    0.71  T (17)   C (0)    2.1   <10^-100
LTF       rs1126478   24    0.71  A (3)    G (14)   4.1   <10^-100
GPC6      rs17645969  30    0.7   C (2)    A (19)   1.9   <10^-100
DCN       rs7441      10    0.7   C (0)    T (7)    1.8   10^-39
FMO5      rs894469    10    0.7   A (7)    G (0)    3.7   <10^-100
PRICKLE1  rs1043652   35    0.69  G (1)    A (23)   2.1   10^-48
FA2H      rs1046371   37    0.68  C (1)    G (24)   1.4   10^-44
MMP9      rs20544     49    0.67  T (32)   C (1)    3.2   <10^-100
HAPLN1    rs2242128   18    0.67  G (1)    C (11)   2.8   <10^-100
SMPD2     rs1476387   6     0.67  G (3)    T (1)    6.7   10^-95
ATP5C1    rs4655      41    0.66  T (26)   C (1)    2.1   <10^-100
RHOBTB3   rs12351     37    0.65  T (24)   G (0)    3.3   <10^-100
PHYH      rs11133     31    0.65  G (9)    A (11)   1.7   10^-70
IGF1R     rs2229765   23    0.65  G (8)    A (7)    1.4   10^-53
POSTN     rs6750      20    0.65  G (7)    C (6)    2.8   <10^-100
SP2       rs2229358   47    0.64  G (1)    A (29)   1.6   <10^-100
MATN1     rs20566     36    0.64  A (22)   G (1)    1.6   10^-51
RARRES1   rs2307064   28    0.64  C (9)    T (9)    1.5   <10^-100
RPL28     rs7255657   11    0.64  A (6)    G (1)    1.9   10^-21
PECI      rs3177253   30    0.63  G (18)   A (1)    2.4   10^-59
LAMB1     rs7561      42    0.62  C (25)   A (1)    2.7   10^-59
GABRA4    rs17599102  39    0.62  C (1)    T (23)   2.2   <10^-100
ATP5F1    rs1264899   38    0.61  G (22)   A (1)    1.4   10^-32
MTR       rs2853522   33    0.61  C (14)   A (6)    1.5   10^-44
NOV       rs14324     28    0.61  C (1)    T (16)   1.7   10^-58
CLIC6     rs2834601   18    0.61  C (2)    T (9)    2.2   10^-60
KIAA0644  rs740252    40    0.6   G (1)    C (23)   2     <10^-100
EGF       rs3733625   15    0.6   A (1)    G (8)    5.1   <10^-100
MFGE8     rs8530      27    0.59  G (3)    A (13)   1.6   <10^-100
FLRT2     rs17646457  22    0.59  G (1)    A (12)   1.9   10^-55
LIX1      rs316234    37    0.57  C (1)    A (20)   1.9   <10^-100
IGF1R     rs3743262   14    0.57  C (2)    T (6)    3.4   10^-54
GLRB      rs1129304   36    0.56  T (19)   A (1)    1.5   10^-77
FN1       rs2289202   20    0.55  G (0)    A (11)   2     10^-42
MAP4      rs1061003   41    0.54  C (22)   G (0)    1.5   10^-32
AATF      rs1045056   41    0.54  T (0)    C (22)   2.8   10^-30
ADAMTS5   rs457947    37    0.54  C (19)   G (1)    1.6   <10^-100
CFB       rs641153    13    0.54  C (3)    T (4)    1.4   10^-38
TGFB2     rs900       13    0.54  A (7)    T (0)    2.1   10^-15
MMP7      rs10502001  32    0.53  C (13)   T (4)    1.6   10^-25
ADCY1     rs2280495   19    0.53  C (1)    T (9)    1.8   10^-37
SLC16A7   rs3763979   17    0.53  G (3)    A (6)    2.5   <10^-100
HIBADH    rs1052741   29    0.52  C (13)   T (2)    2.7   10^-45
FBLN2     rs1061375   40    0.5   G (1)    A (19)   2     10^-35
FLRT2     rs10309     38    0.5   G (2)    A (17)   2.1   <10^-100
C18orf1   rs3744811   32    0.5   C (1)    T (15)   1.9   10^-38
SPARCL1   rs1049539   32    0.5   A (1)    G (15)   1.7   10^-22
PTPRO     rs1050646   18    0.5   T (0)    C (9)    1.4   10^-25
COL6A3    rs4663722   18    0.5   C (2)    G (7)    1.5   10^-24
Figure 3.2 Distribution of allele-specific expression. The white bars show the distribution of the allelic expression ratio for all heterozygotes that express the transcript of the 309 SNPs tested. The red bars show the distribution of the allelic expression ratio for heterozygotes that show allele-specific expression.
Figure 3.3 Distribution of mean allelic fold change in allele-specifically expressed genes. The level of overexpression of one allele varied widely among genes, from 1.4-fold to apparently monoallelic (>10-fold) expression. The median fold change was 2.1.
Figure 3.4 Allele-specific expression analysis. The red lines indicate the 95% confidence interval surrounding the normalized genomic DNA allelic ratio. Each bar represents one heterozygous individual at the particular SNP listed. Individuals above the upper bound or below the lower bound display allele-specific expression. (A) Negative control: no allele-specific expression was observed at SNP locus rs11553763 in the gene TSC1 in the ten heterozygotes tested. (B) Allele-specific expression was observed at SNP locus rs2245803 in the gene MMP20 in 11 of 12 heterozygotes tested. The A allele was expressed higher than the C allele in all the individuals displaying allele-specific expression.
Figure 3.5 Allele-specific expression eQTL characteristics. In eQTLs discovered by allele-specific expression analysis, the more highly expressed allele was most often the same across heterozygotes. The red lines indicate the 95% confidence interval surrounding the normalized genomic DNA allelic ratio. Each bar represents one heterozygous individual at the particular SNP listed. Individuals above the upper bound or below the lower bound display allele-specific expression. (A) In 12 of 13 heterozygotes at rs2296292 in LAMC1, the A allele is expressed higher than the C allele and in 1 heterozygote, the C allele is expressed higher than the A allele. In 91% of discovered eQTLs, greater than 75% of heterozygotes with differential expression have the same allele higher, which indicates the functional SNP is likely linked to the SNP that was interrogated in the transcript. (B) In 12 of 32 heterozygotes at rs1055359 in PEG3, the A allele is expressed higher than the G allele and in 20 of 32 heterozygotes, the G allele is expressed higher than the A allele. A similar pattern was observed in 9% of discovered eQTLs. PEG3 is a known imprinted gene.
Figure 3.6 Comparison of eQTL methods at one locus. (A) Allele-specific expression method: the red lines indicate the 95% confidence interval surrounding the normalized genomic DNA allelic ratio. Each bar represents one heterozygous individual at the particular SNP listed. Individuals above the upper bound or below the lower bound display allele-specific expression. Allele-specific expression was observed at SNP locus rs8643 in the gene TXNDC5 in 14 of 15 heterozygotes tested. The G allele was expressed higher than the A allele in all the individuals displaying allele-specific expression. (B) Total expression method: boxplot of TXNDC5 total expression according to genotype at the 3’ UTR SNP rs8643 (p = 1.2 × 10^-4). The boxes define the interquartile range and the thick line is the median. Open dots are possible outliers. GG homozygotes at rs8643 have higher expression of TXNDC5 than heterozygotes.
Table 3.2 Common allele-specific expression across studies.
Gene     SNP         Study       Same?*
C3       rs17030     Lo, Pant    gene
CLIC6    rs2834601   Serre       snp
COX7A2L  rs1997      Pant        snp
FBLN2    rs9843344   Milani      gene
LAMB1    rs7561      Serre       snp
LAMC1    rs20563     Milani      gene
LTF      rs1126478   Pant        snp
MATN2    rs2615      Lo          gene
MFGE8    rs1878326   Milani      gene
MMP7     rs10502001  Serre       snp
MMP8     rs1940475   Pant        gene
MMP9     rs13925     Pant        gene
MTR      rs2229276   Pant        gene
RGS6     rs3291      Lo          snp
SLC16A7  rs3763980   Pant        gene
SP2      rs2229358   Milani      snp
*Those labeled "snp" showed ASE in our dataset and the respective study at the same SNP. Those labeled "gene" showed ASE at the same gene in our dataset and at the shown SNP in the respective study. Studies: Lo, H.S. et al. Allelic variation in gene expression is common in the human genome. Genome Res 13, 1855-62 (2003). Milani, L. et al. Allele-specific gene expression patterns in primary leukemic cells reveal regulation of gene expression by CpG site methylation. Genome Res 19, 1-11 (2009). Pant, P.V. et al. Analysis of allelic differential expression in human white blood cells. Genome Res 16, 331-9 (2006). Serre, D. et al. Differential allelic expression in the human genome: a robust approach to identify genetic and epigenetic cis-acting mechanisms regulating gene expression. PLoS Genet 4, e1000006 (2008).
Chapter 4: Genetic Association with Kidney Aging
Portions of this chapter were previously published in PLoS Genetics (2009), 5(10): e1000685 with the following authors:
Heather E. Wheeler¹, E. Jeffrey Metter²,³, Toshiko Tanaka²,³, Devin Absher⁴, John Higgins⁵, Jacob M. Zahn⁶, Julie Wilhelmy⁶, Ronald W. Davis⁶, Andrew Singleton⁷, Richard M. Myers⁴, Luigi Ferrucci²,³, Stuart K. Kim¹,⁸
¹Department of Genetics, Stanford University Medical Center, Stanford, CA, USA
²Longitudinal Studies Section, Clinical Research Branch, National Institute on Aging, Baltimore, MD, USA
³Medstar Research Institute, Baltimore, MD, USA
⁴HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
⁵Department of Pathology, Stanford University Medical Center, Stanford, CA, USA
⁶Stanford Genome Technology Center, Palo Alto, CA, USA
⁷Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
⁸Department of Developmental Biology, Stanford University Medical Center, Stanford, CA, USA
Background
Our sequential genomic convergence approach identified 101 genes that show age-related changes in expression in the kidney and that also contain SNPs associated with expression (eSNPs), indicating the presence of functional polymorphisms. Two methods were used to find these expression quantitative trait loci (eQTLs): the total expression method and the allele-specific expression method (Chapters 2-3). We used these eQTLs as candidates in a genetic association study of normal kidney aging, genotyping a total of 2038 SNPs within these 101 genes in two different cohorts selected to study normal aging. A list of the 2038 SNPs genotyped can be found at http://www.plosgenetics.org/doi/pgen.1000685 (Table S4).
In both aging cohorts, kidney function was measured as the glomerular filtration
rate (GFR), estimated from 24-hour creatinine clearance. The first cohort is the
Baltimore Longitudinal Study of Aging (BLSA), a long-running study of human aging
begun in 1958 (Lindeman et al., 1984). This study has enrolled over 3000 healthy
volunteers from the Baltimore area for clinical evaluations of many age-related traits
and diseases (Ferrucci, 2008). GFR was measured at multiple ages for each individual,
with an average of 3-4 measurements per individual taken at different times spanning
decades. Thus, this study captures not only the average level of kidney function at
each age, but also the age-related downward trend in kidney function within each
individual. Multiple GFR measurements and genotype data were available for 1066
participants.
The second cohort is the InCHIANTI study, a population-based epidemiological
study of factors important for aging in the older population living in the Chianti
region of Tuscany, Italy (Ferrucci et al., 2000). About 90% of the individuals aged 65
and older from two towns participated in this study, making it an exceptionally useful
resource for studying genetic determinants of normal aging. GFR was measured at a
single age in 1130 individuals. Characteristics of both cohorts are shown in Table 4.1.
The 2038 genotyped SNPs in the 101 eQTLs were tested for association with GFR, our
chosen phenotype of kidney aging.
Results
We used regression models that included age as a covariate to test the SNP
genotypes in each population for association with GFR (see Methods). For an allelic
association with GFR to be considered significant, we first required evidence of
association in both populations (p < 0.05 in each population). A total of 13 genes
contained SNPs that met this criterion (Table 4.2). Next, we combined these p-values
using Fisher's meta-analysis, a method for combining p-values from independent tests
of the same overall hypothesis (Fisher, 1948). To correct for multiple hypothesis
testing, we performed 1000 permutations of each model by swapping identification
labels while keeping each individual's genotypes together to preserve linkage
disequilibrium (see Methods). Two linked SNPs (rs1711437 and rs1784418) in matrix
metalloproteinase 20 (MMP20) remained significant after permutation testing
(uncorrected p < 5 × 10⁻⁵, corrected p = 0.01).
We considered whether associations found in the BLSA cohort could have
been due to population structure. Population structure was of minimal concern in the
InCHIANTI cohort because it is a homogeneous Italian population. The BLSA cohort is
predominantly Caucasian (84%). Our mixed-effect regression model included a covariate
for self-reported race, which should help control for differences due to population
structure. In addition, rs1711437 in MMP20 was associated with kidney aging when only
data from self-reported Caucasians in the BLSA cohort were analyzed (uncorrected
p = 0.0010). These results indicate that the MMP20 SNPs associate with kidney aging
per se and are not artifacts arising from genetic differences between races.
A SNP in the insulin-like growth factor 1 receptor gene (IGF1R) showed the
second-strongest age-adjusted association with GFR in the meta-analysis (rs11630259,
uncorrected p = 7.8 × 10⁻⁵, Table 4.2). Decreased activity of this gene has been
associated with longer lifespan in model organisms and humans (Holzenberger et al.,
2003; Kenyon et al., 1993; Suh et al., 2008). However, SNPs in IGF1R did not remain
significant after permutation testing. Therefore, further studies are required to
establish a connection between this SNP and kidney aging.
In both populations, one or two copies of the A allele at rs1711437 in MMP20
were associated with a higher GFR (Figure 4.1). An individual who carries the A allele
has a creatinine clearance approximately equal to that of a non-carrier 4-5 years
younger. In the BLSA population, the genotype of rs1711437 explains 2.1% of the
variation in creatinine clearance; in the InCHIANTI population, it explains 0.9%.
Similar results were found for the second SNP, rs1784418, which is in linkage
disequilibrium with rs1711437.
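The "4-5 years younger" and "percentage of variation explained" figures are derived from fitted model quantities. The sketch below shows one plausible way to compute them, assuming a linear age slope and defining variance explained as the drop in residual sum of squares when the genotype term is added, divided by the total sum of squares (a change in R²). The chapter does not state the exact formulas used, and the coefficient values in the example are hypothetical, not the study's estimates.

```python
def age_equivalent_years(beta_genotype, beta_age_per_year):
    """Years of age-related decline offset by a genotype effect.

    beta_genotype: carrier minus non-carrier creatinine clearance (ml/min)
        at a fixed age.
    beta_age_per_year: change in clearance per year of age (ml/min/year);
        expected to be negative because clearance declines with age.
    """
    if beta_age_per_year >= 0:
        raise ValueError("expects a declining (negative) age slope")
    return beta_genotype / -beta_age_per_year


def variance_explained(rss_without_snp, rss_with_snp, total_ss):
    """Share of total variation attributed to the genotype term:
    (RSS without SNP - RSS with SNP) / total sum of squares."""
    return (rss_without_snp - rss_with_snp) / total_ss


# Hypothetical inputs for illustration only.
print(age_equivalent_years(4.5, -1.0))                       # 4.5
print(round(variance_explained(1000.0, 979.0, 1000.0), 3))   # 0.021
```

Dividing by the age slope only makes sense for a model that is linear in age; with the quadratic age terms of the BLSA model, the equivalent age difference would depend on the age at which it is evaluated.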
Both rs1711437 and rs1784418 are associated with variation in kidney aging,
but the functional SNP is not known. The eSNP rs2245803, identified by allele-specific
expression analysis, is not linked to rs1711437 or rs1784418 (Figure 4.2). Thus, some
other SNP in this linkage disequilibrium block, such as a coding SNP or a different
eSNP, may alter the activity of MMP20 and be responsible for the association with the
kidney aging phenotype. Interestingly, two nonsynonymous coding SNPs, rs1784424
(Asn281Thr) and rs1784423 (Ala275Val), lie within this linkage disequilibrium block
(Figure 4.2). These amino acid differences might affect MMP20 function and could
underlie differences in kidney aging among individuals.
Discussion
As the final step in our genomic convergence approach, we tested 2038 SNPs
in 101 eQTLs for association with kidney aging. Two SNPs in MMP20 were significantly
associated with age-related decline in kidney GFR. Matrix metalloproteinases
degrade extracellular matrix proteins including laminin, elastin, proteoglycans,
fibronectin, and collagens (Jormsjo et al., 2001). Most previous studies of MMP20
describe its role in tooth development (Bartlett et al., 2006), and a role for MMP20 in
renal function has not been previously described. Changes in the extracellular matrix
play a key role in aging of the kidney: interstitial fibrosis occurs during aging because
of an increase in matrix (Abrass et al., 1995).
The insulin-like growth factor 1 receptor (IGF1R) was the second-highest
scoring gene in our kidney aging association study (Table 4.2). Although the SNP in
this gene did not remain significant after permutation testing, this result is interesting
because this gene is part of the insulin-like signaling pathway, which has been shown to
be involved in aging in worms, flies, and mice (Guarente and Kenyon, 2000). In
humans, rare variants in the IGF1R gene in centenarians are associated with reduced
IGF1R levels and defective IGF signaling (Suh et al., 2008).
In a genome-wide association study, SNPs in three gene regions (UMOD,
SHROOM3, GATM-SPATA5L1) were shown to associate with GFR (Kottgen et al.,
2009). None of these genes was age-regulated in the kidney, so they were not
tested for expression-associated SNPs in our study. Also, that study did not have
longitudinal data like the BLSA. It was published just as our project was
finishing.
Our genomic convergence approach began with a genome-wide transcriptional
profile of kidney aging. We narrowed down which age-regulated genes to test for
association with kidney aging by performing eQTL mapping. Testing SNPs in 101
age-regulated eQTLs for association with GFR in two populations chosen to study
normal aging led to the discovery of a SNP in MMP20 that associates with GFR. This
finding needs to be replicated in additional populations, but may be the first gene
association found for normal kidney aging. Genomic convergence, combining
expression and association analyses, can be used to detect genetic associations with
any phenotype of interest.
Methods
BLSA Samples
The Baltimore Longitudinal Study of Aging (BLSA) is an intramural research
program within the National Institute on Aging (Lindeman et al., 1984). Healthy
volunteers aged 18 and older were enrolled in the study starting in 1958. BLSA
participants are predominantly Caucasian, community-residing volunteers who tend to
be well-educated, with above-average income and access to medical care. These
subjects visit the Gerontology Research Center at regular intervals for two days of
medical, physiological, and psychological testing. Each participant has a health
evaluation by a health provider (physician, nurse practitioner, or physician assistant).
Currently, the study population has 1450 active participants, aged 18-97 years
(http://www.grc.nia.nih.gov/branches/blsa/blsa.htm). The level of kidney function in
the participants has been measured longitudinally in each individual between 1 and 16
times over a 10- to 50-year period. The kidney aging phenotype, glomerular
filtration rate (GFR), was estimated by calculating creatinine clearance. Specifically,
serum creatinine and 24-hour urinary creatinine levels were obtained from participants
using standard clinical procedures (Metter et al., 2004) and were used to calculate
creatinine clearance as follows:
$$ C_{Cr} = \frac{U_{Cr} \times V_U}{P_{Cr} \times 1440} \tag{4.1} $$
where $C_{Cr}$ is creatinine clearance in ml/min, $U_{Cr}$ is the urinary creatinine
concentration, $V_U$ is the volume of urine collected over 24 hours (in ml), $P_{Cr}$ is
the plasma creatinine concentration (in the same units as $U_{Cr}$), and 1440 is the
number of minutes in 24 hours. We were granted access to genotype and GFR data for
1066 individuals. The genotype data comprised the 2038 SNPs on the Illumina
HumanHap550 Genotyping BeadChip that lie within the 101 genes containing
expression-associated SNPs and that have minor allele frequencies > 0.01 (Table S4).
The GFR data included 3672 creatinine clearance measurements.
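Equation 4.1 can be written as a small function to make the unit bookkeeping explicit. This is a minimal sketch: the function and constant names are hypothetical, and it assumes, as the formula requires, that urinary and plasma creatinine share the same concentration units and that urine volume is in ml.

```python
MINUTES_PER_DAY = 1440  # minutes in the 24-hour urine collection


def creatinine_clearance(urine_cr, urine_volume_ml, plasma_cr):
    """Creatinine clearance in ml/min, following Equation 4.1.

    urine_cr and plasma_cr must be in the same units (e.g. mg/dL);
    urine_volume_ml is the total urine volume collected over 24 hours.
    """
    if plasma_cr <= 0:
        raise ValueError("plasma creatinine must be positive")
    return (urine_cr * urine_volume_ml) / (plasma_cr * MINUTES_PER_DAY)


# Hypothetical values: 100 mg/dL urinary creatinine, 1440 ml of urine,
# 1.0 mg/dL plasma creatinine.
print(creatinine_clearance(100.0, 1440.0, 1.0))  # 100.0
```

Because the concentration units cancel, only the volume (ml) and the 1440-minute collection time determine the ml/min units of the result.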
InCHIANTI Samples
The InCHIANTI participants are residents of two small towns in Tuscany, Italy
(Ferrucci et al., 2000). The study includes 1320 participants (age range 20-102 yrs),
who were randomly selected from the population registries of Greve in Chianti
(population 11,709) and Bagno a Ripoli (population 4,704) starting in 1998 (Ferrucci
et al., 2000). Over 90% of the residents over the age of 65 participated in this study,
so the cohort is a good representation of normal aging (http://www.inchiantistudy.net).
GFR was calculated from creatinine clearance using 24-hour urine collection,
as in the BLSA. In this study, creatinine clearance was measured at one age only. The
genotype data, generated with the HumanHap550 Genotyping BeadChip, consisted of
the same 2038 SNPs in the 101 candidate aging genes used in the BLSA analysis
(http://www.plosgenetics.org/doi/pgen.1000685, Table S4). The sample size was 1130
individuals.
Glomerular Filtration Rate Regression Models
Due to the longitudinal nature of the BLSA data, we used a mixed-effect
regression analysis to search for SNP associations with creatinine clearance. Because
the creatinine clearance measurements within one subject over time are correlated, the
regression coefficients are allowed to vary between individuals. First, we developed
the following model using a likelihood ratio approach to explain how creatinine
clearance changes with time:
$$ Y_{ia} = \beta_{0i} + \beta_{1i} a_i + \beta_{2i} a_i^2 + \beta_{3i} d_{ia} + \beta_{4i} d_i^2 + \beta_{5i} s_i + \beta_{6i} r_i + \varepsilon_{ia} \tag{4.2} $$
In equation 4.2, $Y_{ia}$ is the creatinine clearance of subject $i$ at age $a$, $a_i$ is
the age of subject $i$, $d_{ia}$ is the date in decimal years of the visit of subject $i$ at
age $a$, $s_i$ is the sex of subject $i$, $r_i$ is the self-reported race of subject $i$, and
$\varepsilon_{ia}$ is a random error term. Most of the data points (84%) came from
self-reported Caucasian individuals; these individuals were coded 0 for the $r_i$ term
and everyone else was coded 1. The coefficients $\beta_{ki}$ of each subject $i$ for
$k = 0, \ldots, 6$ were estimated by maximum likelihood from the data using the "lmer"
function from the "lme4" package of R version 2.8.0.
Next, to determine if the genotype of any of our candidate aging genes can account for
some of the variance in creatinine clearance, we added two terms to the model:
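$$ Y_{ia} = \beta_{0ij} + \beta_{1ij} a_i + \beta_{2ij} a_i^2 + \beta_{3ij} d_{ia} + \beta_{4ij} d_i^2 + \beta_{5ij} s_i + \beta_{6ij} r_i + \beta_{7ij} g_{ij} + \beta_{8ij} (g_{ij} \times a_i) + \varepsilon_{ija} \tag{4.3} $$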
In equation 4.3, $g_{ij}$ is the genotype of SNP $j$ in subject $i$. We obtained
estimates for three inheritance models: additive, recessive, and dominant. In the
additive model, $g$ is 0, 1, or 2 for homozygous dominant, heterozygous, and
homozygous recessive genotypes, respectively. In the recessive model, $g$ is 0 for the
homozygous dominant and heterozygous genotypes and 1 for the homozygous recessive
genotype. In the dominant model, $g$ is 0 for the homozygous dominant genotype and 1
for the heterozygous and homozygous recessive genotypes. For each SNP and each
inheritance model, we compared the fit of equation 4.3 with that of equation 4.2 using
a likelihood ratio test to generate a p-value for each SNP. Although our models
included a self-reported race term, we also confirmed the rs1711437 association with
GFR by analyzing only the data points from Caucasian individuals (p = 0.0010).
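The genotype coding and the likelihood ratio comparison described above can be sketched as follows. This is an illustration only, with hypothetical function names: it assumes the log-likelihoods of equations 4.2 and 4.3 have already been obtained from maximum-likelihood fits (in lme4, comparing fixed effects this way requires fitting with REML = FALSE), and it assumes two degrees of freedom for the two added terms, $g_{ij}$ and $g_{ij} \times a_i$. The chi-square tail is computed in closed form, which is valid for even degrees of freedom only.

```python
import math


def encode_genotype(n_recessive_alleles, model):
    """Code a genotype as g for the additive, recessive, or dominant model.

    n_recessive_alleles: 0 (homozygous dominant), 1 (heterozygous), or
    2 (homozygous recessive), matching the coding described in the text.
    """
    if n_recessive_alleles not in (0, 1, 2):
        raise ValueError("genotype must carry 0, 1, or 2 recessive alleles")
    if model == "additive":
        return n_recessive_alleles
    if model == "recessive":
        return 1 if n_recessive_alleles == 2 else 0
    if model == "dominant":
        return 1 if n_recessive_alleles >= 1 else 0
    raise ValueError(f"unknown model: {model}")


def lrt_pvalue(loglik_reduced, loglik_full, df=2):
    """Likelihood ratio test for nested ML fits (equation 4.2 vs 4.3).

    The statistic 2 * (logL_full - logL_reduced) is referred to a
    chi-square distribution with df degrees of freedom (even df only).
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles positive even df only")
    half = max(0.0, loglik_full - loglik_reduced)  # statistic / 2
    # Chi-square tail for df = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


print([encode_genotype(n, "dominant") for n in (0, 1, 2)])  # [0, 1, 1]
print(lrt_pvalue(-1000.0, -995.0))  # statistic 10 on 2 df, p = exp(-5)
```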
For the InCHIANTI data, because the data are not longitudinal, we used a
simple linear regression model to search for SNP associations with creatinine
clearance. We tested the three inheritance models for SNP association with creatinine
clearance at every age (equation 4.4) and for SNP association with the rate of
creatinine clearance decline with age (equation 4.5):
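$$ Y_i = \beta_{0j} + \beta_{1j} g_{ij} + \beta_{2j} a_i + \beta_{3j} s_i + \varepsilon_{ij} \tag{4.4} $$
$$ Y_i = \beta_{0j} + \beta_{1j} g_{ij} + \beta_{2j} a_i + \beta_{3j} (g_{ij} \times a_i) + \beta_{4j} s_i + \varepsilon_{ij} \tag{4.5} $$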
In equations 4.4 and 4.5, $Y_i$ is the creatinine clearance of subject $i$, $g_{ij}$ is the
genotype of subject $i$ at SNP $j$, $a_i$ is the age of subject $i$, $s_i$ is the sex of
subject $i$, and $\varepsilon_{ij}$ is a random error term. The coefficients were estimated
by least squares. In equation 4.4, our primary interest was whether $\beta_{1j}$ differed
significantly from zero, which would indicate that SNP $j$ associates with creatinine
clearance at every age. In equation 4.5, our primary interest was whether $\beta_{3j}$
differed significantly from zero, which would indicate that SNP $j$ associates with the
rate of creatinine clearance decline with age.
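A least-squares fit of equations 4.4 and 4.5 and the test of $\beta_{1j}$ or $\beta_{3j}$ can be sketched with NumPy. This is not the software used in the study, which is not stated for these models. The two-sided p-value uses a normal approximation to the t distribution, an assumption that is reasonable for samples of roughly a thousand individuals. The simulated data and effect sizes are hypothetical.

```python
import math

import numpy as np


def ols_coef_pvalue(y, X, col):
    """Fit y ~ X by least squares; return coefficient `col` and a
    two-sided p-value (normal approximation to the t distribution)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # coefficient covariance
    t = beta[col] / math.sqrt(cov[col, col])
    return beta[col], math.erfc(abs(t) / math.sqrt(2.0))


def design_main_effect(g, age, sex):
    # Equation 4.4 columns: intercept, g (beta_1j), age (beta_2j), sex (beta_3j)
    return np.column_stack([np.ones_like(age), g, age, sex])


def design_interaction(g, age, sex):
    # Equation 4.5 columns: intercept, g, age, g x age (beta_3j), sex
    return np.column_stack([np.ones_like(age), g, age, g * age, sex])


# Simulated cohort with a hypothetical 5 ml/min carrier effect and no
# genotype-by-age interaction.
rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(20, 100, n)
sex = rng.integers(0, 2, n).astype(float)
g = rng.integers(0, 2, n).astype(float)   # dominant-model coding
y = 150 - 0.9 * age + 5.0 * g + rng.normal(0, 20, n)

b1, p1 = ols_coef_pvalue(y, design_main_effect(g, age, sex), col=1)
b3, p3 = ols_coef_pvalue(y, design_interaction(g, age, sex), col=3)
print(f"beta_1j = {b1:.2f} (p = {p1:.1e}); beta_3j = {b3:.3f} (p = {p3:.2f})")
```

The same design functions work for any inheritance model; only the coding of `g` changes.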
Testing for Evidence of SNP Association with GFR in Both Datasets
To be confident of a SNP association with GFR, we required the SNP to show
evidence of association in both the BLSA and InCHIANTI populations. That is, we
combined the p-values from the BLSA and InCHIANTI data using Fisher's method
(equation 2.2) only if the individual p-values for a particular SNP and inheritance
model were less than 0.05 in both populations. To calculate the meta p-value, we used
the p-value from the likelihood ratio test for the BLSA data and the p-value for the
$\beta_{1j}$ estimate from equation 4.4 or the $\beta_{3j}$ estimate from equation 4.5 for
the InCHIANTI data.
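The gated meta-analysis can be sketched as follows, assuming equation 2.2 is the standard Fisher statistic $X^2 = -2\sum_i \ln p_i$, which follows a chi-square distribution with $2k$ degrees of freedom for $k$ independent p-values. Function names are hypothetical. Applied to the MMP20 p-values in Table 4.2, it gives about 3.5 × 10⁻⁵; the small difference from the tabulated 3.6 × 10⁻⁵ presumably reflects rounding of the listed p-values.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square(2k) under the null."""
    k = len(pvalues)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need one or more p-values in (0, 1]")
    half = -sum(math.log(p) for p in pvalues)  # X / 2
    # Chi-square tail for even df = 2k: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def gated_meta_p(p_blsa, p_inchianti, gate=0.05):
    """Combine only if both cohorts individually reach p < gate;
    otherwise no meta p-value is computed (None)."""
    if p_blsa < gate and p_inchianti < gate:
        return fisher_combine([p_blsa, p_inchianti])
    return None


print(f"{gated_meta_p(0.0017, 0.0015):.2e}")  # MMP20 rs1711437: 3.54e-05
print(gated_meta_p(0.0017, 0.20))              # None: fails the InCHIANTI gate
```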
Permutation Analysis
To correct for multiple hypothesis testing, we performed permutations to test
how often our results could appear by chance. We resampled the data for each
population and each model 1000 times, keeping each individual's genotypes together
but swapping the sample labels. The creatinine clearance, age, date, and sex
information remained together, but the set of 2038 SNP genotypes assigned to each
individual changed in each permutation. Therefore, permutation altered only the
phenotype-genotype relationship; the linkage disequilibrium patterns between SNPs
remained the same. For each permutation, we calculated Fisher's meta p-values only
when the individual p-values from both populations were less than 0.05, as we did for
the observed data. Then, for each model and each meta p-value threshold, we counted
the permutations in which the number of SNPs reaching that threshold met or exceeded
the number found in the observed data. The permuted p-value was this count divided by
1000. Permuted p-values less than 0.05 were considered significant.
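The permutation scheme can be sketched in outline. This is a simplified illustration, not the study's code: `count_hits` is a hypothetical callback standing in for the full per-SNP regressions, gating, and Fisher combination, returning how many SNPs reach a given meta p-value threshold. The key design point from the text is kept: whole genotype rows are shuffled between individuals within each cohort, so linkage disequilibrium among SNPs is preserved while the genotype-phenotype pairing is broken.

```python
import random


def permuted_pvalue(populations, count_hits, n_perm=1000, seed=0):
    """Fraction of permutations with at least as many hits as observed.

    populations: list of (genotypes, phenotypes) pairs, one per cohort;
        genotypes[i] holds all SNP genotypes of individual i, and
        phenotypes[i] holds that individual's phenotype record(s).
    count_hits: callable taking such a list and returning the number of
        SNPs that reach the meta p-value threshold (hypothetical).
    """
    observed = count_hits(populations)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        permuted = []
        for genotypes, phenotypes in populations:
            order = list(range(len(genotypes)))
            rng.shuffle(order)
            # Move whole genotype rows so LD between SNPs is unchanged;
            # phenotype records (clearance, age, date, sex) stay together.
            permuted.append(([genotypes[i] for i in order], phenotypes))
        if count_hits(permuted) >= observed:
            exceed += 1
    return exceed / n_perm


def toy_count_hits(pops):
    # Hypothetical stand-in: one "hit" if SNP 0 perfectly tracks a
    # binary phenotype in every cohort.
    return int(all(all(g[0] == y for g, y in zip(gs, ys)) for gs, ys in pops))


cohort = ([[i % 2, 0] for i in range(40)], [i % 2 for i in range(40)])
print(permuted_pvalue([cohort, cohort], toy_count_hits, n_perm=200))
```

In the toy example, a perfect genotype-phenotype match is essentially never reproduced by chance, so the permuted p-value is expected to be 0.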
Table 4.1 Characteristics of kidney aging study samples.
Characteristic | BLSA, mean (SD) or n | InCHIANTI, mean (SD) or n
Age (years) | 57.6 (17.1) | 68.4 (15.5)
Date of birth (year) | 1932 (13.5) | 1931 (15.5)
No. subjects | 1066 | 1130
No. GFR measurements per subject | 3.4 (2.6) | 1 (0)
No. male data points | 2313 | 515
No. female data points | 1359 | 615
24-hour creatinine clearance (ml/min) | 112.9 (42.4) | 82.4 (30.2)
Table 4.2 Top SNPs that show association with kidney aging in two populations.
Gene | SNP | Model | BLSA P | InCHIANTI P | Fisher's meta P* | Permuted P
MMP20 | rs1711437 | DOM | 0.0017 | 0.0015 | 3.6 × 10⁻⁵ | 1.0 × 10⁻²
IGF1R | rs11630259 | REC | 0.0001 | 0.0443 | 7.8 × 10⁻⁵ | NS
RGS6 | rs8007684 | ADD × AGE | 0.0165 | 0.0009 | 1.9 × 10⁻⁴ | NS
FAM83F | rs3021274 | DOM × AGE | 0.0063 | 0.0234 | 1.4 × 10⁻³ | NS
MMP25 | rs1004792 | REC × AGE | 0.0038 | 0.0427 | 1.6 × 10⁻³ | NS
ADCY1 | rs11766192 | REC × AGE | 0.0352 | 0.0054 | 1.8 × 10⁻³ | NS
ADAMTS5 | rs10482979 | REC | 0.0169 | 0.0211 | 3.2 × 10⁻³ | NS
GPC5 | rs342693 | REC × AGE | 0.0325 | 0.0149 | 4.2 × 10⁻³ | NS
MTR | rs2275568 | ADD | 0.0286 | 0.0319 | 7.3 × 10⁻³ | NS
RPL15 | rs2360610 | DOM | 0.0469 | 0.0226 | 8.3 × 10⁻³ | NS
GLRB | rs17035648 | DOM × AGE | 0.0252 | 0.0474 | 9.2 × 10⁻³ | NS
GPC6 | rs4612931 | DOM × AGE | 0.0496 | 0.0270 | 1.0 × 10⁻² | NS
SOHLH2 | rs9593921 | DOM × AGE | 0.0380 | 0.0419 | 1.2 × 10⁻² | NS
*Calculated only if individual p-values from each population were < 0.05. ADD, additive; DOM, dominant; REC, recessive; × AGE, genotype-by-age interaction; NS, not significant.
Figure 4.1 A SNP in MMP20 associates with a kidney aging phenotype. Loess smoothing lines through a scatter plot of creatinine clearance versus age, stratified by genotype at rs1711437, in the BLSA (A) and InCHIANTI (B) populations (permutation-corrected p = 0.01).
Figure 4.2 Linkage disequilibrium pattern of MMP20. The two SNPs (green) for which we found significant associations with kidney aging are located in introns of MMP20. They are linked to each other and to two nonsynonymous SNPs (black) located in exon 6 of MMP20. Pairwise r² LD values (darker boxes correspond to higher r² values) from the HapMap CEU population are displayed. These four SNPs are not linked to the SNP (red) in exon 1 that associated with expression level of the gene. Plot made using Haploview (Barrett et al., 2005).
Chapter 5: Conclusions
Portions of this chapter will be submitted for publication in Philosophical Transactions of the Royal Society B: Biological Sciences (2010)
with the following authors:
Heather E. Wheeler1 and Stuart K. Kim1,2
1Department of Genetics, Stanford University Medical Center, Stanford, CA, USA, 2Department of Developmental Biology, Stanford University Medical Center,
Stanford, CA, USA
Summary and Discussion of Findings
The goal of our approach was to converge on genes that influence human
kidney aging through sequential genomic analyses. Our genomic convergence
procedure began with a genome-wide transcriptional profile of aging in the human
kidney, which gave an unbiased view of gene expression changes that occur with age
(Rodwell et al., 2004). Then, we used total expression analysis and allele-specific
expression analysis to determine which alleles are differentially expressed. We
identified 101 age-regulated eQTLs. SNPs in one of these genes, MMP20, showed a
statistically significant association with normal kidney aging. Although this association
was significant when data from two independent populations were combined, the best way
to confirm it is to replicate the findings in additional populations.
The populations used to identify aging SNPs, BLSA and InCHIANTI, stand
out for their usefulness in studying normal kidney aging. Both of these studies were
purposefully designed to study healthy individuals, instead of those harboring diseases
associated with old age. The BLSA study includes longitudinal measurements of traits
associated with normal aging, which added considerable power to the analysis.
Two SNPs in MMP20 were significantly associated with age-related decline in GFR
of the kidney. Matrix metalloproteinases are involved in the breakdown of
extracellular matrix in normal physiological processes, such as embryonic
development, reproduction, and tissue remodeling, as well as in disease processes,
such as arthritis and metastasis (Llano et al., 1997; Woessner, 1991). Matrix
metalloproteinases degrade extracellular matrix proteins including laminin, elastin,
proteoglycans, fibronectin, and collagens (Jormsjo et al., 2001). A role for MMP20 in
renal function has not been previously described, although prior studies show that
MMP20 plays an important role in tooth development (Bartlett et al., 2006). The
finding that a matrix metalloproteinase is involved in kidney aging is striking because
changes in the extracellular matrix play a key role in aging of the kidney. The
glomerular basement membrane thickens, and the mesangial matrix increases in
volume with age (McLachlan et al., 1977). Interstitial fibrosis occurs during aging
because of an increase in matrix and fibrillar collagen accumulation in the subintimal
space (Abrass et al., 1995).
MMP20 was included in our candidate aging gene set not because the gene
itself is significantly age-regulated in the kidney, but because it belongs to the
extracellular matrix pathway, one of the pathways whose expression increased
coordinately with age in three human tissues, including the kidney (Zahn et al., 2006).
Therefore, polymorphisms in MMP20 may associate not only with aging of the kidney
but also with phenotypes of aging in other tissues. Additionally, if MMP20 is a common
regulator of aging, certain alleles may be enriched in centenarians.
The second-highest scoring gene in our kidney aging association study is the
insulin-like growth factor 1 receptor (IGF1R). Although the SNP in this gene did not
remain significant after permutation testing, this result is interesting because this gene
is part of the insulin-like signaling pathway, which has been shown to be involved in
aging in worms, flies, and mice (Guarente and Kenyon, 2000). Specifically, reduced
signaling in this pathway results in longer lifespans in these model organisms. In
worms, the orthologous gene is called daf-2 (GeneID 175410), and daf-2 mutants can
live twice as long as wild-type worms (Kenyon et al., 1993). In humans, rare variants
in the IGF1R gene in centenarians are associated with reduced IGF1R levels and
defective IGF signaling (Suh et al., 2008).
Sequential use of transcriptional profiling and eQTL mapping could serve as a
general method to increase the statistical power of any human gene association
study. As with candidate gene approaches, an advantage of our approach to identifying
variants associated with kidney aging is that it increases the statistical power of the
gene association study by restricting the tested SNPs to potentially functional ones.
An advantage of our sequential approach over a candidate gene approach is that the
entire genome was screened for age-regulated genes in the first step.
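The power gain from testing fewer SNPs can be illustrated with a back-of-the-envelope Bonferroni calculation. This is only an illustration: the study itself corrected for multiple testing by permutation, not Bonferroni correction, and the genome-wide SNP count below is an assumption based on the name of the HumanHap550 chip. Bonferroni is also conservative when tests are correlated, as with SNPs in linkage disequilibrium, which a permutation scheme that preserves LD avoids.

```python
ALPHA = 0.05
N_CANDIDATE_SNPS = 2038        # SNPs tested in the 101 candidate eQTL genes
N_GENOME_WIDE_SNPS = 550_000   # assumption: approximate HumanHap550 content


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test p-value cutoff that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


candidate = bonferroni_threshold(N_CANDIDATE_SNPS)
genome_wide = bonferroni_threshold(N_GENOME_WIDE_SNPS)
print(f"candidate set: {candidate:.1e}")                      # 2.5e-05
print(f"genome-wide:   {genome_wide:.1e}")                    # 9.1e-08
print(f"ratio: {candidate / genome_wide:.0f}x less stringent")  # 270x
```

Under these assumptions, restricting the analysis to candidate eQTL SNPs relaxes the per-test threshold by more than two orders of magnitude.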
Several groups have used DNA microarrays to measure gene expression in
lymphoblastoid cell lines and have found polymorphisms that associate with
expression level (Cheung et al., 2003; Cheung et al., 2005; Deutsch et al., 2005; Dixon
et al., 2007; Monks et al., 2004; Morley et al., 2004; Spielman et al., 2007; Stranger et
al., 2005; Stranger et al., 2007). In a total expression analysis of human brain cortical
tissue, 21% of genes had SNPs that associate with expression levels (Myers et al.,
2007). Other groups have used the allele-specific expression approach to identify
differentially expressed genes in lymphoblastoid cell lines (Pastinen et al., 2005;
Pastinen et al., 2004; Serre et al., 2008; Yan et al., 2002), brain (Bray et al., 2003),
white blood cells (Pant et al., 2006), and fetal kidney and fetal liver (Lo et al., 2003).
These studies found that 20-50% of the genes tested show differential allelic
expression. Sixteen of the genes showing allele-specific expression in our study
were also found in previous studies (Lo et al., 2003; Milani et al., 2009; Pant et al.,
2006; Serre et al., 2008). Thus, 77 of the 93 allele-specifically expressed genes
identified in this work represent novel findings. Our finding that 41% of tested genes
showed allele-specific expression is similar to the percentage found in previous studies
(Bray et al., 2003; Lo et al., 2003; Pant et al., 2006; Pastinen et al., 2005; Pastinen et
al., 2004; Serre et al., 2008; Yan et al., 2002).
Most of the expression-associated SNPs we identified were found using
allele-specific expression measurements within heterozygotes. Specifically, 41% of
genes assayed contained eSNPs using the allele-specific expression method, whereas
only 2% of genes assayed contained eSNPs using the total expression method. The
statistical cutoff for finding eSNPs was more stringent for the allele-specific method
than for the total expression method. Thus, our results may underestimate the
sensitivity advantage of the allele-specific method over the total expression method.
Unlike the total expression method, the allele-specific method compares the two
alleles within the same cellular environment in heterozygous individuals. Because both
alleles are expressed in the same environment and genetic background, this design
maximizes the sensitivity of the assay. Previous work with a smaller set of 64 genes
also showed that allele-specific analysis in heterozygotes was more sensitive than total
expression methods for finding SNPs associated with expression levels in cis (Pastinen
et al., 2005). The results from the allele-specific analysis demonstrate that differential
allelic expression is widespread across the human genome and suggest that it could be
a major factor contributing to differences in phenotype among individuals.
Future Directions for Human Aging Genomics
Finding new human aging genes, possibly including MMP20, contributes to our
understanding of the molecular mechanisms underlying human aging. Among
young individuals, an unfavorable SNP genotype may indicate risk for rapid decline in
kidney function, and this information could be extremely useful for identifying patients
who may require early intervention. Among older individuals, a favorable SNP
genotype may indicate that they are still eligible as kidney donors even though
they are over the current upper age limit. As more aging genes are confirmed, the
alleles belonging to a patient can be combined to better predict the aging trajectory of
the kidney.
Our finding that the allele-specific expression method is more sensitive for
detecting associations than the total expression method has implications for anyone
studying the genetics of gene expression. The Genotype-Tissue Expression (GTEx)
project aims to study and map the relationship between human gene expression and
genetic variation. The project is currently in a pilot phase and will analyze dense
genotyping and expression data collected from multiple human tissues, correlating
genetic variation with gene expression to produce a list of genetic regions associated
with expression of specific transcripts (Hardy and Singleton, 2009). As the GTEx
project moves forward, it will be important to consider allele-specific expression data
to maximize sensitivity for detecting differential allelic expression.
The GTEx project and other publicly available eQTL datasets will allow more
widespread use of the genomic convergence approach. Gene associations found in
genome-wide association studies (GWAS) that are also eQTLs provide a possible
functional mechanism and should be given higher priority in follow-up studies.
Expression studies are especially important because many of the SNP associations
found in GWAS are intergenic or intronic (Hardy and Singleton, 2009; Kottgen
et al., 2009; WTCCC, 2007).
Currently, GWAS can identify only common alleles (minor allele
frequencies ≥ 0.05) associated with phenotypes of interest. Next-generation
sequencing technologies from companies such as Illumina, Roche, and Helicos can
identify rare variants (Pushkarev et al., 2009; Wang et al., 2008; Wheeler et al., 2008).
Although faster and cheaper than traditional capillary sequencers, these next-generation
technologies produce shorter reads (35–250 bp, depending on the platform) than
capillary sequencers (650–800 bp) (Mardis, 2008). Therefore, they are most useful
when a reference genome is available, and thus they work well in human studies. Deep
transcriptome sequencing (RNA-seq) using next-generation technologies has begun to
be used in allele-specific expression studies (Bell and Beck, 2009; Wang et al., 2009).
One study of primary T cells from four individuals was able to test 1371 transcripts
for allele-specific expression (Heap et al., 2010). A major hurdle in allele-specific
expression analysis of RNA-seq data is read-mapping bias: when sequence reads are
mapped back to the genome, there is a significant bias toward higher mapping rates for
the allele in the reference sequence, compared with the alternative allele in
heterozygotes (Degner et al., 2009). After controlling for this bias by masking known
SNPs and relaxing the mismatch threshold, Heap et al. (2010) found that 7.5% of the
1371 tested transcripts were allele-specifically expressed. Future studies of cells or
tissues from a greater number of individuals will provide more heterozygotes and thus
more transcripts to test for allele-specific expression.
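With RNA-seq data, allele-specific expression at a heterozygous SNP is often assessed by comparing the reference and alternative read counts with a binomial test. The sketch below is a generic illustration, not a description of the analyses in this dissertation or in Heap et al. (2010). It uses an exact two-sided test (summing the probabilities of all outcomes no more likely than the observed one), and shifting the null proportion away from 0.5 is shown only as one simple, hypothetical way to allow for reference mapping bias; the cited study instead masked known SNPs and relaxed the mismatch threshold. The read counts are hypothetical.

```python
from math import comb


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test of k reference reads out of n.

    p is the expected reference fraction under no allelic imbalance.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so tied outcomes are not lost to rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


# Hypothetical counts at one heterozygous SNP: 70 reference, 30 alternative.
print(binomial_two_sided(70, 100))          # null of equal allelic expression
print(binomial_two_sided(70, 100, p=0.55))  # null adjusted for assumed reference bias
```

More heterozygous individuals and deeper coverage increase the read counts per SNP, which is what makes more transcripts testable.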
All of the alleles found thus far to associate with aging and longevity have
small effect sizes. If rare variants rather than common variants explain most of the
genetic variation in aging among humans, new computational methods must be
developed to find genes and pathways involved in aging. Methods are needed that can
combine multiple rare variants in the same gene or the same pathway into a score that
can be tested for association with different phenotypes of aging and longevity.
Several association test methods for rare variants have been proposed and it remains to
be seen if they will be successful in accounting for the missing genetic variation
among individuals in aging and other complex traits (Guo and Lin, 2009; Li and Leal,
2008; Zhu et al., 2009).
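The idea of combining multiple rare variants into one score can be illustrated with a simple collapsing (burden) scheme, in the spirit of collapsing methods such as Li and Leal (2008) but not a faithful implementation of any of the cited methods. The function name and the rare-variant cutoff are hypothetical. Each individual receives either an indicator of carrying any rare allele in the gene or a count of rare alleles, and that score could then replace a single-SNP genotype in regression models like those in Chapter 4.

```python
def collapsed_burden(genotypes, maf, threshold=0.01, collapse=True):
    """Collapse rare variants in one gene or pathway into a per-person score.

    genotypes: per-individual lists of minor-allele counts (0, 1, 2),
        one entry per variant.
    maf: minor allele frequency of each variant, in the same order.
    threshold: variants with MAF below this are treated as rare
        (hypothetical cutoff).
    collapse: True gives a carrier indicator (any rare allele -> 1);
        False gives the total number of rare alleles carried.
    """
    rare = [j for j, f in enumerate(maf) if f < threshold]
    scores = []
    for person in genotypes:
        n_rare = sum(person[j] for j in rare)
        scores.append(int(n_rare > 0) if collapse else n_rare)
    return scores


# Hypothetical gene with three variants; only the first two are rare.
g = [[0, 0, 2], [1, 0, 1], [0, 2, 0], [1, 1, 0]]
print(collapsed_burden(g, maf=[0.002, 0.005, 0.30]))                  # [0, 1, 1, 1]
print(collapsed_burden(g, maf=[0.002, 0.005, 0.30], collapse=False))  # [0, 1, 2, 2]
```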
Another approach is to link genes and pathways to aging through molecular
networks. Examination of the perturbations of expression, protein and metabolite
networks that occur with age could reveal pathways important in the aging process. A
study in mice found that computationally identified targets of the NF-κB transcription
factor show decreasing expression correlation with age (Southworth et al., 2009). Blockade of
NF-κB for 2 weeks in the epidermis of chronologically aged mice reverted the tissue
characteristics and global gene expression patterns to those of young mice,
emphasizing the importance of the pathway in aging (Adler et al., 2007). Also, by
examining tissue-to-tissue coexpression networks in mice, new obesity pathways were
identified (Dobrin et al., 2009). Complex machine learning techniques are likely
necessary to understand complex human phenotypes including aging. Aging and
common human diseases originate from a complex interplay between variations in
DNA (both rare and common) and a broad range of factors such as diet, sex and
exposure to environmental toxins. A benefit of examining molecular networks is that
both genetic and environmental perturbations affect the states of networks that in turn
affect the phenotype of interest (Schadt, 2009). Thus, molecular networks represent a
useful intermediate between the genetic/environmental input and the complex
phenotype to find associated molecular pathways. Molecular network data can also be
used to understand the biological context in which a given gene found in a traditional
association study operates.
Greater understanding of the pathways involved in aging of different human
tissues could lead to drug targets. For some aging genes, one allele may adversely
affect tissue function in old age because it increases the activity of the gene above a
healthy level. In these cases, one could develop a drug to target the gene or pathway in
individuals carrying the overactive allele, and thereby preserve function in old age.
The field of pharmacogenomics, which determines how genotype can predict an
individual’s response to a drug, will be important in treating age-related decline.
Already, the necessary dose of the drug warfarin is known to vary widely among
individuals according to their genotypes (Takeuchi et al., 2009). Warfarin is an
anticoagulant used to prevent age-related conditions like stroke, thrombosis,
pulmonary embolism and coronary malfunction (Daly and King, 2003). Dosing
86
strategies according to an individual’s genotype at the relevant loci are beginning to be
implemented clinically (Klein et al., 2009). As more aging genes are confirmed, the
alleles belonging to a patient can be combined to better predict the aging trajectory of
the relevant organ or tissue. Medical practitioners will know which organs or tissues
are likely to age the fastest and can act to prevent age-related decline.
Pharmacogenomics data will help medical practitioners determine the best mode of
prevention or treatment for an individual. The goal of human aging research is not
necessarily to extend lifespan, but instead is to extend the healthy years of life.
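One simple way to combine a patient's alleles at several confirmed aging loci is an additive allele score, the same construction commonly used for polygenic risk scores: each risk-allele count (0, 1 or 2) is weighted by that locus's estimated effect, and the weighted counts are summed. The sketch below assumes the loci act additively and independently; the SNP identifiers and effect sizes are hypothetical placeholders, not results from this dissertation.

```python
# Hypothetical per-allele effects (e.g., estimated change in a kidney-aging
# measure per copy of the risk allele). These are NOT values from this work.
HYPOTHETICAL_EFFECTS = {
    "snp_a": 0.30,
    "snp_b": 0.15,
    "snp_c": -0.10,
}


def additive_allele_score(genotype, effects):
    """Sum of effect * risk-allele count over the scored loci.

    genotype: dict mapping SNP id -> number of risk alleles carried (0, 1 or 2)
    effects: dict mapping SNP id -> assumed per-allele effect size
    """
    score = 0.0
    for snp, beta in effects.items():
        count = genotype[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1 or 2, got {count}")
        score += beta * count
    return score


# A hypothetical patient homozygous for the snp_a risk allele and
# heterozygous at snp_b.
patient = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
score = additive_allele_score(patient, HYPOTHETICAL_EFFECTS)
print(f"additive allele score: {score:.2f}")
```

Under these assumed weights a higher score would correspond to a faster predicted decline; in practice the weights would come from replicated association studies, and interactions between loci or with environmental factors would not be captured by a purely additive model.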
References
Abrass, C.K., Adcox, M.J., and Raugi, G.J. (1995). Aging-associated changes in renal extracellular matrix. Am J Pathol 146, 742-752.
Adler, A.S., Sinha, S., Kawahara, T.L., Zhang, J.Y., Segal, E., and Chang, H.Y. (2007). Motif module map reveals enforcement of aging by continual NF-kappaB activity. Genes Dev 21, 3244-3257.
Adler, S., Lindeman, R.D., Yiengst, M.J., Beard, E., and Shock, N.W. (1968). Effect of acute acid loading on urinary acid excretion by the aging human kidney. J Lab Clin Med 72, 278-289.
Altshuler, D., Brooks, L.D., Chakravarti, A., Collins, F.S., Daly, M.J., and Donnelly, P. (2005). A haplotype map of the human genome. Nature 437, 1299-1320.
Arking, D.E., Atzmon, G., Arking, A., Barzilai, N., and Dietz, H.C. (2005). Association between a functional variant of the KLOTHO gene and high-density lipoprotein cholesterol, blood pressure, stroke, and longevity. Circ Res 96, 412-418.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-29.
Atzmon, G., Rincon, M., Schechter, C.B., Shuldiner, A.R., Lipton, R.B., Bergman, A., and Barzilai, N. (2006). Lipoprotein genotype and conserved pathway for exceptional longevity in humans. PLoS Biol 4, e113.
Barrett, J.C., Fry, B., Maller, J., and Daly, M.J. (2005). Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263-265.
Bartlett, J.D., Skobe, Z., Lee, D.H., Wright, J.T., Li, Y., Kulkarni, A.B., and Gibson, C.W. (2006). A developmental comparison of matrix metalloproteinase-20 and amelogenin null mouse enamel. Eur J Oral Sci 114 Suppl 1, 18-23; discussion 39-41, 379.
Bell, C.G., and Beck, S. (2009). Advances in the identification and analysis of allele-specific expression. Genome Med 1, 56.
Bellizzi, D., Rose, G., Cavalcante, P., Covello, G., Dato, S., De Rango, F., Greco, V., Maggiolini, M., Feraco, E., Mari, V., et al. (2005). A novel VNTR enhancer within the SIRT3 gene, a human homologue of SIR2, is associated with survival at oldest ages. Genomics 85, 258-263.
Bluher, M., Kahn, B.B., and Kahn, C.R. (2003). Extended longevity in mice lacking the insulin receptor in adipose tissue. Science 299, 572-574.
Borkan, G.A., and Norris, A.H. (1980). Assessment of biological age using a profile of physical parameters. J Gerontol 35, 177-184.
Bray, N.J., Buckland, P.R., Owen, M.J., and O'Donovan, M.C. (2003). Cis-acting variation in the expression of a high proportion of genes in human brain. Hum Genet 113, 149-153.
Cheung, V.G., Conlin, L.K., Weber, T.M., Arcaro, M., Jen, K.Y., Morley, M., and Spielman, R.S. (2003). Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet 33, 422-425.
Cheung, V.G., Spielman, R.S., Ewens, K.G., Weber, T.M., Morley, M., and Burdick, J.T. (2005). Mapping determinants of human gene expression by regional and genome-wide association. Nature 437, 1365-1369.
Clancy, D.J., Gems, D., Harshman, L.G., Oldham, S., Stocker, H., Hafen, E., Leevers, S.J., and Partridge, L. (2001). Extension of life-span by loss of CHICO, a Drosophila insulin receptor substrate protein. Science 292, 104-106.
Corder, E.H., Saunders, A.M., Strittmatter, W.J., Schmechel, D.E., Gaskell, P.C., Small, G.W., Roses, A.D., Haines, J.L., and Pericak-Vance, M.A. (1993). Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science 261, 921-923.
Daly, A.K., and King, B.P. (2003). Pharmacogenetics of oral anticoagulants. Pharmacogenetics 13, 247-252.
Degner, J.F., Marioni, J.C., Pai, A.A., Pickrell, J.K., Nkadori, E., Gilad, Y., and Pritchard, J.K. (2009). Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207-3212.
Deutsch, S., Lyle, R., Dermitzakis, E.T., Attar, H., Subrahmanyan, L., Gehrig, C., Parand, L., Gagnebin, M., Rougemont, J., Jongeneel, C.V., et al. (2005). Gene expression variation and expression quantitative trait mapping of human chromosome 21 genes. Hum Mol Genet 14, 3741-3749.
Dixon, A.L., Liang, L., Moffatt, M.F., Chen, W., Heath, S., Wong, K.C., Taylor, J., Burnett, E., Gut, I., Farrall, M., et al. (2007). A genome-wide association study of global gene expression. Nat Genet 39, 1202-1207.
Dobrin, R., Zhu, J., Molony, C., Argman, C., Parrish, M.L., Carlson, S., Allan, M.F., Pomp, D., and Schadt, E.E. (2009). Multi-tissue coexpression networks reveal unexpected subnetworks associated with disease. Genome Biol 10, R55.
Epstein, M., and Hollenberg, N.K. (1976). Age as a determinant of renal sodium conservation in normal man. J Lab Clin Med 87, 411-417.
Faubert, P.F., and Parush, J.G. (1998). Disorders of potassium metabolism. In Renal Disease in the Elderly (New York City, Marcel Dekker, Inc.), pp. 39-60.
Ferrucci, L. (2008). The Baltimore Longitudinal Study of Aging (BLSA): a 50-year-long journey and plans for the future. J Gerontol A Biol Sci Med Sci 63, 1416-1419.
Ferrucci, L., Bandinelli, S., Benvenuti, E., Di Iorio, A., Macchi, C., Harris, T.B., and Guralnik, J.M. (2000). Subsystems contributing to the decline in ability to walk: bridging the gap between epidemiology and geriatric practice in the InCHIANTI study. J Am Geriatr Soc 48, 1618-1625.
Fisher, R.A. (1948). Combining independent tests of significance. American Statistician 2, 1.
Flachsbart, F., Caliebe, A., Kleindorp, R., Blanche, H., von Eller-Eberstein, H., Nikolaus, S., Schreiber, S., and Nebel, A. (2009). Association of FOXO3A variation with human longevity confirmed in German centenarians. Proc Natl Acad Sci U S A 106, 2700-2705.
Fliser, D., Zeier, M., Nowack, R., and Ritz, E. (1993). Renal functional reserve in healthy elderly subjects. J Am Soc Nephrol 3, 1371-1377.
Fox, C.S., Yang, Q., Cupples, L.A., Guo, C.Y., Larson, M.G., Leip, E.P., Wilson, P.W., and Levy, D. (2004). Genomewide linkage analysis to serum creatinine, GFR, and creatinine clearance in a community-based population: the Framingham Heart Study. J Am Soc Nephrol 15, 2457-2461.
Geesaman, B.J., Benson, E., Brewster, S.J., Kunkel, L.M., Blanche, H., Thomas, G., Perls, T.T., Daly, M.J., and Puca, A.A. (2003). Haplotype-based identification of a microsomal transfer protein marker associated with the human lifespan. Proc Natl Acad Sci U S A 100, 14115-14120.
Gourtsoyiannis, N., Prassopoulos, P., Cavouras, D., and Pantelidis, N. (1990). The thickness of the renal parenchyma decreases with age: a CT study of 360 patients. AJR Am J Roentgenol 155, 541-544.
Goyal, V.K. (1982). Changes with age in the human kidney. Exp Gerontol 17, 321-331.
Guarente, L., and Kenyon, C. (2000). Genetic pathways that regulate ageing in model organisms. Nature 408, 255-262.
Guo, W., and Lin, S. (2009). Generalized linear modeling with regularization for detecting common disease rare haplotype association. Genet Epidemiol 33, 308-316.
Hamilton, W.D. (1966). The moulding of senescence by natural selection. J Theor Biol 12, 12-45.
Hardy, J., and Singleton, A. (2009). Genomewide association studies and human disease. N Engl J Med 360, 1759-1768.
Harman, D. (1956). Aging: a theory based on free radical and radiation chemistry. J Gerontol 11, 298-300.
Hauser, M.A., Li, Y.J., Takeuchi, S., Walters, R., Noureddine, M., Maready, M., Darden, T., Hulette, C., Martin, E., Hauser, E., et al. (2003). Genomic convergence: identifying candidate genes for Parkinson's disease by combining serial analysis of gene expression and genetic linkage. Hum Mol Genet 12, 671-677.
Heap, G.A., Yang, J.H., Downes, K., Healy, B.C., Hunt, K.A., Bockett, N., Franke, L., Dubois, P.C., Mein, C.A., Dobson, R.J., et al. (2010). Genome-wide analysis of allelic expression imbalance in human primary cells by high-throughput transcriptome resequencing. Hum Mol Genet 19, 122-134.
Herskind, A.M., McGue, M., Holm, N.V., Sorensen, T.I., Harvald, B., and Vaupel, J.W. (1996). The heritability of human longevity: a population-based study of 2872 Danish twin pairs born 1870-1900. Hum Genet 97, 319-323.
Hoang, K., Tan, J.C., Derby, G., Blouch, K.L., Masek, M., Ma, I., Lemley, K.V., and Myers, B.D. (2003). Determinants of glomerular hypofiltration in aging humans. Kidney Int 64, 1417-1424.
Holzenberger, M., Dupont, J., Ducos, B., Leneuve, P., Geloen, A., Even, P.C., Cervera, P., and Le Bouc, Y. (2003). IGF-1 receptor regulates lifespan and resistance to oxidative stress in mice. Nature 421, 182-187.
Hunt, S.C., Coon, H., Hasstedt, S.J., Cawthon, R.M., Camp, N.J., Wu, L.L., and Hopkins, P.N. (2004). Linkage of serum creatinine and glomerular filtration rate to chromosome 2 in Utah pedigrees. Am J Hypertens 17, 511-515.
Jormsjo, S., Whatling, C., Walter, D.H., Zeiher, A.M., Hamsten, A., and Eriksson, P. (2001). Allele-specific regulation of matrix metalloproteinase-7 promoter activity is associated with coronary artery luminal dimensions among hypercholesterolemic patients. Arterioscler Thromb Vasc Biol 21, 1834-1839.
Kaeberlein, M., McVey, M., and Guarente, L. (1999). The SIR2/3/4 complex and SIR2 alone promote longevity in Saccharomyces cerevisiae by two different mechanisms. Genes Dev 13, 2570-2580.
Kaplan, C., Pasternack, B., Shah, H., and Gallo, G. (1975). Age-related incidence of sclerotic glomeruli in human kidneys. Am J Pathol 80, 227-234.
Karasik, D., Demissie, S., Cupples, L.A., and Kiel, D.P. (2005). Disentangling the genetic determinants of human aging: biological age as an alternative to the use of survival measures. J Gerontol A Biol Sci Med Sci 60, 574-587.
Kasiske, B.L. (1987). Relationship between vascular disease and age-associated changes in the human kidney. Kidney Int 31, 1153-1159.
Kenyon, C., Chang, J., Gensch, E., Rudner, A., and Tabtiang, R. (1993). A C. elegans mutant that lives twice as long as wild type. Nature 366, 461-464.
Kervinen, K., Savolainen, M.J., Salokannel, J., Hynninen, A., Heikkinen, J., Ehnholm, C., Koistinen, M.J., and Kesaniemi, Y.A. (1994). Apolipoprotein E and B polymorphisms--longevity factors assessed in nonagenarians. Atherosclerosis 105, 89-95.
Kincaid-Smith, P. (1991). Age-related glomerular sclerosis: baseline values in Hong Kong. Pathology 23, 275.
Kirkwood, T.B. (1997). The origins of human ageing. Philos Trans R Soc Lond B Biol Sci 352, 1765-1772.
Kirkwood, T.B., and Austad, S.N. (2000). Why do we age? Nature 408, 233-238.
Klein, T.E., Altman, R.B., Eriksson, N., Gage, B.F., Kimmel, S.E., Lee, M.T., Limdi, N.A., Page, D., Roden, D.M., Wagner, M.J., et al. (2009). Estimation of the warfarin dose with clinical and pharmacogenetic data. N Engl J Med 360, 753-764.
Kottgen, A., Glazer, N.L., Dehghan, A., Hwang, S.J., Katz, R., Li, M., Yang, Q., Gudnason, V., Launer, L.J., Harris, T.B., et al. (2009). Multiple loci associated with indices of renal function and chronic kidney disease. Nat Genet.
Kuro-o, M., Matsumura, Y., Aizawa, H., Kawaguchi, H., Suga, T., Utsugi, T., Ohyama, Y., Kurabayashi, M., Kaname, T., Kume, E., et al. (1997). Mutation of the mouse klotho gene leads to a syndrome resembling ageing. Nature 390, 45-51.
Le-Niculescu, H., Balaraman, Y., Patel, S., Tan, J., Sidhu, K., Jerome, R.E., Edenberg, H.J., Kuczenski, R., Geyer, M.A., Nurnberger, J.I., Jr., et al. (2007). Towards understanding the schizophrenia code: an expanded convergent functional genomics approach. Am J Med Genet B Neuropsychiatr Genet 144B, 129-158.
Lescai, F., Blanche, H., Nebel, A., Beekman, M., Sahbatou, M., Flachsbart, F., Slagboom, E., Schreiber, S., Sorbi, S., Passarino, G., et al. (2009). Human longevity and 11p15.5: a study in 1321 centenarians. Eur J Hum Genet 17, 1515-1519.
Li, B., and Leal, S.M. (2008). Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 83, 311-321.
Li, M., Nicholls, K.M., and Becker, G.J. (2002). Glomerular size and global glomerulosclerosis in normal Caucasian donor kidneys: effects of aging and gender. J Nephrol 15, 614-619.
Li, Y., Wang, W.J., Cao, H., Lu, J., Wu, C., Hu, F.Y., Guo, J., Zhao, L., Yang, F., Zhang, Y.X., et al. (2009). Genetic association of FOXO1A and FOXO3A with longevity trait in Han Chinese populations. Hum Mol Genet.
Liang, X., Slifer, M., Martin, E.R., Schnetz-Boutaud, N., Bartlett, J., Anderson, B., Zuchner, S., Gwirtsman, H., Gilbert, J.R., Pericak-Vance, M.A., et al. (2009). Genomic convergence to identify candidate genes for Alzheimer disease on chromosome 10. Hum Mutat 30, 463-471.
Lindeman, R.D., and Goldman, R. (1986). Anatomic and physiologic age changes in the kidney. Exp Gerontol 21, 379-406.
Lindeman, R.D., Tobin, J., and Shock, N.W. (1985). Longitudinal studies on the rate of decline in renal function with age. J Am Geriatr Soc 33, 278-285.
Lindeman, R.D., Tobin, J.D., and Shock, N.W. (1984). Association between blood pressure and the rate of decline in renal function with age. Kidney Int 26, 861-868.
Llano, E., Pendas, A.M., Knauper, V., Sorsa, T., Salo, T., Salido, E., Murphy, G., Simmer, J.P., Bartlett, J.D., and Lopez-Otin, C. (1997). Identification and structural and functional characterization of human enamelysin (MMP-20). Biochemistry 36, 15101-15108.
Lo, H.S., Wang, Z., Hu, Y., Yang, H.H., Gere, S., Buetow, K.H., and Lee, M.P. (2003). Allelic variation in gene expression is common in the human genome. Genome Res 13, 1855-1862.
Lu, T., Pan, Y., Kao, S.Y., Li, C., Kohane, I., Chan, J., and Yankner, B.A. (2004). Gene regulation and DNA damage in the ageing human brain. Nature 429, 883-891.
Marcantoni, C., Ma, L.J., Federspiel, C., and Fogo, A.B. (2002). Hypertensive nephrosclerosis in African Americans versus Caucasians. Kidney Int 62, 172-180.
Mardis, E.R. (2008). The impact of next-generation sequencing technology on genetics. Trends Genet 24, 133-141.
McGue, M., Vaupel, J.W., Holm, N., and Harvald, B. (1993). Longevity is moderately heritable in a sample of Danish twins born 1870-1880. J Gerontol 48, B237-244.
McLachlan, M.S., Guthrie, J.C., Anderson, C.K., and Fulker, M.J. (1977). Vascular and glomerular changes in the ageing kidney. J Pathol 121, 65-78.
Metter, E.J., Talbot, L.A., Schrager, M., and Conwit, R.A. (2004). Arm-cranking muscle power and arm isometric muscle strength are independent predictors of all-cause mortality in men. J Appl Physiol 96, 814-821.
Milani, L., Lundmark, A., Nordlund, J., Kiialainen, A., Flaegstad, T., Jonmundsson, G., Kanerva, J., Schmiegelow, K., Gunderson, K.L., Lonnerholm, G., et al. (2009). Allele-specific gene expression patterns in primary leukemic cells reveal regulation of gene expression by CpG site methylation. Genome Res 19, 1-11.
Mitchell, B.D., Hsueh, W.C., King, T.M., Pollin, T.I., Sorkin, J., Agarwala, R., Schaffer, A.A., and Shuldiner, A.R. (2001). Heritability of life span in the Old Order Amish. Am J Med Genet 102, 346-352.
Monks, S.A., Leonardson, A., Zhu, H., Cundiff, P., Pietrusiak, P., Edwards, S., Phillips, J.W., Sachs, A., and Schadt, E.E. (2004). Genetic inheritance of gene expression in human cell lines. Am J Hum Genet 75, 1094-1105.
Morley, M., Molony, C.M., Weber, T.M., Devlin, J.L., Ewens, K.G., Spielman, R.S., and Cheung, V.G. (2004). Genetic analysis of genome-wide variation in human gene expression. Nature 430, 743-747.
Mudge, J., Miller, N.A., Khrebtukova, I., Lindquist, I.E., May, G.D., Huntley, J.J., Luo, S., Zhang, L., van Velkinburgh, J.C., Farmer, A.D., et al. (2008). Genomic convergence analysis of schizophrenia: mRNA sequencing reveals altered synaptic vesicular transport in post-mortem cerebellum. PLoS ONE 3, e3625.
Murphy, S.K., Wylie, A.A., and Jirtle, R.L. (2001). Imprinting of PEG3, the human homologue of a mouse gene involved in nurturing behavior. Genomics 71, 110-117.
Myers, A.J., Gibbs, J.R., Webster, J.A., Rohrer, K., Zhao, A., Marlowe, L., Kaleem, M., Leung, D., Bryden, L., Nath, P., et al. (2007). A survey of genetic human cortical gene expression. Nat Genet 39, 1494-1499.
Nebel, A., Croucher, P.J., Stiegeler, R., Nikolaus, S., Krawczak, M., and Schreiber, S. (2005). No association between microsomal triglyceride transfer protein (MTP) haplotype and longevity in humans. Proc Natl Acad Sci U S A 102, 7906-7909.
Neugarten, J., Kasiske, B., Silbiger, S.R., and Nyengaard, J.R. (2002). Effects of sex on renal structure. Nephron 90, 139-144.
Newbold, K.M., Sandison, A., and Howie, A.J. (1992). Comparison of size of juxtamedullary and outer cortical glomeruli in normal adult kidney. Virchows Archiv 420, 127-129.
Noureddine, M.A., Li, Y.J., van der Walt, J.M., Walters, R., Jewett, R.M., Xu, H., Wang, T., Walter, J.W., Scott, B.L., Hulette, C., et al. (2005). Genomic convergence to identify candidate genes for Parkinson disease: SAGE analysis of the substantia nigra. Mov Disord 20, 1299-1309.
Novelli, V., Viviani Anselmi, C., Roncarati, R., Guffanti, G., Malovini, A., Piluso, G., and Puca, A.A. (2008). Lack of replication of genetic associations with human longevity. Biogerontology 9, 85-92.
Oliveira, S.A., Li, Y.J., Noureddine, M.A., Zuchner, S., Qin, X., Pericak-Vance, M.A., and Vance, J.M. (2005). Identification of risk and age-at-onset genes on chromosome 1p in Parkinson disease. Am J Hum Genet 77, 252-264.
Pant, P.V., Tao, H., Beilharz, E.J., Ballinger, D.G., Cox, D.R., and Frazer, K.A. (2006). Analysis of allelic differential expression in human white blood cells. Genome Res 16, 331-339.
Partridge, L. (2010). The new biology of ageing. Philos Trans R Soc Lond B Biol Sci 365, 147-154.
Partridge, L., and Gems, D. (2002). A lethal side-effect. Nature 418, 921.
Pastinen, T., Ge, B., Gurd, S., Gaudin, T., Dore, C., Lemire, M., Lepage, P., Harmsen, E., and Hudson, T.J. (2005). Mapping common regulatory variants to human haplotypes. Hum Mol Genet 14, 3963-3971.
Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge, B., Lepage, P., Lavergne, K., Villeneuve, A., Gaudin, T., Brandstrom, H., et al. (2004). A survey of genetic and epigenetic variation affecting human gene expression. Physiol Genomics 16, 184-193.
Pawlikowska, L., Hu, D., Huntsman, S., Sung, A., Chu, C., Chen, J., Joyner, A., Schork, N.J., Hsueh, W.C., Reiner, A.P., et al. (2009). Association of common genetic variation in the insulin/IGF1 signaling pathway with human longevity. Aging Cell.
Perls, T.T., Wilmoth, J., Levenson, R., Drinkwater, M., Cohen, M., Bogan, H., Joyce, E., Brewster, S., Kunkel, L., and Puca, A. (2002). Life-long sustained mortality advantage of siblings of centenarians. Proc Natl Acad Sci U S A 99, 8442-8447.
Pritchard, J.K., Stephens, M., and Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
Pritchard, J.K., Wen, X., and Falush, D. (2007). Documentation for structure software: Version 2.2.
Puca, A.A., Daly, M.J., Brewster, S.J., Matise, T.C., Barrett, J., Shea-Drinkwater, M., Kang, S., Joyce, E., Nicoli, J., Benson, E., et al. (2001). A genome-wide scan for linkage to human exceptional longevity identifies a locus on chromosome 4. Proc Natl Acad Sci U S A 98, 10505-10508.
Pushkarev, D., Neff, N.F., and Quake, S.R. (2009). Single-molecule sequencing of an individual human genome. Nat Biotechnol 27, 847-852.
Rodwell, G.E., Sonu, R., Zahn, J.M., Lund, J., Wilhelmy, J., Wang, L., Xiao, W., Mindrinos, M., Crane, E., Segal, E., et al. (2004). A transcriptional profile of aging in the human kidney. PLoS Biol 2, e427.
Rogina, B., and Helfand, S.L. (2004). Sir2 mediates longevity in the fly through a pathway related to calorie restriction. Proc Natl Acad Sci U S A 101, 15998-16003.
Rosenberg, N.A. (2004). DISTRUCT: a program for the graphical display of population structure. Molecular Ecology Notes 4, 137-138.
Rowe, J.W., Andres, R., and Tobin, J.D. (1976a). Letter: Age-adjusted standards for creatinine clearance. Ann Intern Med 84, 567-569.
Rowe, J.W., Andres, R., Tobin, J.D., Norris, A.H., and Shock, N.W. (1976b). The effect of age on creatinine clearance in men: a cross-sectional and longitudinal study. J Gerontol 31, 155-163.
Schachter, F., Faure-Delanef, L., Guenot, F., Rouger, H., Froguel, P., Lesueur-Ginot, L., and Cohen, D. (1994). Genetic associations with human longevity at the APOE and ACE loci. Nat Genet 6, 29-32.
Schadt, E.E. (2009). Molecular networks as sensors and drivers of common human diseases. Nature 461, 218-223.
Schadt, E.E., Molony, C., Chudin, E., Hao, K., Yang, X., Lum, P.Y., Kasarskis, A., Zhang, B., Wang, S., Suver, C., et al. (2008). Mapping the genetic architecture of gene expression in human liver. PLoS Biol 6, e107.
Schmidt, R.J., Beierwaltes, W.H., and Baylis, C. (2001). Effects of aging and alterations in dietary sodium intake on total nitric oxide production. Am J Kidney Dis 37, 900-908.
Serre, D., Gurd, S., Ge, B., Sladek, R., Sinnett, D., Harmsen, E., Bibikova, M., Chudin, E., Barker, D.L., Dickinson, T., et al. (2008). Differential allelic expression in the human genome: a robust approach to identify genetic and epigenetic cis-acting mechanisms regulating gene expression. PLoS Genet 4, e1000006.
Silva, F.G. (2005a). The aging kidney: a review -- part I. Int Urol Nephrol 37, 185-205.
Silva, F.G. (2005b). The aging kidney: a review--part II. Int Urol Nephrol 37, 419-432.
Southworth, L.K., Owen, A.B., and Kim, S.K. (2009). Aging mice show a decreasing correlation of gene expression within genetic modules. PLoS Genet 5, e1000776.
Spielman, R.S., Bastone, L.A., Burdick, J.T., Morley, M., Ewens, W.J., and Cheung, V.G. (2007). Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet 39, 226-231.
Stranger, B.E., Forrest, M.S., Clark, A.G., Minichiello, M.J., Deutsch, S., Lyle, R., Hunt, S., Kahl, B., Antonarakis, S.E., Tavare, S., et al. (2005). Genome-wide associations of gene expression variation in humans. PLoS Genet 1, e78.
Stranger, B.E., Nica, A.C., Forrest, M.S., Dimas, A., Bird, C.P., Beazley, C., Ingle, C.E., Dunning, M., Flicek, P., Koller, D., et al. (2007). Population genomics of human gene expression. Nat Genet 39, 1217-1224.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550.
Suh, Y., Atzmon, G., Cho, M.O., Hwang, D., Liu, B., Leahy, D.J., Barzilai, N., and Cohen, P. (2008). Functionally significant insulin-like growth factor I receptor mutations in centenarians. Proc Natl Acad Sci U S A 105, 3438-3442.
Takeuchi, F., McGinnis, R., Bourgeois, S., Barnes, C., Eriksson, N., Soranzo, N., Whittaker, P., Ranganath, V., Kumanduri, V., McLaren, W., et al. (2009). A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose. PLoS Genet 5, e1000433.
Tang, H., Quertermous, T., Rodriguez, B., Kardia, S.L., Zhu, X., Brown, A., Pankow, J.S., Province, M.A., Hunt, S.C., Boerwinkle, E., et al. (2005). Genetic structure, self-identified race/ethnicity, and confounding in case-control association studies. Am J Hum Genet 76, 268-275.
Tatar, M., Kopelman, A., Epstein, D., Tu, M.P., Yin, C.M., and Garofalo, R.S. (2001). A mutant Drosophila insulin receptor homolog that extends life-span and impairs neuroendocrine function. Science 292, 107-110.
Tissenbaum, H.A., and Guarente, L. (2001). Increased dosage of a sir-2 gene extends lifespan in Caenorhabditis elegans. Nature 410, 227-230.
Van den Veyver, I.B., Norman, B., Tran, C.Q., Bourjac, J., and Slim, R. (2001). The human homologue (PEG3) of the mouse paternally expressed gene 3 (Peg3) is maternally imprinted but not mutated in women with familial recurrent hydatidiform molar pregnancies. J Soc Gynecol Investig 8, 305-313.
Veyrieras, J.B., Kudaravalli, S., Kim, S.Y., Dermitzakis, E.T., Gilad, Y., Stephens, M., and Pritchard, J.K. (2008). High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet 4, e1000214.
Wang, J., Wang, W., Li, R., Li, Y., Tian, G., Goodman, L., Fan, W., Zhang, J., Li, J., Guo, Y., et al. (2008). The diploid genome sequence of an Asian individual. Nature 456, 60-65.
Wang, Z., Gerstein, M., and Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10, 57-63.
Webster, J.A., Gibbs, J.R., Clarke, J., Ray, M., Zhang, W., Holmans, P., Rohrer, K., Zhao, A., Marlowe, L., Kaleem, M., et al. (2009). Genetic control of human brain transcript expression in Alzheimer disease. Am J Hum Genet 84, 445-458.
Weedon, M.N., Lango, H., Lindgren, C.M., Wallace, C., Evans, D.M., Mangino, M., Freathy, R.M., Perry, J.R., Stevens, S., Hall, A.S., et al. (2008). Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 40, 575-583.
Wheeler, D.A., Srinivasan, M., Egholm, M., Shen, Y., Chen, L., McGuire, A., He, W., Chen, Y.J., Makhijani, V., Roth, G.T., et al. (2008). The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872-876.
Wheeler, H.E., Metter, E.J., Tanaka, T., Absher, D., Higgins, J., Zahn, J.M., Wilhelmy, J., Davis, R.W., Singleton, A., Myers, R.M., et al. (2009). Sequential use of transcriptional profiling, expression quantitative trait mapping, and gene association implicates MMP20 in human kidney aging. PLoS Genet 5, e1000685.
Willcox, B.J., Donlon, T.A., He, Q., Chen, R., Grove, J.S., Yano, K., Masaki, K.H., Willcox, D.C., Rodriguez, B., and Curb, J.D. (2008). FOXO3A genotype is strongly associated with human longevity. Proc Natl Acad Sci U S A 105, 13987-13992.
Williams, G.C. (1957). Pleiotropy, natural selection, and the evolution of senescence. Evolution 11, 398-411.
Woessner, J.F., Jr. (1991). Matrix metalloproteinases and their inhibitors in connective tissue remodeling. FASEB J 5, 2145-2154.
WTCCC (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-678.
Yan, H., Yuan, W., Velculescu, V.E., Vogelstein, B., and Kinzler, K.W. (2002). Allelic variation in human gene expression. Science 297, 1143.
Zahn, J.M., Sonu, R., Vogel, H., Crane, E., Mazan-Mamczarz, K., Rabkin, R., Davis, R.W., Becker, K.G., Owen, A.B., and Kim, S.K. (2006). Transcriptional profiling of aging in human muscle reveals a common aging signature. PLoS Genet 2, e115.
Zhong, S., Li, C., and Wong, W.H. (2003). ChipInfo: Software for extracting gene annotation and gene ontology information for microarray analysis. Nucleic Acids Res 31, 3483-3486.
Zhu, X., Feng, T., Li, Y., Lu, Q., and Elston, R.C. (2009). Detecting rare variants for complex traits using family and unrelated data. Genet Epidemiol.