RANGE-WIDE STUDIES OF GENETIC AND TRANSCRIPTOMIC DIVERSITY IN
SUNFLOWER
by
EDWARD VINCENT MCASSEY IV
Under the Direction of John M. Burke
ABSTRACT
With the pressures of climate change and human population growth affecting agricultural
land use, it is important to consider the use of wild adaptations to maintain and expand the areas where
crops can be grown. The study of the genetic basis of local adaptation has accelerated with the ease
of collecting sequence data from multiple individuals from across the range of a species. Here, I
present phenotypic, genotypic, and expression-based studies of variation in wild sunflowers
across a latitudinal gradient in North America. I found that flowering time and saturated fatty
acid percentage of seeds were differentiated across the range, with northern populations both
flowering earlier and having a lower percentage of saturated fatty acids in their seeds. To understand the
genetic basis of these traits, I genotyped individuals with two different marker technologies, a
SNP chip and Genotyping-by-Sequencing, to identify regions of the genome that were
exceptionally differentiated when comparing northern and southern populations. An analysis of
population genetic variation revealed a number of candidate regions for local adaptation
including multiple members of the flowering time pathway. To complement the study of
population genetic variation as it relates to local adaptation, I performed RNA-sequencing to
identify genes that may be influencing the differences in fatty acid saturation in wild sunflower.
When comparing expression levels of developing northern and southern wild sunflower seeds, I
found a number of differentially expressed genes, some of which were annotated as part of the
fatty acid biosynthesis pathway. Taken together, the genetic differentiation outliers and
differentially expressed genes represent excellent candidates for follow-up experiments.
Importantly, by mapping these variants against the sunflower genome, I was able to further
prioritize candidates by assessing whether or not they co-localized with important QTL. Future
work will focus on establishing the extent of linkage disequilibrium in these genomic intervals to
clarify the individual role of these putative adaptive variants.
INDEX WORDS: Gene expression, latitudinal variation, local adaptation, population
genetics, sunflower
BA, Vanderbilt University, 2010
A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial
Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
ATHENS, GEORGIA
2015
© 2015
EDWARD VINCENT MCASSEY IV
All Rights Reserved
Major Professor: John M. Burke
Committee: Mike Arnold
Katrien Devos
Jim Leebens-Mack
CJ Tsai
Electronic Version Approved:
Suzanne Barbour
Dean of the Graduate School
The University of Georgia
December 2015
DEDICATION
I first dedicate this work to my family; my parents, Ed and Linda, provided me with a
loving home, excellent education and constant support. My sisters, Danielle, Jackie, and
Kathleen, as well as my grandparents Ed and Anne, have supported me along the way.
Additionally, I would like to dedicate this work to my wife, Karolina Heyduk, who has helped
and supported me for the past 3.5 years.
This dissertation is finally dedicated to the memory of Dr. Dave McCauley, who first
stirred my interest in studying genetic diversity when he allowed me to join his lab as an
undergrad at Vanderbilt University. I am eternally grateful for his support and advice that led me
to pursue a career in biology.
ACKNOWLEDGEMENTS
Many people have made this work possible. First, John Burke has provided me with
numerous opportunities to grow as a confident and independent scientist. His input has helped
frame and steer my research in new and exciting directions. The advisement I’ve received from
my committee members - Mike Arnold, Katrien Devos, Jim Leebens-Mack, and CJ Tsai - has
been crucial in helping me refine my understanding and analysis of key questions in plant
genomics. Burke lab members past and present including Adam Bewick, John Bowers, Jo Corbi,
Caitlin Ishibashi, Jen Mandel, Rishi Masalia, Savithri Nambeesan, Stephanie Pearl and Evan
Staton have all played a role by helping with analyses, discussing research strategies, and
providing valuable feedback. The University of Georgia has provided numerous resources that
assisted this work including greenhouse facilities provided by the Crop and Soil Science and
Plant Biology Departments, computing resources from the Georgia Advanced Computing
Resource Center, and sequencing from the Georgia Genomics Facility.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
I INTRODUCTION AND LITERATURE REVIEW
    References
II RANGE-WIDE PHENOTYPIC AND GENETIC DIFFERENTIATION IN WILD SUNFLOWER
    Abstract
    Introduction
    Materials and Methods
    Results
    Discussion
    References
III GENOMIC PATTERNS OF SNP VARIATION AND THE GENETIC BASIS OF LOCAL ADAPTATION IN WILD SUNFLOWER
    Abstract
    Introduction
    Materials and Methods
    Results and Discussion
    References
IV TRANSCRIPTOMIC ANALYSIS OF DEVELOPING SEEDS ACROSS THE RANGE OF WILD SUNFLOWER
    Abstract
    Introduction
    Materials and Methods
    Results
    Discussion
    References
V CONCLUSIONS
    References
APPENDICES
A Supporting information for chapter II
B Supporting information for chapter III
C Supporting information for chapter IV
LIST OF TABLES
Table 2.1: Range-wide population sampling information
Table 2.2: Population genetic statistics for 15 wild sunflower populations
Table 2.3: Summary of candidates for genes involved in local adaptation
Table 3.1: Levels of population genetic diversity in three populations across a latitudinal gradient in North America
Table 3.2: Pairwise population structure among three wild sunflower populations
Table 3.3: Co-localization of FST outliers with QTL
Table 4.1: Original sample locations and associated sampling depth
Table 4.2: Differentially expressed fatty acid isoforms
Table 4.3: Fatty acid QTL co-localization with differentially expressed isoforms
Table 4.4: Gene ontology enrichment terms
Supplemental table 2.1: SNP genotypes for 286 individuals at 246 loci
Supplemental table 2.2: Raw trait values for 286 common garden grown individuals
Supplemental table 2.3: Results of REML analysis of phenotype data
Supplemental table 4.1: Top ten most differentially expressed nuclear isoforms with higher expression in Canada
Supplemental table 4.2: Top ten most differentially expressed nuclear isoforms with higher expression in Texas
Supplemental table 4.3: Auto-correlation analysis of physical genome position and expression similarity
LIST OF FIGURES
Figure 2.1: Map of the locations of the 15 populations used in this study in the central USA and Canada found in Table 1.
Figure 2.2: STRUCTURE bar plot of full dataset
Figure 3.1: Original USDA sampling location for the three populations genotyped in this study…
Figure 3.2: Bar plot indicating the proportion of membership to three genetic clusters as identified in the program STRUCTURE
Figure 4.1: Frequency histogram of sequencing effort across 20 RNA-seq libraries in wild sunflower
Figure 4.2: Plot of log fold change in gene expression between Texas and Canada across the sunflower genome
Supplemental figure 2.1: Delta K plot of STRUCTURE results
Supplemental figure 2.2: STRUCTURE bar plot of southern regions
Supplemental figure 2.3: Delta K plot for southern STRUCTURE plot found in Supplemental figure 2.2
Supplemental figure 2.4: STRUCTURE bar plot of northern regions
Supplemental figure 2.5: Delta K plot for northern STRUCTURE plot found in Supplemental figure 2.4
Supplemental figure 2.6: STRUCTURE bar plot corresponding to K = 6 for the six populations with the southern two regions
Supplemental figure 3.1: Sequencing effort per library
Supplemental figure 3.2: SNP density per chromosome
Supplemental figure 3.3: STRUCTURE delta K plot
Supplemental figure 3.4: STRUCTURE bar plot based on previous SNP genotyping
Supplemental figure 3.5: STRUCTURE delta K plot for the three populations genotyped with a SNP chip.
Supplemental figure 4.1: Differentially expressed fatty acid isoforms 1
Supplemental figure 4.2: Differentially expressed fatty acid isoforms 2
Supplemental figure 4.3: Differentially expressed fatty acid isoforms 3
Supplemental figure 4.4: Differentially expressed fatty acid isoforms 4
Supplemental figure 4.5: Differentially expressed fatty acid isoforms 5
CHAPTER I
INTRODUCTION AND LITERATURE REVIEW
The diversity of morphological and physiological phenotypes seen when comparing
populations of the same species may be the result of adaptation to different environments.
Understanding the process of local adaptation is paramount in evolutionary biology as it is a
stepping stone towards the genesis of biodiversity over long timescales. The first half of the 20th
century provided numerous insights into the adaptive role of intraspecific diversity in numerous
taxa (Clausen et al., 1941; Hiesey et al., 1942; Clausen et al., 1947). However, the lack of genetic
tools precluded the understanding of what genetic changes were associated with the pattern of
adaptation. Present day genomic technologies, when paired with prior evidence of a trait’s role in
adaptation, have allowed for an unprecedented look at the genomes and transcriptomes of wild
individuals in order to document molecular patterns consistent with adaptation.
Historical study of local adaptation
Local adaptation represents the situation in which a particular population has higher
fitness in the environment in which it is found than in other environments occupied by
different populations. To study this phenomenon, ecologists typically perform reciprocal
transplants where they grow members of a population in both their ‘home’ environment where
the population is naturally found and an ‘away’ environment where a different population is
normally located (Bradshaw 1960; Warwick and Briggs 1980; Waser and Price 1985; Wang et
al., 1997; Kawecki and Ebert 2004; Bennington et al., 2012). Each reciprocal transplant is
actually a series of common gardens whereby environmental factors are constant so that all
differences found in phenotypes can be solely attributed to differences in the underlying genetics.
Some of the most convincing evidence of local adaptation comes from the earliest experiments
done on the subject matter by Turesson in the 1920s and Clausen, Keck and Hiesey in the
1930s through the 1950s (Turesson 1925; Clausen et al., 1941; Hiesey et al., 1942; Clausen et
al., 1947). These classic experiments combined common gardens and reciprocal transplants with
numerous taxa and quantitative measurements of fitness to determine whether or not local
populations were indeed adapted to their particular home environment. The motivation of these
particular experiments began with the observation of intraspecific diversity occurring in nature.
In particular, early researchers noticed that certain types of phenotypic traits tended to associate
with environmental and/or geographic factors.
For Clausen, Keck and Hiesey, the usage of an elevational gradient proved extremely
beneficial in the dissection of adaptation. Differences in altitude play a unique role in the study
of adaptation because they allow the comparison of plant populations that exist at the same
latitude but who differ in elevation (and its associated environmental changes). Clausen, Keck
and Hiesey, and others since then (Angert and Schemske 2005; Byars et al., 2007; Gonzalo-
Turpin and Hazard 2009), leveraged this experimental set up to study adaptation by establishing
common gardens across three different elevations. At each elevation, Clausen, Keck and Hiesey
planted out populations of California flora collected from each elevation and eventually
measured traits such as flowering time. By comparing the populations within a particular
common garden as well as the same populations grown at multiple elevations the authors were
able to assess the extent to which traits were plastic (large environmental component) versus genetic.
Taken together, the work of these authors indicated that intraspecific variation needed to be
studied at a more quantitatively rigorous level and that much of the intraspecific variation had a
genetic component. The main goal of common garden studies is to deduce whether trait variation
has a genetic component. Determining the genetic differences that control trait variation became
an increasingly popular avenue of research as the 20th century progressed. While common garden
studies of trait variation do not necessarily speak to whether a trait is adaptive, they do indicate
which traits are most divergent, and therefore serve to prioritize traits for additional
investigations.
The advent of new genetic marker systems, coupled with traditional genetic mapping and
population genetic approaches, finally allowed researchers to get closer to establishing the
connection between genotype and phenotype. Population genetic analyses using allozymes
represent the earliest attempt to connect allele frequency variation among populations with
phenotypic variation seen in common garden grown plants. As allozyme work began to take off
in the late 1960s and early 1970s, the lab of Robert Allard at the University of California at Davis
began a classic set of population genetic studies that sought to establish a connection between
allelic variation and adaptation to soil water status in wild oat, Avena barbata. Briefly, the work
of Allard and colleagues showed that certain allelic combinations from a variety of allozyme loci
were found at higher frequency in individuals residing in more mesic environments in
California (Clegg and Allard 1972; Hamrick and Allard 1972; Hamrick and Holden 1979). This
particular observation represented a purported example of a co-adapted gene complex (i.e., a set
of interacting loci that are selectively advantageous when particular alleles are found in the same
individual [Allard et al., 1972]). Focusing on population genetic differentiation and tying it to
divergence in environments highlighted the possibility that populations may be under natural
selection.
Range-wide studies of genetic diversity
Assessing the amount of genetic diversity in a species is crucial for several reasons. In the
context of the maintenance of biodiversity it is important that organisms with valuable ecological
roles have sufficient genetic variation to adapt to a changing environment. For plant species, this
is an extremely important avenue of research because these organisms cannot migrate over the
course of their life to track their ideal climate. It should be noted that human mediated
transplantation is an option to mitigate the sessile nature of plants, but it will require substantial
resources (Vitt et al., 2010). Instead, plants must rely on seed dispersal to colonize new habitats.
Assuming that transplantation and seed dispersal are relatively unlikely sources of climate
tracking, the only other option for plants is to adapt to changing biotic and abiotic forces.
Range-wide studies are important because they establish the levels of genetic diversity
that exist within populations, which is in turn required for local adaptation (Li and Adams 1989;
King et al., 2001; Spinks and Shaffer 2005; Bryja et al., 2010). Local adaptation across a range
can be explored by using latitudinal or longitudinal transects, both of which will correlate with a
number of biotic and abiotic factors that can drive local adaptation. For example, a latitudinal
transect across a species’ range correlates with differences in photoperiod, growing season,
maximum and minimum temperatures, as well as the frequencies of biotic and/or abiotic stresses
such as drought, nutrient deficit, herbivory, and pathogen load. Additionally, longitude tends to
correlate with precipitation in conjunction with the presence of a mountain range. The above
mentioned climatic factors are often implicated in driving adaptive differentiation (Franks et al.,
2007; Hancock et al., 2011; Colautti and Barrett 2013). However, determining whether the genetic
variation seen across a transect is related to local adaptation requires an understanding of the
levels of genetic diversity across all populations.
Range-wide studies of genetic variation in the progenitor species of crops are extremely
important for crop improvement. When genetic variation for traits has been limited in crops, the
main avenues for crop improvement become either using wild diversity (Tanksley and McCouch
1997), mutagenized populations (Wulff et al., 2004) or performing transgenic engineering
(Koziel et al., 1993). Since wild progenitor species have to constantly be adapting to a changing
world over evolutionary timescales they generally contain greater genetic and phenotypic
diversity than related crops (Burke et al., 2007). This diversity can be in the form of disease
resistance, drought resistance or yield improvement factors among many others. Thus, assessing
the level of diversity in natural populations can catalog potential variation that breeders may
require in the future. Unfortunately, there are drawbacks associated with the usage of wild
germplasm for crop improvement. When wild species are crossed to move an interesting trait
into a crop, large chromosomal regions containing many genes other than the target gene of
interest get incorporated as well, a process known as linkage drag. The genes found in this
linkage block often contain alleles that, while presumably useful in the wild, actually negatively
affect crop production. This necessitates a more nuanced understanding of the genomic intervals
controlling adaptive traits.
Leveraging genomics for answering questions at the level of natural populations
To identify the genetic signature of local adaptation it is essential to genotype many loci
throughout the genome in many individuals. The early work of Allard and colleagues sought to
address questions of adaptation using only five allozyme loci (Hamrick and Allard 1972). Over the
following decades amplified fragment length polymorphisms (AFLPs), simple sequence repeats
(SSRs), and SNP chips were all employed to achieve the same general goal: genotype
individuals at more loci without a corresponding increase in cost and labor. The generally rapid
decay of linkage disequilibrium in natural populations necessitates a large number of loci in
order to fully cover the genome. In order to bypass some of the negative aspects associated with
other marker systems (time, requirement of a priori sequence information, presence-absence
marker types), the most recent advance has been restriction site associated DNA sequencing
(RAD-seq, Baird et al., 2008) and Genotyping-by-Sequencing (GBS, Elshire et al., 2011). These
techniques leverage the fact that many restriction sites within the genome will be conserved
between individuals of the same species. Assuming that restriction sites are conserved, the
adjacent sequences can then be obtained via high throughput sequencing and can be compared to
determine whether or not polymorphisms exist. One advantage of these techniques is that they are
highly customizable due to the ability to choose restriction enzymes with recognition sites of
different lengths and nucleotide composition. Additionally, the usage of methylation-sensitive
restriction enzymes allows for the preferential sequencing of hypomethylated genic regions. If
we assume genic regions of the genome house local adaptation polymorphisms, GBS-based
interrogations are well suited to provide genome-wide markers that will be useful for studies of
adaptation.
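The logic of restriction-site-based genotyping can be illustrated with a minimal sketch. It locates a recognition site in two short sequences and compares the bases immediately downstream of each shared site. The sequences, the tag length, and the function names are hypothetical, and EcoRI's recognition site (GAATTC) is used only as a familiar example. The sketch also assumes the two sequences share the same coordinates, which in practice would require alignment to a reference.

```python
import re

def restriction_tags(seq, site="GAATTC", tag_len=8):
    """Return {site_position: downstream tag} for every occurrence of `site`."""
    tags = {}
    for m in re.finditer(site, seq):
        tag = seq[m.end():m.end() + tag_len]
        if len(tag) == tag_len:          # skip sites too close to the sequence end
            tags[m.start()] = tag
    return tags

def compare_tags(seq_a, seq_b, site="GAATTC", tag_len=8):
    """List (position, tag_a, tag_b, n_mismatches) for sites present in both sequences."""
    tags_a = restriction_tags(seq_a, site, tag_len)
    tags_b = restriction_tags(seq_b, site, tag_len)
    results = []
    for pos in sorted(set(tags_a) & set(tags_b)):   # conserved restriction sites only
        a, b = tags_a[pos], tags_b[pos]
        mismatches = sum(x != y for x, y in zip(a, b))
        results.append((pos, a, b, mismatches))
    return results

# Hypothetical sequences from two individuals; they differ at one base
# in the tag downstream of the first site.
ind1 = "TTGAATTCACGTACGTAAGGCCGAATTCTTTTCCCCGG"
ind2 = "TTGAATTCACGTTCGTAAGGCCGAATTCTTTTCCCCGG"
comparison = compare_tags(ind1, ind2)
```

Here the first shared site yields a tag with a single mismatch (a candidate SNP), while the second yields identical tags.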
Population genetic theory provides the framework that allows researchers to determine
which of the thousands of GBS markers that have been genotyped harbor a signature of
selection. Population genetic structure describes the extent to which genetic variation is found
among populations as opposed to within populations or within individuals. A typical measure of
population genetic structure is FST and its many derivatives. When standardized, this metric
ranges from 0 to 1 with higher values indicating more differentiation among populations and
lower values indicating that populations are relatively similar in genetic composition. Theory
suggests that demographic factors like population size, migration and recent bottlenecks affect
the genetic structure of all loci in a similar fashion. On the other hand, in the case of local
adaptation where alternative alleles are selected for by different environments, typically only the
genomic region around the causal variant has elevated population structure. Therefore, if one can
genotype a polymorphism in each independent region of the genome, it should be possible to
distinguish selected regions (that is, regions of high FST) from background ‘noise’.
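As a minimal sketch of the outlier logic described above, the code below computes FST for biallelic loci from allele frequencies in two populations and reports the most differentiated loci. It assumes equally weighted populations and uses the simple heterozygosity-based definition FST = (HT - HS) / HT. The locus names, frequencies, and the empirical top-fraction cutoff are hypothetical and are not the estimator or threshold used in this dissertation.

```python
def fst_biallelic(p1, p2):
    """FST = (HT - HS) / HT for one biallelic locus, given the frequency of the
    same allele in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                                     # locus is monomorphic overall
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return (h_t - h_s) / h_t

def fst_outliers(freqs, top_fraction=0.05):
    """Return the most differentiated loci as (locus, FST), highest first.
    `freqs` maps a locus name to (p1, p2); at least one locus is always returned."""
    fst = {locus: fst_biallelic(p1, p2) for locus, (p1, p2) in freqs.items()}
    ranked = sorted(fst, key=fst.get, reverse=True)
    n_keep = max(1, round(len(ranked) * top_fraction))
    return [(locus, fst[locus]) for locus in ranked[:n_keep]]

# Hypothetical allele frequencies in a northern and a southern population.
freqs = {
    "snp1": (0.50, 0.55),
    "snp2": (0.30, 0.25),
    "snp3": (0.95, 0.10),   # strongly differentiated locus
    "snp4": (0.60, 0.70),
}
outliers = fst_outliers(freqs, top_fraction=0.25)
```

An empirical cutoff of this kind only ranks loci against the genome-wide background; it does not by itself demonstrate selection.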
Population genomic datasets have allowed for entirely new scales of evolutionary
analyses. First, in species with an annotated genome sequence, it is possible to anchor loci to the
genome in order to establish their physical location, allowing for GBS datasets to be integrated
with quantitative trait locus (QTL) mapping datasets (Hohenlohe et al., 2010). Assessing whether
or not loci with exceptional population structure co-localize with known QTL for putatively
adaptive traits provides insight into the likelihood that a candidate gene underlies adaptation
in a particular system. Furthermore, the list of candidate adaptive genes can be curated by
determining if the highly structured polymorphism is in a candidate pathway in a model species
like Arabidopsis thaliana.
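Checking for co-localization amounts to an interval lookup, sketched below. The QTL names, chromosomes, and coordinates are hypothetical, and the sketch assumes that outlier loci and QTL intervals have already been placed on the same physical coordinate system.

```python
from collections import namedtuple

# A QTL interval on a chromosome, in the same coordinates as the outlier loci.
Interval = namedtuple("Interval", "name chrom start end")

def colocalized(outliers, qtl):
    """Map each outlier locus to the names of the QTL whose interval contains it.
    `outliers` is a list of (locus, chrom, position) tuples."""
    hits = {}
    for locus, chrom, pos in outliers:
        names = [q.name for q in qtl
                 if q.chrom == chrom and q.start <= pos <= q.end]
        if names:
            hits[locus] = names
    return hits

# Hypothetical QTL intervals and FST outlier positions.
qtl = [
    Interval("flowering_time_QTL", "chr6", 1_000_000, 5_000_000),
    Interval("fatty_acid_QTL", "chr14", 20_000_000, 28_000_000),
]
outlier_loci = [
    ("snp_101", "chr6", 3_200_000),    # inside the flowering time interval
    ("snp_202", "chr6", 9_000_000),    # same chromosome, outside any interval
    ("snp_303", "chr14", 21_500_000),  # inside the fatty acid interval
]
hits = colocalized(outlier_loci, qtl)
```

Because QTL intervals are often large, an overlap of this kind narrows the list of candidates rather than confirming any one of them.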
The importance of gene expression in evolutionary biology
Mutations in DNA provide the heritable genetic variation that natural selection acts upon.
As described above, these mutations (or mutations genetically linked to an adaptive mutation)
carry a signature of selection in the form of elevated FST. Many studies have identified adaptation
occurring via structural mutations to protein coding sequence (Smith and Eyre-Walker 2002). If
one allele underwent a non-synonymous mutation increasing the protein’s ability to perform
some cellular process, it may be selected for in a particular environment and thus become known
as a local adaptation gene. A general question remains though: are mutations that alter a protein’s
functional ability the major avenue for adaptation at the molecular level?
The amount that a gene gets transcribed and the subsequent transcript gets translated
provides an alternative possibility for adaptation at the molecular level. In other words, in the
absence of any genetic polymorphisms in the coding sequence of a gene, the difference in gene
expression when comparing populations could explain an adaptive trait. Since adaptive traits are
by definition heritable, there must be genetic variation somewhere in the genome to explain the
difference in gene expression that may be occurring in a set of populations. Mutations in
cis-regulatory promoter sequence could explain differences in gene expression among populations.
Alternatively, mutations in trans could occur in the form of a coding sequence mutation in an
important upstream transcription factor. Recently, more work has been done in non-model
systems on trying to determine the extent to which this genetic variation for gene expression is
correlated with adaptation to a specific environment (Schoville et al., 2012). As with GBS, the
quantity of data associated with high throughput sequencing of RNA (RNA-seq) both eliminated
the need for any a priori sequence information and at the same time provided a means for
conducting tests of differential expression across many genes. In addition to quantitatively
establishing gene expression levels, RNA-seq data provides nucleotide information that can be
mined for SNPs (Ellison et al., 2011).
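To make the idea of comparing expression levels between populations concrete, the sketch below normalizes raw read counts to counts per million (CPM) and computes a log2 fold change per gene between two groups of libraries. The gene names, counts, and pseudocount are hypothetical. This is only a descriptive calculation: formal tests of differential expression also model variation among replicate libraries, which is not attempted here.

```python
import math

def cpm(counts):
    """Scale one library's raw counts (gene -> count) to counts per million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Per-gene log2(mean CPM in group A / mean CPM in group B).
    Each group is a list of libraries; the pseudocount avoids division by zero."""
    cpm_a = [cpm(lib) for lib in group_a]
    cpm_b = [cpm(lib) for lib in group_b]
    result = {}
    for gene in group_a[0]:
        mean_a = sum(lib[gene] for lib in cpm_a) / len(cpm_a)
        mean_b = sum(lib[gene] for lib in cpm_b) / len(cpm_b)
        result[gene] = math.log2((mean_a + pseudocount) / (mean_b + pseudocount))
    return result

# Hypothetical counts for two genes in one northern and one southern seed library.
north = [{"geneA": 300, "geneB": 700}]
south = [{"geneA": 100, "geneB": 900}]
lfc = log2_fold_change(north, south)
```

A positive value indicates higher expression in the first group (here geneA in the northern library), and a negative value indicates higher expression in the second.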
Utility of the genus Helianthus for answering questions in evolutionary biology
Helianthus has played both a major historical and a contemporary role in the
advancement of evolutionary biology. This North American genus has species that vary for a
number of traits including ploidy, mating system, habitat, and range size (Reviewed in Kane et
al., 2013). It is this tremendous amount of variation that allows this genus to be a source of
studies of adaptation. For example, the work of Loren Rieseberg and colleagues has utilized
Helianthus to study how homoploid hybridization between H. annuus and H. petiolaris resulted
in a set of three species (H. anomalus, H. deserticola, and H. paradoxus) adapted to very
different conditions (Rieseberg et al., 2003). The evolutionary biology work in this genus has
been greatly aided by the resources associated with having a cultivated congener in the group.
Furthermore, the large natural range of common sunflower, H. annuus, suggests there has been
ample opportunity for populations to differentially adapt to a number of environments.
H. annuus was cultivated approximately 4,000 years ago in eastern North America
(Crites 1993). After subsequent improvement in both Europe and North America, this species
has become a rich source of both oil and confectionery seeds for human consumption. In order to
study the process of domestication and improvement of this crop, there have been a number of
QTL mapping populations developed over the years from a wide variety of crosses between
wild, cultivated, and landrace individuals (Burke et al., 2002; Burke et al., 2005; Wills and Burke
2007). Surveys of variation in crop and wild genomes have allowed for the identification of
selected regions during sunflower domestication (Mandel et al., 2014; Baute et al., 2015) as well
as the assessment of mutational patterns associated with the domestication bottleneck (Renaut
and Rieseberg, 2015). Genome level data has recently been used in the genus to address
questions of adaptation in ecotypes of H. petiolaris (Andrew and Rieseberg, 2013) and H.
annuus (Moyers and Rieseberg, 2013) as well as speciation in sister species pairs of Helianthus
(Renaut et al., 2014).
Purpose of study
This study characterizes the range-wide phenotypic and genetic diversity of H. annuus.
Before undertaking any study of adaptation it is essential to identify phenotypic differences
among populations and have an understanding of the level of genetic structuring in populations
of interest. Here I first establish baseline levels of genetic diversity and structure. In tandem, I
phenotype populations from across the range for traits such as flowering time, plant height,
branching, as well as seed physical dimensions, oil quantity, and fatty acid content. I use
high-throughput sequencing to perform a genome scan for high levels of genetic structure in a series
of three populations that span the latitudinal range of this species. By combining population
genetic evidence with independent QTL data, I further refine a list of putative
adaptive genes. Finally, I use RNA-seq to uncover differentially expressed genes that correlate
with the adaptive phenotype of fatty acid profile in wild sunflower seeds. Taken together, these
studies represent a multi-faceted view of adaptation in wild sunflower.
References
Allard RW, Babbel, G.R., Clegg, M.T., Kahler, A.L. (1972) Evidence for coadaptation in Avena
barbata. Proc Natl Acad Sci U S A 69: 3043-3048.
Andrew RL, Rieseberg LH (2013) Divergence is focused on few genomic regions early in
speciation: incipient speciation of sunflower ecotypes. Evolution 67: 2468-2482.
Angert AL, Schemske DW (2005) The evolution of species' distributions: reciprocal transplants
across the elevation ranges of Mimulus cardinalis and M. lewisii. Evolution 59: 1671-
1684.
Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, et al. (2008) Rapid SNP discovery and
genetic mapping using sequenced RAD markers. PLoS One 3: e3376.
Baute GJ, Kane NC, Grassa CJ, Lai Z, Rieseberg LH (2015) Genome scans reveal candidate
domestication and improvement genes in cultivated sunflower, as well as post-
domestication introgression with wild relatives. New Phytol 206: 830-838.
Bennington CC, Fetcher N, Vavrek MC, Shaver GR, Cummings KJ, et al. (2012) Home site
advantage in two long-lived arctic plant species: results from two 30-year reciprocal
transplant studies. Journal of Ecology 100: 841-851.
Bradshaw AD (1960) Population differentiation in Agrostis tenuis Sibth. III. Populations in
varied environments. The New Phytologist 59: 92-103.
Bryja J, Smith C, Konecny A, Reichard M (2010) Range-wide population genetic structure of the
European bitterling (Rhodeus amarus) based on microsatellite and mitochondrial DNA
analysis. Mol Ecol 19: 4708-4722.
Burke JM, Burger JC, Chapman MA (2007) Crop evolution: from genetics to genomics. Curr
Opin Genet Dev 17: 525-532.
Burke JM, Knapp SJ, Rieseberg LH (2005) Genetic consequences of selection during the
evolution of cultivated sunflower. Genetics 171: 1933-1940.
Burke JM, Tang S, Knapp SJ, Rieseberg LH (2002) Genetic analysis of sunflower domestication.
Genetics 161: 1257-1267.
Byars SG, Papst W, Hoffmann AA (2007) Local adaptation and cogradient selection in the
alpine plant, Poa hiemata, along a narrow altitudinal gradient. Evolution 61: 2925-2941.
Clausen J, Keck, D.D., Hiesey, W.M. (1941) Regional differentiation in plant species. The
American Naturalist 75: 231-250.
Clausen J, Keck, D.D., Hiesey, W.M. (1947) Heredity of geographically and ecologically
isolated races. The American Naturalist 81: 114-133.
Clegg MT, Allard, R.W. (1972) Patterns of genetic differentiation in the slender wild oat species
Avena barbata. Proc Natl Acad Sci U S A 69: 1820-1824.
Colautti RI, Barrett, S.C.H. (2013) Rapid adaptation to climate facilitates range expansion of an
invasive plant. Science 342: 364-366.
Crites GD (1993) Domesticated sunflower in 5th millennium BP temporal context - New
evidence from Middle Tennessee. American Antiquity 58: 146-148.
Ellison CE, Hall C, Kowbel D, Welch J, Brem RB, et al. (2011) Population genomics and local
adaptation in wild isolates of a model microbial eukaryote. Proc Natl Acad Sci U S A
108: 2831-2836.
Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, et al. (2011) A robust, simple
genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:
e19379.
Franks SJ, Sim S, Weis AE (2007) Rapid evolution of flowering time by an annual plant in
response to a climate fluctuation. Proc Natl Acad Sci U S A 104: 1278-1282.
Gonzalo-Turpin H, Hazard L (2009) Local adaptation occurs along altitudinal gradient despite
the existence of gene flow in the alpine plant species Festuca eskia. Journal of Ecology
97: 742-751.
Hamrick JL, Allard, R.W. (1972) Microgeographical variation in allozyme frequencies in Avena
barbata. Proc Natl Acad Sci U S A 69: 2100-2104.
Hamrick JL, Holden, L.R. (1979) Influence of microhabitat heterogeneity on gene frequency
distribution and gametic phase disequilibrium in Avena barbata. Evolution 33: 521-533.
Hancock AM, Brachi, B., Faure, N., Horton, M.W., Jarymowycz, L.B., Sperone, F.G.,
Toomajian, C., Roux, F., Bergelson, J. (2011) Adaptation to climate across the
Arabidopsis thaliana genome. Science 334: 83-86.
Hiesey WM, Clausen, J., Keck, D.D. (1942) Relations between climate and intraspecific
variation in plants. The American Naturalist 76: 5-22.
Hohenlohe PA, Bassham S, Etter PD, Stiffler N, Johnson EA, et al. (2010) Population genomics
of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet
6: e1000862.
Kane NC, Burke JM, Marek L, Seiler G, Vear F, et al. (2013) Sunflower genetic, genomic and
ecological resources. Mol Ecol Resour 13: 10-20.
Kawecki TJ, Ebert D (2004) Conceptual issues in local adaptation. Ecology Letters 7: 1225-
1241.
King TL, Kalinowski ST, Schill WB, Spidle AP, Lubinski BA (2001) Population structure of
Atlantic salmon (Salmo salar L.): a range-wide perspective from microsatellite DNA
variation. Mol Ecol 10: 807-821.
Mandel JR, McAssey EV, Nambeesan S, Garcia-Navarro E, Burke JM (2014) Molecular
evolution of candidate genes for crop-related traits in sunflower (Helianthus annuus L.).
PLoS One 9: e99620.
Moyers BT, Rieseberg LH (2013) Divergence in gene expression is uncoupled from divergence
in coding sequence in a secondarily woody sunflower. International Journal of Plant
Sciences 174: 1079-1089.
Renaut S, Owens GL, Rieseberg LH (2014) Shared selective pressure and local genomic
landscape lead to repeatable patterns of genomic divergence in sunflowers. Mol Ecol 23:
311-324.
Renaut S, Rieseberg LH (2015) The accumulation of deleterious mutations as a consequence of
domestication and improvement in sunflowers and other Compositae crops. Mol Biol
Evol 32: 2273-2283.
Rieseberg LH, Raymond O, Rosenthal DM, Lai Z, Livingstone K, et al. (2003) Major ecological
transitions in wild sunflowers facilitated by hybridization. Science 301: 1211-1216.
Schoville SD, Barreto FS, Moy GW, Wolff A, Burton RS (2012) Investigating the molecular
basis of local adaptation to thermal stress: population differences in gene expression
across the transcriptome of the copepod Tigriopus californicus. BMC Evol Biol 12: 170.
Smith NG, Eyre-Walker A (2002) Adaptive protein evolution in Drosophila. Nature 415: 1022-
1024.
Spinks PQ, Shaffer HB (2005) Range-wide molecular analysis of the western pond turtle (Emys
marmorata): cryptic variation, isolation by distance, and their conservation implications.
Mol Ecol 14: 2047-2064.
Tanksley SD, McCouch SR (1997) Seed banks and molecular maps: unlocking genetic potential
from the wild. Science 277: 1063-1066.
Turesson G (1925) The plant species in relation to habitat and climate. Hereditas 6: 147-236.
Vitt P, Havens K, Kramer AT, Sollenberger D, Yates E (2010) Assisted migration of plants:
Changes in latitudes, changes in attitudes. Biological Conservation 143: 18-27.
Wang H, McArthur, E.D., Sanderson, S.C., Graham, J.H., Freeman, D.C. (1997) Narrow hybrid
zone between two subspecies of big sagebrush (Artemisia tridentata: Asteraceae). IV.
Reciprocal transplant experiments. Evolution 51: 95-102.
Warwick SI, Briggs, D. (1980) The genecology of lawn weeds. V. The adaptive significance of
different growth habit in lawn and roadside populations of Plantago major L. The New
Phytologist 85: 289-300.
Waser NM, Price, M.V. (1985) Reciprocal transplant experiments with Delphinium nelsonii
(Ranunculaceae): Evidence for local adaptation. American Journal of Botany 72: 1726-
1732.
Wills DM, Burke JM (2007) Quantitative trait locus analysis of the early domestication of
sunflower. Genetics 176: 2589-2599.
Wulff BBH, Thomas, C.M., Parniske, M., Jones, J.D.G. (2004) Genetic variation at the tomato
Cf-4/Cf-9 locus induced by EMS mutagenesis and intralocus recombination. Genetics
167: 459-470.
CHAPTER II
RANGE-WIDE PHENOTYPIC AND GENETIC DIFFERENTIATION IN WILD
SUNFLOWER1
1 McAssey, E.V., Corbi, J., Blackman, B.K., Burke, J.M. To be submitted to BMC Plant Biology
Abstract
Divergent phenotypes and genotypes are key signals for identifying the targets of natural
selection in locally adapted populations. Here, we use a combination of common garden
phenotyping for a variety of growth, plant architecture, and seed traits and SNP genotyping to
characterize range-wide patterns of diversity in 15 populations of wild sunflower (Helianthus
annuus L.) sampled along a latitudinal gradient in central North America. We analyzed
geographic patterns of phenotypic diversity, quantified levels of within-population SNP
diversity, and also determined the extent of population structure across the range of this species.
We then used these data to identify significantly over-differentiated loci as markers of genes or
genomic regions conferring local adaptation. Traits including flowering time, plant height, and
seed oil composition (i.e., percentage of saturated fatty acids) were found to correlate with
latitude, and thus differentiated northern vs. southern populations. Average pairwise FST was
found to be 0.21, and a STRUCTURE analysis identified two significant clusters that largely
separated northern and southern individuals. The significant FST outliers included a SNP in FT2,
a flowering time gene that has been previously shown to co-localize with flowering time QTL,
and exhibits a known cline in gene expression.
Introduction
Local adaptation, wherein populations have higher fitness in their ‘home’ environments
than in non-native locales, is a topic of great interest in the field of evolutionary biology (e.g.,
Kawecki and Ebert 2004). The genetic basis of such adaptive divergence has not, however, been
elucidated in the vast majority of non-model organisms. For plants, the selective pressures
leading to local adaptation can include a variety of abiotic and biotic factors such as: soil type
(Sambatti and Rice 2006; Turner et al., 2008; Turner et al., 2010), water availability (Knight et
al., 2006), photoperiod (Riihimäki and Savolainen 2004), temperature (Arnone and Körner
1997), herbivores (Sork et al. 1993), mycorrhizal associations (Johnson et al. 2010), and
proximity to agricultural fields (Mercer et al. 2006). Because these selective pressures are
expected to produce characteristic patterns of genetic variation in and near genes conferring
adaptive differences, population genetic approaches have the potential to provide insight into the
genes, or at least genomic regions, responsible for producing locally adapted traits across the
range of a species.
In the case of divergent selection, which would be expected to play an important role in
the production of locally adapted populations, the focus is typically on measures of population
genetic differentiation. More specifically, divergent selective pressures would be expected to
produce elevated population structure in the vicinity of targeted genes relative to the genome-
wide average (e.g., Lewontin and Krakauer 1973; Beaumont and Nichols 1996; Excoffier and
Lischer 2010; Foll and Gaggiotti 2008). In contrast, balancing selection (or a range-wide
selective sweep) would be expected to result in much lower levels of population genetic
differentiation (Polley et al. 2003; Cagliani et al. 2008). When combined with high-throughput
genotyping approaches, such population genetic approaches have been used to identify genes
thought to be involved in adaptation in a variety of species, including boreal black spruce
(Prunier et al. 2011), Atlantic cod (Nielsen et al. 2009), prairie-chickens (Bollmer et al. 2011),
and moor frogs (Richter-Boix et al. 2011).
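The expectation that divergent selection elevates differentiation at targeted loci can be illustrated with a minimal sketch. It computes a simple per-locus FST, (HT - HS) / HT, for a biallelic SNP from population allele frequencies, assuming equally weighted populations. The function name and the example frequencies are hypothetical and are not taken from this study; the estimators implemented in the programs cited above differ in detail.

```python
def fst_biallelic(freqs):
    """Wright's FST = (HT - HS) / HT for one biallelic locus.

    freqs: list of reference-allele frequencies, one per population.
    Populations are weighted equally (an assumption of this sketch).
    """
    if not freqs:
        raise ValueError("need at least one population")
    # HS: mean expected heterozygosity within populations
    hs = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # HT: expected heterozygosity of the pooled population
    p_bar = sum(freqs) / len(freqs)
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        return 0.0  # monomorphic locus: report no differentiation
    return (ht - hs) / ht


# Hypothetical allele frequencies in three populations along a transect
neutral_like = fst_biallelic([0.45, 0.50, 0.55])
divergent = fst_biallelic([0.05, 0.50, 0.95])
```

With these made-up frequencies, the weakly differentiated locus gives an FST of about 0.007, whereas the strongly clinal locus gives about 0.54. A genome scan looks for this kind of contrast against the genome-wide background.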
In addition to overall levels of population differentiation, clinal patterns of genetic
variation can also be indicative of local adaptation (e.g., Coop et al. 2010; Kooyers and Olsen,
2012). A variety of environmental variables typically vary across the ranges of species, and thus
there may be selection for different phenotypic values at the extremes of a species’ range. While
allele frequencies at many loci might exhibit weak correlations across a given environmental
contrast due to the joint effects of genetic drift and gene flow, alleles at loci that play an
important role in local adaptation should clearly correlate with relevant environmental variables
(Coop et al., 2010). For example, putative adaptive clines in allele frequency have been
identified in Arabidopsis thaliana for the flowering time genes FRIGIDA (Stinchcombe et al.
2004) and PHYTOCHROME C (Balasubramanian et al. 2006), in Populus tremula for the
flowering time gene PHYTOCHROME B2 (Ingvarsson et al. 2006), in Drosophila melanogaster
for the insulin-signaling gene INSULIN-LIKE RECEPTOR (Paaby et al. 2010), and in
Peromyscus polionotus for the coat color gene AGOUTI (Mullen and Hoekstra 2008). While the
above studies have provided tremendous insight into the genetic basis of local adaptation, studies
of non-model organisms will help to broaden our understanding of this fundamental evolutionary
process. In the present paper, we report on range-wide patterns of phenotypic and genetic
diversity in common sunflower, Helianthus annuus.
Sunflower is a member of the Compositae (a.k.a., the Asteraceae), which is one of the
largest and most diverse families of flowering plants. The native range of common sunflower
spans much of North America, and wild populations occur in habitats that are characterized by
variation in a wide range of environmental variables, including: photoperiod, growing season,
minimum/maximum temperatures, and precipitation. Common sunflower is also the wild
progenitor of cultivated sunflower (also H. annuus), which is native to east-central North
America (Crites 1993; Harter 2004; Blackman et al. 2011b) and is one of the world’s most
important oilseed crops (Schneiter 1997).
Here, we describe patterns of phenotypic and genetic diversity within and among 15 wild
sunflower populations across a latitudinal gradient in central North America. We grew and
phenotyped individuals from these populations in a common greenhouse environment and
genotyped them using a SNP array targeting 384 loci distributed throughout the sunflower
genome. We used these data to investigate geographic patterns of phenotypic differentiation,
describe overall patterns of population genetic variation, and identify loci that harbor the
population genetic signature of local adaptation. We also placed our population genetic results in
the context of prior QTL mapping studies in sunflower to determine whether highly
differentiated loci co-localize with known QTL regions.
Materials and Methods
Plant materials and phenotypic analyses
Seeds from 15 wild-collected populations of H. annuus were obtained from the USDA’s
North Central Regional Plant Introduction Station (Ames, IA). These populations, which were
sampled from a range of latitudes across central North America (Figure 2.1; Table 2.1), were
selected to represent truly wild populations that appear to be free from the effects of past
introgression with cultivated sunflower (L. Marek and G. Seiler, personal communication). Prior
to germination, all seeds were cleaned with 3% hydrogen peroxide, rinsed with deionized water,
and placed on moist filter paper in a petri dish. To break dormancy, petri dishes were placed at
4 °C in a dark cold room for 14 days. After the cold treatment, they were moved into a growth room
where they were maintained under 16-hour days at 23 °C. Following germination, seedlings were
planted in soil trays. Once established, these seedlings were transplanted into soil pots (900
Classic, Nursery Supplies Inc, Kissimmee, FL) and moved to the greenhouse, where
supplemental lighting provided a consistent cycle of 16-hour days and 8-hour nights.
Plants were arranged in the greenhouse in four blocks, each of which contained five
individuals from each of the 15 populations (75 individuals total per block). All plants were
phenotyped for a variety of traits, including: days to four pairs of true leaves, days to flowering,
plant height at senescence, branching architecture, seed size, and seed oil content/composition.
Because wild sunflower is self-incompatible, manual crosses were performed to produce seeds.
This involved intercrossing individuals within populations (i.e., bulked pollen collected from
individuals within a population was used to pollinate individuals within that population), with
inflorescences being bagged to prevent cross-contamination. Seeds were then collected at
physiological maturity and phenotyped. Oil traits were assessed following established protocols
(Burke et al. 2005). Briefly, percent oil content was determined via pulsed nuclear magnetic
resonance (NMR) analyses using a Bruker MQ20 Minispec NMR analyzer (Billerica, MA) that
had been calibrated with known standards. Fatty acid composition was determined by gas
chromatography (Hewlett-Packard, Palo Alto, CA) with known fatty acid standards (Nu-Check
Prep, Elysian, MN).
All traits were tested for deviations from normality by determining whether a frequency
histogram of trait values across all 286 full-grown individuals was significantly different from a
normal distribution with the Shapiro-Wilk test in JMP 11 (SAS Institute, Cary, NC) and trait
values were transformed using a Box-Cox transformation (Box and Cox 1964) as necessary.
Restricted maximum likelihood was used to fit a model with region as a fixed effect and with
block and the block-by-region interaction as random effects, in order to test for regional
differences in trait values while accounting for variation amongst blocks. For fatty acid traits, the date of fatty acid extraction
was used as a blocking factor instead of greenhouse block because an inspection of the raw data
indicated clear variation in extraction efficiency across days. Least squares means were
compared amongst regions using Tukey’s test.
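For readers unfamiliar with the Box-Cox family, the transformation can be sketched as follows. This is only an illustration of the formula from Box and Cox (1964), not the JMP procedure used here; the function name, the trait values, and the choice of lambda are hypothetical.

```python
import math


def box_cox(values, lam):
    """Box-Cox power transform; every value must be strictly positive.

    lam = 0 gives the natural log; otherwise (x**lam - 1) / lam.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Hypothetical days-to-flowering values, for illustration only
days_to_flower = [62.0, 75.0, 81.0, 110.0]
log_scale = box_cox(days_to_flower, 0)      # log transform
sqrt_like = box_cox(days_to_flower, 0.5)    # square-root-like transform
```

In practice, lambda is chosen to make the transformed values as close to normal as possible, which statistical packages estimate by maximum likelihood.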
DNA extractions and SNP genotyping
Leaf tissue was harvested from 286 of the 300 (Table 2.1) individuals described above
and DNA was extracted using the Qiagen DNeasy Plant Mini Kit (Valencia, CA). All DNA
samples were quantified using a NanoDrop (Thermo Scientific, Wilmington, DE) and diluted to
50 ng/µl prior to genotyping. Each sample was then genotyped using a GoldenGate assay
(Illumina, San Diego, CA) targeting 384 SNPs selected from the larger collection of sunflower
SNPs described by Bachlava et al. (2012). These loci were chosen to provide even coverage of
the 17 sunflower linkage groups (LGs), with an average of one SNP every 3.5 cM. Genotype
calls were made using Illumina’s GenomeStudio (ver. 2011.1) followed by manual inspection.
Loci that exhibited aberrant hybridization signals (perhaps due to presence/absence variation or
the occurrence of duplicate genes), an overall lack of polymorphism (i.e., minor allele frequency
< 0.05), and/or large amounts of missing data (i.e., fraction of missing data > 0.05) were
removed prior to population genetic analysis. A total of 246 loci (average = 14.5 per LG; range =
11-20 per LG) were retained for further analysis.
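The two numerical filters described above (minor allele frequency and missing data) can be sketched as below. This is a hedged illustration rather than the pipeline actually used: the genotype encoding (alternate-allele counts of 0, 1, or 2, with None for a failed call) and the function name are assumptions, and the manual screen for aberrant hybridization signals is not modeled.

```python
def passes_filters(genotypes, min_maf=0.05, max_missing=0.05):
    """Return True if a biallelic SNP passes the MAF and missing-data filters.

    genotypes: one entry per individual, either the alternate-allele
    count (0, 1, or 2) or None for a missing call (assumed encoding).
    """
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n == 0 or not called:
        return False
    missing_fraction = (n - len(called)) / n
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    # Loci with MAF < 0.05 or missing fraction > 0.05 are removed
    return maf >= min_maf and missing_fraction <= max_missing


# Hypothetical loci scored in 100 individuals
common_snp = [0] * 90 + [1] * 8 + [2] * 2   # alternate allele frequency 0.06
rare_snp = [0] * 99 + [1]                   # alternate allele frequency 0.005
gappy_snp = [None] * 10 + [1] * 90          # 10% missing calls
kept = [passes_filters(s) for s in (common_snp, rare_snp, gappy_snp)]
```

Only the first hypothetical locus is retained; the second fails the frequency threshold and the third fails the missing-data threshold.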
Population genetic analyses
Measures of genetic diversity, including the percentage of polymorphic loci and observed
(Ho) and Nei’s unbiased expected heterozygosity (UHe; Nei 1978) were calculated at the
population level using GenAlEx (version 6.501; Peakall and Smouse 2006). We also used
GenAlEx to investigate genetic differentiation amongst populations by performing an analysis of
molecular variance (AMOVA) with 999 permutations to determine the level of population
structure in our dataset. Finally, the program STRUCTURE (Pritchard et al., 2000) was used to
investigate population genetic structure across the species range. Specifically, STRUCTURE was
run from K = 1 to 17 population genetic clusters with a burn-in of 100,000 iterations followed by
1,000,000 MCMC iterations (with 20 replicates for each K value). Results were imported into
STRUCTURE Harvester (Earl and vonHoldt 2012) where the most likely value of K was
determined using the deltaK method (Evanno et al. 2005). STRUCTURE was additionally used
to test individual subsets of the data to investigate finer levels of genetic structure.
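The deltaK calculation can be sketched as follows, assuming the common formulation in which the absolute second-order difference of the mean log-likelihood across adjacent K values is divided by the standard deviation of the replicate log-likelihoods at K. The function name and the log-likelihood values are hypothetical, and this sketch is not a substitute for STRUCTURE Harvester.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Evanno-style deltaK from replicate log-likelihoods.

    lnp_by_k: dict mapping K -> list of Ln P(D) values from replicate runs.
    Returns a dict K -> deltaK for each K that has both K-1 and K+1.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])
            if sd > 0:  # deltaK is undefined if replicates are identical
                result[k] = second_diff / sd
    return result


# Hypothetical Ln P(D) values from three replicate runs per K
runs = {
    1: [-9000.0, -9002.0, -8998.0],
    2: [-8000.0, -8001.0, -7999.0],
    3: [-7950.0, -7960.0, -7940.0],
    4: [-7940.0, -7970.0, -7910.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

Because the statistic needs values at K - 1 and K + 1, it cannot be evaluated for the smallest or largest K tested, so the method cannot on its own support K = 1.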
The potential role of local selective pressures in shaping diversity at individual loci was
investigated using multiple approaches. First, we used Arlequin (version 3.5.1.2; Excoffier and
Lischer 2010) to run 20,000 simulations
in order to obtain a null distribution for FST, which was then used to develop a 99% confidence
interval for outlier identification. In general terms,
over-differentiated loci are regarded as candidates for local adaptation, while under-
differentiated loci are generally viewed as candidates for balancing selection (Polley et al. 2003;
Cagliani et al. 2008) or a range-wide sweep. BayeScan (version 2.1) was also used to test for selection by
comparing the posterior probabilities of two models (selection vs. no selection) for each locus
(Foll and Gaggiotti 2008). Following Foll and Gaggiotti (2008), loci whose
posterior probability for the model including selection was greater than 0.91 were regarded as
being ‘strong’ FST outlier candidates. We then mined the sunflower QTL literature to identify
any QTL whose confidence interval co-localized with a putative local adaptation SNP identified
in this study. Co-localization information was obtained using previously published studies from a
variety of sunflower crosses (Burke et al. 2002; Burke et al. 2005; Wills and Burke 2007; Baack
et al. 2008; Dechaine et al. 2009).
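The co-localization step amounts to an interval-overlap check, sketched below under the assumption that SNP and QTL positions have been placed on a common genetic map. The record layout, function name, and map positions are hypothetical and do not correspond to the published QTL intervals.

```python
from typing import NamedTuple


class QTL(NamedTuple):
    trait: str
    linkage_group: int
    start_cm: float  # lower bound of the confidence interval
    end_cm: float    # upper bound of the confidence interval


def colocalized_qtl(snp_lg, snp_cm, qtl_list):
    """Return the QTL whose confidence interval contains the SNP position."""
    return [
        q for q in qtl_list
        if q.linkage_group == snp_lg and q.start_cm <= snp_cm <= q.end_cm
    ]


# Hypothetical map positions, for illustration only
qtl_list = [
    QTL("days to flower", 6, 60.0, 85.0),
    QTL("plant height", 6, 10.0, 30.0),
    QTL("leaf number", 4, 5.0, 25.0),
]
hits = colocalized_qtl(6, 78.5, qtl_list)
```

Here the hypothetical SNP at 78.5 cM on LG 6 falls only within the flowering time interval. Comparisons across mapping populations additionally depend on shared markers to align the maps, which this sketch does not address.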
Results
Phenotypic diversity
We identified numerous traits that exhibited differentiation amongst the five sampled
regions, with latitude being a significant factor in the partitioning of phenotypic diversity for
traits such as flowering time, plant height, branching, and a number of seed oil traits
(Supplemental Table 2.3). Individuals from the southern regions (Texas and Oklahoma, Regions
1 and 2; Supplemental Table 2.3) tended to flower later, grow taller, and have a higher
proportion of saturated fatty acids within their seeds compared to individuals from the northern
regions found in Saskatchewan, North Dakota and Montana (Regions 4 and 5; Supplemental
Table 2.3). The fatty acid composition data also showed some interesting trends, with the
saturated type (i.e., palmitic and stearic acid) showing the same sort of regional differentiation as
noted above. In contrast, the unsaturated types (i.e., oleic acid and linoleic acid) did not show
significant differences between regions. Seed oil content showed no significant differences
among regions across the entire range (Supplemental Table 2.3). Aside from the aforementioned
differentiation in saturated fatty acid percentage in seed oils, regions were significantly
differentiated for seed length with respect to latitude. While seed weight and seed width both
exhibit some regional differences, the differences were not due to latitude as the most southern
region was not significantly different from the most northern region for these two traits
(Supplemental Table 2.3). Notably, the latitudinal trends found in saturated fatty acid content and
flowering time are consistent with the results of previous studies (Linder 2000; Blackman et al.
2011a). There was no significant latitudinal difference in total branching, although plants from
Texas and Oklahoma (Regions 1 and 2; Supplemental Table 2.3) had significantly more top
branching compared to the three northern regions. Other plant architecture traits, such as branch
length and the extent of secondary, tertiary, or higher-order branching, were significantly
different when regions were compared, but those differences did not show a latitudinal pattern
(Supplemental Table 2.3). Interestingly, no traits exhibited significant differentiation between all
five regions (Supplemental Table 2.3).
Population genetic structure
Calculation of population genetic statistics for each of the 15 populations revealed a
substantial, albeit variable, amount of genetic diversity across the range of wild sunflower (Table
2.2). An analysis of molecular variance revealed that approximately 20% of the observed genetic
variation could be attributed to population level differentiation. Of the remaining genetic
variation, 76% was found at the within-individual level, whereas only 4% was found at the
among-individual level. A STRUCTURE (Pritchard et al. 2000) analysis of the data coupled
with the deltaK method for determining the most likely number of population genetic clusters
(Evanno et al. 2005) identified K = 2 clusters (Figure 2.2; Supplemental figure 2.1). The
STRUCTURE bar plot for K = 2 revealed a north-south divide, with Region 3 corresponding
to a transition zone (Figure 2.2). An additional STRUCTURE run containing only the
southernmost six populations also indicated K = 2. For this level of K, TX1 was separated
from the remaining five populations found in Texas and Oklahoma (Supplemental figure 2.2).
When the northernmost six populations were analyzed by STRUCTURE, K = 2 was again the
most well-supported number of genetic groups. Similar to the result for the southern portion of
the range, only a single population (ND1) in the northern portion of the range was separated
from the other five populations at K = 2 (Supplemental figure 2.4).
Outlier identification
Multiple outlier identification programs highlighted the existence of an overlapping set of
loci that exhibit the signature of local adaptation (Table 2.3). Arlequin identified eight loci that
were highly differentiated in a global FST calculation (all possible pairwise FST combinations;
99% confidence intervals). These loci included: one SNP on LG 4 with homology to a
hydroxyproline-rich glycoprotein family protein; two SNPs located near the distal end of LG 6,
one in FT2 (Blackman et al. 2010; Blackman et al. 2011a) and the other in a gene with homology
to a mitogen-activated protein kinase kinase kinase 14; one SNP on LG 7 in a gene with high
similarity to a gene in the ARM repeat family of proteins in A. thaliana; one SNP on LG 10 in the
GRAS/DELLA transcription factor GAI; two SNPs on LG 12, one corresponding to an EF-hand-
like domain-containing gene, and the other corresponding to a protein of unknown function; and
one SNP located on LG 14 in a gene with high similarity to Defective Cuticle Ridges (DCR) in A.
thaliana. BayeScan provided complementary outlier results by identifying three highly
differentiated loci (SNPs within the DCR homolog, the GRAS/DELLA transcription factor, and
the gene containing the EF-hand-like domain) already highlighted by Arlequin. Four loci had
evidence of being significantly under-differentiated from both Arlequin and BayeScan. There
were two under-differentiated loci on LG 13, including one SNP in a gene with an alpha-beta
plait nucleotide binding role and another SNP in a gene with homology to 5’-AMP-activated
protein kinase. SNPs in a glycoside hydrolase and a guanylate binding gene also had
exceptionally low FST, and were found on LGs 8 and 17, respectively.
Outlier co-localization with known QTL
The locations of our eight over-differentiated loci were compared to the locations of
previously mapped sunflower QTL to identify traits potentially involved in local adaptation. On
LG 4, an unannotated gene co-localized with a QTL for leaf number (Dechaine et al. 2009). As
noted above, the distal end of LG 6 contains two FST outliers: FT2 and a gene with a putative
kinase function. Both of these co-localize with QTL related to flowering time in two sunflower
mapping populations (Table 2.3; ANN1238 × CMS 89 [Burke et al. 2002] and ANN1238 ×
Hopi [Wills and Burke 2007]). This genomic region is actually known to contain multiple FT
paralogs, including FT1, which has been shown to be important with respect to cultivated
sunflower’s photoperiod response (Blackman et al. 2010; Blackman et al. 2011a). In addition to
co-localization with the flowering time QTL in this region, there are QTL for morphological
traits (e.g., achene width, plant height, number of ray flowers) and even a QTL for leaf fungal
damage. The outlier on LG 7, a SNP from an EST with homology to an ARM repeat protein, co-
localizes with QTL for flowering time, plant height, leaf number, and head herbivory, as well
(Burke et al. 2002; Dechaine et al. 2009). Interestingly, two loci with strong support from both
Arlequin and BayeScan (the GRAS/DELLA transcription factor and the DCR homolog, which
map to LGs 10 and 14, respectively), did not co-localize with any known QTL. One of the two
outliers on LG 12, an unannotated gene, co-localized with leaf shape and number of heads (Burke
et al., 2002). Finally, the EF-hand-domain containing gene co-localized with a QTL for head
total (one way of describing the degree of branching), as well as leaf and branch traits, found on
LG 12 (Table 2.3).
Under-differentiated loci co-localized with QTL for a variety of different traits. Of
particular interest were two low FST outliers located near each other on LG 13 that co-localized
with a shared set of QTL that included: number of branches, number of heads, head and leaf
herbivory, stem diameter, achene length, leaf area, and stem height (Burke et al. 2002; Wills and
Burke 2007; Baack et al. 2008; Dechaine et al. 2009).
Discussion
Populations across the range of wild sunflower harbor an exceptional amount of
phenotypic diversity. The extent to which those traits contribute to local adaptation is an
important question that can be addressed in a number of ways including reciprocal transplants,
common garden measurements, and population genome scans. In our analyses, many traits (e.g.,
flowering time, plant height, plant architecture, and seed oil composition) were differentiated in
conjunction with latitude. As sunflower is a seed oil crop, there has been a considerable amount of
research done to describe and uncover the genetic mechanism behind seed oil variation. In
breeding lines, strong artificial selection has created divergent germplasm groups with vastly
different oil profiles. In the wild, natural selection may be a strong force in determining the
relative amounts of saturated and unsaturated fatty acids found in populations
living in particular environments.
Common garden phenotypic variation
Previous studies of seed oil composition in a variety of species have revealed a negative
correlation between latitude and saturated fatty acid content (i.e., the degree of saturation) at a
relatively coarse geographic scale (Linder 2000). By quantifying the percentage of saturated fatty
acids across the range of sunflower, we were able to identify a similar trend (Supplemental Table
2.3), albeit at a finer geographic scale. Given that these plants were grown in a common garden,
we can infer that the observed differences have a genetic basis, and that functional
polymorphisms in the oil biosynthetic pathway exist across the range of wild sunflower. The
percentage of saturated fatty acids in seed oils is of considerable evolutionary importance with
respect to germination under different environmental conditions. Saturated fatty acids are known
to store more usable energy per carbon as compared to unsaturated fatty acids (Linder 2000), but
saturated fatty acids also have higher melting points than unsaturated fatty acids. The resulting
inference is that the production of unsaturated fatty acids in higher latitudes is advantageous
because it ensures energy availability at lower temperatures (Linder 2000). Conversely, saturated
fatty acids are favored in lower latitudes because they are more energy rich while still being
available to germinating seeds due to the comparatively warmer temperatures.
Observed differences in flowering time can be interpreted in a similar framework.
Growing seasons tend to be shorter in higher latitudes; thus, there is a premium on flowering
early to allow seed set before the end of the growing season. Alternatively, in lower latitudes,
there is typically a longer growing season that may select for later flowering plants that may
grow to a larger size and produce more seeds. It must, however, be noted that plant height and
flowering time are developmentally correlated; as such they form a suite of inter-related traits
(Koester et al. 1993; Bezant et al. 1996).
Population genetic structure
The STRUCTURE analysis of the full dataset revealed an overall north/south division in
the natural range of wild sunflower, with a transitional zone occurring in the vicinity of Nebraska
and Wyoming. Previous sampling of H. annuus genetic diversity had hinted at a similar
north/south division (Mandel et al. 2011), and our analysis builds on this finding by increasing
the marker density and sampling density within each population. Historically, this latitudinal
transect has seen similar patterns of genetic differentiation. For example, using transplant
gardens, McMillan (1959) showed that multiple grassland species exhibited heritable
differences in flowering time in which northern populations flowered significantly earlier.
Candidate adaptive loci
In terms of population genetic differentiation, we identified interesting possible
candidates for conferring local adaptation with respect to flowering time. We found two outlier
loci on chromosome 6 with SNPs that co-localize with a gene with putative kinase activity and
FT2. Both loci co-localize with previously identified QTL for flowering time (Burke et al. 2002;
Wills and Burke 2007) in addition to other traits (Table 2.3). FT2 is a gene whose Arabidopsis
homolog has been shown to play a major role in promoting flowering (Turck et al., 2008).
Moreover, the region of sunflower LG 6 where this gene resides has been previously shown to
influence flowering time in domesticated vs. wild sunflower (Burke et al., 2002; Wills and Burke
2007; Blackman et al., 2010). It should be noted that these mapping populations were
derived from wild × crop and wild × landrace crosses. The extent of linkage disequilibrium (LD) in
this region is currently unknown, although previous work indicates that, on average, LD decays
quickly in wild sunflower (Liu and Burke 2006). Studies of cultivated germplasm suggest that
there is variation in LD across the sunflower genome (Mandel et al. 2013). In addition to
mapping information, FT2 is an exceptional candidate for local adaptation due to previous gene
expression work across the range of wild H. annuus (Blackman et al. 2011a). In short days, a
cline in gene expression was seen for FT2 in which northern individuals exhibited higher
expression than southern individuals, consistent with this gene playing a role in adaptive
differentiation. Our results complement the observation that FT2 exhibits a latitudinal cline in gene
expression consistent with the effects of selection by providing population genetic
evidence of selection on this gene as well.
We uncovered SNPs with significantly elevated population differentiation values on other
chromosomes. A strongly differentiated SNP on LG 14 resides in the sunflower homolog of
Defective in Cuticle Ridges (DCR). In A. thaliana, mutants of DCR have altered trichome
development during leaf growth (Marks et al. 2009; Panikashvili et al. 2009). Trichomes serve a
multitude of functions in plants including: reflectance of sunlight to prevent damage (Manetas
2003), retention of water (Brewer et al. 1991), and defense (Levin 1973). As many of the
aforementioned factors may correlate with growing season, it is difficult to draw any conclusions
without additional data. We cannot conclude from these data alone that the patterns identified in
this research are causal. Furthermore, since we lack knowledge concerning the strength of
linkage disequilibrium in this genomic region, these SNPs may simply be linked to causal
polymorphisms found in nearby genes.
These FST outliers form a list of possible candidate genes for future experiments.
Importantly, the extent of linkage disequilibrium needs to be assessed in these genomic regions
in order to determine the size of the region of elevated population structure. A possible
explanation for the absence of co-localizing QTL for some SNPs is that no wild × wild mapping
populations currently exist for sunflower. Alternatively, many subtle (e.g., trichome density or
morphology) and biochemical phenotypes have not been measured and thus could not have
co-localized with regions of elevated population differentiation. Marker density has become the
main limitation in genome scan studies of local adaptation in natural populations (Flint-Garcia
et al. 2003). The advent of high-throughput methods such as restriction site associated DNA
sequencing (RAD-seq) and genotyping by sequencing (GBS) has allowed researchers to obtain both
large numbers of markers and an even genomic distribution (Hohenlohe et al. 2010; Davey et al. 2011;
Elshire et al. 2011).
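The extent of linkage disequilibrium referred to above is commonly summarized for a pair of loci as r², the squared correlation between alleles at the two sites. The following is a minimal Python sketch of that calculation, assuming phased haplotypes at two biallelic loci coded 0/1. The function name and the example haplotypes are hypothetical and are not data from this study.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic loci.

    haplotypes: list of (a, b) tuples, one per phased haplotype,
    with each allele coded 0 or 1 (hypothetical input format).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # coefficient of disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                          # undefined if either locus is monomorphic
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical haplotypes, for illustration only.
    linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
    unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
    print(ld_r2(linked), ld_r2(unlinked))
```

Plotting r² against the physical or genetic distance between marker pairs would give a rough view of how quickly LD decays around a candidate region, which bears on how wide the interval of elevated differentiation is expected to be.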
Conclusions
In this study, 246 loci were used to characterize the range-wide genetic diversity and structure of the
wild progenitor of an economically important crop species. Furthermore, these markers clearly
indicated a genetic disjunction between northern and southern populations that occurs around
40° north latitude, with Nebraska populations appearing to be admixed (Figure 2.2). This study
also generated multiple candidate genomic regions for local adaptation as defined by the extent
of their population genetic differentiation. The extent to which these genomic intervals are
associated with previous trait mapping experiments is also considered. These loci represent
larger physical genomic intervals that will be the focus of future molecular evolutionary
analyses, gene expression comparisons across the range, and field studies to further examine
their putative role in local adaptation.
Acknowledgements
We thank Scott Jackson’s laboratory in the Institute of Plant Breeding Genetics and
Genomics at the University of Georgia for greenhouse space and access to lab equipment. We
thank members of the Burke lab for comments on an earlier version of this manuscript. Special
thanks to Caitlin Ishibashi and Jeff Roeder for assisting with the DNA extractions and to
Shannon Ritter, Michael Cherry, and Shreyas Vangala for assistance in phenotyping. This
research was supported by grants from the NSF Plant Genome Research Program (DBI-0820451
and DBI-1444522).
References
Arnone, J.A. and Korner, C. (1997) Temperature adaptation and acclimation potential of leaf
dark respiration in two species of Ranunculus from warm and cold habitats. Arctic Alpine
Res, 29, 122-125.
Baack, E.J., Sapir, Y., Chapman, M.A., Burke, J.M. and Rieseberg, L.H. (2008) Selection on
domestication traits and quantitative trait loci in crop-wild sunflower hybrids. Molecular
Ecology, 17, 666-677.
Bachlava, E., Taylor, C.A., Tang, S., Bowers, J.E., Mandel, J.R., Burke, J.M. and Knapp, S.J.
(2012) SNP discovery and development of a high-density genotyping array for sunflower.
PLoS One, 7, e29814.
Balasubramanian, S., Sureshkumar, S., Agrawal, M., Michael, T.P., Wessinger, C., Maloof, J.N.,
Clark, R., Warthmann, N., Chory, J. and Weigel, D. (2006) The PHYTOCHROME C
photoreceptor gene mediates natural variation in flowering and growth responses of
Arabidopsis thaliana. Nat Genet, 38, 711-715.
Beaumont, M.A. and Nichols, R.A. (1996) Evaluating loci for use in the genetic analysis of
population structure. P Roy Soc B-Biol Sci, 263, 1619-1626.
Bezant, J., Laurie, D., Pratchett, N., Chojecki, J. and Kearsey, M. (1996) Marker regression
mapping of QTL controlling flowering time and plant height in a spring barley (Hordeum
vulgare L.) cross. Heredity, 77, 64-73.
Blackman, B.K., Michaels, S.D. and Rieseberg, L.H. (2011a) Connecting the sun to flowering in
sunflower adaptation. Molecular Ecology, 20, 3503-3512.
Blackman, B.K., Scascitelli, M., Kane, N.C., Luton, H.H., Rasmussen, D.A., Bye, R.A., Lentz,
D.L. and Rieseberg, L.H. (2011b) Sunflower domestication alleles support single
domestication center in eastern North America. P Natl Acad Sci USA, 108, 14360-14365.
Blackman, B.K., Strasburg, J.L., Raduski, A.R., Michaels, S.D. and Rieseberg, L.H. (2010) The
Role of Recently Derived FT Paralogs in Sunflower Domestication. Current Biology, 20,
629-635.
Bollmer, J.L., Ruder, E.A., Johnson, J.A., Eimes, J.A. and Dunn, P.O. (2011) Drift and selection
influence geographic variation at immune loci of prairie-chickens. Molecular Ecology,
20, 4695-4706.
Box, G.E.P. and Cox, D.R. (1964) An Analysis of Transformations. J Roy Stat Soc B, 26, 211-
252.
Brewer, C.A., Smith, W.K. and Vogelmann, T.C. (1991) Functional interaction between leaf
trichomes, leaf wettability and the optical properties of water droplets. Plant, Cell &
Environment, 14, 955-962.
Burke, J.M., Tang, S., Knapp, S.J. and Rieseberg, L.H. (2002) Genetic analysis of sunflower
domestication. Genetics, 161, 1257-1267.
Burke, J.M., Knapp, S.J. and Rieseberg, L.H. (2005) Genetic consequences of selection during
the evolution of cultivated sunflower. Genetics, 171, 1933-1940.
Cagliani, R., Fumagalli, M., Riva, S., Pozzoli, U., Comi, G.P., Menozzi, G., Bresolin, N. and
Sironi, M. (2008) The signature of long-standing balancing selection at the human
defensin beta-1 promoter. Genome Biol, 9.
Coop, G., Witonsky, D., Di Rienzo, A. and Pritchard, J.K. (2010) Using Environmental
Correlations to Identify Loci Underlying Local Adaptation. Genetics, 185, 1411-1423.
Crites, G.D. (1993) Domesticated Sunflower in 5th Millennium Bp Temporal Context - New
Evidence from Middle Tennessee. Am Antiquity, 58, 146-148.
Davey, J.W., Hohenlohe, P.A., Etter, P.D., Boone, J.Q., Catchen, J.M. and Blaxter, M.L. (2011)
Genome-wide genetic marker discovery and genotyping using next-generation
sequencing. Nat Rev Genet, 12, 499-510.
Dechaine, J.M., Burger, J.C., Chapman, M.A., Seiler, G.J., Brunick, R., Knapp, S.J. and Burke,
J.M. (2009) Fitness effects and genetic architecture of plant-herbivore interactions in
sunflower crop-wild hybrids. New Phytologist, 184, 828-841.
Earl, D.A. and vonHoldt, B.M. (2012) STRUCTURE HARVESTER: a website and program for
visualizing STRUCTURE output and implementing the Evanno method. Conservation
Genetics Resources, 4, 359-361.
Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S. and Mitchell,
S.E. (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high
diversity species. PLoS One, 6, e19379.
Excoffier, L. and Lischer, H.E. (2010) Arlequin suite ver 3.5: a new series of programs to
perform population genetics analyses under Linux and Windows. Mol Ecol Resour, 10,
564-567.
Evanno, G., Regnaut, S. and Goudet, J. (2005) Detecting the number of clusters of individuals
using the software STRUCTURE: a simulation study. Molecular Ecology, 14, 2611-
2620.
Flint-Garcia, S.A., Thornsberry, J.M. and Buckler, E.S. (2003) Structure of linkage
disequilibrium in plants. Annu Rev Plant Biol, 54, 357-374.
Foll, M. and Gaggiotti, O. (2008) A Genome-Scan Method to Identify Selected Loci Appropriate
for Both Dominant and Codominant Markers: A Bayesian Perspective. Genetics, 180,
977-993.
Harter, A.V., Gardner, K.A., Falush, D., Lentz, D.L., Bye, R.A. and Rieseberg, L.H. (2004)
Origin of extant domesticated sunflowers in eastern North America. Nature, 430, 201-
205.
Hohenlohe, P.A., Bassham, S., Etter, P.D., Stiffler, N., Johnson, E.A. and Cresko, W.A. (2010)
Population genomics of parallel adaptation in threespine stickleback using sequenced
RAD tags. Plos Genet, 6, e1000862.
Ingvarsson, P.K., Garcia, M.V., Hall, D., Luquez, V. and Jansson, S. (2006) Clinal variation in
phyB2, a candidate gene for day-length-induced growth cessation and bud set, across a
latitudinal gradient in European aspen (Populus tremula). Genetics, 172, 1845-1853.
Johnson, N.C., Wilson, G.W.T., Bowker, M.A., Wilson, J.A. and Miller, R.M. (2010) Resource
limitation is a driver of local adaptation in mycorrhizal symbioses. P Natl Acad Sci USA,
107, 2093-2098.
Kawecki, T.J. and Ebert, D. (2004) Conceptual issues in local adaptation. Ecology Letters, 7, 1225-
1241.
Knight, C.A., Vogel, H., Kroymann, J., Shumate, A., Witsenboer, H. and Mitchell-Olds, T.
(2006) Expression profiling and local adaptation of Boechera holboellii populations for
water use efficiency across a naturally occurring water stress gradient. Mol Ecol, 15,
1229-1237.
Koester, R.P., Sisco, P.H. and Stuber, C.W. (1993) Identification of quantitative trait loci
controlling days to flowering and plant height in two near isogenic lines of maize. Crop
Sci, 33, 1209-1216.
Kooyers, N.J. and Olsen, K.M. (2012) Rapid evolution of an adaptive cyanogenesis cline in
introduced North American white clover (Trifolium repens L.). Molecular Ecology, 21,
2455-2468.
Levin, D.A. (1973) The Role of Trichomes in Plant Defense. The Quarterly Review of Biology,
48, 3-15.
Lewontin, R.C. and Krakauer, J. (1973) Distribution of Gene Frequency as a Test of Theory of
Selective Neutrality of Polymorphisms. Genetics, 74, 175-195.
Linder, C.R. (2000) Adaptive evolution of seed oils in plants: Accounting for the biogeographic
distribution of saturated and unsaturated fatty acids in seed oils. American Naturalist,
156, 442-458.
Liu, A. and Burke, J.M. (2006) Patterns of nucleotide diversity in wild and cultivated sunflower.
Genetics, 173, 321-330.
Mandel, J.R., Dechaine, J.M., Marek, L.F. and Burke, J.M. (2011) Genetic diversity and population
structure in cultivated sunflower and a comparison to its wild progenitor, Helianthus
annuus L. Theor Appl Genet, 123, 693-704.
Mandel, J.R., Nambeesan, S., Bowers, J.E., Marek, L.F., Ebert, D., et al. (2013) Association mapping
and the genomic consequences of selection in sunflower. PLoS Genet, 9, e1003378.
Manetas, Y. (2003) The importance of being hairy: the adverse effects of hair removal on stem
photosynthesis of Verbascum speciosum are due to solar UV-B radiation. New Phytol,
158, 503-508.
Marks, M.D., Wenger, J.P., Gilding, E., Jilk, R. and Dixon, R.A. (2009) Transcriptome analysis
of Arabidopsis wild-type and gl3-sst sim trichomes identifies four additional genes
required for trichome development. Mol Plant, 2, 803-822.
McMillan, C. (1959) The Role of Ecotypic Variation in the Distribution of the Central Grassland
of North America. Ecol Monogr, 29, 285-308.
Mercer, K.L., Wyse, D.L. and Shaw, R.G. (2006) Effects of competition on the fitness of wild
and crop-wild hybrid sunflower from a diversity of wild populations and crop lines.
Evolution, 60, 2044-2055.
Mullen, L.M. and Hoekstra, H.E. (2008) Natural selection along an environmental gradient: A
classic cline in mouse pigmentation. Evolution, 62, 1555-1569.
Nei, M. (1978) Estimation of average heterozygosity and genetic distance from a small number
of individuals. Genetics, 89, 583-590.
Nielsen, E.E., Hemmer-Hansen, J., Poulsen, N.A., Loeschcke, V., Moen, T., Johansen, T.,
Mittelholzer, C., Taranger, G.L., Ogden, R. and Carvalho, G.R. (2009) Genomic
signatures of local directional selection in a high gene flow marine organism; the Atlantic
cod (Gadus morhua). BMC Evolutionary Biology, 9.
Paaby, A.B., Blacket, M.J., Hoffmann, A.A. and Schmidt, P.S. (2010) Identification of a
candidate adaptive polymorphism for Drosophila life history by parallel independent
clines on two continents. Molecular Ecology, 19, 760-774.
Panikashvili, D., Shi, J.X., Schreiber, L. and Aharoni, A. (2009) The Arabidopsis DCR encoding
a soluble BAHD acyltransferase is required for cutin polyester formation and seed
hydration properties. Plant Physiol, 151, 1773-1789.
Peakall, R. and Smouse, P.E. (2006) GENALEX 6: genetic analysis in Excel. Population genetic
software for teaching and research. Mol Ecol Notes, 6, 288-295.
Polley, S.D., Chokejindachai, W. and Conway, D.J. (2003) Allele frequency-based analyses
robustly map sequence sites under balancing selection in a malaria vaccine candidate
antigen. Genetics, 165, 555-561.
Pritchard, J.K., Stephens, M. and Donnelly, P. (2000) Inference of population structure using
multilocus genotype data. Genetics, 155, 945-959.
Prunier, J., Laroche, J., Beaulieu, J. and Bousquet, J. (2011) Scanning the genome for gene SNPs
related to climate adaptation and estimating selection at the molecular level in boreal
black spruce. Molecular Ecology, 20, 1702-1716.
Richter-Boix, A., Quintela, M., Segelbacher, G. and Laurila, A. (2011) Genetic analysis of
differentiation among breeding ponds reveals a candidate gene for local adaptation in
Rana arvalis. Molecular Ecology, 20, 1582-1600.
Riihimaki, M. and Savolainen, O. (2004) Environmental and genetic effects on flowering
differences between northern and southern populations of Arabidopsis lyrata
(Brassicaceae). Am J Bot, 91, 1036-1045.
Sambatti, J.B. and Rice, K.J. (2006) Local adaptation, patterns of selection, and gene flow in the
Californian serpentine sunflower (Helianthus exilis). Evolution, 60, 696-710.
Schneiter, A.A., American Society of Agronomy., Crop Science Society of America. and Soil
Science Society of America. (1997) Sunflower technology and production. American
Society of Agronomy : Crop Science Society of America : Soil Science Society of
America, Madison, Wis.
Sork, V.L., Stowe, K.A. and Hochwender, C. (1993) Evidence for Local Adaptation in Closely
Adjacent Subpopulations of Northern Red Oak (Quercus rubra L) Expressed as
Resistance to Leaf Herbivores. American Naturalist, 142, 928-936.
Stinchcombe, J.R., Weinig, C., Ungerer, M., Olsen, K.M., Mays, C., Halldorsdottir, S.S.,
Purugganan, M.D. and Schmitt, J. (2004) A latitudinal cline in flowering time in
Arabidopsis thaliana modulated by the flowering time gene FRIGIDA. P Natl Acad Sci
USA, 101, 4712-4717.
Turck, F., Fornara, F. and Coupland, G. (2008) Regulation and identity of florigen:
FLOWERING LOCUS T moves center stage. Annu Rev Plant Biol, 59, 573-594.
Turner, T.L., Bourne, E.C., Von Wettberg, E.J., Hu, T.T. and Nuzhdin, S.V. (2010) Population
resequencing reveals local adaptation of Arabidopsis lyrata to serpentine soils. Nat
Genet, 42, 260-263.
Turner, T.L., von Wettberg, E.J. and Nuzhdin, S.V. (2008) Genomic analysis of differentiation
between soil types reveals candidate genes for local adaptation in Arabidopsis lyrata.
PLoS One, 3, e3183.
Wills, D.M. and Burke, J.M. (2007) Quantitative trait locus analysis of the early domestication of
sunflower. Genetics, 176, 2589-2599.
TABLES
Table 2.1 – Range-wide population sampling information.
Population  State / Province  Country  Sample Size  Latitude  Longitude
TX1         Texas             USA      20           31.041    -104.821
TX2         Texas             USA      20           31.189    -103.578
TX3         Texas             USA      20           31.206    -102.635
TX4         Texas             USA      20           35.190    -102.010
TX5         Texas             USA      20           35.199    -100.799
OK1         Oklahoma          USA      12           35.262    -99.669
NE1         Nebraska          USA      20           41.063    -98.091
NE2         Nebraska          USA      20           41.211    -101.649
WY1         Wyoming           USA      20           41.418    -104.098
MT1         Montana           USA      20           46.585    -108.592
MT2         Montana           USA      20           46.795    -105.302
ND1         North Dakota      USA      20           46.879    -102.789
SAS1        Saskatchewan      Canada   19           50.048    -104.707
SAS2        Saskatchewan      Canada   15           50.394    -108.480
SAS3        Saskatchewan      Canada   20           50.660    -105.665
Table 2.2 – Population genetic statistics for 15 wild sunflower populations
Population Na(a) Ne(b) Ho(c) uHe(d) FIS P(e)
TX1 Mean 1.85 1.41 0.26 0.25 -0.03 0.85
SE 0.02 0.02 0.01 0.01 0.02
TX2 Mean 1.80 1.46 0.26 0.28 0.04 0.80
SE 0.03 0.02 0.01 0.01 0.02
TX3 Mean 1.84 1.49 0.28 0.30 0.05 0.84
SE 0.02 0.02 0.01 0.01 0.02
TX4 Mean 1.86 1.49 0.28 0.30 0.03 0.86
SE 0.02 0.02 0.01 0.01 0.02
TX5 Mean 1.91 1.49 0.26 0.30 0.10 0.91
SE 0.02 0.02 0.01 0.01 0.02
OK1 Mean 1.80 1.46 0.26 0.28 0.05 0.80
SE 0.03 0.02 0.01 0.01 0.02
NE1 Mean 1.91 1.51 0.30 0.31 0.00 0.91
SE 0.02 0.02 0.01 0.01 0.02
NE2 Mean 1.69 1.42 0.24 0.25 0.03 0.69
SE 0.03 0.02 0.01 0.01 0.02
WY1 Mean 1.57 1.36 0.22 0.21 -0.05 0.57
SE 0.03 0.03 0.02 0.01 0.02
MT1 Mean 1.89 1.46 0.30 0.28 -0.08 0.89
SE 0.02 0.02 0.01 0.01 0.02
MT2 Mean 1.86 1.48 0.28 0.29 -0.01 0.86
SE 0.02 0.02 0.01 0.01 0.02
ND1 Mean 1.59 1.36 0.24 0.22 -0.12 0.59
SE 0.03 0.02 0.02 0.01 0.02
SAS1 Mean 1.89 1.50 0.27 0.30 0.08 0.89
SE 0.02 0.02 0.01 0.01 0.02
SAS2 Mean 1.82 1.47 0.28 0.29 0.02 0.82
SE 0.02 0.02 0.01 0.01 0.02
SAS3 Mean 1.85 1.48 0.27 0.29 0.04 0.85
SE 0.02 0.02 0.01 0.01 0.02
a Number of alleles per locus;
b Effective number of alleles per locus;
c Observed heterozygosity;
d Unbiased expected heterozygosity;
e Percent polymorphic loci
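The per-locus quantities summarized in Table 2.2 can be illustrated with a minimal Python sketch. It assumes the standard definitions: Ne = 1/Σp², He = 1 − Σp², Nei's (1978) small-sample correction uHe = [2N/(2N − 1)]·He, and a fixation index computed as F = 1 − Ho/He. Whether the values in Table 2.2 were computed exactly this way (for example, whether He or uHe entered the fixation index) is an assumption. The function name and the example genotypes are hypothetical.

```python
def locus_stats(genotypes):
    """Summary statistics for one locus in one population.

    genotypes: list of (allele1, allele2) tuples for diploid individuals
    (hypothetical input format; at least one individual is required).
    """
    n = len(genotypes)
    counts = {}
    for a1, a2 in genotypes:
        counts[a1] = counts.get(a1, 0) + 1
        counts[a2] = counts.get(a2, 0) + 1
    freqs = [c / (2 * n) for c in counts.values()]    # allele frequencies
    sum_p2 = sum(p * p for p in freqs)

    na = len(counts)                                  # number of alleles observed
    ne = 1.0 / sum_p2                                 # effective number of alleles
    ho = sum(1 for a1, a2 in genotypes if a1 != a2) / n   # observed heterozygosity
    he = 1.0 - sum_p2                                 # expected heterozygosity
    uhe = (2 * n / (2 * n - 1)) * he                  # unbiased expected heterozygosity
    f = 1.0 - ho / he if he > 0 else float("nan")     # fixation index (assumed form)
    return {"Na": na, "Ne": ne, "Ho": ho, "uHe": uhe, "F": f}


if __name__ == "__main__":
    # Hypothetical genotypes at a single SNP, for illustration only.
    example = [("A", "A"), ("A", "G"), ("A", "G"), ("G", "G")]
    print(locus_stats(example))
```

Population-level means and standard errors such as those in Table 2.2 would then be obtained by averaging these per-locus values across all loci.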
Table 2.3 – Summary of candidates for genes involved in local adaptation. All of these loci had exceptionally high levels of FST as
determined by Arlequin and/or BayeScan and were cross-referenced against QTL information to determine the extent of QTL
co-localization.
Gene name                                          FST   LG  cM Position  Arlequin(a)  BayeScan(a)  QTL(b)
Mitogen activated protein kinase kinase kinase 14  0.38  6   53.72        2/5          0/5          A, B, C, D, E, F, G, H, N, O, X, Y, Z, AA
DCR                                                0.38  14  67.27        2/5          5/5          None
No annotated hit heliagene or NCBI                 0.40  12  65.62        1/5          0/5          I, L, Y
FT2                                                0.47  6   65.9         2/5          0/5          E, F, H, M, N, O, T, U, V, W
Armadillo-type fold                                0.36  7   19.29        1/5          0/5          D, E, J, L, P
No annotated hit heliagene or NCBI                 0.37  4   73.86        1/5          0/5          J
GRAS / DELLA transcription factor                  0.41  10  66.87        5/5          5/5          None
EF-hand-like domain                                0.44  12  44.1         5/5          5/5          J, K, I, Q, R, S, Y
a Fraction represents the number of times a particular locus was detected as an FST outlier
b Letters represent co-localizing QTL. Key: A – Leaf shape (Baack et al. 2008), B – Number of ray flowers (Burke et al. 2002), C –
Disc diameter (Wills and Burke 2007), D – Height (Burke et al. 2002), E – Days to flower (Burke et al. 2002), F – Leaf fungal damage
(Dechaine et al. 2009), G – Achene width (Burke et al. 2002), H – Days to flower (Wills and Burke 2007), I – Leaf shape (Burke et al.
2002), J – Leaf number (Dechaine et al. 2009), K – Head total (Dechaine et al. 2009), L – Number of heads (Burke et al. 2002), M –
Seed total (Dechaine et al. 2009), N – Leaf herbivory (Dechaine et al. 2009), O – Head clipping weevil (Dechaine et al. 2009), P –
Head herbivory (Dechaine et al. 2009), Q – Branch number (Dechaine et al. 2009), R – Stem diameter (Dechaine et al. 2009), S – Leaf
shape (Dechaine et al. 2009), T – Days to flower (Baack et al. 2008), U – Height (Baack et al. 2008), V – Leaf number (Baack et al.
2008), W – Leaf moisture content (Baack et al. 2008), X – Height (Wills and Burke 2007), Y – Heads per branch (Burke et al. 2002),
Z – Stem diameter (Burke et al. 2002), AA – Achene weight (Burke et al. 2002)
Figure 2.1 – Map of the locations of the 15 populations used in this study in the central USA and
Canada. [Map not reproduced. Populations labeled on the map: TX1, TX2, TX3, TX4, TX5, OK1, NE1,
NE2, WY1, MT1, MT2, ND1, SAS1, SAS2, SAS3. Axis ticks: -140 to -60 and 20 to 60; scale bar: 0 to
1500 km.]
Figure 2.2 – STRUCTURE bar plot of full dataset. Populations correspond to those in Table 2.1.
[Bar plot not reproduced. Population labels on the plot: TX1, TX2, TX3, TX4, TX5, OK1, NE1, NE2,
WY1, MT1, MT2, ND1, SAS1, SAS2, SAS3.]
CHAPTER III
GENOMIC PATTERNS OF SNP DIVERSITY AND THE GENETIC BASIS OF LOCAL
ADAPTATION IN WILD SUNFLOWER²
² McAssey, E.V., Burke, J.M. To be submitted to Journal of Heredity.
Abstract
Examining genetic diversity across populations from multiple latitudes presents an ideal
situation to understand the genetic basis of adaptation to divergent climates. In North America,
wild sunflower populations have a broad distribution, and thus individual populations have
experienced drastically different environments in low latitudes compared to high latitudes. In
sunflower, previous work has shown that flowering time and saturated fatty acid content in seeds
are especially differentiated with respect to latitude. In order to understand the genetic changes
associated with adaptation to climate regime, I took a population genomic approach whereby I
used Genotyping-by-Sequencing (GBS) to genotype individuals from wild populations in Texas,
Nebraska, and Canada. Using loci with low levels of missing data, I performed a STRUCTURE
analysis that successfully identified the presence of our three original populations, indicating
clear population structure. Using over 10,000 single nucleotide polymorphisms (SNPs), I was
able to scan the genome for highly differentiated markers when comparing populations. A
number of SNPs in the flowering time pathway were found to be highly differentiated including
sunflower homologs of: ANAC52, ESD7, and EDM2. In addition to being highly structured in
our dataset, mutants of two of the above genes (ANAC52 and ESD7) have been shown in
Arabidopsis thaliana to affect the expression of FT, whose sunflower homologs have been the
focus of investigations of both domestication and local adaptation. An analysis of the genetic
location of all highly structured polymorphisms revealed numerous instances of co-localizing
with flowering time and fatty acid QTL including ANAC52. Future investigations will require a
clear determination of the extent of linkage disequilibrium in these candidate genomic regions in
order to assess the width of the selected genomic intervals.
Introduction
Investigating patterns of variation across the range of a species is a useful approach for
determining the genetic basis of local adaptation. Most species, especially those with large
geographic ranges, contain populations that exist in a wide variety of habitats. In many cases,
these populations exhibit trait differences that allow them to thrive in their local environments
(i.e., local adaptation [Kawecki and Ebert 2004]). Ecologically, these so-called local adaptations
can be assessed and confirmed via reciprocal transplant experiments in which individuals from
disparate populations are grown in home and away environments and resulting fitness levels are
compared (Hereford 2009). Additionally, local adaptation can be assessed at the genotypic level
using population genetic statistics (Savolainen et al., 2013).
The genomic study of local adaptation often begins with a baseline assessment of
population genetic structure. Population genetic structure, often assessed with metrics like FST
and its analogs, measures the proportion of the total genetic variation that can be attributed to the
level of populations (Wright 1951). Selection for locally adaptive traits results in elevated
population structure at a target locus, which indicates that populations are highly differentiated in
terms of allele frequencies at a single genomic region compared to the balance of the genome
(Beaumont and Nichols 1996). While relatively few markers are required to characterize broad-
scale patterns of gene flow and demography, many more markers are needed to identify the
genes, or at least genomic regions, responsible for local adaptation. The actual density of
markers required for detection of local adaptation depends on the extent of linkage
disequilibrium (LD, or the non-random association of alleles across loci; Flint-Garcia et al.,
2003). Species in which LD breaks down quickly require more genetic markers in order to
interrogate each individual genomic region.
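As a concrete illustration of the FST concept described above, the following minimal Python sketch computes Wright's FST for a single biallelic locus as (HT − HS)/HT, where HS is the mean expected heterozygosity within populations and HT is the expected heterozygosity of the pooled population. This is a textbook formulation offered for illustration only; it is not the estimator implemented in any particular software package, and the function name and allele frequencies are hypothetical.

```python
def fst_biallelic(freqs, sizes=None):
    """Wright's FST = (HT - HS) / HT for one biallelic locus.

    freqs: frequency of one allele in each population.
    sizes: optional sample sizes used as weights (equal weights if omitted).
    """
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = float(sum(sizes))
    weights = [s / total for s in sizes]

    # Mean expected heterozygosity within populations (HS).
    h_s = sum(w * 2.0 * p * (1.0 - p) for w, p in zip(weights, freqs))

    # Expected heterozygosity of the pooled population (HT).
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_t = 2.0 * p_bar * (1.0 - p_bar)

    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall: nothing to partition
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical allele frequencies in a northern and a southern population.
    print(fst_biallelic([0.5, 0.5]))   # identical frequencies
    print(fst_biallelic([0.9, 0.1]))   # strongly differentiated locus
    print(fst_biallelic([1.0, 0.0]))   # alternate alleles fixed
```

Computing this value for every SNP gives a genome-wide distribution, and loci in the upper tail of that distribution are the kind of candidates that FST outlier scans aim to detect.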
With the development of increasingly efficient methods for DNA sequencing, the
generation of truly genome-wide collections of genetic markers has become feasible (e.g.,
Hohenlohe et al., 2010; Elshire et al., 2011; Poland et al., 2012). Once sequence data has been
obtained from multiple populations within a species, it is then possible to use such data to
investigate genome-wide patterns of population structure and identify genes or genomic regions
that have likely played a role in local adaptation. Such analyses have played a major role in
understanding adaptation in a variety of model and non-model species. For example, in
Arabidopsis lyrata population re-sequencing (i.e., the pooling of individuals within a population
before sequencing) led to the identification of candidate genes mediating adaptation to serpentine
soils (Turner et al., 2010). In Drosophila melanogaster, population genomic approaches
identified candidate genes that exhibited patterns of differentiation consistent with local
adaptation across eastern North America (Fabian et al., 2012). In stickleback fishes, QTL
mapping data and population genomic scans for differentiation were used to study adaptation to
local environments (Hohenlohe et al., 2010).
Beyond the utility of such analyses for improving our understanding of adaptive
differentiation in the wild, these sorts of studies also have important everyday applications. For
example, identifying the genetic mechanisms underlying local adaptation in the wild progenitors
of crop species can inform plant breeding efforts in the context of global climate change. That is,
if the genes/alleles underlying the ability to tolerate particular environmental stresses can be
identified in wild species, it may be possible to introgress them into crop species, thereby
facilitating growth in a novel climate. For example, microarray analyses (Friesen et al., 2010) as
well as whole genome re-sequencing approaches (Friesen et al., 2014) have been used to identify
candidate genes related to salinity resistance in Medicago truncatula, which may increase
production on marginal lands. Additionally, studies of genetic variation in ecotypes of Panicum
hallii have provided information on the genetic architecture of differentiation with respect to soil
water status, which may prove helpful in efforts to improve switchgrass (Panicum virgatum)
(Lowry et al., 2015). In a similar way, we analyze the substantial intraspecific diversity of wild
sunflower to understand the genetics of adaptation in a crop relative.
The common sunflower, Helianthus annuus, is a widespread annual species found in a
variety of habitats throughout North America and is the progenitor of cultivated sunflower. Wild
sunflower is differentiated for a variety of traits such as flowering time (Blackman et al., 2011a),
the degree of fatty acid desaturation (Linder 2000), plant height, and plant architecture (McAssey
et al., in prep). Of particular interest in the present study are flowering time and fatty acid profile
composition due to the extent of previous research on these traits with respect to adaptation. In
particular, flowering time has been the focus of a number of studies in H. annuus (Blackman et
al., 2010; Blackman et al., 2011a; Blackman et al., 2011b; Blackman et al., 2013), ranging from
its evolutionary role in domestication and adaptation, to the functional role of specific genes.
When grown in a common garden, high latitude populations flower earlier than low latitude
populations (McAssey et al., in prep), although this trait exhibits considerable phenotypic
plasticity (Blackman et al., 2011a). Presumably, the quick flowering phenotype in high latitudes
is related to the relatively short growing season placing a premium on flowering while conditions
are ideal for growth and obtaining pollinators. We additionally focus on the trait of seed fatty
acid profile (i.e., the relative proportion of saturated and unsaturated fatty acids) because of
previous reciprocal transplant experiments (Linder 2000) that showed higher percentages of saturated
fatty acids to be selected for in lower latitudes. Linder (2000) reasoned that saturated fatty acids,
which have a higher melting point, were not found in high percentages in high latitude seed oils
because they would not be available at cool temperatures. In an analogous fashion, it was
reasoned that high latitude populations produced more unsaturated fatty acids because they are
liquid at the cool temperatures experienced while germinating.
In this study, we used genotyping-by-sequencing (GBS) to gather polymorphism data
from three populations of wild sunflower that span a latitudinal gradient from Saskatchewan to
Texas. These populations have been previously shown to be differentiated for a number of traits
including flowering time (McAssey et al., in prep; Blackman et al., 2011) and saturated fatty acid
percentage (McAssey et al., in prep; Linder 2000). We then used these data to investigate
genome-wide patterns of population differentiation, and to identify FST outliers that exhibit the
signature of local adaptation. We further compared the locations of these outliers to those of
previously mapped QTL and made functional inferences based on similarity to genes of
known effect from Arabidopsis.
Materials and Methods
Plant Material, Library Construction, and Sequencing
Seeds from three wild H. annuus populations across North America were obtained from
the USDA (Texas, USA [PI 664692]; Nebraska, USA [PI 586870]; Saskatchewan, Canada [PI
592316]; Figure 3.1). Individuals from these populations were previously genotyped at 246 SNP
loci and phenotyped for a variety of quantitative traits (McAssey et al., in prep). DNA samples
from these same individuals (extracted from fresh leaf tissue using the Qiagen DNeasy Plant
Mini kit [Valencia, CA, USA]; McAssey et al., in prep) were used in the construction of GBS
libraries. Specifically, libraries were made for 55 individuals (19 from Texas; 17 from Nebraska;
19 from Canada). Library prep followed a modified version of an existing GBS protocol (Elshire
et al., 2011; Poland et al., 2012). Briefly, 2 µg of DNA was digested at 37 °C overnight by PstI
and MspI (NEB; Ipswich, MA, USA). Barcoded adaptors were then ligated onto PstI overhangs
and common adaptors onto MspI overhangs by adding 2.5 µl ligase (1000 U), 8 µl 10X buffer, 10 µl
barcoded adaptor (1.7 ng/µl), 4.5 µl common adaptor (694.5 ng/µl), and 15 µl ddH2O to 40 µl of
digested DNA before incubation for 4 hours at 22 °C followed by 20 minutes at 65 °C. The resulting DNA
fragments with ligated barcodes were then size-filtered using Ampure beads (Beckman Coulter,
Brea, CA, USA). A 0.4x concentration (v/v) of beads was added to each individual ligation
product in order to remove small fragments prior to PCR. This was done to reduce PCR bias due
to more efficient amplification of shorter fragments. A 50 µl polymerase chain reaction was
prepared by adding 5 µl of Ampure cleaned library to 25 µl of Phusion Taq master mix (NEB), 1
µl (12.5 µM) each of forward and reverse primers and 18 µl H2O. The reaction was first heated
to 95 °C for 30 seconds, before undergoing 16 cycles of: 95 °C for 30 seconds; 65 °C for 20
seconds; 68 °C for 25 seconds. The PCR finished with 5 minutes at 72 °C. Ampure beads were
used again to remove leftover PCR primers prior to quantification via Nanodrop (Thermo Fisher
Scientific, San Diego, CA, USA). Equal amounts of DNA were established by performing
quantitative PCR (qPCR) on all 55 individual libraries, and then adding equimolar amounts of
each library to form a master pool containing DNA from all 55 individuals. DNA from the master
pool was then loaded onto a 1% agarose gel and run for 2 hours at 96 V prior to gel extraction.
DNA in the 400-650 bp size range was gel extracted using a DNA gel extraction kit (Zymo;
Irvine, CA, USA) and the resulting samples were quantified via qPCR using Illumina standards
(Kapa Biosystems, Inc.; Wilmington, MA, USA). Sequencing (150 bp, paired-end reads) was
then performed on an Illumina NextSeq 500 at the Georgia Genomics Facility (Athens, GA).
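As a quick arithmetic check on the ligation and PCR set-ups described above, the following Python sketch totals the reaction volumes and derives the final buffer strength and primer concentration. The variable names are illustrative only; the volumes and stock concentrations are those stated in the text.

```python
# Volumes in microlitres, as stated in the library preparation protocol.
ligation_ul = {
    "ligase": 2.5,
    "10X buffer": 8.0,
    "barcoded adaptor": 10.0,
    "common adaptor": 4.5,
    "ddH2O": 15.0,
    "digested DNA": 40.0,
}
pcr_ul = {
    "cleaned library": 5.0,
    "master mix": 25.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "H2O": 18.0,
}

ligation_total = sum(ligation_ul.values())  # total ligation volume
pcr_total = sum(pcr_ul.values())            # total PCR volume

# Final buffer strength: the 10X stock diluted into the whole ligation.
buffer_final_x = 10.0 * ligation_ul["10X buffer"] / ligation_total

# Final concentration of each primer: 12.5 uM stock, 1 ul per reaction.
primer_final_um = 12.5 * pcr_ul["forward primer"] / pcr_total

print(ligation_total, buffer_final_x, pcr_total, primer_final_um)
```

Under these numbers the PCR components sum to the stated 50 µl reaction, and the 8 µl of 10X buffer is diluted to 1X in the ligation.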
Data Processing
Raw reads were filtered and analyzed using Stacks (Catchen et al., 2013). Reads were
first trimmed to 115 basepairs to avoid low quality regions and then filtered for the presence of
adaptor sequence and additional regions of low quality base calls. Reads were then assembled
within each individual using ‘ustacks’ requiring four reads to form a Stack, and allowing for two
polymorphic sites within a 115 base pair Stack. A catalog integrating data from all individuals
was then created via the ‘cstacks’ module, which allowed for an additional two variant sites
when comparing different individuals. Each individual was then matched to the catalog using
‘sstacks.’ Subsequently, the error correcting module ‘rxstacks’ was used before re-doing the
‘cstacks’ and ‘sstacks’ aspects of the pipeline, as recommended by the authors. Sequence stacks,
or loci, were then mapped to the genome using Bowtie2 (Langmead and Salzberg 2012), and
only uniquely mapping loci (q>10) were retained for downstream analyses. CAP3 (Huang and
Madan 1999) was then used to further curate the data by attempting to further assemble loci.
This was done to help limit the number of loci derived from read 2 that overlapped with read 1,
as they are an extreme case of non-independence (e.g., in the extreme they could represent the
same SNP being identified as an outlier twice). To collapse loci, we required them to have an
overlap length of 20 bp and the minimum sequence identity of the overlap to be greater than
95%. In order to retain a dataset with a reduced number of overlaps among loci, if loci assembled
together via CAP3, only a single locus was retained for future analyses. This conservative
approach helps limit the usage of double-counted polymorphisms in downstream analyses. The
‘populations’ module within Stacks was used for further filtering the dataset prior to analyses.
We required at least eight individuals in each of the three populations in order to calculate
genetic structure for a particular locus.
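The two filters described above (unique mapping and a minimum number of genotyped individuals per population) can be sketched as follows. This is an illustrative Python sketch rather than the Stacks or Bowtie2 implementation; the function name, the example locus records, and the reading of q as a mapping-quality score are assumptions made for the example.

```python
from typing import Dict, List

MIN_MAPQ = 10          # loci were retained only if q exceeded this value
MIN_INDIVIDUALS = 8    # genotyped individuals required in every population
POPULATIONS = ("Texas", "Nebraska", "Canada")


def keep_locus(mapq: int, genotyped: Dict[str, int]) -> bool:
    """Return True if a locus passes the mapping and missing-data filters."""
    if mapq <= MIN_MAPQ:
        return False
    return all(genotyped.get(pop, 0) >= MIN_INDIVIDUALS for pop in POPULATIONS)


# Hypothetical loci: (name, mapping quality, genotyped individuals per population).
loci = [
    ("locus_1", 42, {"Texas": 15, "Nebraska": 12, "Canada": 17}),
    ("locus_2", 42, {"Texas": 15, "Nebraska": 7, "Canada": 17}),   # too few in Nebraska
    ("locus_3", 3, {"Texas": 19, "Nebraska": 17, "Canada": 19}),   # ambiguous mapping
]

retained: List[str] = [name for name, mapq, counts in loci if keep_locus(mapq, counts)]
print(retained)
```

Requiring the threshold in every population, rather than across the pooled sample, keeps a locus from entering pairwise comparisons when one population is nearly unsampled.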
Population Genetic Analyses
Using Stacks, the number of monomorphic and polymorphic loci, and number of
polymorphic sites were obtained for all individuals. Observed and expected heterozygosities
were then calculated at each polymorphic site using ‘populations’ within Stacks. After filtering
for a minor allele frequency (MAF) > 10%, FST, FST’, Dest, ΦST and FIS were calculated for all
polymorphic sites using the ‘populations’ module within Stacks. Furthermore, for each locus, the
number of haplotypes and haplotype diversity were also calculated in the ‘populations’ module
within Stacks. Data were exported into Genepop format using Stacks then converted into
Arlequin (Version 3.5.1.2; Excoffier and Lischer 2010) format in preparation for outlier testing.
Over-differentiated loci were identified using 20,000 simulations and with 100 demes per group.
Additionally, more stringent filtering was performed in which each population was required to
have 80% data present prior to performing a STRUCTURE analysis (Pritchard et al., 2000).
STRUCTURE (version 2.3.4) was run from K = 1 to 5 genetic clusters with an initial burn-in of
100,000 MCMC iterations followed by data collection over 1,000,000 MCMC iterations. These
analyses were replicated 20 times at each K value. STRUCTURE Harvester (Earl and von Holdt
2012) was used to determine the most likely number of genetic clusters in the dataset using the
delta-K method (Evanno et al., 2005).
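For readers unfamiliar with the delta-K method, the Python sketch below computes the statistic as the absolute second-order difference of the mean log-likelihood across K, divided by the standard deviation of the replicate log-likelihoods at K. The log-likelihood values are invented for illustration and are not results from this study, and STRUCTURE Harvester may differ in implementation details.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta-K for every K that has replicate runs at K-1, K, and K+1."""
    avg = {k: mean(runs) for k, runs in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp:
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            # stdev() is the sample standard deviation across replicate runs;
            # identical replicates (stdev of zero) would need special handling.
            result[k] = second_diff / stdev(lnp[k])
    return result


# Hypothetical replicate log-likelihoods for K = 1 to 5 (three runs each).
toy_lnp = {
    1: [-5000, -5001, -4999],
    2: [-4600, -4610, -4590],
    3: [-4300, -4302, -4298],
    4: [-4290, -4310, -4270],
    5: [-4285, -4325, -4245],
}

scores = delta_k(toy_lnp)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because the statistic needs runs at both K-1 and K+1, a survey of K = 1 to 5 yields delta-K values only for K = 2 to 4.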
Candidate Gene Analyses
Following identification, FST outliers were characterized by first determining whether
they co-localized with any previously identified QTL as well as previously identified FST outliers
from a lower density study. Loci were mapped to genetically ordered scaffolds (Bowers et al.,
2012). The genetic location of scaffolds containing FST outlier loci was then queried against the
genetic position of known QTL for flowering time (Burke et al., 2002; Wills and Burke 2007)
and fatty acid profile (Burke et al., 2005). Additionally, the FST outliers were mapped to FASTA
files of genes and 5’ UTRs identified from the sunflower genome in order to understand whether
these loci are found in or around genes. For outliers that were within genes, we determined
whether or not the A. thaliana ortholog was implicated in either of our two focal traits: flowering
time and fatty acid biosynthesis. We did this by first identifying the gene that each outlier locus
mapped to on the sunflower genome. The top BLASTp hit for that gene when searched against
the A. thaliana proteome was then compared against the genes found in TAIR associated with
the biological processes ‘Photoperiodism, flowering’ and ‘Vegetative to reproductive phase
transition in meristem,’ as well as the genes in the following KEGG pathways: ‘Biosynthesis of
unsaturated fatty acids,’ ‘Fatty acid elongation,’ ‘Fatty acid degradation,’ and ‘Fatty acid
biosynthesis’.
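The candidate gene lookup described above amounts to a set-membership test on the top A. thaliana hit for each sunflower gene. A minimal Python sketch is given below; all gene identifiers are hypothetical placeholders, and the gene sets would in practice be populated from the TAIR biological process terms and KEGG pathways named in the text.

```python
from typing import Dict, List, Set


def flag_candidates(
    top_hits: Dict[str, str], gene_sets: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Map each sunflower gene to the focal gene sets containing its top hit."""
    flagged = {}
    for sunflower_gene, arabidopsis_gene in top_hits.items():
        matches = sorted(
            name for name, members in gene_sets.items() if arabidopsis_gene in members
        )
        if matches:
            flagged[sunflower_gene] = matches
    return flagged


# Hypothetical top BLASTp hits (sunflower gene -> A. thaliana gene).
top_hits = {
    "sunflower_gene_1": "ath_gene_A",
    "sunflower_gene_2": "ath_gene_B",
    "sunflower_gene_3": "ath_gene_C",
}

# Hypothetical membership of the focal gene sets.
gene_sets = {
    "Photoperiodism, flowering": {"ath_gene_A", "ath_gene_D"},
    "Fatty acid biosynthesis": {"ath_gene_E"},
}

candidates = flag_candidates(top_hits, gene_sets)
print(candidates)
```

Using only the top hit is a conservative simplification: a sunflower gene whose second-best hit belongs to a focal pathway would not be flagged.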
Results and Discussion
This genome-wide collection of GBS loci is capable of serving multiple purposes such as
quantifying levels of population genetic diversity, population structure, and identification of
putative candidate genes. When the distribution of genetic diversity is structured in a clear
latitudinal pattern, it suggests a role for adaptation. By scanning the levels and structuring of
genetic diversity across the genome we were able to identify strong candidate genes for local
adaptation to three different regions of North America. By pairing population genetic
information with trait mapping data, we further verified the candidate status of the GBS loci for
playing a role in the establishment of two locally adaptive traits: flowering time and fatty acid
profile.
Genomic Organization of Markers
The program Stacks (Catchen et al., 2013) was used to process the 297,602,702 reads. An
average of 4,161,497 (Min = 979,193; Max = 10,895,995) reads were attributed to each of the 55
GBS libraries (Supplemental figure 3.1). When comparing the genomic location of loci with
sufficient data for outlier detection (see below), a locus was located an average 655,194 ± 14,585
bp from the nearest locus. The number of SNPs found on each of the 17 chromosomes in
sunflower was significantly correlated with the estimated length of each chromosome
(Supplemental figure 3.2). While it is encouraging that GBS markers are found more often on
larger chromosomes, a pattern consistent with expectations, it is still possible that within
individual chromosomes there could be substantial variation in marker density.
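The spacing statistic reported above can be illustrated with a short Python sketch that, for each locus on a chromosome, finds the distance to its closest neighbor. The positions are invented for illustration, and the assumption that the reported average is a nearest-neighbor distance computed per chromosome is ours.

```python
from statistics import mean
from typing import List


def nearest_neighbor_distances(positions: List[int]) -> List[int]:
    """Distance (bp) from each locus to its closest neighbor on one chromosome.

    Results are returned in order of increasing genomic position.
    """
    pos = sorted(positions)
    if len(pos) < 2:
        return []
    distances = []
    for i, p in enumerate(pos):
        candidates = []
        if i > 0:
            candidates.append(p - pos[i - 1])      # gap to the left neighbor
        if i < len(pos) - 1:
            candidates.append(pos[i + 1] - p)      # gap to the right neighbor
        distances.append(min(candidates))
    return distances


# Hypothetical locus positions (bp) on a single chromosome.
toy_positions = [1000, 100, 1100, 400]
toy_distances = nearest_neighbor_distances(toy_positions)
print(toy_distances, mean(toy_distances))
```

An average of this kind can hide clustering: tightly grouped loci and long marker-free stretches may coexist on the same chromosome, which is the variation in density noted above.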
Population Genetic Statistics
After filtering the data to retain only the 5,759 loci (11,315 SNPs) with calls from at least
eight individuals per population, populations were tested for differences in the number of
haplotypes, haplotypic diversity, and expected heterozygosity. Specifically, the Texas population
had significantly fewer haplotypes and a lower haplotypic diversity than the Nebraska population,
but had more haplotypes and higher haplotypic diversity compared to the Canada population
(Table 3.1). Many forces including gene flow from a large number of unsampled populations,
and gene flow from Helianthus species that have range overlaps in Nebraska could have driven
this pattern. Clearly, sampling more populations with GBS markers will be required to assess the
forces affecting diversity across these latitudes. Heterozygosity values were similar among
populations (Table 3.1). All populations had significantly positive values of FIS. These positive
FIS values are most likely the result of allele dropout, a process in which only one of two alleles
is sequenced, possibly due to a SNP in the restriction cut site (Gautier et al., 2013). Additionally,
it is possible that alternative alleles for a particular locus may have diverged enough that they
ended up in different assembled loci.
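The link between allele dropout and positive FIS can be made explicit with a small Python sketch. It assumes a biallelic site in Hardy-Weinberg proportions, that a fixed fraction of true heterozygotes is miscalled as homozygous, and that either allele is equally likely to be lost so that the estimated allele frequency is unchanged. These are simplifying assumptions for illustration, not a model fitted to the data.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def observed_heterozygosity(p: float, dropout: float) -> float:
    """Heterozygosity scored when a fraction `dropout` of true heterozygotes
    is miscalled as homozygous."""
    return (1.0 - dropout) * expected_heterozygosity(p)


def f_is(h_obs: float, h_exp: float) -> float:
    """Inbreeding coefficient FIS = 1 - Ho / He."""
    return 1.0 - h_obs / h_exp


# Allele frequency of 0.3 is an arbitrary illustrative value.
for dropout in (0.0, 0.2, 0.4):
    h_exp = expected_heterozygosity(0.3)
    h_obs = observed_heterozygosity(0.3, dropout)
    print(dropout, round(f_is(h_obs, h_exp), 3))
```

Under these assumptions FIS equals the dropout fraction, so a heterozygote deficit can arise with no inbreeding at all.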
Population Structure
Our STRUCTURE analysis of our restricted set (80% data present) of 1,029 SNPs
indicated the presence of three genetic clusters that correspond to the original USDA sampling
locations (Figure 3.2; Supplemental figure 3.3). After calculating various measures of genetic
differentiation, we found that the Texas vs. Canada comparison always produced the highest
level of average population structure across all loci (Table 3.2). When comparing the Nebraska
population individually to both Texas and Canada, for all metrics the Nebraska vs. Canada
comparison was always lowest. This means that Nebraska individuals share a higher proportion
of their genetic ancestry with the northern population than with the southern population. This is further
confirmed by visually inspecting the STRUCTURE bar chart (Fig. 3.2). Clearly, three
individuals have a visible level of genetic clustering with Canada. Additionally, when analyzing
the K = 2 level of structuring in this dataset, the Nebraska and Canadian populations were
merged into a single uniform cluster in 19 out of 20 runs, with Texas being assigned to its own
cluster.
FST Outliers
After identifying FST outliers using five separate runs within Arlequin, outliers were
filtered to retain only those that were found as an outlier in all five runs. This left us with 243 out
of 11,315 SNPs with consistently elevated levels of FST. These loci were located in both genic
(95) and intergenic (148) regions. FST outliers were found on all 17 chromosomes and co-
localized with multiple QTL for both flowering time and fatty acid biosynthesis (Table 3.3). Of
the 22 fatty acid and flowering time QTL, 14 had at least one FST outlier fall within the one LOD
interval (Table 3.3). However, when determining the number of outliers one would expect to
fall within QTL regions purely by chance, it was found that neither fatty acid nor
flowering time QTL showed any pattern of enrichment for FST outliers. In particular, two elevated
intergenic FST SNPs fell within a QTL for flowering time on chromosome 6. This is noteworthy
because previous work in sunflower has highlighted this region of the genome as playing a
strong role in adaptation. Originally, this genomic region drew interest in the context of the
genetics of sunflower domestication. Through genetic mapping (Burke et al., 2002; Burke et al.,
2005) and molecular evolutionary analyses (Blackman et al., 2010), this end of linkage group 6
was shown to harbor QTL for flowering time as well as a suite of duplicated FT genes which are
known to influence flowering time. Subsequently, this region of the genome has been shown to
play an important role in the wild. A low-density population genomic scan identified FT2 as
having elevated population structure compared to the balance of the genome when comparing a
range-wide sampling of wild H. annuus populations (McAssey et al., in prep). Additionally, this
gene has been shown to be differentially expressed across a latitudinal transect within the native
range of H. annuus, which further suggests a role in adaptation (Blackman et al., 2011a). The
extent of linkage disequilibrium in wild sunflower is unknown for this genomic region, so it is
currently impossible to know whether it represents a large island of elevated differentiation, or
several independent targets of selection. The GBS based outliers did not closely co-localize with
robust SNP outliers from a previous genome scan (McAssey et al., in prep). On LG 10, a SNP
chip outlier located within a GRAS transcription factor was about 1 Mb away from a GBS
outlier in an unannotated gene. On LG 12, a SNP chip outlier in an EF-hand-like domain protein
was around 5 Mb away from the nearest GBS outlier. Finally, on LG 14, a GBS outlier near a
protein kinase was about 1 Mb away from a SNP chip outlier within Defective in Cuticle Ridges.
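The text does not specify how the number of outliers expected within QTL intervals by chance was determined. One common approach, shown here as an assumption rather than as the method used, treats the SNPs inside QTL intervals as a random draw from all SNPs and applies a hypergeometric test. The totals of 11,315 SNPs and 243 outliers come from the text; the number of SNPs inside QTL intervals and the observed number of outliers among them are hypothetical placeholders.

```python
from math import comb


def hypergeom_upper_tail(k: int, total: int, successes: int, draws: int) -> float:
    """P(X >= k) when `draws` SNPs are sampled without replacement from
    `total` SNPs, of which `successes` are outliers."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(total - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(total, draws)


TOTAL_SNPS = 11315      # from the text
OUTLIER_SNPS = 243      # from the text
SNPS_IN_QTL = 2000      # hypothetical placeholder
OUTLIERS_IN_QTL = 45    # hypothetical placeholder

# Expected count if outliers were scattered at random with respect to QTL.
expected = OUTLIER_SNPS * SNPS_IN_QTL / TOTAL_SNPS
p_value = hypergeom_upper_tail(OUTLIERS_IN_QTL, TOTAL_SNPS, OUTLIER_SNPS, SNPS_IN_QTL)
print(round(expected, 2), p_value)
```

A test of this kind treats SNPs as independent, which linkage among neighboring loci would violate; a permutation of QTL positions along the genetic map is an alternative that respects marker clustering.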
Interestingly, three outlier loci were found in candidate pathways related to flowering
time, though none were found in the fatty acid biosynthetic pathway. A NAC domain-containing
protein is highly differentiated between Canada and the two more southern populations. In A.
thaliana, the ortholog of this gene has been shown to repress the activity of FT and hence repress
flowering (Ning et al., 2015). Interestingly, this polymorphism co-localizes with a flowering time
QTL on chromosome 9. As FT2 has already been shown to be important in latitudinally
distributed wild sunflower populations (discussed above), the identification of this NAC domain-
containing protein as an FST outlier suggests that it might also play a role in latitudinal
adaptation. Again, a more complete picture of linkage disequilibrium in these two genomic
intervals is necessary to establish these genes as high quality adaptive candidates. Furthermore,
additional sampling will be required to confirm that latitudinal differentiation at these loci exists
in more than three populations.
Early in Short Days 7 (ESD7) was also identified as having a significantly elevated level
of genetic differentiation, although the pattern of differentiation was not correlated with latitude.
The high value of FST was due to both the Texas and Canadian populations being fixed for the
same allele; however, in Nebraska the alternative allele was found at high frequency. In A.
thaliana, a mutation in ESD7 results in an early flowering phenotype (del Olmo et al.,
2010). Interestingly, the mutation in ESD7 also appears to alter gene expression patterns of FT in
both long and short days (del Olmo et al., 2010). An additional significantly differentiated SNP
was found in Enhanced Downy Mildew 2 (EDM2). This polymorphism exhibits latitudinal
differentiation in which Texas and Canada are fixed for different alleles whereas Nebraska is
intermediate in allele frequency. In addition to this gene having a role in plant defense, it has
been shown to be a positive regulator of flowering in A. thaliana (Tsuchiya and Eulgem 2010).
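The contrast between the two allele frequency patterns described above (Early in Short Days 7 versus EDM2) can be illustrated with a short Python sketch. It uses the basic definition FST = (HT - HS) / HT with populations weighted equally, which is simpler than the estimators applied in Stacks and Arlequin, and the Nebraska allele frequencies are hypothetical values chosen only to match the verbal description.

```python
from typing import Sequence


def fst_from_frequencies(freqs: Sequence[float]) -> float:
    """FST = (HT - HS) / HT for one biallelic SNP, populations weighted equally."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic across all populations
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


def is_clinal(freqs: Sequence[float]) -> bool:
    """True if allele frequency changes monotonically along the ordered populations."""
    pairs = list(zip(freqs, freqs[1:]))
    increasing = all(a <= b for a, b in pairs)
    decreasing = all(a >= b for a, b in pairs)
    return (increasing or decreasing) and freqs[0] != freqs[-1]


# Frequencies ordered south to north: Texas, Nebraska, Canada.
edm2_like = [1.0, 0.5, 0.0]   # ends fixed for different alleles; 0.5 is hypothetical
esd7_like = [0.0, 0.9, 0.0]   # ends fixed for the same allele; 0.9 is hypothetical

for name, freqs in (("EDM2-like", edm2_like), ("ESD7-like", esd7_like)):
    print(name, round(fst_from_frequencies(freqs), 3), is_clinal(freqs))
```

Both patterns produce high FST across the three populations, but only the first changes monotonically with latitude, which is why an outlier test alone cannot distinguish latitudinal differentiation from differentiation driven by a single population.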
While the study of the genetic basis of adaptation has typically focused on coding
sequences, recent work has demonstrated functional relevance of intergenic sequences (e.g.,
Studer et al., 2011). Over half of the FST outlier SNPs identified herein were found in the
intergenic space / promoter regions of the sunflower genome. Here again, knowledge of the
extent of linkage disequilibrium in these regions will be needed to determine whether these
intergenic sequences are themselves likely to be involved in adaptive differentiation, perhaps due
to regulatory effects, or if they simply mark regions of elevated differentiation caused by
variation in a nearby gene.
Taken together, the results of this study indicate wild populations of H. annuus contain a
substantial level of genetic variation. A portion of this genetic variation was subsequently shown
to be exceptionally differentiated, which indicates its putative role in adaptation. These highly
structured polymorphisms are potential targets for introgression into cultivated germplasm.
While sunflower can be grown as far south as Texas, the majority of production is in the northern
central United States (sunflowernsa.com). As climate change continues in the coming decades,
the types of alleles that increase fitness in Texas over Canada could enhance (or at least
maintain) suitable growth and yields in cultivated lines. While the focus of this investigation was
on two traits of special importance to the sunflower community (flowering time and fatty acid
profile), future functional work on these FST outliers will be required to determine the precise
phenotypes affected by these genomic regions. Future research on the identified candidate
genomic regions must also include re-sequencing to establish the extent of linkage
disequilibrium in these important areas of the genome. The FST outlier regions containing well-
described flowering time homologs are consistent with the adaptive value of this trait and thus
are promising candidates for future work.
Acknowledgments
We would like to thank Stephan Schroeder and Greg Baute for helpful discussions
concerning library preparation. The Georgia Genomics Facility and the Georgia Advanced
Computing Resource Center provided valuable sequencing and bioinformatics support,
respectively.
References
Beaumont MA, Nichols RA (1996) Evaluating loci for use in the genetic analysis of
population structure. Proc R Soc Lond B 263: 1619-1626.
Blackman BK (2013) Interacting duplications, fluctuating selection, and convergence: the
complex dynamics of flowering time evolution during sunflower domestication. J Exp
Bot 64: 421-431.
Blackman BK, Michaels SD, Rieseberg LH (2011a) Connecting the sun to flowering in sunflower
adaptation. Mol Ecol 20: 3503-3512.
Blackman BK, Rasmussen DA, Strasburg JL, Raduski AR, Burke JM, et al. (2011b) Contributions
of flowering time genes to sunflower domestication and improvement. Genetics 187:
271-287.
Blackman BK, Strasburg JL, Raduski AR, Michaels SD, Rieseberg LH (2010) The role of
recently derived FT paralogs in sunflower domestication. Curr Biol 20: 629-635.
Bowers JE, Bachlava E, Brunick RL, Rieseberg LH, Knapp SJ, et al. (2012) Development of a
10,000 locus genetic map of the sunflower genome based on multiple crosses. G3
(Bethesda) 2: 721-729.
Burke JM, Knapp SJ, Rieseberg LH (2005) Genetic consequences of selection during the
evolution of cultivated sunflower. Genetics 171: 1933-1940.
Burke JM, Tang S, Knapp SJ, Rieseberg LH (2002) Genetic analysis of sunflower domestication.
Genetics 161: 1257-1267.
Catchen J, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool
set for population genomics. Mol Ecol 22: 3124-3140.
del Olmo I, Lopez-Gonzalez L, Martin-Trillo MM, Martinez-Zapater JM, Pineiro M, et al.
(2010) EARLY IN SHORT DAYS 7 (ESD7) encodes the catalytic subunit of DNA
polymerase epsilon and is required for flowering repression through a mechanism
involving epigenetic gene silencing. Plant J 61: 623-636.
Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, et al. (2011) A robust, simple
genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:
e19379.
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the
software STRUCTURE: a simulation study. Mol Ecol 14: 2611-2620.
Excoffier L, Lischer HE (2010) Arlequin suite ver 3.5: a new series of programs to perform
population genetics analyses under Linux and Windows. Mol Ecol Resour 10: 564-567.
Fabian DK, Kapun M, Nolte V, Kofler R, Schmidt PS, et al. (2012) Genome-wide patterns of
latitudinal differentiation among populations of Drosophila melanogaster from North
America. Mol Ecol 21: 4748-4769.
Flint-Garcia SA, Thornsberry JM, Buckler ESt (2003) Structure of linkage disequilibrium in
plants. Annu Rev Plant Biol 54: 357-374.
Friesen ML, von Wettberg EJB, Badri M, Moriuchi KS, Barhoumi F, et al. (2014) The
ecological genomic basis of salinity adaptation in Tunisian Medicago truncatula. BMC
Genomics 15: 1160.
Friesen ML, Cordeiro MA, Penmetsa RV, Badri M, Huguet T, et al. (2010) Population genomic
analysis of Tunisian Medicago truncatula reveals candidates for local adaptation. Plant J
63: 623-635.
Gautier M, Gharbi K, Cezard T, Foucaud J, Kerdelhue C, et al. (2013) The effect of RAD allele
dropout on the estimation of genetic variation within and between populations. Mol Ecol
22: 3165-3178.
Hereford J (2009) A quantitative survey of local adaptation and fitness trade-offs. Am Nat 173:
579-588.
Hohenlohe PA, Bassham S, Etter PD, Stiffler N, Johnson EA, et al. (2010) Population genomics
of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet
6: e1000862.
Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res 9: 868-
877.
Kawecki TJ, Ebert D (2004) Conceptual issues in local adaptation. Ecology Letters 7: 1225-
1241.
Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:
357-359.
Linder CR (2000) Adaptive evolution of seed oils in plants: Accounting for the biogeographic
distribution of saturated and unsaturated fatty acids in seed oils. The American Naturalist
156: 442-458.
Lowry DB, Hernandez K, Taylor SH, Meyer E, Logan TL, et al. (2015) The genetics of
divergence and reproductive isolation between ecotypes of Panicum hallii. New Phytol
205: 402-414.
Ning YQ, Ma ZY, Huang HW, Mo H, Zhao TT, et al. (2015) Two novel NAC transcription
factors regulate gene expression and flowering time by associating with the histone
demethylase JMJ14. Nucleic Acids Res 43: 1469-1484.
Poland JA, Brown PJ, Sorrells ME, Jannink JL (2012) Development of high-density genetic
maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing
approach. PLoS One 7: e32253.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus
genotype data. Genetics 155: 945-959.
Savolainen O, Lascoux M, Merila J (2013) Ecological genomics of local adaptation. Nat Rev
Genet 14: 807-820.
Studer A, Zhao Q, Ross-Ibarra J, Doebley J (2011) Identification of a functional transposon
insertion in the maize domestication gene tb1. Nat Genet 43: 1160-1163.
Tsuchiya T, Eulgem T (2010) The Arabidopsis defense component EDM2 affects the floral
transition in an FLC-dependent manner. Plant J 62: 518-528.
Turner TL, Bourne EC, Von Wettberg EJ, Hu TT, Nuzhdin SV (2010) Population resequencing
reveals local adaptation of Arabidopsis lyrata to serpentine soils. Nat Genet 42: 260-263.
Wills DM, Burke JM (2007) Quantitative trait locus analysis of the early domestication of
sunflower. Genetics 176: 2589-2599.
Wright S (1951) The genetical structure of populations. Ann Eugen 15: 323-354.
Tables
Table 3.1 – Levels of population genetic diversity in three populations across a latitudinal gradient in North America.
Population  # of Stacks  # Monomorphic  # Polymorphic  H     SE    H Div  SE     Ho    SE     He    SE     FIS   SE
Texas       12773        2068           10705          3.09  0.01  0.69   0.005  0.14  0.001  0.27  0.001  0.41  0.003
Nebraska    10796        1202           9594           3.40  0.01  0.71   0.005  0.13  0.001  0.25  0.001  0.40  0.003
Canada      13522        2393           11129          3.00  0.01  0.66   0.004  0.14  0.001  0.27  0.001  0.42  0.003
H, number of haplotypes; H Div, haplotype diversity; Ho, observed heterozygosity; He, expected heterozygosity; SE, standard error.
Table 3.2 – Pairwise population structure among three wild sunflower populations.
Comparison           # of SNPs  AMOVA FST  SE     PhiST  SE     FST prime  SE     DEST  SE
Texas vs. Nebraska   10375      0.15       0.002  0.23   0.003  0.26       0.004  0.26  0.004
Texas vs. Canada     10822      0.19       0.002  0.28   0.003  0.31       0.004  0.31  0.004
Nebraska vs. Canada  10030      0.13       0.002  0.21   0.003  0.23       0.003  0.23  0.003
Table 3.3 – QTL co-localization
Trait            Source  Chromosome  cM    One-LOD interval (cM)  Number of FST outliers
Days to Flower   a       1           17.2  10.8-21.5              0
Days to Flower   a       4           61.3  53.9-64.4              2
Days to Flower   a       6           52.2  48.2-54.6              2
Days to Flower   a       7           2     0.0-21.7               3
Days to Flower   a       8           52.7  49.3-59.0              1
Days to Flower   a       8           79.3  77.0-81.3              0
Days to Flower   a       9           11.3  0.0-18.8               0
Days to Flower   a       9           53.5  49.5-54.3              2
Days to Flower   a       17          39.2  35.6-42.6              0
Days to Flower   a       17          58.9  50.9-64.9              0
Days to Flower   b       6           57.6  53.6-57.7              0
Days to Flower   b       7           1     0.0-5.3                1
Days to Flower   b       15          57.1  57.0-58.2              2
% Palmitic acid  c       6           26.3  18.4-41.7              3
% Palmitic acid  c       17          39.6  33.8-43.1              0
% Stearic acid   c       6           24.3  20.3-28.3              1
% Stearic acid   c       10          50.7  44.7-62.2              3
% Oleic acid     c       1           19.5  3.3-25.5               0
% Oleic acid     c       3           63.3  41.9-69.3              7
% Oleic acid     c       6           22.3  16.4-28.3              1
% Linoleic acid  c       3           47.9  39.9-61.2              6
% Linoleic acid  c       6           24.3  18.4-28.3              1
Sources: a, Burke et al., 2002; b, Wills and Burke 2007; c, Burke et al., 2005.
Figure 3.1 – Original USDA sampling location for the three populations genotyped in this study
Figure 3.2 – Bar plot indicating the proportion of membership to three genetic clusters as
identified in the program STRUCTURE. Population labels on the plot: Texas, Nebraska, Canada.
CHAPTER IV
TRANSCRIPTOMIC ANALYSIS OF DEVELOPING SEEDS ACROSS THE RANGE OF
WILD SUNFLOWER³
³ McAssey, E.V., and Burke, J.M. To be submitted to American Journal of Botany.
Abstract
Ecologists have identified numerous adaptive traits through classic reciprocal transplant
experiments. However, the genetic basis of these traits has only been investigated in a handful of
organisms to date. High throughput sequencing approaches like RNA-sequencing (RNA-seq)
have the ability to address questions of adaptation in non-model organisms. Wild populations of
Helianthus annuus L. have previously been shown to be adaptively differentiated for the
percentage of saturated fatty acids within seeds. Southern populations are known to contain a
higher proportion of saturated fatty acids in their seeds, which results in quicker growth under
warmer temperatures compared to northern populations. I used RNA-seq to make a wild
sunflower transcriptome assembly that was then used to test for differential expression when
comparing RNA derived from developing sunflower seeds from common garden grown northern
and southern populations. I found 1,788 significantly differentially expressed isoforms located
throughout the sunflower genome. A number of these isoforms (FAD2, FAD8, FAB1, FAB2, and
FATA) were derived from homologs of genes in the Arabidopsis thaliana fatty acid
KEGG pathways. Additionally, differentially expressed isoforms were found to exist in a number
of fatty acid quantitative trait loci intervals from previously published work, although only one
of the isoforms, FATA, was annotated as being a fatty acid homolog. The most differentially
expressed isoforms were found to be enriched for gene ontology terms such as lipid transport, in
addition to other less clear terms like defense and stress response. These differentially expressed
isoforms represent candidate genes for future functional work to directly establish a relationship
between gene expression and trait differentiation.
Introduction
Adaptation to divergent environments (i.e., local adaptation) requires heritable genetic
changes that confer higher fitness to an individual in one habitat over another habitat.
Historically the most sought after molecular evidence for adaptation focused on mutations that
altered the protein coding sequence of a gene. Non-synonymous nucleotide substitutions have
the potential to influence gene function, and novel alleles arising through such a process could be
favored in certain environments. In such cases, different alleles could come to predominate in
different populations. Alternatively, regulatory differences resulting in divergent expression
patterns could play an important role in adaptation in the absence of coding sequence
polymorphisms. The relative importance of coding sequence and regulatory mutations has been
extensively debated (King and Wilson 1975; Andolfatto 2005; Hoekstra and Coyne 2007). In
plant biology, when looking at the types of mutations and genes that were the targets of crop
domestication, there appears to be a big role for transcription factors (Doebley et al., 2006).
Recent advances in sequencing technology have made possible the efficient characterization of
gene expression on a genome-wide scale, which has enabled studies of transcriptional variation
in both model and non-model species (Wang et al., 2009).
Low throughput techniques requiring a priori sequence information were a major
limitation to the study of gene expression in non-model organisms. Now levels of gene
expression have been assayed in a variety of species including diploid and polyploid Glycine
species (Ilut et al., 2012), sea urchins (Pespeni et al., 2013), black-faced blenny (Schunter et al.,
2014), European silver fir (Behringer et al., 2015), and the salt tolerant species Suaeda fruticosa
(Diray-Arce et al., 2015). In the genus Helianthus, a group containing cultivated sunflower,
RNA-seq is already being used for a variety of applications (Renaut et al., 2012; Baute et al.,
2015; Renaut and Rieseberg 2015) including an analysis of differential expression between wild
sunflower ecotypes that varied in the amount of woody growth (Moyers and Rieseberg 2013).
In addition to secondary growth, populations of wild sunflower contain genetic variation
for seed traits. For plants, seeds represent a crucial transitional state where various genetic and
environmental cues are required to establish proper early growth. For example, dormancy
mechanisms have resulted in seed germination primarily occurring when environmental
conditions are ideal. This is thought to occur through a variety of mechanisms including the
sensing of temperature and light (Footitt et al., 2013). Another crucial stage of seed
maturation is early development, when seeds acquire essential provisions in the form of seed
storage proteins and oils (Shewry et al., 1995; Ruuska et al., 2002). These molecules provide raw
materials and energy for the early growth of a seedling and can be focal points for adaptive
differentiation between populations.
In angiosperms, for example, trends in seed oil composition have been detected across
latitudinal gradients in a wide variety of species, suggesting that this trait may play an important
role in latitudinal adaptation (Linder 2000). Plant seed oils consist of a variety of saturated and
unsaturated fatty acids. These molecules vary in melting point due to the presence of double
bonds in unsaturated fatty acids, which prevent the chains from packing tightly together and
thereby lower the melting point. Unsaturated fatty acids are also less energy rich because there is a
cost to forming their characteristic double bonds. In a meta-analysis of data from a wide variety
of plant species, Linder (2000) found that low latitude species/populations had a relatively higher
proportion of saturated fatty acids in their seeds. Linder (2000) reasoned that if the melting point
of fatty acids was adaptive, low latitude families, species, and populations would have a higher
relative amount of saturated fatty acids because the comparatively warmer germination
temperatures would allow individuals to use saturated fatty acids. Conversely, low
temperatures at high latitudes would cause saturated fatty acids to remain solid and unusable. For
example, at the intraspecific level, northern populations of Helianthus annuus
L. contained a significantly lower proportion of saturated fatty acids than southern populations, which was
in line with theoretical predictions (Linder 2000). A further characterization of saturated fatty
acids across a latitudinal gradient showed that this pattern was rather gradual across the continent
as opposed to a large clear disjunction in trait values (McAssey et al., in prep).
A reciprocal transplant experiment showed that southern genotypes of H. annuus grew
significantly faster than northern genotypes when placed in a warm germination environment
(Linder 2000). Correspondingly, northern genotypes germinated earlier than southern genotypes
when both were grown in a cool environment. The above pattern is consistent with local
adaptation; the respective home populations perform better in their native environment compared
to a foreign environment. While the ecological patterns and functional relevance of the trait have
been fairly well established, we know comparatively little about the underlying genetics that
mediate this adaptive phenotype. The fatty acid biosynthesis pathway has been worked out in the
model plant species Arabidopsis thaliana and involves genes responsible for elongation of the
fatty acid chain, desaturation of parts of the chain (if necessary), and then export for storage or
usage (Ohlrogge and Browse 1995).
The genes most likely to confer this difference in germination under
different temperature regimes are those that change the average melting point of the
resulting fatty acids. Therefore, we decided to investigate the genes in the fatty acid pathway
across the range of wild sunflower. As we already know that there is a phenotypic difference
between high and low latitude populations (Linder 2000; McAssey et al., in prep) we expect to
uncover genes that are differentially expressed in a direction consistent with the accumulation of
more saturated fatty acids in low latitude individuals. To establish whether or not this pattern
exists in wild sunflower, we performed RNA-sequencing of individuals from the southern and
northern ends of the native range. By performing tests of differential expression between regions,
we present a list of candidate genes that may play a role in the development of this trait.
Furthermore, we analyze these candidates in terms of their position in the fatty acid pathway,
their co-localization with known QTL for fatty acid profile, and collectively in terms of whether
or not gene ontology terms are over-represented in the differentially expressed isoforms relative
to the frequency of terms found in all of the seed expressed isoforms.
Materials and Methods
Plant Materials
Seeds from six populations representing the northern and southern ends of the wild
sunflower range in North America (three from Saskatchewan, Canada and three from Texas,
USA; Table 4.1) were rinsed with 3% hydrogen peroxide before being placed on moist paper
towels in a darkened cold room at 4 °C for two weeks to break dormancy. Seeds used in this study
were produced from crossing USDA individuals from a previous experiment (McAssey et al., in
prep). After two weeks, seeds were placed in a growth room at 23 °C and maintained under 16
hour days : 8 hour nights with artificial light. Once seedlings developed a substantial hypocotyl
and radicle, they were transplanted into flats of soil. Established seedlings were then moved into
a greenhouse, transplanted into soil pots, and populations of plants were arranged randomly in
the greenhouse. Flowering heads were bagged once buds began to develop in order to prevent
unintended cross pollination. Since wild sunflower is self-incompatible, I manually cross-
pollinated pairs of individuals originating from the same population. This involved removing the
pollination bags from the two focal plants, collecting pollen from both plants directly into a petri
dish, and then using a paint brush to apply pollen onto the same two plants. Heads were then re-
bagged and plants were allowed to set seed. Fifteen days after pollination, eight achenes (i.e.,
single-seed fruits) were collected from the center of each developing seed head, placed into 1.5
mL tubes, and frozen in liquid nitrogen. These tubes were then stored at -80 °C until RNA
extraction.
RNA Extraction
The eight frozen achenes from each maternal plant were ground in a mortar and pestle
using liquid nitrogen and a pinch of PVPP. The ground tissue was then transferred into a tube
and placed in liquid nitrogen. After removing the tube from liquid nitrogen storage, one mL of
Trizol was added to each tube. The contents of each tube were then mixed and allowed to
incubate at room temperature for five minutes. Chloroform (300 µl) was then added to each tube
and the tubes were manually shaken and then centrifuged at 12,000 × g for 10 minutes. The
aqueous phase of each sample was then removed via pipetting and transferred to a new tube.
After mixing with a 0.53X volume of 100% ethanol, the solution was purified following the Qiagen
RNeasy Plant Mini protocol (Valencia, CA, USA) with on-column DNase digestion.
Library Construction
RNA quality was assessed using a Bioanalyzer RNA chip (Agilent, Santa Clara, CA). All
samples (nine from Texas, eleven from Canada; Table 4.1) used for library construction had a
RIN value of at least 8.5 out of 10. Libraries were constructed using a Kapa mRNA-seq kit
(Kapa Biosystems, Wilmington, MA). This kit utilizes magnetic beads to perform size selection
and poly-A tail selection. Libraries were constructed to include a size range of approximately
200-500 bp, and size ranges were checked using a fragment analyzer (Advanced Analytical
Technologies, Ankeny, IA). Individual libraries were then quantified via qPCR using Illumina
standards. Equimolar amounts of library were then pooled into a single tube and submitted for
Illumina NextSeq 500 PE75 sequencing at the Georgia Genomics Facility (Athens, GA).
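As a rough illustration of the pooling step, the volume of each library needed for an equimolar pool can be computed from its qPCR-estimated concentration. The sketch below is not taken from the original protocol: the function name, library names, concentrations, and the 20 fmol target are all hypothetical, and it assumes concentrations are expressed in nM (1 nM = 1 fmol/µl).

```python
def equimolar_volumes(conc_nM, target_fmol):
    """Return the volume (in µl) of each library that contains target_fmol.

    conc_nM maps a library name to its concentration in nM.
    Because 1 nM equals 1 fmol per µl, volume = amount / concentration.
    """
    volumes = {}
    for name, conc in conc_nM.items():
        if conc <= 0:
            raise ValueError(f"library {name} has no measurable concentration")
        volumes[name] = target_fmol / conc
    return volumes


# Hypothetical qPCR concentrations for three libraries (nM).
libraries = {"TX_1": 4.0, "SK_1": 8.0, "SK_2": 2.0}
volumes = equimolar_volumes(libraries, target_fmol=20.0)
for name, vol in volumes.items():
    print(f"{name}: pool {vol:.1f} µl")
```

More concentrated libraries contribute proportionally smaller volumes, so every library contributes the same number of molecules to the pool.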
Transcriptome Assembly and Tests for Differential Expression
Adaptor sequences were trimmed from reads prior to quality filtering using Trimmomatic
(Bolger et al., 2014). With one exception, all reads from each library were used to produce a
single de novo assembly in Trinity (Grabherr et al., 2011). The lone exception was dramatically
overrepresented in the raw data (123 million fragments sequenced with paired-end reads), and
therefore 9.5 million paired-end reads were subsampled from this library to be used in the
assembly. We used the Trinity option --min_kmer_cov 2 to reduce the computing power
necessary to assemble this large dataset. We then performed a BLASTx comparing all assembled
transcripts to the A. thaliana proteome. Isoforms whose E-value was greater than 1 × 10⁻⁵ were
removed from the assembly. Reads were then mapped to the BLAST-filtered assembly using
Bowtie2 (Langmead and Salzberg 2012) and a matrix of read counts from each individual
mapping to each isoform was constructed using RSEM (Li and Dewey 2011). This matrix was
then used as input into edgeR (Robinson et al., 2010) to test for differential expression. One
library was under-sequenced and removed from analyses of differential gene expression. The
remaining 19 libraries were split into a Saskatchewan and Texas group and then tested for
differential expression using a Fisher’s exact test. Only isoforms that were expressed in at least eight
individuals at the level of at least three counts per million were retained for tests of differential
expression. Significantly differentially expressed isoforms were determined based on an FDR <
0.05. Furthermore, we used the physical position of sunflower genes to perform an auto-
correlation analysis. Specifically, we calculated the physical distance between each expressed
gene on an individual chromosome in order to form a matrix. When multiple Trinity isoforms
mapped to the same gene in the genome assembly we arbitrarily chose a single isoform for
creating the matrix. In a similar fashion, a distance matrix of log fold change was created to
compare expression differences between all combinations of genes on each individual
chromosome. For each chromosome, the physical distance and log fold change matrices were
compared using a Mantel test as implemented in the R package ade4 with 999 permutations.
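The expression filter described above can be sketched as follows. This is a minimal Python illustration rather than the edgeR code used in the study. It assumes the threshold is met when an isoform reaches at least three counts per million in at least eight libraries and that every library contains reads. The toy count matrix and function names are hypothetical.

```python
def counts_per_million(counts):
    """Convert raw counts to counts per million (CPM).

    counts is a list of rows (one per isoform), each row holding the
    read count for that isoform in every library.
    """
    library_sizes = [sum(column) for column in zip(*counts)]
    return [
        [count / size * 1e6 for count, size in zip(row, library_sizes)]
        for row in counts
    ]


def keep_expressed(counts, min_cpm=3.0, min_libraries=8):
    """Flag isoforms with CPM >= min_cpm in at least min_libraries libraries."""
    return [
        sum(value >= min_cpm for value in row) >= min_libraries
        for row in counts_per_million(counts)
    ]


# Hypothetical matrix: four isoforms (rows) by ten libraries (columns).
counts = [
    [1_000_000] * 10,     # abundant isoform, sets the library size
    [5] * 10,             # about 5 CPM in every library
    [5] * 7 + [0] * 3,    # about 5 CPM in only seven libraries
    [2] * 10,             # about 2 CPM in every library
]
keep = keep_expressed(counts)
print(keep)
```

Only the first two isoforms pass: the third meets the CPM threshold in too few libraries, and the fourth never reaches it.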
Gene Ontology Term Enrichment and Candidate Gene Identification
ErmineJ (Gillis et al., 2010) was used to test for gene ontology term enrichment in the
differential expression dataset. Specifically, ErmineJ first looks at a list of expressed isoforms
sorted by p-value, and then establishes whether or not particular GO terms are found
preferentially at the top of the ranked list (i.e., GO terms associated with DE isoforms). This was
done using receiver operating characteristic scoring looking at GO terms present between 10 and
200 times in the dataset. GO terms were considered to be significantly enriched if their FDR
value was less than 0.1. The best blast hit IDs for differentially expressed genes were compared
to the A. thaliana IDs that make up various fatty acid pathways (pathways used: biosynthesis of
unsaturated fatty acids, fatty acid biosynthesis, fatty acid elongation, fatty acid degradation)
found in the KEGG database (Kanehisa et al., 2015). Here, we assessed whether or not the
patterns of differential expression were consistent with known phenotypes. In other words, for
fatty acid desaturases, we assessed whether or not they were more highly expressed in Canadian
plants relative to Texas plants as would be predicted given theory and known phenotypic data.
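The idea behind receiver operating characteristic scoring of a ranked list can be sketched briefly. The function below is not ErmineJ's implementation: it simply computes the area under the ROC curve for one GO term, given isoforms ordered from most to least significant, and it assumes no tied ranks. The identifiers and term membership are hypothetical.

```python
def roc_auc(ranked_ids, term_members):
    """Area under the ROC curve for one gene set in a ranked list.

    ranked_ids is ordered from most to least significant. The AUC is the
    fraction of (member, non-member) pairs in which the member ranks
    higher, so 1.0 means all members sit at the top and 0.5 is random.
    """
    members = set(term_members)
    n_in = sum(1 for gene in ranked_ids if gene in members)
    n_out = len(ranked_ids) - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("need at least one member and one non-member")

    favourable_pairs = 0
    non_members_seen = 0
    for gene in ranked_ids:
        if gene in members:
            # every non-member not yet seen ranks below this member
            favourable_pairs += n_out - non_members_seen
        else:
            non_members_seen += 1
    return favourable_pairs / (n_in * n_out)


# Hypothetical ranking of six isoforms by p-value (most significant first).
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
auc = roc_auc(ranked, {"g1", "g2", "g4"})
print(round(auc, 3))
```

A term whose members cluster near the top of the list yields an AUC close to 1, which is the pattern the enrichment test is designed to detect.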
Analysis of Fatty Acid QTL Regions
All transcripts were mapped using BLASTn to mRNA sequences derived from the
sunflower genome. These mRNA sequences have been previously connected to specific genes in
A. thaliana by using BLASTp. After mapping these sequences, we determined the number and
identity of differentially expressed transcripts occurring within QTL for fatty acid profile from
previously published research (Burke et al., 2005). Specifically, we asked whether any of the
differentially expressed transcripts co-localizing with QTL were annotated as being related to
fatty acid biosynthesis/processing. This was done by first identifying the flanking scaffolds
surrounding the QTL interval from the consensus genetic map of the sunflower genome (Bowers
et al., 2012). The locations of these flanking scaffolds were then identified in the current genome
assembly [HA412.v1.1.bronze.20141015]. The QTL interval was then scanned to identify the
presence of differentially expressed isoforms.
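The final scanning step amounts to an interval lookup, which can be sketched as below. This is an illustration under simplifying assumptions, not the procedure used in the study: each isoform is reduced to a single chromosome and position, each QTL to a start and end coordinate, and all names and coordinates are hypothetical.

```python
from collections import namedtuple

# A QTL interval bounded by the positions of its flanking markers.
Interval = namedtuple("Interval", ["chrom", "start", "end", "trait"])


def isoforms_in_qtl(isoform_positions, qtl_intervals):
    """Return (isoform, trait) pairs for isoforms lying inside a QTL interval.

    isoform_positions maps an isoform ID to a (chromosome, position) tuple.
    An isoform inside overlapping QTL is reported once per QTL.
    """
    hits = []
    for isoform, (chrom, position) in isoform_positions.items():
        for qtl in qtl_intervals:
            if chrom == qtl.chrom and qtl.start <= position <= qtl.end:
                hits.append((isoform, qtl.trait))
    return hits


# Hypothetical positions of three differentially expressed isoforms.
de_isoforms = {
    "iso_A": ("chr1", 150),
    "iso_B": ("chr1", 900),
    "iso_C": ("chr6", 50),
}
# Hypothetical QTL intervals, two of which overlap on chr6.
qtl = [
    Interval("chr1", 100, 200, "oleic"),
    Interval("chr6", 0, 100, "palmitic"),
    Interval("chr6", 40, 60, "stearic"),
]
hits = isoforms_in_qtl(de_isoforms, qtl)
print(hits)
```

Because overlapping QTL each produce a hit, the number of co-localization instances can exceed the number of unique isoforms or genes involved.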
Results
After de-multiplexing, we determined that each library had an average of ca. 15 million
fragments sequenced with paired-end reads (Figure 4.1). After filtering the assembly via a
BLASTx to the A. thaliana proteome, the assembly contained 79,286 isoforms with
an N50 of 1,219 bp. The differential expression analysis identified 1,798 of the 30,943 tested
isoforms as being significantly differentially expressed between Texas and Canada. Of these, 997
had significantly higher expression in Canada relative to Texas, and 801 had significantly higher
expression in Texas relative to Canada (Supplemental table 4.1; Supplemental table 4.2).
Differentially expressed isoforms were found on all 17 chromosomes (Figure 4.2). After
correcting for multiple testing, none of the 17 sunflower chromosomes exhibited an association
between physical distance and similarity in expression patterns (Supplemental table 4.3). In other
words, there is no evidence that physically adjacent genes are expressed
at similar levels.
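The per-chromosome comparison of physical distance and log fold change described in the Methods can be sketched as a permutation-based Mantel test. The code below is a plain Python illustration, not the ade4 routine used in the study. It assumes a one-sided test on the Pearson correlation between the upper triangles of the two distance matrices, and the gene positions and log fold changes are hypothetical.

```python
import math
import random


def distance_matrix(values):
    """Pairwise absolute differences between all values."""
    return [[abs(a - b) for b in values] for a in values]


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return the observed correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    as_extreme = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # permute rows and columns of the second matrix together
        shuffled = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(shuffled)) >= observed:
            as_extreme += 1
    return observed, as_extreme / (permutations + 1)


# Hypothetical positions (bp) and log fold changes for five genes on one chromosome.
physical = distance_matrix([0, 10, 20, 50, 90])
expression = distance_matrix([0.0, 0.1, 0.2, 0.5, 0.9])
observed, p_value = mantel(physical, expression, permutations=999)
print(round(observed, 3), p_value)
```

In this toy case the two matrices are perfectly correlated by construction. The result reported above corresponds to the opposite situation, in which no chromosome showed a significant association after correction for multiple testing.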
Numerous genes related to fatty acid biosynthesis, modification, and breakdown were
differentially expressed (Table 4.2). Fatty acid desaturase 2 (FAD2), fatty acid desaturase 8
(FAD8), and fatty acid biosynthesis 2 (FAB2) all play a role in producing unsaturated fatty acids,
and all were differentially expressed in developing seeds (Table 4.2). FAD2 and FAB2 were
more highly expressed in Canada relative to Texas, whereas FAD8 was more highly expressed in
Texas. Homologs of the fatty acid degradation genes alcohol dehydrogenase 1 and a zinc
binding alcohol dehydrogenase were both more highly expressed in Texas relative to Canada,
whereas acetoacetyl-CoA thiolase 1, long chain acyl-CoA synthetase 9, and multifunctional
protein 2 were more highly expressed in Canada. Another FAB gene, FAB1, was differentially
expressed in the same direction as FAB2 (higher expression in Canada). Genes that control the
usage of fatty acids were also differentially expressed, including a homolog of delta(2)-enoyl
CoA isomerase 3 (higher expression in Texas).
Differentially Expressed Isoform Co-localization with QTL
We identified 166 instances of differentially expressed isoforms co-localizing with a
quantitative trait locus for one of the four main fatty acids in sunflower seed oil: palmitic, stearic,
oleic, and linoleic acid (Burke et al., 2005). These 166 instances of co-localization corresponded
to 130 genes due to multiple DE isoforms mapping to the same sunflower gene. After identifying
genes that co-localize with multiple QTL on linkage groups 3 and 6, we were able to establish
that 95 unique genes in the sunflower genome are both differentially expressed and co-localize
with fatty acid QTL (Table 4.3). Of these 95 differentially expressed genes, only one had a fatty
acid-related annotation: a homolog of acyl-ACP thioesterase (FATA), which co-localized with
an oleic acid QTL on chromosome 1 (Table 4.3). It should be noted that based
on the amount of genetic space occupied by fatty acid QTL, one would expect about 190
differentially expressed isoforms to co-localize. Thus, in this dataset there is no evidence that
differentially expressed genes are enriched within fatty acid QTL.
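The reasoning behind this comparison can be made explicit with a small calculation. The sketch below assumes that the expectation is simply the number of differentially expressed isoforms multiplied by the fraction of genetic space covered by fatty acid QTL. That fraction is not stated in the text, so it is back-calculated here from the reported values and should be read as illustrative only.

```python
def expected_colocalized(n_de_isoforms, qtl_fraction):
    """Expected number of DE isoforms inside QTL if placement were random."""
    if not 0.0 <= qtl_fraction <= 1.0:
        raise ValueError("qtl_fraction must lie between 0 and 1")
    return n_de_isoforms * qtl_fraction


n_de = 1798              # differentially expressed isoforms (reported)
expected_reported = 190  # approximate expectation (reported)
observed = 166           # co-localization instances (reported)

# Assumption: fraction of genetic space in QTL implied by the two values above.
implied_fraction = expected_reported / n_de
ratio = observed / expected_reported

print(f"implied QTL fraction: {implied_fraction:.3f}")
print(f"observed / expected: {ratio:.2f}")
```

An observed-to-expected ratio below one is what motivates the conclusion that differentially expressed isoforms are not over-represented within the QTL intervals.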
Gene Ontology Enrichment
By analyzing the entire list of 30,943 expressed isoforms at our filtering standards (eight
individuals; counts per million greater than 3) we identified gene ontology terms that were
preferentially found among statistically significant isoforms (Table 4.4). When looking at
biological processes, we found a variety of terms, including embryo development and lipid
localization, that were significant at an FDR cutoff of 0.1.
Discussion
Analyses of differential gene expression have the potential to provide unique insights into
the molecular mechanisms underlying adaptive differences. Furthermore, by comparing
populations from opposite ends of the natural range, we likely are sampling gene expression
patterns that affect a number of adaptive aspects of seed biology. Differences in gene expression,
and presumably protein accumulation, affect the very first stages of growth. Therefore, this
developmental stage is crucial for lifetime fitness, and differentially expressed genes in this stage
may fundamentally affect adult stature despite only being differentially expressed very early in
development. By establishing whether or not differentially expressed isoforms are implicated in fatty acid
biosynthesis in the model organism A. thaliana, it is possible to easily narrow the list of
candidate genes for conferring the adaptive trait of fatty acid profile.
Differential Expression of Fatty Acid Genes
The large amount of past work on the biochemistry of seed oil biosynthesis has allowed
us to prioritize candidate pathways. In particular, the elevated expression of desaturases, which
effectively reduce the melting point of available fatty acids, ought to be favored in high latitude
populations according to theory (Linder 2000). Consistent with this expectation, we identified a
homolog of FAD2 as being significantly more highly expressed in plants from Canada vs. those
from Texas. The FAD2 protein is responsible for converting 18:1 oleic acid into 18:2 linoleic
acid (Chen et al., 2011; Liu et al., 2013). FAD2 has been the focus of numerous investigations as
it is an attractive breeding target for altering the fatty acid profile in commercial seed oils (Patel
et al., 2004; Pham et al., 2010). Functional work in A. thaliana has further indicated that
deactivation of this gene results in overaccumulation of oleic acid and subsequent delayed
germination in cool temperatures compared to wild type (Miquel and Browse 1994). Work in
cultivated sunflower has shown that down-regulation of a different copy of FAD2 on LG 14,
known as FAD2-1, is responsible for oleic acid accumulation in so-called high oleic cultivars
(Schuppert et al., 2006; Lacombe et al., 2009). In this dataset, FAD2-1 corresponded to four
isoforms that were all expressed at high levels across all libraries; as such, there was no evidence
of differential expression for this gene (all FDR > 0.05). The additional copy of FAD2 on LG 6
may, however, play a role in determining the observed difference in oil composition across the
range of wild sunflower. It should be noted that the variance in expression of FAD2 in northern
libraries is quite high and that many libraries only have modest levels of expression despite this
gene being identified as differentially expressed.
FAB2 is an enzyme that plays a role in the conversion of 18:0 fatty acids to 18:1 mono-
unsaturated oleic acid (Lightner et al., 1994). Consistent with expectations, this gene also shows
a latitudinal trend in expression whereby northern individuals exhibit higher expression than
southern individuals (80 vs. 40 FPKM; Table 4.2). In contrast, another differentially expressed
fatty acid desaturase, FAD8, exhibited higher expression in southern vs. northern individuals (32
vs. 9 FPKM; Table 4.2). This gene has been implicated in the production of 18:3 linolenic acid
from 18:2 linoleic acid (Gibson et al., 1994). In sunflower, linolenic fatty acid is generally not
seen in seeds but rather can be found in high amounts in leaves (Cantisan et al., 1999). The low
melting point conferred by highly unsaturated fatty acids like linolenic acid may be beneficial in
leaf tissue when plants are experiencing cold temperatures (Gibson et al., 1994).
In addition to fatty acid biosynthesis and modification, the degradation of fatty acids
could play a role in establishing the phenotypic difference seen in fatty acid profile of
latitudinally distributed populations. A homolog of delta(2)-enoyl CoA isomerase 3 (ECI3) is
more highly expressed in plants from Texas compared to Canada (two isoforms: 17 vs. 6 and
14 vs. 6 FPKM; Table 4.2). ECI3 is known to play a role in the breakdown of linoleic (and
linolenic) fatty acids. However, biochemical evidence suggests that the Arabidopsis copy of
ECI3 is not found in the peroxisome, where fatty acid breakdown occurs, despite being able to
catalyze the reaction (Goepfert et al., 2008). The precise location of ECI3 protein within the cells
of H. annuus seeds remains unknown. A homolog of multifunctional protein 2 (MFP2) was more
highly expressed (although weakly expressed overall) in Canadian individuals relative
to those from Texas (8 vs. <1 FPKM; Table 4.2). MFP2 has been shown to be important
for breaking down fatty acids in germinating seeds (Rylott et al., 2006). Determining the extent to which both
MFP2 and ECI3 play functional roles in developing sunflower seeds will require follow-up
biochemical experiments.
This set of differentially expressed genes can be used to further curate previously
described lists of candidates. Specifically, individuals derived from these latitudes (as well as a
middle latitude in Nebraska) have been genotyped with GBS markers. These markers were used
to uncover the population genetic signal of local adaptation occurring across latitudes in wild
sunflower. Ultimately 243 SNP loci (out of 11,315) showed significantly elevated population
differentiation (McAssey et al., in prep). Of these candidate loci for local adaptation, two were
also differentially expressed when the GBS outliers were queried against this study’s differentially expressed gene
list. One is a microtubule-based kinesin motor protein on LG 4 that has higher (although
relatively weak) expression in Canada compared to Texas. The other is a PEPC
(phosphoenolpyruvate carboxylase) homolog, which is found on LG 8. This gene was more
highly expressed in Canada relative to Texas, although the expression difference was modest and
its FDR value was only marginally significant (FDR = 0.049). As these genes show both a
population genetic and a gene expression signal of adaptation, they represent interesting
candidates to pursue.
QTL Co-localization
A number of differentially expressed isoforms co-localized with QTL for various fatty
acid products. Larger QTL intervals were found to contain more differentially expressed
isoforms as identified by a linear regression (r² = 0.91). A homolog of FATA co-localized with an
oleic acid QTL on LG 1 (Burke et al., 2005). FATA expression in A. thaliana appears to affect
the relative abundance of 18:2 and 18:3 fatty acids (Moreno-Perez et al., 2012), though, as noted
above, sunflower seeds do not typically accumulate 18:3 fatty acids. As QTL
intervals are typically large, it is challenging to connect variation in a particular phenotype to a
single gene. For example, in the relatively small QTL intervals on LG 6, none of the four
differentially expressed genes has a fatty acid-related annotation.
Gene Ontology Term Enrichment
The GO enrichment analysis revealed a number of terms associated with the most
significantly differentially expressed isoforms (Table 4.4). The presence of ‘lipid transport’ and
‘lipid localization’ as significant terms suggests broad differences between northern and southern
populations in how lipids are shuttled and organized within developing seeds. Other enriched GO
terms like ‘response to stress’ and ‘defense response’ were not initially expected to be
significant, as these plants were grown in a common garden. However, it is possible that stress
and defense genes could be differentially expressed due to northern and southern populations
experiencing the common garden greenhouse conditions in Georgia in different ways. For
example, the level of watering and greenhouse temperatures may have been perceived by
individuals from one end of the range as being rather benign (no stress response), whereas
individuals from the other end of the range might sense the same conditions as being stressful,
thereby eliciting a transcriptomic response.
Analyses of differential gene expression have revealed a considerable amount of
intraspecific diversity between latitudinally structured sunflower populations. As previous work
has strongly implicated the higher saturated fatty acid content in southern populations as being
adaptive in H. annuus, the reported results have generated numerous candidates related to the
molecular basis of this adaptive trait in developing seeds. Further assessment of the functional
relevance of these candidate genes will require additional experimentation. This work could
include quantification of expression levels across multiple developmental stages to gain a more
nuanced understanding of how these genes are regulated. As a complement to such work,
transformation of sunflower alleles into a heterologous system (A. thaliana), as has been done
previously (e.g., Blackman et al., 2010), has the potential to provide direct insights into the
functional relevance of these genes in sunflower. Taken together, this work illustrates the utility
of transcriptomic analyses for the study of locally adaptive traits. As the price of sequencing
continues to drop and throughput increases, such analyses are likely to become routine, even in
species with minimal genomic resources. In particular, by pairing RNA-sequencing experiments
with previous reciprocal transplant evidence it may be possible to conclusively demonstrate the
beneficial genotype by environment interaction that defines local adaptation.
Acknowledgements
We would like to thank Greg Cousins for greenhouse assistance. Savithri Nambeesan
provided guidance concerning tissue sampling and RNA extraction. The Georgia Advanced
Computing Resource Center (GACRC) provided computational support throughout this project.
Karolina Heyduk provided valuable bioinformatic assistance. This work was supported by a
grant from the NSF Plant Genome Research Program (DBI-0820451 to JMB).
References
Andolfatto P (2005) Adaptive evolution of non-coding DNA in Drosophila. Nature 437: 1149-
1152.
Baute GJ, Kane NC, Grassa CJ, Lai Z, Rieseberg LH (2015) Genome scans reveal candidate
domestication and improvement genes in cultivated sunflower, as well as post-
domestication introgression with wild relatives. New Phytol 206: 830-838.
Behringer D, Zimmermann H, Ziegenhagen B, Liepelt S (2015) Differential gene expression
reveals candidate genes for drought stress response in Abies alba (Pinaceae). PLoS One
10: e0124564.
Blackman BK, Strasburg JL, Raduski AR, Michaels SD, Rieseberg LH (2010) The role of
recently derived FT paralogs in sunflower domestication. Current Biology 20: 629-635.
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence
data. Bioinformatics 30: 2114-2120.
Bowers JE, Bachlava E, Brunick RL, Rieseberg LH, Knapp SJ, et al. (2012) Development of a
10,000 locus genetic map of the sunflower genome based on multiple crosses. G3
(Bethesda) 2: 721-729.
Burke JM, Knapp SJ, Rieseberg LH (2005) Genetic consequences of selection during the
evolution of cultivated sunflower. Genetics 171: 1933-1940.
Cantisan S, Martinez-Force E, Alvarez-Ortega R, Garces R (1999) Lipid characterization in
vegetative tissues of high saturated fatty acid sunflower mutants. J Agric Food Chem 47:
78-82.
Chen W, Song K, Cai Y, Li W, Liu B, et al. (2011) Genetic modification of soybean with a novel
grafting technique: Downregulating the FAD2-1 gene increases oleic acid content. Plant
Molecular Biology Reporter 29: 866-874.
Diray-Arce J, Clement M, Gul B, Khan MA, Nielsen BL (2015) Transcriptome assembly,
profiling and differential gene expression analysis of the halophyte Suaeda fruticosa
provides insights into salt tolerance. BMC Genomics 16: 353.
Doebley JF, Gaut BS, Smith BD (2006) The molecular genetics of crop domestication. Cell 127:
1309-1321.
Footitt S, Huang Z, Clay HA, Mead A, Finch-Savage WE (2013) Temperature, light and nitrate
sensing coordinate Arabidopsis seed dormancy cycling, resulting in winter and summer
annual phenotypes. Plant J 74: 1003-1015.
Gibson S, Arondel V, Iba K, Somerville C (1994) Cloning of a temperature-regulated gene
encoding a chloroplast omega-3 desaturase from Arabidopsis thaliana. Plant Physiol 106:
1615-1621.
Gillis J, Mistry M, Pavlidis P (2010) Gene function analysis in complex data sets using ErmineJ.
Nat Protoc 5: 1148-1159.
Goepfert S, Vidoudez C, Tellgren-Roth C, Delessert S, Hiltunen JK, et al. (2008) Peroxisomal
Delta(3),Delta(2)-enoyl CoA isomerases and evolution of cytosolic paralogues in
embryophytes. Plant J 56: 728-742.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, et al. (2011) Full-length
transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol
29: 644-652.
Hoekstra HE, Coyne JA (2007) The locus of evolution: evo devo and the genetics of adaptation.
Evolution 61: 995-1016.
Hubner S, Korol AB, Schmid KJ (2015) RNA-Seq analysis identifies genes associated with
differential reproductive success under drought-stress in accessions of wild barley
Hordeum spontaneum. BMC Plant Biol 15: 134.
Ilut DC, Coate JE, Luciano AK, Owens TG, May GD, et al. (2012) A comparative
transcriptomic study of an allotetraploid and its diploid progenitors illustrates the unique
advantages and challenges of RNA-seq in plant species. Am J Bot 99: 383-396.
Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M (2015) KEGG as a reference
resource for gene and protein annotation. Nucleic Acids Res.
King MC, Wilson AC (1975) Evolution at two levels in humans and chimpanzees. Science 188:
107-116.
Lacombe S, Souyris I, Berville AJ (2009) An insertion of oleate desaturase homologous
sequence silences via siRNA the functional gene leading to high oleic acid content in
sunflower seed oil. Mol Genet Genomics 281: 43-54.
Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:
357-359.
Li B, Dewey CN (2011) RSEM: accurate transcript quantification from RNA-Seq data with or
without a reference genome. BMC Bioinformatics 12: 323.
Lightner J, Wu J, Browse J (1994) A Mutant of Arabidopsis with Increased Levels of Stearic
Acid. Plant Physiol 106: 1443-1451.
Linder CR (2000) Adaptive evolution of seed oils in plants: Accounting for the biogeographic
distribution of saturated and unsaturated fatty acids in seed oils. The American Naturalist
156: 442-458.
Liu Q, Cao S, Zhou XR, Wood C, Green A, et al. (2013) Nonsense-mediated mRNA degradation
of CtFAD2-1 and development of a perfect molecular marker for olol mutation in high
oleic safflower (Carthamus tinctorius L.). Theor Appl Genet 126: 2219-2231.
Mathilde G, Ghislaine G, Daniel V, Georges P (2003) The Arabidopsis MEI1 gene encodes a
protein with five BRCT domains that is involved in meiosis-specific DNA repair events
independent of SPO11-induced DSBs. The Plant Journal 35: 465-475.
Miquel MF, Browse JA (1994) High-oleate oilseeds fail to develop at low temperatures. Plant
Physiology 106: 421-427.
Moreno-Perez AJ, Venegas-Caleron M, Vaistij FE, Salas JJ, Larson TR, et al. (2012) Reduced
expression of FatA thioesterases in Arabidopsis affects the oil content and fatty acid
composition of the seeds. Planta 235: 629-639.
Moyers BT, Rieseberg LH (2013) Divergence in gene expression is uncoupled from divergence
in coding sequence in a secondarily woody sunflower. International Journal of Plant
Sciences 174: 1079-1089.
Ohlrogge J, Browse J (1995) Lipid biosynthesis. Plant Cell 7: 957-970.
Patel M, Jung S, Moore K, Powell G, Ainsworth C, et al. (2004) High-oleate peanut mutants
result from a MITE insertion into the FAD2 gene. Theor Appl Genet 108: 1492-1502.
Pespeni MH, Barney BT, Palumbi SR (2013) Differences in the regulation of growth and
biomineralization genes revealed through long-term common-garden acclimation and
experimental genomics in the purple sea urchin. Evolution 67: 1901-1914.
Pham A, Lee J, Shannon JG, Bilyeu KD (2010) Mutant alleles of FAD2-1A and FAD2-1B
combine to produce soybeans with the high oleic acid seed oil trait. BMC Plant Biology
10: 195.
Renaut S, Grassa CJ, Moyers BT, Kane NC, Rieseberg LH (2012) The population genomics of
sunflowers and genomic determinants of protein evolution revealed by RNAseq. Biology
(Basel) 1: 575-596.
Renaut S, Rieseberg LH (2015) The accumulation of deleterious mutations as a consequence of
domestication and improvement in sunflowers and other compositae crops. Mol Biol
Evol 32: 2273-2283.
Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential
expression analysis of digital gene expression data. Bioinformatics 26: 139-140.
Ruuska SA, Girke T, Benning C, Ohlrogge JB (2002) Contrapuntal Networks of Gene
Expression during Arabidopsis Seed Filling. The Plant Cell 14: 1191-1206.
Rylott EL, Eastmond PJ, Gilday AD, Slocombe SP, Larson TR, et al. (2006) The Arabidopsis
thaliana multifunctional protein gene (MFP2) of peroxisomal beta-oxidation is essential
for seedling establishment. Plant J 45: 930-941.
Schunter C, Vollmer SV, Macpherson E, Pascual M (2014) Transcriptome analyses and
differential gene expression in a non-model fish species with alternative mating tactics.
BMC Genomics 15: 167.
Schuppert GF, Tang S, Slabaugh MB, Knapp SJ (2006) The sunflower high-oleic mutant Ol
carries variable tandem repeats of FAD2-1, a seed-specific oleoyl-phosphatidyl choline
desaturase. Molecular Breeding 17: 241-256.
Shewry PR, Napier JA, Tatham AS (1995) Seed storage proteins: structures and biosynthesis.
Plant Cell 7: 945-956.
Wang L, Tiffin P, Olson MS (2014) Timing for success: expression phenotype and local
adaptation related to latitude in the boreal forest tree, Populus balsamifera. Tree Genetics
& Genomes 10: 911-922.
Wang W, Feng B, Xiao J, Xia Z, Zhou X, et al. (2014) Cassava genome from a wild ancestor to
cultivated varieties. Nat Commun 5: 5110.
Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat
Rev Genet 10: 57-63.
Tables
Table 4.1 – Original sample locations and associated sampling depth
USDA PI#  Location                Latitude      Longitude      N
413160    Texas, USA              31.03972222   -104.8302778   4
664692    Texas, USA              31.18916667   -103.5780556   1
468476    Texas, USA              31.27277778   -102.6922222   4
592320    Saskatchewan, Canada    50.0475       -104.7072222   4
592311    Saskatchewan, Canada    50.39361111   -108.4802778   4
592316    Saskatchewan, Canada    50.66         -105.6647222   3
Table 4.2 – Differentially expressed fatty acid isoforms
Gene ID                                            At ID      # of DE isoforms  KEGG pathway membership  Log fold change  FDR          Chromosome
AACT1                                              AT5G47720  1                 B, E                     -1.001467733     0.022680079  15
ADH1                                               AT1G77120  6                 B                        1.204962596      0.000644478  N/A
alpha/beta hydrolase super family                  AT3G60340  1                 D, E                     -0.712966159     0.036220746  13
BCCP                                               AT5G16390  1                 C, E                     -1.17269982      0.036700762  16
CAC2                                               AT5G35360  1                 C, E                     -1.689500937     7.40321E-06  4
ECI3                                               AT4G14440  2                 D                        1.272111638      0.000299732  3
FAB1                                               AT1G74960  2                 C, E                     -0.900625098     0.018662491  15
FAB2                                               AT2G43710  1                 A, C, E                  -1.027371065     0.017887013  11
FAD2                                               AT3G12120  1                 A, E                     -3.136681738     0.042268888  6
FAD8                                               AT5G05580  1                 A, E                     1.681796277      0.030616556  N/A
FATA                                               AT3G25110  1                 C                        2.653147656      4.80035E-06  1
KCR1                                               AT1G67730  1                 A, D, E                  1.049408074      0.01214747   6
KCS1                                               AT1G01120  1                 D                        1.027052723      0.045393104  4
KCS10                                              AT2G26250  2                 D                        1.077585019      0.003067057  5
LACS9                                              AT1G77590  2                 B, C, E                  -1.780734298     0.000111006  11
MFP2                                               AT3G06860  1                 B, E                     -4.415321265     1.28306E-19  N/A
Zinc binding alcohol dehydrogenase family protein  AT5G42250  1                 B                        1.29067737       0.007552823  N/A
A = Biosynthesis of unsaturated fatty acids; B = Fatty acid degradation; C = Fatty acid biosynthesis; D = Fatty acid elongation; E =
Fatty acid metabolism; N/A = gene has not been assigned to a specific chromosome in the genome assembly
Table 4.3 – Fatty acid QTL co-localization with differentially expressed isoforms
Trait            Chromosome  cM    One LOD interval  # of DE isoforms  # of DE physical positions  DE FA isoforms
% Palmitic acid  6           26.3  18.4-41.7         12                10                          -
% Palmitic acid  17          39.6  33.8-43.1         7                 3                           -
% Stearic acid   6           24.3  20.3-28.3         3                 3                           -
% Stearic acid   10          50.7  44.7-62.2         7                 6                           -
% Oleic acid     1           19.5  3.3-25.5          58                49                          FATA
% Oleic acid     3           63.3  41.9-69.3         37                26                          -
% Oleic acid     6           22.3  16.4-28.3         4                 4                           -
% Linoleic acid  3           47.9  39.9-61.2         34                25                          -
% Linoleic acid  6           24.3  18.4-28.3         4                 4                           -
All QTL from Burke et al., 2005
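The co-localization test summarized in Table 4.3 amounts to asking whether a mapped position falls inside a QTL's one-LOD interval on the same chromosome. The following is a minimal Python sketch of that check, offered as an illustration rather than as the procedure used in this work. The QTL intervals are transcribed from the table; the names `QTL` and `colocalizing_qtl` and the query positions are hypothetical, and the sketch assumes that isoform positions have already been placed on the same genetic map (in cM) as the QTL.

```python
from typing import NamedTuple


class QTL(NamedTuple):
    trait: str
    chromosome: int
    lod_start_cm: float
    lod_end_cm: float


# One-LOD intervals transcribed from Table 4.3.
QTLS = [
    QTL("% Palmitic acid", 6, 18.4, 41.7),
    QTL("% Palmitic acid", 17, 33.8, 43.1),
    QTL("% Stearic acid", 6, 20.3, 28.3),
    QTL("% Stearic acid", 10, 44.7, 62.2),
    QTL("% Oleic acid", 1, 3.3, 25.5),
    QTL("% Oleic acid", 3, 41.9, 69.3),
    QTL("% Oleic acid", 6, 16.4, 28.3),
    QTL("% Linoleic acid", 3, 39.9, 61.2),
    QTL("% Linoleic acid", 6, 18.4, 28.3),
]


def colocalizing_qtl(chromosome, position_cm, qtls=QTLS):
    """Return every QTL whose one-LOD interval contains the given map position."""
    return [
        q for q in qtls
        if q.chromosome == chromosome
        and q.lod_start_cm <= position_cm <= q.lod_end_cm
    ]


# Hypothetical query: an isoform placed at 24.0 cM on chromosome 6.
hits = colocalizing_qtl(6, 24.0)
print([q.trait for q in hits])
```

Because the four chromosome 6 intervals overlap, a single position in the shared region is reported against all four traits, which is why the table lists similar isoform counts for those rows.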
Table 4.4 – Gene ontology enrichment terms
Name                                      GO ID       Corrected P value
Response to stress                        GO:0006950  0.00009037
Defense response                          GO:0006952  0.0002111
Iron ion transport                        GO:0006826  0.0003512
Embryo development                        GO:0009790  0.005188
Lipid transport (a)                       GO:0006869  0.00503
Transition metal ion transport            GO:0000041  0.007719
RNA metabolic process                     GO:0016070  0.009252
Multicellular organismal development (b)  GO:0007275  0.009424
Nucleic acid metabolic process            GO:0090304  0.01467746
Gene expression                           GO:0010467  0.03542668
(a) Same GO category as lipid localization (GO:0010876); (b) same GO category as single-multicellular
organism process (GO:0044707). All terms significant at a level of FDR < 0.01.
Figure 4.1 – Frequency histogram of sequencing effort across 20 RNA-seq libraries in wild
sunflower
[Histogram: x-axis, number of fragments sequenced in millions (bins: <1, 1-2.5, 2.5-5, 5-7.5,
7.5-10, 10-12.5, 12.5-15, 20-30, >30); y-axis, number of libraries (0-6)]
Figure 4.2 – Plot of log fold change in gene expression between Texas and Canada across the
sunflower genome. Black dots represent isoforms that were not significantly differentially
expressed. Red dots represent significantly differentially expressed genes. Positive log fold
change indicates higher expression in Texas relative to Canada.
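The sign convention in this caption can be applied directly to the per-gene values listed in Table 4.2. The following is a minimal Python sketch, not part of the original analysis: the function names are hypothetical, and the conversion to a fold change assumes the values are base-2 logarithms (the convention used by edgeR), which the table itself does not state.

```python
# Log fold changes transcribed from Table 4.2 (one value per gene, as listed there).
LOG_FC = {
    "AACT1": -1.001467733,
    "ADH1": 1.204962596,
    "alpha/beta hydrolase super family": -0.712966159,
    "BCCP": -1.17269982,
    "CAC2": -1.689500937,
    "ECI3": 1.272111638,
    "FAB1": -0.900625098,
    "FAB2": -1.027371065,
    "FAD2": -3.136681738,
    "FAD8": 1.681796277,
    "FATA": 2.653147656,
    "KCR1": 1.049408074,
    "KCS1": 1.027052723,
    "KCS10": 1.077585019,
    "LACS9": -1.780734298,
    "MFP2": -4.415321265,
    "Zinc binding alcohol dehydrogenase family protein": 1.29067737,
}


def higher_in(log_fc):
    """Positive values mean higher expression in Texas; negative, in Canada."""
    if log_fc > 0:
        return "Texas"
    if log_fc < 0:
        return "Canada"
    return "neither"


def fold_change(log_fc, base=2.0):
    """Magnitude of the expression ratio, ASSUMING the log is taken in `base`."""
    return base ** abs(log_fc)


by_region = {"Texas": [], "Canada": [], "neither": []}
for gene, lfc in LOG_FC.items():
    by_region[higher_in(lfc)].append(gene)

print(len(by_region["Texas"]), "genes higher in Texas")
print(len(by_region["Canada"]), "genes higher in Canada")
```

Under this convention the FAD2 homolog (negative log fold change) falls in the Canada group.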
CHAPTER V
CONCLUSIONS
In order to understand the process of local adaptation at the molecular level, it is crucial
first to identify phenotypically divergent populations and then to characterize their molecular
variation to elucidate the changes that closely associate with the respective populations. In short,
a modern study of local adaptation requires one to characterize the extent and distribution of
heritable genetic variation in the form of common garden phenotypes, DNA genotypes, and gene
expression patterns. The extent to which a locus shows elevated structure among populations
may suggest an adaptive role for that particular genomic region (Beaumont and Nichols 1996).
In sunflower, the large size of the natural range makes the species amenable to studies of
adaptation. In particular, the availability of genomic resources in the related crop species further
solidifies this group as an ecological model system.
Across a latitudinal transect extending from Texas to Canada, I found a tremendous
amount of phenotypic diversity within the species. Flowering time was structured by latitude,
with northern populations tending to flower earlier than southern populations. This general
pattern could be an adaptation related to growing season. Environments closer to the equator
experience longer stretches of favorable growing conditions, which may have caused populations
to become differentiated with respect to the transition to reproductive growth. The short growing
season in high latitudes may have selected for variants that promote quick growth before adverse
conditions set in. The pressures at high latitudes could include changes in pollinator abundance
that track climate variation.
Latitude was also related to other important traits, such as the percentage of seed oil made up of
saturated fatty acids. As sunflower is an oilseed crop, there is a general interest in understanding
the genetic variation underlying this trait. In wild sunflower I corroborated and extended
a phenotypic survey of fatty acid variation across a North American transect. Both my work and
that of Linder (2000) have shown that northern populations have significantly lower levels of
saturated fatty acids than southern populations. My work extended this result through fine-scale
sampling throughout the continent, which revealed that the change in saturated fatty acid content
between Texas and Canada is rather gradual. The selective pressure that generates this pattern may
relate to germination temperatures. The warmer temperatures in Texas may have selected for
individuals that produce more of the high-energy, high-melting-point saturated fatty acids.
The relatively cool germination temperatures in Canada may instead have favored plants that make a
compromise: they produce more unsaturated fatty acids, which store less energy but, owing to
their low melting point, keep that energy accessible at low temperatures.
Population genetic variation mirrored the distributions of flowering time and fatty acid
composition traits. Nearly 250 markers identified a broad north-south structuring of genetic
variation. The identification of population structure allowed me to then test which loci were
more differentiated than one would expect by chance. By performing FST outlier
analyses I was able to identify eight genes that showed elevated structure relative to the
genome-wide average. One gene identified by this approach, FT2, has numerous additional lines of
evidence suggestive of a role in adaptation, including QTL and gene expression analyses. This
investigation set the stage for further study of genomic regions harboring
the genetic signature of local adaptation.
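To make the idea of an FST outlier concrete, the following minimal Python sketch computes a per-locus FST for biallelic markers from population allele frequencies and flags the loci in the upper tail of the empirical distribution. It is a simplified illustration, not the simulation-based outlier test used in this work (Beaumont and Nichols 1996): populations are weighted equally, no null distribution is simulated, and the locus names, allele frequencies, and cutoff are all hypothetical.

```python
def fst_biallelic(freqs):
    """FST = (HT - HS) / HT for one biallelic locus, populations weighted equally.

    freqs: frequency of one allele in each population.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled populations
    if h_t == 0:
        return 0.0  # locus is monomorphic, so there is nothing to partition
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population value
    return (h_t - h_s) / h_t


def flag_outliers(locus_freqs, quantile=0.95):
    """Return per-locus FST and the loci at or above the given empirical quantile."""
    fst = {name: fst_biallelic(f) for name, f in locus_freqs.items()}
    ranked = sorted(fst.values())
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return fst, [name for name, value in fst.items() if value >= cutoff]


# Hypothetical allele frequencies in a northern and a southern population.
example = {
    "locus_A": [0.50, 0.55],
    "locus_B": [0.10, 0.90],
    "locus_C": [0.30, 0.35],
    "locus_D": [0.60, 0.60],
}
fst_values, outliers = flag_outliers(example, quantile=0.75)
print(outliers)
```

An empirical quantile always flags some loci, whether or not selection has acted, which is one reason a model-based null distribution is preferred in practice.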
Population genomic protocols such as Genotyping-by-Sequencing (GBS) have allowed
researchers to genotype many individuals at many loci (Elshire et al., 2011). Techniques like
GBS are revolutionizing the way population geneticists go about detecting the molecular changes
conferring adaptation. My work has used markers derived from a modified GBS protocol to
perform a higher density sampling of wild sunflower genomes. By specifically targeting a subset
of populations from my original sampling (one population each from Texas, Nebraska, and Canada),
it was possible to identify over 11,000 SNP markers. These additional markers were used to assess
the levels of population genetic variation and structure, and to identify regions of elevated
genetic structure throughout the genome. In terms of population genetic structure, all populations
were clearly separated from one another as determined by a STRUCTURE analysis (Pritchard et al.,
2000). However, the Nebraskan population was found to be slightly more similar in overall allele
frequencies to the Canadian population according to various estimates of genetic structure.
As these populations were previously genotyped with the SNP chip markers described above, it is
important to consider the extent to which the two datasets identify similar or different patterns
of genetic variation. Despite the vast difference in marker number, both datasets identified
similar population genetic structuring of variation. Specifically, by re-running STRUCTURE on
only the SNP chip data from the three populations that were eventually GBS genotyped, I found
the same pattern: three genetic clusters were most likely, and those clusters closely corresponded
to the initial USDA sampling. This was an important result because it indicated
that the two classes of markers were capable of identifying the same type of genetic structure. A
second consideration with the newly developed GBS markers was whether they would identify
the same genomic regions as having elevated genetic structure. On the whole, SNP chip outliers
were generally not physically close to GBS outliers. The closest instances of co-localization
of FST outliers between the two datasets involved polymorphisms over one megabase
apart from one another. It is necessary then to consider why these two datasets did not identify
the same genomic regions as being relevant to local adaptation. One possible reason could be a
lack of compatible restriction cut sites near the SNP chip polymorphisms. Alternatively,
many loci were dropped from the analysis due to small sample sizes in one of the three
populations. It is therefore possible that a SNP chip outlier could have been re-identified in my
analysis had libraries been sequenced to a greater depth. Another possible explanation is
the difference in the number of populations sampled with the two technologies. The SNP chip based
analysis of local adaptation addressed pairwise FST between all combinations of the 15 latitudinally
distributed populations. This is in stark contrast to the GBS work, which targeted three populations
and thus sampled a much reduced pool of genetic variation. Although the two marker classes did not
identify the same genomic regions as being important for adaptation, the GBS markers
certainly facilitated a much broader sampling of the sunflower
genome compared to the initial 246 SNPs used previously.
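The comparison of outlier positions between the two marker sets can be illustrated with a short Python sketch that finds the closest pair of outliers lying on the same chromosome. This is only a sketch of the kind of check described above; the function name and all coordinates are hypothetical and do not correspond to the actual outlier positions.

```python
def nearest_cross_set_pair(set_a, set_b):
    """Find the closest pair of positions, one from each set, on a shared chromosome.

    set_a, set_b: lists of (chromosome, position_bp) tuples.
    Returns (distance_bp, chromosome, position_a, position_b), or None if the
    two sets share no chromosome.
    """
    best = None
    for chrom_a, pos_a in set_a:
        for chrom_b, pos_b in set_b:
            if chrom_a != chrom_b:
                continue  # distances are only meaningful within a chromosome
            distance = abs(pos_a - pos_b)
            if best is None or distance < best[0]:
                best = (distance, chrom_a, pos_a, pos_b)
    return best


# Hypothetical outlier coordinates for the two marker types.
snp_chip_outliers = [(6, 12_000_000), (9, 150_000_000)]
gbs_outliers = [(6, 14_500_000), (9, 148_200_000), (3, 5_000_000)]

closest = nearest_cross_set_pair(snp_chip_outliers, gbs_outliers)
print(closest)
```

A pair separated by more than one megabase, as in this made-up example, would not normally be treated as tagging the same locus unless linkage disequilibrium extends that far.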
SNPs identified in the GBS dataset were found to co-localize with a number of QTL for
both flowering time and fatty acid biosynthesis traits. These co-localizations suggest that these
genomic intervals, rather than the individual SNPs, may be important in sunflower adaptation to
the environment. When analyzing the annotations of the genes with outlier SNPs, I found that
three of the genes had roles in flowering time. A NAC domain containing protein 52 homolog,
ANAC52, is a highly differentiated gene found to co-localize with a QTL on chromosome 9.
Interestingly, this gene has been shown to interact with FT in A. thaliana. As FT was already
shown to play a role in sunflower adaptation, this candidate presents an interesting situation in
which sequential genes in a developmental pathway show elevated differentiation. In a parallel
fashion, a homolog of Early in Short Days 7, a flowering time gene that affects the expression
of FT, also shows elevated differentiation. Thus, the increased marker density has
drastically improved my ability to detect whether or not a genomic region has elevated
differentiation. Of course, a more detailed understanding of the extent of linkage disequilibrium
will be required to more conclusively state whether these candidates are the targets of selection,
or linked to the true selected locus. Despite the increased marker density, none of the outlier
genes were annotated as playing a role in fatty acid biosynthesis.
In order to understand the genetics of fatty acid profile based adaptation I took a targeted
approach that focused on the extremes of the natural range, Texas and Canada. Using RNA-seq
to assay the levels of gene expression of developing sunflower seeds I sought to identify
differentially expressed genes that may contribute to this ecologically and agronomically
important trait. Upon analyzing the gene ontology terms of all genes with respect to the extent of
differential expression, I found a number of overrepresented terms. In particular, lipid transport
was overrepresented, which makes sense given that I sampled developing seed tissue in an
oil-producing species.
I found that a number of isoforms related to fatty acid biosynthesis, modification, and
degradation were differentially expressed between the extremes of my latitudinal transect. In
particular, a homolog of FAD2 was more highly expressed in Canada than in Texas, a
pattern that is consistent with observed phenotypes and theory. Interestingly, additional genes
responsible for breaking down unsaturated fatty acids were more highly expressed in
southern individuals. These patterns may suggest that the observed phenotypic differences in
saturated fatty acid percentages reflect both differential production and differential
breakdown of fatty acids. Future experiments that specifically use time course sampling of
developing seeds may yield a more nuanced understanding of how gene expression is affecting
differential seed phenotypes in wild sunflower populations. The differential expression analyses
also provided a means for testing the potential functional relevance of FST outliers identified
earlier. Specifically, I found two genes, a kinesin motor protein and a phosphoenolpyruvate
carboxylase, that were both FST outliers and differentially expressed. This result highlights the
importance of pairing population genome scans with RNA-seq experiments. Since neither GBS nor
RNA-seq requires a priori information, these techniques are particularly attractive when
working in non-model systems.
Together, these investigations highlight the extent of intraspecific variation in H. annuus.
I have shown that traits, allele frequencies, and gene expression patterns exhibit significant
differences with respect to latitude and, by association, climate. Molecular studies of adaptation
in wild species are an important step in understanding how species adapt to a changing
environment. The variants identified in this work may be the types of changes that are selected
on in the coming years as higher latitudes become warmer, and thus more like
lower latitudes. Furthermore, the introgression of these adaptive genomic regions into cultivated
germplasm could provide the potential to extend the production areas of sunflower and/or maintain
current yields in novel climatic conditions. The GBS outliers and differentially expressed genes
are clear candidates for future targeted experiments to further clarify the precise genetic
mechanism of adaptation in wild sunflower.
References
Beaumont MA, Nichols RA (1996) Evaluating loci for use in the genetic analysis of population
structure. Proceedings of the Royal Society B-Biological Sciences 263: 1619-1626.
Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, et al. (2011) A robust, simple
genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:
e19379.
Linder CR (2000) Adaptive evolution of seed oils in plants: Accounting for the biogeographic
distribution of saturated and unsaturated fatty acids in seed oils. American Naturalist 156:
442-458.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus
genotype data. Genetics 155: 945-959.
APPENDIX A
Supporting information for chapter II
Supplemental table 2.1 – SNP genotypes for 286 individuals at 246 loci.
Supplemental table 2.2 – Raw trait values for 286 common garden grown individuals
Supplemental table 2.3 – Results of REML analysis of phenotype data. Each tab contains
statistical results for a single trait. Additionally, the result of a Tukey’s test to determine
significant regional effects is presented on the right portion of each sheet.
* Note that supplemental tables 2.1, 2.2, and 2.3 are Excel documents that will be uploaded to
the journal website
Supplemental figure 2.1 – Delta K plot of STRUCTURE results. STRUCTURE harvester
results indicate K=2 as being the most likely number of groupings within the full dataset.
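For readers unfamiliar with the statistic behind a delta K plot, the following minimal Python sketch shows how it is commonly computed from replicate STRUCTURE runs: the absolute second-order change in the mean log-likelihood across successive K values, divided by the standard deviation of the log-likelihood at K (the Evanno et al. approach that STRUCTURE Harvester reports). The function name and the log-likelihood values are hypothetical and are not taken from this dataset.

```python
from statistics import mean, stdev


def delta_k(ln_prob):
    """Compute delta K from replicate log-likelihoods.

    ln_prob: dict mapping K to a list of log-likelihoods, one per replicate run.
    Returns a dict mapping each interior K to its delta K value.
    """
    result = {}
    for k in sorted(ln_prob):
        if k - 1 not in ln_prob or k + 1 not in ln_prob:
            continue  # delta K is undefined at the smallest and largest K tested
        second_diff = abs(
            mean(ln_prob[k + 1]) - 2 * mean(ln_prob[k]) + mean(ln_prob[k - 1])
        )
        spread = stdev(ln_prob[k])
        if spread > 0:  # skip K values whose replicates are identical
            result[k] = second_diff / spread
    return result


# Hypothetical replicate log-likelihoods for K = 1 to 4.
runs = {
    1: [-10000.0, -10010.0, -9990.0],
    2: [-9000.0, -9005.0, -8995.0],
    3: [-8900.0, -8950.0, -8850.0],
    4: [-8850.0, -8900.0, -8800.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the statistic needs both neighbors of K, it cannot evaluate K = 1, so a peak at K = 2 does not by itself rule out a single undifferentiated population.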
Supplemental figure 2.2 – STRUCTURE bar plot of southern regions. Populations correspond
to those in Table 1.
[Populations shown: TX1, TX2, TX3, TX4, TX5, OK1]
Supplemental figure 2.3 – Delta K plot for southern STRUCTURE plot found in Supplemental
figure 2.2.
Supplemental figure 2.4 – STRUCTURE bar plot of northern regions. Populations correspond
to those in Table 1.
[Populations shown: MT1, MT2, ND1, SAS1, SAS2, SAS3]
Supplemental figure 2.5 – Delta K plot for northern STRUCTURE plot found in Supplemental
figure 2.4.
Supplemental figure 2.6 – STRUCTURE bar plot corresponding to K = 6 for the six
populations within the southern two regions.
APPENDIX B
Supporting information for chapter III
Supplemental figure 3.1 – Sequencing effort per library
[Histogram titled "Library sequencing distribution": x-axis, reads x 10^6 (bins: <1, 1-2, 2-3,
3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, >10); y-axis, total (0-14)]
Supplemental figure 3.2 – SNP density per chromosome
[Plot: x-axis, Length (0e+00 to 3e+08); y-axis, Num_Stacks (0 to 800)]
Supplemental figure 3.3 – STRUCTURE delta K plot
Supplemental figure 3.4 – STRUCTURE bar plot based on previous SNP genotyping.
Supplemental figure 3.5 – STRUCTURE delta K plot for the three populations genotyped with
a SNP chip.
APPENDIX C
Supporting information for chapter IV
Supplemental table 4.1 – Top ten most differentially expressed nuclear isoforms with higher expression in Canada
Gene ID                                     At ID      logFC         FDR          Chromosome
NHL domain-containing protein               AT1G23880  -7.209152704  4.63548E-40  4
DNAse 1 - like                              AT3G58580  -5.338613346  1.75265E-38  N/A
Phototropic-responsive NPH3 family protein  AT5G47800  -5.504416002  7.35257E-33  2
Polyadenylate-binding protein RBP45B        AT1G11650  -8.650369689  6.36317E-31  8
Serine carboxypeptidase-like 49             AT3G10410  -7.292796909  2.50579E-30  8
Polyadenylate-binding protein RBP45B        AT1G11650  -7.603261431  5.51519E-28  8
Hypothetical protein                        AT2G18100  -6.1607632    6.70907E-28  5
Phototropic-responsive NPH3 family protein  AT5G47800  -5.177943316  2.30409E-27  2
DNAse 1 - like                              AT3G58580  -4.4200042    2.32101E-27  N/A
Serine carboxypeptidase-like 49             AT3G10410  -6.939534621  2.36725E-27  8
Supplemental table 4.2 – Top ten most differentially expressed nuclear isoforms with higher expression in Texas
Gene ID                                               At ID      logFC        FDR          Chromosome
GRAM domain family protein                            AT5G13200  7.062037115  9.66701E-22  N/A
Tubby-like protein 8                                  AT1G16070  4.949754276  1.81253E-21  9
GRAM domain family protein                            AT5G13200  6.679200448  2.07451E-21  N/A
Hypothetical protein                                  AT1G70750  3.768337654  4.38976E-16  12
Kunitz family trypsin and protease inhibitor protein  AT1G17860  3.588530937  9.48592E-13  7
TIR-NBS-LRR class disease resistance protein          AT4G16920  2.422635816  5.13868E-12  5
Hypothetical protein                                  AT2G45520  2.223236181  1.23528E-10  3
Small heat shock protein 23.6                         AT4G25200  4.907818042  1.4306E-10   3
Hypothetical protein                                  AT1G70750  2.575602155  6.53694E-10  8
Sulphur deficiency-induced 1                          AT5G48850  4.121138197  1.71722E-09  13
Supplemental table 4.3 – Auto-correlation analysis of physical genome position and expression
similarity
Chromosome  r             p-value (1)
1           0.007319802   0.28
2           -0.02492382   0.958
3           0.008872539   0.285
4           -0.01529262   0.843
5           0.007694233   0.261
6           -0.009814005  0.653
7           -0.002112057  0.563
8           0.01544674    0.175
9           0.00214163    0.417
10          -0.005591531  0.725
11          0.03373421    0.019
12          -0.01681713   0.849
13          0.00475819    0.398
14          0.01102912    0.281
15          0.02180371    0.097
16          0.01854987    0.1
17          0.02069085    0.095
(1) New critical value = 0.05/17 = 0.003
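The corrected critical value in the footnote is a Bonferroni adjustment: the nominal significance level divided by the number of chromosomes tested. A minimal Python sketch of that arithmetic, using the p-values transcribed from the table, is given below; the variable names are illustrative only.

```python
# p-values transcribed from Supplemental table 4.3, keyed by chromosome.
P_VALUES = {
    1: 0.28, 2: 0.958, 3: 0.285, 4: 0.843, 5: 0.261, 6: 0.653,
    7: 0.563, 8: 0.175, 9: 0.417, 10: 0.725, 11: 0.019, 12: 0.849,
    13: 0.398, 14: 0.281, 15: 0.097, 16: 0.1, 17: 0.095,
}

ALPHA = 0.05
threshold = ALPHA / len(P_VALUES)  # Bonferroni: 0.05 / 17

# Chromosomes passing the uncorrected and the corrected thresholds.
nominal = [chrom for chrom, p in P_VALUES.items() if p < ALPHA]
corrected = [chrom for chrom, p in P_VALUES.items() if p < threshold]

print(f"corrected critical value: {threshold:.3f}")
print("nominally significant:", nominal)
print("significant after correction:", corrected)
```

Only chromosome 11 falls below the uncorrected 0.05 level, and no chromosome remains significant once the corrected critical value is applied.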
Supplemental figure 4.1 – Differentially expressed fatty acid isoforms 1. Gray libraries (1-8)
represent southern individuals and white libraries (9-19) represent northern individuals. Isoforms
represent sunflower homologs of the following A. thaliana genes: A) KCR1, B) FAB2, C)
FAD8, D) FAD2, E) ADH1, F) ADH1.
Supplemental figure 4.2 – Differentially expressed fatty acid isoforms 2. Gray libraries (1-8)
represent southern individuals and white libraries (9-19) represent northern individuals. Isoforms
represent sunflower homologs of the following A. thaliana genes: A) ADH1, B) ADH1, C)
ADH1, D) ADH1, E) LACS9, F) LACS9.
Supplemental figure 4.3 – Differentially expressed fatty acid isoforms 3. Gray libraries (1-8)
represent southern individuals and white libraries (9-19) represent northern individuals. Isoforms
represent sunflower homologs of the following A. thaliana genes: A) MFP2, B) Zinc binding
alcohol dehydrogenase family protein, C) AACT1, D) FAB1, E) FAB1, F) FATA.
Supplemental figure 4.4 – Differentially expressed fatty acid isoforms 4. Gray libraries (1-8)
represent southern individuals and white libraries (9-19) represent northern individuals. Isoforms
represent sunflower homologs of the following A. thaliana genes: A) BCCP, B) CAC2, C) KCS1,
D) KCS10, E) KCS10, F) alpha/beta hydrolase super family.
Supplemental figure 4.5 – Differentially expressed fatty acid isoforms 5. Gray libraries (1-8)
represent southern individuals and white libraries (9-19) represent northern individuals. Isoforms
represent sunflower homologs of the following A. thaliana genes: A) ECI3, B) ECI3.