
RANGE-WIDE STUDIES OF GENETIC AND TRANSCRIPTOMIC DIVERSITY IN

SUNFLOWER

by

EDWARD VINCENT MCASSEY IV

Under the Direction of John M. Burke

ABSTRACT

With the pressures of climate change and human population growth affecting agricultural

land use, it is important to consider the use of wild adaptations to maintain and expand the areas where

crops can be grown. The study of the genetic basis of local adaptation has accelerated with the ease

of collecting sequence data from multiple individuals from across the range of a species. Here, I

present phenotypic, genotypic, and expression-based studies of variation in wild sunflowers

across a latitudinal gradient in North America. I found that flowering time and saturated fatty

acid percentage of seeds were differentiated across the range, with northern populations both

flowering earlier and having a lower percentage of saturated fatty acids in their seeds. To understand the

genetic basis of these traits I genotyped individuals with two different marker technologies, a

SNP chip and Genotyping-by-Sequencing, in order to identify regions of the genome that were

exceptionally differentiated when comparing northern and southern populations. An analysis of

population genetic variation revealed a number of candidate regions for local adaptation

including multiple members of the flowering time pathway. To complement the study of

population genetic variation as it relates to local adaptation, I performed RNA-sequencing to

identify genes that may be influencing the differences in fatty acid saturation in wild sunflower.

When comparing expression levels of developing northern and southern wild sunflower seeds, I

found a number of differentially expressed genes, some of which were annotated as part of the

fatty acid biosynthesis pathway. Taken together, the genetic differentiation outliers and

differentially expressed genes represent excellent candidates for follow-up experiments.

Importantly, by mapping these variants against the sunflower genome, I was able to further

prioritize candidates by assessing whether or not they co-localized with important QTL. Future

work will focus on establishing the extent of linkage disequilibrium in these genomic intervals to

clarify the individual role of these putative adaptive variants.

INDEX WORDS: Gene expression, latitudinal variation, local adaptation, population

genetics, sunflower




BA, Vanderbilt University, 2010

A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial

Fulfillment of the Requirements for the Degree

DOCTOR OF PHILOSOPHY

ATHENS, GEORGIA

2015




Major Professor: John M. Burke

Committee: Mike Arnold

Katrien Devos

Jim Leebens-Mack

CJ Tsai

Electronic Version Approved:

Suzanne Barbour

Dean of the Graduate School

The University of Georgia

December 2015


DEDICATION

I first dedicate this work to my family; my parents, Ed and Linda, provided me with a

loving home, excellent education and constant support. My sisters, Danielle, Jackie, and

Kathleen, as well as my grandparents Ed and Anne, have supported me along the way.

Additionally, I would like to dedicate this work to my wife, Karolina Heyduk, who has helped

and supported me for the past 3.5 years.

This dissertation is finally dedicated to the memory of Dr. Dave McCauley, who first

stirred my interest in studying genetic diversity when he allowed me to join his lab as an

undergrad at Vanderbilt University. I am eternally grateful for his support and advice that led me

to pursue a career in biology.


ACKNOWLEDGEMENTS

Many people have made this work possible. First, John Burke has provided me with

numerous opportunities to grow as a confident and independent scientist. His input has helped

frame and steer my research in new and exciting directions. The advisement I’ve received from

my committee members - Mike Arnold, Katrien Devos, Jim Leebens-Mack, and CJ Tsai - has

been crucial in helping me refine my understanding and analysis of key questions in plant

genomics. Burke lab members past and present including Adam Bewick, John Bowers, Jo Corbi,

Caitlin Ishibashi, Jen Mandel, Rishi Masalia, Savithri Nambeesan, Stephanie Pearl and Evan

Staton have all played a role by helping with analyses, discussing research strategies, and

providing valuable feedback. The University of Georgia has provided numerous resources that

assisted this work including greenhouse facilities provided by the Crop and Soil Science and

Plant Biology Departments, computing resources from the Georgia Advanced Computing

Resource Center, and sequencing from the Georgia Genomics Facility.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER

I INTRODUCTION AND LITERATURE REVIEW

References

II RANGE-WIDE PHENOTYPIC AND GENETIC DIFFERENTIATION IN WILD SUNFLOWER

Abstract

Introduction

Materials and Methods

Results

Discussion

References

III GENOMIC PATTERNS OF SNP VARIATION AND THE GENETIC BASIS OF LOCAL ADAPTATION IN WILD SUNFLOWER

Abstract

Introduction


Results and Discussion

References

IV TRANSCRIPTOMIC ANALYSIS OF DEVELOPING SEEDS ACROSS THE RANGE OF WILD SUNFLOWER

Abstract

Introduction


Results

Discussion

References

V CONCLUSIONS

References

APPENDICES

A Supporting information for chapter II

B Supporting information for chapter III

C Supporting information for chapter IV


LIST OF TABLES

Table 2.1: Range-wide population sampling information

Table 2.2: Population genetic statistics for 15 wild sunflower populations

Table 2.3: Summary of candidates for genes involved in local adaptation

Table 3.1: Levels of population genetic diversity in three populations across a latitudinal gradient in North America

Table 3.2: Pairwise population structure among three wild sunflower populations

Table 3.3: Co-localization of FST outliers with QTL

Table 4.1: Original sample locations and associated sampling depth

Table 4.2: Differentially expressed fatty acid isoforms

Table 4.3: Fatty acid QTL co-localization with differentially expressed isoforms

Table 4.4: Gene ontology enrichment terms

Supplemental table 2.1: SNP genotypes for 286 individuals at 246 loci

Supplemental table 2.2: Raw trait values for 286 common garden grown individuals

Supplemental table 2.3: Results of REML analysis of phenotype data

Supplemental table 4.1: Top ten most differentially expressed nuclear isoforms with higher expression in Canada

Supplemental table 4.2: Top ten most differentially expressed nuclear isoforms with higher expression in Texas

Supplemental table 4.3: Auto-correlation analysis of physical genome position and expression similarity


LIST OF FIGURES

Figure 2.1: Map of the locations of the 15 populations used in this study in the central USA and Canada found in Table 1.

Figure 2.2: STRUCTURE bar plot of full dataset

Figure 3.1: Original USDA sampling location for the three populations genotyped in this study

Figure 3.2: Bar plot indicating the proportion of membership to three genetic clusters as identified in the program STRUCTURE

Figure 4.1: Frequency histogram of sequencing effort across 20 RNA-seq libraries in wild sunflower

Figure 4.2: Plot of log fold change in gene expression between Texas and Canada across the sunflower genome

Supplemental figure 2.1: Delta K plot of STRUCTURE results

Supplemental figure 2.2: STRUCTURE bar plot of southern regions

Supplemental figure 2.3: Delta K plot for southern STRUCTURE plot found in Supplemental figure 2.2

Supplemental figure 2.4: STRUCTURE bar plot of northern regions

Supplemental figure 2.5: Delta K plot for northern STRUCTURE plot found in Supplemental figure 2.4

Supplemental figure 2.6: STRUCTURE bar plot corresponding to K = 6 for the six populations with the southern two regions

Supplemental figure 3.1: Sequencing effort per library

Supplemental figure 3.2: SNP density per chromosome

Supplemental figure 3.3: STRUCTURE delta K plot

Supplemental figure 3.4: STRUCTURE bar plot based on previous SNP genotyping

Supplemental figure 3.5: STRUCTURE delta K plot for the three populations genotyped with a SNP chip.

Supplemental figure 4.1: Differentially expressed fatty acid isoforms 1






CHAPTER I

INTRODUCTION AND LITERATURE REVIEW

The diversity of morphological and physiological phenotypes seen when comparing

populations of the same species may be the result of adaptation to different environments.

Understanding the process of local adaptation is paramount in evolutionary biology as it is a

stepping stone towards the genesis of biodiversity over long timescales. The first half of the 20th

century provided numerous insights into the adaptive role of intraspecific diversity in numerous

taxa (Clausen et al., 1941; Hiesey et al., 1942; Clausen et al., 1947). However, the lack of genetic

tools precluded the understanding of what genetic changes were associated with the pattern of

adaptation. Present day genomic technologies, when paired with prior evidence of a trait’s role in

adaptation, have allowed for an unprecedented look at the genomes and transcriptomes of wild

individuals in order to document molecular patterns consistent with adaptation.

Historical study of local adaptation

Local adaptation represents the situation in which a particular population has higher

fitness in the environment in which it is found than in other environments occupied by

different populations. To study this phenomenon, ecologists typically perform reciprocal

transplants where they grow members of a population in both their ‘home’ environment where

the population is naturally found and an ‘away’ environment where a different population is

normally located (Bradshaw 1960; Warwick and Briggs 1980; Waser and Price 1985; Wang et

al., 1997; Kawecki and Ebert 2004; Bennington et al., 2012). Each reciprocal transplant is

actually a series of common gardens whereby environmental factors are constant so that all


differences found in phenotypes can be solely attributed to differences in the underlying genetics.

Some of the most convincing evidence of local adaptation comes from the earliest experiments

done on the subject matter by Turesson in the 1920s and Clausen, Keck and Hiesey in the

1930s through the 1950s (Turesson 1925; Clausen et al., 1941; Hiesey et al., 1942; Clausen et

al., 1947). These classic experiments combined common gardens and reciprocal transplants with

numerous taxa and quantitative measurements of fitness to determine whether or not local

populations were indeed adapted to their particular home environment. The motivation of these

particular experiments began with the observation of intraspecific diversity occurring in nature.

In particular, early researchers noticed that certain types of phenotypic traits tended to associate

with environmental and/or geographic factors.

For Clausen, Keck and Hiesey, the usage of an elevational gradient proved extremely

beneficial in the dissection of adaptation. Differences in altitude play a unique role in the study

of adaptation because they allow the comparison of plant populations that exist at the same

latitude but that differ in elevation (and its associated environmental changes). Clausen, Keck

and Hiesey, and others since then (Angert and Schemske 2005; Byars et al., 2007; Gonzalo-

Turpin and Hazard 2009), leveraged this experimental set up to study adaptation by establishing

common gardens across three different elevations. At each elevation, Clausen, Keck and Hiesey

planted out populations of California flora collected from each elevation and eventually

measured traits such as flowering time. By comparing the populations within a particular

common garden as well as the same populations grown at multiple elevations, the authors were

able to assess the extent to which traits were plastic (large environmental component) versus genetic.

Taken together, the work of these authors indicated that intraspecific variation needed to be


studied at a more quantitatively rigorous level and that much of the intraspecific variation had a

genetic component. The main goal of common garden studies is to deduce whether trait variation

has a genetic component. Determining the genetic differences that control trait variation became

an increasingly popular avenue of research as the 20th century progressed. While common garden

studies of trait variation do not necessarily speak to whether a trait is adaptive, they do indicate

which traits are most divergent, and therefore serve to prioritize traits for additional

investigations.

The advent of new genetic marker systems, coupled with traditional genetic mapping and

population genetic approaches, finally allowed researchers to get closer to establishing the

connection between genotype and phenotype. Population genetic analyses using allozymes

represent the earliest attempt to connect allele frequency variation among populations with

phenotypic variation seen in common garden grown plants. As allozyme work began to take off

in the late 1960s and early 1970s, the lab of Robert Allard at the University of California at Davis

began a classic set of population genetic studies that sought to establish a connection between

allelic variation and adaptation to soil water status in wild oat, Avena barbata. Briefly, the work

of Allard and colleagues showed that certain allelic combinations from a variety of allozyme loci

were found at higher frequency in individuals residing in more mesic environments in

California (Clegg and Allard 1972; Hamrick and Allard 1972; Hamrick and Holden 1979). This

particular observation represented a purported example of a co-adapted gene complex (i.e., a set

of interacting loci that are selectively advantageous when particular alleles are found in the same

individual [Allard et al., 1972]). Focusing on population genetic differentiation and tying it to

divergence in environments highlighted the possibility that populations may be under natural

selection.


Range-wide studies of genetic diversity

Assessing the amount of genetic diversity in a species is crucial for several reasons. In the

context of the maintenance of biodiversity it is important that organisms with valuable ecological

roles have sufficient genetic variation to adapt to a changing environment. For plant species, this

is an extremely important avenue of research because these organisms cannot migrate over the

course of their life to track their ideal climate. It should be noted that human-mediated

transplantation is an option to mitigate the sessile nature of plants, but it will require substantial

resources (Vitt et al., 2010). Instead, plants must rely on seed dispersal to colonize new habitats.

Assuming that transplantation and seed dispersal are relatively unlikely sources of climate

tracking, the only other option for plants is to adapt to changing biotic and abiotic forces.

Range-wide studies are important because they establish the levels of genetic diversity

that exist within populations, which is in turn required for local adaptation (Li and Adams 1989;

King et al., 2001; Spinks and Shaffer 2005; Bryja et al., 2010). Local adaptation across a range

can be explored by using latitudinal or longitudinal transects, both of which correlate with a

number of biotic and abiotic factors that can drive local adaptation. For example, a latitudinal

transect across a species’ range correlates with differences in photoperiod, growing season,

maximum and minimum temperatures, as well as the frequencies of biotic and/or abiotic stresses

such as drought, nutrient deficit, herbivory, and pathogen load. Additionally, longitude tends to

correlate with precipitation in conjunction with the presence of a mountain range. The above

mentioned climatic factors are often implicated in driving adaptive differentiation (Franks et al.,

2007; Hancock et al., 2011; Colautti and Barrett 2013). However, determining whether the genetic

variation seen across a transect is related to local adaptation requires an understanding of the

levels of genetic diversity across all populations.

Range-wide studies of genetic variation in the progenitor species of crops are extremely

important for crop improvement. When genetic variation for traits has been limited in crops, the

main avenues for crop improvement become using wild diversity (Tanksley and McCouch

1997), using mutagenized populations (Wulff et al., 2004), or performing transgenic engineering

(Koziel et al., 1993). Since wild progenitor species must constantly adapt to a changing

world over evolutionary timescales, they generally contain greater genetic and phenotypic

diversity than related crops (Burke et al., 2007). This diversity can be in the form of disease

resistance, drought resistance or yield improvement factors among many others. Thus, assessing

the level of diversity in natural populations can catalog potential variation that breeders may

require in the future. Unfortunately, there are drawbacks associated with the usage of wild

germplasm for crop improvement. When wild species are crossed to move an interesting trait

into a crop, large chromosomal regions containing many genes other than the target gene of

interest get incorporated as well, a process known as linkage drag. The genes found in this

linkage block often contain alleles that, while presumably useful in the wild, negatively

affect crop production. This necessitates a more nuanced understanding of the genomic intervals

controlling adaptive traits.

Leveraging genomics for answering questions at the level of natural populations

To identify the genetic signature of local adaptation it is essential to genotype many loci

throughout the genome in many individuals. The early work of Allard and colleagues sought to


address questions of adaptation using only five allozyme loci (Hamrick and Allard 1972). Over the

following decades amplified fragment length polymorphisms (AFLPs), simple sequence repeats

(SSRs), and SNP chips were all employed to achieve the same general goal: genotype

individuals at more loci without a corresponding increase in cost and labor. The generally rapid

decay of linkage disequilibrium in natural populations necessitates a large number of loci in

order to fully cover the genome. In order to bypass some of the negative aspects associated with

other marker systems (time, requirement of a priori sequence information, presence-absence

marker types), the most recent advances have been restriction site associated DNA sequencing

(RAD-seq, Baird et al., 2008) and Genotyping-by-Sequencing (GBS, Elshire et al., 2011). These

techniques leverage the fact that many restriction sites within the genome will be conserved

between individuals of the same species. Assuming that restriction sites are conserved, the

adjacent sequences can then be obtained via high throughput sequencing and can be compared to

determine whether or not polymorphisms exist. An advantage of these techniques is that they are

highly customizable due to the ability to choose restriction enzymes with recognition sites of

different lengths and nucleotide composition. Additionally, the usage of methylation sensitive

restriction enzymes allows for the preferential sequencing of hypomethylated genic regions. If

we assume genic regions of the genome house local adaptation polymorphisms, GBS-based

interrogations are well suited to provide genome-wide markers that will be useful for studies of

adaptation.
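
As a rough illustration of the logic behind these reduced-representation methods, the following Python sketch finds every occurrence of a recognition site in a sequence, collects the adjacent bases as a tag, and reports shared sites whose tags differ between two individuals. It is a toy model and not the pipeline used in this dissertation: the function names, the tag length, and the example sequences are hypothetical, the PstI recognition sequence (CTGCAG) is used only as a familiar example, and both individuals are assumed to share one coordinate system with no insertions or deletions.

```python
def flanking_tags(sequence, site, tag_len):
    """Return {site_position: tag} for the bases immediately downstream of each site."""
    tags = {}
    start = sequence.find(site)
    while start != -1:
        tag_start = start + len(site)
        tag = sequence[tag_start:tag_start + tag_len]
        if len(tag) == tag_len:  # skip sites too close to the end of the sequence
            tags[start] = tag
        start = sequence.find(site, start + 1)
    return tags


def polymorphic_sites(tags_a, tags_b):
    """Positions of sites present in both individuals whose adjacent tags differ."""
    shared = sorted(set(tags_a) & set(tags_b))
    return [pos for pos in shared if tags_a[pos] != tags_b[pos]]


# Hypothetical sequences for two individuals that differ by one SNP in the first tag.
SITE = "CTGCAG"
individual_a = "AA" + SITE + "TTGAC" + "CC" + SITE + "GGATC" + "A"
individual_b = "AA" + SITE + "TTGTC" + "CC" + SITE + "GGATC" + "A"

tags_a = flanking_tags(individual_a, SITE, tag_len=5)
tags_b = flanking_tags(individual_b, SITE, tag_len=5)
print(polymorphic_sites(tags_a, tags_b))  # prints [2]
```

A site that is lost by mutation in one individual simply drops out of the shared set here, which mirrors the assumption in the text that restriction sites are conserved between the individuals being compared.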

Population genetic theory provides the framework that allows researchers to determine

which of the thousands of GBS markers that have been genotyped harbor a signature of

selection. Population genetic structure describes the extent to which genetic variation is found


among populations as opposed to within populations or within individuals. A typical measure of

population genetic structure is FST and its many derivatives. When standardized, this metric

ranges from 0 to 1 with higher values indicating more differentiation among populations and

lower values indicating that populations are relatively similar in genetic composition. Theory

suggests that demographic factors like population size, migration and recent bottlenecks affect

the genetic structure of all loci in a similar fashion. On the other hand, in the case of local

adaptation where alternative alleles are selected for by different environments, typically only the

genomic region around the causal variant has elevated population structure. Therefore, if one can

genotype a polymorphism in each independent region of the genome, it should be possible to

distinguish selected regions (that is, regions of high FST) from background ‘noise’.
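
The contrast between genome-wide background and locally elevated structure can be made concrete with a small calculation. The Python sketch below computes Wright's FST for a biallelic locus as (HT - HS) / HT from per-population allele frequencies and then ranks loci to flag the most differentiated ones. It assumes equally weighted populations and known allele frequencies, ignores sampling error, and uses an arbitrary top-fraction cutoff; the function names and frequencies are hypothetical and do not reproduce the estimators or outlier tests used in later chapters.

```python
from statistics import mean


def fst_biallelic(freqs):
    """Wright's FST at one biallelic locus from per-population allele frequencies."""
    p_bar = mean(freqs)
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled populations
    if h_total == 0:
        return 0.0  # locus is fixed for the same allele everywhere
    h_within = mean(2 * p * (1 - p) for p in freqs)  # mean within-population heterozygosity
    return (h_total - h_within) / h_total


def top_fst_loci(locus_freqs, top_fraction=0.05):
    """Names of the loci in the top fraction of the empirical FST distribution."""
    scores = {name: fst_biallelic(freqs) for name, freqs in locus_freqs.items()}
    n_keep = max(1, round(len(scores) * top_fraction))
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]


# Hypothetical allele frequencies in a northern and a southern population.
loci = {
    "locus_1": [0.50, 0.50],
    "locus_2": [0.45, 0.55],
    "locus_3": [0.10, 0.90],
    "locus_4": [0.40, 0.60],
}
print(top_fst_loci(loci, top_fraction=0.25))  # prints ['locus_3']
```

In this toy dataset three loci show FST values of 0.04 or less, while locus_3 reaches about 0.64, which is the kind of departure from background that a genome scan is designed to detect.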

Population genomic datasets have allowed for entirely new scales of evolutionary

analyses. First, in species with an annotated genome sequence, it is possible to anchor loci to the

genome in order to establish their physical location, allowing for GBS datasets to be integrated

with quantitative trait locus (QTL) mapping datasets (Hohenlohe et al., 2010). Assessing whether

or not loci with exceptional population structure co-localize with known QTL for putatively

adaptive traits provides insight into the likelihood that a candidate gene underlies adaptation

in a particular system. Furthermore, the list of candidate adaptive genes can be curated by

determining if the highly structured polymorphism is in a candidate pathway in a model species

like Arabidopsis thaliana.
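
The co-localization step amounts to an interval lookup once loci and QTL share a coordinate system. The Python sketch below is a minimal version under that assumption: each outlier locus has a chromosome and position, each QTL has a chromosome and an interval, and a locus is reported when it falls inside an interval on the same chromosome. The names, positions, and traits are hypothetical, and the sketch does not model uncertainty in QTL confidence intervals.

```python
from collections import namedtuple

# A QTL interval; positions are assumed to be in the same units as the loci.
QTL = namedtuple("QTL", ["trait", "chrom", "start", "end"])


def colocalized(outliers, qtls):
    """Map each outlier locus to the traits whose QTL interval contains it."""
    hits = {}
    for name, (chrom, pos) in outliers.items():
        traits = [q.trait for q in qtls
                  if q.chrom == chrom and q.start <= pos <= q.end]
        if traits:
            hits[name] = traits
    return hits


# Hypothetical outlier positions (chromosome, position) and QTL intervals.
outliers = {"snp_A": (1, 120), "snp_B": (1, 900), "snp_C": (2, 50)}
qtls = [
    QTL("flowering time", 1, 100, 200),
    QTL("seed oil", 2, 40, 60),
    QTL("plant height", 2, 45, 55),
]
print(colocalized(outliers, qtls))
```

Here snp_A falls within the flowering time interval, snp_C falls within two overlapping intervals, and snp_B is dropped because it lies outside every interval on its chromosome.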


The importance of gene expression in evolutionary biology

Mutations in DNA provide the heritable genetic variation that natural selection acts upon.

As described above, these mutations (or mutations genetically linked to an adaptive mutation)

carry a signature of selection in the form of elevated FST. Many studies have identified adaptation

occurring via structural mutations to protein coding sequence (Smith and Eyre-Walker 2002). If

one allele underwent a non-synonymous mutation increasing the protein’s ability to perform

some cellular process, it may be selected for in a particular environment and thus become known

as a local adaptation gene. A general question remains though: are mutations that alter a protein’s

functional ability the major avenue for adaptation at the molecular level?

The amount that a gene gets transcribed and the subsequent transcript gets translated

provides an alternative possibility for adaptation at the molecular level. In other words, in the

absence of any genetic polymorphisms in the coding sequence of a gene, the difference in gene

expression when comparing populations could explain an adaptive trait. Since adaptive traits are

by definition heritable, there must be genetic variation somewhere in the genome to explain the

difference in gene expression that may be occurring in a set of populations. Mutations in

cis-regulatory promoter sequences could explain differences in gene expression among populations.

Alternatively, mutations in trans could occur in the form of a coding sequence mutation in an

important upstream transcription factor. Recently, more work has been done in non-model

systems on trying to determine the extent to which this genetic variation for gene expression is

correlated with adaptation to a specific environment (Schoville et al., 2012). As with GBS, the

quantity of data associated with high throughput sequencing of RNA (RNA-seq) both eliminated

the need for any a priori sequence information and at the same time provided a means for

conducting tests of differential expression across many genes. In addition to quantitatively


establishing gene expression levels, RNA-seq data provide nucleotide information that can be

mined for SNPs (Ellison et al., 2011).
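
To make the idea of comparing expression levels between populations concrete, the Python sketch below scales raw read counts to counts per million and computes a log2 fold change between two groups of libraries. It is only an illustration under simple assumptions: the function names, the pseudocount, and the expression values are hypothetical, and a fold change on its own is not a test of differential expression, which would also need to account for variation among replicate libraries.

```python
import math
from statistics import mean


def counts_per_million(counts):
    """Scale one library's raw read counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(values_a, values_b, pseudocount=1.0):
    """log2 ratio of mean expression in group A over group B."""
    return math.log2((mean(values_a) + pseudocount) / (mean(values_b) + pseudocount))


# Hypothetical normalized expression of one gene in replicate libraries.
northern = [7.0, 7.0, 7.0]
southern = [1.0, 1.0, 1.0]
print(log2_fold_change(northern, southern))  # prints 2.0
```

The pseudocount keeps the ratio defined when a gene has no reads in one group; a positive value indicates higher expression in the first group and a negative value indicates higher expression in the second.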

Utility of the genus Helianthus for answering questions in evolutionary biology

Helianthus has played both a major historical and a contemporary role in the

advancement of evolutionary biology. This North American genus has species that vary for a

number of traits including ploidy, mating system, habitat, and range size (reviewed in Kane et

al., 2013). It is this tremendous amount of variation that allows this genus to be a source of

studies of adaptation. For example, the work of Loren Rieseberg and colleagues has utilized

Helianthus to study how homoploid hybridization between H. annuus and H. petiolaris resulted

in a set of three species (H. anomalus, H. deserticola, and H. paradoxus) adapted to very

different conditions (Rieseberg et al., 2003). The evolutionary biology work in this genus has

been greatly aided by the resources associated with having a cultivated congener in the group.

Furthermore, the large natural range of common sunflower, H. annuus, suggests there has been

ample opportunity for populations to differentially adapt to a number of environments.

H. annuus was cultivated approximately 4,000 years ago in eastern North America

(Crites 1993). After subsequent improvement in both Europe and North America, this species

has become a rich source of both oil and confectionery seeds for human consumption. In order to

study the process of domestication and improvement of this crop, there have been a number of

QTL mapping populations developed over the years from a wide variety of crosses between

wild, cultivated, and landrace individuals (Burke et al., 2002; Burke et al., 2005; Wills and Burke

2007). Surveys of variation in crop and wild genomes have allowed for the identification of

selected regions during sunflower domestication (Mandel et al., 2014; Baute et al., 2015) as well


as the assessment of mutational patterns associated with the domestication bottleneck (Renaut

and Rieseberg, 2015). Genome-level data have recently been used in the genus to address

questions of adaptation in ecotypes of H. petiolaris (Andrew and Rieseberg, 2013) and H.

annuus (Moyers and Rieseberg, 2013) as well as speciation in sister species pairs of Helianthus

(Renaut et al., 2014).

Purpose of study

This study characterizes the range-wide phenotypic and genetic diversity of H. annuus.

Before undertaking any study of adaptation it is essential to identify phenotypic differences

among populations and have an understanding of the level of genetic structuring in populations

of interest. Here I first establish baseline levels of genetic diversity and structure. In tandem, I

phenotype populations from across the range for traits such as flowering time, plant height,

branching, as well as seed physical dimensions, oil quantity, and fatty acid content. I use high

throughput sequencing to perform a genome-scan for high levels of genetic structure in a series

of three populations that span the latitudinal range of this species. By combining population

genetic evidence with independent QTL data I was able to further refine a list of putative

adaptive genes. Finally, I use RNA-seq to uncover differentially expressed genes that correlate

with the adaptive phenotype of fatty acid profile in wild sunflower seeds. Taken together, these

studies represent a multi-faceted view of adaptation in wild sunflower.

References

Allard RW, Babbel GR, Clegg MT, Kahler AL (1972) Evidence for coadaptation in Avena

barbata. Proc Natl Acad Sci U S A 69: 3043-3048.

Andrew RL, Rieseberg LH (2013) Divergence is focused on few genomic regions early in

speciation: incipient speciation of sunflower ecotypes. Evolution 67: 2468-2482.

Angert AL, Schemske DW (2005) The evolution of species' distributions: reciprocal transplants

across the elevation ranges of Mimulus cardinalis and M. lewisii. Evolution 59: 1671-

1684.

Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, et al. (2008) Rapid SNP discovery and

genetic mapping using sequenced RAD markers. PLoS One 3: e3376.

Baute GJ, Kane NC, Grassa CJ, Lai Z, Rieseberg LH (2015) Genome scans reveal candidate

domestication and improvement genes in cultivated sunflower, as well as post-

domestication introgression with wild relatives. New Phytol 206: 830-838.

Bennington CC, Fetcher N, Vavrek MC, Shaver GR, Cummings KJ, et al. (2012) Home site

advantage in two long-lived arctic plant species: results from two 30-year reciprocal

transplant studies. Journal of Ecology 100: 841-851.

Bradshaw AD (1960) Population differentiation in Agrostis tenuis Sibth. III. Populations in

varied environments. The New Phytologist 59: 92-103.

Bryja J, Smith C, Konecny A, Reichard M (2010) Range-wide population genetic structure of the

European bitterling (Rhodeus amarus) based on microsatellite and mitochondrial DNA

analysis. Mol Ecol 19: 4708-4722.

Burke JM, Burger JC, Chapman MA (2007) Crop evolution: from genetics to genomics. Curr

Opin Genet Dev 17: 525-532.

Burke JM, Knapp SJ, Rieseberg LH (2005) Genetic consequences of selection during the

evolution of cultivated sunflower. Genetics 171: 1933-1940.

Burke JM, Tang S, Knapp SJ, Rieseberg LH (2002) Genetic analysis of sunflower domestication.

Genetics 161: 1257-1267.

Byars SG, Papst W, Hoffmann AA (2007) Local adaptation and cogradient selection in the

alpine plant, Poa hiemata, along a narrow altitudinal gradient. Evolution 61: 2925-2941.

Clausen J, Keck DD, Hiesey WM (1941) Regional differentiation in plant species. The

American Naturalist 75: 231-250.

Clausen J, Keck DD, Hiesey WM (1947) Heredity of geographically and ecologically

isolated races. The American Naturalist 81: 114-133.

Clegg MT, Allard RW (1972) Patterns of genetic differentiation in the slender wild oat species

Avena barbata. Proc Natl Acad Sci U S A 69: 1820-1824.

Colautti RI, Barrett SCH (2013) Rapid adaptation to climate facilitates range expansion of an

invasive plant. Science 342: 364-366.

Crites GD (1993) Domesticated sunflower in 5th millennium BP temporal context - New

evidence from Middle Tennessee. American Antiquity 58: 146-148.

Ellison CE, Hall C, Kowbel D, Welch J, Brem RB, et al. (2011) Population genomics and local

adaptation in wild isolates of a model microbial eukaryote. Proc Natl Acad Sci U S A

108: 2831-2836.

Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, et al. (2011) A robust, simple

genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:

e19379.

Franks SJ, Sim S, Weis AE (2007) Rapid evolution of flowering time by an annual plant in

response to a climate fluctuation. Proc Natl Acad Sci U S A 104: 1278-1282.

Gonzalo-Turpin H, Hazard L (2009) Local adaptation occurs along altitudinal gradient despite

the existence of gene flow in the alpine plant species Festuca eskia. Journal of Ecology

97: 742-751.

Hamrick JL, Allard RW (1972) Microgeographical variation in allozyme frequencies in Avena

barbata. Proc Natl Acad Sci U S A 69: 2100-2104.

Hamrick JL, Holden LR (1979) Influence of microhabitat heterogeneity on gene frequency

distribution and gametic phase disequilibrium in Avena barbata. Evolution 33: 521-533.

Hancock AM, Brachi B, Faure N, Horton MW, Jarymowycz LB, Sperone FG,

Toomajian C, Roux F, Bergelson J (2011) Adaptation to climate across the

Arabidopsis thaliana genome. Science 334: 83-86.

Hiesey WM, Clausen J, Keck DD (1942) Relations between climate and intraspecific

variation in plants. The American Naturalist 76: 5-22.

Hohenlohe PA, Bassham S, Etter PD, Stiffler N, Johnson EA, et al. (2010) Population genomics

of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet

6: e1000862.

Kane NC, Burke JM, Marek L, Seiler G, Vear F, et al. (2013) Sunflower genetic, genomic and

ecological resources. Mol Ecol Resour 13: 10-20.

Kawecki TJ, Ebert D (2004) Conceptual issues in local adaptation. Ecology Letters 7: 1225-

1241.

King TL, Kalinowski ST, Schill WB, Spidle AP, Lubinski BA (2001) Population structure of

Atlantic salmon (Salmo salar L.): a range-wide perspective from microsatellite DNA

variation. Mol Ecol 10: 807-821.

Mandel JR, McAssey EV, Nambeesan S, Garcia-Navarro E, Burke JM (2014) Molecular

evolution of candidate genes for crop-related traits in sunflower (Helianthus annuus L.).

PLoS One 9: e99620.

Moyers BT, Rieseberg LH (2013) Divergence in gene expression is uncoupled from divergence

in coding sequence in a secondarily woody sunflower. International Journal of Plant

Sciences 174: 1079-1089.

Renaut S, Owens GL, Rieseberg LH (2014) Shared selective pressure and local genomic

landscape lead to repeatable patterns of genomic divergence in sunflowers. Mol Ecol 23:

311-324.

Renaut S, Rieseberg LH (2015) The accumulation of deleterious mutations as a consequence of

domestication and improvement in sunflowers and other Compositae crops. Mol Biol

Evol 32: 2273-2283.

Rieseberg LH, Raymond O, Rosenthal DM, Lai Z, Livingstone K, et al. (2003) Major ecological

transitions in wild sunflowers facilitated by hybridization. Science 301: 1211-1216.

Schoville SD, Barreto FS, Moy GW, Wolff A, Burton RS (2012) Investigating the molecular

basis of local adaptation to thermal stress: population differences in gene expression

across the transcriptome of the copepod Tigriopus californicus. BMC Evol Biol 12: 170.

Smith NG, Eyre-Walker A (2002) Adaptive protein evolution in Drosophila. Nature 415: 1022-

1024.

Spinks PQ, Shaffer HB (2005) Range-wide molecular analysis of the western pond turtle (Emys

marmorata): cryptic variation, isolation by distance, and their conservation implications.

Mol Ecol 14: 2047-2064.

Tanksley SD, McCouch SR (1997) Seed banks and molecular maps: unlocking genetic potential

from the wild. Science 277: 1063-1066.

Turesson G (1925) The plant species in relation to habitat and climate. Hereditas 6: 147-236.

Vitt P, Havens K, Kramer AT, Sollenberger D, Yates E (2010) Assisted migration of plants:

Changes in latitudes, changes in attitudes. Biological Conservation 143: 18-27.

Wang H, McArthur ED, Sanderson SC, Graham JH, Freeman DC (1997) Narrow hybrid

zone between two subspecies of big sagebrush (Artemisia tridentata: Asteraceae). IV.

Reciprocal transplant experiments. Evolution 51: 95-102.

Warwick SI, Briggs D (1980) The genecology of lawn weeds. V. The adaptive significance of

different growth habit in lawn and roadside populations of Plantago major L. The New

Phytologist 85: 289-300.

Waser NM, Price MV (1985) Reciprocal transplant experiments with Delphinium nelsonii

(Ranunculaceae): Evidence for local adaptation. American Journal of Botany 72: 1726-

1732.

Wills DM, Burke JM (2007) Quantitative trait locus analysis of the early domestication of

sunflower. Genetics 176: 2589-2599.

Wulff BBH, Thomas CM, Parniske M, Jones JDG (2004) Genetic variation at the tomato

Cf-4/Cf-9 locus induced by EMS mutagenesis and intralocus recombination. Genetics

167: 459-470.

CHAPTER II

RANGE-WIDE PHENOTYPIC AND GENETIC DIFFERENTIATION IN WILD

SUNFLOWER1

1 McAssey, E.V., Corbi, J., Blackman, B.K., Burke, J.M. To be submitted to BMC Plant Biology

Abstract

Divergent phenotypes and genotypes are key signals for identifying the targets of natural

selection in locally adapted populations. Here, we use a combination of common garden

phenotyping for a variety of growth, plant architecture, and seed traits and SNP genotyping to

characterize range-wide patterns of diversity in 15 populations of wild sunflower (Helianthus

annuus L.) sampled along a latitudinal gradient in central North America. We analyzed

geographic patterns of phenotypic diversity, quantified levels of within-population SNP

diversity, and also determined the extent of population structure across the range of this species.

We then used these data to identify significantly over-differentiated loci as markers of genes or

genomic regions conferring local adaptation. Traits including flowering time, plant height, and

seed oil composition (i.e., percentage of saturated fatty acids) were found to correlate with

latitude, and thus differentiated northern vs. southern populations. Average pairwise FST was

found to be 0.21, and a STRUCTURE analysis identified two significant clusters that largely

separated northern and southern individuals. The significant FST outliers included a SNP in FT2,

a flowering time gene that has been previously shown to co-localize with flowering time QTL,

and exhibits a known cline in gene expression.

Introduction

Local adaptation, wherein populations have higher fitness in their ‘home’ environments

than in non-native locales, is a topic of great interest in the field of evolutionary biology (e.g.,

Kawecki and Ebert 2004). The genetic basis of such adaptive divergence has not, however, been

elucidated in the vast majority of non-model organisms. For plants, the selective pressures

leading to local adaptation can include a variety of abiotic and biotic factors such as: soil type

(Sambatti and Rice 2006; Turner et al., 2008; Turner et al., 2010), water availability (Knight et

al., 2006), photoperiod (Riihimäki and Savolainen 2004), temperature (Arnone and Körner

1997), herbivores (Sork et al. 1993), mycorrhizal associations (Johnson et al. 2010), and

proximity to agricultural fields (Mercer et al. 2006). Because these selective pressures are

expected to produce characteristic patterns of genetic variation in and near genes conferring

adaptive differences, population genetic approaches have the potential to provide insight into the

genes, or at least genomic regions, responsible for producing locally adapted traits across the

range of a species.

In the case of divergent selection, which would be expected to play an important role in

the production of locally adapted populations, the focus is typically on measures of population

genetic differentiation. More specifically, divergent selective pressures would be expected to

produce elevated population structure in the vicinity of targeted genes relative to the genome-

wide average (e.g., Lewontin and Krakauer 1973; Beaumont and Nichols 1996; Excoffier and

Lischer 2010; Foll and Gaggiotti 2008). In contrast, balancing selection (or a range-wide

selective sweep) would be expected to result in much lower levels of population genetic

differentiation (Polley et al. 2003; Cagliani et al. 2008). When combined with high-throughput

genotyping approaches, such population genetic approaches have been used to identify genes

thought to be involved in adaptation in a variety of species, including boreal black spruce

(Prunier et al. 2011), Atlantic cod (Nielsen et al. 2009), prairie-chickens (Bollmer et al. 2011),

and moor frogs (Richter-Boix et al. 2011).

In addition to overall levels of population differentiation, clinal patterns of genetic

variation can also be indicative of local adaptation (e.g., Coop et al. 2010; Kooyers and Olsen,

2012). A variety of environmental variables typically vary across the ranges of species, and thus

there may be selection for different phenotypic values at the extremes of a species’ range. While

allele frequencies at many loci might exhibit weak correlations across a given environmental

contrast due to the joint effects of genetic drift and gene flow, alleles at loci that play an

important role in local adaptation should clearly correlate with relevant environmental variables

(Coop et al., 2010). For example, putative adaptive clines in allele frequency have been

identified in Arabidopsis thaliana for the flowering time genes FRIGIDA (Stinchcombe et al.

2004) and PHYTOCHROME C (Balasubramanian et al. 2006), in Populus tremula for the

flowering time gene PHYTOCHROME B2 (Ingvarsson et al. 2006), in Drosophila melanogaster

for the insulin-signaling gene INSULIN-LIKE RECEPTOR (Paaby et al. 2010), and in

Peromyscus polionotus for the coat color gene AGOUTI (Mullen and Hoekstra 2008). While the

above studies have provided tremendous insight into the genetic basis of local adaptation, studies

of non-model organisms will help to broaden our understanding of this fundamental evolutionary

process. In the present paper, we report on range-wide patterns of phenotypic and genetic

diversity in common sunflower, Helianthus annuus.

Sunflower is a member of the Compositae (a.k.a., the Asteraceae), which is one of the

largest and most diverse families of flowering plants. The native range of common sunflower

spans much of North America, and wild populations occur in habitats that are characterized by

variation in a wide range of environmental variables, including: photoperiod, growing season,

minimum/maximum temperatures, and precipitation. Common sunflower is also the wild

progenitor of cultivated sunflower (also H. annuus), which is native to east-central North

America (Crites 1993; Harter 2004; Blackman et al. 2011b) and is one of the world’s most

important oilseed crops (Schneiter 1997).

Here, we describe patterns of phenotypic and genetic diversity within and among 15 wild

sunflower populations across a latitudinal gradient in central North America. We grew and

phenotyped individuals from these populations in a common greenhouse environment and

genotyped them using a SNP array targeting 384 loci distributed throughout the sunflower

genome. We used these data to investigate geographic patterns of phenotypic differentiation,

describe overall patterns of population genetic variation, and identify loci that harbor the

population genetic signature of local adaptation. We also placed our population genetic results in

the context of prior QTL mapping studies in sunflower to determine whether highly

differentiated loci co-localize with known QTL regions.

Materials and Methods

Plant materials and phenotypic analyses

Seeds from 15 wild-collected populations of H. annuus were obtained from the USDA’s

North Central Regional Plant Introduction Station (Ames, IA). These populations, which were

sampled from a range of latitudes across central North America (Figure 2.1; Table 2.1), were

selected to represent truly wild populations that appear to be free from the effects of past

introgression with cultivated sunflower (L. Marek and G. Seiler, personal communication). Prior

to germination, all seeds were cleaned with 3% hydrogen peroxide, rinsed with deionized water,

and placed on moist filter paper in a petri dish. To break dormancy, petri dishes were placed at 4

°C in a dark cold room for 14 days. After the cold treatment, they were moved into a growth room

where they were maintained under 16-hour days at 23 °C. Following germination, seedlings were

planted in soil trays. Once established, these seedlings were transplanted into soil pots (900

Classic, Nursery Supplies Inc, Kissimmee, FL) and moved to the greenhouse, where

supplemental lighting provided a consistent cycle of 16-hour days and 8-hour nights.

Plants were arranged in the greenhouse in four blocks, each of which contained five

individuals from each of the 15 populations (75 individuals total per block). All plants were

phenotyped for a variety of traits, including: days to four pairs of true leaves, days to flowering,

plant height at senescence, branching architecture, seed size, and seed oil content/composition.

Because wild sunflower is self-incompatible, manual crosses were performed to produce seeds.

This involved intercrossing individuals within populations (i.e., bulked pollen collected from

individuals within a population was used to pollinate individuals within that population), with

inflorescences being bagged to prevent cross-contamination. Seeds were then collected at

physiological maturity and phenotyped. Oil traits were assessed following established protocols

(Burke et al. 2005). Briefly, percent oil content was determined via pulsed nuclear magnetic

resonance (NMR) analyses using a Bruker MQ20 Minispec NMR analyzer (Billerica, MA) that

had been calibrated with known standards. Fatty acid composition was determined by gas

chromatography (Hewlett-Packard, Palo Alto, CA) with known fatty acid standards (Nu-Check

Prep, Elysian, MN).

All traits were tested for deviations from normality by applying the Shapiro-Wilk test in

JMP 11 (SAS Institute, Cary, NC) to the distribution of trait values across all 286 fully grown

individuals, and trait

values were transformed using a Box-Cox transformation (Box and Cox 1964) as necessary.

Restricted maximum likelihood was used to fit a mixed model with region as a fixed effect and

with block and the block-by-region interaction as random effects, which allowed us to test for

regional differences in trait values while accounting for variation amongst blocks. For fatty acid

traits, the date of fatty acid extraction

was used as a blocking factor instead of greenhouse block because an inspection of the raw data

indicated clear variation in extraction efficiency across days. Least squares means were

compared amongst regions using Tukey’s test.
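
As an illustration only (the analyses reported here were run in JMP), the Box-Cox step described above can be sketched in Python. The function names and the grid of candidate lambda values are assumptions introduced for this sketch; lambda is chosen by maximizing the usual profile log-likelihood under a normal model for the transformed values.

```python
import math


def box_cox(values, lam):
    """Box-Cox power transform of strictly positive trait values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda = 0 case is defined as the natural log.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_loglik(values, lam):
    """Profile log-likelihood of lambda (constants dropped)."""
    n = len(values)
    y = box_cox(values, lam)
    mean_y = sum(y) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y) / n
    # Variance term plus the Jacobian of the transformation.
    return -0.5 * n * math.log(var_y) + (lam - 1.0) * sum(math.log(v) for v in values)


def best_lambda(values, grid=None):
    """Pick the lambda on a coarse grid (hypothetical default: -2.0 to 2.0)."""
    grid = grid or [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))
```

The grid search is a simplification; statistical packages typically optimize lambda numerically, and the sketch assumes the trait values are not all identical.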

DNA extractions and SNP genotyping

Leaf tissue was harvested from 286 of the 300 (Table 2.1) individuals described above

and DNA was extracted using the Qiagen DNeasy Plant Mini Kit (Valencia, CA). All DNA

samples were quantified using a NanoDrop (Thermo Scientific, Wilmington, DE) and diluted to

50 ng/µl prior to genotyping. Each sample was then genotyped using a GoldenGate assay

(Illumina, San Diego, CA) targeting 384 SNPs selected from the larger collection of sunflower

SNPs described by Bachlava et al. (2012). These loci were chosen to provide even coverage of

the 17 sunflower linkage groups (LGs), with an average of one SNP every 3.5 cM. Genotype

calls were made using Illumina’s GenomeStudio (ver. 2011.1) followed by manual inspection.

Loci that exhibited aberrant hybridization signals (perhaps due to presence/absence variation or

the occurrence of duplicate genes), an overall lack of polymorphism (i.e., minor allele frequency

< 0.05), and/or large amounts of missing data (i.e., fraction of missing data > 0.05) were

removed prior to population genetic analysis. A total of 246 loci (average = 14.5 per LG; range =

11-20 per LG) were retained for further analysis.
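
The polymorphism and missing-data filters described above can be expressed as a short Python sketch. This is not the pipeline used in this study (genotype calls were made and inspected in GenomeStudio); the data layout (alternate-allele counts of 0, 1, or 2, with None for a missing call) and the function names are assumptions made for illustration. Only the two thresholds (minor allele frequency < 0.05 and fraction of missing data > 0.05) come from the text.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from alt-allele counts (0, 1, 2); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def missing_fraction(genotypes):
    """Fraction of individuals with no genotype call at this locus."""
    return sum(g is None for g in genotypes) / len(genotypes)


def filter_loci(loci, min_maf=0.05, max_missing=0.05):
    """Keep loci that pass both the MAF and the missing-data thresholds."""
    kept = {}
    for name, genotypes in loci.items():
        if minor_allele_frequency(genotypes) < min_maf:
            continue  # effectively monomorphic
        if missing_fraction(genotypes) > max_missing:
            continue  # too many failed calls
        kept[name] = genotypes
    return kept
```

Loci with aberrant hybridization signals were removed by manual inspection and are not covered by this sketch.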

Population genetic analyses

Measures of genetic diversity, including the percentage of polymorphic loci and observed

(Ho) and Nei’s unbiased expected heterozygosity (UHe; Nei 1978) were calculated at the

population level using GenAlEx (version 6.501; Peakall and Smouse 2006). We also used

GenAlEx to investigate genetic differentiation amongst populations by performing an analysis of

molecular variance (AMOVA) with 999 permutations to determine the level of population

structure in our dataset. Finally, the program STRUCTURE (Pritchard et al., 2000) was used to

investigate population genetic structure across the species range. Specifically, STRUCTURE was

run from K = 1 to 17 population genetic clusters with a burn-in of 100,000 and 1,000,000

MCMC iterations (with 20 replicates for each K value). Results were imported into

STRUCTURE Harvester (Earl and vonHoldt 2012) where the most likely value of K was

determined using the deltaK method (Evanno et al. 2005). STRUCTURE was additionally used

to test individual subsets of the data to investigate finer levels of genetic structure.
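
For readers unfamiliar with the diversity statistics named above, the following Python sketch computes observed heterozygosity (Ho), expected heterozygosity (He), and a sample-size-corrected expected heterozygosity (UHe) for a single biallelic SNP in one population. It assumes the common form UHe = (2N / (2N - 1)) x He; the genotype encoding (alternate-allele counts, None for missing) and the function name are hypothetical, and the values reported in this chapter were calculated in GenAlEx rather than with this code.

```python
def heterozygosity(genotypes):
    """Return (Ho, He, UHe) for one biallelic locus in one population.

    genotypes: alt-allele counts per individual (0, 1, 2); None = missing.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n < 1:
        raise ValueError("no called genotypes at this locus")
    # Observed heterozygosity: proportion of heterozygous individuals.
    ho = sum(1 for g in called if g == 1) / n
    # Expected heterozygosity from allele frequencies: 1 - sum(p_i^2).
    p = sum(called) / (2 * n)
    he = 1.0 - (p ** 2 + (1.0 - p) ** 2)
    # Small-sample correction applied to He.
    uhe = (2 * n / (2 * n - 1)) * he
    return ho, he, uhe
```

Population-level values would then be obtained by averaging over loci.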

The potential role of local selective pressures in shaping diversity at individual loci was

investigated using multiple approaches. First, we used Arlequin (Version 3.5.1.2; Excoffier and

Lischer 2010) to run 20,000 simulations in order to obtain a null distribution for FST, which was

then used to develop a 99% confidence interval for outlier identification. In general terms,

over-differentiated loci are regarded as candidates for local adaptation, while under-

differentiated loci are generally viewed as candidates for balancing selection (Polley et al. 2003;

Cagliani et al. 2008) or a range-wide sweep. BayeScan (Version 2.1) was also used to test for

selection by comparing the posterior probabilities of two models (selection vs. no selection) for

each locus (Foll and Gaggiotti 2008). Following Foll and Gaggiotti (2008), loci whose

posterior probability for the model including selection was greater than 0.91 were regarded as

being ‘strong’ FST outlier candidates. We then mined the sunflower QTL literature to identify

any QTL whose confidence interval co-localized with a putative local adaptation SNP identified

in this study. Co-localization information was obtained using previously published studies from a

variety of sunflower crosses (Burke et al. 2002; Burke et al. 2005; Wills and Burke 2007; Baack

et al. 2008; Dechaine et al. 2009).
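
The logic of an FST outlier scan can be illustrated with a simplified Python sketch. It is not a reimplementation of Arlequin or BayeScan: FST is computed here as (HT - HS) / HT from per-population allele frequencies with equal population weights, and the null distribution is simply passed in as a list of values rather than simulated. The function names, the equal weighting, and the one-sided empirical cutoff are all assumptions made for illustration.

```python
def fst_biallelic(pop_freqs):
    """FST = (HT - HS) / HT for one biallelic locus, equal population weights."""
    k = len(pop_freqs)
    p_bar = sum(pop_freqs) / k
    # Mean within-population expected heterozygosity.
    hs = sum(2.0 * p * (1.0 - p) for p in pop_freqs) / k
    # Expected heterozygosity of the pooled populations.
    ht = 2.0 * p_bar * (1.0 - p_bar)
    if ht == 0:
        return 0.0  # locus is fixed everywhere; no differentiation to measure
    return (ht - hs) / ht


def upper_threshold(null_values, quantile=0.995):
    """Empirical upper cutoff from a null distribution of FST values."""
    ordered = sorted(null_values)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def flag_high_outliers(observed, null_values, quantile=0.995):
    """Names of loci whose FST exceeds the upper null cutoff."""
    cutoff = upper_threshold(null_values, quantile)
    return [name for name, fst in observed.items() if fst > cutoff]
```

The default quantile of 0.995 corresponds to the upper bound of a two-sided 99% interval; under-differentiated loci would be flagged analogously against the lower bound.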

Results

Phenotypic diversity

We identified numerous traits that exhibited differentiation amongst the five sampled

regions, with latitude being a significant factor in the partitioning of phenotypic diversity for

traits such as flowering time, plant height, branching, and a number of seed oil traits

(Supplemental Table 2.3). Individuals from the southern regions (Texas and Oklahoma, Regions

1 and 2; Supplemental Table 2.3) tended to flower later, grow taller, and have a higher

proportion of saturated fatty acids within their seeds compared to individuals from the northern

regions found in Saskatchewan, North Dakota and Montana (Regions 4 and 5; Supplemental

Table 2.3). The fatty acid composition data also showed some interesting trends, with the

saturated type (i.e., palmitic and stearic acid) showing the same sort of regional differentiation as

noted above. In contrast, the unsaturated types (i.e., oleic acid and linoleic acid) did not show

significant differences between regions. Seed oil content showed no significant differences

among regions across the entire range (Supplemental Table 2.3). Aside from the aforementioned

differentiation in saturated fatty acid percentage in seed oils, regions were significantly

differentiated for seed length with respect to latitude. While seed weight and seed width both

exhibit some regional differences, the differences were not due to latitude as the most southern

region was not significantly different from the most northern region for these two traits

(Supplemental Table 2.3). Notably, the latitudinal trends found in saturated fatty acid content and

flowering time are consistent with the results of previous studies (Linder 2000; Blackman et al.

2011a). There was no significant latitudinal difference in total branching, although plants from

Texas and Oklahoma (Regions 1 and 2; Supplemental Table 2.3) had significantly more top

branching compared to the three northern regions. Other plant architecture traits, such as branch

length and the extent of secondary, tertiary, or higher-order branching, were significantly

different when regions were compared, but those differences did not show a latitudinal pattern

(Supplemental Table 2.3). Interestingly, no traits exhibited significant differentiation between all

five regions (Supplemental Table 2.3).

Population genetic structure

Calculation of population genetic statistics for each of the 15 populations revealed a

substantial, albeit variable, amount of genetic diversity across the range of wild sunflower (Table

2.2). An analysis of molecular variance revealed that approximately 20% of the observed genetic

variation could be attributed to population level differentiation. Of the remaining genetic

variation, 76% was seen at the within-individual level whereas only 4% was found at the

among-individual level. A STRUCTURE (Pritchard et al. 2000) analysis of the data coupled

with the deltaK method for determining the most likely number of population genetic clusters

(Evanno et al. 2005) identified K = 2 clusters (Figure 2.2; Supplemental figure 2.1). The

STRUCTURE bar plot for K = 2 revealed a north-south divide with Region 3 corresponding

to a transition zone (Figure 2.2). An additional STRUCTURE run containing only the

southernmost six populations also indicated K = 2. For this level of K, TX1 was separated

from the remaining five populations found in Texas and Oklahoma (Supplemental figure 2.2).

When the northernmost six populations were analyzed by STRUCTURE, K = 2 was again the

most well-supported number of genetic groups. Similar to the result for the southern portion of

the range, only a single population (ND1) in the northern portion of the range was separated

from the other five populations at K = 2 (Supplemental figure 2.4).
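
The deltaK statistic used to choose K can be sketched as follows. This is a minimal Python illustration that works from the mean log-likelihood of the replicate runs at each K, which is an assumption about the exact averaging; the input layout (a dictionary mapping K to replicate ln P(D|K) values) and the function name are hypothetical, and the likelihood values in the usage note are invented toy numbers, not results from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """deltaK for each interior K, given replicate ln P(D|K) values per K.

    deltaK(K) = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), with L(K) the mean
    log-likelihood across replicates at that K.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined at the smallest and largest K tested
        second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        result[k] = second_diff / sd
    return result
```

For example, `delta_k({1: [-1000, -1002], 2: [-800, -802], 3: [-790, -792], 4: [-785, -787]})` returns values for K = 2 and K = 3 only, with the larger value at K = 2. Because the statistic requires both neighbors, it cannot be evaluated at the smallest or largest K tested.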

Outlier identification

Multiple outlier identification programs highlighted the existence of an overlapping set of

loci that exhibit the signature of local adaptation (Table 2.3). Arlequin identified eight loci that

were highly differentiated in a global FST calculation (all possible pairwise FST combinations;

99% confidence intervals). These loci included: one SNP on LG4 with homology to a

hydroxyproline-rich glycoprotein family protein; two SNPs located near the distal end of LG 6,

one in FT2 (Blackman et al. 2010; Blackman et al. 2011a) and the other in a gene with homology

to a mitogen-activated protein kinase kinase kinase 14; one SNP on LG7 in a gene with high

similarity to a gene in the ARM repeat family of proteins in A. thaliana; one SNP on LG10 in the

GRAS/DELLA transcription factor GAI; two SNPs on LG 12, one corresponding to an EF-hand-

like domain-containing gene, and the other corresponding to a protein of unknown function; and

one SNP located on LG 14 in a gene with high similarity to Defective Cuticle Ridges (DCR) in A.

thaliana. BayeScan provided complementary outlier results by identifying three highly

differentiated loci (SNPs within the DCR homolog, the GRAS/DELLA transcription factor, and

the gene containing the EF-hand-like domain) already highlighted by Arlequin. Four loci had

evidence of being significantly under-differentiated from both Arlequin and BayeScan. There

were two under-differentiated loci on LG 13, including one SNP in a gene with an alpha-beta

plait nucleotide binding role and another SNP in a gene with homology to 5’-AMP-activated

protein kinase. SNPs in a glycoside hydrolase and a guanylate binding gene also had

exceptionally low FST, and were found on LGs 8 and 17, respectively.

Outlier co-localization with known QTL

The locations of our eight over-differentiated loci were compared to the locations of

previously mapped sunflower QTL to identify traits potentially involved in local adaptation. On

LG 4, an unannotated gene co-localized with a QTL for leaf number (Dechaine et al. 2009). As

noted above the distal end of LG 6 contains two FST outliers: FT2 and a gene with a putative

kinase function. Both of these co-localize with QTL related to flowering time in two sunflower

mapping populations (Table 2.3; ANN1238 × CMS 89 [Burke et al. 2002] and ANN1238 ×

Hopi [Wills and Burke 2007]). This genomic region is actually known to contain multiple FT

paralogs, including FT1, which has been shown to be important with respect to cultivated

sunflower’s photoperiod response (Blackman et al. 2010; Blackman et al. 2011a). In addition to

co-localization with the flowering time QTL in this region, there are QTL for morphological

traits (e.g., achene width, plant height, number of ray flowers) and even a QTL for leaf fungal

damage. The outlier on LG 7, a SNP from an EST with homology to an ARM repeat protein, co-

localizes with QTL for flowering time, plant height, leaf number, and head herbivory, as well

(Burke et al. 2002; Dechaine et al. 2009). Interestingly, two loci with strong support from both

Arlequin and BayeScan (the GRAS/DELLA transcription factor and the DCR homolog, which

map to LGs 10 and 14, respectively), did not co-localize with any known QTL. One of the two

outliers on LG12, an unannotated gene, co-localized with leaf shape and number of heads (Burke

et al., 2002). Finally, the EF-hand-domain containing gene co-localized with a QTL for head

total (one way of describing the degree of branching), as well as leaf and branch traits, found on

LG 12 (Table 2.3).

Under-differentiated loci co-localized with QTL for a variety of different traits. Of

particular interest were two low FST outliers located near each other on LG 13 that co-localized

with a shared set of QTL that included: number of branches, number of heads, head and leaf

herbivory, stem diameter, achene length, leaf area, and stem height (Burke et al. 2002; Wills and

Burke 2007; Baack et al. 2008; Dechaine et al. 2009).

Discussion

Populations across the range of wild sunflower harbor an exceptional amount of

phenotypic diversity. The extent to which those traits contribute to local adaptation is an

important question that can be addressed in a number of ways including reciprocal transplants,

common garden measurements, and population genome scans. In our analyses, many traits (e.g.,

flowering time, plant height, plant architecture, and seed oil composition) were differentiated in

conjunction with latitude. As sunflower is a seed oil crop, there has been a considerable amount of

research done to describe and uncover the genetic mechanism behind seed oil variation. In

breeding lines, strong artificial selection has created divergent germplasm groups with vastly

different oil profiles. In the wild, natural selection may act as a strong force in affecting what

relative amounts of saturated and unsaturated fatty acids are most beneficial for populations

living in certain environments.

Common garden phenotypic variation

Previous studies of seed oil composition in a variety of species have revealed a negative

correlation between latitude and the degree of fatty acid saturation in seed oils at a

relatively coarse geographic scale (Linder 2000). By quantifying the percentage of saturated fatty

acids across the range of sunflower, we were able to identify a similar trend (Supplemental Table

2.3), albeit at a finer geographic scale. Given that these plants were grown in a common garden,

we can infer that the observed differences have a genetic basis, and that functional

polymorphisms in the oil biosynthetic pathway exist across the range of wild sunflower. The

percentage of saturated fatty acids in seed oils is of considerable evolutionary importance with

respect to germination under different environmental conditions. Saturated fatty acids are known

to store more usable energy per carbon as compared to unsaturated fatty acids (Linder 2000), but

saturated fatty acids also have higher melting points than unsaturated fatty acids. The resulting

inference is that the production of unsaturated fatty acids in higher latitudes is advantageous

because it ensures energy availability at lower temperatures (Linder 2000). Conversely, saturated

fatty acids are better in lower latitudes because they are more energy rich while still being

available to germinating seeds due to the comparatively warmer temperatures.

Observed differences in flowering time can be interpreted in a similar framework.

Growing seasons tend to be shorter in higher latitudes; thus, there is a premium on flowering

early to allow seed set before the end of the growing season. Alternatively, in lower latitudes,

there is typically a longer growing season that may select for later flowering plants that may

grow to a larger size and produce more seeds. It must, however, be noted that plant height and

flowering time are developmentally correlated; as such they form a suite of inter-related traits

(Koester et al. 1993; Bezant et al. 1996).

Population genetic structure

The STRUCTURE analysis of the full dataset revealed an overall north/south division in

the natural range of wild sunflower, with a transitional zone occurring in the vicinity of Nebraska

and Wyoming. Previous sampling of H. annuus genetic diversity had hinted at a similar

north/south division (Mandel et al. 2011), and our analysis builds on this finding by increasing

the marker density and sampling density within each population. Historically, this latitudinal

transect has seen similar patterns of genetic differentiation. For example, using transplant

gardens, McMillan (1959) showed that multiple grassland species exhibited heritable

differences in flowering time in which northern populations flowered significantly earlier.

Candidate adaptive loci

In terms of population genetic differentiation, we identified candidate loci that may

contribute to local adaptation with respect to flowering time. We found two outlier

loci on chromosome 6 with SNPs that co-localize with a gene with putative kinase activity and

FT2. Both loci co-localize with previously identified QTL for flowering time (Burke et al. 2002;

Wills and Burke 2007) in addition to other traits (Table 2.3). FT2 is a gene whose Arabidopsis

homolog has been shown to play a major role in promoting flowering (Turck et al. 2008).

Moreover, the region of sunflower LG 6 where this gene resides has previously been shown to

influence flowering time in domesticated vs. wild sunflower (Burke et al. 2002; Wills and Burke

2007; Blackman et al. 2010). It should be noted that the mapping parents for these crosses

were wild × crop and wild × landrace pairs. The extent of linkage disequilibrium (LD) in

this region is currently unknown, although previous work indicates that, on average, LD decays

quickly in wild sunflower (Liu and Burke 2006). Studies of cultivated germplasm suggest that

there is variation in LD across the sunflower genome (Mandel et al. 2013). In addition to

mapping information, FT2 is an exceptional candidate for local adaptation due to previous gene

expression work across the range of wild H. annuus (Blackman et al. 2011a). In short days, a

cline in gene expression was seen for FT2 in which northern individuals exhibited higher

expression than southern individuals, consistent with this gene playing a role in adaptive

differentiation. Our results complement the observed latitudinal cline in FT2 gene

expression, which is consistent with the effects of selection, by providing population genetic

evidence of selection on this gene.

We uncovered SNPs with significantly elevated population differentiation values on other

chromosomes. A strongly differentiated SNP on LG 14 resides in the sunflower homolog of

Defective in Cuticle Ridges (DCR). In A. thaliana, mutants of DCR have altered trichome

development during leaf growth (Marks et al. 2009; Panikashvili et al. 2009). Trichomes serve a

multitude of functions in plants including: reflectance of sunlight to prevent damage (Manetas

2003), retention of water (Brewer et al. 1991), and defense (Levin 1973). As many of the

aforementioned factors may correlate with growing season, it is difficult to draw any conclusions

without additional data. The patterns identified in this research cannot be interpreted as

causative. Furthermore, since we lack knowledge concerning the strength of

linkage disequilibrium in this genomic region, these SNPs may simply be linked to causal

polymorphisms found in nearby genes.

These FST outliers form a list of possible candidate genes for future experiments.

Importantly, the extent of linkage disequilibrium needs to be assessed in these genomic regions

in order to determine the size of the region of elevated population structure. A possible

explanation for the absence of co-localizing QTL for some SNPs is that no wild × wild mapping

populations currently exist for sunflower. Alternatively, many subtle (trichome density or

morphology) and biochemical phenotypes have not been measured and thus could not have

co-localized with population differentiation. Marker density has become the main limitation in

genome scan studies of local adaptation in natural populations (Flint-Garcia et al. 2003). The

advent of high-throughput methods such as restriction site associated DNA sequencing (RAD-seq)

and genotyping by sequencing (GBS) has allowed researchers to obtain both large

numbers of markers and an even genomic distribution (Hohenlohe et al. 2010; Davey et al. 2011;

Elshire et al. 2011).

Conclusions

In this study, 246 loci were used to characterize the range-wide genetic diversity and structure of the

wild progenitor of an economically important crop species. Furthermore, these markers clearly

indicated a genetic disjunction between northern and southern populations that occurs around

40° north latitude, with Nebraska populations appearing to be admixed (Figure 2.2). This study

also generated multiple candidate genomic regions for local adaptation as defined by the extent

of their population genetic differentiation. The extent to which these genomic intervals are

associated with previous trait mapping experiments is also considered. These loci represent

larger physical genomic intervals that will be the focus of future molecular evolutionary

analyses, gene expression comparisons across the range, and field studies to further examine

their putative role in local adaptation.

Acknowledgements

We thank Scott Jackson’s laboratory in the Institute of Plant Breeding, Genetics and

Genomics at the University of Georgia for greenhouse space and access to lab equipment. We

thank members of the Burke lab for comments on an earlier version of this manuscript. Special

thanks to Caitlin Ishibashi and Jeff Roeder for assisting with the DNA extractions and to

Shannon Ritter, Michael Cherry, and Shreyas Vangala for assistance in phenotyping. This

research was supported by grants from the NSF Plant Genome Research Program (DBI-0820451

and DBI-1444522).

References

Arnone, J.A. and Korner, C. (1997) Temperature adaptation and acclimation potential of leaf

dark respiration in two species of Ranunculus from warm and cold habitats. Arctic Alpine

Res, 29, 122-125.

Baack, E.J., Sapir, Y., Chapman, M.A., Burke, J.M. and Rieseberg, L.H. (2008) Selection on

domestication traits and quantitative trait loci in crop-wild sunflower hybrids. Molecular

Ecology, 17, 666-677.

Bachlava, E., Taylor, C.A., Tang, S., Bowers, J.E., Mandel, J.R., Burke, J.M. and Knapp, S.J.

(2012) SNP discovery and development of a high-density genotyping array for sunflower.

PLoS One, 7, e29814.

Balasubramanian, S., Sureshkumar, S., Agrawal, M., Michael, T.P., Wessinger, C., Maloof, J.N.,

Clark, R., Warthmann, N., Chory, J. and Weigel, D. (2006) The PHYTOCHROME C

photoreceptor gene mediates natural variation in flowering and growth responses of

Arabidopsis thaliana. Nat Genet, 38, 711-715.

Beaumont, M.A. and Nichols, R.A. (1996) Evaluating loci for use in the genetic analysis of

population structure. P Roy Soc B-Biol Sci, 263, 1619-1626.

Bezant, J., Laurie, D., Pratchett, N., Chojecki, J. and Kearsey, M. (1996) Marker regression

mapping of QTL controlling flowering time and plant height in a spring barley (Hordeum

vulgare L.) cross. Heredity, 77, 64-73.

Blackman, B.K., Michaels, S.D. and Rieseberg, L.H. (2011a) Connecting the sun to flowering in

sunflower adaptation. Molecular Ecology, 20, 3503-3512.

Blackman, B.K., Scascitelli, M., Kane, N.C., Luton, H.H., Rasmussen, D.A., Bye, R.A., Lentz,

D.L. and Rieseberg, L.H. (2011b) Sunflower domestication alleles support single

domestication center in eastern North America. P Natl Acad Sci USA, 108, 14360-14365.

Blackman, B.K., Strasburg, J.L., Raduski, A.R., Michaels, S.D. and Rieseberg, L.H. (2010) The

Role of Recently Derived FT Paralogs in Sunflower Domestication. Current Biology, 20,

629-635.

Bollmer, J.L., Ruder, E.A., Johnson, J.A., Eimes, J.A. and Dunn, P.O. (2011) Drift and selection

influence geographic variation at immune loci of prairie-chickens. Molecular Ecology,

20, 4695-4706.

Box, G.E.P. and Cox, D.R. (1964) An Analysis of Transformations. J Roy Stat Soc B, 26, 211-

252.

Brewer, C.A., Smith, W.K. and Vogelmann, T.C. (1991) Functional interaction between leaf

trichomes, leaf wettability and the optical properties of water droplets. Plant, Cell &

Environment, 14, 955-962.

Burke, J.M., Tang, S., Knapp, S.J. and Rieseberg, L.H. (2002) Genetic analysis of sunflower

domestication. Genetics, 161, 1257-1267.

Burke, J.M., Knapp, S.J. and Rieseberg, L.H. (2005) Genetic consequences of selection during

the evolution of cultivated sunflower. Genetics, 171, 1933-1940.

Cagliani, R., Fumagalli, M., Riva, S., Pozzoli, U., Comi, G.P., Menozzi, G., Bresolin, N. and

Sironi, M. (2008) The signature of long-standing balancing selection at the human

defensin beta-1 promoter. Genome Biol, 9.

Coop, G., Witonsky, D., Di Rienzo, A. and Pritchard, J.K. (2010) Using Environmental

Correlations to Identify Loci Underlying Local Adaptation. Genetics, 185, 1411-1423.

Crites, G.D. (1993) Domesticated Sunflower in 5th Millennium Bp Temporal Context - New

Evidence from Middle Tennessee. Am Antiquity, 58, 146-148.

Davey, J.W., Hohenlohe, P.A., Etter, P.D., Boone, J.Q., Catchen, J.M. and Blaxter, M.L. (2011)

Genome-wide genetic marker discovery and genotyping using next-generation

sequencing. Nat Rev Genet, 12, 499-510.

Dechaine, J.M., Burger, J.C., Chapman, M.A., Seiler, G.J., Brunick, R., Knapp, S.J. and Burke,

J.M. (2009) Fitness effects and genetic architecture of plant-herbivore interactions in

sunflower crop-wild hybrids. New Phytologist, 184, 828-841.

Earl, D.A. and Vonholdt, B.M. (2012) STRUCTURE HARVESTER: a website and program for

visualizing STRUCTURE output and implementing the Evanno method. Conservation

Genetics Resources, 4, 359-361.

Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S. and Mitchell,

S.E. (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high

diversity species. PLoS One, 6, e19379.

Excoffier, L. and Lischer, H.E. (2010) Arlequin suite ver 3.5: a new series of programs to

perform population genetics analyses under Linux and Windows. Mol Ecol Resour, 10,

564-567.

Evanno, G., Regnaut, S. and Goudet, J. (2005) Detecting the number of clusters of individuals

using the software STRUCTURE: a simulation study. Molecular Ecology, 14, 2611-

2620.

Flint-Garcia, S.A., Thornsberry, J.M. and Buckler, E.S. (2003) Structure of linkage

disequilibrium in plants. Annu Rev Plant Biol, 54, 357-374.

Foll, M. and Gaggiotti, O. (2008) A Genome-Scan Method to Identify Selected Loci Appropriate

for Both Dominant and Codominant Markers: A Bayesian Perspective. Genetics, 180,

977-993.

Harter, A.V., Gardner, K.A., Falush, D., Lentz, D.L., Bye, R.A. and Rieseberg, L.H. (2004)

Origin of extant domesticated sunflowers in eastern North America. Nature, 430, 201-

205.

Hohenlohe, P.A., Bassham, S., Etter, P.D., Stiffler, N., Johnson, E.A. and Cresko, W.A. (2010)

Population genomics of parallel adaptation in threespine stickleback using sequenced

RAD tags. Plos Genet, 6, e1000862.

Ingvarsson, P.K., Garcia, M.V., Hall, D., Luquez, V. and Jansson, S. (2006) Clinal variation in

phyB2, a candidate gene for day-length-induced growth cessation and bud set, across a

latitudinal gradient in European aspen (Populus tremula). Genetics, 172, 1845-1853.

Johnson, N.C., Wilson, G.W.T., Bowker, M.A., Wilson, J.A. and Miller, R.M. (2010) Resource

limitation is a driver of local adaptation in mycorrhizal symbioses. P Natl Acad Sci USA,

107, 2093-2098.

Kawecki, T.J. and Ebert, D. (2004) Conceptual issues in local adaptation. Ecology Letters, 7, 1225-

1241.

Knight, C.A., Vogel, H., Kroymann, J., Shumate, A., Witsenboer, H. and Mitchell-Olds, T.

(2006) Expression profiling and local adaptation of Boechera holboellii populations for

water use efficiency across a naturally occurring water stress gradient. Mol Ecol, 15,

1229-1237.

Koester, R.P., Sisco, P.H. and Stuber, C.W. (1993) Identification of quantitative trait loci

controlling days to flowering and plant height in two near isogenic lines of maize. Crop

Sci, 33, 1209-1216.

Kooyers, N.J. and Olsen, K.M. (2012) Rapid evolution of an adaptive cyanogenesis cline in

introduced North American white clover (Trifolium repens L.). Molecular Ecology, 21,

2455-2468.

Levin, D.A. (1973) The Role of Trichomes in Plant Defense. The Quarterly Review of Biology,

48, 3-15.

Lewontin, R.C. and Krakauer, J. (1973) Distribution of Gene Frequency as a Test of Theory of

Selective Neutrality of Polymorphisms. Genetics, 74, 175-195.

Linder, C.R. (2000) Adaptive evolution of seed oils in plants: Accounting for the biogeographic

distribution of saturated and unsaturated fatty acids in seed oils. American Naturalist,

156, 442-458.

Liu, A. and Burke, J.M. (2006) Patterns of nucleotide diversity in wild and cultivated sunflower.

Genetics, 173, 321-330.

Mandel, J.R., Dechaine, J.M., Marek, L.F. and Burke, J.M. (2011) Genetic diversity and population

structure in cultivated sunflower and a comparison to its wild progenitor, Helianthus

annuus L. Theor Appl Genet, 123, 693-704.

Mandel, J.R., Nambeesan, S., Bowers, J.E., Marek, L.F., Ebert, D., et al. (2013) Association mapping

and the genomic consequences of selection in sunflower. PLoS Genet, 9, e1003378.

Manetas, Y. (2003) The importance of being hairy: the adverse effects of hair removal on stem

photosynthesis of Verbascum speciosum are due to solar UV-B radiation. New Phytol,

158, 503-508.

Marks, M.D., Wenger, J.P., Gilding, E., Jilk, R. and Dixon, R.A. (2009) Transcriptome analysis

of Arabidopsis wild-type and gl3-sst sim trichomes identifies four additional genes

required for trichome development. Mol Plant, 2, 803-822.

McMillan, C. (1959) The Role of Ecotypic Variation in the Distribution of the Central Grassland

of North America. Ecol Monogr, 29, 285-308.

Mercer, K.L., Wyse, D.L. and Shaw, R.G. (2006) Effects of competition on the fitness of wild

and crop-wild hybrid sunflower from a diversity of wild populations and crop lines.

Evolution, 60, 2044-2055.

Mullen, L.M. and Hoekstra, H.E. (2008) Natural selection along an environmental gradient: A

classic cline in mouse pigmentation. Evolution, 62, 1555-1569.

Nei, M. (1978) Estimation of average heterozygosity and genetic distance from a small number

of individuals. Genetics, 89, 583-590.

Nielsen, E.E., Hemmer-Hansen, J., Poulsen, N.A., Loeschcke, V., Moen, T., Johansen, T.,

Mittelholzer, C., Taranger, G.L., Ogden, R. and Carvalho, G.R. (2009) Genomic

signatures of local directional selection in a high gene flow marine organism; the Atlantic

cod (Gadus morhua). BMC Evolutionary Biology, 9.

Paaby, A.B., Blacket, M.J., Hoffmann, A.A. and Schmidt, P.S. (2010) Identification of a

candidate adaptive polymorphism for Drosophila life history by parallel independent

clines on two continents. Molecular Ecology, 19, 760-774.

Panikashvili, D., Shi, J.X., Schreiber, L. and Aharoni, A. (2009) The Arabidopsis DCR encoding

a soluble BAHD acyltransferase is required for cutin polyester formation and seed

hydration properties. Plant Physiol, 151, 1773-1789.

Peakall, R. and Smouse, P.E. (2006) GENALEX 6: genetic analysis in Excel. Population genetic

software for teaching and research. Mol Ecol Notes, 6, 288-295.

Polley, S.D., Chokejindachai, W. and Conway, D.J. (2003) Allele frequency-based analyses

robustly map sequence sites under balancing selection in a malaria vaccine candidate

antigen. Genetics, 165, 555-561.

Pritchard, J.K., Stephens, M. and Donnelly, P. (2000) Inference of population structure using

multilocus genotype data. Genetics, 155, 945-959.

Prunier, J., Laroche, J., Beaulieu, J. and Bousquet, J. (2011) Scanning the genome for gene SNPs

related to climate adaptation and estimating selection at the molecular level in boreal

black spruce. Molecular Ecology, 20, 1702-1716.

Richter-Boix, A., Quintela, M., Segelbacher, G. and Laurila, A. (2011) Genetic analysis of

differentiation among breeding ponds reveals a candidate gene for local adaptation in

Rana arvalis. Molecular Ecology, 20, 1582-1600.

Riihimaki, M. and Savolainen, O. (2004) Environmental and genetic effects on flowering

differences between northern and southern populations of Arabidopsis lyrata

(Brassicaceae). Am J Bot, 91, 1036-1045.

Sambatti, J.B. and Rice, K.J. (2006) Local adaptation, patterns of selection, and gene flow in the

Californian serpentine sunflower (Helianthus exilis). Evolution, 60, 696-710.

Schneiter, A.A., American Society of Agronomy., Crop Science Society of America. and Soil

Science Society of America. (1997) Sunflower technology and production. American

Society of Agronomy : Crop Science Society of America : Soil Science Society of

America, Madison, Wis.

Sork, V.L., Stowe, K.A. and Hochwender, C. (1993) Evidence for Local Adaptation in Closely

Adjacent Subpopulations of Northern Red Oak (Quercus rubra L) Expressed as

Resistance to Leaf Herbivores. American Naturalist, 142, 928-936.

Stinchcombe, J.R., Weinig, C., Ungerer, M., Olsen, K.M., Mays, C., Halldorsdottir, S.S.,

Purugganan, M.D. and Schmitt, J. (2004) A latitudinal cline in flowering time in

Arabidopsis thaliana modulated by the flowering time gene FRIGIDA. P Natl Acad Sci

USA, 101, 4712-4717.

Turck, F., Fornara, F. and Coupland, G. (2008) Regulation and identity of florigen:

FLOWERING LOCUS T moves center stage. Annu Rev Plant Biol, 59, 573-594.

Turner, T.L., Bourne, E.C., Von Wettberg, E.J., Hu, T.T. and Nuzhdin, S.V. (2010) Population

resequencing reveals local adaptation of Arabidopsis lyrata to serpentine soils. Nat

Genet, 42, 260-263.

Turner, T.L., von Wettberg, E.J. and Nuzhdin, S.V. (2008) Genomic analysis of differentiation

between soil types reveals candidate genes for local adaptation in Arabidopsis lyrata.

PLoS One, 3, e3183.

Wills, D.M. and Burke, J.M. (2007) Quantitative trait locus analysis of the early domestication of

sunflower. Genetics, 176, 2589-2599.

TABLES

Table 2.1 – Range-wide population sampling information.

Population  State / Province  Country  Sample Size  Latitude  Longitude
TX1         Texas             USA      20           31.041    -104.821
TX2         Texas             USA      20           31.189    -103.578
TX3         Texas             USA      20           31.206    -102.635
TX4         Texas             USA      20           35.190    -102.010
TX5         Texas             USA      20           35.199    -100.799
OK1         Oklahoma          USA      12           35.262     -99.669
NE1         Nebraska          USA      20           41.063     -98.091
NE2         Nebraska          USA      20           41.211    -101.649
WY1         Wyoming           USA      20           41.418    -104.098
MT1         Montana           USA      20           46.585    -108.592
MT2         Montana           USA      20           46.795    -105.302
ND1         North Dakota      USA      20           46.879    -102.789
SAS1        Saskatchewan      Canada   19           50.048    -104.707
SAS2        Saskatchewan      Canada   15           50.394    -108.480
SAS3        Saskatchewan      Canada   20           50.660    -105.665

Table 2.2 – Population genetic statistics for 15 wild sunflower populations

Population  Statistic  Na(a)  Ne(b)  Ho(c)  uHe(d)  FIS    P(e)
TX1         Mean       1.85   1.41   0.26   0.25    -0.03  0.85
            SE         0.02   0.02   0.01   0.01     0.02
TX2         Mean       1.80   1.46   0.26   0.28     0.04  0.80
            SE         0.03   0.02   0.01   0.01     0.02
TX3         Mean       1.84   1.49   0.28   0.30     0.05  0.84
            SE         0.02   0.02   0.01   0.01     0.02
TX4         Mean       1.86   1.49   0.28   0.30     0.03  0.86
            SE         0.02   0.02   0.01   0.01     0.02
TX5         Mean       1.91   1.49   0.26   0.30     0.10  0.91
            SE         0.02   0.02   0.01   0.01     0.02
OK1         Mean       1.80   1.46   0.26   0.28     0.05  0.80
            SE         0.03   0.02   0.01   0.01     0.02
NE1         Mean       1.91   1.51   0.30   0.31     0.00  0.91
            SE         0.02   0.02   0.01   0.01     0.02
NE2         Mean       1.69   1.42   0.24   0.25     0.03  0.69
            SE         0.03   0.02   0.01   0.01     0.02
WY1         Mean       1.57   1.36   0.22   0.21    -0.05  0.57
            SE         0.03   0.03   0.02   0.01     0.02
MT1         Mean       1.89   1.46   0.30   0.28    -0.08  0.89
            SE         0.02   0.02   0.01   0.01     0.02
MT2         Mean       1.86   1.48   0.28   0.29    -0.01  0.86
            SE         0.02   0.02   0.01   0.01     0.02
ND1         Mean       1.59   1.36   0.24   0.22    -0.12  0.59
            SE         0.03   0.02   0.02   0.01     0.02
SAS1        Mean       1.89   1.50   0.27   0.30     0.08  0.89
            SE         0.02   0.02   0.01   0.01     0.02
SAS2        Mean       1.82   1.47   0.28   0.29     0.02  0.82
            SE         0.02   0.02   0.01   0.01     0.02
SAS3        Mean       1.85   1.48   0.27   0.29     0.04  0.85
            SE         0.02   0.02   0.01   0.01     0.02

a Number of alleles per locus;

b Effective number of alleles per locus;

c Observed heterozygosity;

d Unbiased expected heterozygosity;

e Percent polymorphic loci
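
The per-locus quantities summarized in Table 2.2 can be illustrated with a small sketch for a single biallelic SNP. The formulas used are the common textbook definitions (effective number of alleles as 1/Σp², Nei's (1978) small-sample correction for expected heterozygosity, and F = 1 − Ho/He); it is an assumption that these match the exact calculations behind the table, and the function name and genotype counts are hypothetical.

```python
def locus_diversity(n_aa, n_ab, n_bb):
    """Diversity statistics for one biallelic locus in one population.

    Arguments are counts of the three genotypes (AA, AB, BB).
    Formulas are standard definitions, assumed here for illustration.
    """
    n = n_aa + n_ab + n_bb
    if n < 1:
        raise ValueError("at least one genotyped individual is required")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    homozygosity = p * p + q * q      # expected homozygosity, sum of p_i^2
    he = 1.0 - homozygosity           # expected heterozygosity
    ho = n_ab / n                     # observed heterozygosity
    return {
        "Na": sum(1 for f in (p, q) if f > 0),   # alleles observed
        "Ne": 1.0 / homozygosity,                # effective number of alleles
        "Ho": ho,
        "uHe": (2 * n / (2 * n - 1)) * he,       # unbiased expected heterozygosity
        "F": 1.0 - ho / he if he > 0 else float("nan"),  # fixation index
    }


# Hypothetical population of 20 individuals: 5 AA, 10 AB, 5 BB.
stats = locus_diversity(5, 10, 5)
```

Averaging these per-locus values over all genotyped loci gives population-level means of the kind reported in the table. Monomorphic loci have an undefined fixation index, so this sketch returns NaN for them rather than guessing a value.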

Table 2.3 – Summary of candidates for genes involved in local adaptation. All of these loci had exceptionally high levels of FST as determined by Arlequin and/or BayeScan and were cross-referenced against QTL information to determine the extent of QTL co-localization.

Gene name                                         FST   LG  cM Position  Arlequin(a)  BayeScan(a)  QTL(b)
Mitogen activated protein kinase kinase kinase 14 0.38  6   53.72        2/5          0/5          A, B, C, D, E, F, G, H, N, O, X, Y, Z, AA
DCR                                               0.38  14  67.27        2/5          5/5          None
No annotated hit heliagene or NCBI                0.40  12  65.62        1/5          0/5          I, L, Y
FT2                                               0.47  6   65.9         2/5          0/5          E, F, H, M, N, O, T, U, V, W
Armadillo type fold                               0.36  7   19.29        1/5          0/5          D, E, J, L, P
No annotated hit heliagene or NCBI                0.37  4   73.86        1/5          0/5          J
GRAS / DELLA transcription factor                 0.41  10  66.87        5/5          5/5          None
EF-hand-like domain                               0.44  12  44.1         5/5          5/5          J, K, I, Q, R, S, Y

a Fraction represents the number of times a particular locus was detected as an FST outlier

b Letters represent co-localizing QTL. Key: A – Leaf shape (Baack et al. 2008), B – Number of ray flowers (Burke et al. 2002), C – Disc diameter (Wills and Burke 2007), D – Height (Burke et al. 2002), E – Days to flower (Burke et al. 2002), F – Leaf fungal damage (Dechaine et al. 2009), G – Achene width (Burke et al. 2002), H – Days to flower (Wills and Burke 2007), I – Leaf shape (Burke et al. 2002), J – Leaf number (Dechaine et al. 2009), K – Head total (Dechaine et al. 2009), L – Number of heads (Burke et al. 2002), M – Seed total (Dechaine et al. 2009), N – Leaf herbivory (Dechaine et al. 2009), O – Head clipping weevil (Dechaine et al. 2009), P – Head herbivory (Dechaine et al. 2009), Q – Branch number (Dechaine et al. 2009), R – Stem diameter (Dechaine et al. 2009), S – Leaf shape (Dechaine et al. 2009), T – Days to flower (Baack et al. 2008), U – Height (Baack et al. 2008), V – Leaf number (Baack et al. 2008), W – Leaf moisture content (Baack et al. 2008), X – Height (Wills and Burke 2007), Y – Heads per branch (Burke et al. 2002), Z – Stem diameter (Burke et al. 2002), AA – Achene weight (Burke et al. 2002)

Figure 2.1 – Map of the locations of the 15 populations used in this study in the central USA and

Canada.

[Map: population labels TX1, TX2, TX3, TX4, TX5, OK1, NE1, NE2, WY1, MT1, MT2, ND1, SAS1, SAS2, SAS3; scale bar 0 to 1500 km; axes show longitude (-140 to -60) and latitude (20 to 60).]

Figure 2.2 – STRUCTURE bar plot of full dataset. Populations correspond to those in Table 2.1.

[STRUCTURE bar plot; population labels: TX1 TX2 TX3 TX4 TX5 OK1 NE1 NE2 WY1 MT1 MT2 ND1 SAS1 SAS2 SAS3]

CHAPTER III

GENOMIC PATTERNS OF SNP DIVERSITY AND THE GENETIC BASIS OF LOCAL

ADAPTATION IN WILD SUNFLOWER [2]

[2] McAssey, E.V. and Burke, J.M. To be submitted to Journal of Heredity.

Abstract

Examining genetic diversity across populations from multiple latitudes presents an ideal

situation to understand the genetic basis of adaptation to divergent climates. In North America,

wild sunflower populations have a broad distribution, and thus individual populations have

experienced drastically different environments in low latitudes compared to high latitudes. In

sunflower, previous work has shown that flowering time and saturated fatty acid content in seeds

are especially differentiated with respect to latitude. In order to understand the genetic changes

associated with adaptation to climate regime, I took a population genomic approach whereby I

used Genotyping-by-Sequencing (GBS) to genotype individuals from wild populations in Texas,

Nebraska, and Canada. Using loci with low levels of missing data, I performed a STRUCTURE

analysis that successfully recovered the three original populations, indicating

clear population structure. Using over 10,000 single nucleotide polymorphisms (SNPs), I

scanned the genome for markers that were highly differentiated among populations. A

number of SNPs in genes of the flowering time pathway were found to be highly differentiated, including

sunflower homologs of ANAC52, ESD7, and EDM2. In addition to being highly structured in

our dataset, mutants of two of the above genes (ANAC52 and ESD7) have been shown in

Arabidopsis thaliana to affect the expression of FT, whose sunflower homologs have been the

focus of investigations of both domestication and local adaptation. An analysis of the genetic

location of all highly structured polymorphisms revealed numerous instances of co-localization

with flowering time and fatty acid QTL, including for ANAC52. Future investigations will require a

clear determination of the extent of linkage disequilibrium in these candidate genomic regions in

order to assess the width of the selected genomic intervals.

Introduction

Investigating patterns of variation across the range of a species is a useful approach for

determining the genetic basis of local adaptation. Most species, especially those with large

geographic ranges, contain populations that exist in a wide variety of habitats. In many cases,

these populations exhibit trait differences that allow them to thrive in their local environments

(i.e., local adaptation [Kawecki and Ebert 2004]). Ecologically, these so-called local adaptations

can be assessed and confirmed via reciprocal transplant experiments in which individuals from

disparate populations are grown in home and away environments and resulting fitness levels are

compared (Hereford 2009). Additionally, local adaptation can be assessed at the genotypic level

using population genetic statistics (Savolainen et al., 2013).

The genomic study of local adaptation often begins with a baseline assessment of

population genetic structure. Population genetic structure, often assessed with metrics like FST

and its analogs, measures the proportion of the total genetic variation that can be attributed to the

level of populations (Wright 1951). Selection for locally adaptive traits results in elevated

population structure at a target locus, which indicates that populations are highly differentiated in

terms of allele frequencies at a single genomic region compared to the balance of the genome

(Beaumont and Nichols 1996). While relatively few markers are required to characterize broad-

scale patterns of gene flow and demography, many more markers are needed to identify the

genes, or at least genomic regions, responsible for local adaptation. The actual density of

markers required for detection of local adaptation depends on the extent of linkage

disequilibrium (LD, or the non-random association of alleles across loci; Flint-Garcia et al.,

2003). Species in which LD breaks down quickly require more genetic markers in order to

interrogate each individual genomic region.
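
To make the idea of partitioning variation among populations concrete, the following is a minimal sketch for a single biallelic SNP. It uses the simple heterozygosity-based form FST = (HT − HS) / HT with equal population weights, which is an assumption made here for illustration and not the estimator implemented in any particular outlier-detection program. The function name and the allele frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Heterozygosity-based FST for one biallelic locus.

    freqs: frequency of one allele in each population
    (populations are weighted equally, a simplifying assumption).
    """
    if not freqs:
        raise ValueError("at least one population is required")
    n = len(freqs)
    # HS: mean expected heterozygosity within populations.
    hs = sum(2 * p * (1 - p) for p in freqs) / n
    # HT: expected heterozygosity at the mean allele frequency.
    p_bar = sum(freqs) / n
    ht = 2 * p_bar * (1 - p_bar)
    # A locus fixed for the same allele everywhere is uninformative.
    return 0.0 if ht == 0 else (ht - hs) / ht


# Hypothetical allele frequencies in three populations (south to north).
background_like = fst_biallelic([0.45, 0.50, 0.55])  # similar frequencies
outlier_like = fst_biallelic([0.05, 0.50, 0.95])     # strongly divergent
```

A locus whose allele frequencies differ sharply among populations yields a much larger value than a locus with similar frequencies everywhere, which is the contrast that an FST outlier scan looks for against the genome-wide background.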

With the development of increasingly efficient methods for DNA sequencing, the

generation of truly genome-wide collections of genetic markers has become feasible (e.g.,

Hohenlohe et al., 2010; Elshire et al., 2011; Poland et al., 2012). Once sequence data has been

obtained from multiple populations within a species, it is then possible to use such data to

investigate genome-wide patterns of population structure and identify genes or genomic regions

that have likely played a role in local adaptation. Such analyses have played a major role in

understanding adaptation in a variety of model and non-model species. For example, in

Arabidopsis lyrata, population re-sequencing (i.e., the pooling of individuals within a population

before sequencing) led to the identification of candidate genes mediating adaptation to serpentine

soils (Turner et al., 2010). In Drosophila melanogaster, population genomic approaches

identified candidate genes that exhibited patterns of differentiation consistent with local

adaptation across eastern North America (Fabian et al., 2012). In stickleback fishes, QTL

mapping data and population genomic scans for differentiation were used to study adaptation to

local environments (Hohenlohe et al., 2010).

Beyond the utility of such analyses for improving our understanding of adaptive

differentiation in the wild, these sorts of studies also have important everyday applications. For

example, identifying the genetic mechanisms underlying local adaptation in the wild progenitors

of crop species can inform plant breeding efforts in the context of global climate change. That is,

if the genes/alleles underlying the ability to tolerate particular environmental stresses can be

identified in wild species, it may be possible to introgress them into crop species, thereby

facilitating growth in a novel climate. For example, microarray analyses (Friesen et al., 2010) as

well as whole genome re-sequencing approaches (Friesen et al., 2014) have been used to identify

candidate genes related to salinity resistance in Medicago truncatula, which may increase

production on marginal lands. Additionally, studies of genetic variation in ecotypes of Panicum

hallii have provided information on the genetic architecture of differentiation with respect to soil

water status, which may prove helpful in efforts to improve switchgrass, Panicum virgatum

(Lowry et al., 2015). In a similar way, we analyze the substantial intraspecific diversity of wild

sunflower to understand the genetics of adaptation in a crop relative.

The common sunflower, Helianthus annuus, is a widespread annual species found in a

variety of habitats throughout North America and is the progenitor of cultivated sunflower. Wild

sunflower is differentiated for a variety of traits such as flowering time (Blackman et al., 2011a),

the degree of fatty acid desaturation (Linder 2000), plant height, and plant architecture (McAssey

et al., in prep). Of particular interest in the present study are flowering time and fatty acid profile

composition due to the extent of previous research on these traits with respect to adaptation. In

particular, flowering time has been the focus of a number of studies in H. annuus (Blackman et

al., 2010; Blackman et al., 2011a; Blackman et al., 2011b; Blackman et al., 2013), ranging from

its evolutionary role in domestication and adaptation, to the functional role of specific genes.

When grown in a common garden, high latitude populations flower earlier than low latitude

populations (McAssey et al., in prep), although this trait exhibits considerable phenotypic

plasticity (Blackman et al., 2011a). Presumably, the quick flowering phenotype in high latitudes

is related to the relatively short growing season placing a premium on flowering while conditions

are ideal for growth and obtaining pollinators. We additionally focus on the trait of seed fatty

acid profile (i.e., the relative proportion of saturated and unsaturated fatty acids) because of

previous reciprocal transplant experiments (Linder 2000) that showed higher percentages of saturated fatty

acids to be selected for in lower latitudes. Linder (2000) reasoned that saturated fatty acids,

which have a higher melting point, were not found in high percentages in high latitude seed oils

because they would remain solid, and therefore be less accessible, at cool temperatures. In an analogous fashion, it was

reasoned that high latitude populations produced more unsaturated fatty acids because they are

liquid at the cool temperatures experienced while germinating.

In this study, we used genotyping-by-sequencing (GBS) to gather polymorphism data

from three populations of wild sunflower that span a latitudinal gradient from Saskatchewan to

Texas. These populations have been previously shown to be differentiated for a number of traits

including flowering time (McAssey et al., in prep; Blackman et al., 2011a) and saturated fatty acid

percentage (McAssey et al., in prep; Linder 2000). We then used these data to investigate

genome-wide patterns of population differentiation, and to identify FST outliers that exhibit the

signature of local adaptation. We further compared the locations of these genes to those of

previously mapped QTL and made functional inferences based on similarity to genes of

known effect from Arabidopsis.


Plant Material, Library Construction, and Sequencing

Seeds from three wild H. annuus populations across North America were obtained from

the USDA (Texas, USA [PI 664692]; Nebraska, USA [PI 586870]; Saskatchewan, Canada [PI

592316]; Figure 3.1). Individuals from these populations were previously genotyped at 246 SNP

loci and phenotyped for a variety of quantitative traits (McAssey et al., in prep). DNA samples

from these same individuals (extracted from fresh leaf tissue using the Qiagen DNeasy Plant

Mini kit [Valencia, CA, USA]; McAssey et al., in prep) were used in the construction of GBS

libraries. Specifically, libraries were made for 55 individuals (19 from Texas; 17 from Nebraska;

19 from Canada). Library prep followed a modified version of an existing GBS protocol (Elshire

et al., 2011; Poland et al., 2012). Briefly, 2 µg of DNA was digested at 37 C overnight by PstI

and MspI (NEB; Ipswich, MA, USA). Barcoded adaptors were then ligated onto PstI overhangs

and common adaptors to MspI overhangs by adding 2.5 µl ligase (1000U), 8 µl 10X buffer, 10 µl

barcoded adaptor (1.7 ng/µl), 4.5 µl common adaptor (694.5 ng/µl), and 15 µl ddH2O to 40 µl of

digested DNA before incubation for 4 hours at 22 C followed by 20 minutes at 65 C. The resulting DNA

fragments with ligated barcodes were then size-filtered using Ampure beads (Beckman Coulter,

Brea, CA, USA). A 0.4x concentration (v/v) of beads was added to each individual ligation

product in order to remove small fragments prior to PCR. This was done to reduce PCR bias due

to more efficient amplification of shorter fragments. A 50 µl polymerase chain reaction was

prepared by adding 5 µl of Ampure cleaned library to 25 µl of Phusion Taq master mix (NEB), 1

µl (12.5 µM) each of forward and reverse primers and 18 µl H2O. The reaction was first heated

to 95 C for 30 seconds, before undergoing 16 cycles of: 95 C for 30 seconds; 65 C for 20

seconds; 68 C for 25 seconds. The PCR finished with 5 minutes at 72 C. Ampure beads were

used again to remove leftover PCR primers prior to quantification via Nanodrop (Thermo Fisher

Scientific, San Diego, CA, USA). Equal amounts of DNA were established by performing

quantitative PCR (qPCR) on all 55 individual libraries, and then adding equimolar amounts of

each library to form a master pool containing DNA from all 55 individuals. DNA from the master

pool was then loaded onto a 1% agarose gel and run for 2 hours at 96 V prior to gel extraction.

DNA in the 400-650 bp size range was gel extracted using a DNA gel extraction kit (Zymo;

Irvine, CA, USA) and the resulting samples were quantified via qPCR using Illumina standards

(Kapa Biosystems, Inc.; Wilmington, MA, USA). Sequencing (150 bp, paired-end reads) was

then performed on an Illumina NextSeq 500 at the Georgia Genomics Facility (Athens, GA).
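
The reaction recipes above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original protocol: it only sums the volumes stated in the text and applies C1V1 = C2V2. The variable and function names are our own, and the derived values (total ligation volume, final buffer strength, final primer concentration) are computed here rather than reported in the source.

```python
# Volumes (in µl) copied from the ligation and PCR recipes in the text.
LIGATION_ADDITIONS_UL = {
    "ligase": 2.5,
    "10X buffer": 8.0,
    "barcoded adaptor": 10.0,
    "common adaptor": 4.5,
    "ddH2O": 15.0,
}
DIGESTED_DNA_UL = 40.0

PCR_COMPONENTS_UL = {
    "cleaned library": 5.0,
    "master mix": 25.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "H2O": 18.0,
}
PRIMER_STOCK_UM = 12.5  # µM, as stated for each primer


def total_volume(components):
    """Sum the volumes of all components in a reaction."""
    return sum(components.values())


def final_concentration(stock, added_volume, total):
    """Dilution formula C1 * V1 = C2 * V2, solved for C2."""
    return stock * added_volume / total


ligation_total = total_volume(LIGATION_ADDITIONS_UL) + DIGESTED_DNA_UL
buffer_fold = final_concentration(
    10.0, LIGATION_ADDITIONS_UL["10X buffer"], ligation_total
)
pcr_total = total_volume(PCR_COMPONENTS_UL)
primer_final_um = final_concentration(PRIMER_STOCK_UM, 1.0, pcr_total)

print(f"ligation volume: {ligation_total} µl, buffer at {buffer_fold}X")
print(f"PCR volume: {pcr_total} µl, each primer at {primer_final_um} µM")
```

Under these stated volumes the PCR components sum to the 50 µl reaction described in the text, and the 10X ligation buffer is diluted to working strength.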

Data Processing

Raw reads were filtered and analyzed using Stacks (Catchen et al., 2013). Reads were

first trimmed to 115 base pairs to avoid low quality regions and then filtered for the presence of

adaptor sequence and additional regions of low quality base calls. Reads were then assembled

within each individual using ‘ustacks’ requiring four reads to form a Stack, and allowing for two

polymorphic sites within a 115 base pair Stack. A catalog integrating data from all individuals

was then created via the ‘cstacks’ module, which allowed for an additional two variant sites

when comparing different individuals. Each individual was then matched to the catalog using

‘sstacks.’ Subsequently, the error correcting module ‘rxstacks’ was used before re-doing the

‘cstacks’ and ‘sstacks’ aspects of the pipeline, as recommended by the authors. Sequence stacks,

or loci, were then mapped to the genome using Bowtie2 (Langmead and Salzberg 2012), and

only uniquely mapping loci (mapping quality, q > 10) were retained for downstream analyses. CAP3 (Huang and

Madan 1999) was then used to curate the data further by attempting to assemble overlapping loci.

This was done to help limit the number of loci derived from read 2 that overlapped with read 1,

as they are an extreme case of non-independence (e.g., they could represent the

same SNP being identified as an outlier twice). To collapse loci, we required them to have an

overlap length of 20 bp and the minimum sequence identity of the overlap to be greater than

95%. In order to retain a dataset with a reduced amount of overlaps among loci, if loci assembled

together via CAP3, only a single locus was retained for future analyses. This conservative

approach helps limit the usage of double-counted polymorphisms in downstream analyses. The

‘populations’ module within Stacks was used for further filtering the dataset prior to analyses.

We required at least eight individuals in each of the three populations in order to calculate

genetic structure for a particular locus.
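
The per-population sampling requirement can be illustrated with a short sketch. This is an illustration of the rule only, not the Stacks 'populations' implementation; the function name, the sample identifiers, and the example locus are hypothetical, while the threshold of eight individuals and the population sizes (19, 17, and 19) are taken from the text.

```python
from collections import Counter

POPULATIONS = ("Texas", "Nebraska", "Canada")
MIN_INDIVIDUALS = 8  # threshold stated in the text


def locus_passes(genotyped_samples, sample_to_population,
                 populations=POPULATIONS, min_individuals=MIN_INDIVIDUALS):
    """Return True if every population has enough genotyped individuals."""
    counts = Counter(sample_to_population[s] for s in genotyped_samples)
    # A population absent from the Counter has a count of zero.
    return all(counts[pop] >= min_individuals for pop in populations)


# Hypothetical sample sheet: 19 Texas, 17 Nebraska, 19 Canada individuals.
sample_to_population = {}
for pop, n in (("Texas", 19), ("Nebraska", 17), ("Canada", 19)):
    for i in range(1, n + 1):
        sample_to_population[f"{pop}_{i:02d}"] = pop

# Hypothetical locus genotyped in 10 Texas, 8 Nebraska, and 7 Canada samples.
partial_locus = (
    [f"Texas_{i:02d}" for i in range(1, 11)]
    + [f"Nebraska_{i:02d}" for i in range(1, 9)]
    + [f"Canada_{i:02d}" for i in range(1, 8)]
)

print(locus_passes(partial_locus, sample_to_population))
print(locus_passes(list(sample_to_population), sample_to_population))
```

The first locus is rejected because only seven Canadian individuals were genotyped; a locus called in every individual is retained.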

Population Genetic Analyses

Using Stacks, the number of monomorphic and polymorphic loci, and number of

polymorphic sites were obtained for all individuals. Observed and expected heterozygosities

were then calculated at each polymorphic site using ‘populations’ within Stacks. After filtering

for a minor allele frequency (MAF) > 10%, FST, FST’, Dest, ΦST and FIS were calculated for all

polymorphic sites using the ‘populations’ module within Stacks. Furthermore, for each locus, the

number of haplotypes and haplotype diversity were also calculated in the ‘populations’ module

within Stacks. Data were exported into Genepop format using Stacks then converted into

Arlequin (Version 3.5.1.2; Excoffier and Lischer 2010) format in preparation for outlier testing.

Over-differentiated loci (FST outliers) were identified using 20,000 simulations with 100 demes per group.

Additionally, more stringent filtering was performed in which each population was required to

have 80% data present prior to performing a STRUCTURE analysis (Pritchard et al., 2000).

STRUCTURE (version 2.3.4) was run from K = 1 to 5 genetic clusters with an initial burn-in of

100,000 MCMC iterations followed by data collection over 1,000,000 MCMC iterations. These

analyses were replicated 20 times at each K value. STRUCTURE Harvester (Earl and vonHoldt

2012) was used to determine the most likely number of genetic clusters in the dataset using the

delta-K method (Evanno et al., 2005).
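
As a rough illustration of the delta-K idea, the sketch below computes the absolute second-order change in mean log-likelihood between successive K values, scaled by the standard deviation of the replicate log-likelihoods at K. This is a simplified form that works from per-K means; whether it matches the exact calculation in STRUCTURE Harvester is an assumption. The log-likelihood values are invented for illustration and are not results from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Simplified delta-K: |mean L(K+1) - 2*mean L(K) + mean L(K-1)| / sd L(K).

    lnp_by_k maps each K to a list of replicate log-likelihoods.
    Delta-K is undefined for the smallest and largest K tested.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # need both neighbours
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # avoid division by zero
        second_order = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_order / spread
    return result


# Hypothetical replicate log-likelihoods for K = 1..5 (three runs each).
toy_runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4510.0, -4490.0],
    3: [-4200.0, -4201.0, -4199.0],
    4: [-4190.0, -4210.0, -4200.0],
    5: [-4195.0, -4215.0, -4205.0],
}

scores = delta_k(toy_runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With K tested from 1 to 5, delta-K can only be evaluated for K = 2 to 4, which is one reason the K = 1 case has to be judged separately from the raw likelihoods.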

Candidate Gene Analyses

Following identification, FST outliers were characterized by first determining whether

they co-localized with any previously identified QTL as well as previously identified FST outliers

from a lower density study. Loci were mapped to genetically ordered scaffolds (Bowers et al.,

2012). The genetic location of scaffolds containing FST outlier loci was then queried against the

genetic position of known QTL for flowering time (Burke et al., 2002; Wills and Burke 2007)

and fatty acid profile (Burke et al., 2005). Additionally, the FST outliers were mapped to FASTA

files of genes and 5’ UTRs identified from the sunflower genome in order to understand whether

these loci are found in or around genes. For outliers that were within genes, we determined

whether or not the A. thaliana ortholog was implicated in either of our two focal traits: flowering

time and fatty acid biosynthesis. We did this by first identifying the gene that each outlier locus

mapped to on the sunflower genome. The top BLASTp hit for that gene when searched against

the A. thaliana proteome was then compared against the genes found in TAIR associated with

the biological processes ‘Photoperiodism, flowering’ and ‘Vegetative to reproductive phase

transition in meristem,’ as well as the genes in the following KEGG pathways: ‘Biosynthesis of

unsaturated fatty acids,’ ‘Fatty acid elongation,’ ‘Fatty acid degradation,’ and ‘Fatty acid

biosynthesis’.
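
The co-localization step amounts to asking whether the genetic position of an outlier falls inside a QTL one-LOD interval on the same chromosome. The following is a minimal sketch of that check, assuming outliers have already been assigned a chromosome and a cM position. The two QTL intervals are taken from Table 3.3, but the outlier positions, the class name, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QTL:
    trait: str
    chromosome: int
    start_cm: float  # lower bound of the one-LOD interval
    end_cm: float    # upper bound of the one-LOD interval


def count_colocalized(qtl_list, outliers):
    """Count outliers inside each QTL interval.

    outliers is a list of (chromosome, cM) tuples. An outlier may be
    counted for several QTL if their intervals overlap.
    """
    hits = {qtl: 0 for qtl in qtl_list}
    for chromosome, position in outliers:
        for qtl in qtl_list:
            if (qtl.chromosome == chromosome
                    and qtl.start_cm <= position <= qtl.end_cm):
                hits[qtl] += 1
    return hits


# Two intervals from Table 3.3.
flowering_lg6 = QTL("Days to Flower", 6, 48.2, 54.6)
stearic_lg10 = QTL("% Stearic acid", 10, 44.7, 62.2)

# Hypothetical outlier positions (chromosome, cM).
toy_outliers = [(6, 50.1), (6, 53.0), (6, 60.0), (10, 45.0), (3, 50.0)]

hits = count_colocalized([flowering_lg6, stearic_lg10], toy_outliers)
for qtl, n in hits.items():
    print(qtl.trait, qtl.chromosome, n)
```

Because overlapping intervals can share an outlier, per-QTL counts produced this way should not be summed to obtain the number of unique co-localized outliers.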

Results and Discussion

This genome-wide collection of GBS loci is capable of serving multiple purposes such as

quantifying levels of population genetic diversity, population structure, and identification of

putative candidate genes. When the distribution of genetic diversity is structured in a clear

latitudinal pattern, it suggests a role for adaptation. By scanning the levels and structuring of

genetic diversity across the genome we were able to identify strong candidate genes for local

adaptation to three different regions of North America. By pairing population genetic

information with trait mapping data, we further verified the candidate status of the GBS loci for

playing a role in the establishment of two locally adaptive traits: flowering time and fatty acid

profile.

Genomic Organization of Markers

The program Stacks (Catchen et al., 2013) was used to process the 297,602,702 reads. An

average of 4,161,497 (Min = 979,193; Max = 10,895,995) reads were attributed to each of the 55

GBS libraries (Supplemental figure 3.1). When comparing the genomic location of loci with

sufficient data for outlier detection (see below), a locus was located an average 655,194 ± 14,585

bp from the nearest locus. The number of SNPs found on each of the 17 chromosomes in

sunflower was significantly correlated with the estimated length of each chromosome

(Supplemental figure 3.2). While it is encouraging that GBS markers are found more often on

larger chromosomes, a pattern consistent with expectations, it is still possible that within

individual chromosomes there could be substantial variation in marker density.
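
The relationship between SNP count and chromosome length can be summarized with a Pearson correlation coefficient. The sketch below shows the calculation only; the statistic actually used for Supplemental figure 3.2 is not stated in the text, so treating it as Pearson's r is an assumption, and the chromosome lengths and SNP counts are invented placeholders rather than values from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical chromosome lengths (Mb) and SNP counts for four chromosomes.
toy_lengths_mb = [100.0, 150.0, 200.0, 250.0]
toy_snp_counts = [400, 610, 790, 1010]

r = pearson_r(toy_lengths_mb, toy_snp_counts)
print(round(r, 3))
```

A high chromosome-level correlation of this kind says nothing about how evenly markers are spread within a chromosome, which is the caveat raised above.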

Population Genetic Statistics

After filtering the data to retain only the 5759 loci (11,315 SNPs) with calls from at least

eight individuals per population, populations were tested for differences in the number of

haplotypes, haplotypic diversity, and expected heterozygosity. Specifically, the Texas population

had significantly fewer haplotypes and a lower haplotypic diversity than the Nebraska population,

but had more haplotypes and higher haplotypic diversity compared to the Canada population

(Table 3.1). Many forces including gene flow from a large number of unsampled populations,

and gene flow from Helianthus species that have range overlaps in Nebraska could have driven

this pattern. Clearly, sampling more populations with GBS markers will be required to assess the

forces affecting diversity across these latitudes. Heterozygosity values were similar among

populations (Table 3.1). All populations had significantly positive values of FIS. These positive

FIS values are most likely the result of allele dropout, a process in which only one of two alleles

is sequenced, possibly due to a SNP in the restriction cut site (Gautier et al., 2013). Additionally,

it is possible that alternative alleles for a particular locus may have diverged enough that they

ended up in different assembled loci.

Population Structure

Our STRUCTURE analysis of the restricted set (80% data present) of 1,029 SNPs

indicated the presence of three genetic clusters that correspond to the original USDA sampling

locations (Figure 3.2; Supplemental figure 3.3). After calculating various measures of genetic

differentiation, we found that the Texas vs. Canada comparison always produced the highest

level of average population structure across all loci (Table 3.2). When comparing the Nebraska

population individually to both Texas and Canada, for all metrics the Nebraska vs. Canada

comparison was always lowest. This means that Nebraska individuals share a higher proportion

of their genetic ancestry with the northern population than with the southern population. This is further

confirmed by visually inspecting the STRUCTURE bar chart (Fig. 3.2). Clearly, three

individuals have a visible level of genetic clustering with Canada. Additionally, when analyzing

the K = 2 level of structuring in this dataset, the Nebraska and Canadian populations were

merged into a single uniform cluster in 19 out of 20 runs, with Texas being assigned to its own

cluster.

FST Outliers

After identifying FST outliers using five separate runs within Arlequin, outliers were

filtered to retain only those that were found as an outlier in all five runs. This left us with 243 out

of 11,315 SNPs with consistently elevated levels of FST. These loci were located in both genic

(95) and intergenic (148) regions. FST outliers were found on all 17 chromosomes and co-

localized with multiple QTL for both flowering time and fatty acid biosynthesis (Table 3.3). Of

the 22 fatty acid and flowering time QTL, 14 had at least one FST outlier fall within the one LOD

interval (Table 3.3). However, after determining the number of outliers expected to fall within

QTL regions purely by chance, we found that neither fatty acid nor flowering time QTL showed

any pattern of enrichment for FST outliers. Nonetheless, two elevated

intergenic FST SNPs fell within a QTL for flowering time on chromosome 6. This is noteworthy

because previous work in sunflower has highlighted this region of the genome as playing a

strong role in adaptation. Originally, this genomic region drew interest in the context of the

genetics of sunflower domestication. Through genetic mapping (Burke et al., 2002; Burke et al.,

2005) and molecular evolutionary analyses (Blackman et al., 2010), this end of linkage group 6

was shown to harbor QTL for flowering time as well as a suite of duplicated FT genes which are

known to influence flowering time. Subsequently, this region of the genome has been shown to

play an important role in the wild. A low-density population genomic scan identified FT2 as

having elevated population structure compared to the balance of the genome when comparing a

range-wide sampling of wild H. annuus populations (McAssey et al., in prep). Additionally, this

gene has been shown to be differentially expressed across a latitudinal transect within the native

range of H. annuus, which further suggests a role in adaptation (Blackman et al., 2011a). The

extent of linkage disequilibrium in wild sunflower is unknown for this genomic region, so it is

currently impossible to know whether it represents a large island of elevated differentiation, or

several independent targets of selection. The GBS based outliers did not closely co-localize with

robust SNP outliers from a previous genome scan (McAssey et al., in prep). On LG 10 a SNP

chip outlier located within a GRAS transcription factor was about 1 Mb away from a GBS

outlier in an unannotated gene. On LG 12 a SNP chip outlier in an EF-hand-like domain protein

was around 5 Mb away from the nearest GBS outlier. Finally, on LG 14 a GBS outlier near a

protein kinase was about 1 Mb away from a SNP chip outlier within Defective in Cuticle Ridges.
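
One way to frame the chance expectation for outliers falling within QTL regions is as sampling without replacement: if a given number of the genotyped SNPs lie within QTL intervals, the expected number of outliers among them is proportional to that share, and a hypergeometric tail gives the probability of seeing at least the observed count. The sketch below is an assumption about how such a test could be set up, not a description of the procedure used here. The totals of 243 outliers and 11,315 SNPs come from the text; the number of SNPs inside QTL intervals and the observed count are hypothetical.

```python
from math import comb


def expected_in_qtl(n_outliers, n_snps_in_qtl, n_snps_total):
    """Expected number of outliers inside QTL intervals under random placement."""
    return n_outliers * n_snps_in_qtl / n_snps_total


def hypergeom_upper_tail(observed, n_outliers, n_snps_in_qtl, n_snps_total):
    """P(X >= observed) when the outliers are drawn without replacement."""
    upper = min(n_outliers, n_snps_in_qtl)
    numerator = sum(
        comb(n_snps_in_qtl, i)
        * comb(n_snps_total - n_snps_in_qtl, n_outliers - i)
        for i in range(observed, upper + 1)
    )
    return numerator / comb(n_snps_total, n_outliers)


N_OUTLIERS = 243        # from the text
N_SNPS_TOTAL = 11315    # from the text
N_SNPS_IN_QTL = 1500    # hypothetical
OBSERVED_IN_QTL = 35    # hypothetical

expected = expected_in_qtl(N_OUTLIERS, N_SNPS_IN_QTL, N_SNPS_TOTAL)
p_value = hypergeom_upper_tail(
    OBSERVED_IN_QTL, N_OUTLIERS, N_SNPS_IN_QTL, N_SNPS_TOTAL
)
print(f"expected by chance: {expected:.1f}, P(X >= observed) = {p_value:.3f}")
```

This treatment assumes SNPs are independent, which linked GBS markers are not, so a permutation of genomic positions would be a more conservative alternative.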

Interestingly, three outlier loci were found in candidate pathways related to flowering

time, though none were found in the fatty acid biosynthetic pathway. A NAC domain-containing

protein is highly differentiated between Canada and the two more southerly populations. In A.

thaliana, the ortholog of this gene has been shown to repress the activity of FT and hence repress

flowering (Ning et al., 2015). Interestingly this polymorphism co-localizes with a flowering time

QTL on chromosome 9. As FT2 has already been shown to be important in latitudinally

distributed wild sunflower populations (discussed above), the identification of this NAC

domain-containing protein as an FST outlier suggests that it might also play a role in latitudinal

adaptation. Again, a more complete picture of linkage disequilibrium in these two genomic

intervals is necessary to establish these genes as high quality adaptive candidates. Furthermore,

additional sampling will be required to confirm that latitudinal differentiation at these loci exists

in more than three populations.

Early in Short Days 7 (ESD7) was also identified as having a significantly elevated level

of genetic differentiation, although the pattern of differentiation was not correlated with latitude.

The high value of FST was due to both the Texas and Canadian populations being fixed for the

same allele; however, in Nebraska the alternative allele was found at high frequency. In A.

thaliana, a mutation in ESD7 results in an early flowering phenotype (del Olmo et al.,

2010). Interestingly, the mutation in ESD7 also appears to alter gene expression patterns of FT in

both long and short days (del Olmo et al., 2010). An additional significantly differentiated SNP

was found in Enhanced Downy Mildew 2 (EDM2). This polymorphism exhibits latitudinal

differentiation in which Texas and Canada are fixed for different alleles whereas Nebraska is

intermediate in allele frequency. In addition to this gene having a role in plant defense, it has

been shown to be a positive regulator of flowering in A. thaliana (Tsuchiya and Eulgem 2010).
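
The two allele frequency patterns described above (shared fixation at the range edges versus a latitudinal cline) can both produce a high FST, which is why outlier status alone does not imply a latitudinal signal. The sketch below uses Wright's FST = (HT - HS) / HT with equal population weights. This is a simplification and not the AMOVA-based estimator calculated in Arlequin, and the allele frequencies are hypothetical values chosen only to mimic the qualitative patterns in the text.

```python
def wright_fst(allele_freqs):
    """Wright's FST for one biallelic SNP, with equal population weights.

    allele_freqs holds the frequency of one allele in each population.
    """
    n = len(allele_freqs)
    if n == 0:
        raise ValueError("at least one population is required")
    p_bar = sum(allele_freqs) / n
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # monomorphic across all populations
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / n
    return (h_total - h_within) / h_total


# Order of populations: Texas, Nebraska, Canada. Frequencies are hypothetical.
shared_fixation = [1.0, 0.1, 1.0]  # pattern described for ESD7
cline = [1.0, 0.5, 0.0]            # pattern described for EDM2

print(round(wright_fst(shared_fixation), 3))
print(round(wright_fst(cline), 3))
```

Distinguishing the two cases requires looking at the direction of the frequency differences (for example, regressing allele frequency on latitude), not just at the magnitude of FST.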

While the study of the genetic basis of adaptation has typically focused on coding

sequences, recent work has demonstrated functional relevance of intergenic sequences (e.g.,

Studer et al., 2011). Over half of the FST outlier SNPs identified herein were found in the

intergenic space / promoter regions of the sunflower genome. Here again, knowledge of the

extent of linkage disequilibrium in these regions will be needed to determine whether these

intergenic sequences are themselves likely to be involved in adaptive differentiation, perhaps due

to regulatory effects, or if they simply mark regions of elevated differentiation caused by

variation in a nearby gene.

Taken together, the results of this study indicate that wild populations of H. annuus contain a

substantial level of genetic variation. A portion of this genetic variation was subsequently shown

to be exceptionally differentiated, which indicates its putative role in adaptation. These highly

structured polymorphisms are potential targets for introgression into cultivated germplasm.

While sunflower can be grown as far south as Texas, the majority of production is in the northern

central United States (sunflowernsa.com). As climate change continues in the coming decades,

the types of alleles that increase fitness in Texas over Canada could enhance (or at least

maintain) suitable growth and yields in cultivated lines. While the focus of this investigation was

on two traits of special importance to the sunflower community (flowering time and fatty acid

profile), future functional work on these FST outliers will be required to determine the precise

phenotypes affected by these genomic regions. Future research on the identified candidate

genomic regions must also include re-sequencing to establish the extent of linkage

disequilibrium in these important areas of the genome. The FST outlier regions containing well-

described flowering time homologs are consistent with the adaptive value of this trait and thus

are promising candidates for future work.

Acknowledgments

We would like to thank Stephan Schroeder and Greg Baute for helpful discussions

concerning library preparation. The Georgia Genomics Facility and the Georgia Advanced

Computing Resource Center provided valuable sequencing and bioinformatics support,

respectively.

References

Beaumont MA, Nichols RA (1996) Evaluating loci for use in the genetic analysis of

population structure. Proc R Soc Lond B 263: 1619-1626.

Blackman BK (2013) Interacting duplications, fluctuating selection, and convergence: the

complex dynamics of flowering time evolution during sunflower domestication. J Exp

Bot 64: 421-431.

Blackman BK, Michaels SD, Rieseberg LH (2011) Connecting the sun to flowering in sunflower

adaptation. Mol Ecol 20: 3503-3512.

Blackman BK, Rasmussen DA, Strasburg JL, Raduski AR, Burke JM, et al. (2011) Contributions

of flowering time genes to sunflower domestication and improvement. Genetics 187:

271-287.

Blackman BK, Strasburg JL, Raduski AR, Michaels SD, Rieseberg LH (2010) The role of

recently derived FT paralogs in sunflower domestication. Curr Biol 20: 629-635.

Bowers JE, Bachlava E, Brunick RL, Rieseberg LH, Knapp SJ, et al. (2012) Development of a

10,000 locus genetic map of the sunflower genome based on multiple crosses. G3

(Bethesda) 2: 721-729.



Burke JM, Tang S, Knapp SJ, Rieseberg LH (2002) Genetic analysis of sunflower domestication.

Genetics 161: 1257-1267.

Catchen J, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool

set for population genomics. Mol Ecol 22: 3124-3140.

del Olmo I, Lopez-Gonzalez L, Martin-Trillo MM, Martinez-Zapater JM, Pineiro M, et al.

(2010) EARLY IN SHORT DAYS 7 (ESD7) encodes the catalytic subunit of DNA

polymerase epsilon and is required for flowering repression through a mechanism

involving epigenetic gene silencing. Plant J 61: 623-636.



e19379.

Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the

software STRUCTURE: a simulation study. Mol Ecol 14: 2611-2620.

Excoffier L, Lischer HE (2010) Arlequin suite ver 3.5: a new series of programs to perform

population genetics analyses under Linux and Windows. Mol Ecol Resour 10: 564-567.

Fabian DK, Kapun M, Nolte V, Kofler R, Schmidt PS, et al. (2012) Genome-wide patterns of

latitudinal differentiation among populations of Drosophila melanogaster from North

America. Mol Ecol 21: 4748-4769.

Flint-Garcia SA, Thornsberry JM, Buckler ESt (2003) Structure of linkage disequilibrium in

plants. Annu Rev Plant Biol 54: 357-374.

Friesen ML, von Wettberg EJB, Badri M, Moriuchi KS, Barhoumi F, Chang PL, Cuellar-Ortiz S,

Cordeiro MA, Vu WT, Arraouadi S, Djebali N, Zribi K, Badri Y, Porter SS, Aouani ME,

Cook DR, Strauss SY, Nuzhdin SV (2014) The

ecological genomic basis of salinity adaptation in Tunisian Medicago truncatula. BMC

Genomics 15: 1160.

Friesen ML, Cordeiro MA, Penmetsa RV, Badri M, Huguet T, et al. (2010) Population genomic

analysis of Tunisian Medicago truncatula reveals candidates for local adaptation. Plant J

63: 623-635.

Gautier M, Gharbi K, Cezard T, Foucaud J, Kerdelhue C, et al. (2013) The effect of RAD allele

dropout on the estimation of genetic variation within and between populations. Mol Ecol

22: 3165-3178.

Hereford J (2009) A quantitative survey of local adaptation and fitness trade-offs. Am Nat 173:

579-588.

Hohenlohe PA, Bassham S, Etter PD, Stiffler N, Johnson EA, et al. (2010) Population genomics

of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet

6: e1000862.

Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res 9: 868-

877.

Kawecki TJ, Ebert D (2004) Conceptual issues in local adaptation. Ecology Letters 7: 1225-

1241.

Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:

357-359.

Linder CR (2000) Adaptive evolution of seed oils in plants: Accounting for the biogeographic

distribution of saturated and unsaturated fatty acids in seed oils. The American Naturalist

156: 442-458.

Lowry DB, Hernandez K, Taylor SH, Meyer E, Logan TL, et al. (2015) The genetics of

divergence and reproductive isolation between ecotypes of Panicum hallii. New Phytol

205: 402-414.

Ning YQ, Ma ZY, Huang HW, Mo H, Zhao TT, et al. (2015) Two novel NAC transcription

factors regulate gene expression and flowering time by associating with the histone

demethylase JMJ14. Nucleic Acids Res 43: 1469-1484.

Poland JA, Brown PJ, Sorrells ME, Jannink JL (2012) Development of high-density genetic

maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing

approach. PLoS One 7: e32253.

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus

genotype data. Genetics 155: 945-959.

Savolainen O, Lascoux M, Merila J (2013) Ecological genomics of local adaptation. Nat Rev

Genet 14: 807-820.

Studer A, Zhao Q, Ross-Ibarra J, Doebley J (2011) Identification of a functional transposon

insertion in the maize domestication gene tb1. Nat Genet 43: 1160-1163.

Tsuchiya T, Eulgem T (2010) The Arabidopsis defense component EDM2 affects the floral

transition in an FLC-dependent manner. Plant J 62: 518-528.

Turner TL, Bourne EC, Von Wettberg EJ, Hu TT, Nuzhdin SV (2010) Population resequencing

reveals local adaptation of Arabidopsis lyrata to serpentine soils. Nat Genet 42: 260-263.

Wills DM, Burke JM (2007) Quantitative trait locus analysis of the early domestication of

sunflower. Genetics 176: 2589-2599.

Wright S (1951) The genetical structure of populations. Ann Eugen 15: 323-354.

Tables

Table 3.1 – Levels of population genetic diversity in three populations across a latitudinal gradient in North America.

Population  # of Stacks  # Monomorphic  # Polymorphic  H     SE    H Div  SE     Ho    SE     He    SE     FIS   SE
Texas       12773        2068           10705          3.09  0.01  0.69   0.005  0.14  0.001  0.27  0.001  0.41  0.003
Nebraska    10796        1202           9594           3.40  0.01  0.71   0.005  0.13  0.001  0.25  0.001  0.40  0.003
Canada      13522        2393           11129          3.00  0.01  0.66   0.004  0.14  0.001  0.27  0.001  0.42  0.003

Table 3.2 – Pairwise population structure among three wild sunflower populations.

Comparison           # of SNPs  AMOVA FST  SE     PhiST  SE     FST prime  SE     DEST  SE
Texas vs. Nebraska   10375      0.15       0.002  0.23   0.003  0.26       0.004  0.26  0.004
Texas vs. Canada     10822      0.19       0.002  0.28   0.003  0.31       0.004  0.31  0.004
Nebraska vs. Canada  10030      0.13       0.002  0.21   0.003  0.23       0.003  0.23  0.003

Table 3.3 – QTL co-localization

Trait                Chromosome  cM    One LOD interval  Number of FST outliers
Days to Flower (a)   1           17.2  10.8-21.5         0
Days to Flower (a)   4           61.3  53.9-64.4         2
Days to Flower (a)   6           52.2  48.2-54.6         2
Days to Flower (a)   7           2     0.0-21.7          3
Days to Flower (a)   8           52.7  49.3-59.0         1
Days to Flower (a)   8           79.3  77.0-81.3         0
Days to Flower (a)   9           11.3  0.0-18.8          0
Days to Flower (a)   9           53.5  49.5-54.3         2
Days to Flower (a)   17          39.2  35.6-42.6         0
Days to Flower (a)   17          58.9  50.9-64.9         0
Days to Flower (b)   6           57.6  53.6-57.7         0
Days to Flower (b)   7           1     0.0-5.3           1
Days to Flower (b)   15          57.1  57.0-58.2         2
% Palmitic acid (c)  6           26.3  18.4-41.7         3
% Palmitic acid (c)  17          39.6  33.8-43.1         0
% Stearic acid (c)   6           24.3  20.3-28.3         1
% Stearic acid (c)   10          50.7  44.7-62.2         3
% Oleic acid (c)     1           19.5  3.3-25.5          0
% Oleic acid (c)     3           63.3  41.9-69.3         7
% Oleic acid (c)     6           22.3  16.4-28.3         1
% Linoleic acid (c)  3           47.9  39.9-61.2         6
% Linoleic acid (c)  6           24.3  18.4-28.3         1

(a) Burke et al., 2002; (b) Wills and Burke 2007; (c) Burke et al., 2005
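
The summary given in the text (14 of 22 QTL with at least one FST outlier in the one-LOD interval) can be recomputed directly from Table 3.3. The short tally below is offered as a convenience check; the outlier counts are transcribed from the table, and the variable names are our own.

```python
# (trait, chromosome, number of FST outliers), transcribed from Table 3.3.
OUTLIERS_PER_QTL = [
    ("Days to Flower", 1, 0), ("Days to Flower", 4, 2),
    ("Days to Flower", 6, 2), ("Days to Flower", 7, 3),
    ("Days to Flower", 8, 1), ("Days to Flower", 8, 0),
    ("Days to Flower", 9, 0), ("Days to Flower", 9, 2),
    ("Days to Flower", 17, 0), ("Days to Flower", 17, 0),
    ("Days to Flower", 6, 0), ("Days to Flower", 7, 1),
    ("Days to Flower", 15, 2),
    ("% Palmitic acid", 6, 3), ("% Palmitic acid", 17, 0),
    ("% Stearic acid", 6, 1), ("% Stearic acid", 10, 3),
    ("% Oleic acid", 1, 0), ("% Oleic acid", 3, 7),
    ("% Oleic acid", 6, 1),
    ("% Linoleic acid", 3, 6), ("% Linoleic acid", 6, 1),
]

n_qtl = len(OUTLIERS_PER_QTL)
n_qtl_with_outlier = sum(1 for _, _, n in OUTLIERS_PER_QTL if n > 0)
n_flowering_with_outlier = sum(
    1 for trait, _, n in OUTLIERS_PER_QTL
    if trait == "Days to Flower" and n > 0
)

print(n_qtl, n_qtl_with_outlier, n_flowering_with_outlier)
```

Several fatty acid QTL on chromosomes 3 and 6 have overlapping intervals, so the per-row counts may include the same outlier more than once.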

Figure 3.1 – Original USDA sampling location for the three populations genotyped in this study

Figure 3.2 – Bar plot indicating the proportion of membership to three genetic clusters as

identified in the program STRUCTURE

[Bar plot groups along the x-axis: Texas, Nebraska, Canada]

CHAPTER IV

TRANSCRIPTOMIC ANALYSIS OF DEVELOPING SEEDS ACROSS THE RANGE OF

WILD SUNFLOWER³

³ McAssey, E.V., Burke, J.M. To be submitted to American Journal of Botany.

Abstract

Ecologists have identified numerous adaptive traits through classic reciprocal transplant

experiments. However, the genetic basis of these traits has only been investigated in a handful of

organisms to date. High throughput sequencing approaches like RNA-sequencing (RNA-seq)

have the ability to address questions of adaptation in non-model organisms. Wild populations of

Helianthus annuus L. have previously been shown to be adaptively differentiated for the

percentage of saturated fatty acids within seeds. Southern populations are known to contain a

higher proportion of saturated fatty acids in their seeds, which results in quicker growth under

warmer temperatures compared to northern populations. I used RNA-seq to make a wild

sunflower transcriptome assembly that was then used to test for differential expression when

comparing RNA derived from developing sunflower seeds from common garden grown northern

and southern populations. I found 1788 significantly differentially expressed isoforms located

throughout the sunflower genome. A number of these isoforms (FAD2, FAD8, FAB1, FAB2, and

FATA) were derived from homologs of genes in the Arabidopsis thaliana fatty acid

KEGG pathways. Additionally, differentially expressed isoforms were found to exist in a number

of fatty acid quantitative trait loci intervals from previously published work, although only one

of the isoforms, FATA, was annotated as being a fatty acid homolog. The most differentially

expressed isoforms were found to be enriched for gene ontology terms such as lipid transport, in

addition to other less clear terms like defense and stress response. These differentially expressed

isoforms represent candidate genes for future functional work to directly establish a relationship

between gene expression and trait differentiation.

Introduction

Adaptation to divergent environments (i.e., local adaptation) requires heritable genetic

changes that confer higher fitness to an individual in one habitat over another habitat.

Historically the most sought after molecular evidence for adaptation focused on mutations that

altered the protein coding sequence of a gene. Non-synonymous nucleotide substitutions have

the potential to influence gene function, and novel alleles arising through such a process could be

favored in certain environments. In such cases, different alleles could come to predominate in

different populations. Alternatively, regulatory differences resulting in divergent expression

patterns could play an important role in adaptation in the absence of coding sequence

polymorphisms. The relative importance of coding sequence and regulatory mutations has been

extensively debated (King and Wilson 1975; Andolfatto 2005; Hoekstra and Coyne 2007). In

plant biology, when looking at the types of mutations and genes that were the targets of crop

domestication, there appears to be a prominent role for transcription factors (Doebley et al., 2006).

Recent advances in sequencing technology have made possible the efficient characterization of

gene expression on a genome-wide scale, which has enabled studies of transcriptional variation

in both model and non-model species (Wang et al., 2009).

Low throughput techniques requiring a priori sequence information were a major

limitation to the study of gene expression in non-model organisms. Now levels of gene

expression have been assayed in a variety of species including diploid and polyploid Glycine

species (Ilut et al., 2012), sea urchins (Pespeni et al., 2013), black-faced blenny (Schunter et al.,

2014), European silver fir (Behringer et al., 2015), and the salt tolerant species Suaeda fruticosa

(Diray-Arce et al., 2015). In the genus Helianthus, a group containing cultivated sunflower,


RNA-seq is already being used for a variety of applications (Renaut et al., 2012; Baute et al.,

2015; Renaut and Rieseberg 2015) including an analysis of differential expression between wild

sunflower ecotypes that varied in the amount of woody growth (Moyers and Rieseberg 2013).

In addition to secondary growth, populations of wild sunflower contain genetic variation

for seed traits. For plants, seeds represent a crucial transitional state where various genetic and

environmental cues are required to establish proper early growth. For example, dormancy

mechanisms have resulted in seed germination primarily occurring when environmental

conditions are ideal. This is thought to occur through a variety of mechanisms including the

sensing of temperature and light (Footitt et al., 2013). An additional crucial stage of seed

maturation is early development, when seeds acquire essential provisions in the form of seed

storage proteins and oils (Shewry et al., 1995; Ruuska et al., 2002). These molecules provide raw

materials and energy for the early growth of a seedling and can be focal points for adaptive

differentiation between populations.

In angiosperms, for example, trends in seed oil composition have been detected across

latitudinal gradients in a wide variety of species, suggesting that this trait may play an important

role in latitudinal adaptation (Linder 2000). Plant seed oils consist of a variety of saturated and

unsaturated fatty acids. These molecules vary in melting point due to the presence of double

bonds in unsaturated fatty acids. Unsaturated fatty acids have lower melting points due to their

inability to pack tightly together. Unsaturated fatty acids are less energy rich because there is a

cost to forming their characteristic double bonds. In a meta-analysis of data from a wide variety

of plant species, Linder (2000) found that low latitude species/populations had a relatively higher


proportion of saturated fatty acids in their seeds. Linder (2000) reasoned that if the melting point

of fatty acids was adaptive, low latitude families, species, and populations would have a higher

relative amount of saturated fatty acids because the comparatively warmer germination

temperatures would allow individuals to use saturated fatty acids. On a related note, low

temperatures in high latitudes would cause saturated fatty acids to remain solid and unusable. For

example, at the intraspecific level, it was found that northern populations of Helianthus annuus

L. contained a significantly lower proportion of saturated fatty acids than southern populations, which was

in line with theoretical predictions (Linder 2000). A further characterization of saturated fatty

acids across a latitudinal gradient showed that this pattern was rather gradual across the continent

as opposed to a large clear disjunction in trait values (McAssey et al., in prep).

A reciprocal transplant experiment showed that southern genotypes of H. annuus grew

significantly faster than northern genotypes when placed in a warm germination environment

(Linder 2000). Correspondingly, northern genotypes germinated earlier than southern genotypes

when both were grown in a cool environment. The above pattern is consistent with local

adaptation; the respective home populations perform better in their native environment compared

to a foreign environment. While the ecological patterns and functional relevance of the trait have

been fairly well established, we know comparatively little about the underlying genetics that

mediate this adaptive phenotype. The fatty acid biosynthesis pathway has been worked out in the

model plant species Arabidopsis thaliana and involves genes responsible for elongation of the

fatty acid chain, desaturation of parts of the chain (if necessary), and then export for storage or

usage (Ohlrogge and Browse 1995).


The most important type of genes for conferring this difference in germination under

different temperature regimes would be any gene that changes the average melting point of the

resulting fatty acids. Therefore, we decided to investigate the genes in the fatty acid pathway

across the range of wild sunflower. As we already know that there is a phenotypic difference

between high and low latitude populations (Linder 2000; McAssey et al., in prep) we expect to

uncover genes that are differentially expressed in a direction consistent with the accumulation of

more saturated fatty acids in low latitude individuals. To establish whether or not this pattern

exists in wild sunflower, we performed RNA-sequencing of individuals from the southern and

northern ends of the native range. By performing tests of differential expression between regions,

we present a list of candidate genes for playing a role in the development of this trait.

Furthermore, we analyze these candidates in terms of their position in the fatty acid pathway,

their co-localization with known QTL for fatty acid profile, and collectively in terms of whether

or not gene ontology terms are over-represented in the differentially expressed isoforms relative

to the frequency of terms found in all of the seed expressed isoforms.


Plant Materials

Seeds from six populations representing the northern and southern ends of the wild

sunflower range in North America (three from Saskatchewan, Canada and three from Texas,

USA; Table 4.1) were rinsed with 3% hydrogen peroxide before being placed on moist paper

towels in a darkened cold room at 4 °C for two weeks to break dormancy. Seeds used in this study

were produced from crossing USDA individuals from a previous experiment (McAssey et al., in

prep). After two weeks, seeds were placed in a growth room at 23 °C and maintained under 16


hour days : 8 hour nights with artificial light. Once seedlings developed a substantial hypocotyl

and radicle, they were transplanted into flats of soil. Established seedlings were then moved into

a greenhouse, transplanted into soil pots, and populations of plants were arranged randomly in

the greenhouse. Flowering heads were bagged once buds began to develop in order to prevent

unintended cross pollination. Since wild sunflower is self-incompatible, I manually cross-

pollinated pairs of individuals originating from the same population. This involved removing the

pollination bags from the two focal plants, collecting pollen from both plants directly into a petri

dish, and then using a paint brush to apply pollen onto the same two plants. Heads were then re-

bagged and plants were allowed to set seed. Fifteen days after pollination, eight achenes (i.e.,

single-seed fruits) were collected from the center of each developing seed head, placed into 1.5

mL tubes, and frozen in liquid nitrogen. These tubes were then stored at -80 °C until RNA

extraction.

RNA Extraction

The eight frozen achenes from each maternal plant were ground in a mortar and pestle

using liquid nitrogen and a pinch of PVPP. The ground tissue was then transferred into a tube

and placed in liquid nitrogen. After removing the tube from liquid nitrogen storage, one mL of

Trizol was added to each tube. The contents of each tube were then mixed and allowed to

incubate at room temperature for five minutes. Chloroform (300 µl) was then added to each tube

and the tubes were manually shaken and then centrifuged at 12,000 × g for 10 minutes. The

aqueous phase of each sample was then removed via pipetting and transferred to a new tube.

After mixing with a 0.53X volume of 100% ethanol, the solution was carried through the Qiagen

RNeasy Plant Mini protocol (Valencia, CA, USA) with on-column DNase digestion for RNA

purification.

Library Construction

RNA quality was assessed using a Bioanalyzer RNA chip (Agilent, Santa Clara, CA). All

samples (nine from Texas, eleven from Canada; Table 4.1) used for library construction had a

RIN value of at least 8.5 out of 10. Libraries were constructed using a Kapa mRNA-seq kit

(Kapa Biosystems, Wilmington, MA). This kit utilizes magnetic beads to perform size selection

and poly-A tail selection. Libraries were constructed to include a size range of approximately

200-500 bp, and size ranges were checked using a fragment analyzer (Advanced Analytical

Technologies, Ankeny, IA). Individual libraries were then quantified via qPCR using Illumina

standards. Equimolar amounts of library were then pooled into a single tube and submitted for

Illumina NextSeq 500 PE75 sequencing at the Georgia Genomics Facility (Athens, GA).

Transcriptome Assembly and Tests for Differential Expression

Adaptor sequences were trimmed from reads prior to quality filtering using Trimmomatic

(Bolger et al., 2014). With one exception, all reads from each library were used to produce a

single de novo assembly in Trinity (Grabherr et al., 2011). The lone exception was dramatically

overrepresented in the raw data (123 million fragments sequenced with paired-end reads), and

therefore 9.5 million paired-end reads were subsampled from this library to be used in the

assembly. We used the Trinity option --min_kmer_cov 2 to reduce the computing power

necessary to assemble this large dataset. We then performed a BLASTx comparing all assembled

transcripts to the A. thaliana proteome. Isoforms whose E-value was greater than 1 × 10⁻⁵ were

removed from the assembly. Reads were then mapped to the BLAST-filtered assembly using

Bowtie2 (Langmead and Salzberg 2012) and a matrix of read counts from each individual

mapping to each isoform was constructed using RSEM (Li and Dewey 2011). This matrix was

then used as input into edgeR (Robinson et al., 2010) to test for differential expression. One

library was under-sequenced and removed from analyses of differential gene expression. The

remaining 19 libraries were split into a Saskatchewan and Texas group and then tested for

differential expression using a Fisher’s exact test. Only isoforms that were expressed in eight

individuals at the level of at least three counts per million were retained for tests of differential

expression. Significantly differentially expressed isoforms were determined based on an FDR <

0.05. Furthermore, we used the physical position of sunflower genes to perform an auto-

correlation analysis. Specifically, we calculated the physical distance between each expressed

gene on an individual chromosome in order to form a matrix. When multiple Trinity isoforms

mapped to the same gene in the genome assembly we arbitrarily chose a single isoform for

creating the matrix. In a similar fashion, a distance matrix of log fold change was created to

compare expression differences between all combinations of genes on each individual

chromosome. For each chromosome, the physical distance and log fold change matrices were

compared using a Mantel test as implemented in the R package ade4 with 999 permutations.
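
The expression filter and FDR threshold described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the edgeR code used in this study. The function names are hypothetical, counts per million are computed from raw library totals without any normalization factors, and the Benjamini-Hochberg procedure is assumed as the FDR method.

```python
def counts_per_million(counts):
    """Scale raw counts to counts per million (CPM) within each library.

    counts: list of rows (one per isoform), each a list of raw read
    counts (one per library). Every library is assumed to be non-empty.
    """
    n_libraries = len(counts[0])
    library_sizes = [sum(row[j] for row in counts) for j in range(n_libraries)]
    return [
        [row[j] / library_sizes[j] * 1e6 for j in range(n_libraries)]
        for row in counts
    ]


def passes_filter(cpm_row, min_cpm=3.0, min_libraries=8):
    """True if an isoform reaches min_cpm in at least min_libraries libraries."""
    return sum(1 for value in cpm_row if value >= min_cpm) >= min_libraries


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Under these assumptions, isoforms for which `passes_filter` returns True would be carried into the test, and those with an adjusted value below 0.05 would be reported as differentially expressed.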

Gene Ontology Term Enrichment and Candidate Gene Identification

ErmineJ (Gillis et al., 2010) was used to test for gene ontology term enrichment in my

differential expression dataset. Specifically, ErmineJ first looks at a list of expressed isoforms

sorted by p-value, and then establishes whether or not particular GO terms are found


preferentially at the top of the ranked list (i.e., GO terms associated with DE isoforms). This was

done using receiver operating characteristic scoring looking at GO terms present between 10 and

200 times in the dataset. GO terms were considered to be significantly enriched if their FDR

value was less than 0.1. The best BLAST hit IDs for differentially expressed genes were compared

to the A. thaliana IDs that make up various fatty acid pathways (pathways used: biosynthesis of

unsaturated fatty acids, fatty acid biosynthesis, fatty acid elongation, fatty acid degradation)

found in the KEGG database (Kanehisa et al., 2015). Here, we assessed whether or not the

patterns of differential expression were consistent with known phenotypes. In other words, for

fatty acid desaturases, we assessed whether or not they were more highly expressed in Canadian

plants relative to Texas plants as would be predicted given theory and known phenotypic data.
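
The idea behind scoring a GO term by its position in a ranked list can be sketched as follows. This is an illustrative Python reimplementation of an area-under-the-ROC-curve score, not ErmineJ's own code. The function names are hypothetical, ties in the ranking are ignored, and the significance testing that ErmineJ applies to the score is omitted.

```python
def eligible_terms(term_to_isoforms, min_size=10, max_size=200):
    """Keep GO terms annotated to between min_size and max_size isoforms."""
    return {
        term: members
        for term, members in term_to_isoforms.items()
        if min_size <= len(members) <= max_size
    }


def roc_auc(ranked_isoforms, term_members):
    """Area under the ROC curve for one GO term.

    ranked_isoforms: isoform IDs sorted by p-value, most significant first.
    term_members: IDs annotated with the GO term.
    A score near 1 means the term's isoforms cluster at the top of the
    list; a score near 0.5 means they are scattered at random.
    """
    members = set(term_members)
    n_in = sum(1 for isoform in ranked_isoforms if isoform in members)
    n_out = len(ranked_isoforms) - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("term must split the ranked list into two groups")
    outsiders_seen = 0
    pairs_won = 0
    for isoform in ranked_isoforms:
        if isoform in members:
            # This member outranks every non-member that comes later.
            pairs_won += n_out - outsiders_seen
        else:
            outsiders_seen += 1
    return pairs_won / (n_in * n_out)
```

Because the score depends only on rank, no fixed significance threshold is needed to decide which isoforms count as differentially expressed.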

Analysis of Fatty Acid QTL Regions

All transcripts were mapped using BLASTn to mRNA sequences derived from the

sunflower genome. These mRNA sequences have been previously connected to specific genes in

A. thaliana by using BLASTp. After mapping these sequences, we determined the number and

identity of differentially expressed transcripts occurring within QTL for fatty acid profile from

previously published research (Burke et al., 2005). Specifically, we asked whether any of the

differentially expressed transcripts co-localizing with QTL were annotated as being related to

fatty acid biosynthesis/processing. This was done by first identifying the flanking scaffolds

surrounding the QTL interval from the consensus genetic map of the sunflower genome (Bowers

et al., 2012). The locations of these flanking scaffolds were then identified in the current genome

assembly [HA412.v1.1.bronze.20141015]. The QTL interval was then scanned to identify the

presence of differentially expressed isoforms.
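
The interval scan described above amounts to a coordinate overlap query. The sketch below shows one way to express it in Python. The record types, field names, and coordinates are hypothetical, and it assumes that QTL boundaries have already been translated into physical positions on the genome assembly.

```python
from collections import namedtuple

# Hypothetical record types; positions are physical coordinates (bp).
QTLInterval = namedtuple("QTLInterval", "trait chrom start end")
MappedGene = namedtuple("MappedGene", "gene_id chrom position")


def colocalizations(genes, intervals):
    """Return (trait, gene_id) pairs for genes inside each QTL interval."""
    hits = []
    for qtl in intervals:
        low, high = sorted((qtl.start, qtl.end))
        for gene in genes:
            if gene.chrom == qtl.chrom and low <= gene.position <= high:
                hits.append((qtl.trait, gene.gene_id))
    return hits


def unique_genes(hits):
    """Collapse instances to genes, since one gene can fall in several QTL."""
    return {gene_id for _, gene_id in hits}


# Toy example with made-up coordinates.
example_intervals = [
    QTLInterval("oleic", "LG1", 100, 500),
    QTLInterval("linoleic", "LG1", 400, 900),
]
example_genes = [
    MappedGene("g1", "LG1", 450),  # inside both intervals
    MappedGene("g2", "LG1", 950),  # outside both
    MappedGene("g3", "LG6", 450),  # wrong linkage group
]
example_hits = colocalizations(example_genes, example_intervals)
```

Counting instances and unique genes separately matters when QTL for different fatty acids overlap, because a single gene then contributes more than one co-localization.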


Results

After de-multiplexing, we determined that each library had an average of ca. 15 million

fragments sequenced with paired-end reads (Figure 4.1). After filtering the assembly via a

BLASTx to the A. thaliana proteome, the assembly contained 79,286 isoforms with

an N50 of 1,219 bp. The differential expression analysis identified 1,798 of the 30,943 tested

isoforms as being significantly differentially expressed between Texas and Canada. Of these, 997

had significantly higher expression in Canada relative to Texas, and 801 had significantly higher

expression in Texas relative to Canada (Supplemental table 4.1; Supplemental table 4.2).

Differentially expressed isoforms were found on all 17 chromosomes (Figure 4.2). After

correcting for multiple testing, none of the 17 sunflower chromosomes exhibited an association

between physical distance and similarity in expression patterns (Supplemental table 4.3). In other

words, there is no evidence that genes located near one another show similar patterns of

differential expression.
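
For readers unfamiliar with the Mantel test, the per-chromosome comparison can be sketched in Python as follows. This is a simplified illustration rather than the ade4 implementation: it uses a one-sided Pearson-based test, the function names are hypothetical, and it does not include the correction for testing 17 chromosomes.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def distance_matrix(values):
    """Pairwise absolute differences, e.g. of positions or log fold changes."""
    return [[abs(a - b) for b in values] for a in values]


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test between two symmetric distance matrices.

    Returns the observed correlation and a permutation p-value. Rows and
    columns of d2 are permuted together, which preserves its structure
    while breaking any link to d1.
    """
    n = len(d1)
    pairs = list(combinations(range(n), 2))
    x = [d1[i][j] for i, j in pairs]
    observed = pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    index = list(range(n))
    at_least_as_large = 1  # the observed arrangement counts once
    for _ in range(permutations):
        rng.shuffle(index)
        y = [d2[index[i]][index[j]] for i, j in pairs]
        if pearson(x, y) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)
```

In this framing, one matrix holds physical distances between genes on a chromosome and the other holds differences in log fold change, so a significant positive correlation would indicate that neighboring genes tend to differ in expression in similar ways.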

Numerous genes related to fatty acid biosynthesis, modification, and breakdown were

differentially expressed (Table 4.2). Fatty acid desaturase 2 (FAD2), fatty acid desaturase 8

(FAD8), and fatty acid biosynthesis 2 (FAB2) all play a role in producing unsaturated fatty acids,

and all were differentially expressed in developing seeds (Table 4.2). FAD2 and FAB2 were

more highly expressed in Canada relative to Texas, whereas FAD8 was more highly expressed in

Texas. Homologs of the fatty acid degradation genes alcohol dehydrogenase 1 and a zinc-

binding alcohol dehydrogenase were both more highly expressed in Texas relative to Canada,

whereas acetoacetyl-CoA thiolase 1, long chain acyl-CoA synthetase 9, and multifunctional

protein 2 were more highly expressed in Canada. Another FAB gene, FAB1, was differentially


expressed in the same direction as FAB2 (higher expression in Canada). Genes that control the

usage of fatty acids were also differentially expressed, including a homolog of delta(2)-enoyl

CoA isomerase 3 (higher expression in Texas).

Differentially Expressed Isoform Co-localization with QTL

We identified 166 instances of differentially expressed isoforms co-localizing with a

quantitative trait locus for one of the four main fatty acids in sunflower seed oil: palmitic, stearic,

oleic, and linoleic acid (Burke et al., 2005). These 166 instances of co-localization corresponded

to 130 genes due to multiple DE isoforms mapping to the same sunflower gene. After identifying

genes that co-localize with multiple QTL on linkage groups 3 and 6, we were able to establish

that 95 unique genes in the sunflower genome are both differentially expressed and co-localize

with fatty acid QTL (Table 4.3). Of these 95 differentially expressed genes, only one gene with a fatty

acid annotation, a homolog of acyl-ACP thioesterase (FATA), co-localized with a previously mapped

fatty acid QTL – an oleic acid QTL on chromosome 1 (Table 4.3). It should be noted that based

on the amount of genetic space occupied by fatty acid QTL, one would expect about 190

differentially expressed isoforms to co-localize. Thus, in this dataset there is no evidence for

enrichment of differentially expressed genes being located within fatty acid QTL.
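
The comparison between observed and expected co-localization can be framed as a simple binomial calculation. The Python sketch below is an illustration only. It assumes that each differentially expressed isoform falls inside a QTL independently with a probability equal to the fraction of the genome covered by QTL, a value that is not reported here and must be supplied by the user. The function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def expected_count(n_isoforms, qtl_fraction):
    """Expected number of isoforms inside QTL under random placement."""
    return n_isoforms * qtl_fraction


def binomial_upper_tail(n, p, observed):
    """P(X >= observed) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed on the log scale so that large n does not
    overflow the binomial coefficient.
    """
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for k in range(observed, n + 1):
        log_pmf = (
            lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log_p + (n - k) * log_q
        )
        total += exp(log_pmf)
    return min(total, 1.0)
```

When the observed count is below the expected count, as with 166 observed instances against roughly 190 expected, the upper-tail probability is large, which is consistent with the absence of enrichment.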

Gene Ontology Enrichment

By analyzing the entire list of 30,943 expressed isoforms at our filtering standards (eight

individuals; counts per million greater than 3) we identified gene ontology terms that were

preferentially found among statistically significant isoforms (Table 4.4). When looking at


biological processes, we found a variety of terms, including embryo development and lipid

localization, that were significant at an FDR cutoff of 0.1.

Discussion

Analyses of differential gene expression have the potential to provide unique insights into

the molecular mechanisms underlying adaptive differences. Furthermore, by comparing

populations from opposite ends of the natural range, we likely are sampling gene expression

patterns that affect a number of adaptive aspects of seed biology. Differences in gene expression,

and presumably protein accumulation, affect the very first stages of growth. Therefore, this

developmental stage is crucial for lifetime fitness, and differentially expressed genes in this stage

may fundamentally affect adult stature despite only being differentially expressed very early in

development. By establishing whether or not differentially expressed isoforms are implicated in fatty acid

biosynthesis in the model organism A. thaliana, it is possible to easily narrow the list of

candidate genes for conferring the adaptive trait of fatty acid profile.

Differential Expression of Fatty Acid Genes

The large amount of past work on the biochemistry of seed oil biosynthesis has allowed

us to prioritize candidate pathways. In particular, the elevated expression of desaturases, which

effectively reduce the melting point of available fatty acids, ought to be favored in high latitude

populations according to theory (Linder 2000). Consistent with this expectation, we identified a

homolog of FAD2 as being significantly more highly expressed in plants from Canada vs. those

from Texas. The FAD2 protein is responsible for converting 18:1 oleic acid into 18:2 linoleic

acid (Chen et al., 2011; Liu et al., 2013). FAD2 has been the focus of numerous investigations as


it is an attractive breeding target for altering the fatty acid profile in commercial seed oils (Patel

et al., 2004; Pham et al., 2010). Functional work in A. thaliana has further indicated that

deactivation of this gene results in overaccumulation of oleic acid and subsequent delayed

germination in cool temperatures compared to wild type (Miquel and Browse 1994). Work in

cultivated sunflower has shown that down-regulation of a different copy of FAD2 on LG 14,

known as FAD2-1, is responsible for oleic acid accumulation in so-called high oleic cultivars

(Schuppert et al., 2006; Lacombe et al., 2009). In this dataset, FAD2-1 corresponded to four

isoforms that were all expressed at high levels across all libraries; as such, there was no evidence

of differential expression for this gene (all FDR > 0.05). The additional copy of FAD2 on LG 6

may, however, play a role in determining the observed difference in oil composition across the

range of wild sunflower. It should be noted that the variance in expression of FAD2 in northern

libraries is quite high and that many libraries only have modest levels of expression despite this

gene being identified as differentially expressed.

FAB2 encodes an enzyme that plays a role in the conversion of 18:0 fatty acids to 18:1

monounsaturated oleic acid (Lightner et al., 1994). Consistent with expectations, this gene also shows

a latitudinal trend in expression whereby northern individuals exhibit higher expression than

southern individuals (80 vs. 40 FPKM; Table 4.2). In contrast, another differentially expressed

fatty acid desaturase, FAD8, exhibited higher expression in southern vs. northern individuals (32

vs. 9 FPKM; Table 4.2). This gene has been implicated in the production of 18:3 linolenic acid

from 18:2 linoleic acid (Gibson et al., 1994). In sunflower, linolenic fatty acid is generally not

seen in seeds but rather can be found in high amounts in leaves (Cantisan et al., 1999). The low

melting point conferred by highly unsaturated fatty acids like linolenic acid may be beneficial in

leaf tissue when plants are experiencing cold temperatures (Gibson et al., 1994).


In addition to fatty acid biosynthesis and modification, the degradation of fatty acids

could play a role in establishing the phenotypic difference seen in fatty acid profile of

latitudinally distributed populations. A homolog of delta(2)-enoyl CoA isomerase 3 (ECI3) is

more highly expressed in plants from Texas compared to Canada (two isoforms: 17 vs. 6 and

14 vs. 6 FPKM; Table 4.2). ECI3 is known to play a role in the breakdown of linoleic (and

linolenic) fatty acids. However, biochemical evidence suggests that the Arabidopsis copy of

ECI3 is not found in the peroxisome, where fatty acid breakdown occurs, despite being able to

catalyze the reaction (Goepfert et al., 2008). The precise location of ECI3 protein within the cells

of H. annuus seeds remains unknown. A homolog of multifunctional protein 2 (MFP2) was more

highly expressed (although generally weak expression overall) in Canadian individuals relative

to those from Texas (8 vs. <1 FPKM; Table 4.2). MFP2 has been shown to be quite important

for breaking down fatty acids in germinating seeds (Rylott et al., 2006). The extent to which both

MFP2 and ECI3 play functional roles in developing sunflower seeds will require follow up

biochemical experiments.

This set of differentially expressed genes can be used to further curate previously

described lists of candidates. Specifically, individuals derived from these latitudes (as well as a

middle latitude in Nebraska) have been genotyped with GBS markers. These markers were used

to uncover the population genetic signal of local adaptation occurring across latitudes in wild

sunflower. Ultimately 243 SNP loci (out of 11,315) showed significantly elevated population

differentiation (McAssey et al., in prep). Of these candidate loci for local adaptation, two were

differentially expressed when the GBS outliers were queried against this study’s differentially expressed gene

list. One is a microtubule-based kinesin motor protein on LG 4 that has higher (although

relatively weak) expression in Canada compared to Texas. The other such gene was a PEPC

(phosphoenolpyruvate carboxylase) homolog, which is found on LG 8. This gene was more

highly expressed in Canada relative to Texas, although the expression difference was modest and,

as such, its FDR value was only marginally significant (FDR = 0.049). As these genes show both a

population genetic and a gene expression signal of adaptation, they represent interesting

candidates to pursue.

QTL Co-localization

A number of differentially expressed isoforms co-localized with QTL for various fatty

acid products. Larger QTL intervals were found to contain more differentially expressed

isoforms as identified by a linear regression (r² = 0.91). A homolog of FATA co-localized with an

oleic acid QTL on LG 1 (Burke et al., 2005). FATA expression in A. thaliana appears to affect

the relative abundance of 18:2 and 18:3 fatty acids (Moreno-Perez et al., 2012), though, as noted

above, sunflower seeds do not typically accumulate 18:3 fatty acids. As QTL

intervals are typically large, it is challenging to connect variation in a particular phenotype to a

single gene. For example, in the relatively small QTL intervals on LG 6 none of the four

differentially expressed genes have fatty acid gene annotations.

Gene Ontology Term Enrichment

The GO enrichment analysis revealed a number of terms associated with the most

significantly differentially expressed isoforms (Table 4.4). The presence of ‘lipid transport’ and

‘lipid localization’ as significant terms suggests broad differences between northern and southern

populations in how lipids are shuttled and organized within developing seeds. Other enriched GO

terms like ‘response to stress’ and ‘defense response’ were not initially expected to be


significant, as these plants were grown in a common garden. However, it is possible that stress

and defense genes could be differentially expressed due to northern and southern populations

experiencing the common garden greenhouse conditions in Georgia in different ways. For

example, the level of watering and greenhouse temperatures may have been perceived by

individuals from one end of the range as being rather benign (no stress response), whereas

individuals from the other end of the range might sense the same conditions as being stressful,

thereby eliciting a transcriptomic response.

Analyses of differential gene expression have revealed a considerable amount of

intraspecific diversity between latitudinally structured sunflower populations. As previous work

has strongly implicated the higher saturated fatty acid content in southern populations as being

adaptive in H. annuus, the reported results have generated numerous candidates related to the

molecular basis of this adaptive trait in developing seeds. Further assessment of the functional

relevance of these candidate genes will require additional experimentation. This work could

include quantification of expression levels across multiple developmental stages to gain a more

nuanced understanding of how these genes are regulated. As a complement to such work,

transformation of sunflower alleles into a heterologous system (A. thaliana), as has been done

previously (e.g., Blackman et al., 2010), has the potential to provide direct insights into the

functional relevance of these genes in sunflower. Taken together, this work illustrates the utility

of transcriptomic analyses for the study of locally adaptive traits. As the price of sequencing

continues to drop and throughput increases, such analyses are likely to become routine, even in

species with minimal genomic resources. In particular, by pairing RNA-sequencing experiments

with previous reciprocal transplant evidence it may be possible to conclusively demonstrate the

beneficial genotype by environment interaction that defines local adaptation.


Acknowledgements

We would like to thank Greg Cousins for greenhouse assistance. Savithri Nambeesan

provided guidance concerning tissue sampling and RNA extraction. The Georgia Advanced

Computing Resource Center (GACRC) provided computational support throughout this project.

Karolina Heyduk provided valuable bioinformatic assistance. This work was supported by a

grant from the NSF Plant Genome Research Program (DBI-0820451 to JMB).


References

Andolfatto P (2005) Adaptive evolution of non-coding DNA in Drosophila. Nature 437: 1149-

1152.

Baute GJ, Kane NC, Grassa CJ, Lai Z, Rieseberg LH (2015) Genome scans reveal candidate

domestication and improvement genes in cultivated sunflower, as well as post-

domestication introgression with wild relatives. New Phytol 206: 830-838.

Behringer D, Zimmermann H, Ziegenhagen B, Liepelt S (2015) Differential gene expression

reveals candidate genes for drought stress response in Abies alba (Pinaceae). PLoS One

10: e0124564.

Blackman BK, Strasburg JL, Raduski AR, Michaels SD, Rieseberg LH (2010) The role of

recently derived FT paralogs in sunflower domestication. Current Biology 20: 629-635.

Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence

data. Bioinformatics 30: 2114-2120.

Bowers JE, Bachlava E, Brunick RL, Rieseberg LH, Knapp SJ, et al. (2012) Development of a

10,000 locus genetic map of the sunflower genome based on multiple crosses. G3

(Bethesda) 2: 721-729.



Cantisan S, Martinez-Force E, Alvarez-Ortega R, Garces R (1999) Lipid characterization in

vegetative tissues of high saturated fatty acid sunflower mutants. J Agric Food Chem 47:

78-82.

Chen W, Song K, Cai Y, Li W, Liu B, et al. (2011) Genetic modification of soybean with a novel

grafting technique: Downregulating the FAD2-1 gene increases oleic acid content. Plant

Molecular Biology Reporter 29: 866-874.

Diray-Arce J, Clement M, Gul B, Khan MA, Nielsen BL (2015) Transcriptome assembly,

profiling and differential gene expression analysis of the halophyte Suaeda fruticosa

provides insights into salt tolerance. BMC Genomics 16: 353.

Doebley JF, Gaut BS, Smith BD (2006) The molecular genetics of crop domestication. Cell 127:

1309-1321.

Footitt S, Huang Z, Clay HA, Mead A, Finch-Savage WE (2013) Temperature, light and nitrate

sensing coordinate Arabidopsis seed dormancy cycling, resulting in winter and summer

annual phenotypes. Plant J 74: 1003-1015.


Gibson S, Arondel V, Iba K, Somerville C (1994) Cloning of a temperature-regulated gene

encoding a chloroplast omega-3 desaturase from Arabidopsis thaliana. Plant Physiol 106:

1615-1621.

Gillis J, Mistry M, Pavlidis P (2010) Gene function analysis in complex data sets using ErmineJ.

Nat Protoc 5: 1148-1159.

Goepfert S, Vidoudez C, Tellgren-Roth C, Delessert S, Hiltunen JK, et al. (2008) Peroxisomal

Delta(3),Delta(2)-enoyl CoA isomerases and evolution of cytosolic paralogues in

embryophytes. Plant J 56: 728-742.

Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, et al. (2011) Full-length

transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol

29: 644-652.

Hoekstra HE, Coyne JA (2007) The locus of evolution: evo devo and the genetics of adaptation.

Evolution 61: 995-1016.

Hubner S, Korol AB, Schmid KJ (2015) RNA-Seq analysis identifies genes associated with

differential reproductive success under drought-stress in accessions of wild barley

Hordeum spontaneum. BMC Plant Biol 15: 134.

Ilut DC, Coate JE, Luciano AK, Owens TG, May GD, et al. (2012) A comparative

transcriptomic study of an allotetraploid and its diploid progenitors illustrates the unique

advantages and challenges of RNA-seq in plant species. Am J Bot 99: 383-396.

Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M (2015) KEGG as a reference

resource for gene and protein annotation. Nucleic Acids Res.

King MC, Wilson AC (1975) Evolution at two levels in humans and chimpanzees. Science 188:

107-116.

Lacombe S, Souyris I, Berville AJ (2009) An insertion of oleate desaturase homologous

sequence silences via siRNA the functional gene leading to high oleic acid content in

sunflower seed oil. Mol Genet Genomics 281: 43-54.

Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:

357-359.

Li B, Dewey CN (2011) RSEM: accurate transcript quantification from RNA-Seq data with or

without a reference genome. BMC Bioinformatics 12: 323.

Lightner J, Wu J, Browse J (1994) A Mutant of Arabidopsis with Increased Levels of Stearic

Acid. Plant Physiol 106: 1443-1451.



Linder CR (2000) Adaptive evolution of seed oils in plants: accounting for the biogeographic

distribution of saturated and unsaturated fatty acids in seed oils. The American Naturalist

156: 442-458.

Liu Q, Cao S, Zhou XR, Wood C, Green A, et al. (2013) Nonsense-mediated mRNA degradation

of CtFAD2-1 and development of a perfect molecular marker for olol mutation in high

oleic safflower (Carthamus tinctorius L.). Theor Appl Genet 126: 2219-2231.

Mathilde G, Ghislaine G, Daniel V, Georges P (2003) The Arabidopsis MEI1 gene encodes a

protein with five BRCT domains that is involved in meiosis-specific DNA repair events

independent of SPO11-induced DSBs. The Plant Journal 35: 465-475.

Miquel MF, Browse JA (1995) High-oleate oilseeds fail to develop at low temperatures. Plant

Physiology 106: 421-427.

Moreno-Perez AJ, Venegas-Caleron M, Vaistij FE, Salas JJ, Larson TR, et al. (2012) Reduced

expression of FatA thioesterases in Arabidopsis affects the oil content and fatty acid

composition of the seeds. Planta 235: 629-639.

Moyers BT, Rieseberg LH (2013) Divergence in gene expression is uncoupled from divergence

in coding sequence in a secondarily woody sunflower. International Journal of Plant

Sciences 174: 1079-1089.

Ohlrogge J, Browse J (1995) Lipid biosynthesis. Plant Cell 7: 957-970.

Patel M, Jung S, Moore K, Powell G, Ainsworth C, et al. (2004) High-oleate peanut mutants

result from a MITE insertion into the FAD2 gene. Theor Appl Genet 108: 1492-1502.

Pespeni MH, Barney BT, Palumbi SR (2013) Differences in the regulation of growth and

biomineralization genes revealed through long-term common-garden acclimation and

experimental genomics in the purple sea urchin. Evolution 67: 1901-1914.

Pham A, Lee J, Shannon JG, Bilyeu KD (2010) Mutant alleles of FAD2-1A and FAD2-1B

combine to produce soybeans with the high oleic acid seed oil trait. BMC Plant Biology

10: 195.

Renaut S, Grassa CJ, Moyers BT, Kane NC, Rieseberg LH (2012) The population genomics of

sunflowers and genomic determinants of protein evolution revealed by RNAseq. Biology

(Basel) 1: 575-596.

Renaut S, Rieseberg LH (2015) The accumulation of deleterious mutations as a consequence of

domestication and improvement in sunflowers and other compositae crops. Mol Biol

Evol 32: 2273-2283.

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential

expression analysis of digital gene expression data. Bioinformatics 26: 139-140.

Ruuska SA, Girke T, Benning C, Ohlrogge JB (2002) Contrapuntal Networks of Gene

Expression during Arabidopsis Seed Filling. The Plant Cell 14: 1191-1206.

Rylott EL, Eastmond PJ, Gilday AD, Slocombe SP, Larson TR, et al. (2006) The Arabidopsis

thaliana multifunctional protein gene (MFP2) of peroxisomal beta-oxidation is essential

for seedling establishment. Plant J 45: 930-941.

Schunter C, Vollmer SV, Macpherson E, Pascual M (2014) Transcriptome analyses and

differential gene expression in a non-model fish species with alternative mating tactics.

BMC Genomics 15: 167.

Schuppert GF, Tang S, Slabaugh MB, Knapp SJ (2006) The sunflower high-oleic mutant Ol

carries variable tandem repeats of FAD2-1, a seed-specific oleoyl-phosphatidyl choline

desaturase. Molecular Breeding 17: 241-256.

Shewry PR, Napier JA, Tatham AS (1995) Seed storage proteins: structures and biosynthesis.

Plant Cell 7: 945-956.

Wang L, Tiffin P, Olson MS (2014) Timing for success: expression phenotype and local

adaptation related to latitude in the boreal forest tree, Populus balsamifera. Tree Genetics

& Genomes 10: 911-922.

Wang W, Feng B, Xiao J, Xia Z, Zhou X, et al. (2014) Cassava genome from a wild ancestor to

cultivated varieties. Nat Commun 5: 5110.

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat

Rev Genet 10: 57-63.

Tables

Table 4.1 – Original sample locations and associated sampling depth

USDA PI#  Location              Latitude     Longitude     N
413160    Texas, USA            31.03972222  -104.8302778  4
664692    Texas, USA            31.18916667  -103.5780556  1
468476    Texas, USA            31.27277778  -102.6922222  4
592320    Saskatchewan, Canada  50.0475      -104.7072222  4

Table 4.2 – Differentially expressed fatty acid isoforms

Gene ID                                            At ID      # of DE isoforms  KEGG pathway membership  Log fold change  FDR          Chromosome
AACT1                                              AT5G47720  1                 B, E                     -1.001467733     0.022680079  15
ADH1                                               AT1G77120  6                 B                        1.204962596      0.000644478  N/A
alpha/beta hydrolase super family                  AT3G60340  1                 D, E                     -0.712966159     0.036220746  13
BCCP                                               AT5G16390  1                 C, E                     -1.17269982      0.036700762  16
CAC2                                               AT5G35360  1                 C, E                     -1.689500937     7.40321E-06  4
ECI3                                               AT4G14440  2                 D                        1.272111638      0.000299732  3
FAB1                                               AT1G74960  2                 C, E                     -0.900625098     0.018662491  15
FAB2                                               AT2G43710  1                 A, C, E                  -1.027371065     0.017887013  11
FAD2                                               AT3G12120  1                 A, E                     -3.136681738     0.042268888  6
FAD8                                               AT5G05580  1                 A, E                     1.681796277      0.030616556  N/A
FATA                                               AT3G25110  1                 C                        2.653147656      4.80035E-06  1
KCR1                                               AT1G67730  1                 A, D, E                  1.049408074      0.01214747   6
KCS1                                               AT1G01120  1                 D                        1.027052723      0.045393104  4
KCS10                                              AT2G26250  2                 D                        1.077585019      0.003067057  5
LACS9                                              AT1G77590  2                 B, C, E                  -1.780734298     0.000111006  11
MFP2                                               AT3G06860  1                 B, E                     -4.415321265     1.28306E-19  N/A
Zinc binding alcohol dehydrogenase family protein  AT5G42250  1                 B                        1.29067737       0.007552823  N/A

A = Biosynthesis of unsaturated fatty acids; B = Fatty acid degradation; C = Fatty acid biosynthesis; D = Fatty acid elongation; E = Fatty acid metabolism; N/A = gene has not been assigned to a specific chromosome in the genome assembly

Table 4.3 – Fatty acid QTL co-localization with differentially expressed isoforms

Trait            Chromosome  cM    One LOD interval  # of DE isoforms  # of DE physical positions  DE FA isoforms
% Palmitic acid  6           26.3  18.4-41.7         12                10                          -
% Palmitic acid  17          39.6  33.8-43.1         7                 3                           -
% Stearic acid   6           24.3  20.3-28.3         3                 3                           -
% Stearic acid   10          50.7  44.7-62.2         7                 6                           -
% Oleic acid     1           19.5  3.3-25.5          58                49                          FATA
% Oleic acid     3           63.3  41.9-69.3         37                26                          -
% Oleic acid     6           22.3  16.4-28.3         4                 4                           -
% Linoleic acid  3           47.9  39.9-61.2         34                25                          -
% Linoleic acid  6           24.3  18.4-28.3         4                 4                           -

All QTL from Burke et al., 2005
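
The co-localization logic behind Table 4.3 can be sketched as an interval check: an isoform counts as co-localized when it lies on the QTL's chromosome and its genetic map position falls inside the one-LOD interval. The Python sketch below is a minimal illustration and not the procedure used in this chapter. The intervals are copied from Table 4.3, while the function name and the isoform positions are hypothetical.

```python
# One-LOD intervals (trait, chromosome, start cM, end cM) copied from Table 4.3.
QTL_INTERVALS = [
    ("% Palmitic acid", 6, 18.4, 41.7),
    ("% Palmitic acid", 17, 33.8, 43.1),
    ("% Stearic acid", 6, 20.3, 28.3),
    ("% Stearic acid", 10, 44.7, 62.2),
    ("% Oleic acid", 1, 3.3, 25.5),
    ("% Oleic acid", 3, 41.9, 69.3),
    ("% Oleic acid", 6, 16.4, 28.3),
    ("% Linoleic acid", 3, 39.9, 61.2),
    ("% Linoleic acid", 6, 18.4, 28.3),
]


def colocalized_qtl(chromosome, position_cm, intervals=QTL_INTERVALS):
    """Return the traits whose one-LOD interval contains the given map position."""
    return [
        trait
        for trait, chrom, start, end in intervals
        if chrom == chromosome and start <= position_cm <= end
    ]


# Hypothetical isoform positions (chromosome, cM), for illustration only.
print(colocalized_qtl(6, 25.0))   # falls inside all four chromosome 6 intervals
print(colocalized_qtl(3, 65.0))   # falls inside the oleic acid interval only
print(colocalized_qtl(12, 10.0))  # Table 4.3 lists no QTL on chromosome 12
```

Because several chromosome 6 intervals overlap, a single position there can be attributed to more than one fatty acid trait.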

Table 4.4 – Gene ontology enrichment terms

Name                                      GO ID       Corrected P value
Response to stress                        GO:0006950  0.00009037
Defense response                          GO:0006952  0.0002111
Iron ion transport                        GO:0006826  0.0003512
Embryo development                        GO:0009790  0.005188
Lipid transport (a)                       GO:0006869  0.00503
Transition metal ion transport            GO:0000041  0.007719
RNA metabolic process                     GO:0016070  0.009252
Multicellular organismal development (b)  GO:0007275  0.009424
Nucleic acid metabolic process            GO:0090304  0.01467746
Gene expression                           GO:0010467  0.03542668

(a) Same GO category as lipid localization (GO:0010876); (b) same GO category as single-multicellular organism process (GO:0044707). All terms significant at a level of FDR < 0.01.

Figure 4.1 - Frequency histogram of sequencing effort across 20 RNA-seq libraries in wild

sunflower

[Figure 4.1 image: histogram. x-axis: number of fragments sequenced in millions (bins: <1, 1-2.5, 2.5-5, 5-7.5, 7.5-10, 10-12.5, 12.5-15, 20-30, >30); y-axis: number of libraries (0-6).]

Figure 4.2 – Plot of log fold change in gene expression between Texas and Canada across the

sunflower genome. Black dots represent isoforms that were not significantly differentially

expressed. Red dots represent significantly differentially expressed genes. Positive log fold

change indicates higher expression in Texas relative to Canada.
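
The sign convention in Figure 4.2 can be made concrete with a short Python sketch. The log fold changes are copied from Table 4.2. The helper names are hypothetical, and the conversion to a fold difference assumes the values are base-2 logarithms, which is the convention edgeR reports but is an assumption here.

```python
# Log fold changes for four fatty acid isoforms, copied from Table 4.2.
LOG_FC = {
    "FAD2": -3.136681738,
    "FATA": 2.653147656,
    "MFP2": -4.415321265,
    "ECI3": 1.272111638,
}


def higher_in(log_fc):
    """Apply the Figure 4.2 convention: positive means higher in Texas."""
    if log_fc > 0:
        return "Texas"
    if log_fc < 0:
        return "Canada"
    return "neither"


def fold_difference(log_fc):
    """Convert a log fold change to an unsigned fold difference (assumes log base 2)."""
    return 2 ** abs(log_fc)


for gene, log_fc in LOG_FC.items():
    print(f"{gene}: higher in {higher_in(log_fc)}, about {fold_difference(log_fc):.1f}-fold")
```

Under this convention, the negative value for the FAD2 homolog corresponds to higher expression in the Canadian samples.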

CHAPTER V

CONCLUSIONS

In order to understand the process of local adaptation at the molecular level, it is crucial to

both identify phenotypically divergent populations and then characterize their molecular

variation to elucidate the changes that closely associate with the respective populations. In short,

a modern study of local adaptation requires one to characterize the extent and distribution of

heritable genetic variation in the form of common garden phenotypes, DNA genotypes, and gene

expression patterns. The extent to which a locus shows elevated structure among populations

may suggest an adaptive role for that particular genomic region (Beaumont and Nichols 1996).

In sunflower, the large size of the natural range makes the species amenable to studies of

adaptation. In particular, the availability of genomic resources in the related crop species further

solidifies this group as an ecological model system.

Across a latitudinal transect extending from Texas to Canada, I found a tremendous

amount of phenotypic diversity within the species. Flowering time was structured by latitude, with

northern populations tending to flower earlier than southern populations. This general

pattern could be an adaptation related to growing season length. Environments closer to the equator

experience longer stretches of favorable growing conditions, which may have caused populations

to become differentiated with respect to the transition to reproductive growth. The short growing

season at high latitudes may have selected for variants that promote rapid growth before adverse

conditions set in. The pressures at high latitudes could include changes in pollinator abundance

that track climate variation.

Latitude was also related to other important traits, such as the percentage of seed oil composed of

saturated fatty acids. As sunflower is an oilseed crop, there is general interest in understanding

the genetic variation underlying this trait. In wild sunflower, I corroborated and extended

a phenotypic survey of fatty acid variation across a North American transect. Both my work and

that of Linder (2000) have shown that northern populations have a significantly lower percentage of

saturated fatty acids than southern populations. My work extended this result through fine-scale sampling

throughout the continent, which revealed that the change in saturated fatty acids between

Texas and Canada is rather gradual. The selective pressure that generates this pattern may

relate to germination temperatures. The warmer temperatures in Texas may have selected for

individuals that produce more of the high-energy, high-melting-point saturated fatty acids.

Relatively cool germination temperatures in Canada may instead have favored a

compromise: plants produce more unsaturated fatty acids, which store less energy

but, because of their low melting point, make that energy accessible at low temperatures.

Population genetic variation mirrored the distributions of flowering time and fatty acid

composition traits. Nearly 250 markers identified a broad north-south structuring of genetic

variation. The identification of population structure allowed me to then test which loci were

more differentiated than one would expect by chance. By performing FST outlier

analyses, I was able to identify eight genes that showed elevated structure relative to the

genome-wide average. One gene identified by this approach, FT2, has numerous additional lines of

evidence suggestive of a role in adaptation, including QTL and gene expression analyses. This

investigation set the stage for further study of genomic regions harboring

the genetic signature of local adaptation.
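
As a rough illustration of the outlier logic, the following Python sketch computes FST for biallelic loci from allele frequencies in two populations and flags the most differentiated loci. It is a simplified stand-in rather than the analysis performed in this work: a formal test, such as that of Beaumont and Nichols (1996), compares each locus against a simulated neutral distribution. The function names, the top-percent cutoff, and all allele frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """FST = (HT - HS) / HT for one biallelic locus in two equally weighted populations."""
    p_mean = (p1 + p2) / 2
    h_total = 2 * p_mean * (1 - p_mean)  # expected heterozygosity of the pooled sample
    if h_total == 0:
        return 0.0  # both populations are fixed for the same allele
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return (h_total - h_within) / h_total


def flag_top_loci(fst_by_locus, top_percent=10):
    """Return the names of the most differentiated loci (top_percent is an integer)."""
    ranked = sorted(fst_by_locus, key=fst_by_locus.get, reverse=True)
    n_flagged = max(1, len(ranked) * top_percent // 100)
    return ranked[:n_flagged]


# Hypothetical allele frequencies (northern, southern) at ten loci.
FREQUENCIES = {
    "locus_01": (0.50, 0.55), "locus_02": (0.30, 0.35), "locus_03": (0.80, 0.70),
    "locus_04": (0.45, 0.40), "locus_05": (0.60, 0.65), "locus_06": (0.20, 0.30),
    "locus_07": (0.75, 0.70), "locus_08": (0.10, 0.15), "locus_09": (0.55, 0.45),
    "locus_10": (0.05, 0.95),
}

fst = {locus: fst_two_populations(p1, p2) for locus, (p1, p2) in FREQUENCIES.items()}
print(flag_top_loci(fst))
```

Ranking alone always flags something, which is why a neutral expectation is needed before a locus can be called an outlier in earnest.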

Population genomic protocols such as Genotyping-by-Sequencing (GBS) have allowed

researchers to genotype many individuals at many loci (Elshire et al., 2011). Techniques like

GBS are revolutionizing the way population geneticists go about detecting the molecular changes

conferring adaptation. My work has used markers derived from a modified GBS protocol to

perform a higher density sampling of wild sunflower genomes. By specifically targeting a subset

of populations from my original sampling, one population each from Texas, Nebraska, and Canada, it

was possible to identify over 11,000 SNP markers. These additional markers were used to assess

levels of population genetic variation and structure, and to identify regions of elevated genetic structure

throughout the genome. In terms of population genetic structure, all populations were clearly

separated from one another as determined by a STRUCTURE analysis (Pritchard et al., 2000).

However, various estimates of genetic structure indicated that the Nebraskan population was slightly

more similar in overall allele frequencies to the Canadian population than to the Texan population.

As these populations were previously genotyped with the above SNP chip markers, it is

important to consider the extent to which these datasets identify similar and/or different patterns

of genetic variation. Despite the vast differences in marker number, both datasets identified

similar population genetic structuring of variation. Specifically, by re-running STRUCTURE on

only the SNP chip data from the three populations that were eventually GBS genotyped, I found

the same pattern, in which three genetic clusters were most likely and those clusters closely corresponded

to the initial USDA sampling. This was an important result because it indicated

that the two classes of markers were capable of identifying the same type of genetic structure. A

second consideration with the newly developed GBS markers was whether they would identify

the same genomic regions as having elevated genetic structure. On the whole, I found that SNP

chip outliers generally were not physically close to GBS outliers. The closest instances of

co-localization of FST outliers between the two datasets involved polymorphisms over one megabase

apart from one another. It is necessary then to consider why these two datasets did not identify

the same genomic regions as being relevant to local adaptation. One possible reason could be a

lack of compatible restriction cut sites near the SNP chip polymorphisms. Alternatively,

many loci were dropped from the analysis due to small sample sizes in one of the three

populations. It is therefore possible that a SNP chip outlier could have been re-identified in my

analysis had libraries been sequenced to a greater depth. Another possible explanation would be

the extent of populations sampled with the two technologies. The SNP chip based analysis of

local adaptation addressed pairwise FST between all combinations of the 15 latitudinally

distributed populations. This is in stark contrast to the GBS work that targeted three populations,

and thus sampled a much reduced pool of genetic variation. Despite the two marker classes not

identifying the same genomic regions as being important for adaptation as described above, it is

certainly true that the GBS markers facilitated a much broader sampling of the sunflower

genome compared to the initial 246 SNPs used previously.
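
The comparison between SNP chip and GBS outliers reduces to a nearest-neighbor distance on each chromosome. The Python sketch below shows one way such a check could be written. It is not the procedure used here, and the outlier names, chromosomes, and base-pair positions are hypothetical placeholders. The one-megabase window echoes the distance mentioned in the text.

```python
def nearest_gbs_distance(chromosome, position_bp, gbs_outliers):
    """Distance in bp to the closest GBS outlier on the same chromosome, or None."""
    distances = [abs(position_bp - pos) for chrom, pos in gbs_outliers if chrom == chromosome]
    return min(distances) if distances else None


def colocalized(chip_outliers, gbs_outliers, window_bp=1_000_000):
    """Map each SNP chip outlier to (nearest distance, whether it falls within the window)."""
    result = {}
    for name, (chromosome, position_bp) in chip_outliers.items():
        distance = nearest_gbs_distance(chromosome, position_bp, gbs_outliers)
        result[name] = (distance, distance is not None and distance <= window_bp)
    return result


# Hypothetical outlier positions as (chromosome, base-pair position).
CHIP_OUTLIERS = {"chip_A": (6, 12_500_000), "chip_B": (9, 80_000_000), "chip_C": (14, 5_000_000)}
GBS_OUTLIERS = [(6, 14_100_000), (6, 60_000_000), (9, 80_400_000)]

for name, (distance, close) in colocalized(CHIP_OUTLIERS, GBS_OUTLIERS).items():
    print(name, distance, close)
```

A chip outlier with no GBS outlier on its chromosome returns a distance of None, which keeps missing coverage distinct from a large physical distance.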

SNPs identified in the GBS dataset were found to co-localize with a number of QTL for

both flowering time and fatty acid biosynthesis traits. These co-localizations suggest that these

genomic intervals, rather than the individual SNPs, may be important in sunflower adaptation to

the environment. When analyzing the annotations of the genes with outlier SNPs, I found that

three of the genes had roles in flowering time. A NAC domain-containing protein 52 homolog,

ANAC52, is a highly differentiated gene found to co-localize with a QTL on chromosome 9.

Interestingly, this gene has been shown to interact with FT in A. thaliana. As FT was already

shown to play a role in sunflower adaptation, this candidate presents an interesting situation in

which sequential genes in a developmental pathway show elevated differentiation. In a parallel

fashion, a homolog of Early in Short Days 7, a flowering time gene, both shows elevated

differentiation and affects the expression of FT. Thus, the increased marker density has

drastically improved my ability to detect whether or not a genomic region has elevated

differentiation. Of course, a more detailed understanding of the extent of linkage disequilibrium

will be required to more conclusively state whether these candidates are the targets of selection,

or are merely linked to the true selected locus. Despite the increased marker density, none of the outlier

genes were annotated as playing a role in fatty acid biosynthesis.

In order to understand the genetics of fatty acid profile-based adaptation, I took a targeted

approach that focused on the extremes of the natural range, Texas and Canada. Using RNA-seq

to assay the levels of gene expression in developing sunflower seeds, I sought to identify

differentially expressed genes that may contribute to this ecologically and agronomically

important trait. Upon analyzing the gene ontology terms of all genes with respect to the extent of

differential expression, I found a number of over-represented terms. In particular, lipid transport

was over-represented, which makes sense given that I sampled developing seed tissue in an

oil-producing species.
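
One common way to test whether a gene ontology term is over-represented is a hypergeometric test on the overlap between differentially expressed genes and genes carrying the term. The Python sketch below illustrates that general idea only. The enrichment method behind Table 4.4 is not restated in this chapter, so the choice of test is an assumption, and the function name and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, term_genes, de_genes, de_with_term):
    """Upper-tail hypergeometric probability of at least de_with_term annotated DE genes."""
    largest_overlap = min(term_genes, de_genes)
    favorable = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, de_genes - k)
        for k in range(de_with_term, largest_overlap + 1)
    )
    return favorable / comb(total_genes, de_genes)


# Hypothetical counts: 1000 expressed genes, 50 annotated with the term,
# 100 differentially expressed (DE), 12 of which carry the annotation.
p_value = enrichment_p_value(total_genes=1000, term_genes=50, de_genes=100, de_with_term=12)
print(f"expected overlap by chance: {100 * 50 / 1000:.0f} genes")
print(f"p-value for observing 12 or more: {p_value:.4g}")
```

When many terms are tested at once, the resulting p-values would still need a multiple-testing correction, as reflected by the corrected P values reported in Table 4.4.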

I found that a number of isoforms related to fatty acid biosynthesis, modification, and

degradation were differentially expressed between the extremes of my latitudinal transect. In

particular, a homolog of FAD2 was more highly expressed in Canada compared to Texas, a

pattern which is consistent with observed phenotypes and theory. Interestingly, additional genes

responsible for breaking down unsaturated fatty acids were more highly expressed in

southern individuals. These patterns may suggest that the observed phenotypic differences in

saturated fatty acid percentages reflect both differential production and differential

breakdown of fatty acids. Future experiments that use time-course sampling of

developing seeds may yield a more nuanced understanding of how gene expression is affecting

differential seed phenotypes in wild sunflower populations. The differential expression analyses

provided a means for testing the potential functional relevance of FST outliers identified earlier.

Specifically, I found two genes, a kinesin motor protein and a phosphoenolpyruvate

carboxylase, that were both FST outliers and differentially expressed. This result highlights the

importance of pairing population genome scans with RNA-seq experiments. Since neither GBS nor

RNA-seq requires a priori information, these techniques are particularly attractive when

working in non-model systems.

Together, these investigations highlight the extent of intraspecific variation in H. annuus.

I have shown that traits, allele frequencies, and gene expression patterns exhibit significant

differences with respect to latitude, and by association, climate. Molecular studies of adaptation

in wild species are an important step in understanding how species adapt to a changing

environment. The variants identified in this work may be the types of changes that are selected

on in the coming years in the context of higher latitudes becoming warmer, and thus more like

lower latitudes. Furthermore, the introgression of these adaptive genomic regions into cultivated

germplasm could provide a potential to extend the production areas of sunflower and/or maintain

current yields in novel climatic conditions. The GBS outliers and differentially expressed genes

are clear candidates for future targeted experiments to further clarify the precise genetic mechanism

of adaptation in wild sunflower.

References

Beaumont MA, Nichols RA (1996) Evaluating loci for use in the genetic analysis of population

structure. Proceedings of the Royal Society B-Biological Sciences 263: 1619-1626.



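Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, et al. (2011) A robust, simple

genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6: e19379.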


Linder CR (2000) Adaptive evolution of seed oils in plants: accounting for the biogeographic

distribution of saturated and unsaturated fatty acids in seed oils. American Naturalist 156:

442-458.

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus

genotype data. Genetics 155: 945-959.

APPENDIX A

Supporting information for chapter II

Supplemental table 2.1 – SNP genotypes for 286 individuals at 246 loci.

Supplemental table 2.2 – Raw trait values for 286 common garden grown individuals

Supplemental table 2.3 – Results of REML analysis of phenotype data. Each tab contains

statistical results for a single trait. Additionally, the result of a Tukey’s test to determine

significant regional effects is presented on the right portion of each sheet.

* Note that supplemental tables 2.1, 2.2, and 2.3 are Excel documents that will be uploaded to

the journal website

Supplemental figure 2.1 – Delta K plot of STRUCTURE results. STRUCTURE harvester

results indicate K=2 as being the most likely number of groupings within the full dataset.

Supplemental figure 2.2 – STRUCTURE bar plot of southern regions. Populations correspond

to those in Table 1.

[Figure image: STRUCTURE bar plot. Populations shown: TX1, TX2, TX3, TX4, TX5, OK1.]

Supplemental figure 2.3 – Delta K plot for southern STRUCTURE plot found in Supplemental

figure 2.2.

Supplemental figure 2.4 – STRUCTURE bar plot of northern regions. Populations correspond

to those in Table 1.

[Figure image: STRUCTURE bar plot. Populations shown: MT1, MT2, ND1, SAS1, SAS2, SAS3.]

Supplemental figure 2.5 – Delta K plot for northern STRUCTURE plot found in Supplemental

figure 2.4.

Supplemental figure 2.6 – STRUCTURE bar plot corresponding to K = 6 for the six

populations within the two southern regions.

APPENDIX B

Supporting information for chapter III

Supplemental figure 3.1 – Sequencing effort per library

[Figure image: histogram titled "Library sequencing distribution". x-axis: reads x 10^6 (bins: <1, 1-2, 2-3, 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, >10); y-axis: total (0-14).]

Supplemental figure 3.2 – SNP density per chromosome

[Figure image: plot of Num_Stacks (y-axis, 0 to 800) against Length (x-axis, 0e+00 to 3e+08).]

Supplemental figure 3.3 – STRUCTURE delta K plot

Supplemental figure 3.4 – STRUCTURE bar plot based on previous SNP genotyping.

Supplemental figure 3.5 – STRUCTURE delta K plot for the three populations genotyped with

a SNP chip.

APPENDIX C

Supporting information for chapter IV

Supplemental table 4.1 – Top ten most differentially expressed nuclear isoforms with higher expression in Canada

Gene ID                                     At ID      logFC         FDR          Chromosome
NHL domain-containing protein               AT1G23880  -7.209152704  4.63548E-40  4
DNAse 1 - like                              AT3G58580  -5.338613346  1.75265E-38  N/A
Phototropic-responsive NPH3 family protein  AT5G47800  -5.504416002  7.35257E-33  2
Polyadenylate-binding protein RBP45B        AT1G11650  -8.650369689  6.36317E-31  8
Serine carboxypeptidase-like 49             AT3G10410  -7.292796909  2.50579E-30  8
Polyadenylate-binding protein RBP45B        AT1G11650  -7.603261431  5.51519E-28  8
Hypothetical protein                        AT2G18100  -6.1607632    6.70907E-28  5
Phototropic-responsive NPH3 family protein  AT5G47800  -5.177943316  2.30409E-27  2
DNAse 1 - like                              AT3G58580  -4.4200042    2.32101E-27  N/A
Serine carboxypeptidase-like 49             AT3G10410  -6.939534621  2.36725E-27  8

Supplemental table 4.2 – Top ten most differentially expressed nuclear isoforms with higher expression in Texas

Gene ID                                               At ID      logFC        FDR          Chromosome
GRAM domain family protein                            AT5G13200  7.062037115  9.66701E-22  N/A
Tubby-like protein 8                                  AT1G16070  4.949754276  1.81253E-21  9
GRAM domain family protein                            AT5G13200  6.679200448  2.07451E-21  N/A
Hypothetical protein                                  AT1G70750  3.768337654  4.38976E-16  12
Kunitz family trypsin and protease inhibitor protein  AT1G17860  3.588530937  9.48592E-13  7
TIR-NBS-LRR class disease resistance protein          AT4G16920  2.422635816  5.13868E-12  5
Small heat shock protein 23.6                         AT4G25200  4.907818042  1.4306E-10   3
Sulphur deficiency-induced 1                          AT5G48850  4.121138197  1.71722E-09  13

Supplemental table 4.3 - Auto-correlation analysis of physical genome position and expression

similarity

Chromosome  r             p-value (1)
1           0.007319802   0.28
2           -0.02492382   0.958
3           0.008872539   0.285
4           -0.01529262   0.843
5           0.007694233   0.261
6           -0.009814005  0.653
7           -0.002112057  0.563
8           0.01544674    0.175
9           0.00214163    0.417
10          -0.005591531  0.725
11          0.03373421    0.019
12          -0.01681713   0.849
13          0.00475819    0.398
14          0.01102912    0.281
15          0.02180371    0.097
16          0.01854987    0.1
17          0.02069085    0.095

(1) New critical value = 0.05/17 = 0.003
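
The corrected critical value in the footnote follows from a Bonferroni adjustment, dividing 0.05 by the 17 chromosomes tested. A minimal Python check, using the p-values copied from Supplemental table 4.3 (the variable names are illustrative), shows which chromosomes pass each threshold:

```python
# p-values by chromosome, copied from Supplemental table 4.3.
P_VALUES = {
    1: 0.28, 2: 0.958, 3: 0.285, 4: 0.843, 5: 0.261, 6: 0.653,
    7: 0.563, 8: 0.175, 9: 0.417, 10: 0.725, 11: 0.019, 12: 0.849,
    13: 0.398, 14: 0.281, 15: 0.097, 16: 0.1, 17: 0.095,
}
ALPHA = 0.05

# Bonferroni correction: divide the nominal alpha by the number of tests.
critical_value = ALPHA / len(P_VALUES)

nominal_hits = [chrom for chrom, p in P_VALUES.items() if p < ALPHA]
corrected_hits = [chrom for chrom, p in P_VALUES.items() if p < critical_value]

print(f"critical value: {critical_value:.3f}")
print("significant at 0.05:", nominal_hits)
print("significant after correction:", corrected_hits)
```

Only chromosome 11 falls below the uncorrected 0.05 level, and no chromosome remains significant once the corrected value is applied.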

Supplemental figure 4.1 – Differentially expressed fatty acid isoforms 1. Gray libraries (1-8)

represent southern individuals and white libraries (9-19) represent northern individuals. Isoforms

represent sunflower homologs of the following A. thaliana genes: A) KCR1, B) FAB2, C)

FAD8, D) FAD2, E) ADH1, F) ADH1.



Isoforms represent sunflower homologs of the following A. thaliana genes: A) ADH1, B) ADH1, C)

ADH1, D) ADH1, E) LACS9, F) LACS9.



Isoforms represent sunflower homologs of the following A. thaliana genes: A) MFP2, B) Zinc binding

alcohol dehydrogenase family protein, C) AACT1, D) FAB1, E) FAB1, F) FATA.



Isoforms represent sunflower homologs of the following A. thaliana genes: A) BCCP, B) CAC2, C) KCS1,

D) KCS10, E) KCS10, F) alpha/beta hydrolase super family.



Isoforms represent sunflower homologs of the following A. thaliana genes: A) ECI3, B) ECI3.
