An integrated transcriptome atlas of the crop model Glycine max, and its use in comparative analyses in plants
Marc Libault1,*, Andrew Farmer2, Trupti Joshi3, Kaori Takahashi1, Raymond J. Langley2, Levi D. Franklin3, Ji He4, Dong Xu3,
Gregory May2 and Gary Stacey1
1Division of Plant Sciences, National Center for Soybean Biotechnology, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA,
2National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA,
3Computer Science Department, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA, and
4Plant Biology Division, Samuel Roberts Noble Foundation, Ardmore, OK 73401, USA
Received 18 January 2010; revised 25 March 2010; accepted 31 March 2010; published online 14 May 2010.
*For correspondence (fax +573 884 9676; e-mail [email protected]).
SUMMARY
Soybean (Glycine max L.) is a major crop providing an important source of protein and oil, which can also be
converted into biodiesel. A major milestone in soybean research was the recent sequencing of its genome. The
sequence predicts 69 145 putative soybean genes, with 46 430 predicted with high confidence. In order to
examine the expression of these genes, we utilized the Illumina Solexa platform to sequence cDNA derived
from 14 conditions (tissues). The result is a searchable soybean gene expression atlas accessible through a
browser (http://digbio.missouri.edu/soybean_atlas). The data provide experimental support for the transcription
of 55 616 annotated genes and also indicate that 13 529 annotated soybean genes are putative
pseudogenes, and 1736 currently unannotated sequences are transcribed. An analysis of this atlas reveals
strong differences in gene expression patterns between different tissues, especially between root and aerial
organs, but also reveals similarities between gene expression in other tissues, such as flower and leaf organs.
In order to demonstrate the full utility of the atlas, we investigated the expression patterns of genes implicated
in nodulation, and also transcription factors, using both the Solexa sequence data and large-scale qRT-PCR.
The availability of the soybean gene expression atlas allowed a comparison with gene expression documented
in the two model legume species, Medicago truncatula and Lotus japonicus, as well as data available for
Arabidopsis thaliana, facilitating both basic and applied aspects of soybean research.
Keywords: soybean, gene expression atlas, comparative genomics, transcription factors, nodulation.
INTRODUCTION
After grasses, legumes are the most economically important plant family, based on their consumption in human and animal nutrition. In addition, the use of legumes in biofuel production will further increase the economic impact of this plant family. These characteristics justify a substantial effort by the research community to better understand legume biology. An attribute of most legumes is the development of a symbiotic interaction with soil bacteria (rhizobia) that fix and assimilate atmospheric dinitrogen (atmN2). This symbiosis is based on the chemical recognition of diffusible signals by both partners, which determines the specificity of the interaction (Oldroyd and Downie, 2008). For example, recognition of the rhizobial lipo-chitin Nod factor by the root hair cells of a compatible host leads to morphological and biochemical changes in the plant (e.g. root hair cell curling, cortical cell division, induction of Nod factor-responsive plant genes and calcium spiking in root hair cells). These changes are the first signs of the development of a new plant organ, the nodule, in which the bacteria differentiate into bacteroids and reduce atmN2. In exchange, the plant provides the bacteroids with a steady supply of carbon.
As part of the effort to better understand legume biology, the genome sequences of three legume species are now complete or nearly complete: Lotus japonicus (Lotus; http://www.kazusa.or.jp/lotus), Glycine max (soybean; http://www.phytozome.net/soybean) and Medicago truncatula (Medicago; http://www.medicago.org/genome). Schmutz et al. (2010) recently described the complete soybean genome sequence. In each case, a large number of genes were predicted. The availability of these genome sequences now enables a variety of functional genomic methods to characterize these genes and their functions. For example, large-scale cDNA sequencing technologies [e.g. the 454 Life Sciences (Margulies et al., 2005) and Illumina Solexa (Bennett et al., 2005) platforms] provide a means to accurately profile gene expression (e.g. Libault et al., 2010). Gene expression atlases have previously been established for Arabidopsis thaliana (Schmid et al., 2005), Oryza sativa (Nobuta et al., 2007; Jiao et al., 2009), M. truncatula (Benedito et al., 2008) and L. japonicus (Hogslund et al., 2009) using massively parallel signature sequencing and array hybridization technologies.
© 2010 The Authors. Journal compilation © 2010 Blackwell Publishing Ltd. The Plant Journal (2010) 63, 86–99. doi: 10.1111/j.1365-313X.2010.04222.x
In this study, the high-throughput Illumina Solexa sequencing platform was used to develop a gene expression atlas of the soybean genome. cDNAs derived from nine different soybean tissues were sequenced. Included in the soybean gene atlas are five additional data sets, described by Libault et al. (2010), for a combined total of 14 different conditions (tissues). This provides unprecedented coverage of the transcriptome, including documentation of expression from annotated pseudogenes and unannotated genes, as well as accurate quantification of low-abundance transcripts (Cheung et al., 2006; Weber et al., 2007; Libault et al., 2010). To demonstrate the utility of the soybean gene expression atlas, we focused specifically on expression in root hair cells, on meristem-specific genes and on the expression of transcription factor (TF) genes. The results from the soybean gene expression atlas were also compared with previously published expression data from A. thaliana, M. truncatula and L. japonicus. For example, the comparison with the well-annotated A. thaliana genome identified putative soybean genes involved in the determination of floral organs and the maintenance of the shoot apical meristem (SAM). The availability of the soybean gene expression atlas should facilitate additional studies on the basic biology of soybean, while also supporting applied research to improve soybean agronomic performance.
RESULTS AND DISCUSSION
Sequence-based transcriptome atlas of soybean: an overview
We used the Illumina Solexa sequencing platform to quantify the expression of soybean genes (i.e. the number of sequence reads per million reads aligned) in nine different conditions: root hair cells isolated 84 and 120 h after sowing (HAS), root tip, root, mature nodules, leaves, SAM, flower and green pods. We included root hair cells isolated at two different time points because their transcriptome changes during development (Libault et al., 2010). Between 4.18 and 6.84 million reads of approximately 36 bp were generated for each of the nine conditions. Of these, 45.8–82.6% aligned to no more than five loci on the soybean genome (Table 1). This variation reflects the particularly high and low proportions of unaligned and repetitive reads (i.e. reads matching more than five loci) in the pod (54.2% of the total reads) and flower (17.4% of the total reads) samples, respectively. We classified the sequence reads aligned to no more than five loci into two groups based on the number of matches against the soybean genome [i.e. non-unique reads (two to five loci) and unique reads (a single soybean locus); Table 1]. To ensure accurate quantification of expression in the different tissues tested, only the sequence reads matching uniquely against the soybean genome were used. A total of 51 529 annotated soybean genes (74.5% of the 69 145 putative, annotated soybean genes) were found to be expressed in at least one condition (Table S1). Including the five additional data sets described by Libault et al. (2010), i.e. root hairs harvested 12, 24 and 48 h after Bradyrhizobium japonicum inoculation (HAI), 24-HAI mock-inoculated root hairs and 48-HAI inoculated stripped roots (Table S2), expression was documented for a total of 52 947 annotated genes. No expression in any of the 14 conditions was detected for 16 198 annotated genes, suggesting that these genes were not expressed, were expressed at a level below our detection limit or were expressed only under highly restricted conditions (Table S2). The data also show expression from 7314 different soybean loci currently lacking any gene annotation (Table S2). Considering only the nine conditions sequenced as part of the current study, the data demonstrate expression from 7174 currently unannotated regions (Table S1). A number of root hair genes were found to be specifically expressed upon inoculation with B. japonicum, as documented by Libault et al. (2010).

Table 1 Distribution of Illumina Solexa 36-bp reads according to their alignment against the Glycine max (soybean) genome
Sample | G. max unique | G. max non-unique (2–5 matches) | Unaligned and highly repetitive reads (>5 matches) | Total reads
Root tip | 3 235 689 | 850 750 | 1 068 142 | 5 154 581
Root | 3 790 433 | 884 257 | 1 432 754 | 6 107 444
84-HAS root hairs | 2 828 246 | 719 626 | 2 063 637 | 5 611 509
120-HAS root hairs | 4 086 965 | 1 052 457 | 1 698 787 | 6 838 209
Nodule | 3 401 083 | 936 037 | 1 999 389 | 6 336 509
Leaves | 2 813 916 | 1 202 914 | 1 279 012 | 5 295 842
Shoot apical meristem | 3 947 566 | 1 041 894 | 1 488 700 | 6 478 160
Flower | 3 372 444 | 902 730 | 901 116 | 5 176 290
Green pods | 1 462 809 | 453 340 | 2 268 639 | 4 184 788
HAS, h after sowing.
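The read-partitioning and normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the helper names are hypothetical, each read is assumed to be summarized by the number of soybean loci it aligned to (0 for unaligned), and the reads-per-million denominator is assumed to be the number of uniquely aligned reads in a sample. The Table 1 counts for two samples are used to recover the ends of the 45.8–82.6% range quoted in the text.

```python
from collections import Counter


def classify_read(n_loci: int) -> str:
    """Assign a read to one of the three Table 1 categories by its number of genomic hits."""
    if n_loci == 1:
        return "unique"
    if 2 <= n_loci <= 5:
        return "non-unique"
    return "unaligned_or_repetitive"  # 0 hits (unaligned) or >5 hits (highly repetitive)


def aligned_fraction(unique: int, non_unique: int, other: int) -> float:
    """Percentage of all reads that aligned to no more than five loci."""
    total = unique + non_unique + other
    return 100 * (unique + non_unique) / total


def reads_per_million(gene_counts: dict[str, int], total_unique: int) -> dict[str, float]:
    """Scale per-gene unique-read counts to reads per million (assumed denominator)."""
    return {gene: count * 1e6 / total_unique for gene, count in gene_counts.items()}


# (unique, non-unique, unaligned/repetitive) read counts taken from Table 1
TABLE1 = {
    "Flower": (3_372_444, 902_730, 901_116),
    "Green pods": (1_462_809, 453_340, 2_268_639),
}

if __name__ == "__main__":
    for sample, counts in TABLE1.items():
        print(f"{sample}: {aligned_fraction(*counts):.1f}% aligned to <=5 loci")
    print(Counter(classify_read(n) for n in [1, 1, 3, 0, 7]))
```

Running the script reports 82.6% for flower and 45.8% for green pods, matching the range given above.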
The soybean genome annotation described by Schmutz et al. (2010) includes 46 430 genes predicted with high confidence, with the remaining genes predicted with low confidence. We compared the list of genes for which no expression was detected across the 14 conditions with the list of low-confidence genes. Of the 16 198 putative genes lacking expression, 12 673 (78.2%) were predicted with low confidence in the current soybean genome annotation (Table S3). The remaining 3525 genes were annotated with high confidence because of the presence of an expressed sequence tag (EST) or full-length cDNA sequence (Table S3). Having reviewed the conditions under which these 3525 transcripts were previously detected, we conclude that these genes are expressed under highly restricted conditions, such as at very specific stages of organ development or in response to specific abiotic stresses, such as drought. Therefore, most of the 12 673 low-confidence genes that lack expression are likely to be pseudogenes.
Soybean is an allotetraploid that has undergone at least two rounds of whole-genome duplication, the most recent approximately 13 Mya (Schlueter et al., 2004, 2007; Gill et al., 2009). In a previous study, we demonstrated cases in which homeologous gene pairs showed significant divergence in their expression (Libault et al., 2010). To examine this on a whole-genome basis, we used syntenic relationships among 19 533 annotated genes (28.2% of the annotated soybean genes) to establish their homeology (Table S4). Only 61 (<1%) of the 12 673 predicted pseudogenes had a homeolog that was expressed at some level in all conditions tested (Table S5). These results are consistent with current theories of gene evolution, in which the fates of duplicated genes after whole-genome duplication include silencing or neofunctionalization of one of the two copies (Adams, 2007).
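The homeolog check above amounts to a lookup over syntenic gene pairs. The sketch below is a hypothetical illustration: the gene IDs, the dictionary layout and the rule that "expressed" means a value above zero in every condition are assumptions, not the authors' implementation.

```python
def pseudogenes_with_expressed_homeolog(
    pseudogenes: list[str],
    homeolog_of: dict[str, str],
    expression: dict[str, list[float]],
) -> list[str]:
    """Return predicted pseudogenes whose syntenic homeolog is expressed in every condition."""
    hits = []
    for gene in pseudogenes:
        partner = homeolog_of.get(gene)
        if partner is None:
            continue  # no syntenic homeolog could be established for this gene
        levels = expression.get(partner, [])
        # "Expressed at some level in all conditions": every value must be above zero.
        if levels and all(level > 0 for level in levels):
            hits.append(gene)
    return hits


# Hypothetical toy data: three conditions per gene
homeologs = {"pseudoA": "geneA", "pseudoB": "geneB", "pseudoC": "geneC"}
levels = {"geneA": [1.2, 0.4, 3.0], "geneB": [0.0, 2.5, 1.1]}
result = pseudogenes_with_expressed_homeolog(
    ["pseudoA", "pseudoB", "pseudoC", "pseudoD"], homeologs, levels
)
print(result)
```

Only `pseudoA` is returned: `pseudoB`'s homeolog is silent in one condition, `pseudoC`'s homeolog has no expression record and `pseudoD` has no homeolog at all.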
Sequence reads matched the 7314 loci currently lacking gene annotation (Table S2). Most of these loci (7127) lie in regions assembled into the chromosome pseudomolecules, whereas the remainder (187) are located on currently unanchored scaffolds. In a previous study, we demonstrated the use of high-throughput cDNA sequencing to improve the current soybean genome annotation (Libault et al., 2010). We therefore mined 20 kbp of genomic DNA sequence around each of the 7127 regions found to be expressed. Using FGENESH, we predicted putative protein-coding genes for 6059 of the 7127 loci (85%). Of these, 4323 gene predictions overlapped existing annotated genes, resulting in 5′ or 3′ extensions of the currently annotated cDNA sequences (Table S6). The remaining 1736 genes predicted by FGENESH did not overlap currently annotated genes, suggesting the existence of new protein-coding genes. We used InterProScan (Zdobnov and Apweiler, 2001) to identify signature domains in the encoded proteins: 542 and 1194 genes encode proteins with and without conserved domains, respectively (Table S7). Altogether, our analysis suggests that 57 352 soybean genes are transcribed: 55 616 of the 69 145 putative genes in the current published soybean genome annotation (the remaining 13 529 being putative pseudogenes), plus 1736 newly annotated genes.
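The gene accounting in this and the preceding paragraphs can be checked with simple arithmetic. The constants below are the counts quoted in the text; the script only verifies that they are mutually consistent and recomputes the derived figures.

```python
ANNOTATED = 69_145             # putative genes in the published annotation
UNEXPRESSED = 16_198           # annotated genes silent in all 14 conditions
LOW_CONF_UNEXPRESSED = 12_673  # ...of which predicted with low confidence
TRANSCRIBED_ANNOTATED = 55_616
EXPRESSED_ANCHORED = 7_127     # expressed unannotated loci on the pseudomolecules
FGENESH_PREDICTED = 6_059
EXTENDS_EXISTING = 4_323       # predictions overlapping annotated genes
NEW_GENES = 1_736
WITH_DOMAINS, WITHOUT_DOMAINS = 542, 1_194

high_conf_unexpressed = UNEXPRESSED - LOW_CONF_UNEXPRESSED
low_conf_share = 100 * LOW_CONF_UNEXPRESSED / UNEXPRESSED
putative_pseudogenes = ANNOTATED - TRANSCRIBED_ANNOTATED
total_transcribed = TRANSCRIBED_ANNOTATED + NEW_GENES
prediction_rate = 100 * FGENESH_PREDICTED / EXPRESSED_ANCHORED

# Internal consistency of the FGENESH and InterProScan breakdowns
assert EXTENDS_EXISTING + NEW_GENES == FGENESH_PREDICTED
assert WITH_DOMAINS + WITHOUT_DOMAINS == NEW_GENES

print(high_conf_unexpressed, round(low_conf_share, 1))
print(putative_pseudogenes, total_transcribed, round(prediction_rate))
```

The script prints `3525 78.2` and `13529 57352 85`, the values given in the text.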
Tissue-specific gene expression
Benedito et al. (2008) observed large differences between the transcriptomes of different M. truncatula organs, based on a number of DNA microarray hybridizations. Similarly, Schmid et al. (2005) and Aceituno et al. (2008) concluded that the A. thaliana transcriptome varies strongly from one organ to another. These studies suggest that the identity of a given plant organ is reflected in its transcriptome. In soybean, across the nine tissues tested in the current study, the proportion of annotated and unannotated sequences transcribed was similar from one tissue to another (min. 52.4% in pod; max. 61.2% in the SAM; Table 2). These percentages are slightly lower than those reported in M. truncatula (55–63%; Benedito et al., 2008) and A. thaliana tissues (55–67%; Schmid et al., 2005). Such differences might be a direct consequence of the non-negligible number of putative pseudogenes mentioned above, and might also reflect the residual background or cross-hybridization inherent in array hybridization technology. Similar numbers of soybean genes were expressed in a single cell type (root hair) and in multicellular organs (e.g. 45 717, 40 034, 43 377 and 46 173 soybean genes were expressed in flower, pod, and 84- and 120-HAS root hair cells, respectively; Table 2). Jiao et al. (2009) previously reported that transcripts undetectable in cDNA derived from shoot, root or germinated seeds could be detected if mRNA was sampled from a single cell type within that organ. We therefore hypothesize that the heterogeneous population of differentiated cells composing a soybean organ results in a larger diversity of expressed sequences, but also in poor detection of low-abundance transcripts. In contrast, cDNA derived from the single-cell root hairs allows the detection of low-abundance transcripts, because of the homogeneity of the tissue sampled and the lack of dilution by other tissues. These opposing factors apparently result in approximately the same number of transcripts being sequenced from single-cell-type and multicellular organ samples.
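The per-tissue percentages in Table 2 are consistent with a denominator of 76 459 loci, i.e. the 69 145 annotated genes plus the 7314 transcribed unannotated loci (for example, 38 645/76 459 = 50.54%). Under that inferred denominator, the minimum and maximum proportions of transcribed sequences quoted above can be recomputed from the Table 2 counts; the function name is hypothetical.

```python
TOTAL_LOCI = 69_145 + 7_314  # annotated genes + transcribed unannotated loci (inferred denominator)

# Tissue: (expressed annotated, expressed unannotated) sequences from Table 2
TABLE2 = {
    "Root hair 84 HAS": (38_645, 4_732),
    "Root hair 120 HAS": (40_849, 5_324),
    "Root tip": (36_882, 4_624),
    "Root": (40_576, 5_126),
    "Nodule": (36_369, 4_438),
    "Leaf": (37_600, 4_518),
    "Shoot apical meristem": (41_415, 5_341),
    "Flower": (40_863, 4_854),
    "Pod": (36_325, 3_709),
}


def percent_transcribed(annotated: int, unannotated: int, total: int = TOTAL_LOCI) -> float:
    """Share of all loci with detectable transcripts in one tissue."""
    return 100 * (annotated + unannotated) / total


shares = {tissue: percent_transcribed(*counts) for tissue, counts in TABLE2.items()}
lowest = min(shares, key=shares.get)
highest = max(shares, key=shares.get)
print(f"min {shares[lowest]:.1f}% ({lowest}); max {shares[highest]:.1f}% ({highest})")
```

This prints `min 52.4% (Pod); max 61.2% (Shoot apical meristem)`.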
To better establish the identity of the different soybean tissues, we generated a heat map based on the correlations between their transcriptomes (Figure 1a). Based on this map, the nine samples can be divided into three groups: (i) root tip, root and root hairs; (ii) SAM, pod, flower and leaf; and (iii) nodule. The lack of correlation between
root-related tissues and aerial organs was previously
reported by Benedito et al. (2008) in M. truncatula. These
results are likely to reflect the divergence in function
between the root and aerial portions of the plant. Consistent
with this notion, other tissues show significant overlap in
their transcriptomes. For example, gene expression in the
soybean pod and SAM was strongly correlated (Figure 1a).
The transcription profile can also reflect development. For
example, the flower and leaf transcriptomes were closely
correlated. In 1790, Goethe hypothesized that floral organs
were modified leaves (Coen, 2001). Indeed, four MADS-box
TF genes named SEPALLATA1–4 (SEP1, SEP2, SEP3 and
SEP4, previously named AGL2, AGL4, AGL9 and AGL3) were
characterized for their role in the acquisition of floral organ
identity, as sep mutants develop leaf-like organs instead of
flowers (Honma and Goto, 2001; Pelaz et al., 2001; Ditta
et al., 2004). These results suggest that organ-specific gene
expression could be the result of the action of relatively few
regulatory genes.
The soybean nodule transcriptome showed little correlation with those of other organs, with the exception of mature roots. Interestingly, the soybean root hair transcriptome was not strongly correlated with that of the whole root, nor with any of the other soybean tissues analyzed (Figure 1a). This is likely to reflect the specialization of this single cell type, but also the tissue dilution that occurred when sampling the other organs, especially the roots.
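The sample-to-sample comparison underlying Figure 1 rests on Pearson correlation coefficients computed on log2-transformed expression values (the Figure 1 legend also mentions Ward hierarchical clustering, which is not reproduced here). A minimal pure-Python sketch follows; the pseudocount of 1 added before the log transform, the function names and the toy expression values are assumptions for illustration only.

```python
import math


def log2_transform(values: list[float], pseudocount: float = 1.0) -> list[float]:
    """log2(x + pseudocount); the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors (non-constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(profiles: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pairwise correlations between samples; each profile covers the same genes in the same order."""
    logged = {sample: log2_transform(v) for sample, v in profiles.items()}
    names = list(logged)
    return {(a, b): pearson(logged[a], logged[b]) for a in names for b in names}


# Hypothetical reads-per-million values for four genes in three samples
toy = {
    "leaf": [0, 10, 100, 5],
    "flower": [1, 12, 90, 4],
    "root": [100, 0, 3, 50],
}
corr = correlation_matrix(toy)
print(round(corr[("leaf", "flower")], 2), round(corr[("leaf", "root")], 2))
```

In this toy example the leaf and flower profiles correlate strongly and positively, whereas leaf and root are negatively correlated, mirroring the kind of contrast the heat map is meant to expose.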
In a previous study, Aceituno et al. (2008) showed that Arabidopsis organ transcriptomes were not strongly affected by environmental changes. The distinct transcriptomic patterns exhibited by the various soybean organs are therefore likely to reflect their unique identities, rather than specific environmental conditions. To better understand soybean organ development, we analyzed the soybean gene expression atlas to identify genes that were ubiquitously expressed across the nine tissues, and those showing a very high level of tissue-specific expression. This analysis showed that 58 703 soybean genome loci, including both annotated and unannotated regions, were expressed in at least one of the nine soybean tissues. Roughly half of these loci (28 374) were transcribed ubiquitously (Table S8). In theory, organ identity could depend both on the expression levels of ubiquitously expressed genes and on the organ-specific expression of selected genes. To address this issue, we first compared the overall expression levels of the 28 374 ubiquitous genes between the nine conditions (Figure 1b). This analysis revealed significant differences in the absolute expression levels of the 28 374 ubiquitously expressed genes. These data also suggest that few, if any, soybean genes are stably expressed across the various soybean tissues. To examine this directly, we included the five additional conditions from Libault et al. (2010) and defined constitutively expressed genes by the following criteria: (i) the gene was expressed in all 14 conditions tested; and (ii) the fold change in relative expression between the conditions in which the gene was most and least expressed was no higher than three. These criteria identified 2532 putative constitutive genes (Figure S1; Table S9). Among these, PFAM, KOG or PANTHER conserved domains were identified for 2187 genes, leading to the identification of 140 TF genes [2.5% of the 5671 predicted TF genes in the soybean genome; Schmutz et al., 2010; Libault et al., 2009a; PFAM, KOG and PANTHER domain predictions are available from ftp://ftp.jgi-psf.org/pub/JGI_data/Glycine_max/Glyma1/Glyma1_domains]. This relatively low number probably reflects the specific roles of TF genes in determining plant organ identity.
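The two constitutive-expression criteria translate directly into a filter. In this sketch the per-condition values are assumed to be normalized expression levels (e.g. reads per million), "expressed" is taken to mean a value above zero, and the function name is hypothetical.

```python
def is_constitutive(levels: list[float], n_conditions: int = 14, max_fold: float = 3.0) -> bool:
    """Apply criteria (i) and (ii) to one gene's per-condition expression levels."""
    if len(levels) != n_conditions or any(level <= 0 for level in levels):
        return False  # (i) must be detected in every one of the 14 conditions
    return max(levels) / min(levels) <= max_fold  # (ii) max/min fold change no higher than three


stable = [10.0] * 13 + [30.0]    # exactly 3-fold between extremes: passes
variable = [10.0] * 13 + [31.0]  # 3.1-fold: fails criterion (ii)
absent = [10.0] * 13 + [0.0]     # not detected in one condition: fails criterion (i)
print([is_constitutive(g) for g in (stable, variable, absent)])
```

This prints `[True, False, False]`; the boundary case of exactly threefold is kept, following the "not higher than three" wording.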
Table 2 Distribution of expressed and not expressed annotated and unannotated sequences across nine Glycine max (soybean) tissues
Tissue | Expressed: annotated sequences (%) | Expressed: unannotated sequences (%) | Silenced (no transcript detected): annotated sequences (%) | Silenced (no transcript detected): unannotated sequences (%)
Root hair 84 HAS | 38 645 (50.54) | 4732 (6.19) | 30 500 (39.89) | 2582 (3.38)
Root hair 120 HAS | 40 849 (53.43) | 5324 (6.97) | 28 296 (37.01) | 1990 (2.60)
Root tip | 36 882 (48.24) | 4624 (6.05) | 32 263 (42.20) | 2690 (3.52)
Root | 40 576 (53.07) | 5126 (6.71) | 28 569 (37.37) | 2188 (2.86)
Nodule | 36 369 (47.57) | 4438 (5.81) | 32 776 (42.87) | 2876 (3.76)
Leaf | 37 600 (49.18) | 4518 (5.91) | 31 545 (41.26) | 2796 (3.66)
Shoot apical meristem | 41 415 (54.17) | 5341 (6.99) | 27 730 (36.27) | 1973 (2.58)
Flower | 40 863 (53.44) | 4854 (6.35) | 28 282 (36.99) | 2460 (3.22)
Pod | 36 325 (47.51) | 3709 (4.85) | 32 820 (42.92) | 3605 (4.72)
We also sought to identify soybean transcripts expressed mainly in a single soybean organ. These genes were classified into four groups according to their tissue specificity, based on the fold change between the expression levels in the tissue with the highest expression and the tissue with the second-highest expression: preferentially (≥3- and <10-fold change), specifically (≥10- and <100-fold change), very specifically (≥100- and <1000-fold change) and exclusively expressed in one tissue (≥1000-fold change). These criteria identified 5313, 1374, 147 and nine genes that were preferentially, specifically, very specifically and exclusively expressed in one tissue, respectively (Figure 2a; Table S10). Benedito et al. (2008) reported that M. truncatula seeds and nodules possessed the largest numbers of tissue-specific genes. Hogslund et al. (2009) found that L. japonicus flowers exhibited the highest degree of tissue-specific gene expression. In soybean, the largest numbers of tissue-specific genes were identified in nodules and flowers (1465 and 1145 genes, respectively; ≥3-fold change; Figure 2b). Using more stringent parameters, soybean nodule, flower and pod were the organs most strongly enriched in highly tissue-specific genes (61, 54 and 29 genes, respectively; ≥100-fold change; Figure 2b). Given the lack of correlation in overall gene expression between the nodule transcriptome and the other tissues sampled (Figure 1), it was not surprising to find this tissue among those showing the highest level of organ-specific gene expression. In contrast, the correlation in overall gene expression between flowers and leaves (Figure 1) appears to mask a significant level of flower-specific gene expression (1145 flower-specific genes; ≥3-fold change). These genes are strong candidates for determining the specific functional components of the flower. The overall soybean transcriptome was also mapped relative to the positions of the respective genes in the assembled soybean genome. To aid visualization of these data, we established a color-coded map for each chromosome and each tissue to reflect the overall gene expression level (Figure 3). These data, as well as the data from the earlier Libault et al. (2010) study, are best viewed in the soybean genome browser available at http://digbio.missouri.edu/soybean_atlas. Visualizing the data in this way rapidly demonstrates that most of the protein-coding genes, and also the most strongly expressed genes, are located on the chromosome arms, whereas expression from the less gene-dense pericentromeric regions is much reduced.
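The four specificity classes can be expressed as a small classifier that compares the highest-expressing tissue with the runner-up. This is a sketch under stated assumptions: the function name and input format are hypothetical, and treating a gene with no expression in any other tissue as "exclusive" is our own assumption, since the text does not say how zero values were handled.

```python
from typing import Optional

# (minimum fold change, class label), checked from the most to the least stringent
THRESHOLDS = [
    (1000, "exclusive"),
    (100, "very specific"),
    (10, "specific"),
    (3, "preferential"),
]


def tissue_specificity(levels: dict[str, float]) -> tuple[Optional[str], str]:
    """Classify a gene from its expression in each tissue (at least two tissues)."""
    ranked = sorted(levels.items(), key=lambda item: item[1], reverse=True)
    (top_tissue, highest), (_, second) = ranked[0], ranked[1]
    if highest <= 0:
        return None, "not expressed"
    if second <= 0:
        return top_tissue, "exclusive"  # assumption: no expression in any other tissue
    fold = highest / second  # highest tissue versus second-highest tissue
    for cutoff, label in THRESHOLDS:
        if fold >= cutoff:
            return top_tissue, label
    return top_tissue, "not tissue-specific"


print(tissue_specificity({"nodule": 500.0, "root": 20.0, "leaf": 1.0}))
```

The example gene is 25-fold higher in nodule than in the runner-up tissue, so it is reported as `('nodule', 'specific')`.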
Root hair and meristem-specific soybean genes
Root hairs are single-cell extensions of the root epidermis and play a key role in water and nutrient uptake. In legumes, they also serve as the primary site of rhizobial infection, leading to the development of nitrogen-fixing nodules. Root hairs also exhibit polar cell expansion. In a previous study, we identified around 2000 soybean genes regulated in root hair cells in response to B. japonicum infection (Libault et al., 2010). To extend our understanding of the soybean root hair cell, we also sought to identify genes that were specifically expressed in root hairs. Using the criteria outlined above, we identified 451 soybean sequences that were preferentially expressed in root hairs, including 69 specifically and three very specifically expressed genes (Table S11). Using PFAM, KOG and PANTHER domain predictions, we predicted the functions of 304 of the 451 genes. Some gene families are clearly over-represented in this list of root hair-specific genes. For example, the cellulase (three genes, 1%), pectinesterase (four genes, 1.3%), peroxidase (eight genes, 2.6%) and extensin (four genes, 1.3%) gene families were preferentially expressed in root hairs (χ² test, P < 1 × 10⁻⁵⁰). These families represent only 0.06% (28 genes), 0.3% (144 genes), 0.4% (205 genes) and 0.03% (16 genes), respectively, of the 47 724 annotated soybean genes for which predicted functions were established. The expression of these gene families is likely to reflect the polar growth of root hair cells, which requires continuous cell wall expansion and in which reactive oxygen species are essential (Baumberger et al., 2001, 2003; Bucher et al., 2002; Carol and Dolan, 2006).
[Figure 1 image: correlation heat maps (a) and (b) for 84HAS RH, 120HAS RH, Root, Root tip, Nodule, SAM, Leaf, Flower and Pod; color scale from −1 to 1]
Figure 1. Comparison of the transcriptomes of various Glycine max (soybean) tissues.
Ward hierarchical clustering of log2-transformed gene expression in nine diverse soybean organs [root hair cells isolated 84 and 120 h after sowing, root tip, root, mature nodules, leaves, shoot apical meristem (SAM), flower and green pods], based on Pearson correlation coefficients. The entire soybean tissue transcriptome (a) or the 28 374 annotated soybean genes identified as expressed in all nine tissues (b) were used to generate two distinct maps. The color scale indicates the degree of correlation (green, low correlation; red, strong correlation). The heat map was generated using JMP GENOMICS 4.0.
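The over-representation described for the root hair gene families can be made concrete by recomputing each family's share of the 304 functionally annotated root hair genes and of the 47 724 functionally annotated genes genome-wide, and the ratio between the two (fold enrichment). This sketch only reproduces those proportions from the counts quoted in the text; it does not reimplement the χ² test, whose exact formulation is not given.

```python
ROOT_HAIR_LIST = 304   # root hair-preferential genes with a predicted function
GENOME = 47_724        # soybean genes with a predicted function

# family: (genes in the root hair list, genes in the genome), as quoted in the text
FAMILIES = {
    "cellulase": (3, 28),
    "pectinesterase": (4, 144),
    "peroxidase": (8, 205),
    "extensin": (4, 16),
}


def fold_enrichment(in_list: int, in_genome: int) -> float:
    """Ratio of the family's share in the root hair list to its share genome-wide."""
    return (in_list / ROOT_HAIR_LIST) / (in_genome / GENOME)


for family, (in_list, in_genome) in FAMILIES.items():
    share_list = 100 * in_list / ROOT_HAIR_LIST
    share_genome = 100 * in_genome / GENOME
    print(f"{family}: {share_list:.1f}% vs {share_genome:.2f}% "
          f"({fold_enrichment(in_list, in_genome):.1f}-fold)")
```

All four families come out well above onefold, with extensins showing the strongest relative enrichment and cellulases roughly 17-fold.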
Shoot apical and root meristems are the sites of the intense cell division required for plant growth. We combined the transcriptomes of these two meristematic tissues to identify 28 soybean genes that were preferentially expressed in the soybean meristematic zones (Table S11). Among these, 18 genes encode proteins with conserved domains, including three encoding a predicted kinesin, a regulator of cytokinesis (Muller et al., 2006). In addition, eight transcriptional and translational regulators (e.g. bHLH, SBP and Zf-HD TFs; an RNA polymerase subunit, PIWI and a ribosomal protein) were also preferentially expressed in soybean meristematic zones, suggesting strong transcriptional and translational activity, which is probably also involved in maintaining the high rate of cell division and in controlling cell determination, differentiation and elongation.
[Figure 2 image: (a) gene number versus fold-change cut-off (3, 10, 100, 1000) for annotated and unannotated sequences, unannotated sequences and TF genes; (b) gene number per tissue at the 3-, 10-, 100- and 1000-fold change cut-offs]
Figure 2. Gene expression specificity across nine Glycine max (soybean) tissues.
(a) All soybean transcripts (dashed grey line), unannotated transcripts (black line) and transcription factor transcripts (grey line) were classified into four groups according to their tissue specificity: preferentially (≥3- and <10-fold changes between the expression levels in the tissue with the highest expression and the tissue with the second-highest expression), specifically (≥10- and <100-fold changes), very specifically (≥100- and <1000-fold changes) and exclusively identified in one tissue (≥1000-fold change).
(b) Distribution of the number of overall soybean transcripts in the nine different soybean tissues tested according to their level of specificity (3-, 10-, 100- and 1000-fold change cut-off).
Expression pattern of soybean nodulation-related genes
A unique feature of legumes, including soybean, is their formation of a novel root organ, the nodule, in response to rhizobial infection. Previously, Schmutz et al. (2010) annotated approximately 100 soybean genes predicted to play a role in nodulation, based on an extensive review of the nodulation literature. Of these 100 putative nodulation-related soybean genes, 14 were regulated during root hair cell infection by B. japonicum (Libault et al., 2010). An examination of the soybean gene expression atlas showed that only one of the 100 soybean nodulation-related genes, Glyma13g12440 (a putative GmN56 gene; Schmutz et al., 2010), was not expressed in any of the nine tissues sampled (Table S12). In a previous study of soybean nodulation, Kouchi and Hata (1995) clearly identified a transcript for GmN56. We therefore examined the expression of Glyma13g12490 and Glyma13g12500, two genes homeologous to Glyma13g12440 (Schmutz et al., 2010). Both genes were expressed, and at a significantly higher level in nodules (Figure S2; Table S12). It is therefore likely that the GmN56 EST identified by Kouchi and Hata (1995) arose from either Glyma13g12490 or Glyma13g12500, and not from Glyma13g12440. Of the remaining 99 putative nodulation-related genes, 70 were not expressed preferentially in nodules (≤3-fold change between nodule and the eight remaining tissues), including those encoding the putative Nod factor receptors (NFR1a-b and NFR5a-b) and TFs known to regulate root hair cell infection (e.g. NSP1 and NSP2) (Table S12). The induction of these genes during root hair infection by B. japonicum (Libault et al., 2010), but not in mature nodules, is in agreement with their early role during legume infection (Catoira et al., 2000; Amor et al., 2003; Madsen et al., 2003; Oldroyd and Long, 2003; Radutoiu et al., 2003; Kalo et al., 2005; Smit et al., 2005; Heckmann et al., 2006; Murakami et al., 2006). The remaining 29 genes were preferentially expressed in nodules (≥3-fold change; Figure S2; Table S12). Among these, 16 and seven genes were specifically (≥10- and <100-fold changes) and very specifically (≥100- and <1000-fold changes) expressed in nodules, respectively (Figure S2; Table S12). Homeologous pairs of NIN (Glyma04g00210 and Glyma06g00240), NIN2 (Glyma12g05390 and Glyma11g13390) and CYCLOPS (Glyma01g35260 and Glyma09g34690) genes were expressed specifically in soybean nodules (Figure S2; Table S12). The role of NIN in L. japonicus nodule development was previously noted by Schauser et al. (1999), whereas CYCLOPS function during L. japonicus nodule development has not been clearly established (Yano et al., 2008). In addition, consistent with their initial characterization, 23 nodulin genes were also expressed specifically in nodules.
Figure 3. Color code maps of gene expression across the 20 Glycine max (soybean) chromosomes.
For each chromosome, gene expression (i.e. number of sequence reads per million reads aligned: <0.5, yellow; 0.5–2, orange; 2–5, light green; 5–10, green; 10–25, greenish brown; 25–50, brown; 50–100, brownish red; >100, red) is indicated for nine different tissues (from top to bottom: root hairs 84 h after sowing, root hairs 120 h after sowing, nodule, root, root tip, shoot apical meristem, leaf, flower and pod). The final color strip at the bottom of each chromosome represents gene density (i.e. number of genes per 100 kbp, from 0 to 15 or more, shaded from black to white). These maps were generated using the comparative map and trait viewer (CMTV) software.
Recently, Haney and Long (2009) identified seven flotillin-like genes in M. truncatula, which are homologs of the soybean nodulin GmNod53b (Winzer et al., 1999). Two of the M. truncatula flotillin genes were induced at 24 HAI with Sinorhizobium meliloti. Using the GmNod53b sequence, we identified only two flotillin genes in soybean, which are homeologous (Glyma06g06930 and Glyma04g06830; e-value < 1e-20). However, their expression patterns across the nine tissues were very different. Glyma04g06830 expression was not detected in any tissue except the nodule, where its transcript was barely detectable. In contrast, Glyma06g06930 was strongly expressed, primarily in nodules but also in root hair cells not inoculated with B. japonicum. In addition, Glyma06g06930 expression was induced in soybean root hairs at 12 HAI with B. japonicum (3.7-fold change), but not at 24 or 48 HAI (Table S2). These data suggest that Glyma06g06930 is likely to be orthologous to the genes shown by Haney and Long (2009) to be crucial for root hair infection by S. meliloti.
Expression patterns of soybean transcription factor genes
TF genes are of clear interest because they control plant responses to the environment, as well as developmental pathways (for a review, see Libault et al., 2009a). For example, our earlier study (Libault et al., 2010) identified a number of soybean TF genes whose expression responded to B. japonicum inoculation. Soybean genes homologous to MtHAP2.1, MtERN and LjNIN, genes controlling M. truncatula and L. japonicus nodule development (Schauser et al., 1999; Combier et al., 2006; Middleton et al., 2007), were clearly identified based on syntenic relationships and their nodule-specific expression (Libault et al., 2009a,b).
The soybean gene expression atlas was mined to identify TF genes exhibiting tissue-specific expression. This analysis identified 624 TF genes that were expressed preferentially in one soybean tissue compared with the eight others, including 114, five and one TF genes that were specifically, very specifically and exclusively expressed in one tissue, respectively (Figure 2a; Table S13).
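The specificity classes used throughout this analysis (defined in the legends of Tables S10 and S13) depend only on the ratio between a gene's highest and second-highest expression levels across tissues. A minimal Python sketch of that rule follows; the function name and the treatment of a zero second-highest value (taken as an infinite fold change) are assumptions, not details given in the source.

```python
from typing import Mapping, Optional

# Fold-change classes from the legends of Tables S10 and S13: ratio of the
# highest to the second-highest expression level of one gene across tissues.
CATEGORIES = [
    (1000.0, "exclusive"),
    (100.0, "very specific"),
    (10.0, "specific"),
    (3.0, "preferential"),
]


def classify_specificity(expression: Mapping[str, float]) -> Optional[tuple[str, str]]:
    """Return (tissue with highest expression, class), or None below 3-fold.

    `expression` maps tissue name -> normalized expression level for one gene.
    Assumption: a second-highest value of zero is treated as an infinite fold change.
    """
    if len(expression) < 2:
        raise ValueError("need expression values for at least two tissues")
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    top_tissue, top = ranked[0]
    second = ranked[1][1]
    if top <= 0:
        return None  # not expressed in any tissue
    fold = float("inf") if second == 0 else top / second
    # Thresholds are checked from the strictest class downwards.
    for threshold, label in CATEGORIES:
        if fold >= threshold:
            return top_tissue, label
    return None


print(classify_specificity({"nodule": 500.0, "root": 4.0, "leaf": 1.0}))
# ('nodule', 'very specific')
```

Comparing only the top two tissues means a gene counts as tissue-specific as long as no other single tissue comes close, regardless of how the remaining tissues rank.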
Examination of this list of 120 TF genes specifically expressed in at least one tissue (≥10-fold change) identified a notable number of C2H2 (Zn) and NIN-like TF genes expressed preferentially in nodules (Figure 4). As described above, the role of NIN-like genes in legume nodulation is well established. However, to date, there is no functional demonstration of a role for C2H2 (Zn) TF genes during legume nodulation; our data suggest that this should be examined more closely. Members of the Homeodomain TF family were restricted to the SAM, whereas members of the LIM, MADS and NAC TF families were preferentially expressed in flowers, suggesting a specific role for these TF gene families in the normal development of these tissues (Figure 4). In A. thaliana, a large number of MADS TF genes, such as SEP1, SEP2, SEP3, SEP4, APETALA1 (AP1), APETALA3 (AP3), PISTILLATA (PI) and AGAMOUS (AG), are key regulators of flower development (for a review, see Robles and Pelaz, 2005). Arabidopsis thaliana Homeodomain TF genes, such as WUSCHEL (WUS) and SHOOT MERISTEMLESS (STM), are important in the formation and maintenance of the SAM (Barton and Poethig, 1993; Endrizzi et al., 1996; Laux et al., 1996; Mayer et al., 1998). Consequently, we hypothesized that some of the soybean Homeodomain and MADS TF genes expressed specifically in the SAM and flower may be orthologs of WUS and STM, and of SEP1, SEP2, SEP3, SEP4, AP1, AP3, PI and AG, respectively. To establish this orthology, we looked for syntenic relationships between these gene families in the A. thaliana and G. max genomes. With the exception of SEP3 and PI, we identified soybean orthologs of the flower- and SAM-related Arabidopsis genes (Figure S3). As expected from the recent duplication of the soybean genome, two putative orthologs were identified in most cases. More surprisingly, Glyma18g50900 was identified as the potential ortholog of both SEP1 and SEP2, whereas the region encoding Glyma02g13420 was orthologous to both SEP4 and AP1. This result suggests that each of the gene pairs SEP1/SEP2 and SEP4/AP1 probably derives from a common ancestral gene present before the divergence between soybean and Arabidopsis. To provide further evidence of orthology between the soybean MADS and Homeodomain genes and WUS, STM, SEP1, SEP2, SEP4, AP1, AP3 and AG, we mined Arabidopsis gene expression data (Hruz et al., 2008) to compare the expression profiles of these genes in both organisms. As in A. thaliana, a substantial number of soybean genes putatively involved in flower development were strongly, but not exclusively, expressed in flowers (Figure 5). Among them, four MADS genes (Glyma01g08150, Glyma02g13420, Glyma04g02980 and Glyma06g02990), orthologs of AtAP1, AtSEP4 and AtAP3, were identified as specifically expressed in flowers (Figure S3; Table S13). The function of the remaining four soybean MADS genes and seven Homeodomain genes expressed specifically in flower and SAM remains to be investigated. Altogether, this analysis demonstrates the usefulness of combining genome and transcriptome comparisons to identify genes playing critical developmental roles in soybean.
To validate the accuracy of soybean gene expression measurements made with Illumina Solexa technology, we compared the Illumina Solexa data set with transcriptomic analyses performed on 11 soybean tissues using the previously published quantitative RT-PCR (qRT-PCR) primer set library, designed against more than 1000 soybean regulatory genes, including 652 TF genes (Libault et al., 2009b). In virtually all cases, the qRT-PCR results validated the measurements made by Illumina Solexa sequencing. Full details are provided in Appendix S1.

[Figure 4: pie chart of tissue-specific TF genes by family (ABI3/VP1, AP2-EREBP, AS2, AUX-IAA-ARF, bHLH, bZIP, C2C2 (Zn) CO-like, C2C2 (Zn) Dof, C2C2 (Zn) YABBY, C2H2 (Zn), CAMTA, CCAAT, DHHC (Zn), GRAS, Homeodomain, HOMEOBOX, LIM, MADS, MYB, MYB/HD-like, NAC, NIN-like, SBP, SRS, TPR, WRKY, Zf-HD), with sub-pies labeled by tissue (number of genes) for SAM, pod, nodule and flower.]
Figure 4. Distribution of Glycine max (soybean) transcription factor genes expressed specifically in one soybean tissue, based on their family membership. The sub-pies highlight the distribution of specific transcription factor gene families in the different tissues, based on the specificity of their expression.
Comparison of the M. truncatula, L. japonicus and G. max transcriptomes
Glycine max, M. truncatula and L. japonicus probably diverged around 40 Mya, and their genomes retain extensive microsynteny (Choi et al., 2004; Cannon et al., 2006; Young and Udvardi, 2009). This relationship provides opportunities to transfer genetic knowledge between these three species. However, such comparisons must also allow for divergence in the expression patterns of orthologous genes during legume evolution, especially given the more recent whole-genome duplication in soybean (Schlueter et al., 2004, 2007) and the silencing of homeologous genes (Libault et al., 2009a,b; present study). Consequently, using orthology to infer common function among the three legume species will require not only the establishment of a syntenic relationship but also the demonstration of similar gene expression patterns. This further underscores the value of gene expression atlases for these three species.
[Figure 5: heat maps of gene expression levels (color scale 0–9). (a) Arabidopsis tissues and cell types from Genevestigator, ranging from callus and cell culture to root tip; (b) soybean tissues: root tip, root hair 84 HAS, root hair 120 HAS, root, nodule, SAM, leaf, flower and pod.]
Figure 5. Gene expression patterns of Arabidopsis genes involved in the formation and maintenance of the shoot apical meristem (SAM) and the determination of flower organs (a), and their putative orthologs in Glycine max (soybean) (b). Genevestigator (Hruz et al., 2008) and the soybean gene atlas were mined to establish the expression patterns of the Arabidopsis and soybean genes, respectively.

Most of the gene expression data available for M. truncatula and L. japonicus come from a variety of Affymetrix microarray experiments. Therefore, as a first step to compare gene expression in these two species with that in soybean, we sought to identify the orthologous genes present on the M. truncatula and L. japonicus Affymetrix arrays, and their counterparts in soybean. To simplify this analysis, we focused on the 147 annotated soybean genes expressed very specifically in only one tissue (≥100-fold change; Table S10). Subsequently, we mined the M. truncatula and L. japonicus expression data for the corresponding orthologs by referencing the respective gene expression atlases (Benedito et al., 2008; Hogslund et al., 2009). This approach allowed the direct comparison of 40 soybean genes in five tissues (nodule, root, leaf, flower and pod) with the corresponding M. truncatula orthologs in the same five tissues, and with the L. japonicus orthologs in four tissues (nodule, root, leaf and flower). This comparison showed that 18 soybean genes share similar tissue specificity with their putative orthologs in M. truncatula and L. japonicus (Table S14). This number may simply reflect the difficulty of establishing true orthology, or may reflect subfunctionalization or neofunctionalization of the remaining 22 putative soybean orthologs. To better establish orthology, we analyzed microsynteny between the G. max, M. truncatula and L. japonicus loci encoding the various putative orthologs. Significant microsynteny was found for three G. max/M. truncatula and eight G. max/L. japonicus pairs of gene regions (Figure S4; Table S14). For example, microsynteny was found between the region containing the soybean gene Glyma01g44660 and the corresponding regions in both M. truncatula (Medtr5g006680) and L. japonicus (CM0591.50.nd). These three genes were expressed specifically in flowers.
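The text does not spell out how "similar tissue specificity" was called across species. One simple, hypothetical criterion is that each ortholog peaks in the same tissue as the soybean gene, judged only over the tissues both atlases share (five for M. truncatula, four for L. japonicus). Because only within-species rankings are compared, the different scales of RNA-seq read counts and Affymetrix signal intensities do not matter. The function names and the example values below are invented for illustration.

```python
from typing import Mapping, Optional


def top_tissue(profile: Mapping[str, float], tissues: set[str]) -> Optional[str]:
    """Tissue with the highest value among `tissues`, or None if none is expressed."""
    values = {t: profile[t] for t in tissues if t in profile}
    if not values or max(values.values()) <= 0:
        return None
    return max(values, key=values.get)


def similar_specificity(soybean: Mapping[str, float], *orthologs: Mapping[str, float]) -> bool:
    """True if every ortholog peaks in the same tissue as the soybean gene,
    comparing each species pair only over the tissues both profiles contain."""
    for other in orthologs:
        shared = set(soybean) & set(other)
        soy_top = top_tissue(soybean, shared)
        if soy_top is None or soy_top != top_tissue(other, shared):
            return False
    return True


# Invented values: soybean RPM vs. M. truncatula and L. japonicus array signals.
soy = {"nodule": 1.0, "root": 2.0, "leaf": 0.5, "flower": 300.0, "pod": 10.0}
mt = {"nodule": 0.2, "root": 1.0, "leaf": 3.0, "flower": 150.0, "pod": 5.0}
lj = {"nodule": 0.1, "root": 0.4, "leaf": 2.0, "flower": 80.0}
print(similar_specificity(soy, mt, lj))  # True
```

Restricting each comparison to shared tissues means a soybean gene that peaks in pods can only be matched with L. japonicus on its second-best tissue; a stricter variant would require the soybean peak tissue to be present in both atlases.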
Interestingly, during our analysis we also detected synteny between legume genes that were not identified in the initial screen (Figure S4). For instance, in addition to its apparent orthology to Glyma07g16290, LjCM0147.870.nc was also orthologous to Glyma18g40360, a soybean gene preferentially expressed in nodules according to the soybean gene atlas (Figure S4; Table S2). These three genes are predicted to encode C2H2 (Zn) TFs, consistent with the abundant expression of this TF gene family in nodule tissue noted above. Microsynteny was also found between G. max and M. truncatula genes with very different expression patterns. For example, Glyma09g41200, Glyma18g44670 and Glyma18g44680 are soybean genes expressed specifically in flowers that lie in a region of the soybean genome microsyntenic to Medtr7g080300; this M. truncatula region also appears microsyntenic to the soybean loci encoding Glyma01g32750 and Glyma01g32760, two soybean genes expressed in a variety of organs (Figure S4; Table S2).
This example suggests the subfunctionalization of Glyma01g32750 and Glyma01g32760 after the divergence of G. max and M. truncatula. As Glyma18g44670–Glyma18g44680 and Glyma01g32750–Glyma01g32760 probably arose by tandem duplication, we assume that the subfunctionalization of Glyma01g32750 and Glyma01g32760 occurred after the duplication of the soybean genome, but before their tandem duplication. This example further illustrates the value of genome and transcriptome comparisons for drawing conclusions about the orthology of specific genes and their evolutionary history. Space prevents us from presenting additional examples. At present, the annotation of the G. max, M. truncatula and L. japonicus genomes clearly needs improvement. We predict that the full integration of syntenic and transcriptome analyses of these three genomes will ultimately lead to the systematic identification of legume orthologs. At that point, it will be possible to rapidly transfer genetic and functional knowledge derived in one species to the others.
EXPERIMENTAL PROCEDURES
Bacterial cultures
Bradyrhizobium japonicum USDA110 was grown at 30°C for 3 days in HM medium (Cole and Elkan, 1973), supplemented with yeast extract (0.025%), D-arabinose (0.1%) and chloramphenicol (0.004%). Before plant inoculation, B. japonicum cells were pelleted (2000 g for 10 min), washed and diluted with sterile water to OD600 = 0.1.
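Reaching OD600 = 0.1 is a standard C1V1 = C2V2 dilution. The sketch below assumes the OD600 of the washed suspension was measured first and that OD600 is proportional to cell density over this range, which holds only within the spectrophotometer's linear range; the function name and the example reading are hypothetical.

```python
def dilution_volumes(measured_od: float, target_od: float, final_volume_ml: float) -> tuple[float, float]:
    """Return (ml of washed cell suspension, ml of sterile water) using C1*V1 = C2*V2.

    Assumes OD600 is proportional to cell density over this range.
    """
    if not 0 < target_od <= measured_od:
        raise ValueError("target OD600 must be positive and not exceed the measured OD600")
    suspension_ml = target_od * final_volume_ml / measured_od
    return suspension_ml, final_volume_ml - suspension_ml


# Hypothetical reading: washed suspension at OD600 = 0.8, 50 ml needed at OD600 = 0.1.
print(dilution_volumes(0.8, 0.1, 50.0))  # (6.25, 43.75)
```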
Plant culture
All tissues described below were isolated from soybean G. max (L.) Merr. cultivar ‘Williams 82’ plants. For each tissue, three independent biological replicates were performed on a different set of plants to ensure the reproducibility of the plant tissues analyzed (i.e. seeds were sown three times on different days, and tissues were harvested as described below).
Soybean seeds were surface sterilized according to the method described by Wan et al. (2005), and were sown on nitrogen-free B&D agar medium (Broughton and Dilworth, 1971). Untreated root hair cells and stripped roots used for qRT-PCR were isolated from 3-day-old seedlings, as described by Wan et al. (2005). A similar protocol was used to isolate 84- and 120-HAS root hairs (Libault et al., 2010; 84- and 120-HAS root hairs were mock-inoculated root hairs isolated 12 and 48 h after being sprayed with water).
Other tissues were isolated as described below. Seedlings were germinated for 3 days between sheets of moist Whatman filter paper, and root tips were harvested from these seedlings. To produce other tissues, germinated seedlings were transferred to the glasshouse under long-day conditions (16-h day/8-h night) at 27°C on Promix Bx soil (Premier Horticulture, http://www.premierhort.com). Fourteen-day-old SAM (V2 stage), 18-day-old trifoliate leaves, stem and roots (V2 stage), flowers (R2 stage), and seeds and pods (R6 stage) were harvested. Nodules were harvested 32 days after the inoculation of 1 ml of B. japonicum suspension (OD600 = 0.1) on transferred 3-day-old seedlings.
RNA extraction, DNase treatments, and reverse transcription
Total RNA was isolated using Trizol Reagent (Invitrogen, http://www.invitrogen.com) according to the manufacturer’s instructions, followed by a chloroform extraction to improve its purity. Total RNA was then DNase-treated and reverse-transcribed differently depending on the technology used to quantify cDNA levels.
qRT-PCR. The qRT-PCR reactions, including the different controls, were performed as described by Libault et al. (2009b).
Solexa sequencing. For each condition, similar quantities of total RNA isolated from three independent biological replicates were pooled together. After first- and second-strand cDNA synthesis, the cDNAs were end-repaired prior to ligation of Solexa adaptors. The products were sequenced on a Solexa platform.
Quantitative PCR reaction conditions and data analysis
The qRT-PCR reactions were performed as described by Libault et al. (2009b). The specificity of the primer sets was confirmed by analyzing the dissociation curve profile of each qRT-PCR amplicon, and the efficiency of the primers ($P_{\mathrm{eff}}$) was quantified using LinRegPCR (Ramakers et al., 2003). Cons6, encoding an F-box protein (Libault et al., 2008), was used to normalize the expression levels of putative soybean regulatory genes. The cycle threshold ($C_t$) value of the reference gene was subtracted from the $C_t$ value of each test gene analyzed ($\Delta C_t$). The expression level ($E$) of each gene was calculated according to the equation $E = P_{\mathrm{eff}}^{-\Delta C_t}$. The expression levels from the three replicates were then averaged.
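The normalization above can be written as a few lines of Python. This sketch assumes $P_{\mathrm{eff}}$ is the per-cycle amplification factor (2.0 meaning perfect doubling), the form in which LinRegPCR efficiencies are commonly reported; the function names and Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_test: float, ct_reference: float, p_eff: float) -> float:
    """Expression level E = Peff ** (-dCt), where dCt = Ct(test) - Ct(reference)."""
    delta_ct = ct_test - ct_reference
    return p_eff ** (-delta_ct)


def mean_relative_expression(replicates: list[tuple[float, float]], p_eff: float) -> float:
    """Average E over replicates given as (Ct test, Ct reference) pairs."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    return mean(relative_expression(ct_t, ct_r, p_eff) for ct_t, ct_r in replicates)


# Hypothetical Ct values for one gene in three replicates, with Cons6 as reference.
reps = [(25.0, 20.0), (24.0, 20.0), (26.0, 20.0)]
print(relative_expression(25.0, 20.0, 2.0))  # 0.03125
print(mean_relative_expression(reps, 1.9))
```

Averaging is done on $E$ rather than on $\Delta C_t$, matching the order described in the text; averaging $\Delta C_t$ first would give a geometric rather than arithmetic mean.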
Solexa read alignment, statistical analysis and data representation
Illumina Genome Analyzer II image data were base-called and quality-filtered using the default filtering parameters of the Illumina GA Pipeline GERALD stage (Illumina, Inc., http://www.illumina.com). Passing 36-mer reads were aligned to all contigs of the Glyma1 8x Soybean Genome assembly (Soybean Genome Project, http://www.jgi.doe.gov) using GSNAP (Wu and Nacu, 2010), an alignment program derived from GMAP (Wu and Watanabe, 2005) and optimized for aligning short transcript reads from next-generation sequencers to genomic reference sequences. Alignments were processed using the Alpheus pipeline (Miller et al., 2008), keeping only alignments that had at least 34 identities out of 36 and no more than five equivalent best hits. Read counts used in expression analyses were based on the subset of uniquely aligned reads that also overlapped the genomic spans of the Glyma1 gene predictions. Read counts were normalized within each sample as reads per million: a gene's uniquely aligned read count per million reads aligning uniquely within that sample.
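The filtering and normalization steps can be sketched in Python as follows. The record types are hypothetical stand-ins, not the GSNAP or Alpheus output formats, and counting a read once for every gene span it overlaps is an assumption the text does not address.

```python
from collections import Counter
from dataclasses import dataclass

MIN_IDENTITIES = 34  # of 36 bases, as stated in the text
MAX_BEST_HITS = 5    # equivalent best hits allowed


@dataclass(frozen=True)
class Alignment:
    read_id: str
    chrom: str
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    identities: int   # matching bases in the 36-mer
    n_best_hits: int  # equivalent best hits reported for this read


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int
    end: int


def passes_filter(aln: Alignment) -> bool:
    return aln.identities >= MIN_IDENTITIES and aln.n_best_hits <= MAX_BEST_HITS


def reads_per_million(alignments: list[Alignment], genes: list[Gene]) -> dict[str, float]:
    # Keep filtered alignments of reads that map uniquely (a single best hit).
    unique = [a for a in alignments if passes_filter(a) and a.n_best_hits == 1]
    # Denominator: all uniquely aligned reads in the sample, genic or not.
    total_unique_reads = len({a.read_id for a in unique})
    counts = Counter()
    for aln in unique:
        # Linear scan for clarity; an interval index would be used at genome scale.
        for gene in genes:
            if gene.chrom == aln.chrom and aln.start <= gene.end and aln.end >= gene.start:
                counts[gene.gene_id] += 1
    if total_unique_reads == 0:
        return {g.gene_id: 0.0 for g in genes}
    return {g.gene_id: counts[g.gene_id] * 1_000_000 / total_unique_reads for g in genes}
```

Because the denominator includes uniquely aligned intergenic reads, reads from transcribed but unannotated regions still count toward library size.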
The raw and normalized Solexa data are available at http://digbio.missouri.edu/soybean_atlas, whereas the entire set of Solexa sequences used in our studies can be downloaded from the NCBI SRA browser (accession number SRA012188.1; http://www.ncbi.nlm.nih.gov/Traces/sra).
The color-coded maps of the soybean transcriptome across the 20 chromosomes were generated by using the comparative map and trait viewer (CMTV) software (Sawkins et al., 2004).
Synteny analysis
To establish microsynteny between G. max and A. thaliana, the amino acid sequences of the A. thaliana candidate genes and of at least the 20 genes surrounding them were searched by BLAST against soybean genome sequences. Using P < 1e-20 as a cut-off, BLAST results and gene annotations were analyzed manually to establish microsynteny.
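The microsynteny calls were made manually. Purely to illustrate one step that could be scripted, the sketch below reads BLAST+ tabular hits (`-outfmt 6`, whose default 12 columns include subject start and end in columns 9 and 10 and the e-value in column 11), applies the 1e-20 cut-off to the e-value (an assumption about which statistic was thresholded), and reports the soybean window hit by the largest number of distinct Arabidopsis query genes. The window size, the scoring rule and the hits in the example are all invented.

```python
import csv
import io

EVALUE_CUTOFF = 1e-20   # cut-off quoted in the text
WINDOW_BP = 500_000     # hypothetical window size; not given in the source


def parse_hits(tabular_text: str) -> list[tuple[str, str, int]]:
    """Return (query, subject sequence, leftmost subject coordinate) for hits passing the cut-off."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        sstart, send, evalue = int(row[8]), int(row[9]), float(row[10])
        if evalue < EVALUE_CUTOFF:
            hits.append((query, subject, min(sstart, send)))
    return hits


def best_syntenic_window(hits: list[tuple[str, str, int]], window_bp: int = WINDOW_BP):
    """Return (subject, window start, number of distinct query genes) for the best window."""
    by_subject: dict[str, list[tuple[int, str]]] = {}
    for query, subject, pos in hits:
        by_subject.setdefault(subject, []).append((pos, query))
    best = (None, None, 0)
    for subject, items in by_subject.items():
        items.sort()
        for i, (start, _) in enumerate(items):
            # Distinct query genes hitting within window_bp downstream of this hit.
            queries = {q for pos, q in items[i:] if pos - start <= window_bp}
            if len(queries) > best[2]:
                best = (subject, start, len(queries))
    return best


sample = "\n".join([
    "AT1G01010\tGm05\t80.0\t120\t20\t1\t1\t120\t100000\t100360\t1e-40\t250",
    "AT1G01020\tGm05\t75.0\t110\t25\t2\t1\t110\t180500\t180170\t1e-35\t210",
    "AT1G01030\tGm05\t70.0\t90\t27\t0\t1\t90\t260000\t260270\t1e-25\t150",
    "AT1G01020\tGm12\t60.0\t50\t20\t1\t1\t50\t5000000\t5000150\t1e-5\t40",
])
print(best_syntenic_window(parse_hits(sample)))  # ('Gm05', 100000, 3)
```

A window count like this only ranks candidate regions; gene order and orientation would still need the manual inspection described above.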
To compare the expression of orthologous genes between G. max, M. truncatula and L. japonicus, we first mapped the Medicago and Lotus Affymetrix probe sets against their respective genomes based on NCBI BLASTN searches. Only probe sets with at least nine matching probes, sited at least 22 bp up- or downstream of a 4000-bp region, were considered for further analysis. BLAST searches of the predicted soybean transcripts against the Medicago Mt v3.0 (http://www.medicago.org/genome) and Lotus pseudogenomes (http://www.kazusa.or.jp/lotus), combined with the mapping of the Medicago and Lotus Affymetrix probe sets, allowed a direct comparison of the expression of the soybean, Medicago and Lotus genes. When genes shared a similar tissue specificity, we supported their orthology by establishing a microsynteny relationship between them using the same methodology described above.
Graphics showing microsynteny relationships were generated by using CMTV (Sawkins et al., 2004).
ACKNOWLEDGEMENTS
We thank Melanie Mormile, Sandra Thibivilliers and Charlie P. Jones for their critical reading of the manuscript. We also thank Chia Rou Yeo for technical assistance and Shaoxing Wang for providing some total RNA samples. We are also grateful to the Medicago Genome Sequence Consortium (MGSC) for providing M. truncatula genomic sequences. This work was funded by a grant from the National Science Foundation (Plant Genome Program, #DBI-0421620). TJ, LDF and DX were supported by United Soybean Board grant #8236.
SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article:
Figure S1. Expression levels of putative soybean (Glycine max) constitutive genes in 14 different conditions (y-axis) compared with the average of their expression levels across the 14 conditions (x-axis).
Figure S2. A total of 29 soybean (Glycine max) nodulation-related genes were expressed preferentially in mature nodules.
Figure S3. Syntenic relationship between Glycine max and Arabidopsis thaliana genes involved in flower organ determination and maintenance of the shoot apical meristem.
Figure S4. Syntenic relationship between Glycine max (soybean), Medicago truncatula and Lotus japonicus genes surrounding soybean nodule- and flower-specific genes.
Figure S5. Comparison of the transcriptomes of 1016 soybean regulatory genes by qRT-PCR tissues.
Table S1. Gene expression pattern of predicted and unannotated Glycine max (soybean) genes in nine different tissues.
Table S2. Gene expression pattern of predicted and unannotated Glycine max (soybean) genes in nine different tissues, and in root hair and stripped roots in response to Bradyrhizobium japonicum.
Table S3. Confidence in gene prediction according to Schmutz et al. (2010) of 16 198 Glycine max (soybean) genes not expressed in soybean tissues, and in the early steps of nodulation.
Table S4. Gene expression of Glycine max (soybean) homeologous genes.
Table S5. Gene expression of Glycine max (soybean) homeologous genes relative to putative pseudogenes.
Table S6. Unannotated sequence reads that overlap Glycine max (soybean) annotated genes leading to an improvement of the soybean gene annotation.
Table S7. Identification of the signature domains of the 1736 proteins encoded by the putative new Glycine max (soybean) genes.
Table S8. Expression levels of Glycine max (soybean) sequences identified to be ubiquitously expressed across the nine soybean tissues tested.
Table S9. Gene expression and function of putative Glycine max (soybean) constitutive genes across 14 different conditions.
Table S10. Identification of Glycine max (soybean) transcripts preferentially (≥3- and <10-fold changes between the expression levels of the most highly expressed and second most highly expressed genes; yellow), specifically (≥10- and <100-fold changes; orange), very specifically (≥100- and <1000-fold changes; red) and exclusively (≥1000-fold change; purple) identified in one of the nine tissues tested.
Table S11. Identification of Glycine max (soybean) transcripts preferentially (≥3- and <10-fold change; yellow), specifically (≥10- and <100-fold change; orange) and very specifically (≥100- and <1000-fold change; red) identified in soybean root hair cells and meristems.
Table S12. Relative gene expression levels of putative Glycine max (soybean) nodulation-related genes in nine different tissues, including mature nodules.
Table S13. Identification of Glycine max (soybean) transcription factor genes preferentially (≥3- and <10-fold changes; yellow), specifically (≥10- and <100-fold changes; orange), very specifically (≥100- and <1000-fold changes; red) and exclusively (>1000-fold change; purple) expressed in one out of the nine tissues tested.
Table S14. Gene expression pattern between Glycine max (soybean), Medicago truncatula and Lotus japonicus orthologous genes.
Table S15. Gene expression of 1016 Glycine max (soybean) regulatory genes in 11 different soybean tissues.
Table S16. Identification of tissue-specific Glycine max (soybean) regulatory genes, based on qRT-PCR experiments.
Appendix S1. Large-scale qRT-PCR of Glycine max (soybean) transcription factor genes.
Please note: As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer-reviewed and may be re-organized for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.
REFERENCES
Aceituno, F.F., Moseyko, N., Rhee, S.Y. and Gutierrez, R.A. (2008) The rules of
gene expression in plants: organ identity and gene body methylation are
key factors for regulation of gene expression in Arabidopsis thaliana. BMC
Genomics, 9, 438.
Adams, K.L. (2007) Evolution of duplicate gene expression in polyploid and
hybrid plants. J. Hered. 98, 136–141.
Amor, B.B., Shaw, S.L., Oldroyd, G.E., Maillet, F., Penmetsa, R.V., Cook, D.,
Long, S.R., Denarie, J. and Gough, C. (2003) The NFP locus of Medicago
truncatula controls an early step of Nod factor signal transduction
upstream of a rapid calcium flux and root hair deformation. Plant J. 34,
495–506.
Barton, M.K. and Poethig, R.S. (1993) Formation of the shoot apical meristem
in Arabidopsis thaliana: an analysis of development in the wild type and in
the shoot meristemless mutant. Development, 119, 823–831.
Baumberger, N., Ringli, C. and Keller, B. (2001) The chimeric leucine-rich
repeat/extensin cell wall protein LRX1 is required for root hair morphogenesis in Arabidopsis thaliana. Genes Dev. 15, 1128–1139.
Baumberger, N., Doesseger, B., Guyot, R. et al. (2003) Whole-genome
comparison of leucine-rich repeat extensins in Arabidopsis and rice. A
conserved family of cell wall proteins form a vegetative and a reproductive
clade. Plant Physiol. 131, 1313–1326.
Benedito, V.A., Torres-Jerez, I., Murray, J.D. et al. (2008) A gene expression
atlas of the model legume Medicago truncatula. Plant J. 55, 504–513.
Bennett, S.T., Barnes, C., Cox, A., Davies, L. and Brown, C. (2005) Toward the
1,000 dollars human genome. Pharmacogenomics, 6, 373–382.
Broughton, W.J. and Dilworth, M.J. (1971) Control of leghaemoglobin
synthesis in snake beans. Biochem. J. 125, 1075–1080.
Bucher, M., Brunner, S., Zimmermann, P., Zardi, G.I., Amrhein, N., Willmitzer,
L. and Riesmeier, J.W. (2002) The expression of an extensin-like protein
correlates with cellular tip growth in tomato. Plant Physiol. 128, 911–923.
Cannon, S.B., Sterck, L., Rombauts, S. et al. (2006) Legume genome evolution
viewed through the Medicago truncatula and Lotus japonicus genomes.
Proc. Natl Acad. Sci. USA, 103, 14959–14964.
Carol, R.J. and Dolan, L. (2006) The role of reactive oxygen species in cell
growth: lessons from root hairs. J. Exp. Bot. 57, 1829–1834.
Catoira, R., Galera, C., de Billy, F., Penmetsa, R.V., Journet, E.P., Maillet, F.,
Rosenberg, C., Cook, D., Gough, C. and Denarie, J. (2000) Four genes of
Medicago truncatula controlling components of a nod factor transduction
pathway. Plant Cell, 12, 1647–1666.
Cheung, F., Haas, B.J., Goldberg, S.M., May, G.D., Xiao, Y. and Town, C.D.
(2006) Sequencing Medicago truncatula expressed sequenced tags using
454 Life Sciences technology. BMC Genomics, 7, 272.
Choi, H.K., Mun, J.H., Kim, D.J. et al. (2004) Estimating genome conservation
between crop and model legume species. Proc. Natl Acad. Sci. USA, 101,
15289–15294.
Coen, E. (2001) Goethe and the ABC model of flower development. C. R. Acad.
Sci. III, 324, 523–530.
Cole, M.A. and Elkan, G.H. (1973) Transmissible resistance to penicillin G,
neomycin, and chloramphenicol in Rhizobium japonicum. Antimicrob.
Agents Chemother. 4, 248–253.
Combier, J.P., Frugier, F., de Billy, F. et al. (2006) MtHAP2-1 is a key transcriptional regulator of symbiotic nodule development regulated by microRNA169 in Medicago truncatula. Genes Dev. 20, 3084–3088.
Ditta, G., Pinyopich, A., Robles, P., Pelaz, S. and Yanofsky, M.F. (2004) The
SEP4 gene of Arabidopsis thaliana functions in floral organ and meristem
identity. Curr. Biol. 14, 1935–1940.
Endrizzi, K., Moussian, B., Haecker, A., Levin, J.Z. and Laux, T. (1996) The
SHOOT MERISTEMLESS gene is required for maintenance of undifferentiated cells in Arabidopsis shoot and floral meristems and acts at a different
regulatory level than the meristem genes WUSCHEL and ZWILLE. Plant J.
10, 967–979.
Gill, N., Findley, S., Walling, J.G., Hans, C., Ma, J., Doyle, J., Stacey, G. and
Jackson, S.A. (2009) Molecular and chromosomal evidence for allopolyploidy in soybean. Plant Physiol. 151, 1167–1174.
Haney, C.H. and Long, S.R. (2009) Plant flotillins are required for infection by
nitrogen-fixing bacteria. Proc. Natl Acad. Sci. USA, 107, 478–483.
Heckmann, A.B., Lombardo, F., Miwa, H., Perry, J.A., Bunnewell, S., Parniske,
M., Wang, T.L. and Downie, J.A. (2006) Lotus japonicus nodulation requires
two GRAS domain regulators, one of which is functionally conserved in a
non-legume. Plant Physiol. 142, 1739–1750.
Hogslund, N., Radutoiu, S., Krusell, L. et al. (2009) Dissection of symbiosis
and organ development by integrated transcriptome analysis of Lotus japonicus mutant and wild-type plants. PLoS ONE, 4, e6556.
Honma, T. and Goto, K. (2001) Complexes of MADS-box proteins are sufficient to convert leaves into floral organs. Nature, 409, 525–529.
Hruz, T., Laule, O., Szabo, G., Wessendorp, F., Bleuler, S., Oertle, L.,
Widmayer, P., Gruissem, W. and Zimmermann, P. (2008) Genevestigator
V3: a reference expression database for the meta-analysis of transcriptomes. Adv. Bioinformatics, 420747.
Jiao, Y., Tausta, S.L., Gandotra, N. et al. (2009) A transcriptome atlas of rice
cell types uncovers cellular, functional and developmental hierarchies. Nat.
Genet. 41, 258–263.
Kalo, P., Gleason, C., Edwards, A. et al. (2005) Nodulation signaling in
legumes requires NSP2, a member of the GRAS family of transcriptional
regulators. Science, 308, 1786–1789.
Kouchi, H. and Hata, S. (1995) GmN56, a novel nodule-specific cDNA from
soybean root nodules encodes a protein homologous to isopropylmalate
synthase and homocitrate synthase. Mol. Plant Microbe Interact. 8, 172–
176.
Laux, T., Mayer, K.F., Berger, J. and Jurgens, G. (1996) The WUSCHEL gene is
required for shoot and floral meristem integrity in Arabidopsis. Development, 122, 87–96.
Libault, M., Thibivilliers, S., Bilgin, D.D., Radwan, O., Benitez, M., Clough, S.J.
and Stacey, G. (2008) Identification of four soybean reference genes for
gene expression normalization. The Plant Genome, 1, 44–54.
Libault, M., Joshi, T., Benedito, V.A., Xu, D., Udvardi, M.K. and Stacey, G.
(2009a) Legume transcription factor genes: what makes legumes so special? Plant Physiol. 151, 991–1001.
Libault, M., Joshi, T., Takahashi, K. et al. (2009b) Large-scale analysis of
putative soybean regulatory gene expression identifies a myb gene
involved in soybean nodule development. Plant Physiol. 151, 1207–1220.
Libault, M., Farmer, A., Brechenmacher, L. et al. (2010) Complete transcriptome of soybean root hair cell, a single cell model, and its alteration in
response to Bradyrhizobium japonicum infection. Plant Physiol. 152, 541–
552.
Madsen, E.B., Madsen, L.H., Radutoiu, S. et al. (2003) A receptor kinase gene
of the LysM type is involved in legume perception of rhizobial signals.
Nature, 425, 637–640.
Margulies, M., Egholm, M., Altman, W.E. et al. (2005) Genome sequencing in
microfabricated high-density picolitre reactors. Nature, 437, 376–380.
Mayer, K.F., Schoof, H., Haecker, A., Lenhard, M., Jurgens, G. and Laux, T.
(1998) Role of WUSCHEL in regulating stem cell fate in the Arabidopsis
shoot meristem. Cell, 95, 805–815.
Middleton, P.H., Jakab, J., Penmetsa, R.V. et al. (2007) An ERF transcription
factor in Medicago truncatula that is essential for Nod factor signal
transduction. Plant Cell, 19, 1221–1234.
Miller, N.A., Kingsmore, S.F., Farmer, A.D. et al. (2008) Management of
high-throughput DNA sequencing projects: Alpheus. J. Comput. Sci. Syst. Biol.
1, 132–148.
Muller, S., Han, S. and Smith, L.G. (2006) Two kinesins are involved in the
spatial control of cytokinesis in Arabidopsis thaliana. Curr. Biol. 16, 888–
894.
Murakami, Y., Miwa, H., Imaizumi-Anraku, H., Kouchi, H., Downie, J.A.,
Kawaguchi, M. and Kawasaki, S. (2006) Positional cloning identifies Lotus
japonicus NSP2, a putative transcription factor of the GRAS family,
required for NIN and ENOD40 gene expression in nodule initiation. DNA
Res. 13, 255–265.
Nobuta, K., Venu, R.C., Lu, C. et al. (2007) An expression atlas of rice mRNAs
and small RNAs. Nat. Biotechnol. 25, 473–477.
Oldroyd, G.E. and Downie, J.A. (2008) Coordinating nodule morphogenesis
with rhizobial infection in legumes. Annu. Rev. Plant Biol. 59, 519–546.
Oldroyd, G.E. and Long, S.R. (2003) Identification and characterization of
nodulation-signaling pathway 2, a gene of Medicago truncatula involved in
Nod factor signaling. Plant Physiol. 131, 1027–1032.
Pelaz, S., Tapia-Lopez, R., Alvarez-Buylla, E.R. and Yanofsky, M.F. (2001)
Conversion of leaves into petals in Arabidopsis. Curr. Biol. 11, 182–184.
Radutoiu, S., Madsen, L.H., Madsen, E.B. et al. (2003) Plant recognition of
symbiotic bacteria requires two LysM receptor-like kinases. Nature, 425,
585–592.
Ramakers, C., Ruijter, J.M., Deprez, R.H. and Moorman, A.F. (2003)
Assumption-free analysis of quantitative real-time polymerase chain
reaction (PCR) data. Neurosci. Lett. 339, 62–66.
Robles, P. and Pelaz, S. (2005) Flower and fruit development in Arabidopsis
thaliana. Int. J. Dev. Biol. 49, 633–643.
Sawkins, M.C., Farmer, A.D., Hoisington, D., Sullivan, J., Tolopko, A., Jiang,
Z. and Ribaut, J.M. (2004) Comparative map and trait viewer (CMTV): an
integrated bioinformatic tool to construct consensus maps and compare
QTL and functional genomics data across genomes and experiments. Plant
Mol. Biol. 56, 465–480.
Schauser, L., Roussis, A., Stiller, J. and Stougaard, J. (1999) A plant regulator
controlling development of symbiotic root nodules. Nature, 402, 191–195.
Schlueter, J.A., Dixon, P., Granger, C., Grant, D., Clark, L., Doyle, J.J. and
Shoemaker, R.C. (2004) Mining EST databases to resolve evolutionary
events in major crop species. Genome, 47, 868–876.
Schlueter, J.A., Lin, J.Y., Schlueter, S.D. et al. (2007) Gene duplication and
paleopolyploidy in soybean and the implications for whole genome
sequencing. BMC Genomics, 8, 330.
Schmid, M., Davison, T.S., Henz, S.R., Pape, U.J., Demar, M., Vingron, M.,
Scholkopf, B., Weigel, D. and Lohmann, J.U. (2005) A gene expression map
of Arabidopsis thaliana development. Nat. Genet. 37, 501–506.
Schmutz, J., Cannon, S.B., Schlueter, J. et al. (2010) Genome sequence of the
palaeopolyploid soybean. Nature, 463, 178–183.
Smit, P., Raedts, J., Portyanko, V., Debelle, F., Gough, C., Bisseling, T. and
Geurts, R. (2005) NSP1 of the GRAS protein family is essential for rhizobial
Nod factor-induced transcription. Science, 308, 1789–1791.
Wan, J., Torres, M., Ganapathy, A., Thelen, J., DaGue, B.B., Mooney, B., Xu, D.
and Stacey, G. (2005) Proteomic analysis of soybean root hairs after
infection by Bradyrhizobium japonicum. Mol. Plant Microbe Interact. 18,
458–467.
Weber, A.P., Weber, K.L., Carr, K., Wilkerson, C. and Ohlrogge, J.B. (2007)
Sampling the Arabidopsis transcriptome with massively parallel
pyrosequencing. Plant Physiol. 144, 32–42.
Winzer, T., Bairl, A., Linder, M., Linder, D., Werner, D. and Muller, P. (1999) A
novel 53-kDa nodulin of the symbiosome membrane of soybean nodules,
controlled by Bradyrhizobium japonicum. Mol. Plant Microbe Interact. 12,
218–226.
Wu, T.D. and Nacu, S. (2010) Fast and SNP-tolerant detection of complex
variants and splicing in short reads. Bioinformatics, 26, 873–881.
Wu, T.D. and Watanabe, C.K. (2005) GMAP: a genomic mapping and
alignment program for mRNA and EST sequences. Bioinformatics, 21, 1859–
1875.
Yano, K., Yoshida, S., Muller, J. et al. (2008) CYCLOPS, a mediator of
symbiotic intracellular accommodation. Proc. Natl Acad. Sci. USA, 105, 20540–
20545.
Young, N.D. and Udvardi, M. (2009) Translating Medicago truncatula
genomics to crop legumes. Curr. Opin. Plant Biol. 12, 193–201.
Zdobnov, E.M. and Apweiler, R. (2001) InterProScan–an integration platform
for the signature-recognition methods in InterPro. Bioinformatics, 17, 847–
848.