

Genome-Wide Expression Profiling during Germination in Plants

Reena Narsai ARC CoE Plant Energy Biology, CoE Computational Systems Biology

University of Western Australia, Australia

James Whelan ARC CoE Plant Energy Biology

University of Western Australia, Australia

1 Introduction

Land plants are thought to have evolved from algae more than 400 million years ago (Gray, 1985). The diversity of land plants seen today is due to developmental changes that have taken place over this time period, resulting in land plants that inhabit every continent on the planet. Early land plants had no specialised reproductive organs, with the earliest plants relying on spores for reproduction (Gray, 1985). Plants are sessile; therefore a significant amount of regulation is required at the molecular level during development and in response to changing environmental conditions.

The typical life cycle of flowering plants involves several distinct developmental stages, beginning from a seed that undergoes the process of germination to form an established plant that then progresses to a reproductive stage, in which specialised organs for reproduction are produced, ultimately enabling new seed production (Ma et al., 2005). Following fertilisation, embryo development occurs on the maternal plant and during this time lipids, carbohydrates and proteins are stored as energy sources necessary for development. Once this storage is complete and the embryo is mature, the seed is released from the maternal plant, allowing it to either progress into a state of dormancy or undergo germination. Seed dormancy is defined as a metabolically inactive or quiescent state that the seed remains in until conditions are suitable for germination (Bewley, 1997). Given that plants have adapted to their native environments, the time taken between fertilisation and germination of the mature seed can vary greatly between species, and dormancy can last from weeks to years.

Once dormancy is alleviated, germination occurs. The process of germination is one of the most crucial stages in the plant life cycle, beginning with the uptake of water by the dry seed, known as imbibition, and concluding when a part of the embryo emerges from the structures surrounding it (Bewley, 1997). Germination typically occurs in a matter of hours, during which a seed undergoes a rapid transition from a state of dormancy to a metabolically active seedling. It is proposed to consist of three phases: a rapid uptake of water (phase I), followed by a plateau in water uptake (phase II) and ending with a final water uptake phase (phase III). During phases I and II, large metabolic changes involving the utilisation of the stored reserves for energy production occur before the plant becomes autotrophic, and these metabolic changes are necessary to prepare the embryo for the growth that occurs during phase III (Bewley, 1997). During germination, the large changes in metabolic activity are driven by underlying regulatory processes. Hormonal regulation has been well characterised during germination and generally occurs through the antagonistic interaction of the phytohormones abscisic acid (ABA) and gibberellins (GA), whereby ABA represses germination and GA promotes it (Bewley, 1997; Holdsworth et al., 2008).

Although the general characteristics of germination and the other phases of the plant life cycle have been well characterised, the underlying molecular mechanisms controlling these processes have only been uncovered over the last few decades, with the advent of the “-omics” era. Specifically, it has been shown that these processes in the plant life cycle, including germination, are regulated at multiple molecular levels, including the genomic, transcriptomic, proteomic and metabolomic levels. The “-omics” era began with DNA sequencing, and with the latest advances in high-throughput sequencing, genomic sequence information for plants is ever increasing. The availability of genome sequences for maize (Schnable et al., 2009), Populus (Tuskan et al., 2006), soybean (Schmutz et al., 2010), cotton (Wang et al., 2012), as well as Chlamydomonas (Merchant et al., 2007), Physcomitrella (Rensing et al., 2008) and others, means that the use of post-genomic tools in all aspects of plant biology is giving greater insights into a variety of processes. The first plant genome sequence was for the model dicot plant, Arabidopsis thaliana (Analysis of the genome sequence of the flowering plant Arabidopsis thaliana, 2000), followed by the rice (Oryza sativa) genome sequence (Goff et al., 2002), establishing rice as the model monocot species. Given that these species were among the first plant species to be sequenced, they were at the forefront of technology designed for plants. Thousands of publications have utilised this sequence data in a variety of studies. These include proteomic studies that utilise genome sequence information for the prediction of protein coding regions, as well as transcriptomic investigations, in which the genome sequence data is utilised to design the primers or probes required for the analysis of transcript abundance.

With respect to germination, greater insights into these multiple levels of regulation have been uncovered to date. For example, regulation of germination at the genomic level has been observed, whereby DNA hypo-methylation during germination was found to be a necessary step before transcriptional activation for gene expression occurs (Lu et al., 2006). Several studies have also examined changes in transcripts (Howell et al., 2009; Nakabayashi et al., 2005; Narsai et al., 2011) and proteins (Gallardo et al., 2002; Law et al., 2012) during germination, with the aim of gaining insight into the processes that are occurring at these levels, as well as dissecting the underlying regulatory mechanisms. Specifically, in a recent in-depth study of Arabidopsis germination, the significant spatial and temporal resolution of the transcriptome enabled the identification of unique gene-sets that may represent important developmental switches required for germination (Narsai et al., 2011). Interestingly, examining the genome-wide transcriptomic responses during germination in both monocots, e.g. rice (Howell et al., 2009), and dicots, e.g. Arabidopsis (Narsai et al., 2011), reveals the common and distinct early expression responses occurring within the first few hours of germination. The identification of specific gene expression patterns, and of the putative cis-elements that may be controlling these genes, gives insight into the tight regulation that occurs during germination in plants, with these genes representing potential candidate genes that can be used to prevent precocious germination. Given that this high resolution of analysis is possible at the transcript level, an ever increasing number of studies are using microarray technology and, more recently, RNA sequencing technology to carry out global analyses of the transcriptomic changes occurring during plant development.

2 Transcriptomics

Before complete genome sequences were available for plants, the analysis of mRNA expression levels was limited, both by the lack of sequence information and by the lack of high-throughput technology. Methods including northern blot analysis, dot-blots and slot-blots allowed quantification of the mRNA expression levels of a single gene or a few genes at a time. The development and use of quantitative real-time polymerase chain reaction (qRT-PCR) then increased the number of transcripts that could be analysed at a time. However, the genome-wide approach to determining expression levels came about after the development of specific microarrays, allowing quantitative analysis of all transcripts in a single sample at the same time, i.e. the transcriptome. As complete genome sequences of species become available, specific genome microarrays for these species are developed; hence a number of studies have used microarrays to study the transcriptomic processes occurring during germination and development in Arabidopsis (Nakabayashi et al., 2005; Narsai et al., 2011; Schmid et al., 2005), rice (Howell et al., 2009; Jain et al., 2007; Narsai et al., 2009) and other plant species (e.g. barley (Hansen et al., 2009; Sreenivasulu et al., 2008)).

Recently, we examined in depth the transcriptomic changes that occurred from dry seed to within hours of imbibition in the model plant Arabidopsis thaliana (Narsai et al., 2011). Specifically, 10 time points were selected: freshly harvested seeds (before desiccation, directly upon removal from the silique; H); seeds desiccated for 15 d in darkness (0 h); seeds stratified at 4 °C in the dark for 1 h (1 h S), 12 h (12 h S) and 48 h (48 h S); and seeds transferred into continuous light after stratification and collected at 1 h SL, 6 h SL, 12 h SL, 24 h SL and 48 h SL (Figure 1). Before this study, the transcriptomic changes that occurred during dormancy and germination in Arabidopsis were also examined (Figure 1) (Nakabayashi et al., 2005). Similarly, in 2008, using the partial barley genome sequence, microarrays were used to examine germination and seedling development in barley (Figure 1) (Sreenivasulu et al., 2008). Given that rice undergoes germination under both aerobic and anaerobic conditions, a greater understanding of rice germination was also gained under both conditions (Howell et al., 2009; Narsai et al., 2009) (Figure 1). As germination is defined as concluding when a part of the embryo emerges from the testa (generally around 24 h after imbibition in the light), the final time point (48 h SL) is considered a post-germination time point (Figure 1).

Figure 1: Overview of germination studies in Arabidopsis (Narsai et al., 2011; Nakabayashi et al., 2005), rice (Howell et al., 2009; Narsai et al., 2009) and barley (Sreenivasulu et al., 2008) involving genome-wide transcriptomic analyses.

Having a high resolution view of the transcriptome during germination revealed specific temporal expression patterns that to our knowledge have not previously been identified. Interestingly, it can be seen that even in these different species with different life cycle lengths, a significant percentage of transcripts remain stored in the dry seed, and germination generally occurred by 24 h after imbibition (Figure 1). It has been shown that although germination can occur in the absence of transcription, it cannot progress to seedling establishment or beyond (Rajjou et al., 2004). Thus, although transcription is not essential for germination per se, it is essential for any further growth and development. Together, these findings suggest that the stored transcripts in the dry seed are crucial during the early hours of germination.

2.1 Experimental design

In order to gain maximum insight into the transcriptome whilst minimising cost and time, it is crucial to optimise the experimental design. This is particularly important for time course experiments analysing processes that occur in a matter of hours, such as germination. The selection of time points for analysis is particularly crucial, as the inclusion or exclusion of specific time points can significantly affect the output and the insight gained. The importance of this is clearly displayed in the study examining germination in Arabidopsis (Narsai et al., 2011), where the expression of a number of genes was analysed by quantitative real-time PCR over 15 time points and, based on this, 10 time points were selected for global transcriptomic analysis using microarrays. In total, 30 microarrays were used for these 10 time points, as each time point involved the analysis of 3 biological replicates, which is crucial for optimal experimental design and output. When fewer time points and a different experimental design were used for germination in Arabidopsis, for example when germination was examined without stratification (Nakabayashi et al., 2005), the output and expression profiles can differ significantly. By examining the transcriptomic changes over 10 time points during germination (Narsai et al., 2011), it was possible to identify a group of genes showing a highly specific transient expression pattern, whereby these genes rapidly increased in expression at the end of stratification before significantly decreasing in expression by 6 h later. Without the inclusion of these specific time points early after the transfer from stratification into continuous light, it would not have been possible to identify the very rapid regulation occurring within these hours. Similarly, in the studies examining germination in rice, a crucial transient expression, largely of genes encoding transcription factors, was seen at 3 hours after imbibition, and without the inclusion of this time point, this rapid peak in expression would also not have been distinguished from other expression profiles.

2.2 Microarray processing and analysis

There are several different platforms available for microarrays, e.g. Agilent, Affymetrix and custom-made arrays for each species. Most of the microarrays deposited in public microarray databases for both rice and Arabidopsis were carried out using the Affymetrix platform. In the in-depth transcriptome study during germination in Arabidopsis, Affymetrix microarrays were used. As in any microarray experiment, following RNA isolation, cDNA synthesis was carried out (Figure 2). Using this as a template, in vitro transcription was carried out to generate aRNA (amplified complementary RNA), which was purified and hybridised to the microarray (Figure 2). Following the staining and scanning of a microarray, a CEL file is generated that maps probe intensities to a grid and therefore contains the raw intensity information for all probesets. The first pre-processing steps of microarray analysis involve quality control checks, including confirming expression of the hybridization and poly-A control genes. Several reviews to date give detailed insight into the quality criteria that must be met before further analysis can be done (Allison et al., 2006; Nettleton, 2006). Following this, the next steps in microarray analysis involve the statistical determination of whether gene expression is above background. To do this for Affymetrix microarrays, MAS5 normalisation of the CEL files was carried out. This algorithm gives the summarised signal intensity for each gene, together with a statistical determination of whether this signal intensity is significantly (p<0.05) above the background intensity, in which case the gene is called “present.” For the Arabidopsis germination time course, a gene was defined as present at a time point if it was called present in at least 2 out of 3 replicates, and in this way 15,789 genes were identified as expressed at one or more time points over the germination time course (Figure 2). The raw CEL files were then GC-RMA normalized, which refers to a Robust Multi-array Average normalisation that accounts for the GC-content of the probeset sequences. In this way, signal intensities for all the probesets on each microarray were generated, and these were first compared between replicates to determine the Pearson correlation coefficients (which typically need to be >0.95). As expected, the correlation coefficients between replicates were >0.97 for samples from the germination time course (Narsai et al., 2011). The GC-RMA normalised data was then filtered to retain only the 15,789 genes that were present. There are many different normalization methods that can be used for microarray data, and many of these have been reviewed (Allison et al., 2006; Nettleton, 2006). The standard preference for Affymetrix microarrays is the robust GC-RMA normalization. All downstream steps, including the differential expression analysis, were carried out using this GC-RMA normalized data (Figure 2).
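The two filtering rules described above (a gene is kept if it is called present in at least 2 of 3 replicates at one or more time points, and replicates are expected to correlate at >0.95) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy gene names and the intensity values are hypothetical, and the present/absent calls are assumed to have already been produced by MAS5.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def expressed_genes(present_calls, min_replicates=2):
    """Keep genes called present in >= min_replicates replicates
    at one or more time points.
    present_calls: {gene: {time_point: [bool per replicate]}}"""
    return {
        gene
        for gene, by_time in present_calls.items()
        if any(sum(calls) >= min_replicates for calls in by_time.values())
    }

# Hypothetical present/absent calls for two genes at two time points.
calls = {
    "geneA": {"0 h": [True, True, False], "1 h S": [False, False, False]},
    "geneB": {"0 h": [True, False, False], "1 h S": [False, True, False]},
}
kept = expressed_genes(calls)

# Hypothetical normalised intensities for two replicates of one sample.
rep1 = [7.1, 9.4, 5.2, 11.0]
rep2 = [7.0, 9.6, 5.1, 10.8]
r = pearson(rep1, rep2)
print(sorted(kept), r > 0.95)
```

In this toy input only geneA passes the 2-out-of-3 rule, because geneB is never called present in more than one replicate at the same time point.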

Figure 2: General overview showing the microarray analysis workflow, from experimental preparations to transcriptomic data analysis. The workflow shown was largely used for the microarray analysis carried out in the Arabidopsis (Narsai et al., 2011) and rice (Howell et al., 2009) germination studies.

2.2.1 Visualisation of expression data

Visualization of microarray data is much more than a data presentation method, and can even be the reason that some biological findings become apparent. For example, principal component analysis (PCA) compresses all expression values from a single microarray into two- or three-dimensional space, which not only provides an indication of the similarity between replicates, but can also reveal relationships between groups of microarrays, based on the distances between these groups. Similarly, one of the most common visualizations of microarray data is clustering, which involves grouping together genes that show similar expression patterns. This includes methods for which the number of groups is set in advance, such as self-organising map clustering (Tamayo et al., 1999), and methods that build a tree of nested groups, such as hierarchical clustering (Shannon et al., 2003). The effectiveness of clustering can be seen by comparing data before and after hierarchical clustering of the normalized expression levels over germination (Figure 3A and B). To generate these clusters, expression values were first normalised to the maximum over the time course and hierarchically clustered using average linkage clustering based on Euclidean distance, and in this way 4 distinct clusters were identified (Figure 3B). Similarly, when the same clustering method was applied to the microarray data from the other Arabidopsis germination time course study (Nakabayashi et al., 2005), similarities and differences between the expression patterns over germination became apparent (Figure 3C). Specifically, it was more difficult to identify the genes showing a tight transient expression pattern in this time course. Similarly, the use of hierarchical clustering for the germination time course in rice also revealed 4 distinct clusters, where one specific cluster also showed a tight transient expression of genes during germination (Howell et al., 2009) (Figure 3D).
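The clustering recipe described here (normalise each profile to its maximum over the time course, then apply average linkage on Euclidean distance) can be sketched as follows. This is a deliberately naive pure-Python version for illustration, not the software used in the cited studies; the gene names and expression values are hypothetical, and profiles are assumed to have a positive maximum.

```python
from math import dist  # Euclidean distance, Python 3.8+

def normalise_to_max(profile):
    """Scale a profile so that its maximum over the time course is 1."""
    peak = max(profile)
    return [v / peak for v in profile]

def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering with average linkage.
    profiles: {gene: [normalised expression per time point]}
    Returns a list of clusters, each a list of gene names."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean pairwise Euclidean distance between the two clusters.
                total = sum(dist(profiles[a], profiles[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Hypothetical raw expression values over four time points.
raw = {
    "up1": [1, 2, 4, 8],
    "up2": [2, 4, 8, 16],
    "transient1": [1, 9, 10, 1],
    "transient2": [2, 18, 20, 2],
}
normalised = {gene: normalise_to_max(values) for gene, values in raw.items()}
clusters = average_linkage(normalised, n_clusters=2)
print(clusters)
```

Normalising to the maximum is what lets up1 and up2 fall into the same cluster even though their absolute levels differ two-fold; without it, Euclidean distance would mostly reflect abundance rather than the shape of the profile.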

Figure 3: Visualisation of microarray data. A) Un-ordered normalised expression levels over Arabidopsis germination (Narsai et al., 2011). B) Hierarchical clustering of data from A. C) Hierarchical clustering of microarray data over Arabidopsis germination without stratification (re-normalised from Nakabayashi et al., 2005). D) Hierarchical clustering of microarray data showing expression changes over rice germination (Howell et al., 2009).

2.2.2 Differential expression analysis

Although the visualization of gene expression profiles can be very useful to identify sets of genes with similar expression profiles, it is usually not a statistical method for identifying significant changes in gene expression. In order to do this, differential expression analysis was carried out, which allows the fold-change and the significance of that fold-change (the associated p-value) to be determined. Given that the germination study involved multiple comparisons of time points over germination, a step-wise approach to the differential expression analysis was taken, whereby the expression of each gene at a given time point was compared to the time point prior to it. In this way, the comparisons were H vs. 0 h (dry seed), 0 h (dry seed) vs. 1 h S, 1 h S vs. 12 h S and 12 h S vs. 48 h S, giving insight into the expression changes occurring prior to and during dark stratification (4°C). Then, following the transfer into continuous light (at 22°C), the comparisons were 48 h S vs. 1 h SL, 1 h SL vs. 6 h SL, 6 h SL vs. 12 h SL, 12 h SL vs. 24 h SL and 24 h SL vs. 48 h SL.
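The step-wise design amounts to pairing each time point with the one before it. A minimal Python sketch, using the time point labels from the text (the variable names are illustrative):

```python
# Time points in experimental order, labelled as in the text.
time_points = ["H", "0 h", "1 h S", "12 h S", "48 h S",
               "1 h SL", "6 h SL", "12 h SL", "24 h SL", "48 h SL"]

# Pair every time point with the one that follows it.
comparisons = list(zip(time_points, time_points[1:]))

for earlier, later in comparisons:
    print(f"{earlier} vs. {later}")
```

Ten time points give nine comparisons, so each gene is tested nine times rather than once for every possible pair of time points.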

Differential expression   No.     Over-rep. functional category example     z-score

Up-regulated genes
0 h v 1 h S               848     -                                         -
1 h S v 12 h S            2498    DNA synthesis/chromatin structure         5.46
12 h S v 48 h S           5611    Protein synthesis                         20.88
48 h S v 1 h SL           1009    Abiotic stress                            5.48
1 h SL v 6 h SL           4571    Protein synthesis (chloroplast)           6.37
6 h SL v 12 h SL          4021    Photosynthesis                            9.86
12 h SL v 24 h SL         4030    Cell wall                                 10.40
24 h SL v 48 h SL         3913    Photosynthesis                            10.84

Down-regulated genes
0 h v 1 h S               452     Protein degradation-cysteine protease     2.86
1 h S v 12 h S            2267    Protein degradation-ubiquitin.E3          4.99
12 h S v 48 h S           4798    Protein degradation-ubiquitin.E3          8.40
48 h S v 1 h SL           1273    Photosynthesis                            8.53
1 h SL v 6 h SL           4329    Abiotic stress                            4.48
6 h SL v 12 h SL          3779    Stress                                    5.97
12 h SL v 24 h SL         5475    Development-storage proteins              5.10
24 h SL v 48 h SL         5595    Development-storage proteins              3.95

Table 1: Summary of the differentially expressed gene-sets over germination in Arabidopsis (Narsai et al., 2011). For each of the step-wise comparisons (Differential expression), the number (No.) of significantly (p<0.05, PPDE>0.96) differentially expressed genes, an example of an over-represented functional category and the z-score associated with it are shown (determined using Pageman; Usadel et al. 2006). Z-scores that are greater than 1.96 represent p-values <0.05.
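The correspondence between a z-score of 1.96 and a p-value of 0.05 can be checked directly. The sketch below assumes a two-tailed test against the standard normal distribution, which is consistent with the 1.96 threshold quoted in the caption but is not stated explicitly in the source; the function name is illustrative.

```python
from math import erfc, sqrt

def two_tailed_p(z):
    """Two-tailed p-value of a z-score under the standard normal."""
    return erfc(abs(z) / sqrt(2))

print(two_tailed_p(1.96))  # just under 0.05
print(two_tailed_p(5.46))  # z-score of the DNA synthesis/chromatin structure row
```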

To carry out the differential expression analysis, Cyber-T (http://cybert.microarray.ics.uci.edu/) (Kayala & Baldi, 2012) was used, with PPDE (posterior probability of differential expression) values computed. Given that multiple comparisons are required for the thousands of genes analysed by microarrays, it is essential to also carry out a correction for false discovery rate (FDR). Using the Cyber-T tool, differential expression analysis was carried out and significance was determined where p-values were <0.05 (with a false discovery rate of <5%). For the first time in Arabidopsis, it was revealed that over 10,000 genes were significantly differentially expressed during the process of stratification, indicating that although only small morphological changes are visible during stratification, it is a crucial time during which significant changes are occurring at the transcriptomic level. Note that although Cyber-T (Kayala & Baldi, 2012) was used for this analysis in the Arabidopsis germination study (Narsai et al., 2011), other tools, such as the LIMMA package in R (Smyth, 2005), can also be used for differential expression analysis and false discovery correction.
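To illustrate what a false discovery correction does to a list of p-values, the sketch below implements the Benjamini-Hochberg procedure, a widely used FDR method. This is a swapped-in technique for illustration only: the source does not say that Cyber-T uses this procedure (it reports PPDE values), and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five genes.
p = [0.001, 0.02, 0.03, 0.04, 0.5]
q = benjamini_hochberg(p)
print(q)
```

Each raw p-value is scaled by the number of tests divided by its rank, so a gene with p = 0.02 among five tests is no longer clearly below a 5% threshold once the other tests are taken into account.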

2.3 Functional over-representation analysis

After generating hierarchical clusters (e.g. Figure 3B) and carrying out differential expression analysis, the clusters or gene-sets generated by these analyses are often made up of hundreds or thousands of genes. In order to gain insight into the relationship between these specific gene-sets or clusters and the function of the encoded proteins, a functional over-representation analysis can be carried out. This involves determining whether a given gene-set or cluster is enriched in genes encoding specific putative functions, compared to the expected occurrence of that functional category across all genes in the genome. A recent study examined barley germination at the transcriptomic level and characterised the transcriptional regulatory program during the different phases of germination, revealing stage specific expression of specific functions (An & Lin, 2011). Similarly, during germination in Arabidopsis (Narsai et al., 2011), functional over-representation analysis was carried out on the sets of genes significantly differentially expressed over germination using the Pageman tool (Usadel et al., 2006). This tool generated a heatmap showing z-scores as an indication of over/under representation of functional categories. Table 1 shows examples of functional categories that were significantly over-represented in each of the gene-sets generated from the step-wise comparisons. Specifically, we can see that during germination, between 1 h and 12 h of stratification (1 h S vs. 12 h S), 2498 genes were significantly up-regulated and these genes are enriched in DNA synthesis/chromatin structure functions (Table 1). It can also be seen that as germination progresses, there is significant up-regulation of protein synthesis and photosynthesis functions (Table 1). In contrast, the down-regulated gene-sets were seen to be significantly enriched in protein degradation functions, e.g. during stratification (12 h S v 48 h S), 4798 genes were down-regulated and these were enriched in ubiquitination functions (Table 1).
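The idea of comparing the observed number of genes in a category with the number expected across the genome can be made concrete with a hypergeometric upper-tail test. This is a generic sketch of an over-representation test, not a reproduction of how Pageman computes its z-scores; the function name and the toy counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when a gene-set of size n is drawn from N genes,
    K of which carry the functional annotation."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total

# Hypothetical example: 100 genes in the genome, 10 annotated to a category,
# and a gene-set of 20 genes containing 6 of the annotated genes
# (2 would be expected by chance).
p_value = hypergeom_upper_tail(k=6, n=20, K=10, N=100)
print(p_value)
```

A small p-value here means that the category is more frequent in the gene-set than its genome-wide frequency would predict, which is the same question that the z-scores in Table 1 address.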

Notably, while the Pageman tool was very useful to determine enriched functional categories, we also wanted to gain insight into the genes that were not necessarily significantly differentially expressed based on the step-wise comparisons, but showed a trend in expression over the germination time course (Narsai et al., 2011). In addition, to determine whether there was an enrichment of genes encoding proteins targeted to specific sub-cellular localisations, a gene ontology (GO) over-representation analysis was also carried out using the genes in specific clusters (Narsai et al., 2011). Table 2 shows the GO categories that were over-represented in the gene-sets in each of the clusters (clusters are as shown in Figure 3B). As expected, it can be seen that the genes increasing in expression over time, specifically after 24 h SL (Cluster 1; Figure 3B), were enriched in genes annotated to encode plastid proteins and energy pathways (Table 2). In contrast, the genes that decreased in expression over germination (Cluster 2; Figure 3B) were enriched in genes annotated to encode nuclear proteins, specifically those with transcription factor activity. Interestingly, it was seen that the genes showing transient expression (Cluster 3; Figure 3B) were enriched in both nuclear and mitochondrial localised proteins involved in DNA or RNA metabolism (Table 2), while the genes in Cluster 4 were not found to be significantly enriched in any of the GO functional categories (Narsai et al., 2011).

Cluster   Gene Ontology category example            z-score

Cellular component
1         plastid                                   6.51
2         nucleus                                   4.15
3         nucleus                                   9.06
3         mitochondria                              4.92

Molecular function
1         structural molecule activity              8.03
2         transcription factor activity             5.03
3         nucleic acid binding                      9.37
3         DNA or RNA binding                        8.97

Biological process
1         electron transport or energy pathways     4.32
2         transcription                             4.52
3         developmental processes                   4.01
3         DNA or RNA metabolism                     3.98

Table 2: Gene Ontology over-representation analysis for the genes in each cluster identified during germination in Arabidopsis (Narsai et al., 2011). The cluster number, gene ontology category and z-scores are shown for each category. Z-scores greater than 1.96 represent p-values <0.05.

2.4 Using public microarrays and phenotype databases

In order to extend transcriptomic studies, publicly available microarrays are often analysed in parallel with a transcriptomic study. It is now standard for microarray data to be submitted to freely available public databases. In many instances, microarray data submission is a prerequisite for publication, and this has seen the establishment of microarray data repositories such as GEO (http://www.ncbi.nlm.nih.gov/geo) and ArrayExpress (http://www.ebi.ac.uk/arrayexpress). As with the submission of sequence data, the submission of transcriptome data allows data to be used by the wider scientific community and has many impacts beyond the immediate study that produced the data. In particular, it allows individual investigators carrying out experiments that utilise a limited number of hybridisations to compare their data to tens or hundreds of other experiments and thus gives greater insights into biochemical processes and regulatory mechanisms, and often prompts new experiments or hypotheses. In 2005, hundreds of microarrays were carried out showing the transcriptome changes over Arabidopsis development, referred to as the At-GenExpress Developmental dataset (Schmid et al., 2005). An excellent use of these public microarrays was presented in the Arabidopsis germination study (Narsai et al., 2011), whereby these microarrays were analysed in parallel with the germination time course microarrays. By analysing these microarrays in parallel, it was possible to identify genes that showed their highest level of expression during germination in Arabidopsis (defined as germination specific expression), in comparison to 70 other developmental tissues/stages. In this way it was possible to confirm that the transient expression seen during germination is not only a peak in expression over the germination time course, but that for many of these genes, this was the highest expression seen over the entire course of Arabidopsis development (Narsai et al., 2011). Furthermore, functional over-representation analysis of these genes also confirmed that they were enriched in genes encoding nuclear and mitochondrial proteins, specifically for RNA metabolism functions (Narsai et al., 2011). Following this, specific databases and studies providing phenotype information for loss-of-function mutants in Arabidopsis, such as the Seedgenes database (Meinke et al., 2008) and other studies, e.g. (Pagnussat et al., 2005), revealed that the loss-of-function of a significant (p<0.05) number of genes showing this germination specific expression resulted in embryo lethality (Narsai et al., 2011). These findings suggest that there is an essential requirement for RNA metabolism functions during early germination and development in Arabidopsis.
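The definition of germination specific expression used above (highest expression during germination compared with all other developmental tissues/stages) reduces to a comparison of maxima. The sketch below assumes that both datasets have already been normalised together so that values are comparable; the function name, gene names and values are hypothetical.

```python
def germination_specific(germination, development):
    """Genes whose peak expression over the germination time course
    exceeds their expression in every developmental sample.
    Both arguments: {gene: [expression values]}"""
    return sorted(
        gene
        for gene, values in germination.items()
        if gene in development and max(values) > max(development[gene])
    )

# Hypothetical, jointly normalised expression values.
germination = {"geneA": [2.0, 9.5, 3.0], "geneB": [4.0, 5.0, 4.5]}
development = {"geneA": [1.0, 2.5, 3.0], "geneB": [6.0, 7.5, 2.0]}

specific = germination_specific(germination, development)
print(specific)
```

Here geneA peaks higher during germination than in any developmental sample, whereas geneB is expressed more highly elsewhere in development and is therefore not counted as germination specific.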

3 Factors regulating mRNA abundance

Unlike the genome sequence, the transcriptome is highly variable, as evidenced by tissue or stage specific expression patterns, such as those seen during germination (Howell et al., 2009; Narsai et al., 2009; Narsai et al., 2011). In addition, it has been shown that various factors can affect the transcriptome and ultimately the progression of germination under both normal and adverse conditions. For example, a recent study examined the germination of desiccation sensitive seeds in a polyethylene glycol solution and showed how this affected the transcriptome and enabled the reestablishment of desiccation tolerance (Maia et al., 2011). However, it is important to note that transcriptomic studies most often provide a snap-shot of the steady-state abundance of all transcripts in a single sample, and this is a result of a balance achieved by both transcription and degradation. Thus, when considering the factors regulating mRNA abundance, it is important to consider both transcriptional regulation by cis-acting regulatory elements (CAREs) and factors regulating mRNA degradation.

3.1 Transcriptional regulation

With completed genome sequences available for both Arabidopsis and rice, it was possible to search for putative transcription factor binding sites in the upstream regions of the genes that showed this specific, transient expression pattern during germination (Howell et al., 2009; Narsai et al., 2011). For example, during rice germination, a set of genes was observed that specifically peaked in expression at 3 hours after imbibition (Cluster 3; Figure 3D; Howell et al., 2009). In order to determine potential regulatory sites that may have a role in regulating the expression of these genes, the 1 kb upstream regions of these genes were extracted from TIGR (http://rice.plantbiology.msu.edu) and searched using a number of tools including MEME (http://meme.sdsc.edu/meme/intro.html) (Bailey et al., 2006) and the Regulatory Sequence Analysis Tools, RSAT (http://rsat.ulb.ac.be) (Thomas-Chollier et al., 2008). Using these tools, a number of putative CAREs were identified that may be involved in regulating expression, resulting in the observed transient expression pattern (Howell et al., 2009). Similarly, for the transiently expressed genes during Arabidopsis germination, the 1 kb upstream regions were extracted from TAIR (http://www.arabidopsis.org) and putative CAREs were identified based on the AthaMap (Galuschka et al., 2007) and AGRIS (Davuluri et al., 2003) databases, which list experimentally determined transcription factor binding sites (Narsai et al., 2011). Thus, the use of both expression and sequence information together in this way can provide the basis for designing experiments to confirm the function of putative CAREs and elements associated with mRNA decay. For example, a recent study confirmed a role for the FUSCA3 transcription factor in controlling gene expression during seed germination at high temperature in Arabidopsis, where over-expression of this transcription factor resulted in hypersensitivity to temperature and prevented germination (Chiu et al., 2012). Similarly, a recent study has identified a transcription factor, named DOF6, which, when over-expressed, significantly affects seed germination, resulting in growth defects and sterility (Rueda-Romero et al., 2012). Thus, it can be seen that there is a crucial role for transcriptional control during germination.
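As a rough illustration of the kind of upstream-region search described above, the following Python sketch counts how many genes in a set carry a given motif in their upstream sequence, on either strand. It is not the procedure implemented in MEME, RSAT, AthaMap or AGRIS; the function names, the toy sequences and the query motifs (the G-box core CACGTG and the AAAG core) are assumptions made purely for illustration.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_genes_with_motif(upstream, motif):
    """Count genes whose upstream region contains the motif on either strand.

    `upstream` maps a gene identifier to its upstream sequence
    (hypothetical input; e.g. 1 kb regions downloaded in bulk).
    """
    pattern = motif_to_regex(motif)
    hits = 0
    for seq in upstream.values():
        seq = seq.upper()
        if pattern.search(seq) or pattern.search(reverse_complement(seq)):
            hits += 1
    return hits


# Toy upstream regions (invented, far shorter than 1 kb).
toy_upstream = {
    "gene_a": "TTTCACGTGAAA",
    "gene_b": "TTTTTTTTTTTT",
    "gene_c": "GGCTTTGG",
}
gbox_count = count_genes_with_motif(toy_upstream, "CACGTG")
aaag_count = count_genes_with_motif(toy_upstream, "AAAG")
```

In practice, the count obtained for a gene set would be compared with the count across all upstream regions in the genome before a motif is considered over-represented.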

3.2 mRNA degradation

Rapid changes in transcript abundance are often seen both in response to stress and over plant development. Just as transcripts were observed to rapidly increase in abundance during germination in both Arabidopsis and rice, these sets of transiently expressed genes also showed an equally rapid decrease in abundance, after 6 h SL in Arabidopsis (Cluster 3; Figure 3B; Narsai et al., 2011) and 3 hours after imbibition in rice (Cluster 3; Figure 3D; Howell et al., 2009). For Arabidopsis, mRNA degradation rates have been examined at the genome-wide level, revealing an association between the rate of mRNA degradation and the function of the encoded proteins (Narsai et al., 2007). Thus, it was possible to examine whether the transcripts rapidly decreasing in abundance after 6 h SL (Cluster 3; Figure 3B) were enriched in transcripts with short half-lives. Interestingly, the transiently expressed genes during Arabidopsis germination were not enriched in transcripts with short half-lives, suggesting that the rapid decrease in abundance after 6 h SL (Cluster 3; Figure 3B) involves more regulatory factors than mRNA decay alone. Previous studies in plants and other eukaryotes have identified cis-acting elements, such as AU-rich elements, in the 3′ untranslated regions of transcripts that have been shown to have a role in regulating mRNA degradation (Ohme-Takagi et al., 1993). Given the rapid decrease in expression also seen in the transiently expressed genes during germination in rice, it was interesting to find that the 3′ untranslated regions of these transcripts were enriched in putative elements associated with rapidly degrading transcripts, suggesting a role for mRNA decay in controlling the observed expression pattern (Howell et al., 2009; Narsai et al., 2007).

One of the ways that RNA degradation can also be controlled is via the expression of microRNAs, which have been shown to have a significant role in regulating expression in various plant tissues and developmental stages. A recent study reported on microRNA isolation in plant tissues and also revealed microRNA target genes that are likely to be involved in seed dormancy and germination (Kumar et al., 2011), thus revealing another way in which gene expression is controlled during germination.
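The question raised in this section, whether a gene set is enriched in transcripts with short half-lives, can be framed as a test of over-representation. The text does not state which test was applied, so the sketch below assumes a one-sided hypergeometric test; the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, sample, observed):
    """Upper-tail hypergeometric probability P(X >= observed).

    population: number of transcripts with a measured half-life
    successes:  number of those classed as having a short half-life
    sample:     number of transcripts in the gene set of interest
    observed:   number of short half-life transcripts in that gene set
    """
    if not (0 <= successes <= population and 0 <= sample <= population):
        raise ValueError("successes and sample must lie within the population")
    upper = min(successes, sample)
    total = comb(population, sample)
    tail = sum(
        comb(successes, k) * comb(population - successes, sample - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Invented counts: 2,000 of 13,000 transcripts are short-lived and a
# 300-gene cluster contains 50 of them.
p_value = hypergeom_enrichment_p(13000, 2000, 300, 50)
```

A large p-value from such a test would be consistent with the Arabidopsis observation that the transient gene set is not enriched in short-lived transcripts, whereas a small one would point to enrichment.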

4 Sub-cellular localisation

One of the many advantages of transcriptomic analyses is that they can provide an excellent basis for further functional research. When an interesting gene set that showed transient expression was identified during germination in Arabidopsis, it was particularly notable that a significant percentage of these genes were annotated as encoding mitochondrial proteins (Narsai et al., 2011). Therefore, green fluorescent protein (GFP) analysis was carried out for 65 proteins, in order to confirm these putative localisation annotations and experimentally determine the sub-cellular localisation of the proteins that largely showed a specific transient expression pattern during germination (Narsai et al., 2011).

For GFP analysis, fusion proteins containing GFP and the first 100 amino acids of the protein of interest were constructed and transiently transformed into Arabidopsis cell culture using biolistic transformation (for predicted/putatively annotated mitochondrial and chloroplast proteins). For proteins predicted to be peroxisomal, the last 100 amino acids were used for the transient transformation into Arabidopsis cell culture. Organelle targeting was verified using red fluorescent protein (RFP) controls, with AOX-RFP as a mitochondrial control, SSU-RFP as a plastid control, and RFP-SRL as a control for peroxisomal targeting (Narsai et al., 2011). In addition to the Arabidopsis germination study (Narsai et al., 2011), several other studies have also observed a relationship between expression patterns and the sub-cellular localisation of the encoded proteins (Carrie et al., 2009; Duncan et al., 2011; Lanfrancotti et al., 2007).

5 Quantitative mass spectrometry

It has been shown that translation is absolutely essential for germination to occur, while transcription is essential for seedling growth and establishment (Rajjou et al., 2004). Thus, given this requirement for both transcription and translation during these early stages of germination and growth, the transcript abundance data were compared with previously published protein abundance data during Arabidopsis germination (Fu et al., 2005; Gallardo et al., 2002; Narsai et al., 2011). A “presence/absence” system was used as a simple measure of protein expression and in this way, it was seen that ~80% of the 117 unique proteins profiled (Fu et al., 2005) showed comparable trends to the respective transcriptomic profiles (Narsai et al., 2011). In order to gain further insight into this relationship, quantitative mass spectrometry analysis was carried out on 4 time points during germination (dry seed, 48 h S, 6 h SL and 48 h SL) and it was seen that ~45% of the 178 proteins analysed showed statistically significant correlations between transcript and protein abundance during germination, with many of the most significant correlations seen for genes (and respective proteins) encoding ribosomal proteins and seed storage proteins (Law et al., 2012; Narsai et al., 2011). Given that specific and rapid regulation of transcripts encoding mitochondrial proteins was seen during Arabidopsis germination, a more focussed study specifically examining mitochondrial biogenesis was carried out, revealing a role for nucleotide and RNA metabolism in the mitochondria early during germination (Law et al., 2012). Similar to the findings during rice germination (Howell et al., 2006), in Arabidopsis, it was seen that the transcript and protein abundance changes for respiratory chain components showed significant positive correlations, whilst the opposite was observed for import components during germination (Law et al., 2012).
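To make the transcript-protein comparison above concrete, the sketch below computes a correlation coefficient for one gene across the four time points. The text does not specify which correlation statistic was used; Pearson's r is assumed here, and the abundance values and function name are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical abundances at dry seed, 48 h S, 6 h SL and 48 h SL.
transcript = [2.1, 2.4, 5.8, 9.3]
protein = [1.0, 1.1, 2.9, 4.6]
r = pearson_r(transcript, protein)
```

With only four time points, a correlation has to be very strong to reach statistical significance, which is worth keeping in mind when interpreting the proportion of proteins that correlate with their transcripts.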
While high-throughput quantitative mass spectrometry for the analysis of protein abundances is desirable, the current depth of coverage, accuracy of quantitation, and the technical difficulty in obtaining such data mean that it is limited in large-scale projects compared to transcriptomics. Approaches to quantitation such as multiple reaction monitoring (MRM)-based techniques may prove more fruitful in the future (Kuzyk et al., 2009; Miller et al., 2010).

In addition to proteomics, comparison of transcriptomic and metabolomic data in parallel has also been useful in giving insight into the functional relationship between the observed transcriptomic changes and downstream metabolomic effects. For example, in response to hypoxic conditions in Arabidopsis, rice and poplar, it was seen that the molecular responses showed some conservation at the metabolomic level, while the transcriptomic profiles varied between species (Narsai, Rocha, et al., 2011). Unlike microarrays, metabolite profiling is genome independent, meaning that it can be carried out without any knowledge of the genome sequence. To date, most metabolomic studies in plants have been carried out in Arabidopsis, as the techniques for extraction and analysis have been optimised for Arabidopsis over the years. However, techniques for metabolite profiling are constantly improving, with recent studies presenting improved methodologies and case studies of metabolomic analysis in other plant species as well as Arabidopsis (Fernie et al., 2011; Tohge et al., 2011).

6 Useful microarray resources and plant databases

In the last decade, since the first arrival of microarrays, the tools for microarray analysis have improved to the extent that analyses can be carried out in a matter of hours. For plant research, there are several online tools and resources that have greatly facilitated the various transcriptomic analyses of germination in plants (Howell et al., 2009; Narsai et al., 2009; Narsai et al., 2011; Sreenivasulu et al., 2008).

6.1 Microarray analysis

For Affymetrix microarrays, free Affymetrix software such as the Gene Expression Console and the Bioconductor packages in R allow the pre-processing steps and quality checks to be carried out. In addition, the Bioconductor packages in R can also be used for normalisation (both MAS5 and GC-RMA) and differential expression analysis (using the LIMMA package) (Smyth, 2005).

6.2 Sequence information

For Arabidopsis, TAIR houses the Arabidopsis genome sequence and provides a suite of tools useful for microarray analysis, as well as enabling the bulk download of sequence information, such as the upstream regions of genes (TAIR; www.arabidopsis.org). For rice, the MSU database and RAP database both provide the rice genome sequence (also enabling the upstream sequences to be downloaded). The Gramene database is also useful, allowing comparative analysis between plant species and providing sequence information for a number of plant species including rice, maize, Arabidopsis and others (http://www.gramene.org). Apart from these specific plant databases, it is also possible to gain sequence information for these and other plant species from NCBI (http://www.ncbi.nlm.nih.gov/guide). As mentioned earlier, tools such as MEME (http://meme.sdsc.edu/meme/intro.html) (Bailey et al., 2006) and RSAT (http://rsat.ulb.ac.be) (Thomas-Chollier et al., 2008) can also be used for the identification of over-represented sequence motifs.

6.3 Visualisation and differential expression analysis tools

As demonstrated in Figure 3A, it is difficult or even impossible to gain insight into trends or expression changes without some form of grouping, such as clustering or differential expression analysis. Although the clustering seen in Figure 3 was carried out using Partek Genomics Suite (version 6.5), several other tools will generate gene expression clusters, including MeV (http://www.tm4.org/mev) (Saeed et al., 2006) and the Heatplus package in R, which are both open source freeware.

Similarly, for differential expression analysis, the online Cyber-T tool (http://cybert.microarray.ics.uci.edu) can be used to calculate p-values and false discovery rates. For the Arabidopsis germination study, Cyber-T software was used for differential expression analysis including the PPDE analysis for false discovery rate correction (Narsai et al., 2011). Differential expression analysis and false discovery rate correction can also be carried out using the LIMMA and Qvalue packages in R (Smyth, 2005). Notably, there are several methods that can be used for false discovery rate correction, including the Benjamini-Hochberg false discovery rate correction (Benjamini & Hochberg, 1995), which was used in the rice germination study (Howell et al., 2009). For Arabidopsis, rice and a number of other plant species, fold-changes generated from the differential expression analysis can also be used with Mapman software (Usadel et al., 2005) to visualise fold-changes of specific genes mapped onto an image of a single pathway or functional category in greater detail.
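The Benjamini-Hochberg correction mentioned above can be sketched in a few lines. This is a minimal, dependency-free version of the step-up adjustment, not the code used in the rice germination study; the function name and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four probe sets.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
# Probe sets passing a 5% false discovery rate threshold.
significant = [i for i, q in enumerate(adj) if q <= 0.05]
```

Genes whose adjusted value falls at or below the chosen threshold (0.05 in this sketch) would be called differentially expressed at that false discovery rate.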

6.4 Functional over-representation analysis

Once gene sets have been identified based on clustering and/or differential expression analysis, determining whether there is any relationship between these groupings and the function of the encoded proteins is useful to gain insight into function and regulation. For Arabidopsis, rice and a number of other plant species, fold-changes generated from the differential expression analysis can be used with the Pageman software (Usadel et al., 2006) to carry out an over-representation analysis. However, it is important to note that even if it is not possible to use Pageman (e.g. for non-plant species or plant species without completed genome sequences), the statistical principles used within Pageman can still be applied, i.e. calculation of z-scores based on the occurrence of a specific function within a sub-set in comparison with the occurrence of that function in the genome. An example of how z-scores were used (without Pageman) is shown in Table 2. In this case, we aimed to determine whether any of the GO functional categories were over-represented, specifically in terms of sub-cellular localisation (GO cellular component) (Narsai et al., 2011).
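One way of computing such a z-score is sketched below. The exact formula used within Pageman or for Table 2 is not given in this chapter, so the sketch assumes the common normal approximation to the binomial, in which the genome-wide frequency of a category gives the expected count in the sub-set. The function name and the counts are hypothetical.

```python
from math import sqrt


def overrepresentation_z(subset_hits, subset_size, genome_hits, genome_size):
    """z-score for the occurrence of a category in a sub-set versus the genome.

    Assumes a binomial model with success probability equal to the
    genome-wide frequency of the category (an assumption of this sketch).
    """
    p = genome_hits / genome_size
    expected = subset_size * p
    sd = sqrt(subset_size * p * (1 - p))
    if sd == 0:
        raise ValueError("z-score is undefined when the variance is zero")
    return (subset_hits - expected) / sd


# Invented counts: 30 of 100 genes in a cluster carry a given GO cellular
# component annotation, compared with 1,000 of 10,000 genes in the genome.
z = overrepresentation_z(30, 100, 1000, 10000)
```

A large positive z-score indicates that the category occurs more often in the sub-set than expected from its genome-wide frequency, and a large negative value indicates under-representation.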

6.5 Other visualisation tools

In addition to tools that allow visualisation and analysis of data generated within a given experiment, a number of tools are also available that visualise expression levels for single or multiple genes in a collection of normalised publicly available datasets. Specifically, using the raw microarray data from GEO (http://www.ncbi.nlm.nih.gov/geo) and ArrayExpress (http://www.ebi.ac.uk/arrayexpress), both the eFP browser (http://bar.utoronto.ca/welcome.htm) (Winter et al., 2007) and Genevestigator (https://www.genevestigator.com/gv) (Zimmermann et al., 2005) enable users to enter gene identifiers for the species of interest and view normalised expression levels over development and/or in response to stress.

6.6 Phenotype databases and other resources

Since the completion of the Arabidopsis genome, many studies have generated and examined loss-of-function mutants on a larger scale. For example, the number of genes whose loss of function results in an embryo-lethal phenotype is now over 400 and these genes can be searched at the SeedGenes Database (www.seedgenes.org). Although these are in an online database, it is important to note that a number of other studies have determined phenotypes following loss-of-function mutation analyses; however, given that these are not collated into an online database, they have to be searched individually by publication. For example, a recent study showed that the loss of function of an endosperm-specific gene for a putative xyloglucan endotransglycosylase/hydrolase resulted in earlier germination and suggested that the protein may have a specific role in the cell wall of the endosperm during germination (Endo et al., 2012). In addition to these types of single publication searches, other databases providing phenotype information for antisense/mutant plants in Arabidopsis include the Agrikola database (Hilson et al., 2004) and the Chloroplast 2010 project database (Ajjawi et al., 2010).

7 Next-generation transcriptomics

As mentioned in the introduction to transcriptomics above, early transcriptomic studies enabled the quantitative analysis of a single transcript species (e.g. using Northern blot analysis). Some time later, this was followed by dot-blot and high-throughput qRT-PCR before the completed genome sequences led to the development and success of microarrays. In the last decade, the advantage of microarrays has clearly been shown by the thousands of publications that have successfully gained information using this technology, which allows an almost genome-wide analysis of transcript abundance for known genes in the genome. However, as with all technologies, there are a number of shortcomings and pitfalls, including the fact that microarray probes are generally designed with a 3′ bias, which can influence accurate quantitation for specific genes. In addition, the probesets on microarrays are typically designed to cover most (but not all) of the genes in the genome. For example, the Ath1 microarray allows quantitation of ~21,000 of the ~31,000 genes in the Arabidopsis genome, indicating that approximately one third of genes cannot be quantitated using this microarray. While tiling arrays and other array types have improved on this, allowing quantitation for all ~31,000 genes in the genome, these arrays cannot accurately detect transcripts that are not annotated as genes on the genome sequence.

In the last 5 years, sequencing technologies have improved substantially, increasing the accuracy and cost-effectiveness of RNA sequencing. In comparison with microarrays, RNA sequencing is not limited by the same factors. Specifically, it is not restricted to the known genome sequence and allows quantitation of all transcripts, regardless of whether the gene has been annotated on the genome sequence. Furthermore, it is not limited to the genomes that have been sequenced, i.e. de novo transcriptomic studies can be done on any species. In addition, while a limited number of microarrays have been designed to allow quantitation of small RNAs in an mRNA pool, newer technologies in RNA sequencing also allow quantitative analysis of small RNAs in this way. Lastly, the analysis of DNA methylation by sequencing can be highly informative when analysed in parallel with RNA sequencing data. For example, a recent study compared the DNA methylome with the transcriptome in rice, using bisulfite sequencing and RNA sequencing, and identified relationships between epigenetic heritability and the expression patterns of specific genes (Chodavarapu et al., 2012). Given the improvements in cost-effectiveness and data analysis technologies and methodologies, RNA sequencing appears to be the next generation of transcriptomic research.

References

Ajjawi, I., Lu, Y., Savage, L. J., Bell, S. M., & Last, R. L. (2010). Large-scale reverse genetics in Arabidopsis: case studies from the Chloroplast 2010 Project. Plant Physiol, 152(2), 529-540. doi: 10.1104/pp.109.148494

Allison, D. B., Cui, X., Page, G. P., & Sabripour, M. (2006). Microarray data analysis: from disarray to consolidation and consensus. Nature reviews. Genetics, 7(1), 55-65. doi: 10.1038/nrg1749

An, Y. Q., & Lin, L. (2011). Transcriptional regulatory programs underlying barley germination and regulatory functions of Gibberellin and abscisic acid. BMC plant biology, 11, 105. doi: 10.1186/1471-2229-11-105

Arabidopsis Genome Initiative. (2000). Nature, 408(6814), 796-815.

Bailey, T. L., Williams, N., Misleh, C., & Li, W. W. (2006). MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res, 34(Web Server issue), W369-373.

Bateman, R. M., Hilton, J., & Rudall, P. J. (2006). Morphological and molecular phylogenetic context of the angiosperms: contrasting the 'top-down' and 'bottom-up' approaches used to infer the likely characteristics of the first flowers. J Exp Bot, 57(13), 3471-3503.

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B, 57, 289-300.

Bewley, J. D. (1997). Seed Germination and Dormancy. Plant Cell, 9(7), 1055-1066.

Carrie, C., Giraud, E., & Whelan, J. (2009). Protein transport in organelles: Dual targeting of proteins to mitochondria and chloroplasts. The FEBS journal, 276(5), 1187-1195. doi: 10.1111/j.1742-4658.2009.06876.x

Chiu, R. S., Nahal, H., Provart, N. J., & Gazzarrini, S. (2012). The role of the Arabidopsis FUSCA3 transcription factor during inhibition of seed germination at high temperature. BMC plant biology, 12, 15. doi: 10.1186/1471-2229-12-15

Chodavarapu, R. K., Feng, S., Ding, B., Simon, S. A., Lopez, D., Jia, Y., . . . Pellegrini, M. (2012). Transcriptome and methylome interactions in rice hybrids. Proceedings of the National Academy of Sciences of the United States of America, 109(30), 12040-12045. doi: 10.1073/pnas.1209297109

Davuluri, R. V., Sun, H., Palaniswamy, S. K., Matthews, N., Molina, C., Kurtz, M., & Grotewold, E. (2003). AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics, 4, 25. doi: 10.1186/1471-2105-4-25

Duncan, O., Taylor, N. L., Carrie, C., Eubel, H., Kubiszewski-Jakubiak, S., Zhang, B., . . . Whelan, J. (2011). Multiple lines of evidence localize signaling, morphology, and lipid biosynthesis machinery to the mitochondrial outer membrane of Arabidopsis. Plant physiology, 157(3), 1093-1113. doi: 10.1104/pp.111.183160

Endo, A., Tatematsu, K., Hanada, K., Duermeyer, L., Okamoto, M., Yonekura-Sakakibara, K., . . . Nambara, E. (2012). Tissue-specific transcriptome analysis reveals cell wall metabolism, flavonol biosynthesis and defense responses are activated in the endosperm of germinating Arabidopsis thaliana seeds. Plant & cell physiology, 53(1), 16-27. doi: 10.1093/pcp/pcr171

Fernie, A. R., Aharoni, A., Willmitzer, L., Stitt, M., Tohge, T., Kopka, J., . . . DeLuca, V. (2011). Recommendations for reporting metabolite data. [Letter]. The Plant cell, 23(7), 2477-2482. doi: 10.1105/tpc.111.086272

Fu, Q., Wang, B. C., Jin, X., Li, H. B., Han, P., Wei, K. H., . . . Zhu, Y. X. (2005). Proteomic analysis and extensive protein identification from dry, germinating Arabidopsis seeds and young seedlings. J Biochem Mol Biol, 38(6), 650-660.

Gallardo, K., Job, C., Groot, S. P., Puype, M., Demol, H., Vandekerckhove, J., & Job, D. (2002). Proteomics of Arabidopsis seed germination. A comparative study of wild-type and gibberellin-deficient seeds. Plant Physiol, 129(2), 823-837. doi: 10.1104/pp.002816

Galuschka, C., Schindler, M., Bulow, L., & Hehl, R. (2007). AthaMap web tools for the analysis and identification of co-regulated genes. Nucleic acids research, 35(Database issue), D857-862. doi: 10.1093/nar/gkl1006

Goff, S. A., Ricke, D., Lan, T. H., Presting, G., Wang, R., Dunn, M., . . . Briggs, S. (2002). A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science, 296(5565), 92-100. doi: 10.1126/science.1068275

Gray, J. (1985). The Microfossil Record of Early Land Plants: Advances in Understanding of Early Terrestrialization, 1970-1984. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences (1934-1990) 309(1138), 167-195.

Hansen, M., Friis, C., Bowra, S., Holm, P. B., & Vincze, E. (2009). A pathway-specific microarray analysis highlights the complex and co-ordinated transcriptional networks of the developing grain of field-grown barley. Journal of experimental botany, 60(1), 153-167. doi: 10.1093/jxb/ern270

Hilson, P., Allemeersch, J., Altmann, T., Aubourg, S., Avon, A., Beynon, J., . . . Small, I. (2004). Versatile gene-specific sequence tags for Arabidopsis functional genomics: transcript profiling and reverse genetics applications. Genome Res, 14(10B), 2176-2189. doi: 10.1101/gr.2544504

Holdsworth, M. J., Bentsink, L., & Soppe, W. J. (2008). Molecular networks regulating Arabidopsis seed maturation, after-ripening, dormancy and germination. New Phytol, 179(1), 33-54.

Howell, K. A., Millar, A. H., & Whelan, J. (2006). Ordered assembly of mitochondria during rice germination begins with pro-mitochondrial structures rich in components of the protein import apparatus. Plant Mol Biol, 60(2), 201-223.

Howell, K. A., Narsai, R., Carroll, A., Ivanova, A., Lohse, M., Usadel, B., . . . Whelan, J. (2009). Mapping metabolic and transcript temporal switches during germination in rice highlights specific transcription factors and the role of RNA instability in the germination process. Plant physiology, 149(2), 961-980. doi: 10.1104/pp.108.129874

Jain, M., Nijhawan, A., Arora, R., Agarwal, P., Ray, S., Sharma, P., . . . Khurana, J. P. (2007). F-box proteins in rice. Genome-wide analysis, classification, temporal and spatial gene expression during panicle and seed development, and regulation by light and abiotic stress. Plant Physiol, 143(4), 1467-1483.

Kayala, M. A., & Baldi, P. (2012). Cyber-T web server: differential analysis of high-throughput data. Nucleic acids research, 40(Web Server issue), W553-559. doi: 10.1093/nar/gks420

Kumar, M. B., Martin, R. C., & Nonogaki, H. (2011). Isolation of microRNAs that regulate seed dormancy and germination. Methods in molecular biology, 773, 199-213. doi: 10.1007/978-1-61779-231-1_13

Kuzyk, M. A., Smith, D., Yang, J., Cross, T. J., Jackson, A. M., Hardie, D. B., . . . Borchers, C. H. (2009). Multiple reaction monitoring-based, multiplexed, absolute quantitation of 45 proteins in human plasma. Molecular & cellular proteomics : MCP, 8(8), 1860-1877. doi: 10.1074/mcp.M800540-MCP200

Lanfrancotti, A., Bertuccini, L., Silvestrini, F., & Alano, P. (2007). Plasmodium falciparum: mRNA co-expression and protein co-localisation of two gene products upregulated in early gametocytes. Experimental parasitology, 116(4), 497-503. doi: 10.1016/j.exppara.2007.01.021

Law, S. R., Narsai, R., Taylor, N. L., Delannoy, E., Carrie, C., Giraud, E., . . . Whelan, J. (2012). Nucleotide and RNA metabolism prime translational initiation in the earliest events of mitochondrial biogenesis during Arabidopsis germination. Plant physiology, 158(4), 1610-1627. doi: 10.1104/pp.111.192351

Lu, G., Wu, X., Chen, B., Gao, G., Xu, K., & Li, X. (2006). Detection of DNA methylation changes during seed germination in rapeseed (Brassica napus). Chinese Science Bulletin, 51(2), 182-190.

Ma, L., Chen, C., Liu, X., Jiao, Y., Su, N., Li, L., . . . Wang, J. (2005). A microarray analysis of the rice transcriptome and its comparison to Arabidopsis. Genome Res, 15(9), 1274-1283.

Maia, J., Dekkers, B. J., Provart, N. J., Ligterink, W., & Hilhorst, H. W. (2011). The re-establishment of desiccation tolerance in germinated Arabidopsis thaliana seeds and its associated transcriptome. PLoS ONE, 6(12), e29123. doi: 10.1371/journal.pone.0029123

Meinke, D., Muralla, R., Sweeney, C., & Dickerman, A. (2008). Identifying essential genes in Arabidopsis thaliana. Trends Plant Sci, 13(9), 483-491. doi: 10.1016/j.tplants.2008.06.003

Merchant, S. S., Prochnik, S. E., Vallon, O., Harris, E. H., Karpowicz, S. J., Witman, G. B., . . . Grossman, A. R. (2007). The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science, 318(5848), 245-250. doi: 10.1126/science.1143609

Miller, C., Waddell, K., & Tang, N. (2010). High Throughput Protein Quantitation using MRM Viewer Software and Dynamic MRM on a Triple Quadruple Mass Spectrometer. Journal of Biomolecular Techniques, 21(3), S60.

Nakabayashi, K., Okamoto, M., Koshiba, T., Kamiya, Y., & Nambara, E. (2005). Genome-wide profiling of stored mRNA in Arabidopsis thaliana seed germination: epigenetic and genetic regulation of transcription in seed. Plant J, 41(5), 697-709.

Narsai, R., Howell, K. A., Carroll, A., Ivanova, A., Millar, A. H., & Whelan, J. (2009). Defining core metabolic and transcriptomic responses to oxygen availability in rice embryos and young seedlings. Plant physiology, 151(1), 306-322. doi: 10.1104/pp.109.142026

Narsai, R., Howell, K. A., Millar, A. H., O'Toole, N., Small, I., & Whelan, J. (2007). Genome-wide analysis of mRNA decay rates and their determinants in Arabidopsis thaliana. The Plant cell, 19(11), 3418-3436. doi: 10.1105/tpc.107.055046

Narsai, R., Law, S. R., Carrie, C., Xu, L., & Whelan, J. (2011). In-depth temporal transcriptome profiling reveals a crucial developmental switch with roles for RNA processing and organelle metabolism that are essential for germination in Arabidopsis. Plant physiology, 157(3), 1342-1362. doi: 10.1104/pp.111.183129

Narsai, R., Rocha, M., Geigenberger, P., Whelan, J., & van Dongen, J. T. (2011). Comparative analysis between plant species of transcriptional and metabolic responses to hypoxia. The New phytologist, 190(2), 472-487. doi: 10.1111/j.1469-8137.2010.03589.x

Nettleton, D. (2006). A discussion of statistical methods for design and analysis of microarray experiments for plant scientists. The Plant cell, 18(9), 2112-2121. doi: 10.1105/tpc.106.041616

Ohme-Takagi, M., Taylor, C. B., Newman, T. C., & Green, P. J. (1993). The effect of sequences with high AU content on mRNA stability in tobacco. Proc Natl Acad Sci U S A, 90(24), 11811-11815.

Pagnussat, G. C., Yu, H. J., Ngo, Q. A., Rajani, S., Mayalagu, S., Johnson, C. S., . . . Sundaresan, V. (2005). Genetic and molecular identification of genes required for female gametophyte development and function in Arabidopsis. Development, 132(3), 603-614. doi: 10.1242/dev.01595

Rajjou, L., Gallardo, K., Debeaujon, I., Vandekerckhove, J., Job, C., & Job, D. (2004). The effect of alpha-amanitin on the Arabidopsis seed proteome highlights the distinct roles of stored and neosynthesized mRNAs during germination. Plant Physiol, 134(4), 1598-1613.

Rensing, S. A., Lang, D., Zimmer, A. D., Terry, A., Salamov, A., Shapiro, H., . . . Boore, J. L. (2008). The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science, 319(5859), 64-69. doi: 10.1126/science.1150646

Rueda-Romero, P., Barrero-Sicilia, C., Gomez-Cadenas, A., Carbonero, P., & Onate-Sanchez, L. (2012). Arabidopsis thaliana DOF6 negatively affects germination in non-after-ripened seeds and interacts with TCP14. Journal of experimental botany, 63(5), 1937-1949. doi: 10.1093/jxb/err388

Saeed, A. I., Bhagabati, N. K., Braisted, J. C., Liang, W., Sharov, V., Howe, E. A., . . . Quackenbush, J. (2006). TM4 microarray software suite. [Review]. Methods in enzymology, 411, 134-193. doi: 10.1016/S0076-6879(06)11009-5

Schmid, M., Davison, T. S., Henz, S. R., Pape, U. J., Demar, M., Vingron, M., . . . Lohmann, J. U. (2005). A gene expression map of Arabidopsis thaliana development. Nature genetics, 37(5), 501-506. doi: 10.1038/ng1543

Schmutz, J., Cannon, S. B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., . . . Jackson, S. A. (2010). Genome sequence of the palaeopolyploid soybean. Nature, 463(7278), 178-183. doi: 10.1038/nature08670

Schnable, P. S., Ware, D., Fulton, R. S., Stein, J. C., Wei, F., Pasternak, S., . . . Wilson, R. K. (2009). The B73 maize genome: complexity, diversity, and dynamics. Science, 326(5956), 1112-1115. doi: 10.1126/science.1178534

Shannon, W., Culverhouse, R., & Duncan, J. (2003). Analyzing microarray data using cluster analysis. [Review]. Pharmacogenomics, 4(1), 41-52. doi: 10.1517/phgs.4.1.41.22581

Smyth, G. K. (2005). Limma: Linear Models for microarray data. In R. Gentleman, V. Carey, S. Dudoit, R. Irizarry & W. Huber (Eds.), Bioinformatics and Computational Biology Solutions using R and Bioconductor. New York: Springer.

Sreenivasulu, N., Usadel, B., Winter, A., Radchuk, V., Scholz, U., Stein, N., . . . Wobus, U. (2008). Barley grain maturation and germination: metabolic pathway and regulatory network commonalities and differences highlighted by new MapMan/PageMan profiling tools. Plant Physiol, 146(4), 1738-1758.

Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., . . . Golub, T. R. (1999). Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci U S A, 96(6), 2907-2912.

Thomas-Chollier, M., Sand, O., Turatsinze, J. V., Janky, R., Defrance, M., Vervisch, E., . . . van Helden, J. (2008). RSAT: regulatory sequence analysis tools. Nucleic Acids Res, 36(Web Server issue), W119-127.

Tohge, T., Mettler, T., Arrivault, S., Carroll, A. J., Stitt, M., & Fernie, A. R. (2011). From models to crop species: caveats and solutions for translational metabolomics. Frontiers in plant science, 2, 61. doi: 10.3389/fpls.2011.00061

Tuskan, G. A., Difazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., . . . Rokhsar, D. (2006). The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 313(5793), 1596-1604. doi: 10.1126/science.1128691

Usadel, B., Nagel, A., Steinhauser, D., Gibon, Y., Blasing, O. E., Redestig, H., . . . Stitt, M. (2006). PageMan: an interactive ontology tool to generate, display, and annotate overview graphs for profiling experiments. BMC Bioinformatics, 7, 535.

Usadel, B., Nagel, A., Thimm, O., Redestig, H., Blaesing, O. E., Palacios-Rojas, N., . . . Stitt, M. (2005). Extension of the visualization tool MapMan to allow statistical analysis of arrays, display of corresponding genes, and comparison with known responses. Plant Physiol, 138(3), 1195-1204.

Wang, K., Wang, Z., Li, F., Ye, W., Wang, J., Song, G., . . . Yu, S. (2012). The draft genome of a diploid cotton Gossypium raimondii. Nature genetics. doi: 10.1038/ng.2371

Winter, D., Vinegar, B., Nahal, H., Ammar, R., Wilson, G. V., & Provart, N. J. (2007). An "electronic fluorescent pictograph" browser for exploring and analyzing large-scale biological data sets. PLoS ONE, 2(1), e718.

Zimmermann, P., Hennig, L., & Gruissem, W. (2005). Gene-expression analysis and network discovery using Genevestigator. Trends in plant science, 10(9), 407-409. doi: 10.1016/j.tplants.2005.07.003