cell reports resource - yale school of medicine

36
Cell Reports Resource Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the Mouse Neocortex Sofia Fertuzinhos, 1,4 Mingfeng Li, 1,4 Yuka Imamura Kawasawa, 1,2 Vedrana Ivic, 1 Daniel Franjic, 1 Darshani Singh, 1 Michael Crair, 1,3 and Nenad Sestan 1,3, * 1 Department of Neurobiology, Yale School of Medicine, New Haven, CT 06510, USA 2 Department of Pharmacology, Penn State College of Medicine, Hershey, PA 17033, USA 3 Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA 4 These authors contributed equally to this work *Correspondence: [email protected] http://dx.doi.org/10.1016/j.celrep.2014.01.036 This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-No Derivative Works License, which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited. SUMMARY The hallmark of the cerebral neocortex is its organi- zation into six layers, each containing a character- istic set of cell types and synaptic connections. The transcriptional events involved in laminar develop- ment and function still remain elusive. Here, we employed deep sequencing of mRNA and small RNA species to gain insights into transcriptional dif- ferences among layers and their temporal dynamics during postnatal development of the mouse primary somatosensory neocortex. We identify a number of coding and noncoding transcripts with specific spatiotemporal expression and splicing patterns. We also identify signature trajectories and gene coexpression networks associated with distinct biological processes and transcriptional overlap between these processes. Finally, we provide data that allow the study of potential miRNA and mRNA interactions. Overall, this study provides an inte- grated view of the laminar and temporal expression dynamics of coding and noncoding transcripts in the mouse neocortex and a resource for studies of neurodevelopment and transcriptome. INTRODUCTION The cerebral neocortex (NCX) is stereotypically organized into six distinct layers. Both glutamatergic excitatory projection (aka pyramidal) neurons and GABAergic inhibitory neurons display laminar variations in their morphological, molecular, and functional properties (DeFelipe et al., 2013; Kwan et al., 2012; Leone et al., 2008; Molyneaux et al., 2007). The proper development and function of cortical neurons depend on glial cells and the neurovascular system, whose distribution also ap- pears to vary across layers and areas (Fonta and Imbert, 2002). The formation of layers occurs progressively and requires the orchestrated execution of a series of developmental events. These events include the migration of young neurons into appro- priate positions within the emerging NCX and development of specific neuronal dendritic arbors and axonal projections (Kwan et al., 2012; Leone et al., 2008; Molyneaux et al., 2007), generation and maturation of glial cells (Rowitch and Kriegstein, 2010), development of the neurovascular system (Tam and Watts, 2010), emergence of early spontaneous activity and experience-driven activity (Kilb et al., 2011), and synaptogenesis (West and Greenberg, 2011) and circuit refinement (Espinosa and Stryker, 2012). The formation of cortical layers occurs in an inside out manner, with the deep layers (L) 5 and 6 (infragra- nular layers [IgLs]) being formed first, followed by L4 (granular layer due to the presence of small-sized stellate and pyramidal neurons), and finally the superficial L2/3 (supragranular layers [SgLs]). Studies of transcriptional events involved in the development and function of neocortical layers have been greatly advanced with the emergence of high-throughput transcriptome-profiling techniques. A number of studies have analyzed the transcrip- tome of different mouse neocortical layers and/or areas at spe- cific developmental time points (Arlotta et al., 2005; Belgard et al., 2011; Chen et al., 2005; Dillman et al., 2013; Han et al., 2011; Lein et al., 2007; Lyckman et al., 2008; Rossner et al., 2006; Sugino et al., 2006). Also, these studies have largely focused on the expression of protein-coding mRNA, providing limited information on noncoding RNAs (ncRNAs), which play an important role in neural development and function (McNeill and Van Vactor, 2012). In an attempt to profile the spatiotem- poral transcriptome dynamics of both coding and ncRNA tran- scripts, we deep sequenced mRNA (mRNA-seq hereafter) and small ncRNA (smRNA-seq hereafter) transcripts from the IgL, L4, and SgL of the mouse somatosensory cortex (S1C hereafter) across multiple early postnatal time points and adult. After testing different RNA collection methods such as laser capture microdissection and fluorescence-activated cell sorting (data not shown), we opted to microdissect distinct cortical layers from tissue sections of the Dcdc2a-Gfp transgenic mouse (Heintz, 2004), which expressed GFP in L4 of the S1C. This approach allowed us to distinguish IgL, L4, and SgL across different time points, sequence transcripts expressed in all Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors 1 Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the Mouse Neocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

Upload: others

Post on 21-Jul-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Cell Reports Resource - Yale School of Medicine

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

Cell Reports

Resource

Laminar and Temporal Expression Dynamics of Codingand Noncoding RNAs in the Mouse NeocortexSofia Fertuzinhos,1,4 Mingfeng Li,1,4 Yuka Imamura Kawasawa,1,2 Vedrana Ivic,1 Daniel Franjic,1 Darshani Singh,1

Michael Crair,1,3 and Nenad �Sestan1,3,*1Department of Neurobiology, Yale School of Medicine, New Haven, CT 06510, USA2Department of Pharmacology, Penn State College of Medicine, Hershey, PA 17033, USA3Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA4These authors contributed equally to this work*Correspondence: [email protected]

http://dx.doi.org/10.1016/j.celrep.2014.01.036

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-No Derivative WorksLicense, which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are

credited.

SUMMARY

The hallmark of the cerebral neocortex is its organi-zation into six layers, each containing a character-istic set of cell types and synaptic connections. Thetranscriptional events involved in laminar develop-ment and function still remain elusive. Here, weemployed deep sequencing of mRNA and smallRNA species to gain insights into transcriptional dif-ferences among layers and their temporal dynamicsduring postnatal development of the mouse primarysomatosensory neocortex. We identify a numberof coding and noncoding transcripts with specificspatiotemporal expression and splicing patterns.We also identify signature trajectories and genecoexpression networks associated with distinctbiological processes and transcriptional overlapbetween these processes. Finally, we provide datathat allow the study of potential miRNA and mRNAinteractions. Overall, this study provides an inte-grated view of the laminar and temporal expressiondynamics of coding and noncoding transcripts inthe mouse neocortex and a resource for studies ofneurodevelopment and transcriptome.

INTRODUCTION

The cerebral neocortex (NCX) is stereotypically organized into

six distinct layers. Both glutamatergic excitatory projection

(aka pyramidal) neurons and GABAergic inhibitory neurons

display laminar variations in their morphological, molecular,

and functional properties (DeFelipe et al., 2013; Kwan et al.,

2012; Leone et al., 2008; Molyneaux et al., 2007). The proper

development and function of cortical neurons depend on glial

cells and the neurovascular system, whose distribution also ap-

pears to vary across layers and areas (Fonta and Imbert, 2002).

The formation of layers occurs progressively and requires the

orchestrated execution of a series of developmental events.

These events include the migration of young neurons into appro-

priate positions within the emerging NCX and development

of specific neuronal dendritic arbors and axonal projections

(Kwan et al., 2012; Leone et al., 2008; Molyneaux et al., 2007),

generation and maturation of glial cells (Rowitch and Kriegstein,

2010), development of the neurovascular system (Tam and

Watts, 2010), emergence of early spontaneous activity and

experience-driven activity (Kilb et al., 2011), and synaptogenesis

(West and Greenberg, 2011) and circuit refinement (Espinosa

and Stryker, 2012). The formation of cortical layers occurs in

an inside out manner, with the deep layers (L) 5 and 6 (infragra-

nular layers [IgLs]) being formed first, followed by L4 (granular

layer due to the presence of small-sized stellate and pyramidal

neurons), and finally the superficial L2/3 (supragranular layers

[SgLs]).

Studies of transcriptional events involved in the development

and function of neocortical layers have been greatly advanced

with the emergence of high-throughput transcriptome-profiling

techniques. A number of studies have analyzed the transcrip-

tome of different mouse neocortical layers and/or areas at spe-

cific developmental time points (Arlotta et al., 2005; Belgard

et al., 2011; Chen et al., 2005; Dillman et al., 2013; Han et al.,

2011; Lein et al., 2007; Lyckman et al., 2008; Rossner et al.,

2006; Sugino et al., 2006). Also, these studies have largely

focused on the expression of protein-coding mRNA, providing

limited information on noncoding RNAs (ncRNAs), which play

an important role in neural development and function (McNeill

and Van Vactor, 2012). In an attempt to profile the spatiotem-

poral transcriptome dynamics of both coding and ncRNA tran-

scripts, we deep sequenced mRNA (mRNA-seq hereafter) and

small ncRNA (smRNA-seq hereafter) transcripts from the IgL,

L4, and SgL of the mouse somatosensory cortex (S1C hereafter)

across multiple early postnatal time points and adult. After

testing different RNA collection methods such as laser capture

microdissection and fluorescence-activated cell sorting (data

not shown), we opted to microdissect distinct cortical layers

from tissue sections of the Dcdc2a-Gfp transgenic mouse

(Heintz, 2004), which expressed GFP in L4 of the S1C. This

approach allowed us to distinguish IgL, L4, and SgL across

different time points, sequence transcripts expressed in all

Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors 1

Page 2: Cell Reports Resource - Yale School of Medicine

Figure 1. Study Design and Quality Control Measures(A) Representative sagittal tissue section of the Dcdc2a-Gfpmouse forebrain showingGfp expression in L4 of the primary S1C. Dashed lines outline the IgLs, L4,

and SgLs. HIP, hippocampus.

(B) qRT-PCR analysis of the expression of well-established layer-enriched genes. Error bars show the SEM.

(C) Gfp expression across layers

(D) Expression of laminar markers depicted in a heatmap of the log ratio RPKM data.

(E) Box plots representing uniquely mapped reads for either miRNA or mRNA transcriptomes in each sample.

(F) Violin plots representing the distribution of the transcribed ratios of the genome (black) and the transcriptome (gray). chr, chromosome; Fchr, female chro-

mosome; Mchr, male chromosome; chrM, mitochondrial chromosome.

(G) Violin plots representing percent distribution of smRNA reads across different length of reads.

See also Figure S1 and Table S1.

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

neural and nonneural cell types present in these layers in vivo,

and analyze transcripts present not only in cell somata but

also those localized to neuronal dendrites and axons (Hengst

and Jaffrey, 2007). Furthermore, this approach does not expose

cells to substantial chemical and mechanical manipulations

or stress, which can distort RNA integrity and transcriptional

states (Okaty et al., 2011). Our initial data analysis provides

functionally relevant insights into transcriptome dynamics of

NCX layers and their relationship to specific neurodevelopmental

processes.

RESULTS

Study Design, Data Generation, and Quality AssessmentIn order to obtain a global and unbiased view of the transcrip-

tional dynamics in the postnatal NCX, we analyzed the transcrip-

tome of the IgL, L4, and SgL from the S1C of mouse brain at

2 Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors

postnatal day 4 (P4) P6, P8, P10, P14, and P180 (adult). All

experiments using animals were carried out in accordance with

a protocol approved by Yale University’s Committee on Animal

Research and NIH guidelines. To delineate the layers, we used

the Dcdc2a-Gfp reporter mouse that expressed GFP selectively

in L4 of the S1C starting from around P2 (Figure 1A). We devel-

oped a microdissection protocol that lasted less than 2 hr and

resulted in high yield and quality of RNA (RNA integrity number

greater than eight) (Supplemental Experimental Procedures;

Table S1A).

We extracted total RNA from laminar samples microdissected

from twomouse brains (onemale and one female) per time point,

for a total of 12 mice and 36 samples (Table S1A). We analyzed

the expression of several known layer-specific markers by

quantitative real-time PCR to verify the accuracy of our laminar

microdissection (Figure 1B). The mRNA-seq and smRNA-seq

libraries, containing spike-in RNAs to tag samples and assess

Page 3: Cell Reports Resource - Yale School of Medicine

Figure 2. Spatiotemporal Dynamics of

Mouse Neocortical Transcriptome

(A and B) PCAs of mRNA (A) and smRNA (B)

transcriptomes.

(C and D) Venn diagrams representing the number

of DEX protein-codingmRNAs (C) andmiRNAs (D).

See also Figures S2 and S3 and Table S2.

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

the quality of sequencing, were prepared according to manufac-

turer’s instructions (Table S1B). On average, we observed less

than 2% of mismatches per read, indicative of high-quality

sequenced reads (Figure S1A). Because we used a GFP reporter

mouse, we mapped the mRNA-seq reads to the Gfp sequence

and confirmed that L4 samples had consistently higher RPM

(reads per million sequenced reads) values (Figure 1C). In

addition, a list of known and well-characterized layer-specific

marker genes was used to verify the identity of the samples after

mRNA-seq (Figures 1D, S1B, and S1C).

We obtained 611 million and 143million single-end reads from

the mRNA-seq and smRNA-seq samples, respectively. We

found that approximately 431 million reads from the mRNA-

seq libraries uniquely matched to mouse reference genome,

i.e., an average of >10 million reads per sample, with no partic-

ular laminar bias (Figures 1E and S1D; Table S1A). On average,

only 3% of each autosomal chromosome was transcribed,

slightly lower in sex chromosomes, whereas most of the mito-

chondrial chromosome was transcribed (�70%, Figure 1F). As

expected, all female samples had 0% hits on the reference

genome for the Y chromosome. Next, we determined the

number of exons of known protein-coding genes matched by

uniquely mapped reads (i.e., the fraction of exons transcribed).

On average, the ratio of transcribed exons in annotated pro-

tein-coding genes was �50% for the autosomal chromosomes,

�38% for the X chromosome, �35% for the Y chromosome,

and �70% of the mitochondrial chromosome (Figure 1F). The

transcribed exons occupied the majority of mRNA-seq reads

(�88.6%), and only �4.1% of reads were within intronic regions

of known protein-coding genes (Figure S1E). Therefore, we

found that 12,729 of protein-coding genes were reliably ex-

pressed (reads per kilobase of transcript per million mapped

reads [RPKM] R1 in at least two samples) in any layer or time

Cell Reports 6, 1

point analyzed. The remaining �7.3% of

reads aligned within intergenic regions

(Figure S1E), suggesting the expres-

sion of novel transcripts including long

intergenic noncoding RNAs (lincRNAs),

a potentially interesting discovery in light

of recent work on the possible regulatory

function of lincRNA (Mercer and Mattick,

2013).

As for smRNA-seq samples, the length

of the 30 adaptor-clipped reads was

clearly enriched for the microRNA

(miRNA) length (i.e., 22 nt) (Figure 1G).

Approximately 50 million high-quality

readswereuniquelymapped (FigureS1F),

i.e., an average of >1 million reads per

sample (Figure 1E; Table S1A). We found that 436 miRNAs

were reliably expressed (reads count were ten or more in at

least two samples). Among these, >80% of reads matched

the top ten reliably expressed miRNAs, which belong to eight

families (i.e., let-7, mir-25, mir-125, mir-28, mir-151, mir-127,

mir-181, and mir-486) (Figure S1G).

To assess similarities and differences among samples, we

performed principal component analysis (PCA) for both mRNA-

seq and smRNA-seq samples. The results demonstrate that

mRNA-seq samples cluster first according to the age of the

mouse (PC1, 54.98%) and second to their laminar location

(PC2, 11.76%) (Figures 2A and S2A–S2D). In contrast, the con-

tribution of age and laminar location to smRNA-seq samples

was smaller (Figures 2B and S2E–S2H).

We also performed hierarchical cluster analysis of mRNA-seq

samples and found that samples from P6–P10 mice cluster

together, whereas P4 samples cluster alone in one extreme,

and P14 and adult samples cluster together in the other extreme

(Figure S3A). These results suggest that P14 samples are mole-

cularly more similar to adults than to any other early postnatal

samples, whereas the molecular signature of P4 samples ap-

peared to be distinct from P6–P10 and P14–adult mice. We also

found that transcriptome differs more prominently across time

and layers than it does between males and females (Figure S2).

Spatiotemporal Expression of Coding RNAs andmiRNAsTo identify differentially expressed (DEX) protein-coding genes

andmiRNAs, we used the R package DESeq (Anders and Huber,

2010). DEX genes were split into three categories according

to their differential expression across layers (spatial DEX

[sDEX]), age (temporal DEX [tDEX]), or both (spatiotemporal

DEX [stDEX]). For these analyses, we only considered reliably

expressed protein-coding genes and miRNA (Figures S3B and

–13, March 13, 2014 ª2014 The Authors 3

Page 4: Cell Reports Resource - Yale School of Medicine

(legend on next page)

4 Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

Page 5: Cell Reports Resource - Yale School of Medicine

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

S3C). Additionally, only transcripts found to have a false dis-

covery rate (FDR) <0.01 were considered as significant DEX

transcripts.

Of all reliably expressed protein-coding genes (12,729),�10%

(1,321) were tDEX, �5% (662) were sDEX, and �8% (1,051)

were stDEX (Figure 2C; Table S2A). Our analysis of the reliably

expressed miRNAs (436) revealed that most were tDEX (86,

�20%), whereas only �1% were sDEX, and �3% were stDEX

(Figure 2D; Table S2B). These results are in agreement with the

PCA, which indicated that time had more influence on miRNA

expression than location within NCX.

Gene function enrichment analysis of the protein-coding

genes that were strictly tDEX revealed that these genes are

highly associated with categories related to the cell membrane,

synapses, and cell junctions and are mostly involved in ion

channel activity or potassium ion transport (Bonferroni adjusted,

p < 0.01) (Table S3A) (Huang et al., 2009). On the other hand,

protein-coding genes that were solely spatially regulated are

mostly associated with categories related to the regulation of

cell and vascular development, as well as neuronal differentia-

tion (Bonferroni adjusted, p < 0.0001) (Table S3B). Protein-cod-

ing genes that were both spatially and temporally DEX are most

associated with categories related to the development of pro-

jection neurons, synapses, and cell-cell signaling (Bonferroni

adjusted, p < 0.00001) (Table S3C). These analyses suggest

that we can use these data to explore molecular correlates of

developmental events occurring during laminar maturation.

Spatiotemporal Alternative SplicingAlternative splicing (AS) is a process by which one precursor

mRNA (pre-mRNA) can give rise to more than one distinct

mature mRNA via combination of alternative exon usage (Nilsen

and Graveley, 2010). We analyzed five basic splicing modalities:

cassette exon(s), mutually exclusive exon(s), alternative donors

or acceptors, alternative first or last exon, and intron retention

(Figures 3A, S3D, and S3E; Table S4). Similar to DEX analysis,

we divided the differentially alternative splicing (DAS) events

into three categories: spatial (sDAS), temporal (tDAS), and

spatiotemporal (stDAS). We found that tDAS events were more

abundant than sDAS or stDAS events (Figure 3A). Additionally,

DAS events were split into known and newly identified groups

(Figure 3A; Table S4). It is noteworthy that the apparent enrich-

ment of newly identified DAS of alternative donors/acceptors

and alternative first exon modalities may be skewed by the

existence of more reads matching the 30 end of transcripts

than the 50 end, as well as by reads that cannot distinguish

introns or adjacent exons.

mRNA-seq can reliably identify cassette exon events. As an

example, we confirmed a predicted cassette exon of Dlg2 (aka

Figure 3. Spatiotemporal Dynamics of AS Events

(A) Numbers of known (dark blue) and newly identified (red) splicing events.

(B) Reads coverage in the Dlg2 gene region correspondent to exons present in

the temporal coverage of mRNA-seq reads mapped to exon 9. Black bars/boxe

depict location of exon-specific PCR primers.

(C and D) Exon-specific PCR of the cassette exons ‘‘X-9-Y’’ in themouse IgL (C) an

are indicated as ‘‘a’’ or ‘‘b.’’

See also Figure S3 and Table S4.

Psd-93) gene and found that it was temporally regulated (Figures

3B and 3C). There are three mouse RefSeq RNAs registered for

mouse Dlg2 (isoform 1–3). During the first week of postnatal

development, we detected reads that span an exon common

toDlg2 isoforms 1 and 3 (exons 17 and 5, respectively), hereafter

referred to as exon X, to exon 9 present in Dlg2 isoform 2. We

also detected reads that span exon 9 to another exon common

to Dlg2 isoforms 1, 2, and 3 (exons 18, 10, and 6, respectively),

hereafter referred to as exon Y. These observations suggest

that the exons ‘‘X-9-Y’’ are being coexpressed in the same tran-

script. Interestingly, around P6, we can also detect a junction

that connects exons X and Y, which become the main junction

present at P14 and the only junction expressed in adulthood.

We confirmed this observation by PCR analysis and observed

that the cassette exons ‘‘X-9-Y’’ and ‘‘X-Y’’ have inverse ex-

pression profiles with the first peaking at P6 and fading out

after and the second appearing around P6 and increasing into

adulthood (Figure 3D).

In humans, the orthologous ‘‘X-9-Y’’ cassette exons are

described inDLG2 gene isoforms 2 and 4. However, these exons

are not included in the RefSeq RNA, and thus, their isoform

expression was not reported in previous exon array studies of

the developing human brain (Johnson et al., 2009; Kang et al.,

2011; Pletikos et al., 2014). We performed PCR analysis in

human S1C of equivalent developmental periods (http://www.

translatingtime.net; Table S1C). We observed that the human

‘‘X-9-Y’’ versus ‘‘X-Y’’ cassette exons were expressed in similar

pattern to that detected in mouse tissue samples (Figure 3D),

indicating that this is a conserved developmental variant.

DLG2 encodes a postsynaptic density protein known as PSD-

93 and is a member of the membrane-associated guanylate

kinase (MAGUK) family (Hough et al., 1997). The Havana project

has predicted, based on expressed sequence tags, the exis-

tence of a processed transcript containing the cassette exon

‘‘X-9-Y.’’ Our data provide direct evidence of the expression

of the cassette exon ‘‘X-9-Y.’’ Because we do not know the

entire structure of the mouse transcript containing the exon

cassette ‘‘X-9-Y’’ and resulting changes in the protein se-

quence, it is difficult to predict the functional consequence of

its expression. However, it is noteworthy that the cassette

exon ‘‘X-9-Y’’ matches a genomic region of Dlg2 isoform 1

that sits between the hook and GK protein domains that are

thought to be involved in defining the subcellular localization

of MAGUK proteins (Hough et al., 1997), as well as their ability

to bind other proteins (Brenman et al., 1998; Paarmann et al.,

2002). Moreover, a recent report described a similar AS

event in Dlg4 (aka Psd-95) that seems to be important for

the regulation of neuronal synapse maturation (Zheng et al.,

2012). Together, these findings illuminate the spatiotemporal

X and Y (isoforms 1 and 3) and 9 and Y (isoform 2). The yellow box highlights

s underneath the exonic read distribution indicate exon junctions. Red arrows

d human S1C of equivalent developmental time points (D). Biological replicates

Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors 5

Page 6: Cell Reports Resource - Yale School of Medicine

Figure 4. Weighted Gene Coexpression Networks

(A) Heatmap matrix showing modular eigengenes across ages for each layer.

(B) Pairwise Pearson correlations among modules.

(C) Developmental trajectories of modules M5 and M7 (top panels), and proportion of genes reported to be enriched in different neural cell types (bottom bar

graphs).

(D) Proportion of genes reported to be enriched in different neural cell types in spatial clusters II–IV.

(E) Proportion of genes reported to be enriched in immature and mature astrocytes in spatial clusters II–IV.

See also Figures S4 and S5 and Table S5.

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

complexity of AS and provide examples of likely functionally

relevant DAS events.

Developmental and Laminar Specificity of CoexpressionNetworksCoexpressed genes share spatiotemporal localization and often

participate in the same biological processes (Barabasi and

Oltvai, 2004; Oldham et al., 2008; Johnson et al., 2009; Kang

et al., 2011). We therefore performed weighted gene coex-

pression network analysis (WGCNA) to construct and visualize

modules of coexpressed genes across all samples (Langfelder

6 Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors

and Horvath, 2008). We identified 40 modules with distinct

spatiotemporal patterns that are characterized by the trajec-

tories of module’s eigengenes (Figure S4). M40 (n = 14 genes)

shows sex-biased expression of genes (i.e., Y chromosome-

enriched genes) and was thus excluded from downstream

analyses. Of the other 39 modules, we performed hierarchical

cluster analysis and found that they can be organized into five

clusters (Figure 4A). The first (cluster I) and fifth (cluster V) are

temporal clusters, in which the expression of clustered genes

is increasing and decreasing along time points, respectively, in

an anticorrelated fashion (Figure 4B). The remaining clusters

Page 7: Cell Reports Resource - Yale School of Medicine

Figure 5. Spatial Cluster II Inter- and Intramodular Connectivity

(A) Developmental trajectories (left column) and neural cell-type enrichment (right column) of cluster II modules.

(B) Connectivity of cluster II inter- and intramodular hub genes.

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

are mainly spatially defined (i.e., cluster II is IgL enriched, cluster

III is SgL enriched, and cluster IV is L4 enriched).

Functional Annotation of Coexpression NetworksGene function enrichment analysis revealed that a large fraction

of genes grouped in a given module is associated with a specific

biological process (Table S5). Interestingly, some biological

categories were significantly enriched (Bonferroni adjusted,

p < 0.01) in more than one module. For example, cluster III mod-

ules M5 andM7were associated with blood vessel development

(Table S5). However, in M5, we found genes involved in the

initial stages of angiogenesis (Notch1, Robo4, and Angpt2),

whereas in M7, we found genes involved in later stages of angio-

genesis (Tek and EphB4) (Chung and Ferrara, 2011).

Due to the presence of multiple cell types in our tissue sam-

ples, we wanted to investigate whether the genes clustered in

our 40 modules could be associated with the developmental

program of a specific neural cell type (i.e., neurons, astrocytes,

and oligodendrocytes). For this analysis, we intersected each

module with lists of genes enriched in each neural cell type

(Cahoy et al., 2008). We found that each module tends to be

enriched in a defined neural cell type (Figure S5). Interestingly,

modules M5 and M7 were enriched in astrocyte-enriched genes

(binomial test, p = 6.43 10�51 and p = 4.73 10�5, respectively),

which is in agreement with known interaction between astrocyte

and blood vessel development (Zerlin and Goldman, 1997)

(Figure 4C). Additionally, we found that cluster II (IgL enriched)

and cluster IV (L4 enriched) had a higher percentage of oligoden-

drocyte- and neuron-enriched genes than astrocyte-enriched

genes (chi-square test, p = 1 3 10�35 and p = 4.4 3 10�24,

respectively), which is in contrast to cluster III (SgL enriched)

(Figure 4D). Interestingly, the ratio of immature/mature astro-

cytes was significantly higher in cluster III compared to clusters

II and IV (chi-square test, p = 9.2 3 10�7, Figure 4E). Together,

these observations are consistent with the inside outside pro-

gressivematuration of neural cell types in developing neocortical

layers.

We also analyzed the degree of connectivity among the top ten

hub genes within each module (M1–M39) and among different

modules within each cluster (I–V) (Figures 5 and S6). The hub

Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors 7

Page 8: Cell Reports Resource - Yale School of Medicine

Figure 6. miRNA-mRNA Regulation Prediction

(A) Normalized expression trajectory of miR-92b to the expression profile of its putative target mRNAs (n = 119). Data are expressed as mean ± 95% confident

intervals for target mRNAs.

(B) Developmental trajectories of Foxp2 expression in different layers.

(C) miRNAs predicted to regulate Foxp2 mRNA.

See also Figure S7 and Table S6.

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

genes were selected based on gene expression profile proximity

to that of the respective eigengene, and the degree of connectiv-

ity was assessed based on the coexpression correlation coeffi-

ciency (Table S5). We found that modules enriched in genes

highly expressed in a specific neural cell type were more inter-

connected than they were connected with modules of genes

enriched in a different neural cell type (Figure 5A). This obser-

vation suggests that some modules are associated with the

development of a distinct neural cell type.

Interestingly, we also observed that intermodular connections

could happen between different cell types. For example, in

spatial cluster II, M13 genes were highly connected to four hub

genes in M27 (Slc25A22, Tmem163, Trbj2-3, and Sema5A) and

two hub genes in M31 (Prkce and EphA7) (Figure 5B). M13

showed functional enrichment in genes associated with behavior

and cell morphogenesis in neuronal differentiation, whereas

M27 shows functional enrichment in genes associated with

oligodendrocyte development (Table S5). Of the four genes in

M27 interconnected with the M13 genes, Sema5A was enriched

in the IgL and previously reported to be expressed in early post-

natal oligodendrocytes (Cahoy et al., 2008). However, consistent

with our finding of association of Sema5a with both infragranular

oligodendrocytic and neuronal modules (M27 and M13, res-

pectively), examination of its expression pattern in early post-

natal mice in the Allen Brain Atlas (http://developingmouse.

brain-map.org) revealed that Sema5a could be expressed by

both oligodendrocytes and neurons in the IgL. Furthermore,

analyses of whole-body and retinal oligodendrocyte conditional

Sema5A knockout mice indicated a key role of this semaphorin

in retinal axon outgrowth (Goldberg et al., 2004; Matsuoka

et al., 2011). Interestingly, in humans, SEMA5A is an autism

risk gene, and epigenetic studies show that it is highly methyl-

ated in nonneuronal cells of autistic patients (Shulha et al.,

2012; Weiss et al., 2009). Thus, Sema5a may play an important

role in linking different gene modules, and alterations in its

function may have pleiotropic effects in the developing NCX.

Furthermore, these and other related findings suggest that

there is considerable overlap in transcriptional programs and

8 Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors

possible crosstalk between developing neural cell types under-

going distinct biological processes.

miRNA-mRNA RelationshipsProtein-coding transcript levels can be posttranscriptionally

regulated by the activity of miRNA. These small RNA molecules

of 20–24 nt are known to silence mRNA translation through

sequence-specific targeting (Bartel, 2009). This prompted us

to analyze miRNA-mRNA relationships in the context of spatio-

temporal dynamics across neocortical layers and development

ages.

We first used TargetScan to find putative miRNA targets

based on sequence complementarity to mRNA 30 UTR of reliably

expressed protein-coding genes (Figure S7A) (Grimson et al.,

2007). Next, we compared the expression profile between

miRNA-mRNA pairs and selected the top anticorrelated pairs

with rank scores R80 because these most likely represent

downregulation of the target mRNAs (Table S6) (Lu et al.,

2011). To determine whether one miRNA could have enhanced

activity toward a particular spatial or temporal cluster or certain

neural cell type, we looked for the distribution of the targeted

mRNAs throughout the spatiotemporal clusters previously

analyzed and filtered them for neural cell-type enrichment.

Overall, there were no significant differences in enrichment of

targeted mRNAs clustered in temporal, spatial, or spatiotem-

poral modules (Figure S7B). However, we found that 32 miRNAs

preferentially targeted mRNAs that cluster in predominantly

temporal modules (i.e., clusters I and V), and 8 miRNAs targeted

preferentially mRNAs that cluster in predominantly spatial mod-

ules (i.e., clusters II–IV) (Figure S7C). We focused our attention

on the putative targets of the most reliably expressed miRNAs

to increase our confidence in any possible miRNA-mRNA regu-

latory interaction. One of these miRNAs is miR-92b, whose 119

anticorrelated mRNA targets were enriched in the IgL cluster

(binomial test, FDR < 0.01) (Figures 6A and S7C) and in neurons

and oligodendrocytes (binomial test, p = 0.01 and p = 0.002,

respectively; Figure 6C). There was a strong anticorrelation

between the normalized expression trajectory of miR-92b and

Page 9: Cell Reports Resource - Yale School of Medicine

Figure 7. Transcriptional Correlates of Neurodevelopmental Events

(A) Developmental trajectories of genes associated with selected major neurodevelopmental processes in different laminar compartments.

(B) Schematic of changes in dendritic morphology of L4 SSCs (dark green) and the formation of L4 barrels (light green).

(C) Expression trajectories of genes correlated with developmental changes in dendritic morphology of L4 (green) SSCs.

See also Table S7.

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

the expression profile of its putative target mRNAs (Pearson

correlation, R = �0.74; Figure 6A). One of the mRNAs predicted

as a target of miR-92b is Foxp2, which is enriched in IgL pro-

jection neurons (Figures 1D, and 6B; Table S5). In addition, we

also predict that Foxp2 is targeted by other miRNAs (Figure 6C),

suggesting that the regulation of Foxp2 levels in neocortical cells

occurs through multiple miRNAs.

By combining the temporal and spatial expression profiles

of putative targeted mRNAs, we observed patterns of miRNA

expression that revealed spatiotemporal aspects of gene regu-

lation that were previously obscure. In addition, analysis of

miRNAs that target mRNAs enriched in specific neural cell

types may help to predict the mechanism of action of particular

miRNAs.

Transcriptional Correlates of NeurodevelopmentalEventsTo gain insights into progression of major neurodevelopmental

processes across different layers and time points, we explored

expression trajectories of genes associated with the develop-

ment of neuronal morphology (i.e., axonogenesis and dendrito-

genesis), synaptogenesis, and neural activity (i.e., excitatory

and inhibitory neurotransmissions, and experience-driven activ-

ity) (Figure 7A; Table S7). As expected, the expression patterns

of genes associated with axonogenesis were generally downre-

gulated across time, whereas genes involved in synaptogenesis

and neural activity were upregulated. Genes previously asso-

ciated with dendritogenesis did not seem to have a signature

trajectory (Figure 7A). This might be due to the multiple roles

Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors 9

Page 10: Cell Reports Resource - Yale School of Medicine

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

that these molecules have in development. Interestingly, the

expression of the majority of genes associated with experi-

ence-driven activity (i.e., immediate early genes [IEGs]) in-

creased dramatically after P8 first in L4, followed closely in IgL,

and more robustly after P10 in SgL (Figure 7A, gray shade).

This increment in expression of IEGs coincides with the emer-

gence of exploratory whisking around P11–P14 (Takatoh et al.,

2013). The fact that not all layers respond at the same time to

sensory input is reflective of the sequential maturation of the

neocortical layers in an inside first-outside last fashion. Indeed,

the same trend was consistently observed with the other neuro-

developmental processes.

We also used our data set to identify genes that may be

involved in developmental sculpting of dendritic morphology of

L4 spiny stellate cells (SSCs) in S1C. These L4 excitatory

neurons aggregate to form barrel-like structures that surround

clusters of thalamocortical afferents (TCAs) relaying information

from individual facial whiskers (Li and Crair, 2011; Li et al., 2013).

During early postnatal development, twomorphological changes

occur in immature L4 excitatory neurons: they cease to have

a distinct apical dendrite, growing dendrites of roughly equal

size; and they direct their dendrites toward one specific TCA

bundle (Figure 7B). Multiple lines of evidence suggest that

thalamic activity plays an important role in regulating these

morphological transformations of L4 neurons (Callaway and

Borrell, 2011; Li et al., 2013). However, the molecular mecha-

nisms underlying these changes are unknown.

We hypothesized that genes involved in SSC dendritic devel-

opment should be enriched in L4 between P4 and P6, after which

their expressionmay decrease. Expression profiles of four genes

matched this expression pattern: Fat3, Lhfp, Ptgfrn, and Tgfbr1

(Figure 7C). Interestingly, all four genes encode transmembrane

proteins involved in signal transduction that lead to a plethora

of physiological (e.g., cell differentiation) and pathological (e.g.,

schizophrenia) processes. Analysis of Fat3 expression in the

Allen Brain Atlas (http://developingmouse.brain-map.org; Lein

et al., 2007) confirms its enrichment in S1C at P4, but not

afterward. Additionally, the atypical cadherin FAT3 has recently

been shown to control the dendritic morphology of retinal neu-

rons (Deans et al., 2011). Thus, Fat3 is a plausible candidate

for regulating SSC dendritic development.

DISCUSSION

Here, we profiled the neocortical developmental transcriptome

by deep sequencing both mRNAs and smRNAs in the

S1C, across different layers and multiple developmental time

points. All generated data are publicly available in the

Mouse NCX Transcriptome database (http://medicine.yale.

edu/lab/sestan/resources/index.aspx), providing the basis for a

variety of future investigations and comparisons with other tran-

scriptome-related data sets.

Our analysis revealed that time, more than space (layers) or

sex, defines the dynamics of coding and noncoding transcripts

in the mouse S1C. In particular, P4 neocortical transcriptome

was substantially different from the one at P6, P8, and P10,

which were similar to each other; and P14 was more similar to

adult than to P6–P10 or P4. This segregation likely reflects the

10 Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors

progression of major neurodevelopmental processes. Neuronal

migration of the last neurons to the SgL is believed to cease

around P3–P4 (Kwan et al., 2012). Interestingly, Mdga1, which

has been specifically implicated in the migration of SgL neurons

(Takeuchi and O’Leary, 2006), shows maximal expression at

P4 in the SgL and L4, with subsequent decline (Figure 1D). After

P4, there is a period of approximately 1.5 weeks of intensive

elaboration of axons and dendrites, development of the vascular

system, and overproduction of synapses and spines (Ashby and

Isaac, 2011; Yu et al., 1994; Figure 7A). Around P14, mice open

their eyes and begin to exhibit exploratory behavior and co-

ordinated movement of their whiskers, more indicative of inde-

pendent adult behaviors.

In addition to general temporal transcriptional change, there

is also a spatial gradient in maturation among layers due to the

inside first-outside last nature of laminar differentiation. Accord-

ingly, at the level of protein-coding genes, our samples also

segregated according to their laminar identities (i.e., IgL, L4,

and SgL). Gene function enrichment analysis of the sDEX genes

as well as the analysis of the signature trajectory of neocortical

developmental events reflected the inside out gradient of laminar

maturation (Table S3D). Genes enriched in the IgL (inside) were

associated with the development of neuronal morphology

because these layers are the first to go through the elaboration

of dendritic trees and long-reaching axons, whereas those en-

riched in L4 were genes associated with signal transduction

(e.g., G protein-coupled receptors) and synapse and channel

activity, consistent with the specific function of this layer as

main recipient of thalamic afferents and its dependence on

thalamocortical neurotransmission (Li et al., 2013). On the other

hand, the genes enriched in SgL (outside) were those associated

with cell adhesion, suggestive of the ongoing processes of cell

migration and blood vessel development, and there was no

enrichment in genes associated with neuronal development, a

hint at its less mature state.

Both WCGNA and hypothesis-driven analysis reinforced their

value in identifying genes and networks associated with distinct

biological processes. We were able to uncover transcriptional

overlaps between known phenotypic interactions (e.g., blood

vessel development and astrogliogenesis) and provide mecha-

nistic insights into related biological processes (e.g., neuronal

activity and gene expression). Furthermore, we also identified

gene candidates for regulating neurodevelopmental processes

specific to a layer or area, such as the developmental sculpting

of dendritic morphology of L4 SSCs. Finally, we also demon-

strated that one can embrace the cellular and molecular

complexity of distinct layers for integrated data analysis by

combining our data with available data on specific neural cell

types.

In addition, this data set is also helpful in analyzing the laminar

and temporal expressions of genes linked to psychiatric and

neurological disorders. Our initial analysis revealed that some

genes linked to schizophrenia (i.e., Zfp804a) or autism (i.e.,

Fezf2, Sema5a, Sox5, and Tbr1) are enriched in IgL during devel-

opment (Figure 1D). Together with similar findings from a recent

study on the expression of autism-related genes in the human

NCX (Willsey et al., 2013), the development of IgL projection

neurons and their circuits might be affected in these disorders.

Page 11: Cell Reports Resource - Yale School of Medicine

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

The mRNA-seq data set further allows the study of the ex-

pression profile of new gene isoforms (e.g., Dlg2), which

should prove especially valuable for the study of gene ex-

pression and AS across development and species (Keren

et al., 2010). Additionally, the smRNA-seq data set enables

the identification of distinct spatiotemporal profiles of miRNAs

and potential miRNA-mRNA interactions (e.g., miR-92b and

Foxp2). The data presented here and other neural cell-type-

specific miRNA data sets (see Jovi�ci�c et al., 2013) should

provide valuable resources for interpreting these putative

interactions.

EXPERIMENTAL PROCEDURES

Laminar Microdissection

Dcdc2a-Gfp reporter mice were acutely sacrificed, and the brain was sub-

merged in ice-cold oxygenated artificial cerebrospinal fluid (ACSF) for 5 min.

Using a vibratome, we prepared live sagittal brain slices (250 mm), kept in

ice-cold and oxygenated ACSF at all times. Layers were dissected under a

fluorescence stereoscope and collected into separate safe-lock microcentri-

fuge tubes with 30 ml of RNAlater.

RNA Extraction and Library Preparation

Tissue homogenization was performed by adding stainless steel beads

(Next Advance) and 2 vol of lysis buffer to the tube with tissue and homoge-

nized in the Bullet Blender (Next Advance) for 1 min at speed 6. Total RNA

was extracted using RNeasy Plus Mini Kit (QIAGEN). cDNA libraries were

prepared using the TruSeq mRNA and TruSeq SmallSample Prep Kits

(Illumina), as per the manufacturer’s instructions with some modifications

(see Supplemental Experimental Procedures).

Read Filtering, Processing, and Alignment

The mRNA-seq reads were aligned to mouse reference genome (NCBI37/

mm9) using TopHat (v.1.0.13) (Trapnell et al., 2009) with up to twomismatches.

The uniquely mapped reads were used to calculate the expression level of

genes annotated in Ensembl (NCBI37/mm9, released version 63) using

RSEQTools (Habegger et al., 2011). The smRNA-seq reads were clipped

and aligned to miRNA and pre-miRNA retrieved from miRBase (released

version 18) using miRanalyzer (released version 0.2) with up to one mismatch

(Hackenberg et al., 2011; Kozomara and Griffiths-Jones, 2011). The reads

uniquely aligned to either library were taken to calculate the reads count for

miRNA. Reads aligned to pre-miRNA but not miRNA were attributed to

divergence from the consensus sequence and then were assigned to miRNA

based on their aligned locus. See Supplemental Experimental Procedures

for more details.

Data Quality Assessment of mRNA-Seq and smRNA-Seq

The spike-in RNAs were added to the libraries to tag samples. The percentage

of mismatches within sequenced spike-in RNA reads was considered

sequencing errors. The sequencing quality score of smRNA-seq reads was

evaluated by FastQC. We surveyed the expression landscape across chromo-

somes by determining the fraction of genome and exons transcribed for each

chromosome. To investigate the relationship between samples, we used R

package prcomp to perform PCA for mRNA-seq and smRNA-seq samples.

In addition, we used WGCNA to perform average-linkage hierarchical clus-

tering for mRNA-seq samples. See Supplemental Experimental Procedures

for more details.

Spatiotemporal DEX Analysis

We used R package DESeq to identify DEX genes and miRNAs between

different layers and between different ages (Anders and Huber, 2010).

FDR <0.01 was chosen to detect statistically significant DEX transcripts.

See Supplemental Experimental Procedures for more details.

Alternative Splicing and Intron Retention

We used JuncBASE to identify AS events (Brooks et al., 2011). We per-

formed pairwise comparisons between different layers and between

different ages. For the AS events expressed in both variables, we used a

threshold of FDR <0.01. For AS events expressed only in one variable, we

used the supported reads count >25 as the threshold. We used RSEQtools

to build intron annotations and calculated the RPKM and reads count

(Habegger et al., 2011). The combination of RPKM R0.5 and raw reads

count of ten or more was used as the threshold. We used an RPKM fold

change of greater than two to detect spatial and temporal differences in

intron retention events. See Supplemental Experimental Procedures for

more details.

qRT-PCR and Exon-Specific PCR

One microgram of total RNA of each sample was used for cDNA synthesis

using SuperScript III First-Strand Synthesis SuperMix (Invitrogen). TaqMan

Gene Expression Assay was used for each gene of interest along with TaqMan

Universal Master Mix (Applied Biosystems). Exon-specific high-melting

temperature primers were designed using NCBI/Primer-BLAST (http://www.

ncbi.nlm.nih.gov/tools/primer-blast/) (see Supplemental Experimental Proce-

dures). PCR was performed using Phusion High-Fidelity DNA Polymerase

(New England Biolabs), as per manufacturer’s instruction.

Weighted Gene Coexpression Network Analysis

We used R package WGCNA to perform WGCNA (Langfelder and Horvath,

2008). In each module, the top ten genes highly correlated with modular

eigengene were selected as modular hub genes to build and visualize

inter- and intramodule connectivity using Cytoscape (Smoot et al., 2011).

See Supplemental Experimental Procedures for more details.

miRNA-mRNA Regulation Prediction

We retrieved conserved miRNA-mRNA regulation pairs from TargetScan

database (released version 6.2). We then used lasso_mir.R to predict the regu-

lation between mRNA and miRNA expression levels (Lu et al., 2011) with rank

score R80. Finally, we used TargetScan database and Lasso prediction to

calculate the proportion of targetedmRNAs.We used FDR <0.01 to detect reli-

able cluster-enriched miRNAs. See Supplemental Experimental Procedures

for more details.

ACCESSION NUMBERS

The raw sequencing reads have been deposited in NCBI Sequence Read

Archive under the accession number SRP031888.

SUPPLEMENTAL INFORMATION

Supplemental Information includes Supplemental Experimental Procedures,

seven figures, and seven tables and can be found with this article online at

http://dx.doi.org/10.1016/j.celrep.2014.01.036.

ACKNOWLEDGMENTS

We thank A. Sousa and M. Pletikos for helping with human tissue processing

and data analyses and the members of the N.S. laboratory for valuable

comments. Human postmortem tissue was obtained from sources listed in

Supplemental Experimental Procedures. This work was supported by grants

from the NIH (MH081896, MH062639, and NS051869), the Kavli Foundation

(to M.C. and N.S.), and by a James S. McDonnell Foundation Scholar Award

(to N.S.).

Received: June 6, 2013

Revised: October 10, 2013

Accepted: January 27, 2014

Published: February 20, 2014

Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors 11

Page 12: Cell Reports Resource - Yale School of Medicine

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

REFERENCES

Anders, S., and Huber,W. (2010). Differential expression analysis for sequence

count data. Genome Biol. 11, R106.

Arlotta, P., Molyneaux, B.J., Chen, J., Inoue, J., Kominami, R., and Macklis,

J.D. (2005). Neuronal subtype-specific genes that control corticospinal motor

neuron development in vivo. Neuron 45, 207–221.

Ashby, M.C., and Isaac, J.T. (2011). Maturation of a recurrent excitatory

neocortical circuit by experience-dependent unsilencing of newly formed den-

dritic spines. Neuron 70, 510–521.

Barabasi, A.L., and Oltvai, Z.N. (2004). Network biology: understanding the

cell’s functional organization. Nat. Rev. Genet. 5, 101–113.

Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions.

Cell 136, 215–233.

Belgard, T.G., Marques, A.C., Oliver, P.L., Abaan, H.O., Sirey, T.M., Hoerder-

Suabedissen, A., Garcıa-Moreno, F., Molnar, Z., Margulies, E.H., and Ponting,

C.P. (2011). A transcriptomic atlas of mouse neocortical layers. Neuron 71,

605–616.

Brenman, J.E., Topinka, J.R., Cooper, E.C., McGee, A.W., Rosen, J., Milroy,

T., Ralston, H.J., and Bredt, D.S. (1998). Localization of postsynaptic den-

sity-93 to dendritic microtubules and interaction with microtubule-associated

protein 1A. J. Neurosci. 18, 8805–8813.

Brooks, A.N., Yang, L., Duff, M.O., Hansen, K.D., Park, J.W., Dudoit, S., Bren-

ner, S.E., and Graveley, B.R. (2011). Conservation of an RNA regulatory map

between Drosophila and mammals. Genome Res. 21, 193–202.

Cahoy, J.D., Emery, B., Kaushal, A., Foo, L.C., Zamanian, J.L., Christopherson,

K.S., Xing, Y., Lubischer, J.L., Krieg, P.A., Krupenko, S.A., et al. (2008). A tran-

scriptome database for astrocytes, neurons, and oligodendrocytes: a new

resource for understanding brain development and function. J. Neurosci. 28,

264–278.

Callaway, E.M., and Borrell, V. (2011). Developmental sculpting of dendritic

morphology of layer 4 neurons in visual cortex: influence of retinal input.

J. Neurosci. 31, 7456–7470.

Chen, J.G., Rasin, M.R., Kwan, K.Y., and Sestan, N. (2005). Zfp312 is required

for subcortical axonal projections and dendritic morphology of deep-layer

pyramidal neurons of the cerebral cortex. Proc. Natl. Acad. Sci. USA 102,

17792–17797.

Chung, A.S., and Ferrara, N. (2011). Developmental and pathological angio-

genesis. Annu. Rev. Cell Dev. Biol. 27, 563–584.

Deans, M.R., Krol, A., Abraira, V.E., Copley, C.O., Tucker, A.F., and Goodrich,

L.V. (2011). Control of neuronal morphology by the atypical cadherin Fat3.

Neuron 71, 820–832.

DeFelipe, J., Lopez-Cruz, P.L., Benavides-Piccione, R., Bielza, C., Larranaga,

P., Anderson, S., Burkhalter, A., Cauli, B., Fairen, A., Feldmeyer, D., et al.

(2013). New insights into the classification and nomenclature of cortical

GABAergic interneurons. Nat. Rev. Neurosci. 14, 202–216.

Dillman, A.A., Hauser, D.N., Gibbs, J.R., Nalls, M.A., McCoy, M.K., Rudenko,

I.N., Galter, D., and Cookson, M.R. (2013). mRNA expression, splicing and

editing in the embryonic and adult mouse cerebral cortex. Nat. Neurosci. 16,

499–506.

Espinosa, J.S., and Stryker, M.P. (2012). Development and plasticity of the

primary visual cortex. Neuron 75, 230–249.

Fonta, C., and Imbert, M. (2002). Vascularization in the primate visual cortex

during development. Cereb. Cortex 12, 199–211.

Goldberg,J.L.,Vargas,M.E.,Wang,J.T.,Mandemakers,W.,Oster,S.F.,Sretavan,

D.W., and Barres, B.A. (2004). An oligodendrocyte lineage-specific semaphorin,

Sema5A, inhibits axon growth by retinal ganglion cells. J. Neurosci. 24, 4989–

4999.

Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and

Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants

beyond seed pairing. Mol. Cell 27, 91–105.

Habegger, L., Sboner, A., Gianoulis, T.A., Rozowsky, J., Agarwal, A., Snyder,

M., and Gerstein, M. (2011). RSEQtools: a modular framework to analyze

12 Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors

RNA-Seq data using compact, anonymized data summaries. Bioinformatics

27, 281–283.

Hackenberg, M., Rodrıguez-Ezpeleta, N., and Aransay, A.M. (2011).

miRanalyzer: an update on the detection and analysis of microRNAs in high-

throughput sequencing experiments. Nucleic Acids Res. 39 (Web Server

issue), W132–W138.

Han, W., Kwan, K.Y., Shim, S., Lam, M.M., Shin, Y., Xu, X., Zhu, Y., Li, M., and

Sestan, N. (2011). TBR1 directly represses Fezf2 to control the laminar origin

and development of the corticospinal tract. Proc. Natl. Acad. Sci. USA 108,

3041–3046.

Heintz, N. (2004). Gene expression nervous system atlas (GENSAT). Nat.

Neurosci. 7, 483.

Hengst, U., and Jaffrey, S.R. (2007). Function and translational regulation of

mRNA in developing axons. Semin. Cell Dev. Biol. 18, 209–215.

Hough, C.D., Woods, D.F., Park, S., and Bryant, P.J. (1997). Organizing a

functional junctional complex requires specific domains of the Drosophila

MAGUK Discs large. Genes Dev. 11, 3242–3253.

Huang, W., Sherman, B.T., and Lempicki, R.A. (2009). Systematic and integra-

tive analysis of large gene lists using DAVID bioinformatics resources. Nat.

Protoc. 4, 44–57.

Johnson, M.B., Kawasawa, Y.I., Mason, C.E., Krsnik, Z., Coppola, G.,

Bogdanovi�c, D., Geschwind, D.H., Mane, S.M., State, M.W., and Sestan, N.

(2009). Functional and evolutionary insights into human brain development

through global transcriptome analysis. Neuron 62, 494–509.

Jovi�ci�c, A., Roshan, R., Moisoi, N., Pradervand, S., Moser, R., Pillai, B., and

Luthi-Carter, R. (2013). Comprehensive expression analyses of neural cell-

type-specific miRNAs identify new determinants of the specification and

maintenance of neuronal phenotypes. J. Neurosci. 33, 5127–5137.

Kang, H.J., Kawasawa, Y.I., Cheng, F., Zhu, Y., Xu, X., Li, M., Sousa, A.M.,

Pletikos, M., Meyer, K.A., Sedmak, G., et al. (2011). Spatio-temporal transcrip-

tome of the human brain. Nature 478, 483–489.

Keren, H., Lev-Maor, G., and Ast, G. (2010). Alternative splicing and evolution:

diversification, exon definition and function. Nat. Rev. Genet. 11, 345–355.

Kilb, W., Kirischuk, S., and Luhmann, H.J. (2011). Electrical activity patterns

and the functional maturation of the neocortex. Eur. J. Neurosci. 34, 1677–

1686.

Kozomara, A., and Griffiths-Jones, S. (2011). miRBase: integrating microRNA

annotation and deep-sequencing data. Nucleic Acids Res. 39 (Database

issue), D152–D157.

Kwan, K.Y., Sestan, N., and Anton, E.S. (2012). Transcriptional co-regulation

of neuronal migration and laminar identity in the neocortex. Development

139, 1535–1546.

Langfelder, P., and Horvath, S. (2008). WGCNA: an R package for weighted

correlation network analysis. BMC Bioinformatics 9, 559.

Lein, E.S., Hawrylycz, M.J., Ao, N., Ayres, M., Bensinger, A., Bernard, A., Boe,

A.F., Boguski, M.S., Brockway, K.S., Byrnes, E.J., et al. (2007). Genome-wide

atlas of gene expression in the adult mouse brain. Nature 445, 168–176.

Leone, D.P., Srinivasan, K., Chen, B., Alcamo, E., and McConnell, S.K. (2008).

The determination of projection neuron identity in the developing cerebral cor-

tex. Curr. Opin. Neurobiol. 18, 28–35.

Li, H., and Crair, M.C. (2011). How do barrels form in somatosensory cortex?

Ann. N Y Acad. Sci. 1225, 119–129.

Li, H., Fertuzinhos, S., Mohns, E., Hnasko, T.S., Verhage, M., Edwards, R.,

Sestan, N., and Crair, M.C. (2013). Laminar and columnar development of

barrel cortex relies on thalamocortical neurotransmission. Neuron 79,

970–986.

Lu, Y., Zhou, Y., Qu, W., Deng, M., and Zhang, C. (2011). A Lasso regression

model for the construction of microRNA-target regulatory networks. Bioinfor-

matics 27, 2406–2413.

Lyckman, A.W., Horng, S., Leamey, C.A., Tropea, D., Watakabe, A., Van Wart,

A., McCurry, C., Yamamori, T., and Sur, M. (2008). Gene expression patterns

Page 13: Cell Reports Resource - Yale School of Medicine

Please cite this article in press as: Fertuzinhos et al., Laminar and Temporal Expression Dynamics of Coding and Noncoding RNAs in the MouseNeocortex, Cell Reports (2014), http://dx.doi.org/10.1016/j.celrep.2014.01.036

in visual cortex during the critical period: synaptic stabilization and reversal by

visual deprivation. Proc. Natl. Acad. Sci. USA 105, 9409–9414.

Matsuoka, R.L., Chivatakarn, O., Badea, T.C., Samuels, I.S., Cahill, H., Ka-

tayama, K., Kumar, S.R., Suto, F., Chedotal, A., Peachey, N.S., et al. (2011).

Class 5 transmembrane semaphorins control selective Mammalian retinal

lamination and function. Neuron 71, 460–473.

McNeill, E., and Van Vactor, D. (2012). MicroRNAs shape the neuronal land-

scape. Neuron 75, 363–379.

Mercer, T.R., and Mattick, J.S. (2013). Structure and function of long noncod-

ing RNAs in epigenetic regulation. Nat. Struct. Mol. Biol. 20, 300–307.

Molyneaux, B.J., Arlotta, P., Menezes, J.R.L., and Macklis, J.D. (2007).

Neuronal subtype specification in the cerebral cortex. Nat. Rev. Neurosci. 8,

427–437.

Nilsen, T.W., and Graveley, B.R. (2010). Expansion of the eukaryotic proteome

by alternative splicing. Nature 463, 457–463.

Okaty, B.W., Sugino, K., and Nelson, S.B. (2011). A quantitative comparison of

cell-type-specific microarray gene expression profiling methods in the mouse

brain. PLoS One 6, e16493.

Oldham, M.C., Konopka, G., Iwamoto, K., Langfelder, P., Kato, T., Horvath, S.,

and Geschwind, D.H. (2008). Functional organization of the transcriptome in

human brain. Nat. Neurosci. 11, 1271–1282.

Paarmann, I., Spangenberg, O., Lavie, A., and Konrad, M. (2002). Formation of

complexes between Ca2+.calmodulin and the synapse-associated protein

SAP97 requires the SH3 domain-guanylate kinase domain-connecting

HOOK region. J. Biol. Chem. 277, 40832–40838.

Pletikos, M., Sousa, A.M.M., Sedmak, G., Meyer, K.A., Zhu, Y., Cheng, F., Li,

M., Kawasawa, Y.I., and Sestan, N. (2014). Temporal specification and bilater-

ality of human neocortical topographic gene expression. Neuron 81, 321–332.

Rossner, M.J., Hirrlinger, J., Wichert, S.P., Boehm, C., Newrzella, D., Hie-

misch, H., Eisenhardt, G., Stuenkel, C., von Ahsen, O., and Nave, K.A.

(2006). Global transcriptome analysis of genetically identified neurons in the

adult cortex. J. Neurosci. 26, 9956–9966.

Rowitch, D.H., and Kriegstein, A.R. (2010). Developmental genetics of

vertebrate glial-cell specification. Nature 468, 214–222.

Shulha, H.P., Cheung, I., Whittle, C., Wang, J., Virgil, D., Lin, C.L., Guo, Y., Les-

sard, A., Akbarian, S., and Weng, Z. (2012). Epigenetic signatures of autism:

trimethylated H3K4 landscapes in prefrontal neurons. Arch. Gen. Psychiatry

69, 314–324.

Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., and Ideker, T. (2011).

Cytoscape 2.8: new features for data integration and network visualization.

Bioinformatics 27, 431–432.

Sugino, K., Hempel, C.M., Miller, M.N., Hattox, A.M., Shapiro, P., Wu, C.,

Huang, Z.J., and Nelson, S.B. (2006). Molecular taxonomy of major neuronal

classes in the adult mouse forebrain. Nat. Neurosci. 9, 99–107.

Takatoh, J., Nelson, A., Zhou, X., Bolton, M.M., Ehlers, M.D., Arenkiel, B.R.,

Mooney, R., and Wang, F. (2013). New modules are added to vibrissal premo-

tor circuitry with the emergence of exploratory whisking. Neuron 77, 346–360.

Takeuchi, A., and O’Leary, D.D. (2006). Radial migration of superficial layer

cortical neurons controlled by novel Ig cell adhesion molecule MDGA1.

J. Neurosci. 26, 4460–4464.

Tam, S.J., and Watts, R.J. (2010). Connecting vascular and nervous system

development: angiogenesis and the blood-brain barrier. Annu. Rev. Neurosci.

33, 379–408.

Trapnell, C., Pachter, L., and Salzberg, S.L. (2009). TopHat: discovering splice

junctions with RNA-Seq. Bioinformatics 25, 1105–1111.

Weiss, L.A., Arking, D.E., Daly, M.J., and Chakravarti, A.; Gene Discovery

Project of Johns Hopkins & the Autism Consortium (2009). A genome-wide

linkage and association scan reveals novel loci for autism. Nature 461,

802–808.

West, A.E., and Greenberg, M.E. (2011). Neuronal activity-regulated gene

transcription in synapse development and cognitive function. Cold Spring

Harb. Perspect. Biol. 3, a005744.

Willsey, A.J., Sanders, S.J., Li, M., Dong, S., Tebbenkamp, A.T., Muhle, R.A.,

Reilly, S.K., Lin, L., Fertuzinhos, S., Miller, J.A., et al. (2013). Coexpression

networks implicate human midfetal deep cortical projection neurons in the

pathogenesis of autism. Cell 155, 997–1007.

Yu, B.P., Yu, C.C., and Robertson, R.T. (1994). Patterns of capillaries in

developing cerebral and cerebellar cortices of rats. Acta Anat. (Basel) 149,

128–133.

Zerlin, M., and Goldman, J.E. (1997). Interactions between glial progenitors

and blood vessels during early postnatal corticogenesis: blood vessel contact

represents an early stage of astrocyte differentiation. J. Comp. Neurol. 387,

537–546.

Zheng, S., Gray, E.E., Chawla, G., Porse, B.T., O’Dell, T.J., and Black, D.L.

(2012). PSD-95 is post-transcriptionally repressed during early neural develop-

ment by PTBP1 and PTBP2. Nat. Neurosci. 15, 381–388, S1.

Cell Reports 6, 1–13, March 13, 2014 ª2014 The Authors 13

Page 14: Cell Reports Resource - Yale School of Medicine

Supplemental Information

Laminar and Temporal Dynamics of Coding and Noncoding RNAs in the Mouse Neocortex Sofia Fertuzinhos, Mingfeng Li, Yuka Imamura Kawasawa, Vedrana Ivic, Daniel Franjic, Darshani Singh, Michael Crair and Nenad Šestan

Page 15: Cell Reports Resource - Yale School of Medicine

SUPPLEMENTAL FIGURES

Figure S1. Quality Assessment of mRNA-seq and smRNA-seq Sequencing Data, Related to Figure 1 (A) Sequencing error rate as a function of cycle number for mRNA-seq data. The sequencing error rate was estimated by the ratio of the mismatch of the aligned spike-in reads from the reference spike-in RNA sequences. The distribution of the error rates in each sequencing cycle was shown by box plots across each sequencing cycle. (B) The 20 novel significant layer-pattern genes reported in Belgard et al. 2011 were plotted by using 6 adult samples, showing consistent patterns. (C) Heat map of novel layer-patterned genes reported in Belgard et al. 2011. These genes were re-plotted using the 36 samples from all ages. Most genes showed temporal dynamics of

Page 16: Cell Reports Resource - Yale School of Medicine

enrichment. To plot the heat map, the log2 transformed gene RPKM+1 was normalized by normalize function in R software. (D) Reads distribution across chromosomes. The box plots represent the proportions of reads uniquely mapped to each chromosome in SgL (blue), L4 (green) and IgL (red) samples. (E) Reads classification to exon, intron and intergenic regions. Reads were assigned to exon, intron and intergenic regions according to their mapping coordinates. Bar plots represent the proportions of reads residing in each annotation entry. Error bars represent s.e.m. (n=36). (F) Example of smRNA-seq base-quality distribution. The SgL of MMB1 smRNA-seq data was chosen as an example. The quality scores of the smRNA-seq reads were summarized by software FastQC. The yellow box plots show an overview of the range of quality scores across all bases at each position. The background of the graph divides the y axis into very good quality (green), reasonable quality (orange), and poor quality (red). Notably, all 36 smRNA-seq samples show a pattern similar to this example. nt, nucleotides. (G) Cumulative dependence between smRNA-seq reads and miRNAs. The accumulated percentage of uniquely mapped reads was plotted against the accumulated percentage of miRNAs ranked from highest to lowest expression for 36 samples. The vertical red line indicates the top 10 highly expressed miRNAs and the horizontal red line indicates the 80% uniquely mapped reads.

Page 17: Cell Reports Resource - Yale School of Medicine

Figure S2. Principal Component Analyses of mRNA-seq and smRNA-seq Samples, Related to Figure 2. (A and E) Distribution of variances as a function of principal components. (B and F) Two-dimensional plot of PC1 and PC2 representing the PCA analysis of 36 mRNA-seq (B) and smRNA-seq (F) samples. Each character string represents one sample and indicates its age. The male (blue) and female (red) samples were differentially colored.

Page 18: Cell Reports Resource - Yale School of Medicine

(C and G) Two-dimensional plot of PC1 and PC3 representing the PCA analysis of 36 mRNA-seq (C) and smRNA-seq (G) samples. Each character string represents one sample and indicates its age. The SgL (blue), L4 (green) and IgL (red) samples were differentially colored. (D and H) Two-dimensional plot of PC1 and PC3 representing the PCA analysis of 36 mRNA-seq (D) and smRNA-seq (H) samples. Each character string represents one sample and indicates its age. The male samples are blue, and female samples are red.

Page 19: Cell Reports Resource - Yale School of Medicine
Page 20: Cell Reports Resource - Yale School of Medicine

Figure S3. Dendrogram of mRNA-seq Samples and Expression Distribution of Genes, Introns and miRNAs, Related to Figure 2 and Figure 3 (A) Dendrogram of mRNA-seq samples in terms of sex, layer, age and RIN. Dendrogram branches show hierarchical clustering of 36 mRNA-seq samples. The colors underneath label the potential confounders: sex (male - blue, female - pink), layer (SgL - blue, L4 - green, IgL - red), age (P4 - orange, P6 - yellow, P8 - green, P10 - blue, P14 - violet, adult - black), RIN (low to high representing pink to red, ranging from 8 to 10). The samples were clustered predominately by layer and age, but not by sex or RIN. (B) The density plot of log2 transformed RPKM+1 for all annotated genes in mm9 Ensembl v63. The red vertical line indicates the threshold used for choosing reliably expressed protein-coding mRNA genes, RPKM ≥ 1 in at least 2 samples. (C) The density plot of log2 transformed reads count+1 for all annotated miRNAs in mm9 miRBase v18. The red vertical line indicates the threshold used for choosing reliably expressed miRNAs, reads count ≥ 10 in at least 2 samples. (D) The density plot of log2 transformed RPKM+1 for all annotated introns in mm9 Ensembl v63. (E) The density plot of log2 transformed read count+1 for all annotated introns in mm9 Ensembl v63. The definition of reliably expressed intron was based on the combination of RPKM ≥ 0.5 and reads count ≥ 10 in at least 2 samples, being indicated by the red vertical lines in D and E.

Page 21: Cell Reports Resource - Yale School of Medicine

Figure S4. Trajectories of Modular Eigengenes, Related to Figure 4 The spatiotemporal pattern for each module was summarized by the trajectory of the modular eigengene values, in which the eigengene was plotted against age. The trajectories for SgL (blue), L4 (green) and IgL (red) samples were separated and smoothed by loess function in R software. The full and empty circles represent the male and female samples respectively, and are differentially colored according to layers. The module index and the number of modular genes were indicated above the plot.

Page 22: Cell Reports Resource - Yale School of Medicine

Figure S5. Neural Cell Type Enrichment per Module, Related to Figure 4 In Cahoy et al. 2008, around six thousand genes were reported to be enriched in neurons, oligodendrocytes and astrocytes. These genes were intersected with our gene set of each module. The proportions of intersected genes were normalized to the same scale across cell types to facilitate the visualization of cell type specific enrichment per module. Additionally, the enrichment for each and between cell types were estimated by binomial and chi-square statistic test.

Page 23: Cell Reports Resource - Yale School of Medicine

Figure S6. Inter- and Intra-Modular Hub Genes Connectivity per Cluster, Related to Figure 5 (A-D) The inter- and intra-modular hub genes connectivity for four additional clusters as labeled. In each cluster, we calculated the Pearson correlation between hub genes. Gene pairs with correlation coefficients larger than 0.7 were chosen for network visualization. Cytoscape (Smoot et al., 2011) was used to visualize this network, in which genes were depicted as circles and the correlated genes were connected by lines. The "un-weighted force-directed layout" parameter was used to optimize the network visualization.

Page 24: Cell Reports Resource - Yale School of Medicine

Figure S7. miRNA-mRNA Regulation Prediction, Related to Figure 6 (A) Pipeline for miRNA-mRNA regulation prediction. The white squares indicate the required data for running pipeline. The gray squares represent the integrated software packages. (B) Percentage of mRNAs targeted by miRNA per cluster. The black-colored bars show the proportions of mRNAs targeted by miRNA in each cluster using TargetScan database. The grey-colored bars show the proportions of mRNAs targeted by miRNA in each cluster using software lasso-mir.R, which predicted the miRNA-mRNA regulation in the expression level with the combination of sequence-based prediction from TargetScan database. (C) miRNA cluster enrichment. The red-colored squares correspond to the enriched miRNA in vertical axis and to the associated cluster in horizontal axis.

Page 25: Cell Reports Resource - Yale School of Medicine

SUPPLEMENTAL TABLES Supplemental Tables are provided in a single Microsoft Excel file Table S1. Sample Metadata, Spike-In RNAs and Human Donor Metadata, Related to Figure 1 (A)This table summarizes the information for 36 samples and high throughput sequencing, providing age (range is P4 to adult), layer (SgL, L4 and IgL), sex (6 males and 6 females), and RNA integrity number (RIN, range is 8.5 to 10) for each individual sample. For the sequencing information, it provides the spike-in master (tagging mRNA-seq samples), TruSeq index (barcode), HiSeq lane and uniquely mapping reads for mRNA-seq (median > 10 million) and smRNA-seq (median > 1 million) data. (B) List of spike-in RNAs used for mRNA-seq sample tagging and sequencing quality control. In total, 10 spike-in RNAs with different concentrations were used. The length and GC content were provided for each spike-in RNA. Additionally, the 18 pairs of spike-in RNAs used for sample tagging were provided. (C) Metadata from human tissue is presented, including specimen ID, age (range is 12PCW to 30 year), sex (4 males and 6 females), postmortem interval (PMI, which ranged from 2 to 20 hours), pH of cerebellum (range is 6 to 6.92), ethnicity, cause of death, and medical history. Table S2. Differentially Expressed Genes and miRNAs, Related to Figure 2 (A) Differentially expressed protein-coding genes between layers at any age and between ages at any layer. These DEX genes were identified from reliably expressed protein-coding genes (RPKM ≥ 1 in at least 2 samples) and the difference threshold is adjusted p-value (FDR < 0.01). (B) Differentially expressed miRNAs between layers at any age and between ages at any layer. These DEX miRNAs were identified from reliably expressed miRNAs (reads count ≥ 10 in at least 2 samples) and the difference threshold is adjusted p-value (FDR < 0.01). Table S3. Gene Function Enrichment Analyses for sDEX, tDEX, stDEX and DEX for Three Stages, Related to Results and Discussion (A-C) The tables summarize the gene function enrichment analyses for 3034 spatiotemporal differentially expressed genes. It was generated using DAVID Bioinformatics Resources (http://david.abcc.ncifcrf.gov/). The associated gene list and statistical enrichment significance assessment were provided for each function term. (D)The hierarchical sample clustering analysis showed three isolated stages: P4, P6 to P10, and P14 and adult (Figure S3A). The gene function enrichment analyses, generated using DAVID Bioinformatics Resources, were performed for spatial DEX genes in each layer and each stage. The associated gene list and statistic enrichment significance assessment were provided for each function term. Table S4. Alternative Spliced Genes, Related to Figure 3 Investigation of five basic splicing modalities: cassette exon(s), mutually exclusive exon(s), alternative donors or acceptors, alternative first or last exon, and intron retention. The first four modalities were identified by software JuncBASE with the requirement of > 25 supported reads and FDR < 0.01 as the difference threshold. The intron retention events were identified by software RSEQtools with the requirement of intronic RPKM ≥ 0.5, intronic reads count ≥ 10, and RPKM fold change > 2 as the difference threshold. The differentially alternative spliced (DAS) genes were surveyed between layers at any age and between ages at any layer. Table S5. WGCNA Modules, Related to Figure 4

limingfeng
Sticky Note
Space line
Page 26: Cell Reports Resource - Yale School of Medicine

List of 40 modules identified by weighted gene co-expression network analysis (WGCNA). The reliability of detected modules was checked/re-organized by custom R script to ensure that a gene had its largest correlation with its modular eigengene. The top 10 genes highly correlated with modular eigengenes were chosen as the modular hub genes. The gene function enrichment analyses for 40 modules generated using DAVID Bioinformatics Resources. The associated gene list and statistic enrichment significance assessment were provided for each function term. Table S6. miRNA-mRNA Interaction Pairs, Related to Figure 6 List of predicted interaction pairs of miRNA-mRNA by software lasso_mir.R with using the sequenced-based prediction from TargetScan database and the expression values for mRNA and miRNA. The pairs with rank score ≥ 80 were chosen as reliable miRNA-mRNA interactions.

Table S7. Signature Trajectory of Neocortical Developmental Events, Related to Figure 7

List of genes used to study the signature trajectory of neocortex developmental events concerning neuronal morphology, synaptogenesis and activity. Genes found to be enriched in certain layer(s) are indicated accordingly.

Page 27: Cell Reports Resource - Yale School of Medicine

EXTENDED EXPERIMENTAL PROCEDURES

1. Introduction In this Supplemental Information we provided technical descriptions of data generation and analyses. We also made available additional data that were discussed in the main manuscript. Finally, we presented supplemental figures and tables generated from sample metadata and specific gene lists.

2. Laminar Microdissection We took advantage of the Dcdc2a-Gfp reporter mouse generated by the GENSAT project (Heintz, 2004) in which GFP was preferentially expressed in the postnatal L4 starting around P2. Mice were decapitated and the brain was removed immediately from the skull and submerged in ice-cold oxygenated artificial cerebrospinal fluid (ACSF) for 5 min. Using a

vibratome (Leica VT 1200 S) we prepared live sagittal brain slices (250 m thick), keeping the cutting brain and cut slices submerged in oxygenated ACSF at ice-cold temperature at all times to reduce enzymatic activity and RNA degradation. The dissection of SgL (L2/3 including marginal zone or L1, and pia), L4 and IgL (L5/6 including subplate or white matter) was

performed on each 250 m slice, while keeping the section submerged in ice-cold ACSF inside a petri dish, using a scalpel and a dissecting needle under a fluorescence stereoscope (Discovery V8, Zeiss). Each sample was collected into a safe-lock microcentrifuge tube

(Eppendorf) with 30 L of RNAlater®, for subsequent total RNA extraction.

3. RNA Isolation, Library Preparation and Sequencing 3.1 RNA Extraction A bead mill homogenizer (Bullet Blender, Next Advance) was used to homogenize the tissue. Each tissue sample was transferred to a safe-lock microcentrifuge tube (Eppendorf). A mass of stainless steel beads (Next Advance, cat# SSB14B) equal to the mass of the tissue was added to the tube. Two volumes of lysis buffer were added to the tube. Samples were mixed in the Bullet Blender for 1 min at a speed of six. Samples were visually inspected to confirm desired homogenization and then incubated at 37 °C for 5 min. The lysis buffer was added up to 0.6 ml, and samples were mixed in the Bullet Blender for 1 min. Total RNA was extracted using RNeasy Plus Mini Kit (Qiagen). Optical density values of extracted RNA were measured using NanoDrop (Thermo Scientific) to confirm an A260:A280 ratio above 1.9. RIN was determined for each sample using Bioanalyzer RNA 6000 Nano Kit or Bioanalyzer RNA 6000 Pico Kit (Agilent) (Table S1A), depending upon the total amount of RNA. 3.2 Library Preparation and Sequencing for mRNA The cDNA libraries were prepared using the TruSeq mRNA Sample Prep Kit (Illumina) as per the manufacturer’s instructions with some modifications. Briefly, polyA RNA was purified from 1 µg of total RNA using oligo (dT) beads. Quaint-IT RiboGreen RNA Assay Kit (Invitrogen) was used to quantitate purified mRNA with the NanoDrop 3300 (Thermo Scientific). Following mRNA quantitation, 2.5 µl spike-in master mixes, containing five different types of RNA molecules at varying amount (2.5 × 10-7 to 2.5 × 10-14 mol), were added per 100 ng of mRNA (Jiang et al., 2011), (Tables S1A and S1B). The spike-in RNAs were synthesized by External RNA Control Consortium (ERCC) by in vitro transcription of de novo DNA sequences or of DNA derived from the B. subtilis or the deep-sea vent microbe M. jannaschii. Both genomes were generous gifts of Dr. Mark Salit at The National Institute of Standards and Technology (NIST). Each sample was tagged by adding a pair of spike-in RNAs unique to the region from which the sample was taken. Also, an additional three common spike-ins were added for controlling sequencing error rates. Spike-in sequences are available at http://www.hbatlas.org/files/spike_in.fa. The mixture

Page 28: Cell Reports Resource - Yale School of Medicine

of mRNA and spike-in RNAs was subjected to fragmentation, reverse transcription, end repair, 3’– end adenylation, and adaptor ligation, followed by PCR amplification and SPRI bead purification (Beckman Coulter). The unique barcode sequences were incorporated in the adaptors for multiplexed high-throughput sequencing. The final product was assessed for its size distribution and concentration using Bioanalyzer DNA 1000 Kit (Agilent). For mRNA-sequencing, 6 libraries were pooled per HiSeq lane sequencing (Table S1A) and diluted to 10 nM in EB buffer (Qiagen) and then denatured using the Illumina protocol. The denatured libraries were diluted to 15 pM, followed by cluster generation on a single-end HiSeq flow cell (v1.5) using an Illumina cBOT, according to the manufacturer's instructions. The HiSeq flow cell was run for 75 cycles using a single-read recipe (v2 sequencing kit) according to the manufacturer's instructions. 3.3 Library Preparation and Sequencing for Small RNA The TruSeq Small RNA Sample Kit (Illumina) was used to prepare cDNA libraries as per the manufacturer’s instructions. Briefly, 1 µg of total RNA was ligated with 3’– and then 5’– adapters, followed by reverse transcription and PCR amplification. The PCR utilizes 48 different types of primer that will add 48 different index sequences to the adapters. Each library was assessed for the presence of desired miRNA population by Bioanalyzer High Sensitivity DNA Kit. We pooled 20 and 19 samples (Table S1A) with distinct indexes, which allow subsequent retrieval of each sample from multiplexed sequencing runs. Each pooled library was size-selected by gel excision. The final product was assessed for its size distribution and concentration using Bioanalyzer DNA 1000 Kit. The library was diluted to 10 nM in EB buffer and then denatured using the Illumina protocol. The denatured libraries were diluted to 15 pM, followed by cluster generation on a single-end HiSeq flow cell (v1.5) using an IlluminacBOT, according to the manufacturer's instructions. The HiSeq flow cell was run for 75 cycles using a single-read recipe (v2 sequencing kit) according to the manufacturer's instructions.

4. mRNA Sequencing Data Analyses 4.1 Reads Filtering, Processing and Alignment Reads passed the default purify filtering of Illumina CASAVA pipeline (released version 1.7) and were aligned to mouse reference genome (NCBI37/mm9, July 2007), which was downloaded from the UCSC genome browser. We trimmed one nucleotide in 5’-end, leaving 74 nucleotide long reads for sequence alignment. We used Tophat (v1.0.13) to align the reads onto mouse reference genome (Trapnell et al., 2009). We set multiple hit parameter (-g) equal to 1. Reads having multiple genomic hits were excluded, using solely uniquely mapped reads for all downstream analyses. We additionally changed the default 25 segment length to a larger 35 segment length to decrease the possibility of reads having multiple mapping. The segments are mapped independently, allowing up to 2 mismatches in each segment alignment.

4.2 Quality Assessment As described below, several quality control measures were implemented throughout sample preparation and transcriptome data generation steps. 4.2.1 Spike-In Control RNAs As mentioned in section 3.2, we used different combinations of two spike-in RNAs to tag different samples (Table S1B). This tagging avoids mixing samples during library preparation or sample loading into the sequencer. In addition, three more common spike-in RNAs were added, combining with the two tagging spike-in RNAs to estimate the sequencing error rate of the sequencer. Since we knew the exact sequences of spike-in RNAs, the mismatches within sequenced reads from the reference spike-in RNAs were considered sequencing errors by the sequencer. Thus, the percentage of mismatches was used to quantify the sequencing error rate.

Page 29: Cell Reports Resource - Yale School of Medicine

In principal, the sequencing error rate depends on the number of sequencing cycles. It is expected that the longer the read length is, the higher the error rate will be. The results showed that the median values of the percentage of mismatches in each sequencing cycle were less than 2%, which indicated the high quality of the sequenced reads. Also, we found slightly higher error rates at both ends of the reads, which meant that both PCR and other machine related issues were negligible (Figure S1A). 4.2.2 Transcribed Fraction of Genome/Exon across Each Chromosome The mouse reference genome is composed of nineteen autosomal chromosomes, one mitochondrial chromosome and two sex chromosomes. The sequenced reads were aligned onto the mouse reference genome to calculate the percentage of reads mapping to each chromosome. In principal, the percentage should be proportional to chromosome length because of the expected uniformity of reads mapping to the genome (Figure S1D). Furthermore, we considered a genomic nucleotide to be transcribed when it was aligned by at least one read. The genome transcription ratio per chromosome was quantified by the proportion of transcribed nucleotides to the chromosome length (Figure 1F). Similarly, the ratio of transcribed exons per chromosome was quantified by the proportion of transcribed nucleotides to the sum of exon length within the chromosome (Figure 1F). The non-exon mapping reads were assigned to introns and intergenic regions (Figure S1E). Together, these data enabled the investigation of mRNA-seq genome uniformity and transcriptome coverage. 4.2.3 Layer Specific Marker Genes A list of known and characterized layer specific marker genes were previously published (Chen et al., 2005; Kwan et al., 2008; Nakagawa and O'Leary, 2003; Nieto et al., 2004). Additionally, we used a recent mRNA-seq data from Belgard et al, 2011 for further characterization of layer specific markers (Figures S1B and S1C) (Belgard et al., 2011) 4.2.4 Hierarchical Clustering and Principal Component Analysis We used two approaches to do clustering analysis in order to ensure high quality of mRNA-seq data. Firstly, the "reads per kilobase of exon model per million mapped reads" (RPKM) values of reliably expressed protein-coding genes (see section 4.3) were log2 transformed and quantile normalized across all samples by R bioconductor limma (Smyth, 2004). Subsequently, we used R package WGCNA to perform average-linkage hierarchical clustering (Hclust) of all samples based on the processed gene RPKM data by using 1-r as distance (or dissimilarity), where r was Spearman correlation. Hierarchical clustering revealed notable clustering of the samples firstly by ages and then by layers, which were represented by the different color bars under the cluster (age: orange (P4) yellow (P6) green (P8) blue (P10) violet (P14) black (adult); layer: blue (SgL), green (L4), red (IgL) (Figure S3A). There was no obvious sex clustering. The RIN values of the samples were all high and do not show any clustering (Figure S3A). On the other hand, we used R package prcomp to perform principal components analysis (PCA) on the processed gene RPKM data. Taking account of the comparable amount of attribution between PC2 and PC3, the plots between PC1 and PC3 were provided (Figure S2) in addition to the plots between PC1 and PC2 (Figures 2A and S2). Furthermore, each sample was differentially colored according to its sex, i.e. female (red) and male (blue). There was no significant clustering by sex (Figure S2).

4.3 Gene and Exon Expression The gtf formatted gene/exon/transcript annotations of mm9 reference genome were retrieved from Ensembl (NCBI37/mm9, released version 63). In total, 36,814 genes were annotated, including 22,667 protein-coding genes and 14,147 non-protein-coding genes. In this work, the mRNA-seq data analyses only focused on protein-coding genes because of the greater

Page 30: Cell Reports Resource - Yale School of Medicine

complexity and lower global expression values of non-protein-coding genes (Cabili et al., 2011).. Hereafter, "gene" refers to a protein-coding gene. We used RSEQTools to calculate the gene expression (Habegger et al., 2011). Briefly, we first used the mergeTranscripts program to generate the composite gene model in which the exons from different transcript isoforms were merged into the longest gene. Next, we used the program mrfQuantifier to calculate the reads count per gene, as well as the RPKM for each gene (Mortazavi et al., 2008). To reduce false positive results, we focused on reliably expressed genes, which were defined by the requirement of RPKM ≥1 in at least 2 samples (Figure S2B). After these filters we obtained 12,729 reliably expressed protein-coding genes for the downstream analyses. 4.4 Spatiotemporal DEX Genes We used R package DESeq to identify differentially expressed (DEX) genes (Anders and Huber, 2010). The DEX genes were detected from the reliably expressed protein-coding genes. We did not investigate genes with low expression because they were prone to be affected by background noise. As mentioned in section 4.3, we generated two types of gene expression value. The reads count per gene served as the input for DESeq. Also, we used one male sample and one female sample for each layer at age. They were treated as biological replicates to improve the reliability of DEX genes because DESeq was more reliable at comparing groups with replicates. When performing the comparison, DESeq first gets the mean expression level as a joint estimate for both groups, and then calculates the difference as well as the p-value for the statistical significance of this change. The adjusted p-value was also calculated based on multiple testing with the Benjamini-Hochberg procedure, estimating the false discovery rate (FDR). In this work we set one stringent criterion, FDR < 0.01, so as to detect statistically significant DEX genes. The DEX genes can be split into two types, spatial DEX (sDEX) genes and temporal DEX (tDEX) genes. Genes that were differentially expressed in at least one layer at any given age were defined as spatial DEX genes. Similarly, genes that were differentially expressed in at least one age in any given layer were defined as temporal DEX genes. To identify spatial DEX genes, we performed pairwise comparisons across layers in P4, P6, P8, P10, P14 and adult. The total spatial DEX genes were the summation of all pairwise-layer DEX genes. To identify temporal DEX genes, we did pairwise comparisons across ages in SgL, L4 and IgL. The total temporal DEX genes were the summation of all pairwise-age DEX genes (Table S2A). We found that most of spatial DEX genes were temporally regulated, and half of temporal DEX genes were spatially regulated. Only a small number of spatial DEX genes were evenly expressed throughout all ages. These genes were reported as layer specific markers (see section 4.2.3). So, the sDEX genes were purified to genes that were differentially expressed between layers but not between ages, the tDEX genes were purified to genes differentially expressed between ages but not between layers. We introduced the third type of DEX genes, spatiotemporal DEX (stDEX) genes, to represent genes that were differentially expressed among layers and ages. 4.5 Alternative Splicing and Intron Retention Reads mapping to exon-exon junctions allowed us to study gene splicing events. Although some papers propose the use of statistical tools to estimate the expression of the alternate gene transcripts, this cannot compare in accuracy to the use of detected exon-exon junction reads providing evidence of splicing events that actually occurred. In this work, we used JuncBASE (Junction Based Analysis of Splicing Events) to identify the six different types of alternative splicing (AS) events: cassette exon, alternative donor, alternative acceptor, mutually exclusive exons, alternative first exons, and alternative last exons (Brooks et al., 2011). After performing the JuncBASE pipeline, we set two criteria to select statistically significant alternative splicing events. For the AS events expressed in both conditions, we used a threshold of FDR < 0.01. For the AS events expressed only in one condition, we used the supported reads count > 25 as the

Page 31: Cell Reports Resource - Yale School of Medicine

threshold. Similar to the analysis of DEX, we performed alternative splicing analysis (AS) across layers and ages. We combined the biological replicates to one sample because JuncBASE currently cannot handle replicates as separate inputs. We performed pairwise comparisons between different layers and between different ages. The differentially alterative splicing (DAS) found between layers but not between ages were defined as spatial DAS (sDAS), the DAS found between ages but not between layers were defined as temporal DAS (tDAS), and the DAS found between ages as well as between layers were defined as spatiotemporal DAS (stDAS) (Table S4). We used RSEQtools to build intron annotations and calculated the RPKM and reads count. The expression values were used to estimate the retention of introns. To set the proper threshold, we simulated the distribution of intronic RPKM and reads count for the all potentially expressed introns, of which at least one read hits in the introns. The combination of RPKM ≥ 0.5 and raw reads count ≥ 10 were used as the threshold (Figures S2D and S2E). In additional, we did the comparison of intron retention events between layers and between ages. We used RPKM fold change > 2 to detect spatial and temporal different intron retention events (Table S4). 4.6 Weighted Gene Co-Expression Network Analysis We used R package WGCNA to perform weighted gene co-expression network analyses for the reliably expressed protein-coding genes (Langfelder and Horvath, 2008). In terms of the calculation details, the gene RPKM values were firstly processed by log2 transformed quantile normalization across all samples by R bioconductor limma (Smyth, 2004). Next, we used pickSoftThreshold function to analyze the network topology and chose 4 as soft-threshold power. In order to do automatic network construction and module detection, we used blockwiseModules function to generate signed network. Modules with fewer than 10 genes were merged to their closest larger neighbor module. In total, this analysis produced 40 modules (Table S5). For each module, WGCNA generated the eigengene to characterize the modular feature. To check the reliability of detected modules, the custom R script was used to calculate the correlation between gene and modular eigengene. Genes would be re-assigned to another module to ensure each gene having the largest correlation coefficient with its own modular eigengene. Overall, only a small number of genes were re-assigned to other modules. We used the moduleEigengenes function to re-calculate the eigengene for the changed modules. To better understand the modular feature, we plotted the trajectories of the modular eigengenes (Figure S4), which were shown the same spatiotemporal patterns as heat map characterization of gene expression in each module. The module M40 with 14 genes clearly showed a male bias (i.e. Y chromosome or male enriched genes). For the other 39 modules, we found some modules showed similar patterns. So, we performed hierarchical clustering analysis and found the 39 modules can be classified into 5 clusters (Figures 4A, 4B, S4 and the following table). In summary of the 5 clusters, the first and fifth clusters were temporal related clusters, i.e. up and down regulated along development, and the other three clusters were spatial related clusters, i.e. separately enriched in certain layers.

Cluster Feature Module

I Up regulated M3, M22, M2, M19, M14, M24, M8, M30, M25, M12, M11

II IgL enriched M26, M34, M31, M13, M27

III SgL enriched M7, M21, M35, M28, M29, M5, M32

IV L4 enriched M36, M37, M33, M39, M9

V Down regulated M38, M15, M16, M1, M18, M4, M23, M17, M20, M6, M10

Page 32: Cell Reports Resource - Yale School of Medicine

Moreover, we used network methods to visualize the relationships between modules within a cluster. In each module, we chose the top 10 genes highly correlated with the eigengene as the module hub genes (Table S5). In each cluster, we calculated pairwise Pearson correlations among hub genes. Gene pairs with correlation coefficients larger than 0.7 were chosen for network visualization. The free software Cytoscape (Smoot et al., 2011) was used to present networks in which genes were depicted as circles and, correlated genes were connected by lines. The "un-weighted force-directed layout" parameter was used to optimize network visualization (Figures 5C and S6). When looking closer at the connectivity network, we noticed that in each cluster, one or two modules, typically associated with a specific biological process, received more inter-modular connections than others, as if becoming the focus of the molecular interactions happening in that given spatial or temporal cluster. We define them as "core module" of cluster. The core modules are: cluster I (M2), cluster II (M13) cluster III (M7 and M21), cluster IV (M37), and cluster V (M1 and M16) (Figures 5B and S6).

5. Small RNA Sequencing Data Analysis 5.1 Sequencing Quality Assessment Reads passed the default purify filtering of Illumina CASAVA pipeline (released version 1.7), and were input to software FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to perform the sequencing quality assessment of small RNA-seq (smRNA-seq) samples. FastQC generated a comprehensive report on the composition and quality of the sequenced reads. The per-base quality plot shows an overview of the range of quality scores across all bases at each position and then provides an easy way to justify the sequencing quality. In our samples the majority Phred scores were above 30 in each sequencing cycle, suggesting high sequencing quality of the smRNA-seq samples (Figure S1F). 5.2 Adapter Trimming and Sequence Alignment The FASTX-Toolkit provides a series of commands to preprocess the sequencing reads (http://hannonlab.cshl.edu/fastx_toolkit/download.html). We used fastx_clipper command to clip the Illumina small RNA 3’ adapter (TGGAATTCTCGGGTGCCAAGG) in 3’– end of reads. We discarded these clipped reads with lengths shorter than 15 nucleotides and larger than 49 nucleotides. The length distribution of the clipped reads was definitely enriched for miRNA length, i.e. 22 nucleotides (Figure 1G). Subsequently, we used software miRanalyzer (released version 0.2) and some PERL scripts together to build the miRNA analysis pipeline (Hackenberg et al., 2011; Hackenberg et al., 2009). First, we used PERL script groupGAreads.pl to collapse the clipped reads to one multi-fasta format file. Second, we retrieved the annotation of miRNA and pre-miRNA from miRBase (released version 18) (Kozomara and Griffiths-Jones, 2011) and used bowtie-build function in software Bowtie to build annotation libraries. Finally, miRanalyzer drove Bowtie to orderly align the collapsed reads to miRNA and pre-miRNA. The reads uniquely aligned to either library or unique annotation entries were taken for downstream analyses. 5.3 Expression of miRNA The mouse miRNA annotations were retrieved from miRBase (released version 18), in which 741 pre-miRNA and 1,157 miRNA were deposited. We found 29 miRNA with identical sequences. They were merged to 13 unique mature miRNA, and then 1,141 miRNA were used as the annotation database. The sequence of miRNA was the consensus sequence in many different experiments (http://www.mirbase.org/). Reads aligned to pre-miRNA but not miRNA were attributed to divergence from the consensus sequence. To better estimate the expression levels of miRNA, reads aligned to pre-miRNA were assigned to miRNA based on its aligned locus, i.e. reads closer to the 3’– end were assigned to the annotated 3’– miRNA, and so on. In this work we focused only on the known miRNA, and reads aligned to other genomic regions were ignored. Moreover, we mainly focused on reliably expressed miRNA, which were defined

Page 33: Cell Reports Resource - Yale School of Medicine

as having reads count ≥ 10 in at least 2 samples (Figure S2C). The analysis of miRNA with fewer aligned reads would increase the possibility of false positive results. After these stringent filters, we obtained 436 reliably expressed miRNA. 5.4 Principal Components Analysis We performed principal component analysis to ensure the high quality of the smRNA-seq data. To remove any possible external effects, the reads per million mapped reads (RPM) values of all reliably expressed miRNA were processed by log2 transformed quantile normalization across all samples by R bioconductor limma (Smyth, 2004). We used R package prcomp to perform principal components analysis (PCA) on the processed miRNA RPM data. The plots between PC1 and PC3 were provided in addition to the plots between PC1 and PC2 (Figures 2B and S2). Furthermore, each sample was differentially colored according to sex (female red, and male blue). The mixing of female and male samples was the evidence of litter sex effect (Figure S2). 5.5 Spatiotemporal DEX miRNA We used R package DESeq to identify differentially expressed (DEX) miRNA (Anders and Huber, 2010). The DEX miRNA was detected from the reliably expressed miRNA. We did not investigate miRNA with low expression because they were prone to be affected by background noise. The reads count per miRNA served as the input for DESeq. Also, we used one male sample and one female sample for each layer in any age. They were treated as biological replicates to improve the reliability of DEX miRNA identification because DESeq was more reliable to compare groups with replicates. When performing the comparison, DESeq firstly gets the mean expression level as a joint estimate for both groups, and then calculates the difference as well as the p-value for the statistical significance of this change. The adjusted p-value was also calculated based on multiple testing with the Benjamini-Hochberg procedure, estimating the false discovery rate (FDR). In this paper, we set one stringent criteria, FDR < 0.01, so as to detect reliable DEX miRNA. The DEX miRNA can be split into two types, spatial DEX miRNA and temporal DEX miRNA. miRNA that were differentially expressed in one layer at any given age were defined as spatial DEX miRNA. Similarly, miRNA that was differentially expressed in one age in any given layer were defined as temporal DEX miRNA. To identify spatial DEX miRNA, we did pairwise comparison across layers in P4, P6, P8, P10, P14 and adult. The total spatial DEX miRNA were the summation of all pairwise-layer DEX miRNA. To identify temporal DEX miRNA, we did pairwise comparison across ages in SgL, L4 and IgL. The total temporal DEX miRNA were the summation of all pairwise-age DEX miRNA (Table S2B). We found that most of spatial DEX miRNAs were temporally regulated, and some temporal DEX miRNAs were spatially regulated. Similar to the classification of DEX genes, the sDEX miRNAs were purified to miRNAs that were differentially expressed between layers but not between ages, the tDEX miRNAs were purified to miRNAs differentially expressed between ages but not between layers. We introduced the third type DEX miRNAs, spatiotemporal DEX (stDEX) miRNAs, to represent miRNAs that differentially expressed between layers as well as between ages. 5.6 miRNA-mRNA Regulation Prediction In general the targets of miRNA in mammalians are predicted by seed sequence of miRNA complementary to the 3’- UTR regions of mRNA (Lewis et al., 2005). It is also recommended that Pearson correlation analysis be used to dig the anti-correlation relationship between miRNA and its targets in the expression level (Hsu et al., 2011; Xiao et al., 2009). In this paper, we combined the sequence and expression information together to investigate the regulation relationship between miRNA and its mRNA protein-coding gene targets (mRNA hereafter) (Figure S7A). In terms of the analysis details, we first retrieved miRNA-mRNA regulation pairs

Page 34: Cell Reports Resource - Yale School of Medicine

from TargetScan database (released version 6.2) (Grimson et al., 2007), in which all regulated pairs were predicted based on the complementary sequence. To reduce false positive results, we restricted the analyses to miRNA and its conserved targets, which were deposited in the file of "Conserved_Site_Context_Scores.txt ". In total, 12,558 mRNAs and 771 miRNAs were included. Second, we used software lasso_mir.R to predict the regulation between mRNA and miRNA expression levels (Lu et al., 2011). This approach allowed us to use the Lasso regression model for the identification of miRNA-mRNA targeting relationships that combines sequence based prediction information, miRNA co-regulation, RISC availability, and mRNA/miRNA abundance data. We used a stringent criteria with rank score ≥ 80 to choose reliable miRNA-mRNA regulation pairs (Table S6). To visualize the regulation relationship between the miRNA and all of its targeted mRNAs, the RPM values of miRNAs and the RPKM values of mRNAs were normalized to the same scale by using normalize function in R package som. The R package lowess were used to build smoothed curves for the normalized values of miRNA and targeted mRNAs, profiling the anti-correlation relationship. To further understand miRNA-mRNA regulation relationship, we analyzed the co-expressed mRNAs. We had already identified 5 clusters, in which mRNAs showed the similar spatiotemporal pattern (see section 4.6). We then tried to determine how these patterns were contributed to by miRNA regulation. We counted how many mRNAs were potentially regulated by at least one miRNA in each cluster. The proportion of targeted mRNAs was calculated separately using TargetScan database and Lasso prediction (Figure S7B). Moreover, we tried to find the cluster-enriched miRNAs in the hope of identifying the main regulators/contributors to the characteristic spatiotemporal pattern of the cluster (Figure S7C). Briefly, we first collected the miRNAs per cluster in which miRNA has at least one targeted mRNA included in the cluster. Next, we performed an enrichment test by comparing the targeted mRNAs in the cluster with its expected targeted mRNAs. The p-values were calculated for each miRNA to quantify statistical significance of enrichment. The adjusted p-value was calculated based on multiple testing with the Benjamini-Hochberg procedure, estimating the false discovery rate (FDR). We used one stringent criterion, FDR < 0.01, to detect reliable cluster-enriched miRNAs (Figure S7C).

6. Validation of Gene/Exon Expression by PCR 6.1. Quantitative Real Time RT-PCR An aliquot of the total RNA that was previously extracted from each brain region was used for secondary validation through real-time PCR analysis. One µg of total RNA was used for cDNA synthesis using SuperScript III First-strand synthesis Supermix (Invitrogen) and subsequently diluted with nuclease-free water. TaqMan Gene Expression Assay was used for each gene of interest along with TaqMan Universal Master Mix (Applied Biosystems). PCR reactions were conducted on an ABI 7900 Sequence Detection System (Applied Biosystems) and the resulting Ct value (cycle number at threshold) was used to calculate the relative amount of mRNA molecules. The Ct value of each target gene was normalized by subtraction of the Ct value from Gapdh to obtain the ΔCt value. The relative gene expression level was shown as 2-ΔCt. 6.2. Exon Specific PCR The same cDNA solution was used for the mouse Dlg2 PCR. For human DLG2, cDNA was synthesized in the same manner for human total RNA of the somatosensory neocortical area at comparative developmental periods to the current study (period3: Early fetal: 10-13 postconceptional weeks (PCW), period6: Late mid-fetal: 19-24 PCW, period7: Late fetal: 24-38 PCW, period 10: Early childhood: 1-6 years old, period13: young adulthood: 20-40 years old, sample details are described elsewhere (Kang et al., 2011). Exon-specific high-melting temperature primers were designed using NCBI/Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) and nucleotide sequences for each primer set are shown in the table below.

Page 35: Cell Reports Resource - Yale School of Medicine

PCR was performed using Phusion High-Fidelity DNA Polymerase (NEB) under the following conditions: activation at 98ºC for 30 sec, followed by 30 cycles at 98°C for 15 seconds, 68ºC for 30 sec, and 72ºC for 15 sec. PCR products were applied to an Agilent Bioanalyzer or Tapestation for quantification of each band that is specific to either inclusion or exclusion of an alternative exon.

Gene Forward primer (5' -> 3') Reverse primer (5' -> 3')

mouse Dlg2 CCCCGGATTAGGTGACGACGGT TCCTGCCTCGTGACAGGTTCA

human DLG2 ATCCCCGGATTAGGTGACGA CCTGCCTTGTAACAGGCTCA SUPPLEMENTAL REFERENCES Cabili, M.N., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A., and Rinn, J.L. (2011). Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915-1927. Hackenberg, M., Sturm, M., Langenberger, D., Falcon-Perez, J.M., and Aransay, A.M. (2009). miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments. Nucleic Acids Res. 37, W68-76. Hsu, S.D., Lin, F.M., Wu, W.Y., Liang, C., Huang, W.C., Chan, W.L., Tsai, W.T., Chen, G.Z., Lee, C.J., Chiu, C.M., et al. (2011). miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res. 39, D163-169. Jiang, L., Schlesinger, F., Davis, C.A., Zhang, Y., Li, R., Salit, M., Gingeras, T.R., and Oliver, B. (2011). Synthetic spike-in standards for RNA-seq experiments. Genome Res 21, 1543-1551. Langfelder, P., and Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559. Lewis, B.P., Burge, C.B., and Bartel, D.P. (2005). Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15-20. Lu, Y., Zhou, Y., Qu, W., Deng, M., and Zhang, C. (2011). A Lasso regression model for the construction of microRNA-target regulatory networks. Bioinformatics 27, 2406-2413. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L., and Wold, B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621-628. Nakagawa, Y., and O'Leary, D.D. (2003). Dynamic patterned expression of orphan nuclear receptor genes RORalpha and RORbeta in developing mouse forebrain. Dev. Neurosci. 25, 234-244. Nieto, M., Monuki, E.S., Tang, H., Imitola, J., Haubst, N., Khoury, S.J., Cunningham, J., Gotz, M., and Walsh, C.A. (2004). Expression of Cux-1 and Cux-2 in the subventricular zone and upper layers II-IV of the cerebral cortex. J. Comp. Neurol. 479, 168-180. Smyth, G.K. (2004). Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3, Article3.

Page 36: Cell Reports Resource - Yale School of Medicine

Xiao, F., Zuo, Z., Cai, G., Kang, S., Gao, X., and Li, T. (2009). miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res. 37, D105-110.