Analytical approaches to RNA profiling data for the identification of genes enriched in specific cells

Joseph D. Dougherty, Eric F. Schmidt, Miho Nakajima, Nathaniel Heintz

Laboratory of Molecular Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, NY. 10065

ABSTRACT

We have recently developed a novel method for the affinity purification of the complete suite of translating mRNA from genetically labeled cell populations. This method permits comprehensive quantitative comparisons of the genes employed by each specific cell type. We provide a detailed description of tools for analysis of data generated with this and related methodologies. An essential question that arises from these data is how to identify those genes that are enriched in each cell type relative to all others. Genes relatively specifically employed by a cell type may contribute to the unique functions of that cell, and thus may become useful targets for development of pharmacological tools for cell specific manipulations. We describe here a novel statistic, the Specificity Index, which can be used for comparative quantitative analysis to identify genes enriched in specific cell populations across a large number of profiles. This measure correctly predicts in situ hybridization patterns for many cell types. We apply this measure to a large survey of CNS cell specific microarray data to identify those genes that are significantly enriched in each population.

INTRODUCTION

The mammalian brain is the most complex organ of the body, containing hundreds of intermingled cell populations. These cells can be classified into types according to their morphology, projections, functions and gene expression profiles. Currently, in vivo analysis of gene expression and translation in particular cell types is often performed with methodologies that are non-parallel and difficult to quantify. Because of this, it remains a challenge to determine the complete set of proteins employed by a given cell type, determine which genes are expressed in or specific to a particular cell type relative to all others, or establish the degree to which a given cell population is unique in the nervous system.

Previously, we have described a method, translating ribosome affinity purification (TRAP, Supplemental Figure 1) for the isolation of translating mRNA from individual, genetically defined, cell types (1,2). In this method, transgenic mice are generated which express a fusion of eGFP and a ribosomal protein under the control of a Bacterial Artificial Chromosome (BAC)(3) for a cell-specific 'driver' gene. A complete translational profile of all ribosome bound mRNAs is then generated from these labeled cells via brain homogenization and affinity purification with anti-eGFP antibodies. Relative quantities of the purified mRNAs are assessed via microarray or related technologies. Thus, for any cell type for which a driver gene can be identified, the methodology permits a comprehensive translational profile to be prepared for all genes. The TRAP protocol is rapid, simple, and requires no specialized equipment. This method permits the deconstruction of the complexity of the nervous system, allowing researchers access to individual cell types within the context of the whole brain, with sensitivity sufficient to study whole animal manipulations such as drug treatments, experimental injuries, or genetic manipulations(2).

The fundamental impetus for the development of the TRAP methodology was to allow the rapid and reproducible cell-specific assessment of RNA translation. Microarray analysis, as traditionally applied to the nervous system, results in data representing the aggregate RNA from all of the cell types present in the tissue(4), proportional to the percentage of those cells present and the relative amount of RNA they produce. This has several implications regarding the interpretation of these data(5). As the observed signal on the array represents an averaging of the levels of the transcript in each of these cell types, RNAs present in all cell types, even at moderate levels, will have fairly high observed values compared to RNAs present at high levels, but in rare cell types. In fact, such mRNAs may even be undetectable because they represent a small fraction of the total tissue RNA(1). Furthermore, as the RNAs from all the cell types are measured in aggregate, any changes in RNA levels measured in the whole tissue are not easily attributed to any particular cell type. Detected perturbations in RNA levels could be due to the death of one cell type, the arrival of another, and/or changes within some or all of the cell types present. Likewise, changes in one cell type could be masked by changes of opposite direction in another cell type. All of these factors clearly complicate the application of microarrays to assess changes in RNA due to experimental manipulations, especially those that may have their primary influence on rare cells. TRAP provides not only the ability to detect changes in rare cell types, but also enhanced ability to interpret the results, as it is known a priori which cells contain the tagged ribosomes. In addition, TRAP has the advantage over other approaches to cell specific RNA profiling as it assesses translation, rather than expression, providing a better correlate of actual protein levels (6).

There are distinctions between microarray experiments from TRAP RNA compared to whole tissue RNA, and these distinctions can have an important impact on the assumptions regarding experimental design, normalization, analysis and interpretation. To aid researchers implementing cell-specific RNA-analysis technologies(1,7-10), we present here a preferred analytical method for TRAP translational profiling data. Importantly, the TRAP methodology provides in vivo quantitative comparative analysis of multiple cell types. Here, we have developed a robust analytical method for identifying and quantifying cell-specific and enriched mRNAs across multiple cell populations, referred to as the Specificity Index. We apply this to a large survey of CNS cell types and provide a simple, perusable archive of plots of this measure across all cell types, for each gene.

MATERIALS AND METHODS

Data

TRAP data were generated as described (1,2), and are available for download from GEO: GSE13379. Etv1 data were not plotted because of known contamination with endothelial or lymphoblast cells(1). Other cell types and drivers are listed in Table 1.

Translating Ribosome Affinity Purification

Additional TRAP experiments on wildtype mouse brains were conducted as described (2). RNA was quantified using the Ribogreen assay, according to manufacturer’s instructions (Invitrogen, Carlsbad, Ca.), and a Modulus single tube fluorometer from Turner Biosystems (Sunnyvale, Ca) with the blue optical kit.

R Code

The scripts used for calculation of the Specificity Index are available from the Heintz Laboratory Website (address:___). SI for a given gene (n), in a given cell type (#1), compared to cell types, k = 2...m, is given by the formula...

...where IP1,n is the expression value for gene n in cell type one, and rank(IP1,n / IPk,n) is the position of gene n in a descending-ordered list of 'fold-change' (IP1 / IPk) values for all genes. Note that SI is only calculated for those genes in cell type k with an absolute expression above 50 in IPk, and with Log2(IPk/Totalk) values above a threshold...

...where 1..j is a set of negative control genes known not to be expressed in this cell type, IPk,p is the expression value for gene p in the immunoprecipitate from cell type k. Totalk,p is the expression value for gene p in the total tissue RNA from the tissue cell type k was isolated from. Any gene for which log2(IPk/Totalk) < Thresholdk is excluded, with the caveat that Thresholdk was not allowed to exceed zero (Supplemental Materials and Supplemental Figure 5).
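The filtering rule described above can be sketched as follows. This is a minimal illustration and not the authors' R script: the function name `filter_probesets` and its arguments are hypothetical, expression values are assumed to be on a linear scale, and the threshold is taken as an input because its derivation from the negative control genes is not reproduced here.

```python
import math


def filter_probesets(ip, total, threshold, min_expression=50.0):
    """Return the probeset IDs that pass both background filters.

    ip, total: dicts mapping probeset ID -> linear-scale expression value
               (hypothetical input layout, chosen for illustration).
    threshold: log2(IP/Total) cutoff derived from negative controls; it is
               capped at zero, as described in the text.
    """
    cutoff = min(threshold, 0.0)  # the threshold may not exceed zero
    kept = []
    for probe, ip_value in ip.items():
        total_value = total.get(probe)
        if total_value is None or ip_value <= 0 or total_value <= 0:
            continue  # a log ratio cannot be formed
        if ip_value <= min_expression:
            continue  # absolute expression is not above 50
        if math.log2(ip_value / total_value) < cutoff:
            continue  # log2(IP/Total) falls below the threshold
        kept.append(probe)
    return kept
```

Capping the cutoff at zero means that a probeset with equal signal in the IP and the Total is never excluded by the ratio filter alone.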

Scoring Allen Brain Atlas (ABA) in situ hybridizations

For comparative analysis of TRAP data, we developed a blinded, unbiased scoring method (the SENU method). For the first application, for each of four cell types, fifty probesets were selected at random from the top five hundred most enriched genes (IP/Total). For each cell-type, an additional fifty probesets were selected from the array at random, irrespective of IP/Total value. For each cell type, the fifty random and the fifty cell-enriched probesets were scrambled together and presented to three blinded judges, previously trained in the heuristics below until inter-rater reliability was above 60% on training sets.
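The assembly of a blinded list for the first application could look like the sketch below. The function name, argument names and fixed seed are assumptions made for illustration. The text does not say whether a probeset could be drawn into both groups, so this sketch does not prevent it.

```python
import random


def build_blinded_list(ranked_probesets, all_probesets,
                       top_n=500, n_pick=50, seed=0):
    """Draw enriched and random probesets and scramble them together.

    ranked_probesets: probeset IDs sorted by descending IP/Total.
    all_probesets:    every probeset ID on the array.
    Returns (presented, key): the scrambled IDs shown to the judges, and
    the (ID, source) pairs that are withheld until scoring is complete.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    enriched = rng.sample(ranked_probesets[:top_n], n_pick)
    background = rng.sample(all_probesets, n_pick)
    key = [(p, "enriched") for p in enriched]
    key += [(p, "random") for p in background]
    rng.shuffle(key)  # scramble the two groups together
    presented = [p for p, _ in key]
    return presented, key
```

Keeping the key separate from the presented list is what keeps the judges blind to the source of each probeset.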

Judges searched for each probeset in the ABA using the gene symbol and name. If no gene symbol or synonym could be found, the probeset was scored as absent. For probesets present in the ABA, judges first assessed overall quality of the in situ hybridization (ISH). If the ISH had no detectable signal or was of low quality for the given gene, the gene was scored as “U” (Unscorable).

For probesets not scored U, judges evaluated potential expression in the four cell types. For each cell type the judges could assign one of three scores, “S” (Specific for cell-type within region), “E” (Expressed in cell type), and “N” (clearly Not expressed). Detailed heuristics for each are:

S: In situ must be of very good quality and show clear signal in cell type of interest that is at least 3 color levels with the Allen ‘expression viewer’ above any other cells in the same region.

E: In situ shows expression in cell type of interest but overall signal is weak or there is clear signal in surrounding cells as well. In situ may be moderate or good quality.

N: In situ must be of very good quality and clearly have 1) no signal in cell-type of interest and 2) very good signal somewhere else in tissue.

As cell type is difficult to assign from colorimetric ISH alone, for each cell type, the pattern assayed was:

Purkinje Cells: ISH pattern in cerebellum with evenly spaced large cells in the PCL.

Motor Neurons: ISH pattern in brain stem in large cells at the approximate locations of the 3rd, 5th and 7th motor nuclei.

Layer V Cortical Neurons: A laminar ISH pattern in cortex at approximately the position of layer 5, with, at most, labeling in one other layer.

Oligodendrocytes: Strong specific ISH pattern in the corpus callosum. Scattered labeling in cortex also permitted. Note - color criteria for ‘S’ had to be relaxed as oligodendrocytes are often too small to be recognized as cells by ABA expression viewer.

For the second round of SENU analysis, two additional cell types were added, Granule cells and Cortical Interneurons. For each of the six cell types, one hundred and fifty ISH were scored, fifty each from the top two hundred and fifty of IP/Total, SI, and random lists. If multiple ISH sets were available for the same gene, only the most recent sagittal ISH set was used. Heuristics for an ISH pattern consistent with expression in granule cells or interneurons are:

Granule cells: Clear expression exclusively in granule cell layer of cerebellum, in at least 50% of the cells.

Cortical interneurons: Scattered, non-laminar expression in the cortex, with a cell number in the range between two reference ISH patterns, Cort, and the GABA transporter Slc32a1.

For the third round of SENU analysis, all remaining cell types were evaluated (glial cells were only scored in cerebellum). For each cell type, all genes with SI p-values < .00001 were scrambled with an equal number of randomly selected genes and up to forty genes per cell line were scored blindly as above, using the driver gene ISH as a reference pattern. It is worth noting, however, that several cell types had difficult to interpret ISH patterns (Cck), lacked appropriate signal even for the driver (Grp), represented small and scattered cells (Olig2, Aldh1L1), or were found in very cell dense regions (Neurod1). For many of these cell types, inter-rater reliability was correspondingly lower.

Immunofluorescence

Adult mice were perfused transcardially with PBS followed by 4% paraformaldehyde in PBS, cryoprotected in 30% sucrose PBS, frozen, and sliced to 40 microns on a cryostat. Floating sections were blocked with 5% normal donkey serum in 0.25% Triton X100 PBS and incubated overnight with chicken anti-GFP antibody (Abcam, Cambridge, Ma), and/or Grm1 (AB1551, Chemicon, Temecula, Ca) and Calb2 (6B3, Swant, Bellinzona, Switzerland) antibodies, then incubated ninety minutes with appropriate Alexa-conjugated secondary antibodies (Invitrogen, Carlsbad, Ca), and counterstained with DAPI. Images were acquired with a Zeiss LSM 510 inverted confocal microscope.

RESULTS

Dataset

The microarray data employed for these studies are from a published survey of CNS cell types generated with the TRAP methodology (1). This dataset contains samples representing a variety of pure and mixed cell types from different structures of the mouse brain, as well as samples from the corresponding whole tissue (Table 1). The purified samples are referred to as immunoprecipitates (IP). In parallel, RNA which did not bind to the antibody was also harvested to provide an assessment of the gene expression of the tissue as a whole. These samples are referred to as unbound RNA. Microarray analysis, as traditionally applied to the nervous system, results in samples that are most similar to unbound samples. As the immunoprecipitation does not lead to significant depletion of cell-specific RNAs, here we use the unbound samples as a measure for the total tissue homogenate RNA (referred to as Total).

IPvTotal plots

As an initial assessment of TRAP data, described in detail below, we generated scatterplots with the log signal intensity for the IP on the X axis and the Total on the Y axis (IPvTotal plot) for each cell population. Systematic examination of these plots revealed they could be used for quick visual assessment of the quality of the TRAP experiment, particularly for the level of non-specific background (Figure 1, Supplemental Figure 2). We first applied these plots for assessment of different metrics of normalization (Supplemental Figure 3, and Materials). It also became apparent that these plots may indicate the rarity and/or uniqueness of the cell type within its tissue (Supplemental Figure 4 and Materials). Finally, we assessed IP/Total as a measure to identify those RNAs that may be specific or enriched in a given cell type.

Figure 1a shows an example of this plot for Purkinje cells. A list of genes known from the literature to be glial specific (and thus not in Purkinje neurons) has been marked in red, and a variety of genes determined from in situ hybridization databases(11,12) to be highly expressed in Purkinje cell layer, including the driver for this mouse line, Pcp2, have been marked in blue (Supplemental Table 1). From this plot it is clear that RNAs known to be enriched in Purkinje cells have high ratios of IP/Total. Genes that are known not to be expressed in Purkinje cells, such as those that are specific to glia, are highly enriched in the Total RNA. They have low IP/Total ratios. Based on the locations of these positive and negative controls, we have developed a heuristic for the interpretation of IPvTotal plots, illustrated in Figure 1b. Essentially, from the top left corner of the plot to the bottom right, one has increasing confidence, first that the RNA derives from the targeted cell type, and then that it is highly enriched in that type. Note that probesets with low signal (bottom left corner) should be considered with caution, as they tend to have higher variability(13).

IP/Total for identification of enriched genes

As previously shown, if an RNA is specifically translated in the targeted cell type within a tissue, it should have a very high IP/Total ratio (1). As an independent, qualitative measure of the expression of specific mRNAs within a cell of interest, we compared our data to in situ hybridization (ISH) data from the Allen Brain Atlas (ABA)(11). Since it is often difficult to establish cell identity by ISH data alone, we chose for this first comparative study four cell types that are relatively simple to identify by size and localization in colorimetric ISH (brainstem motor neurons, cerebellar Purkinje cells, layer 5 cortical pyramidal cells, oligodendrocytes). For each cell type, a list of fifty ‘high IP/Total’ probesets was selected at random from the top five hundred probesets, as ranked by IP/Total. Many of these mRNAs are only moderately enriched: minimum IP/Total ratios range from around two (motor neurons, layer V cortical neurons) to around four (Purkinje cells). For comparison, an additional 50 probesets were selected at random from the array, and scrambled with the list above. These lists were then presented to three blinded judges and the ISH for all genes were scored as specific (S), expressed (E), clearly not expressed (N), or unscorable (U) in the cell type of interest. Figure 2a shows examples of S, E, N, and U scores for brain stem motor neurons. After excluding the ISH scored U, probesets for genes with high IP/Total were highly enriched by ISH in the cell type of interest (S), and less likely to appear not expressed (N) than the random list of fifty genes (Chi square, p < .0005 for each cell type). Typically, probesets with high IP/Total ratios were three to four times more likely to be scored S than random genes (Figure 2b). Although this analysis demonstrated that TRAP analysis results are concordant with the easily scored ISH data, the level of enrichment varied substantially between the cell types assessed.
Given this fact, and the many factors that limit the utility of ISH data for detection of cell specific changes in gene expression in complex tissues, we sought to develop an independent method for the quantitative measurement of the specificity of expression of any gene in a given cell type or condition relative to a large number of other cell types using comparative analysis of TRAP data from a variety of specific CNS cell types.
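One plausible form of the chi-square comparison of score distributions reported above is a Pearson test on a 2 x 3 table of S/E/N counts (enriched list versus random list, with U excluded). The text does not state the exact table layout used, so the sketch below is an assumption, and the function name `chi_square_2x3` is hypothetical.

```python
import math


def chi_square_2x3(enriched_counts, random_counts):
    """Pearson chi-square test on a 2 x 3 table of (S, E, N) counts.

    Returns (statistic, p_value). A 2 x 3 table has 2 degrees of freedom,
    for which the chi-square survival function is exactly exp(-x / 2).
    Assumes every score category was used at least once, so that no
    expected count is zero.
    """
    rows = [list(enriched_counts), list(random_counts)]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2.0)
```

Any counts passed to this function in an example are placeholders, not values from the study.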

The Specificity Index to identify cell-specific and enriched genes

As described above, the IP/Total metric can be used as a simple method to suggest cell-specific and enriched genes. However, there are three drawbacks to the method. First, there are cell types where logically it would be ineffective, such as granule cells of cerebellum or medium spiny neurons of striatum. Over 90% of the cells in the cerebellum are granule cells(14). As such, a comparison of a granule cell IP to Total cerebellum will yield little enrichment of granule cell genes, as shown in Supplemental Figure 4b. In contrast, comparison of the granule cell IP data to the IP data obtained from Purkinje cells clearly reveals a high enrichment of the granule cell driver gene, Neurod1 (Supplemental Figure 4c). This demonstrates that the granule cell IP was robust, and illustrates the value of comparative analysis of TRAP derived from specific cell types. Likewise, comparison of the Drd1a+ or Drd2+ medium spiny neurons to Total striatum, which is made primarily of medium spiny neurons, will identify very few striatally enriched genes(2). The second drawback is that a comparison of IP to Total will only yield information about enrichment relative to one particular dissected structure, and not the rest of the brain. To accurately determine the suite of cell-specific genes, one needs to make multiple comparisons across all available cell types and structures. Finally, IP/Total alone does not give a sense of how likely a particular ratio is to appear by chance, and at what threshold a gene should be considered enriched. Indeed, from the four cell types scored above, there were clear differences in fraction of specific genes found in the top five hundred IP/Total (Figure 2b).

To overcome these problems, we developed a generic algorithm, the Specificity Index (SI), to assess the specificity of a given RNA in one sample relative to all other samples analyzed. For each cell type, the SI is calculated in three steps, as illustrated in Figure 3a-e. First, following GCRMA normalization within replicates and global normalization across samples, the IP was compared to the Total to filter out the non-specific background by setting a simple threshold based on negative controls (Supplemental Figure 5, and Materials). For those cell types known to have significant background contamination, this threshold was left at one, so as to not filter too many probesets and create false negatives. Probesets with low signal were also removed, following standard practice with microarray data. Second, for the remaining probesets, this filtered IP was iteratively compared to each other (unfiltered) sample in the dataset and a ratio was calculated for each probeset. To prevent extreme outliers from skewing the subsequent analysis, and to make the analysis more robust for difficult to normalize datasets, the probesets were ranked from highest to lowest ratio within each comparison. Third, for each probeset, its ranks across all comparisons are averaged to give the SI. Thus, the SI is a measure of the specificity of expression for each probeset in a given cell type relative to all other cell types included in the analysis: how highly ranked on a gene list is this probeset, on average, in this cell type compared to all others.
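The second and third steps above (ratio, rank, average) can be sketched in a few lines. This is an illustration of the description in the text, not the authors' R code. The function name and the dict-based data layout are hypothetical, and ties are broken by input order, a detail the text does not specify.

```python
def specificity_index(target_ip, other_samples, kept_probesets):
    """Average rank of each probeset's fold change across all comparisons.

    target_ip:      dict probeset -> expression in the cell type of interest.
    other_samples:  non-empty list of dicts (unfiltered) for the other
                    samples, with positive values for every kept probeset.
    kept_probesets: probesets that survived the background filter.
    Rank 1 is the largest fold change, so a LOWER value means MORE specific.
    """
    rank_sums = {p: 0.0 for p in kept_probesets}
    for sample in other_samples:
        # Step 2: fold change of the target IP over this comparison sample.
        ratios = {p: target_ip[p] / sample[p] for p in kept_probesets}
        # Rank from highest to lowest ratio within this comparison.
        ordered = sorted(kept_probesets, key=lambda p: ratios[p], reverse=True)
        for rank, probe in enumerate(ordered, start=1):
            rank_sums[probe] += rank
    # Step 3: average the ranks across all comparisons.
    n_comparisons = len(other_samples)
    return {p: total / n_comparisons for p, total in rank_sums.items()}
```

Because only ranks enter the average, a single extreme ratio in one comparison cannot dominate the result, which is the robustness property the text describes.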

Validation of Specificity Index and comparison to IP/Total

To determine if the SI succeeds in selecting cell specific genes in those cases where IP/Total comparisons fail, we first examined the expression of genes predicted by each method to be translated in granule cells. Figure 4a shows a comparison of eGFP immunohistochemistry for GENSAT BAC transgenics (15) for two genes selected by IP/Total and two selected for a high SI. The genes selected by SI clearly have an expression pattern that is more consistent with highly enriched expression in cerebellar granule cells: labeling of many cell bodies in the cerebellar granule cell layer, with fibers filling the molecular layer, where granule cell axons project.

To determine how effectively the SI performs in general compared to IP/Total in selecting cell specific and enriched genes, we repeated our SENU analysis of ABA ISH patterns for one hundred and fifty probesets for each of six cell types (Figure 4b). Fifty probesets each were chosen randomly from the top two hundred and fifty probesets of SI and IP/Total, as well as fifty random probesets from the array. These were scrambled and scored by three blinded judges, as above. As before, Chi square tests revealed TRAP data were performing significantly better than chance at predicting specific gene expression (p<.01 to p<10^-99, across either metric in each cell type). As expected, SI outperforms IP/Total for those cases where the TRAPed cell type makes up a significant fraction of the total, such as Neurod1 positive granule cells. Quite surprisingly, SI also outperformed IP/Total with Purkinje cells (Cb.Pcp2), cortical oligodendrocytes (Ctx.Cmtm5) and cortical interneurons (Ctx.Cort). In the worst case, that of layer V cortical projection neurons (Ctx.Glt25d2), IP/Total and SI both yielded approximately 50% more specific patterns than expected by chance. There were no cell types where IP/Total performed significantly better than SI. Thus we determined that the SI is a useful and robust metric for identifying cell specific and enriched genes.

The Specificity Index as a Statistical Measure

The SI is influenced by both the variations in the number of transcripts that are enriched in each cell type being analyzed, and the purity and recovery of TRAP mRNA collected for each cell type. The range of the rankings is dependent on the number of probesets in the comparison, and that number depends on the number of genes expressed and the level of filtering in each particular cell type. Consequently, raw SI values are not directly comparable across cell types. In addition, the SI alone does not provide a sense of how likely a given rank is to occur by chance. Therefore, for each SI we calculate a p-value via permutation testing as illustrated in Supplemental Figure 6a: for each IP, the filtered expression values are randomly shuffled many times and SIs are calculated for all probesets, to determine the frequency of a particular SI value appearing. This creates a simulated probability distribution. The probability of any given SI from the true distribution can be assigned from the simulated distribution. Thus one can derive a list of genes that are significantly specific to, or enriched in, any particular cell type, with a known probability (Supplemental Figure 6b). We note that for each cell type, the number of genes that reach a given statistical threshold is different. However, since these probabilities are comparable across cell types, they can be plotted to permit assessment of the specificity for a given probeset across all cell types analyzed, as illustrated for the granule cell driver Neurod1 in Figure 4c.
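The permutation procedure described above can be sketched as follows, under several assumptions that are not from the text: the function names are hypothetical, the SI is taken as an average rank in which lower values are more specific, the simulated SIs from all shuffles are pooled into one distribution, and one is added to the numerator and denominator so that no p-value is exactly zero.

```python
import bisect
import random


def average_ranks(target, others, probes):
    """SI as the mean descending rank of target/other ratios (rank 1 = top)."""
    sums = dict.fromkeys(probes, 0.0)
    for sample in others:
        ordered = sorted(probes, key=lambda p: target[p] / sample[p],
                         reverse=True)
        for rank, p in enumerate(ordered, start=1):
            sums[p] += rank
    return {p: s / len(others) for p, s in sums.items()}


def permutation_p_values(target, others, probes, n_perm=200, seed=0):
    """Estimate, per probeset, how often a shuffled IP gives an SI this low."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    observed = average_ranks(target, others, probes)
    values = [target[p] for p in probes]
    null = []
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign the IP values to probesets at random
        shuffled = dict(zip(probes, values))
        null.extend(average_ranks(shuffled, others, probes).values())
    null.sort()
    p_values = {}
    for p, si in observed.items():
        n_as_low = bisect.bisect_right(null, si)  # simulated SIs <= observed
        p_values[p] = (n_as_low + 1) / (len(null) + 1)
    return p_values
```

The smallest p-value this sketch can report is bounded by the number of simulated SIs, so resolving values near .00001 would need far more shuffles than the default used here.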

To determine whether the SI is an accurate relative measure of the specificity of expression of each gene relative to all others for the cell types analyzed, we next performed a post-hoc analysis of our judges’ ratings in the SENU analysis pooled across all six cell types from Figure 4b. For p-values of < .00001, over 75% of scorable ISH were scored ‘Specific,’ compared to approximately 15% of those with p > .1 (Figure 4d). Even with extensive training in detailed heuristics and blind scoring there is substantial subjectivity in the interpretation of ISH, and only 55% of the nine hundred ISH had identical scores from all three judges (for 95%, however, at least two judges agreed on the score). Of these ISH on which all three judges agreed, 100% of the genes with p < .00001 were scored as specific (not shown). This analysis provides a potential heuristic for the interpretation of various SI p-values for a gene across cell types: while any p-value < .1 suggests some enrichment, as p-values continue to decrease, enrichment increases until the majority, if not all, of the genes at extremely low p-values are highly specific (Figure 4d).

Finally, to generalize this finding to all remaining cell types, we examined the ISH pattern for all cell types for those genes with p<.00001. This represented a challenge as most of these cell types cannot be unambiguously identified by position information alone. For each cell type, the p<.00001 genes were scrambled with an equal number of randomly selected genes and up to forty genes per cell type were scored blindly by three judges. For this analysis, genes were scored as specific if their ISH pattern matched that of the driver for the TRAP line. Across nearly all cell types most of these p<.00001 genes had patterns consistent with specific expression in the correct cell types (Figure 5a), representing in all cases a highly significant enrichment relative to randomly selected genes (Chi square p<.0005 to p<10^-21).

However, for three cell types, SI did not perform well at predicting ISH patterns, and we will discuss these briefly because they are each illustrative of an important point regarding this analysis (Figure 5b). First, for the ISH patterns for genes from the line Etv1, which expresses the eGFP-L10a transgene in layer 5b projection neurons, over 70% of the p<.00001 genes were specific to blood vessels. This strongly suggests that the bacTRAP construct is also expressed in endothelial cells or some component of the blood in this line. This illustrates the point that careful anatomical characterization of TRAP lines is essential. Minor contamination by rare cell types will be very apparent following SI analysis and confirmation with ISH databases.

Second, for the Cck TRAP line, which is expressed broadly in multiple layers of cortex and in both pyramidal cells and interneurons, we observed no significant enrichment for specific ISH patterns (Chi square, p=.3). We believe this reflects the fact that this line includes so many neuron types that nearly any gene expressed in neurons will be present in the IP. This illustrates the difficulty in assessing ISH results for a TRAP driver that is broadly expressed.

Third, in the Cb.Grp data, representing a mix of unipolar brush cells (UBC) and Bergmann glia, SI identified genes did not show enrichment for specific ISH patterns (Chi Square p =.14). As Bergmann glia are represented in both the Cb.Aldh1L1 and Cb.Sept4 datasets, the most specific genes for this data should come from the UBCs, a small excitatory interneuron found primarily in the granule cell layer of the posterior lobules of the cerebellum(16). However, since even the driver, Grp, did not have a specific ISH pattern, we were suspicious that ISH may have reduced sensitivity for detecting messages in this scattered population of small neurons in the cell dense cerebellar granule layer. Some SI identified genes, such as Nmb, did show a scattered precipitate in lobule X of the cerebellum, but the particles were too small to be clearly identifiable as cells by our judges. To provide an independent dataset, we examined the GENSAT database(15) for the SI identified genes in the 5 lines for which adult data were available online (Grp, Nmb, Ntf3, Otx2, and Eomes). Three of these five lines clearly expressed GFP in cells with the distinct morphology and position of UBCs in the online database. We further confirmed, in the Nmb line, that these were indeed UBCs by confocal triple immunofluorescence for GFP and the UBC markers Calb2 and Grm1 (Figure 5c)(17).

In general, we note that, despite the concordance between the TRAP data and ISH results for genes whose expression is easily detected by ISH, a significant fraction of the genes determined to be enriched in a specific cell type by TRAP analysis could not be scored from the ISH data (Figure 2b). RT-PCR on genes without detectable signal (U) by ISH reveals that in these cases the RNA is indeed present in the brain, and enriched in the TRAP samples from the cell types of interest (1). This is not surprising, since successful ISH is dependent on many factors, including expression level of the gene, hybridization kinetics of the probe, and availability of unique sequence for probe design. We conclude that negative results on ISH should be interpreted with caution. Of course, for any specific case, differences in ISH patterns and TRAP measurements could also indicate there is a difference between transcription and translation of a given gene.

Finally, the mixed oligodendrocyte data (Olig2 line) only showed 43% specific ISH patterns. While this is still significantly better (Chi Square, p<10^-5) than the 12% identified by chance, we were curious if this reflected the fact that of the twenty six cell types included in the SI calculation, at least 4 contained information primarily from oligodendrocytes (ctx.olig2, cb.olig2, ctx.cmtm5, cb.cmtm5). Thus, we repeated the SI analysis on our data set, but excluded three of these four samples collected from oligodendrocytes so only one, unique oligodendrocyte sample remained. As shown in Figure 6, this resulted in two major effects: first, as the oligodendrocyte data became more unique in the analysis, there were now twice as many genes with p<.00001 by SI; second, when the ISH for these genes were scored blindly as above, 70% of them showed specific ISH patterns. This demonstrates the fact that SI is a relative measure that is influenced by the composition of the entire dataset, and that one should carefully consider which datasets to include for the specific experimental question being addressed.

An archive of SI for all genes

To provide a resource to permit researchers to examine the specificity of the translation of any gene across all cell types included in this analysis, we have created SI plots for all genes on the array using updated chip definition files (18) that provide one measure per ENTREZ gene ID (19) (Supplemental Figure 7). Figure 7 illustrates Specificity Index p-values, as well as IP/Total values for 6 representative genes, across the 24 cell populations from the original studies. This includes examples of genes known to be enriched in a few cell types (Slc18a3, the vesicular acetylcholine transporter, in cholinergic cells; Dlx1 in interneurons); metabolic genes expressed ubiquitously, though not equally, across the brain (Actb, Rpl8); and two genes implicated in autism, Nrxn2 and Nrxn1, which show broad but variable expression, or enrichment in a limited set of cortical neurons and granule cells, respectively. SI plots for all genes are available as a downloadable archive from: (Heintz lab website). Simply browsing through images can highlight remarkable biology. For example, the GalNAc transferase family is a group of Golgi apparatus enzymes that catalyze the addition of oligosaccharides to protein receptors destined for the cell surface. Supplemental Figure 8 shows plots for six members of this family (Galnt2, 3, 4, 6, 14, and L2), five of which show remarkable cellular specificity to either oligodendrocyte progenitors, astroglia, mature oligodendrocytes, layer 5 cortical projection neurons, or granule cells. As many of these enzymes have affinities for distinct donors and acceptors (20), cell-specific expression of these proteins may result in distinct cell surface moieties.


DISCUSSION

We present here a set of analytical procedures that have been developed for analysis of TRAP translational profiling data. These approaches are specifically designed to accommodate features of TRAP translational profiling data that arise from the cell specific nature of the TRAP data, and to provide a robust framework for comparative analysis of data obtained from large numbers of cell types. In particular, we report the development of a Specificity Index to provide a relative and quantitative measure for the specificity of expression of all genes across the cell types being studied. In general, results of this analysis are concordant with easily evaluated ISH data from the Allen Brain Atlas (11). However, our data also indicate that TRAP translational profiling can reveal cell enriched expression for a large number of genes and cell types that are not easily assessed by ISH.

The Specificity Index

The impetus for the development of the Specificity Index was the observation that there are dramatic differences in mRNA profiles between different cell types, which made evident the need for a new method for comparative and quantitative analysis of TRAP data. Although standard tools for identifying statistically significant differences between samples can also be applied to TRAP data (21-25), comparisons of widely divergent cell types, such as astrocytes and Purkinje neurons, result in greater than 60% of probesets reaching statistical significance (p<.05) using the empirical Bayes Limma (24) module of Bioconductor (25) with FDR multiple testing correction. This number of statistically significant changes demonstrates the limited utility of such methods for selecting small numbers of targets for biological follow up studies from such dramatically different cell types. The Specificity Index we have described here is robust, and uses a permutation-based statistical approach to compensate for any irregularities in the distributions of the data, allowing direct comparison of p-values across samples with quite varied distributions. As shown above, this measure provides results consistent with published data, and with independent assays of gene expression provided in the Allen Brain Atlas. However, the Specificity Index is clearly dependent on the number and nature of the samples included in the analysis (Figure 6). Consequently, the design of the Specificity Index analysis should be tailored to the biological question at hand. Because we anticipate that many researchers will be interested primarily in the output of this analysis, rather than its implementation, we also provide an archive of Specificity Index histograms for all genes across all cell types included in this study to permit in silico interrogation of cell-specific and enriched mRNA translation.
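The calculation summarized in the Figure 3 legend (rank probesets by IP/IP fold change in each pairwise comparison, average the ranks, then assign a p-value by permutation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the input layout, and in particular the null model (ranks shuffled independently within each comparison) are assumptions made here, since the permutation scheme itself is detailed only in Supplemental Figure 6a.

```python
import numpy as np

def rank_fold_changes(ip, target):
    """Rank probesets by fold change of the target cell type over each other type.

    ip     : (n_probesets, n_cell_types) array of normalized, filtered IP signal
             on a linear scale (hypothetical input layout).
    target : column index of the cell type of interest.
    Returns an (n_probesets, n_cell_types - 1) array; rank 1 is the largest
    fold change within each comparison.
    """
    ip = np.asarray(ip, dtype=float)
    others = [j for j in range(ip.shape[1]) if j != target]
    fold = ip[:, [target]] / ip[:, others]
    # argsort of argsort converts values to 0-based ranks; the sign flip makes
    # the largest fold change rank 1.
    order = np.argsort(-fold, axis=0, kind="stable")
    return np.argsort(order, axis=0, kind="stable") + 1

def specificity_index(ranks):
    """SI = average rank across all comparisons (lower means more specific)."""
    return ranks.mean(axis=1)

def permutation_pvalues(ranks, n_perm=100, seed=0):
    """P-values under an ASSUMED null: ranks shuffled within each comparison."""
    rng = np.random.default_rng(seed)
    observed = specificity_index(ranks)
    null = np.sort(np.concatenate([
        np.column_stack([rng.permutation(col) for col in ranks.T]).mean(axis=1)
        for _ in range(n_perm)
    ]))
    # Number of null SI values at or below each observed SI.
    at_or_below = np.searchsorted(null, observed, side="right")
    return (at_or_below + 1) / (null.size + 1)
```

Because every rank is computed relative to the other columns of `ip`, adding or removing cell types changes both the SI and its null distribution, which is the dataset dependence illustrated in Figure 6.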

TRAP compared to In Situ Hybridization, Immunohistochemistry, and BAC Transgenesis for assessment of gene expression

The TRAP methodology is complementary to other methods of examining gene expression and protein translation in the CNS, though it has several distinct advantages. ISH and immunohistochemistry both require the laborious development and optimization of gene specific reagents, and depending on the size, location, expression level, and subcellular localization of the target, may not provide sufficient information to unambiguously identify the cell type labeled. BAC Transgenesis with an eGFP transgene provides comprehensive information about morphology and projections of the labeled cells, as well as a living reagent for further study (3), but requires a substantial time investment. Of the four methods, only TRAP can provide, in a single experiment, interrogation of the entire translated genome. TRAP data are quantitative, with the highest potential sensitivity and dynamic range. However, for the examination of a single gene across the entire CNS, ISH, immunohistochemistry, or BAC transgenesis will remain the method of choice.

Conclusions

A variety of different tools have been generated for the analysis of microarray data (13,21-23). For those interested in developing or applying other array analysis methods to TRAP data, it is advisable to first test those methods on our most robust datasets, such as the Purkinje cell data, where a variety of positive and negative control genes can be used as standards (Supplemental Table 1), and the cells can be more easily identified by ISH. Furthermore, for most experiments, there are two important considerations for data analysis: first, quantile normalization should only be applied across samples that would be expected to have similar mRNA distributions (from the same cell type or region, see Supplemental Figure 3, and Materials); second, comparisons of IPvTotal data can be used to remove non-specific background prior to IPvIP comparisons, regardless of the source of this background (Supplemental Figures 2, 5, 9 and Materials). Ongoing improvements in the molecular methodology are likely to remove most of the non-specific background deriving from interaction of purification reagents with untagged ribosomes seen in this first survey, in many cases making the filtering steps for calculating SI unnecessary in the future, though this filtering approach may still be applicable for dealing with low level expression of the transgene in cell types of secondary interest to the study. To aid in this normalization, lists of recommended negative control probesets are included in Supplemental Table 2 (although any genes known not to be expressed in the cell type of interest may be used). Standard statistical methods (21-25) remain essential for detecting more subtle differences, such as the changes within a single cell type following exposure of the animal to a drug (2).

Given the improved sensitivity and anatomic specificity obtained using TRAP and related methodologies, we anticipate wide application of these methodologies for gene expression studies in the mouse nervous system. The methods outlined here provide analytical tools for those researchers employing these methodologies, as well as those interested in mining published TRAP datasets for cell specific and enriched mRNAs. Continued experimental and analytical developments will enhance the value of the methods and data provided here. Nonetheless, this methodology already provides a systematic approach to identifying the expressed genes that determine the unique properties of specific neural cell types, and candidate genes to serve as markers and pharmacological targets.

FUNDING

This work was funded by the Howard Hughes Medical Institute, the Adelson Program in Neural Rehabilitation and Repair, the Simons Foundation, the Conte Center (NIH/NIMH 5P50MH074866 P2), and the Croll Charitable Trust.

ACKNOWLEDGEMENTS

We would like to thank P. Greengard, J.P. Doyle, J.C. Earnhart, S. Gayawali, M. Heiman, S. Kriaucionis and members of the Geschwind and Heintz Laboratories for discussions and advice, and R. Shah for blinded scoring of ISH data. We would also like to thank The Rockefeller University Bioimaging facility, and Genomics Resource Center.

REFERENCES

1. Doyle, J.P., Dougherty, J.D., Heiman, M., Schmidt, E.F., Stevens, T.R., Ma, G., Bupp, S., Shrestha, P., Shah, R.D., Doughty, M.L. et al. (2008) Application of a translational profiling approach for the comparative analysis of CNS cell types. Cell, 135, 749-762.

2. Heiman, M., Schaefer, A., Gong, S., Peterson, J.D., Day, M., Ramsey, K.E., Suarez-Farinas, M., Schwarz, C., Stephan, D.A., Surmeier, D.J. et al. (2008) A translational profiling approach for the molecular characterization of CNS cell types. Cell, 135, 738-748.

3. Yang, X.W., Model, P. and Heintz, N. (1997) Homologous recombination based modification in Escherichia coli and germline transmission in transgenic mice of a bacterial artificial chromosome. Nat Biotechnol, 15, 859-865.

4. Sandberg, R., Yasuda, R., Pankratz, D.G., Carter, T.A., Del Rio, J.A., Wodicka, L., Mayford, M., Lockhart, D.J. and Barlow, C. (2000) Regional and strain-specific gene expression mapping in the adult mouse brain. Proc Natl Acad Sci U S A, 97, 11038-11043.

5. Geschwind, D.H. (2000) Mice, microarrays, and the genetic diversity of the brain. Proc Natl Acad Sci U S A, 97, 10676-10678.

6. Ingolia, N.T., Ghaemmaghami, S., Newman, J.R. and Weissman, J.S. (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science, 324, 218-223.

7. Arlotta, P., Molyneaux, B.J., Chen, J., Inoue, J., Kominami, R. and Macklis, J.D. (2005) Neuronal subtype-specific genes that control corticospinal motor neuron development in vivo. Neuron, 45, 207-221.

8. Cahoy, J.D., Emery, B., Kaushal, A., Foo, L.C., Zamanian, J.L., Christopherson, K.S., Xing, Y., Lubischer, J.L., Krieg, P.A., Krupenko, S.A. et al. (2008) A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J Neurosci, 28, 264-278.

9. Sugino, K., Hempel, C.M., Miller, M.N., Hattox, A.M., Shapiro, P., Wu, C., Huang, Z.J. and Nelson, S.B. (2006) Molecular taxonomy of major neuronal classes in the adult mouse forebrain. Nat Neurosci, 9, 99-107.

10. Emmert-Buck, M.R., Bonner, R.F., Smith, P.D., Chuaqui, R.F., Zhuang, Z., Goldstein, S.R., Weiss, R.A. and Liotta, L.A. (1996) Laser capture microdissection. Science, 274, 998-1001.

11. Lein, E.S., Hawrylycz, M.J., Ao, N., Ayres, M., Bensinger, A., Bernard, A., Boe, A.F., Boguski, M.S., Brockway, K.S., Byrnes, E.J. et al. (2007) Genome-wide atlas of gene expression in the adult mouse brain. Nature, 445, 168-176.

12. Magdaleno, S., Jensen, P., Brumwell, C.L., Seal, A., Lehman, K., Asbury, A., Cheung, T., Cornelius, T., Batten, D.M., Eden, C. et al. (2006) BGEM: an in situ hybridization database of gene expression in the embryonic and adult mouse nervous system. PLoS Biol, 4, e86.

13. Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. and Speed, T.P. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249-264.

14. Eccles, J.C., Ito, M. and Szentágothai, J. (1967) The cerebellum as a neuronal machine. Springer-Verlag, Berlin, New York [etc.].

15. Gong, S., Zheng, C., Doughty, M.L., Losos, K., Didkovsky, N., Schambra, U.B., Nowak, N.J., Joyner, A., Leblanc, G., Hatten, M.E. et al. (2003) A gene expression atlas of the central nervous system based on bacterial artificial chromosomes. Nature, 425, 917-925.

16. Mugnaini, E. and Floris, A. (1994) The unipolar brush cell: a neglected neuron of the mammalian cerebellar cortex. J Comp Neurol, 339, 174-180.

17. Nunzi, M.G., Shigemoto, R. and Mugnaini, E. (2002) Differential expression of calretinin and metabotropic glutamate receptor mGluR1alpha defines subsets of unipolar brush cells in mouse cerebellum. J Comp Neurol, 451, 189-199.

18. Dai, M., Wang, P., Boyd, A.D., Kostov, G., Athey, B., Jones, E.G., Bunney, W.E., Myers, R.M., Speed, T.P., Akil, H. et al. (2005) Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res, 33, e175.

19. Maglott, D., Ostell, J., Pruitt, K.D. and Tatusova, T. (2007) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res, 35, D26-31.

20. Wandall, H.H., Hassan, H., Mirgorodskaya, E., Kristensen, A.K., Roepstorff, P., Bennett, E.P., Nielsen, P.A., Hollingsworth, M.A., Burchell, J., Taylor-Papadimitriou, J. et al. (1997) Substrate specificities of three members of the human UDP-N-acetyl-alpha-D-galactosamine:Polypeptide N-acetylgalactosaminyltransferase family, GalNAc-T1, -T2, and -T3. J Biol Chem, 272, 23503-23514.

21. Li, C. and Hung Wong, W. (2001) Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol, 2, RESEARCH0032.

22. Tusher, V.G., Tibshirani, R. and Chu, G. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A, 98, 5116-5121.

23. Sabatti, C., Karsten, S.L. and Geschwind, D.H. (2002) Thresholding rules for recovering a sparse signal from microarray experiments. Math Biosci, 176, 17-34.

24. Smyth, G.K. (2004) Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol, 3, Article3.

25. Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J. et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol, 5, R80.

26. Johnson, W.E., Li, C. and Rabinovic, A. (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8, 118-127.

27. Cameron, R.S. and Rakic, P. (1991) Glial cell lineage in the cerebral cortex: a review and synthesis. Glia, 4, 124-137.

28. Perlson, E., Hanz, S., Ben-Yaakov, K., Segal-Ruder, Y., Seger, R. and Fainzilber, M. (2005) Vimentin-dependent spatial translocation of an activated MAP kinase in injured nerve. Neuron, 45, 715-726.

29. Kriaucionis, S. and Heintz, N. (2009) 5-hydroxymethylcytidine, a novel mammalian nuclear DNA base is present in brain and enriched in Purkinje neurons. Science, 324, 929-930.

30. Geschwind, D.H. and Gregg, J.P. (2002) Microarrays for the neurosciences: an essential guide. MIT Press, Cambridge, Mass.

Figure 1. Assessment of IPvTotal plots. a) Scatterplot of immunoprecipitated (Purkinje cells, IP) vs. unbound RNA (from whole cerebellum, Total) provides a basic measure of experiment quality. RNAs for non-Purkinje cell genes (glial genes, red) are highly enriched in Total RNA, while RNAs determined to be in Purkinje cells (blue, Supplemental Table 1) are enriched in IP RNA. b) Illustration of the interpretation of IPvTotal plots based on the locations of positive and negative control genes.

Figure 2. High IP/Total can identify cell-specific genes. a) Examples of in situ patterns from the Allen Brain Atlas scored as Specific, Expressed, Not Expressed, or Unscorable for brainstem motor neurons (Allen Mouse Brain Atlas [Internet]. Seattle (WA): Allen Institute for Brain Science. ©2008. Available from: http://mouse.brain-map.org). b) For each of four cell types, fifty from the top five hundred highest IP/Total ratio genes, and fifty random genes, were scrambled together and scored blindly by three judges trained in the rubric illustrated in a. Genes with high ratio for each cell type (gray bars) were more likely to be categorized as Specific (center panel) and less likely to be categorized as Not Expressed (right panel) (p<.0005, Chi Square test, all cell types). Genes with absent or unscorable ISH patterns (b, left panel) were not included in analysis.

Figure 3. Illustration of algorithm for calculation of specificity index (SI) to identify cell specific and enriched genes for a single cell type (Purkinje Cells, pink). a) SI is a comparative analysis, thus multiple bacTRAP experiments are conducted for several classical cell types shown in this illustration from Cajal. b) Data from each cell type are normalized and filtered to remove background, as illustrated in Supplemental Figure 5, prior to IP/IP calculation. c) Normalized and filtered Purkinje Cell data are compared to each other cell type (IP/IP). For each comparison (2..M), probesets are ranked from highest to lowest 'fold change.' SI for each probeset is calculated as the average rank across all comparisons. d) A p-value is assigned to a given SI value via permutation testing, as illustrated in Supplemental Figure 6a. e) A list of genes significantly enriched in Purkinje cells can be selected based on p-value.

Figure 4. The Specificity Index provides a robust method for identifying cell specific and enriched mRNAs. a) Specificity index performs better at selecting granule cell-specific mRNAs. Right panels: examples of GENSAT eGFP expression patterns for two mRNAs with Specificity Index p<10^-5, but low IP/Total (<2 fold), show robust expression in cerebellar granule cells. Left panels: examples of two mRNAs with high IP/Total (>3 fold), but non-significant specificity indices, show little expression in cerebellum. b) Blind scoring by three judges of Allen Brain Atlas ISH for fifty random, fifty high IP/Total, and fifty high SI genes, across six cell types, reveals that SI generally performs better than IP/Total in predicting specific ISH patterns, among those ISH that are scorable. c) Plot for combined Specificity Index p values (blue bars, -log 10 scale) and IP/Total values (red bars, log 2) across all cell populations for a probeset of Neurod1. Specificity index clearly identifies Neurod1 as a marker for granule cells (p<10^-5). Axis on left shows log scale. Axes on right show corresponding p-values in blue, and IP/Total ratio in red. Cell types are in same order, and with same abbreviations, as Table 1. d) Posthoc analysis across all judges and cell types reveals that more than 75% of those genes with SI p values < 10^-5 are scored as specific, compared to 15% of those with p > 0.1.

Figure 5. Specificity Index concurs well with ISH pattern for most cell types. a) A higher percentage of scorable ISH patterns are scored as Specific for those genes with SI p<10^-5 for these cell types, relative to randomly selected genes (Chi Squares from p<0.0004 to p<10^-21). 'n' is the number of p<10^-5 genes for a given cell type. b) For three cell types, ISH analysis did not show significant enrichment in specific genes by specificity index. Left panel: Etv1 data are known to be contaminated with blood or endothelial cells, reflected here in the large fraction of Not Expressed scores. Middle panel: Cck data are from a mix of many different cortical neuronal cell types, limiting interpretability of both TRAP and ISH data. Right panel: Unipolar Brush cells are difficult to identify by ISH. c) Confocal immunofluorescence on GENSAT Nmb eGFP line reveals clear GFP expression in both Calb2+ and Grm1+ unipolar brush cells (17). (Grm1 labels only the brush of these cells. Calb2 labels cytoplasm. Z stacks, not shown, confirm all GFP+ cells are positive for either Grm1 or Calb2.) Nmb was scored as "Not Expressed" by two of three judges based on ISH alone, suggesting ISH may have limited sensitivity for some cell types.

Figure 6. Outcome of SI analysis depends on composition of dataset. When SI is calculated with only one oligodendrocyte sample in the dataset (b), more genes arrive at p<10^-5 (n of 80 instead of 40), and a higher fraction of those genes show a clear specific ISH pattern, compared to SI calculated with four oligodendrocyte samples in the dataset (a).

Figure 7. Specificity index p-values and IP/Totals for a selection of representative mRNAs. a-f) Combined Specificity Index p values (blue bars, -log 10 scale) and IP/Total values (red bars, log 2) across all cell populations. a) The acetylcholine transporter, Slc18a3, is significantly specific to all four cholinergic cell populations assessed. b) The interneuronal marker Dlx1 is translated specifically in the Cort and Pnoc bacTRAP lines. c,d) Ubiquitously expressed genes B-actin (Actb) and Ribosomal Protein L8 (Rpl8) are not specific to any cell type, though translation does vary across cell types. e,f) The Neurexin autism candidate genes Nrxn1 and Nrxn2 have differential patterns of translation. Nrxn2 is more broadly translated, while Nrxn1 has low to moderate enrichment in cerebellar granule cells and some cortical neuron types.

Table 1. List of the cell populations, relevant drivers, and abbreviations (used for Figures 4, 5, 7 and Supplemental Figures 7, 8).

SUPPLEMENTAL MATERIAL

Considerations for normalization and analysis across cell type specific data

There is a major distinction between microarray experiments of TRAP RNA compared to whole tissue, unbound or Total RNA, and this distinction has an important impact on the assumptions regarding normalization: any given cell only translates those mRNAs required for its functions, and at the levels required by that particular cell type. Thus any given cell will have a smaller number of detectable RNA species than a whole tissue sample, which consists of an aggregate of RNAs from cells with a variety of roles. Therefore the distribution of measurable RNAs between IP and Total samples should be different. This is shown in the histograms of Supplemental Figure 3a. Total samples show more RNAs with detectable signal, consistent with the measurement of a more complex population of mRNAs from a mixture of cells. This is an important consideration because some normalization and analysis methods assume only minimal differences in the distributions between samples, and may by default filter to remove those probesets with signal in a small number of samples (26), or force all samples to have identical distributions (13). This is clearly inappropriate when comparing widely divergent cell types in which most genes are expected to vary in expression, with many genes being highly enriched in a certain cell type.

The IPvTotal plot was used to examine the impact of different normalization methodologies. Proper normalization should minimize IP/Total ratios for negative control genes, and maximize them for positive controls. Among the most common methods for normalization of Affymetrix data are the robust multi-array average (RMA) and GeneChip RMA (GCRMA), both of which apply quantile normalization to all data sets using the assumption that all samples should have the same RNA distribution (13). However, the assumption that any one cell type should express the same number of mRNAs at similar proportions as any other cell type, and/or that the distribution of the aggregate of many cell types (Total) should be similar to the distribution of a single cell type, is not supported by our data (Supplemental Figure 3a and (1,2)). Consequently, quantile normalization across IP and Total samples forces both samples into an artificial distribution that represents neither. Thus, RNAs that are present in the Total sample will have their signals reduced, and the signals of RNAs that are not present in the TRAP RNA will be artificially inflated.

The impact of these considerations for specific genes is shown in Supplemental Figure 3b, a scatterplot of Purkinje cell IP vs. cerebellar Total for all probesets on the array, with quantile normalization performed either within groups (separately) or across groups (together). On average, normalization across groups results in a decrease in the ability to detect enriched messages (IP/Total for positive controls 8.17 separate vs. 6.92 together), and higher signal in negative controls (0.23 separate vs. 0.37 together). In the case of specific probesets, particularly those for negative controls with low signal, the difference can be quite dramatic. For example, those from the Cnp1 gene change from a 0.1 IP/Total to a 0.5 IP/Total when the samples are normalized together, and the glial genes Mog and Plp, which are not detected in the IP when samples are normalized separately, appear as if they are present in the IP in Purkinje cells.
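The effect of normalizing within versus across groups can be reproduced on a toy example. The sketch below implements plain quantile normalization (not the full GCRMA pipeline) and applies it to one made-up IP sample and one made-up Total sample; all signal values are invented for illustration, and the helper names are hypothetical.

```python
import numpy as np

def quantile_normalize(x):
    """Force every column of x (probesets x samples) onto one shared distribution."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")
    # Reference distribution: mean of each quantile across samples.
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference
    return out

def ip_total_ratios(ip, total, together):
    """IP/Total per probeset, normalizing the two samples together or separately."""
    ip = np.asarray(ip, dtype=float)
    total = np.asarray(total, dtype=float)
    if together:
        both = quantile_normalize(np.column_stack([ip, total]))
        return both[:, 0] / both[:, 1]
    # With a single sample per group, within-group normalization leaves it unchanged.
    ip_n = quantile_normalize(ip[:, None])[:, 0]
    total_n = quantile_normalize(total[:, None])[:, 0]
    return ip_n / total_n

# Invented signals: probeset 0 is a positive control, probesets 1-3 are
# negative (glial) controls, probesets 4-7 are broadly expressed genes.
ip = [4000, 20, 30, 40, 1000, 2000, 3000, 5000]
total = [500, 200, 300, 400, 1000, 2000, 3000, 5000]

separate = ip_total_ratios(ip, total, together=False)
together = ip_total_ratios(ip, total, together=True)
```

In this toy case the positive control falls from an 8-fold ratio to under 5-fold when the samples are normalized together, while the negative controls rise from 0.1 to 1.0, the same direction of change reported for the real data.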

Quantile normalization still functions well in removing non-biological variability from biological replicates (multiple independent TRAP samples from the same line and tissue). Thus we first GCRMA normalized within replicate samples. Then, to correct for any global biases in hybridization or scanning conditions, we performed global normalization to the biotinylated spike in controls provided by Affymetrix across all cell populations.

Background: sources and removal

Following normalization, for each cell population we plotted IPvTotal and displayed positive and negative controls to make initial judgments regarding the quality of a particular TRAP dataset. In Figure 1a, it is apparent there are some glial RNAs with detectable signal in the IP, though they are enriched eight to ten fold in the Total RNA (red genes, 0.12 average IP/Total). There are three possible explanations, which are not mutually exclusive.

First, it is possible that neurons are translating a very low level of glial genes. For example, it is known that Vimentin, expressed highly in glial progenitors (27), can be translated in adult neurons following injury (28). Second, in some cases there may be low levels of eGFP-L10a transgene expression in another cell type. For example, anatomical analysis (Supplemental Figure 9c) demonstrates that low levels of transgene expression in ‘non-targeted’ cell types can contribute signal to the TRAP microarray from the Lypd6 JP48 line. This situation is rare, and careful driver selection can avoid this complexity. It is also possible in many cases to exclude data from these contaminating cell types posthoc by comparative analysis (1). Finally, as TRAP is an affinity purification method, there may be a small amount of RNA binding to the affinity purification reagents that is not derived from the labeled cells. To test this possibility, we performed TRAP on a wildtype brain and determined that the affinity purification reagents can bind a very small amount of RNA (Supplemental Figure 9a) in a manner proportional to the concentration of the lysates (Supplemental Figure 9b). For most cell populations, this background represents a small fraction of the TRAP yield. However, in TRAP experiments with exceptionally low yield (<10 ng), non-specific background can become problematic. Consistent with this, Supplemental Figure 2 shows increasing relative levels of negative control probesets as yield decreases for examples of experiments with good (Pcp2), low-moderate (Cmtm5), and very low yield (Cort).

Since the low yield IPs contain a larger proportion of non-specific background RNA that comes from unlabeled cell types in the tissue, it is more difficult in these samples to make the distinction between non-specific background and broadly translated messages. In spite of this difficulty, even low yield samples (e.g., Cort), which have a substantial contribution from non-specific background, still show remarkable enrichment of the positive control (Cort) (Supplemental Figure 2b). Thus, these experiments also provide valid information (see also Figure 4b), although not of the same quality as those with minimal background.

We quantified this level of non-specific background as the average IP/Total ratio of those negative control genes that have measurable signal. Thus, from the examples in Supplemental Figure 2, Cort has an average non-specific background of 1.1, while Cmtm5 has 0.48, and Pcp2 has 0.05. We then tested whether the background could be removed with a relatively simple filter using this measure. We excluded those probesets whose IP/Total ratio fell below this average non-specific background plus two standard deviations. Assuming a linear contribution of non-specific background to TRAP signal, and a normal distribution of background signal intensities, theoretically this should remove the vast majority (96%) of those probesets that derive signal uniquely from background RNA. This threshold is shown as the red lines on Supplemental Figure 2. Filtering to remove these probesets prior to further analysis has the added advantage of reducing the number of probesets tested, thus reducing the requisite number of multiple testing corrections for downstream statistical analyses.
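The filter described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: ratios are taken on a linear scale, the sample standard deviation is used, "measurable signal" is approximated by a hypothetical `min_signal` cutoff on IP intensity, and the function names are invented here.

```python
import numpy as np

def background_threshold(ip, total, negative_idx, min_signal=0.0):
    """Mean + 2 SD of IP/Total across negative controls with measurable signal.

    min_signal is a hypothetical cutoff standing in for 'measurable signal'.
    """
    ip = np.asarray(ip, dtype=float)
    total = np.asarray(total, dtype=float)
    neg_ip = ip[negative_idx]
    ratios = neg_ip / total[negative_idx]
    ratios = ratios[neg_ip > min_signal]
    # Sample standard deviation (ddof=1) is an assumption.
    return ratios.mean() + 2.0 * ratios.std(ddof=1)

def background_filter(ip, total, negative_idx, min_signal=0.0):
    """Return a boolean mask of probesets kept, plus the threshold used."""
    threshold = background_threshold(ip, total, negative_idx, min_signal)
    ratios = np.asarray(ip, dtype=float) / np.asarray(total, dtype=float)
    return ratios > threshold, threshold
```

Only probesets passing the mask for a given cell type would then be carried into the IPvIP comparisons, so the filter is applied per cell population with that population's own negative-control background.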

To determine if this filter is effective, we examined comparisons of two cell types from different tissues (Supplemental Figure 5b), or with differential levels of non-specific background contamination (Supplemental Figure 5a). Comparing cerebellar Purkinje cell IP data to Drd1+ medium spiny neuron IP data, without accounting for background, results in the apparent expression of the cerebellar granule cell-specific gene Neurod1 in Purkinje cells (Supplemental Figure 5b, left panel). Simple filtering prior to the IPvIP comparison successfully removed this false positive result (Supplemental Figure 5b, right panel). Thus, regardless of the source of the non-specific background, simple filters based on negative controls can be used as a generic method to remove most probesets deriving from non-specific background.

As previously reported, there is also a group of mRNAs that apparently bind the affinity reagents specifically, even in the absence of eGFP-L10a protein (1). These probesets have extremely high IP/Total ratios in every IP, including those from control, non-transgenic mouse brains. These may represent specific interactions between anti-eGFP antibodies or protein G beads and nascent peptides on the ribosomes. They were removed from subsequent analysis.

IPvTotal may indicate rarity of a cell type

The magnitude of IP/Total for positive controls can be used as a crude measure of the contribution of the targeted cell type’s mRNAs to the total mRNA pool in the tissue of interest. Supplemental Figure 4 shows IPvTotal plots for a less frequent (a), and extremely common cell type (b), with similar levels of non-specific background (red line). One can see that the ratio of the IP/Total for the driver gene (blue) increases with the rarity of the cell. Logically, if a cell contributes 5% of the RNA in the total tissue, then the cell-specific genes should be 20 fold enriched. From this, we can estimate that Purkinje cells, with an average enrichment of 8 fold for their positive control genes (Figure 1a), contribute 12% of the RNA in the cerebellum. While this is a disproportionately high amount relative to their numbers (0.3-0.4% of cerebellum, (29)) this number is not unreasonable given their relatively large cytoplasmic compartments (estimated at 40x the volume of most common cerebellar cells). In addition to magnitude of ratio, there is a broad difference in the number of RNAs with high IP/Total values across the samples in Supplemental Figure 4a and 4b. In general, the number of RNAs that are differentially expressed in this analysis correlates with the distinctive properties of that cell type relative to their tissue. Of course, the exact magnitude of these fold changes can depend on the level of background signal in the IP, thus currently these rules serve as useful heuristics rather than precise measures. Anatomical Considerations Careful characterization of transgene expression is essential to the interpretation of the TRAP data. We typically characterize the eGFP levels both with and without antibody staining. Those mouse lines with more robust expression have better yield and hence lower non-specific background. If there is visible eGFP without antibody in mouse brain sections, yields will generally be sufficient for microarray experiments. 
However, it is important to detect the presence of trace labeling in additional populations using anti-eGFP antibodies, as some signal from these populations would be detectable in the microarray data (Supplemental Figure 9c). Most mouse lines will express in multiple cell populations. Normally, these populations are present in distinct structures, and can thus be separated by careful dissection. Otherwise, microarray data from mixed populations can also be approached post hoc: for example, a Bergman glial IP can be compared to a mixed Bergman glial/Unipolar Brush cell IP to identify Unipolar Brush cell-specific genes (1).

Finally, it is important to consider whether the experimental manipulation will affect the expression of the transgene itself. If so, this could have a dramatic impact on microarray results, particularly if the manipulation changes which populations express the transgene, or the level of transgene expression. This will need to be considered in the interpretation of the data.
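As a worked example of the rarity heuristic from the IPvTotal discussion above, the reciprocal relationship between fold enrichment and RNA contribution can be written out directly. This is a minimal sketch, not part of the original analysis: the function names are hypothetical, and it assumes that the positive-control genes are translated only in the targeted cells and that non-specific background is negligible, which the text notes is only approximately true.

```python
def expected_fold_enrichment(rna_fraction: float) -> float:
    """Expected IP/Total ratio for cell-specific genes.

    rna_fraction is the share of total tissue RNA contributed by the
    targeted cell type (assumption: background is negligible).
    """
    if not 0 < rna_fraction <= 1:
        raise ValueError("rna_fraction must be in (0, 1]")
    return 1.0 / rna_fraction


def estimated_rna_fraction(fold_enrichment: float) -> float:
    """Crude estimate of a cell type's share of total tissue RNA,
    from the average IP/Total ratio of its positive-control genes."""
    if fold_enrichment < 1:
        raise ValueError("fold_enrichment must be at least 1")
    return 1.0 / fold_enrichment


# A cell type contributing 5% of tissue RNA: about 20 fold enrichment.
print(round(expected_fold_enrichment(0.05), 2))

# Purkinje cells at 8 fold enrichment: 0.125, i.e. roughly 12% of RNA.
print(estimated_rna_fraction(8))
```

Because background in the IP lowers the observed ratio, the estimated fraction is better read as an upper bound on rarity than as a measurement.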

Recommendations for design of TRAP experiments

Supplemental Figure 10 provides an example of good TRAP study design. For TRAP, standard good practices for microarray experimental design, execution, and analysis should be followed (30). Among these, it is particularly important to include careful checks of RNA quality and quantity before amplification. We recommend fluorometric measures for quantification, such as the Ribogreen assay, when measuring RNA concentrations of less than fifty nanograms per microliter, as well as Agilent Bioanalyzer assays to determine RNA integrity. Also, when amplifying RNA, it is important to start with the same amount of RNA from each sample, and to use identical protocols. It is absolutely essential that experimental and control samples be collected and amplified simultaneously or in balanced pairs, to control for non-specific amplification biases and batch effects, which afflict all microarray experiments (30). This is especially important when investigating more subtle manipulations such as drug treatments or the impact of knockouts on specific cell types. Finally, it is frequently advisable to pool tissue from multiple animals for each condition to increase yield in the case of small structures, as well as to help average out minor variations in dissection or treatment from animal to animal. We conduct at least three replicate affinity purifications per experimental condition, and typically pool from three to six animals per replicate. However, it is clear that the amount of background is dependent on the concentration of tissue homogenized (Supplemental Figure 9b). Maintaining a ratio of approximately 100 mg/ml (or less) of tissue to homogenization buffer is recommended when pooling tissue, to reduce non-specific background.

Future improvements of the TRAP methodology may eliminate the need to collect a Total measure and subtract non-specific background, and several strategies are actively being pursued to allow this. Currently, low level transgene expression in alternate cell types can be controlled by selecting more specific drivers. Weak drivers can be replaced with stronger ones. Often TRAP data with high background is mined to select stronger drivers, yielding lines targeting the same cell type but with better yield and lower background, such as replacing the Cmtm5 line with the Cnp1 line (1) for mature oligodendrocytes.

Supplemental Figure 1. Illustration of the TRAP method. BAC transgenic mice are generated to target the expression of a fusion of eGFP and a ribosomal protein (L10a) to a specific population of cells in the mouse brain (shown here are motor neurons, targeted using a BAC containing the motor neuron specific Choline Acetyl Transferase gene). To isolate cell specific translational profiles, the entire tissue is homogenized, treated with detergents to solubilize the endoplasmic reticulum, and centrifuged to prepare a crude homogenate containing a mix of eGFP tagged and untagged polysomes. Importantly, the tagged polysomes come uniquely from the motor neurons. Tagged polysomes, and associated mRNA, are then purified from homogenate using antibodies against eGFP, which are bound to protein G coated magnetic beads. Both purified and unpurified RNA are then amplified and hybridized to microarrays to profile the mRNA populations.

Supplemental Figure 2. Assessment of IPvTotal plots. a) A ratio threshold (red line) can be set between non-specific background and broadly translated genes, based on the values of the negative control genes (red, glial genes for neurons, and neuronal genes for glia). High yielding lines (Pcp2) generally have low background, while low yielding lines (Cmtm5) have a correspondingly higher background. b) In contrast to higher yielding lines, with a very low yielding line (Cort), broadly translated RNAs cannot be easily distinguished from non-specific background, though RNAs representing driver genes (blue) are consistently enriched. Black lines, all plots: 0.5, 1, 2 IP/Total ratio lines.

Supplemental Figure 3. Improper normalization can create false positive signals. a) Average histogram of probeset signal intensities for all IP samples (dotted lines) compared to all Total samples (solid lines) reveals differences between distributions. In particular, IP samples have more undetectable probesets (first bin, red arrow), consistent with RNA purified from discrete cell populations compared to RNA from a mix of cell types. b) As illustrated with IPvTotal plots for Purkinje cells, quantile normalization (forcing identical distributions) of IP and Total samples together (right panel) produces artificial signal in negative controls (red genes, in yellow circle, shifted right). Black line: 1 fold. Red line: line of best fit through negative controls. Blue line: line of best fit through positive controls.
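The normalization artifact described for Supplemental Figure 3b can be reproduced on a toy example. The sketch below is illustrative only: the six intensity values are invented for demonstration (they are not data from this study), and `quantile_normalize` is a basic textbook implementation written here in NumPy, not the GCRMA pipeline used for the actual analysis. It shows that when an IP column with near-undetectable negative controls is forced to share one distribution with a Total column, the apparent IP/Total ratio of those negative controls rises.

```python
import numpy as np


def quantile_normalize(matrix: np.ndarray) -> np.ndarray:
    """Force every column (sample) to share the same distribution.

    Each value is replaced by the mean, across samples, of the values
    holding the same rank. Ties are broken arbitrarily.
    """
    order = np.argsort(matrix, axis=0)
    ranks = np.argsort(order, axis=0)              # rank of each entry in its column
    mean_sorted = np.sort(matrix, axis=0).mean(axis=1)
    return mean_sorted[ranks]


# Toy intensities (hypothetical). Columns: Total, IP.
# Rows 0-1: negative controls, rows 2-4: broadly translated, row 5: driver.
raw = np.array([
    [200.0,  20.0],
    [220.0,  22.0],
    [300.0, 300.0],
    [400.0, 400.0],
    [500.0, 500.0],
    [100.0, 800.0],
])

norm = quantile_normalize(raw)

negative = slice(0, 2)
raw_ratio = raw[negative, 1] / raw[negative, 0]      # IP/Total before
norm_ratio = norm[negative, 1] / norm[negative, 0]   # IP/Total after

print("negative control IP/Total, raw:       ", raw_ratio)
print("negative control IP/Total, normalized:", norm_ratio)
```

In this toy case the negative controls start at an IP/Total of 0.1 and end well above it after joint normalization, which is the kind of artificial signal the legend warns about.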

Supplemental Figure 4. IPvTotal is dependent on the composition of the Total. a) IPvTotal for an astroglial sample, under control of the driver Aldh1L1 (blue), shows the enrichment of many genes, as illustrated by the number of probesets falling above the two fold line, suggesting glia contribute a relatively small fraction of the RNA pool of the whole cerebellum. b) IPvTotal for an extremely common cell type, the granule cell of the cerebellum, fails to show enrichment of the granule cell-specific driver Neurod1, in spite of low background (red line), suggesting granule cells contribute a significant fraction of whole cerebellar RNA. c) A scatterplot of an IPvIP comparison of the Neurod1 IP to the Pcp2 IP reveals clear enrichment of the Neurod1 probeset in the Neurod1 IP, demonstrating that the Neurod1 IP is enriched in granule cell RNA. Black lines: 0.5, 1, 2 fold. Red genes: cell-specific negative controls, as in Figure 1.

Supplemental Figure 5. Simple thresholds improve IPvIP comparisons. a) An illustration of how to filter data using simple thresholds for background and low expressed genes. Background threshold was set at mean plus two standard deviations of the detectable negative control genes. Probesets with expression below 50 were also removed. b) An IPvIP comparison of cerebellar Purkinje cells (y-axis) to cerebellar Granule cells (x-axis), which have slightly different levels of background, shows a corresponding ‘differential’ expression of glial mRNAs (red genes, left panel), which can be removed by applying simple thresholds (right panel). c) Likewise, an IPvIP of cerebellar Purkinje cells to Drd1a medium spiny neurons, which have background from different tissues, shows a corresponding ‘differential’ expression of background mRNAs from other tissue specific cell types (red, Drd2 from the striatal Drd2+ medium spiny neuron, and Neurod1 from the cerebellar granule cell), which can be removed with simple thresholds (right panel).

Supplemental Figure 6. a) Illustration of method for determining p-values for specificity index for one cell type. b) Illustration of analytical flow to identify cell specific and enriched genes for all cell types.
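The simple thresholds described for Supplemental Figure 5a (background set at the mean plus two standard deviations of negative control IP/Total ratios, and removal of probesets with expression below 50) can be sketched as follows. This is a hedged illustration, not the authors' code: the function name and the example intensities are hypothetical, and several details are assumptions, namely that ratios are computed on the linear scale, that the sample standard deviation is used, that the expression cutoff is applied to the IP signal, and that the caller passes only detectable negative controls.

```python
import numpy as np


def filter_probesets(ip, total, is_negative_control,
                     min_expression=50.0, n_sd=2.0):
    """Return (keep_mask, threshold) for one cell type.

    ip, total           : per-probeset intensities (assumed > 0)
    is_negative_control : boolean mask of detectable negative controls
    A probeset is kept when its IP/Total ratio exceeds the background
    threshold and its IP signal is at least min_expression.
    """
    ip = np.asarray(ip, dtype=float)
    total = np.asarray(total, dtype=float)
    neg = np.asarray(is_negative_control, dtype=bool)

    ratio = ip / total
    # Background threshold: mean + n_sd * SD of negative control ratios.
    threshold = ratio[neg].mean() + n_sd * ratio[neg].std(ddof=1)

    keep = (ratio > threshold) & (ip >= min_expression)
    return keep, threshold


# Hypothetical intensities: four negative controls, then three other probesets.
ip = [20, 24, 16, 20, 400, 800, 30]
total = [200, 200, 200, 200, 800, 100, 10]
neg = [True, True, True, True, False, False, False]

keep, threshold = filter_probesets(ip, total, neg)
print("background threshold (IP/Total):", round(threshold, 3))
print("kept probesets:", keep)
```

Applying the resulting mask before any IPvIP comparison corresponds to the pre-filter versus post-filter panels described in the legend. The last probeset in the toy data has a high ratio but is dropped for low expression.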

Supplemental Figure 7. Updated chip definitions improve accuracy and interpretability of TRAP experiments. a) IP/Total (log2, red) and specificity p-value (-log 10, blue), for all cell types (y-axis), for the probeset representing the known oligodendrocyte gene MBP as measured using custom chip definition files (cdf) which remove misaligned probes(18). b) Four examples of probesets for MBP using Affymetrix cdf files. Oligodendrocyte populations marked with *. c) Probeset for Purkinje cell-specific gene, PCP2, using custom cdf. d) Four of the probesets for PCP2 using Affymetrix cdf files.

Supplemental Figure 8. Dramatic differential translation of the GalNT gene family suggests cellular specialization of the Golgi apparatus. a-f) Combined Specificity Index p values (blue bars, -log 10 scale) and IP/Total values (red bars, log 2) across all cell populations for a selection of the UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase Golgi protein family. a) Galnt3 shows specific translation in oligodendrocyte progenitors. b) Galnt4 shows translation in astrocytic cell types. c) Galnt6 shows specific translation in mature oligodendrocytes. d) Galnt14 shows specific translation in Corticospinal/Corticopontine neurons. e) GalntL2 shows specific translation in granule cells of cerebellum.

Supplemental Figure 9. Two potential sources of non-specific background. a) Representative Picochip capillary electrophoretic traces from Agilent Bioanalyzer for RNA from a TRAP experiment on two cerebellums from Bergman glial (Sept4) bacTRAP mice (left panel) or wild type mice (center panel) suggest a small amount of RNA may derive from non-specific interactions of unlabeled RNA with affinity purification reagents. Arrows: 18S and 28S ribosomal RNA peaks. IPvTotal plot (right panel) shows low level of signal in known negative control genes (neuronal genes, red). Driver genes known to be highly expressed in Bergman glia (Sept4, Aldh1L1, blue) show strong enrichment, while drivers for other cerebellar cell types (Neurod1, Pcp2, Lypd6, blue) show IP/Total ratios similar to negative controls. Black lines: 0.5, 1, 2 fold lines. Red line: average IP/Total ratio of negative controls. Green line: background IP/Total ratio level suggested by non-specific yield (4.5ng) divided by TRAP yield. b) Amount of non-specific RNA binding to affinity purification reagents depends on amount of tissue. Various amounts of brain tissue from wild type mice were homogenized in a consistent volume of homogenization buffer, and TRAP methodology was carried forward. Increasing the amount of tissue increases non-specific background. A 1:10 w/v ratio, or less, is recommended to minimize this. c) Confocal immunofluorescence for eGFP in the Stellate/Basket neuronal (Lypd6) bacTRAP line shows low level transgene expression in additional cell types. Top left, DAPI nuclear counterstain delineates layers of cerebellum (WM, white matter; GCL, granule cell layer; PCL, Purkinje cell layer; ML, molecular layer). Stellate and Basket cells of the molecular layer clearly contain eGFP-L10a (top center, top right). The same eGFP image shown in range scale (blue pixels: no signal, red pixels: saturated) with excessive gain (bottom center), or normal gain (bottom left), shows trace eGFP-L10a in white matter glia (red arrow). IPvTotal (bottom right) shows clear enrichment of driver (Lypd6, blue), but moderate levels of signal from negative control genes (glial genes, red) or drivers for other cell types showing trace expression (Sept4, blue). Note that the granule cell driver, Neurod1 (green), has low signal, consistent with lack of expression in granule cells. Green line: average IP/Total ratio of Neurod1 probesets. Red line: average IP/Total ratio of glial genes.

Supplemental Figure 10. Recommendations and examples for TRAP experimental design.

Supplemental Table 1. Positive and negative controls. a) List of genes scored as specific to the Purkinje cell layer in the cerebellum by three independent reviewers, based on online ISH atlases (11,12). b) List of known markers for glial cell types, which may be used as negative control genes for neuronal samples. c) List of markers for neurons (neurofilaments and synaptic proteins), which may be used as negative controls for glial samples.

Table 1

Cell populations | Driver | Abbreviations used*
Drd1+ medium spiny neurons of neostriatum | Drd1 | CS.Drd1
Drd2+ medium spiny neurons of neostriatum | Drd2 | CS.Drd2
Cholinergic interneurons of corpus striatum | Chat | CS.Chat
Motor neurons of brain stem | Chat | BS.Chat
Cholinergic neurons of basal forebrain | Chat | BF.Chat
Mature oligodendrocytes of cerebellum | Cmtm5 | Cb.Cmtm5
Astroglia of cerebellum | Aldh1l1 | Cb.Aldh1L1
Golgi neurons of cerebellum | Grm2 | Cb.Grm2
Unipolar brush cells and Bergman glia of cerebellum | Grp | Cb.Grp
Stellate and basket cells of cerebellum | Lypd6 | Cb.Lypd6
Granule cells of cerebellum | Neurod1 | Cb.Neurod1
Oligodendroglia of cerebellum | Olig2 | Cb.Olig2
Purkinje cells of cerebellum | Pcp2 | Cb.Pcp2
Bergman glia and mature oligos. of cerebellum | Sept4 | Cb.Sept4
Cck+ neurons of cortex | Cck | Ctx.Cck
Mature oligodendrocytes of cortex | Cmtm5 | Ctx.Cmtm5
Cort+ interneurons of cortex | Cort | Ctx.Cort
Astrocytes of cortex | Aldh1l1 | Ctx.AldhL1
Corticospinal, corticopontine neurons | Glt25d2 | Ctx.Glt25d2
Corticothalamic neurons | Ntsr1 | Ctx.Ntsr1
Oligodendroglia of cortex | Olig2 | Ctx.Olig2
Pnoc+ neurons of cortex | Pnoc | Ctx.Pnoc
Motor neurons of the spinal cord | Chat | SC.Chat

Figure 1. [Panels a, b: IPvTotal plots. Axis labels: Purkinje Cell IP (Pcp2); Cerebellum Total.]

Figure 2. [Panels a, b: bar charts on a 0-100% scale. Categories: Scorable, Unscorable, Absent. Y-axis labels: % Specific; % Not Expressed. Groups: Motor Neurons, Purkinje Cells, Layer V Cortical Neurons, Oligodendrocytes. Series: High IP/Total, Random.]

Figure 3. [Panels a-e.]

Figure 4. [Panel a: High Specificity Index versus High IPvTotal; labeled genes Gng4, Crym, En2, Neurod1. Panel b: % Specific ISH (0-100%) for Cb.Neurod1, Cb.Pcp2, BS.Chat, Ctx.Cmtm5, Ctx.Cort, Ctx.Glt25d2; series: Random, IPvTotal, Specificity Index. Panel c: probeset 1426412_at (Neurod1) across all cell populations, CS.Drd1 through SC.Chat. Panel d: % of scored ISH (0-100%; Specific, Expressed, Not Expressed) by Specificity Index p-value bin: < 10-5, 10-3 to 10-5, 10-2 to 10-3, 10-2 to 10-1, > .1.]

Figure 5. [Panels a-c.]

Figure 6. [Panels a, b: plots across all cell populations; x-axis labels CS.Drd1 through SC.Chat, using the abbreviations of Table 1.]

Figure 7. [Panels a-f. Genes shown: Dlx1 (Entrez ID: 13390), distal-less homeobox 1; Slc18a3 (Entrez ID: 20508), solute carrier family 18, member 3; Rpl8 (Entrez ID: 26961), ribosomal protein L8; Actb (Entrez ID: 11461), actin, beta; Nrxn1 (Entrez ID: 18189), neurexin I; Nrxn2 (Entrez ID: 18190), neurexin II.]

Supplemental Table 1

A. Positive controls for Purkinje cells: A930006D11, 3110001A13Rik, 4933428A15Rik, 4933432P15Rik, A730030A06, Adprt1, Bcl11a, Capn10, Cck, Dgkz, Eprs, Grik1, Gtf2f2, Hsp105, Kcnab1, Letm1, Lhx5, Ndufs3, Nef3, Pcp2, Sec61a1, Zdhhc14

B. Negative controls for Neurons: Mbp, Aldh1l1, Cspg4, Galc, Glul, Mag, Mobp, Mog, Olig2, Plp1

C. Negative controls for Glia: Snap25, Cplx1, Nefh, Nefl, Nefm

Supplemental Figure 1. [Flow diagram: Generate and Characterize BAC transgenic mice; Dissect and Homogenize Tissue of Interest; Polysomes from Motor Neurons (will stick to affinity matrix) and Polysomes from All other cells (will not stick to affinity matrix); Harvest enriched mRNA (IP) and Harvest whole tissue mRNA (Total); Hybridize Microarrays.]

Supplemental Figure 2. [Panels a, b: IPvTotal plots. Purkinje Cell IP (Pcp2) versus Cerebellum Total; Oligodendrocyte IP (Cmtm5) versus Cortex Total; Cort Cell IP (Cort) versus Cortex Total.]

Supplemental Figure 3. [Panel a: histogram of All IPs and All Totals; x-axis, Expression Level; y-axis, Number of Probesets. Panel b: two plots of Purkinje Cell IP versus Cerebellum Total.]

Supplemental Figure 4. [Panels a-c: Astrocytes IP versus Cerebellum Total; Granule Cell IP versus Cerebellum Total; Purkinje Cell IP versus Granule Cell IP.]

Supplemental Figure 5. [Panel a, filtering steps: For each cell compare IP to Total; Establish background threshold based on IP/Total of negative control probesets; Remove probesets below threshold, and low expressed; Using filtered list for any IP versus IP comparisons removes majority of background. Plots: Purkinje Cell IP (Pcp2) versus Cerebellum Total. Panels b, c, Pre-Filter and Post-Filter plots: Purkinje Cell IP versus Granule Cell IP; Purkinje Cell IP versus Drd1a MSN IP.]

Supplemental Figure 6. [Panels a, b.]

Supplemental Figure 7. [Panels a-d: plots across all cell populations, CS.Drd1 through SC.Chat; oligodendrocyte populations marked with *. Genes: Mbp (Entrez ID: 17196), myelin basic protein; Pcp2 (Entrez ID: 18545), Purkinje cell protein 2 (L7). Probesets shown: 1451961_a_at, 1425263_a_at, 1425264_s_at (Mbp); 1419084_a_at, 1453207_at, 1429583_at, 1424944_at (Pcp2).]

Supplemental Figure 8. [Panels a-f: plots across all cell populations, CS.Drd1 through SC.Chat. Genes shown: Galnt3 (Entrez ID: 14425), Galnt4 (Entrez ID: 14426), Galnt6 (Entrez ID: 207839), Galnt14 (Entrez ID: 71685), Galnt2 (Entrez ID: 108148), Galntl2 (Entrez ID: 78754); each a UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase family member.]

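Supplemental Figure 9. [Panel a: Bioanalyzer traces labeled 74.1 ngs and 4.4 ngs; IPvTotal plot of Bergman Glial IP versus Cerebellum Total. Panel b: bar chart; y-axis, ngs RNA from wild type brain tissue (0-14); x-axis, Tissue input per ml homogenization buffer (125 mg/ml, 275 mg/ml, 525 mg/ml). Panel c: images labeled Dapi, Egfp, Overlay, Normal Gain, Excess Gain, with layers PCL, ML, GCL, WM; IPvTotal plot of Lypd6 IP versus Cerebellum Total.]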

Supplemental Figure 10

Principles

Design
1) Define question.
2) Select line appropriate for cell type.
3) Plan to balance conditions across batches.

Anatomy
1) Confirm transgene expression.
2) Does manipulation alter expression?

Immunoprecipitation and Microarrays
1) Harvest paired conditions together in one batch and collect total polysome sample.
2) Quantify RNA carefully, and start with identical amounts of all samples for amplification.
3) Amplify and hybridize all batches together, if feasible.

Analysis
1) GCRMA normalize together only those samples that should have the same distribution. Global normalize subsequently to biotinylated spike ins.
2) Compare IPs to Total. Calculate a background threshold using the IP/Total ratio for negative controls. Remove from further analysis those probesets with IP/Total below this threshold.
3) Conduct statistical analysis on remaining probesets.

Example

Design: MECP2 in glia
1) What is the impact of MECP2 knockout on cortical astrocytes in vivo?
2) The previously generated Aldh1L1 JD130 line is expressed exclusively in astrocytes [1]. Cross MECP2 KO with Aldh1L1 line to generate breeders.
3) Plan for three batches. Each batch is three bacTRAP/MECP2 null mice and three littermate bacTRAP only controls.

Anatomy: MECP2 in glia
1) Aldh1L1 bacTRAP line was previously and thoroughly characterized [1]. Skip this step.
2) In first litter of bacTRAP/MECP2 null mice, confirm transgene is still expressed uniquely in astrocytes, and at comparable levels to littermate controls.

IP and Microarrays: MECP2 in glia
1) Day 1 (3-4 hours): Harvest cortices from three MECP2 null/bacTRAP mice and three bacTRAP littermate controls. Pool MECP2 null and control tissue separately. Homogenize and prepare polysomes. Prior to immunoaffinity step, set aside 20 µl for Total polysome sample. Complete immunoaffinity purification, and RNA extraction of Total and IP’d RNA until isopropanol precipitation. Store RNA at -80. Day 2: Repeat day 1 with second batch of three MECP2 null and three controls. Day 3: Repeat day 1 with third batch of three MECP2 null and three controls.
2) Day 4 (2 hours): Complete purification of all three batches of frozen RNA. Quantify with Ribogreen assay. Assure that RNA integrity is above 8 for all samples with Bioanalyzer assay.
3) Day 4-6: From each sample, take 20 ngs of RNA, and begin Affymetrix two cycle amplification for all twelve samples. Carry through amplification and hybridization of all samples together.

Analysis: MECP2 in glia
1) GCRMA all Total samples together. GCRMA together all IP’d samples (from MECP2 null and controls). Normalize all samples (Total and IP) together to spike in controls.
2) Calculate IP/Total for a list (Supplemental Table 2) of non-astrocyte probesets (i.e. neuron specific genes). Remove all probesets below the Mean + 2 S.D. of the ratios on this list from further analysis for astrocytes.
3) Use the Limma module of Bioconductor to detect those genes that change significantly between MECP2 null IP and control IP. These represent the astrocyte’s response to the knockout. Genes that change significantly between the MECP2 null Total and control Total samples will represent the response of the other cells in the tissue. These can be compared listwise or statistically to determine the astrocyte specific response.
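The listwise comparison mentioned in the final analysis step can be sketched with plain set operations. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name and the placeholder gene names are hypothetical, and the two input lists are assumed to come from separate significance tests (for example, one on IP samples and one on Total samples) that have already been run elsewhere.

```python
def compare_listwise(changed_in_ip, changed_in_total):
    """Partition significantly changed genes by where the change was seen.

    changed_in_ip    : genes changed between knockout IP and control IP
    changed_in_total : genes changed between knockout Total and control Total
    """
    ip = set(changed_in_ip)
    total = set(changed_in_total)
    return {
        # Seen only in the purified population: candidate cell-specific response.
        "ip_only": sorted(ip - total),
        # Seen in both: may reflect a response shared with other cells.
        "shared": sorted(ip & total),
        # Seen only in whole tissue: likely driven by other cell types.
        "total_only": sorted(total - ip),
    }


# Placeholder gene names, for illustration only.
result = compare_listwise(
    changed_in_ip=["geneA", "geneB", "geneC"],
    changed_in_total=["geneC", "geneD"],
)
for group, genes in result.items():
    print(group, genes)
```

A listwise comparison of this kind depends on the significance cutoff used for each list, so a gene falling just short of the cutoff in one comparison can be misassigned; the statistical alternative mentioned in the text avoids that dependence.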