

RESEARCH ARTICLE

SINGLE-CELL ANALYSIS

The Drosophila embryo at single-cell transcriptome resolution

Nikos Karaiskos,1* Philipp Wahle,2* Jonathan Alles,1 Anastasiya Boltengagen,1 Salah Ayoub,1 Claudia Kipar,2 Christine Kocks,1 Nikolaus Rajewsky,1† Robert P. Zinzen2†

By the onset of morphogenesis, Drosophila embryos consist of about 6000 cells that express distinct gene combinations. Here, we used single-cell sequencing of precisely staged embryos and devised DistMap, a computational mapping strategy to reconstruct the embryo and to predict spatial gene expression approaching single-cell resolution. We produced a virtual embryo with about 8000 expressed genes per cell. Our interactive Drosophila Virtual Expression eXplorer (DVEX) database generates three-dimensional virtual in situ hybridizations and computes gene expression gradients. We used DVEX to uncover patterned expression of transcription factors and long noncoding RNAs, as well as signaling pathway components. Spatial regulation of Hippo signaling during early embryogenesis suggests a mechanism for establishing asynchronous cell proliferation. Our approach is suitable to generate transcriptomic blueprints for other complex tissues.

Intricate gene regulatory networks produce and maintain complex assemblies of specialized cells such as tissues and organs. To unravel the underlying gene expression dynamics, considerable efforts have been made to compare tissue-specific materials (1–4). Cell culture often constitutes a poor proxy for in vivo complexity, and dissected tissues are composed of heterogeneous cell populations (3–10). An alternative is isolation of specific cell types through cell sorting (11–14); however, pooled cells obscure heterogeneity, and expression in rare populations of cells may not be detectable. Furthermore, transcriptional relationships at the single-cell level, such as exclusivity and concomitancy of expression of groups of genes, cannot be distilled. This restricts our ability to infer gene regulatory relationships and to predict what functional roles individual cells play and how they integrate with their spatial environment. With the advent of single-cell expression profiling, it has become possible to assess the transcriptomic landscape of complex cell mixtures with single-cell resolution, thereby allowing insights into differentiation trajectories, cell fate decisions, spatial relationships, and rare cell types (15–21).

The Drosophila melanogaster embryo has been an exquisite model for the patterning principles that shape cellular identities. The fertilized egg undergoes 13 rapid nuclear divisions, resulting in a syncytial embryo of ~6000 nuclei. By developmental stage 5, nuclei have moved to the embryo periphery, become surrounded by cell membranes, and spatial gene expression patterns emerge as cells translate anteroposterior and dorsoventral positional information into transcriptional responses [e.g., (22)]. Stage 6 is marked by the first morphogenetic movements after cellularization completes, and gene expression around this stage has been extensively assayed in whole embryos [e.g., (23)], in mutants converting entire embryos to germ layers (4), and in dissected slices (24). Available in situ databases present systematically annotated spatial gene expression (25, 26), but they often stop short of single-cell resolution, direct comparison of several genes per cell, genome-wide profiling of all transcripts, including noncoding RNAs, and quantitative assessment of gene expression.

To overcome these problems, we used massively parallel droplet-based single-cell sequencing (Drop-seq) (19) and quantified gene expression across >10,000 fixed cells from dissociated embryos (27) at a median depth of thousands of genes per cell. Computational analysis of the high-resolution in situ patterns of 84 genes (28) indicated that most, if not all, cells of the fly embryo have individual transcriptional identities, highlighting the need to resolve the embryo at single-cell resolution. Previous efforts to map sequenced embryonic cells back to their origin [e.g., (21)] did so by reducing mapping complexity (e.g., by binning the entire zebrafish embryo composed of thousands of cells into 128 expression regions), and these methods could not correctly map our data at the required resolution. Therefore, we devised a mapping strategy and algorithm (DistMap) based on spatially distributed scores. The resulting virtual embryo gives access to single-cell transcriptome information at higher spatial resolution (87% of cells in the embryo are confidently resolved) and depth (>8000 genes per cell).

Karaiskos et al., Science 358, 194–199 (2017) 13 October 2017

1 Systems Biology of Gene Regulatory Elements, Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany. 2 Systems Biology of Neural Tissue Differentiation, BIMSB, MDC, 13125 Berlin, Germany. *These authors contributed equally to this work. †Corresponding author. Email: [email protected] (N.R.); [email protected] (R.P.Z.)

Deconstructing the transcriptomic state of the embryo

To assess genome- and embryo-wide transcriptome diversity with single-cell resolution, we hand-selected embryos at the onset of gastrulation (stage 6) (Fig. 1A). Cells were extracted from >5000 precisely staged embryos, methanol fixed (27), and sequenced in seven Drop-seq runs corresponding to five biological replicates (table S2), resulting in a total of ~7975 sequenced D. melanogaster cells (table S3 and Fig. 1A). The vast majority of the sequenced cells (>90%) represented single-cell transcriptomes, as assessed by mixing cells from stage-matched D. melanogaster and D. virilis embryos (fig. S1A and tables S2 and S3). Gene expression correlated well across Drop-seq replicates (R > 0.94) and between Drop-seq and stage-matched, unfixed, whole embryos (R > 0.88) (fig. S1B) and was consistent with absolute quantification in individual stage-matched embryos (29) (fig. S1C).

To concentrate on cells with unambiguous patterning information, we excluded pole cells, yolk nuclei, and cell doublets from further analysis. Pole cells constitute a discrete lineage and contribute only to the germ line (30), whereas yolk nuclei function primarily in energy metabolism (31). Pole cells and yolk nuclei readily separated in a principal components analysis (fig. S1E). Similarly, t-distributed stochastic neighbor embedding (t-SNE) representation (32) correctly groups cells according to cell type, while a central cluster of doublets (cells expressing markers of distinct dorsoventral territories, i.e., mesoderm, neurectoderm, and dorsal ectoderm; see table S4) emerges centrally (fig. S1E, inset, and fig. S1D). Furthermore, we considered only cells with ≥12,500 unique transcripts and expressing more than five genes of the Berkeley Drosophila Transcription Network Project (BDTNP) reference atlas (see below). The remaining ~1300 high-quality cells had a median unique transcript number of >20,800 mapping to >3100 genes (fig. S2C). These cells separated along the first two principal components by dorsoventral identities (fig. S2D) but not by biological replicates (fig. S2E). The embryos contained a DsRed reporter transgene under control of a ventral neurogenic ectodermal vnd enhancer (33), and DsRed transcripts were primarily detected in a subset of cells that also score highly for broad neurectodermal markers (fig. S2D). We “in silico dissected” the embryo by merging the transcriptomes of cells expressing specific markers for various dorsoventral territories and found that the merged transcriptomes accurately reflected gene expression in the respective domains (fig. S2, A and B). This dissection procedure is versatile and represents a distinct advantage of single-cell analysis over traditional bulk analyses (see supplementary note 1).

We concluded that individual cells were sequenced without batch effects or bias for particular cellular identities and that our Drop-seq data accurately reflect the transcriptomic state of individual embryonic cells.
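The two numeric inclusion criteria quoted above (at least 12,500 unique transcripts and more than five expressed reference-atlas genes) can be illustrated with a minimal sketch. The function name, the dictionary layout of a cell, and the toy gene names are assumptions made for illustration only; the removal of pole cells, yolk nuclei, and doublets is not modeled here.

```python
# Minimal sketch of the two numeric quality filters described in the text.
# The data layout (gene -> unique transcript count) and all names are
# hypothetical; they are not taken from the authors' pipeline.

MIN_UNIQUE_TRANSCRIPTS = 12_500  # text: cells with >= 12,500 unique transcripts
MIN_ATLAS_GENES = 5              # text: expressing MORE than five atlas genes


def passes_quality_filter(counts, atlas_genes):
    """Return True if a cell meets both thresholds.

    counts      -- dict mapping gene name to unique transcript count
    atlas_genes -- iterable of reference-atlas gene names
    """
    total_transcripts = sum(counts.values())
    atlas_genes_expressed = sum(1 for g in atlas_genes if counts.get(g, 0) > 0)
    return (total_transcripts >= MIN_UNIQUE_TRANSCRIPTS
            and atlas_genes_expressed > MIN_ATLAS_GENES)


if __name__ == "__main__":
    atlas = [f"marker{i}" for i in range(84)]           # placeholder gene names
    deep_cell = {f"marker{i}": 2500 for i in range(6)}  # 15,000 transcripts, 6 atlas genes
    shallow_cell = {f"marker{i}": 100 for i in range(6)}
    print(passes_quality_filter(deep_cell, atlas))      # meets both thresholds
    print(passes_quality_filter(shallow_cell, atlas))   # too few transcripts
```

Note that the atlas criterion is strict ("more than five"), so a cell expressing exactly five atlas genes is rejected in this sketch.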

Spatial reconstruction of the embryo

High-quality cells could be grouped into nine prominent cell clusters in the stage 6 embryo (fig. S2F). Genes well known for their roles in embryonic patterning and tissue specification were readily identified (table S5 and supplementary note 2). Spatial expression of cluster-specific driver genes, as assessed by in situ hybridization (25, 26), was similar within clusters, and spatial coherence within clusters was further supported by gene ontology (GO) term enrichments (fig. S3 and supplementary note 3).

As might be expected, transcription factors were prevalent among the genes that drive cluster identity (table S5; e.g., fkh, kni, grn, hb, Abd-B, ken, oc, and toy). Several of the highly variable genes remain unstudied in early development [see table S5; e.g., DNaseII, Z600, Meltrin, mtt, Atx-1, and several genes without names (CGs)], although their cluster-specific expression suggests functional roles in early embryogenesis. Furthermore, numerous long noncoding RNAs (lncRNAs) and a microRNA precursor are among the most variable genes (table S5; e.g., CR44683, CR43302, CR43279, CR41257, and CR45185), which indicates an unrecognized role for lncRNAs in early embryonic patterning and development.

We sought to map cells back to their position of origin to produce a virtual embryo with single-cell transcriptome resolution by using a database of known in situ markers (Fig. 1B). The BDTNP generated in situ hybridization data for 84 genes, resulting in a quantitative high-resolution gene expression reference atlas with substantial combinatorial complexity (28). To correlate these 84 marker genes with our single-cell transcriptomes, we binarized the BDTNP atlas by manually choosing thresholds for each gene (25, 26) (Fig. 2A.I). The combinatorial expression of these 84 binarized BDTNP markers sufficed to uniquely classify almost every position within the embryo (fig. S5A). Attempts to map our single-cell data using previously published algorithms (20, 21) were unsuccessful. We therefore designed a mapping strategy based on distributed mapping scores (DistMap) (see the supplementary materials). We binarized the Drop-seq data (Fig. 2A.II) (see the supplementary materials), then compared the profiles of each cell against each bin, collected the (mis)matches into confusion matrices, and computed Matthews correlation coefficients (MCCs) for every cell-bin combination (Fig. 2A.III). The result is a distributed mapping score for any sequenced cell across all embryonic positions (Fig. 2A.IV). Figure 2B explores DistMap’s efficacy and demonstrates that sequenced cells mapped to a few cell diameters with high confidence (red) approaching the accuracy of the binarized bins themselves (green), whereas random mapping positions are spread throughout the whole embryo (blue). Due to the high transcriptional complexity, we reasoned that mapping a cell to multiple likely positions would be more meaningful than assigning it to a single location. DistMap covers most of the embryo (Fig. 2C), assigns cells confidently (fig. S5C), and allows the quantification of many more genes per bin than initially detected in individual sequenced cells, so that most bins exhibit 6500 to 8500 expressed genes (Fig. 2D).

To compute the spatial expression of a gene, we combined normalized gene expression per cell with the MCC scores for every cell-bin pair (see the supplementary materials). This allows querying any given gene across all cells of the virtual embryo and produces a virtual in situ hybridization (vISH). To assess prediction quality, we computed vISHs of the 84 BDTNP-mapped genes using all high-quality cells, as well as subsamples, and compared the resulting vISHs against the BDTNP database. The discrepancies saturated by ~750 cells (fig. S5B), so that the mapping would only be marginally improved by including more than our full set of ~1300 high-quality cells.

To uncover the spatial signature of the nine single-cell clusters described above (Fig. 3A), we calculated the average mapping scores per bin across all cells per cluster (Fig. 3B). The concordance of the spatial signatures of these nine principal clusters with regionally confined developmental fates [e.g., (34, 35)] is striking. Whereas cluster 4 largely encompasses the primordium of the mesoderm, cluster 3 corresponds to the future neurectoderm and ventral epidermis, and cluster 6 corresponds to the dorsal epidermis and extraembryonic tissues. Cluster 9 in anteroventral regions corresponds to the future esophagus and pharynx, whereas clusters 2 and 7 will give rise to the anterior and posterior midgut upon invagination. Furthermore, this spatial cluster mapping is in agreement with GO term enrichments (see supplementary note 3).

Fig. 1. Deconstructing and reconstructing the embryo by single-cell transcriptomics combined with spatial mapping. (A) Single-cell sequencing of the Drosophila embryo: ~1000 handpicked stage 6 fly embryos are dissociated per Drop-seq replicate, cells are fixed and counted, single cells are combined with barcoded capture beads, and libraries are prepared and sequenced. Finally, single-cell transcriptomes are deconvolved, resulting in a digital gene expression matrix for further analysis. (B) Mapping cells back to the embryo: Single-cell transcriptomes are correlated with high-resolution gene expression patterns across 84 marker genes, cells are mapped to positions within a virtual embryo, and expression patterns are computed by combining the mapping probabilities with the expression levels (virtual in situ hybridization).

Fig. 2. Reconstructing the embryo by spatial mapping based on distributed scores. (A) DistMap. The 84 BDTNP gene expression patterns (I) and the single-cell expression profiles (II) were binarized. (III) Confusion matrices are calculated scoring expression (dis)agreement between the transcriptomes and the ~3000 positional bins of the reference atlas. Matthews correlation coefficients (MCCs) are calculated for every cell/bin combination. (IV) Positional assignment for each cell is distributed based on MCCs across all bins. (B) Density plot showing mapping confidence (mean Euclidean distance) between a cell’s highest scoring location and the following six locations. Single-cell transcriptomes (red) map to embryo positions with similar confidence as cells of the reference atlas (green). (C) Bin coverage across the embryo. More than 87% of all locations in the embryo are confidently covered (P < 0.05; see the supplementary materials for details). (D) The virtual fly embryo has a resolution of 6000 to 8000 genes per cell.
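The scoring step of DistMap described in this section (binarized profiles, a confusion matrix per cell-bin pair, and a Matthews correlation coefficient per pair) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, profiles are plain lists of 0/1 values over the marker genes, and returning 0 when the MCC denominator vanishes is a common convention that the text does not specify. The second function sketches the confidence measure named in the Fig. 2B caption (mean Euclidean distance between the best-scoring bin and the following six).

```python
import math


def mcc(cell_bits, bin_bits):
    """Matthews correlation coefficient between two binarized marker profiles."""
    tp = fp = fn = tn = 0
    for c, b in zip(cell_bits, bin_bits):
        if c and b:
            tp += 1      # expressed in the cell and in the atlas bin
        elif c:
            fp += 1      # expressed in the cell only
        elif b:
            fn += 1      # expressed in the atlas bin only
        else:
            tn += 1      # expressed in neither
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0       # assumed convention for a degenerate confusion matrix
    return (tp * tn - fp * fn) / denom


def distributed_scores(cell_bits, atlas_bins):
    """Score one cell against every positional bin (one MCC per bin)."""
    return [mcc(cell_bits, bin_bits) for bin_bits in atlas_bins]


def mapping_spread(scores, coords, k=6):
    """Mean Euclidean distance from the top-scoring bin to the next k bins.

    Requires at least two bins; coords[b] is the (x, y, z) position of bin b.
    """
    order = sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)
    best, runners_up = order[0], order[1:k + 1]
    return sum(math.dist(coords[best], coords[b]) for b in runners_up) / len(runners_up)


if __name__ == "__main__":
    atlas = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]  # toy atlas: 3 bins, 4 markers
    cell = [1, 0, 1, 0]
    print(distributed_scores(cell, atlas))
```

Keeping the full score vector per cell, rather than only its maximum, is what allows a cell to contribute to several likely positions, as argued in the text.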

The virtual embryo predicts spatial gene expression

With single-cell transcriptomes confidently mapped onto the embryo, each of the positional bins can be individually queried for gene expression. Our online Drosophila Virtual Expression eXplorer (DVEX) (www.dvex.org) allows generation of vISHs for single genes and combinations. Predictions are displayed on a virtual embryo in multiple orientations (Fig. 4A), and expression gradients can be estimated along the anteroposterior and dorsoventral axes (e.g., Fig. 4B). Furthermore, DVEX provides an interactive environment to explore the t-SNE representation, gene expression in clustered cells, and genes driving clustering (e.g., Fig. 4C).

We observed close concordance between vISH predictions and expression detected by RNA in situ hybridization for genes expressed in a wide variety of patterns (Fig. 5 and fig. S4), including vnd::DsRed reporter expression in the ventral neurectoderm. Many of the genes shown were not previously known to be patterned at stage 6. Especially striking were predictions restricted to small patches of cells; expression limited to a few cells is often undetectable in traditional transcriptome studies but is resolved by vISH (Fig. 5 and fig. S4).

vISH can be used to identify genes with distinct spatial patterns. We predicted the spatial expression of the 476 most highly variable genes, clustered their correlation matrix and identified 10 parental branches, which generate archetypal expression patterns when averaged (fig. S5E). These archetypes reflect the predominant transcriptional patterning responses, identify gene sets that respond to similar regulatory cues, and allow the discovery of unstudied and unusual gene expression patterns.
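A vISH combines per-cell normalized expression with per-cell mapping scores for every bin; the exact formula is given only in the supplementary materials. The sketch below is therefore one plausible reading and explicitly an assumption: each bin receives the score-weighted mean expression of all cells, with negative scores clipped to zero. The function name and the list-based data layout are hypothetical.

```python
def virtual_in_situ(expression, scores):
    """Predict the expression of one gene in every positional bin.

    expression -- expression[c] is the normalized expression in cell c
    scores     -- scores[c][b] is the mapping score of cell c for bin b

    Assumed rule (not the published formula): weighted mean of expression,
    using non-negative mapping scores as weights.
    """
    n_bins = len(scores[0])
    pattern = []
    for b in range(n_bins):
        weights = [max(cell_scores[b], 0.0) for cell_scores in scores]
        total = sum(weights)
        if total == 0:
            pattern.append(0.0)  # no cell maps to this bin with a positive score
        else:
            weighted = sum(w * e for w, e in zip(weights, expression))
            pattern.append(weighted / total)
    return pattern


if __name__ == "__main__":
    expression = [2.0, 0.0]              # two toy cells
    scores = [[1.0, 0.0], [0.0, 1.0]]    # each cell maps to exactly one bin
    print(virtual_in_situ(expression, scores))
```

Because every bin pools expression from all cells that map to it, a bin can report more expressed genes than any single sequenced cell, which is consistent with the increase in genes per bin reported in the text.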

Identification of potential developmental regulators

We generated vISHs for >150 DNA-binding transcription factors that are detectably expressed in the stage 6 embryo (fig. S6), many of which are unstudied or understudied with respect to early development. Additionally, we predicted patterned expression of 16 genes that contain DNA binding domains (36). These are likely transcription factors (fig. S6A), and we experimentally validated the vISH predictions for two out of two patterned candidates, CG34224 and CG10553 (Fig. 5B). This comprehensive overview of transcription factor expression allows spatial assessment of regulator combinations that may activate or restrict target genes locally (fig. S6B).

Several lncRNAs have also been shown to be potent regulators [e.g., (37)], but have not been assayed globally and systematically in early D. melanogaster development. By screening DVEX, we identified dozens of expressed and patterned lncRNAs (fig. S6C). The lncRNAs CR44317, CR44691, and CR45693, for example, are weakly expressed, rendering them barely detectable in whole-embryo sequencing data (23); however, RNA in situ hybridization showed reliable transcript signals in the predicted spatial domains (Fig. 5C and fig. S4). Additionally, vISH predictions for CR45559 and CR44917 were partially confirmed (Fig. 5C and fig. S4). The expression patterns of these lncRNAs range from dorsoventral modulation to gap-, terminal-, and pair-rule patterns.

CR43432 expression prediction was particularly unusual, because it combines ventral, posterior, and dorsal aspects. CR43432 appears to “wrap around” lateral regions of the embryo, and it appears to be specifically excluded from the neurectoderm by vISH (Fig. 5D). In fact, expression is strongly anticorrelated with the neurectoderm marker SoxN at the single-cell transcriptome level, and double ISH confirms mutually exclusive expression (Fig. 5D). Additionally, CR43432 is highly expressed in yolk nuclei (Fig. 5D, right). The complementary non-neurogenic expression of CR43432 suggests that it might act to delimit neurogenic genes or to promote non-neurogenic fates. In total, we discovered ~40 lncRNAs predicted in a multitude of patterns (fig. S6C). Taken together, vISH is a powerful tool to discover novel putative regulators of embryonic patterning.

Fig. 3. Sequenced cells cluster by spatial identity. (A) Two-dimensional t-SNE representation of the high-quality cells shows nine major clusters grouped by transcriptome similarity. (B) Mapping of clusters reveals that cells within each cluster share a contiguous spatial domain.

Fig. 4. DVEX accurately predicts spatial gene expression patterns. DVEX is the online resource for the virtual embryo. (A) Virtual in situ hybridization (vISH) for the pair rule gene ftz (red) and the mesodermal gene sna (green) in five orientations. Stippled box indicates cells analyzed in (B). EL, egg length; DV, dorsoventral; AP, anteroposterior. (B) Quantification of relative expression per cell mapped along an axis (here, dorsoventral) for stumps (expressed in the ventral mesoderm, left) and the vnd::DsRed reporter (primarily expressed in the ventral neurectoderm, right). Relative expression in log space; thresholds were 0.85, embryos are oriented anterior left. (C) Examples of marker genes and their expression in t-SNE clustered cells. Expression indicated, gray (low) to red (high).

Cell communication by spatially regulated signaling

The Hippo signaling pathway is a major regulator of organ size, cell cycle, and proliferation (38, 39) but has to our knowledge not been implicated in the early embryo. By querying where transcripts of ligands, ligand modulators, receptors, and signal transducers are expressed, we identified patterned expression of major Hippo signaling components along the anteroposterior axis, with overlapping expression primarily in an anterior domain (Fig. 6A). Coexpression of these molecules may promote Hippo signaling, which culminates in the phosphorylation of the transcription factor Yorkie, thereby diminishing Yorkie’s nuclear localization (38, 39). After mitotic arrest at stage 5, cell cycle reentry is delayed in an anterior region (40), and it is conceivable that active Hippo signaling in that domain delays mitotic onset. Using antibodies against Yorkie and a mitosis marker (phosphorylated histone H3), we detected higher nuclear-cytoplasmic ratios of Yorkie in cells undergoing mitosis in anterior patches at about stage 7 (Fig. 6B), suggesting active Hippo signaling in intervening regions. To our knowledge Hippo signaling has previously not been implicated in cell-cycle regulation in early Drosophila development.

Additionally, we predict that components of other signaling pathways are expressed in a spatially restricted fashion, including alternate ligands, receptors, and antagonists of Dpp/TGFβ (fig. S7A). Our experimental data suggest anterior repression of the TGFβ signaling cascade (fig. S7B) (see supplementary note 4). Hence, by analyzing the expression of signaling molecules in the embryo with spatial resolution, we are able to predict where signals originate and where they can be transduced.

Detection of evolutionary gene expression changes

Several cis- and trans-regulatory circuits have diverged between D. melanogaster and D. virilis [e.g., (41, 42)]. We asked whether changes in gene expression patterns over the course of speciation might be detectable using DVEX. Clustering obtained from 673 stage 6 D. virilis high-quality cells (fig. S8A) bore a striking similarity to D. melanogaster with respect to cluster number, proportional cluster size, and cluster mapping by vISH (compare Fig. 3 and fig. S8A). Gene expression correlation between merged transcriptome data of the two species was high (R = 0.77). We used the virtual embryos of D. melanogaster and D. virilis to systematically compute vISHs, compare orthologs, and identify divergences.

The genes CG6660 and GJ14350 are homologous by protein conservation and genomic synteny (table S8), with CG6660 predicted not to be expressed in D. melanogaster (fig. S8B, left), whereas GJ14350 was predicted to be expressed in an anteroposterior stripe-modulated pattern in D. virilis (fig. S8C, left). By RNA in situ hybridization, CG6660 was not detectable, whereas GJ14350 was expressed in stripes similar to the prediction (fig. S8C). For the homologous pair fok/GJ17890 (table S8), fok was predicted and verified by in situ hybridization to be expressed in an anterior ventral patch in D. melanogaster (fig. S8D), but vISH predicted absence of the anterior patch and weak posterior expression of GJ17880 in D. virilis. In situ hybridization in D. virilis showed that, although there is a tendency for low posterior expression of GJ17880 as early as stage 6/7 by RNA in situ hybridization, robust posterior staining was not seen until stage 8; however, the absence of anterior expression in D. virilis was confirmed (fig. S8E). These examples illustrate that DVEX can serve as a sensitive tool for the identification of gene expression changes.

Discussion

Here, we resolved a metazoan embryo composed of ~6000 (or ~3000 when considering bilateral symmetry) individual cells. Although the Drosophila embryo may be an extreme example, in which each cell has a unique transcriptional profile, the transcriptomes of neighboring cells can be very similar to each other. To successfully map dissociated and sequenced cells to their correct spatial position based on combinatorial expression of marker genes, we required a suitable set of marker genes, deep capture of gene expression in each cell, and powerful computational mapping to be able to confidently score differences between an enormous number of mapping possibilities. To illustrate the latter point, if considering only 1000 sequenced cells across only 1000 locations, one already would have to calculate one million (1000 × 1000) possibilities.
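The size of this scoring problem can be made concrete with a small sketch. The code below scores every cell against every position using binarized marker data. The Discussion does not restate the scoring function, so the Matthews correlation coefficient is used here as an assumption, and the random toy data and all function and variable names are hypothetical.

```python
import numpy as np

def mcc_scores(cells, positions):
    """Matthews correlation between every cell and every position.

    cells:     (n_cells, n_markers) binary array (1 = marker ON in the cell)
    positions: (n_positions, n_markers) binary array (1 = marker ON in situ)
    Returns an (n_cells, n_positions) score matrix.
    """
    c = cells.astype(float)
    p = positions.astype(float)
    tp = c @ p.T              # marker ON in both
    fp = c @ (1 - p).T        # ON in the cell, OFF at the position
    fn = (1 - c) @ p.T        # OFF in the cell, ON at the position
    tn = (1 - c) @ (1 - p).T  # OFF in both
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Use a score of 0 wherever the coefficient is undefined (zero denominator).
    return np.divide(tp * tn - fp * fn, denom,
                     out=np.zeros_like(denom), where=denom > 0)

rng = np.random.default_rng(0)
n_cells, n_positions, n_markers = 1000, 1000, 84
positions = rng.integers(0, 2, size=(n_positions, n_markers))
cells = positions.copy()  # toy case: cell i truly originates from position i
scores = mcc_scores(cells, positions)
print(scores.shape, scores.size)  # (1000, 1000) 1000000
```

Writing the four confusion-matrix counts as matrix products evaluates all one million cell-position pairs at once, which is what keeps an exhaustive comparison tractable at this scale.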

Karaiskos et al., Science 358, 194–199 (2017) 13 October 2017 4 of 6

Fig. 5. Prediction accuracy and detection of new regulators. (A) vISH predictions are accurate across a wide variety of expression patterns. Expression of CGs had not been reported previously. (B) Patterned expression of putative transcription factors. (C) Patterned expression of lncRNAs. (D) CR43432 and pan-neurogenic genes are expressed in complementary patterns. Dual vISH of SoxN and CR43432 (top left); double in situ hybridization validates the predicted expression. CR43432 is additionally expressed in yolk nuclei (not shown in vISH).


We were able to overcome these challenges and to produce a “virtual embryo” with ~8000 genes per cell for three main reasons. First, the 84 in situ markers captured sufficient spatial transcriptional complexity to allow us to guide mapping of each sequenced cell to its positions. Second, we optimized our Drop-seq approach to reliably capture thousands of genes per cell. Third, and perhaps most important, we devised a mapping strategy, DistMap, which reliably maps single-cell transcriptomes back to their origin. DistMap is scalable and extendable to other three-dimensional tissues at single-cell resolution. This is because DistMap uses measured gene expression and does not require transcript-level imputation, and its scoring scheme is suitable for sparse data sets. Additionally, distributed mapping limits the effect of outliers and populates positions with transcript information beyond the base sequencing level; in this way, from an original depth of ~3500 genes captured per cell, we were able to assign, on average, ~8000 genes per cell. Nevertheless, DistMap clearly can be improved in several respects; for example, it currently uses binarized rather than continuous data and maps each cell independently, rather than allowing mapped cells to improve subsequent scoring.

Once a virtual embryo has been produced, what kind of biology can be learned? We first built a computational platform (DVEX) that allows interactive interrogation of single-cell transcriptome data in spatial context, including the computation of gradients. We then leveraged DVEX to compute thousands of virtual in situs and to select genes that had interesting expression patterns. For example, we identified patterned transcription factors never implicated in early development before, as well as dozens of lncRNAs with intriguing and sometimes novel expression patterns. Because we used a second fly species to control for cell doublet frequency, we incidentally acquired a virtual embryo for D. virilis. Even though these species are separated by at least 40 million years of evolution and have clearly diverged cis-regulatory DNA sequences [e.g., (43–45)], we found only a few cases with clear expression divergence, which highlights strong selection pressure on maintaining gene expression patterns at this early stage. It also suggests a large extent of gene regulatory plasticity where cis-regulatory sequences may diverge, whereas the overall expression patterns remain largely unchanged (44, 46).

We uncovered a substantial amount of transcriptional modulation of components of major signaling pathways. Local expression of ligands sets up signal sources, but the ability to respond to these signals appears to be heavily regulated at the transcriptional level, even early in development, from patterned expression of specific receptor molecules to modulators of signal transduction. One such case is Hippo signaling, which has not been described to play a role in early Drosophila development. Active Hippo signaling has been connected to cell cycle delay and diminished proliferation (39). Thus, the prediction of expression of major Hippo pathway components in an anterior subdomain (Fig. 6A) was of interest. Indeed, we detected evidence of productive Hippo signaling by showing that the transcriptional effector Yorkie is diminished in anterior nuclei that do not undergo mitosis (Fig. 6B). More than 30 years ago, Hartenstein and Campos-Ortega employed fuchsin staining to show that mitotic reentry after stage 6 occurs asynchronously (40). Our data show that localized Hippo signaling constitutes a mechanism that breaks synchronicity of cell cycle reentry in early fly embryogenesis.

In general, how many guide in situs are needed to reconstruct tissues after dissociation and single-cell sequencing? The answer depends, apart from sequencing depth, clearly on the transcriptional complexity and developmental stage of the tissue. In early metazoan development, most decisions about spatial identity are carried out by a temporal cascade of combinations of transcription factors (47). In our case, 84 in situs (mostly transcription factors) sufficed to uniquely and individually label most of the ~6000 cells. However, it may be possible to assemble complex tissues from sequenced cells without using in situ markers as guides, somewhat akin to solving a puzzle. Clearly, we need a better understanding of the design principles of gene regulation to achieve this or to test ideas about these principles. For example, in early development, the expression of most genes generally does not change in a discontinuous fashion from cell to cell. This feature could be implemented in future versions of DistMap to reduce the number of guide expression patterns needed.
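The distributed mapping discussed above, in which expression patterns are computed by combining mapping probabilities with expression levels, can be illustrated with a short sketch. This is not the DistMap implementation: the exponential weighting, the `sharpness` parameter, and all names are assumptions chosen for illustration. It shows only how a position that draws on several well-matching cells can end up with more detected genes than any single cell contributed.

```python
import numpy as np

def virtual_in_situ(scores, expression, sharpness=10.0):
    """Combine cell-to-position mapping weights with expression levels.

    scores:     (n_cells, n_positions) mapping scores, higher = better match
    expression: (n_cells, n_genes) normalized expression per sequenced cell
    sharpness:  hypothetical parameter controlling how peaked the weights are
    Returns (n_positions, n_genes): predicted expression at each position.
    """
    # Turn scores into positive weights; subtracting the max keeps exp() stable.
    w = np.exp(sharpness * (scores - scores.max(axis=0, keepdims=True)))
    # Normalize over cells so that the weights for each position sum to 1.
    w /= w.sum(axis=0, keepdims=True)
    # Each position receives a weighted average over all cells.
    return w.T @ expression

# Toy example: 3 cells, 2 positions, 2 genes.
scores = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.0, 1.0]])
expression = np.array([[5.0, 0.0],   # cell 0 detected only gene A
                       [0.0, 3.0],   # cell 1 detected only gene B
                       [1.0, 1.0]])  # cell 2 detected both genes
vish = virtual_in_situ(scores, expression)
print(vish.round(2))
```

In this toy case cells 0 and 1 both match position 0 well, so that position is assigned nonzero values for both genes even though each of the two cells captured only one of them.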

REFERENCES AND NOTES

1. S. K. Bowman et al., eLife 3, e02833 (2014).
2. L. Christiaen et al., Science 320, 1349–1352 (2008).
3. N. Soshnikova, D. Duboule, Science 324, 1320–1323 (2009).
4. A. Stathopoulos, M. Van Drenth, A. Erives, M. Markstein, M. Levine, Cell 111, 687–701 (2002).
5. L. Cherbas et al., Genome Res. 21, 301–314 (2011).
6. T. S. Mikkelsen et al., Nature 448, 553–560 (2007).
7. N. C. Riddle et al., Genome Res. 21, 147–163 (2011).
8. M. Stoeckius et al., EMBO J. 33, 1751–1766 (2014).
9. M. Stoeckius, D. Grün, N. Rajewsky, EMBO J. 33, 1740–1750 (2014).
10. T. Schauer et al., Cell Reports 5, 271–282 (2013).
11. S. Bonn et al., Nat. Protoc. 7, 978–994 (2012).
12. A. Handley, T. Schauer, A. G. Ladurner, C. E. Margulies, Mol. Cell 58, 621–631 (2015).
13. V. M. Weake et al., Genes Dev. 25, 1499–1509 (2011).
14. F. A. Steiner, P. B. Talbert, S. Kasinathan, R. B. Deal, S. Henikoff, Genome Res. 22, 766–777 (2012).
15. D. A. Jaitin et al., Science 343, 776–779 (2014).
16. K. Shekhar et al., Cell 166, 1308–1323.e30 (2016).
17. D. Grün et al., Nature 525, 251–255 (2015).
18. A. M. Klein et al., Cell 161, 1187–1201 (2015).


Fig. 6. Spatial regulation of Hippo signaling in the embryo. (A) vISHs predict patterned expression of Hippo signaling components in stage 6 embryos. Shown are components involved in receiving the signal (receptors and ligands), transducing it through the cytoplasm (transducers) and inhibition of the transcriptional cofactor Yorkie (Yki). Ubiquitous pathway components are indicated; vISHs for patterned components are shown. Most patterned positive Hippo components are expressed in anterior regions. Active signaling culminates in nuclear exclusion of Yki. (B) Shown is the anterior of a stage 7 embryo; cephalic furrow is indicated by stippled white line, anterior left. Staining with antibodies against phosphorylated histone H3 (H3S10-P) marks cells undergoing mitosis; nuclear Yki is depleted in cells not marked by H3S10-P.


19. E. Z. Macosko et al., Cell 161, 1202–1214 (2015).
20. K. Achim et al., Nat. Biotechnol. 33, 503–509 (2015).
21. R. Satija, J. A. Farrell, D. Gennert, A. F. Schier, A. Regev, Nat. Biotechnol. 33, 495–502 (2015).
22. M. Levine, E. H. Davidson, Proc. Natl. Acad. Sci. U.S.A. 102, 4936–4942 (2005).
23. B. R. Graveley et al., Nature 471, 473–479 (2011).
24. P. A. Combs, M. B. Eisen, PLOS ONE 8, e71820 (2013).
25. E. Lécuyer et al., Cell 131, 174–187 (2007).
26. P. Tomancak et al., Genome Biol. 8, R145 (2007).
27. J. Alles et al., BMC Biol. 15, 44 (2017).
28. C. C. Fowlkes et al., Cell 133, 364–374 (2008).
29. J. E. Sandler, A. Stathopoulos, Genetics 202, 1575–1584 (2016).
30. E. M. Underwood, J. H. Caulton, C. D. Allis, A. P. Mahowald, Dev. Biol. 77, 303–314 (1980).
31. M. G. Riparbelli, G. Callaini, Mech. Dev. 120, 441–454 (2003).
32. L. Van der Maaten, G. Hinton, J. Mach. Learn. Res. 9, 2579–2605 (2008).
33. M. Markstein et al., Development 131, 2387–2394 (2004).
34. G. M. Technau, J. A. Campos-Ortega, Rouxs Arch. Dev. Biol. 194, 196–212 (1985).
35. V. Hartenstein, Atlas of Drosophila Development (Cold Spring Harbor Laboratory Press, 1993), vol. 1.
36. L. S. Gramates et al., Nucleic Acids Res. 45, D663–D671 (2017).
37. L. A. Goff, J. L. Rinn, Genome Res. 25, 1456–1465 (2015).
38. H. Oh, K. D. Irvine, Development 135, 1081–1088 (2008).
39. J. Huang, S. Wu, J. Barrera, K. Matthews, D. Pan, Cell 122, 421–434 (2005).
40. V. Hartenstein, J. A. Campos-Ortega, Rouxs Arch. Dev. Biol. 194, 181–195 (1985).
41. M. Treier, C. Pfeifle, D. Tautz, EMBO J. 8, 1517–1525 (1989).
42. R. P. Zinzen, J. Cande, M. Ronshaugen, D. Papatsenko, M. Levine, Dev. Cell 11, 895–902 (2006).
43. E. Emberly, N. Rajewsky, E. D. Siggia, BMC Bioinformatics 4, 57 (2003).
44. M. Z. Ludwig, C. Bergman, N. H. Patel, M. Kreitman, Nature 403, 564–567 (2000).
45. P. Simpson, S. Ayyar, Adv. Genet. 61, 67–106 (2008).
46. S. B. Carroll, Endless Forms Most Beautiful: The New Science of Evo Devo (W. W. Norton, 2005).
47. E. Davidson, The Regulatory Genome. Gene Regulatory Networks In Development And Evolution (Academic Press, ed. 1, 2006).

ACKNOWLEDGMENTS

We thank M. Biggin and S. Keranen (Lawrence Berkeley National Laboratory) for discussions and unpublished BDTNP data, R. Satija for initial Drop-seq help, S. Ugowski (MDC) for experimental assistance, S. Small (New York University) for a transgenic Drosophila line, A. Stathopoulos (California Institute of Technology, NIH R35GM118146) for sharing unpublished results, J. Zeitlinger (Stowers) for Yorkie antibody, E. Laufer (Columbia) for pMad antibody, N. Friedman (Hebrew University) and members of the Rajewsky and Zinzen laboratories for constructive discussions, the reviewers for valuable comments and suggestions, D. Munteanu (BIMSB/MDC) for information technology support, the DZHK (project number BER 1.2 VD), and the Deutsche Forschungsgemeinschaft (SPP 1738, RA 838/8-1, and RA 838/5-1) for funding. Raw and processed data sets are available from the Gene Expression Omnibus repository (GSE95025). The DistMap R-package is available at https://github.com/rajewsky-lab/distmap. N.R. and R.P.Z. defined strategy, supervised, and procured funding; N.K., P.W., J.A., C.Ko., N.R., and R.P.Z. designed experimental strategy; and P.W. did fly genetics and embryo collections. J.A. set up, C.Ko. supervised, and J.A., S.A., A.B., and C.Ko. performed Drop-seq. A.B., S.A., and P.W. prepared sequencing libraries; N.K. developed and implemented computational analyses/tools, including DistMap and DVEX; P.W. and C.Ki. validated predictions experimentally; and N.K., P.W., C.Ko., N.R., and R.P.Z. analyzed data and wrote the manuscript.

SUPPLEMENTARY MATERIALS

www.sciencemag.org/content/358/6360/194/suppl/DC1
Materials and Methods
Supplementary Text
Figs. S1 to S8
Tables S1 to S8
References (48–59)

29 March 2017; accepted 24 August 2017
Published online 31 August 2017
10.1126/science.aan3235


The Drosophila embryo at single-cell transcriptome resolution

Nikos Karaiskos, Philipp Wahle, Jonathan Alles, Anastasiya Boltengagen, Salah Ayoub, Claudia Kipar, Christine Kocks, Nikolaus Rajewsky and Robert P. Zinzen

Science 358 (6360), 194-199. DOI: 10.1126/science.aan3235, originally published online August 31, 2017

3D gene expression blueprint of the fly

When looking at populations of cells, features such as cell heterogeneity and localization are masked. However, single-cell sequencing reveals cellular heterogeneity and rare cell types. At the onset of gastrulation, the fly embryo consists of about 6000 cells with distinct gene expression profiles. Karaiskos et al. developed an algorithm to generate an interactive three-dimensional (3D) "virtual embryo," with the expression of more than 8000 genes per cell measured for most cells (see the Perspective by Stadler and Eisen). The virtual embryo offers insights into developmental mechanisms, from local expression of regulators such as transcription factors and long noncoding RNAs to spatial modulation of signaling pathways.

Science, this issue p. 194; see also p. 172


Science (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. The title Science is a registered trademark of AAAS.

Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
