In situ Sequencing Methods for spatially-resolved transcriptome analysis

Marco Mignardi

Abstract

It is well known that cells in tissues display a large heterogeneity in gene expression due to differences in cell lineage origin and variation in the local environment at different sites in the tissue, a heterogeneity that is difficult to study by analyzing bulk RNA extracts from tissue. Recently, genome-wide transcriptome analysis technologies have enabled the analysis of this variation with single-cell resolution. In order to link the heterogeneity observed at the molecular level with the morphological context of tissues, new methods are needed which achieve an additional level of information, such as spatial resolution.

In this thesis I describe the development and application of padlock probes and rolling circle amplification (RCA) as molecular tools for spatially-resolved transcriptome analysis. Padlock probes allow in situ detection of individual mRNA molecules with single nucleotide resolution, visualizing the molecular information directly in the cell and tissue context. Detection of clinically relevant point mutations in tumor samples is achieved by using padlock probes in situ, allowing visualization of intra-tumor heterogeneity. To resolve more complex gene expression patterns, we developed in situ sequencing of RCA products combining padlock probes and next-generation sequencing methods. We demonstrated the use of this new method by, for the first time, sequencing short stretches of transcript molecules directly in cells and tissue. By using in situ sequencing as read-out for multiplexed padlock probe assays, we measured the expression of tens of genes in hundreds of thousands of cells, including point mutations, fusion transcripts and gene expression levels.

These molecular tools can complement genome-wide transcriptome analyses, adding spatial resolution to the molecular information. This level of resolution is important for the understanding of many biological processes and potentially relevant for the clinical management of cancer patients.

Keywords

Padlock probes; rolling circle amplification; in situ sequencing; spatially-resolved transcriptomics; molecular diagnostics.

To my beloved family,

for every minute spent away from them.

“It's not that I'm so smart, it's just that I stay with problems longer.”

Albert Einstein

List of Publications

This thesis is based on the following papers:

Grundberg I, Kiflemariam S, Mignardi M, Imgenberg-Kreuz J, Edlund K, Micke P, Sundström M, Sjöblom T, Botling J, Nilsson M. In situ mutation detection and visualization of intratumor heterogeneity for cancer research and diagnostics. Oncotarget 4, 2407-2418 (2013).

Ke R*, Mignardi M*, Pacureanu A, Svedlund J, Botling J, Wählby C, Nilsson M. In situ sequencing for RNA analysis in preserved tissue and cells. Nat Methods. 10, 857-860 (2013).

Kiflemariam S*, Mignardi M*, Ali MA, Bergh A, Nilsson M, Sjöblom T. In situ sequencing identifies TMPRSS2-ERG fusion transcripts, somatic point mutations and gene expression levels in prostate cancers. J Pathol. 234, 253-61 (2014).

Mignardi M*, Mezger A*, Larsson C and Nilsson M. Oligonucleotide gap-fill ligation for mutation detection and sequencing in situ. Submitted manuscript.

* These authors contributed equally.

Reprints were made with permission from the respective publishers.

Related work by the author

Mignardi M, Nilsson M. Fourth-generation sequencing in the cell and the clinic. Genome Med. 6, 31 (2014).

Weibrecht I*, Lundin E*, Kiflemariam S, Mignardi M, Grundberg I, Larsson C, Koos B, Nilsson M*, Söderberg O*. In situ detection of individual mRNA molecules and protein complexes or post-translational modifications using padlock probes combined with the in situ proximity ligation assay. Nat Protoc. 8, 355-372 (2013).

* These authors contributed equally.

Contents

Introduction
Cellular heterogeneity: origins and implications
Classic methods for the analysis of gene expression
Resolution, multiplexibility and throughput of transcriptome analysis
In vitro methods
In situ methods
Spatially-resolved transcriptome analysis
In situ sequencing
Application of in situ sequencing in tumor biology
In situ sequencing for research
Padlock probe assay in the clinic
Present investigations
In situ mutation detection and visualization of intratumor heterogeneity for cancer research and diagnostics
In situ sequencing for RNA analysis in preserved tissue and cells
In situ sequencing identifies TMPRSS2-ERG fusion transcripts, somatic point mutations and gene expression levels in prostate cancers
Oligonucleotide gap-fill ligation for mutation detection and sequencing in situ
Concluding remarks and future perspectives
Popular science summary in Swedish (Populärvetenskaplig sammanfattning på svenska)
Acknowledgments
References

Abbreviations

ACTB actin beta

AMACR alpha-methylacyl CoA racemase

AR androgen receptor

bDNA branched DNA

cDNA complementary DNA

CNV copy number variation

CRC colorectal cancer

CTC circulating tumor cell

DNAnb DNA nano-ball

dsDNA double-stranded DNA

EGFR epidermal growth factor receptor

ER estrogen receptor

ERG ETS-related gene

FACS fluorescence activated cell sorting

FFPE formalin-fixed paraffin embedded

FISH fluorescent in situ hybridization

FISSEQ Fluorescent In Situ SEQuencing

H&E hematoxylin and eosin

HCR hybridization chain reaction

HER2 human epidermal growth factor receptor 2

ISH in situ hybridization

KRAS Kirsten rat sarcoma viral oncogene homolog

LCM laser-capture microdissection

LNA locked nucleic acid

LOH loss-of-heterozygosity

MELAS mitochondrial encephalomyopathy, lactic acidosis and stroke-like episodes syndrome

mtDNA mitochondrial DNA

NGS next-generation sequencing

nt nucleotide

OLA oligonucleotide ligation assay

PCa prostate cancer

PNA peptide nucleic acid

qPCR quantitative PCR

RCA rolling circle amplification

RCP rolling circle product

RNA-seq RNA sequencing

RT-PCR reverse-transcription qPCR

SBH sequencing-by-hybridization

SBL sequencing-by-ligation

smFISH single-molecule FISH

SNP single nucleotide polymorphism

TMA tissue microarray

TP53 tumor protein 53

VIM vimentin

Introduction

The understanding of cellular processes at the molecular level in physiological and pathological conditions is necessary for the future development of sensitive diagnostic tools and, ultimately, of personalized treatments of diseases. Since the first methods were described in 1977, nucleic acid sequencing technologies have improved at an extraordinary pace. Thanks to the latest generations of sequencing methods, collectively called next-generation sequencing (NGS) technologies, the full map of the genome and the transcriptome can be obtained at an unprecedented speed and resolution. Many years after the introduction of these methods, we are now attempting an even more accurate description of complex biological processes that takes into account the dynamic expression of the cellular genome through space and time. This next step will require a further level of resolution.

My doctoral work focused on the development of the first method for sequencing nucleic acids directly in cells and tissues. The peculiarity of in situ sequencing is that it allows the direct observation and measurement of the fundamental unit of the human transcriptome, a single nucleotide, at the resolution of single molecules in individual cells. This new molecular method has the potential to become a fundamental tool for tissue analysis, providing spatial context to the molecular information.

In this thesis, I will briefly describe the known biological causes and consequences of the heterogeneity observed in human tissues and the most common methods to study this heterogeneity at the transcriptome level. These methods operate at different scales of analysis and I will review their performance in terms of throughput, cellular and molecular resolution and spatial information. I will then describe in situ sequencing, how it differs from these methods and how it can complement other molecular methods for the analysis of tissues. Finally, I will highlight some of the applications that could benefit from the use of the molecular tools described in this work, both in basic research and in molecular diagnostics.

Cellular heterogeneity: origins and implications

The instructions necessary to build human cells and to execute their physiological functions are enclosed in their genome, 23 pairs of deoxyribonucleic acid (DNA) molecules called chromosomes. These instructions are used to synthesize hundreds of thousands of ribonucleic acid (RNA) molecules of different sizes and functions which, together, constitute the cellular transcriptome. Some of these RNA molecules are translated into proteins, the building blocks of the cell and of the human body. A “mistake” along this flow of information can affect the structure or the function of the cell, sometimes with dramatic effects for the whole organism. A collection of cells forms a tissue, which, in turn, is organized in organs that execute and coordinate the physiological functions. Cells can be grouped in different “types” according to their phenotype. The definition of cell type is much debated [1]; however, it implies that large variation exists among cells of different tissues and within the same tissue. In fact, despite sharing the same genome, cells of the human body appear to be phenotypically extremely heterogeneous. So, how does this diversity arise?

Two sources of variation mainly contribute to generate such cell diversity: genetic and non-genetic [2]. An evident manifestation of cellular heterogeneity arising from genotypic variation among a cell population is found in diseased tissues [3–5], particularly in tumor tissues in which genetic heterogeneity is promoted by the high genome instability of cancer cells [6–8]. However, new compelling evidence indicates that genotypic variation is a widespread event among normal tissues [9–13]. Non-genetic heterogeneity is the result of a differential expression of the genome among a clonal population of cells, due to, for example, epigenetic regulation [14–16], the cell cycle [17,18] and the stochastic nature of gene expression [19–21]. This wealth of diversity is reached through a complex scenario of interactions between genome, transcriptome, proteome, metabolome and the environment surrounding the cells [22] and results in a differential expression of the cellular genome in space and time [23,24].

Technologies with the necessary resolution to analyze heterogeneity in gene expression are fundamental tools to understand biological processes such as development [25], cell fate and differentiation [26], immunity [27] and the maintenance and differentiation of stem cells [28]. Transcriptome heterogeneity is of particular interest in tumor biology. Genotypic and phenotypic variation between tumors (intertumor heterogeneity) and within the same tumor (intratumor heterogeneity) is observed across all types of cancer [29,30]. The capacity of tumor cells to differentiate and proliferate through clonal expansion is what currently limits cancer treatment [31–34]. Investigating tumor heterogeneity in greater detail is shedding light on the molecular mechanisms underlying tumor genesis, progression and metastasis [35,36]. This holds great potential for the identification of novel therapeutic targets and of biomarkers for improved cancer diagnostics [37–40].

The accurate description of tissue heterogeneity is not a trivial task. It requires the adaptation and refinement of molecular technologies originally developed for bulk analysis to achieve higher resolution, as well as the introduction of new methods. Some of these methods and their recent use for spatial transcriptomics analysis will be discussed in the following chapters.

Classic methods for the analysis of gene expression

Resolution, multiplexibility and throughput of transcriptome analysis

The ability to accurately describe and “measure” heterogeneity is first of all a technological challenge. For instance, when comparing different tissues, an average obtained by a bulk measurement may represent the difference between samples but it will fail to detect differences within the same tissue [41]. Also, differences within the same tissue may be localized, constituting regional variations, or be distributed randomly across the tissue. Regional distributions may reflect important characteristics of the tissue [42], but not all methods can characterize such spatial information (Figure 1).

Figure 1. Analysis of tissue heterogeneity. The four samples exemplify four tissue sections composed of three distinct cell types, represented by different colors, in different proportions. An average measurement can distinguish sample 1 from the others, but samples 2, 3 and 4 will give the same result in a bulk analysis although they differ in the tissue organization. To detect and identify this regional variation, highlighted by the dashed lines, the method must yield spatial resolution.
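The point made by Figure 1 can be illustrated with a minimal Python sketch. The two 4x4 "sections" below are hypothetical toy data, and the neighbour-based score is only one simple assumed way of quantifying spatial organization; it is not a method used in this thesis. Both sections contain the same cells, so a bulk-like composition measurement cannot tell them apart, whereas a measurement that keeps positions can.

```python
from collections import Counter

# Hypothetical 4x4 tissue sections; each letter is one cell of type A, B or C.
CLUSTERED = ["AABB", "AABB", "AACC", "AACC"]  # cell types form regions
MIXED = ["ABAC", "BACA", "ACAB", "CABA"]      # same cells, interspersed


def composition(section):
    """Fraction of each cell type: what a bulk measurement reports."""
    counts = Counter("".join(section))
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


def same_type_neighbour_fraction(section):
    """Fraction of adjacent cell pairs (horizontal or vertical) sharing a type."""
    same = pairs = 0
    for r, row in enumerate(section):
        for c, cell in enumerate(row):
            if c + 1 < len(row):        # right-hand neighbour
                pairs += 1
                same += cell == row[c + 1]
            if r + 1 < len(section):    # neighbour below
                pairs += 1
                same += cell == section[r + 1][c]
    return same / pairs


for name, section in [("clustered", CLUSTERED), ("mixed", MIXED)]:
    print(name, composition(section), same_type_neighbour_fraction(section))
```

The two sections give identical compositions but very different neighbour scores, which is the kind of regional information that is lost once cells are pooled or dissociated.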

Nowadays, there are many methods used for transcriptome analysis, all based on nucleic acid hybridization and enzymatic reactions (i.e. polymerization or ligation) or, most often, a combination of the two. There is great difference in the depth and type of information that each method can provide. Besides their technical performance (such as analytical sensitivity, specificity, limit of detection or dynamic range as defined in Jenning et al. [43]), there are other parameters that become of critical importance when analyzing tissue samples. These parameters are described below.

The analytical resolution defines the smallest unit that can be discriminated by a specific method and can be used in relation to the sample or to the molecular target. Cellular resolution refers to the sample and therefore subcellular compartments are the smallest units. Molecular resolution refers to the molecular target where the smallest unit is a single nucleotide of an individual RNA molecule. Maximum resolution is achieved by methods that are able to discriminate single nucleotides with subcellular localization. Some methods are suitable for detection of transcript abundance from tissue lysates and therefore the cellular resolution is minimal. Of note, resolution, as intended here, is distinct from the limit of detection (LOD), which is the lowest amount of analyte that is statistically distinguishable from background or a negative control [43].

Another consideration is the throughput of the assay, here referred to as the capacity of a system to analyze a large number of samples, or rather the rate of samples processed. Translated to tissue transcriptome analysis, throughput can refer to the number of cells that are effectively analyzed from a single sample. Small organs or samples of tissue removed from organs are sliced into thin tissue sections, containing 10^4–10^6 cells or more. Even though several techniques exist to extract single cells from tissue samples [18,44–49] and to analyze the RNA content from single cells [50], capture and analysis of many cells from a single section can be technically challenging (especially if manual manipulation is required), limited by the cost of the downstream molecular tests, or achieved only at the loss of information (e.g. the localization in the case of bulk analysis, as described below).

Multiplexibility refers to the breadth of the transcriptome that can be targeted by a method. Some methods are limited to the analysis of a few transcripts at a time. For example, fluorescence-based in situ methods are limited by the number of resolvable fluorescent spectra. In general, sequencing, meant as an iterative process of measurement, is a successful way to increase the multiplexibility of RNA analysis to include the whole transcriptome.
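Why an iterative read-out raises multiplexibility can be seen from simple counting. The sketch below is an idealized illustration, assuming that every cycle reads one of a fixed number of resolvable colors per molecule and that every color combination is used as a distinct code (no error correction, no excluded codes); the function names are hypothetical.

```python
def distinguishable_targets(n_colors: int, n_cycles: int) -> int:
    """Number of distinct color sequences readable in n_cycles cycles."""
    return n_colors ** n_cycles


def cycles_needed(n_targets: int, n_colors: int) -> int:
    """Smallest number of cycles whose code space covers n_targets."""
    cycles, capacity = 0, 1
    while capacity < n_targets:
        capacity *= n_colors
        cycles += 1
    return cycles


# With four colors, a single read-out separates 4 targets,
# while four sequential read-outs separate 4**4 targets.
print(distinguishable_targets(4, 1), distinguishable_targets(4, 4))
print(cycles_needed(100, 4))
```

Under these assumptions the number of addressable targets grows exponentially with the number of cycles, whereas a single fluorescent read-out is capped at the number of resolvable colors.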

Gene expression is a dynamic process in the cell, subject to variations in the four dimensions (three spatial and one temporal) across the tissues. The ability of a method to capture spatial and temporal resolution is crucial to dissect tissue heterogeneity. Spatial resolution differs from cellular resolution because it does not refer to the section of tissue from which information is derived, but rather to the ability to relate many sections to each other (Figure 1). For instance, single cell resolution can be achieved but information on which cells are connected to each other or grouped in the same region may not be known. In situ techniques intrinsically have spatial resolution and its depth, down to subcellular compartments, is dictated by the size and density of the signal generated. In vitro assays maintain spatial resolution by extracting single cells from tissue sections and recording (typically by imaging) the position of the cells in the section before extraction. The throughput of these methods, however, is typically low unless enzymatic dissociation is used, in which case spatial resolution is lost. Time-resolved measurements are perhaps the most difficult to achieve in tissue analysis. Time-resolved methods usually rely upon insertion of genetically-engineered vectors for expression of fluorescence-tagged molecules that can be detected by high-resolution imaging. Because these assays require living samples, they are typically used for analysis of cell cultures [51,52]. Alternatively, RNA measurements can be performed at different time-points, although this implies the analysis of different samples [53].

The ideal assay should be able to analyze the whole transcriptome with single nucleotide resolution of every cell composing a tissue and at the same time preserve the spatial information and relationship of all the cells analyzed. At present, such an idealized assay does not exist and therefore a combination of methods is necessary to explore biological questions. The most common methods to study gene expression are described below and are for simplicity grouped into two types: in vitro methods, where nucleic acids are extracted from the cells and the analysis is performed typically in a test tube or a fluidic system; or in situ methods, where the analysis is performed directly on preserved cells and tissue samples.

In vitro methods

The gold standard for RNA analysis is reverse-transcription qPCR (RT-PCR), a technique based on kinetic PCR, better known as quantitative PCR (qPCR) or real-time PCR (the nomenclature used in this thesis follows the MIQE guidelines proposed by Bustin et al. [54]). qPCR is an adaptation of the polymerase chain reaction (PCR) [55], in which the accumulation of double-stranded DNA (dsDNA) is monitored in real time during the course of the reaction [56]. Fluorescent reporters are used to intercalate or hybridize to the newly formed DNA products and thus the increase in fluorescence intensity over time relates to the amount of starting target material in the sample [57]. In RT-PCR, RNA molecules are first converted to cDNA by reverse transcription and subsequently amplified and measured by qPCR [58,59]. The main advantages of this method are the high specificity and analytical sensitivity down to a single molecule of target RNA [59], which makes it a suitable method to work with minimum amounts of starting material, like a single cell [58]. RT-PCR is also characterized by a large dynamic range spanning several orders of magnitude. Relative quantification can be performed comparing RNA extracted from different samples while absolute quantification can be achieved with the use of internal controls to generate a standard curve [60]. Because it is based on a fluorescent read-out, and given the technical limitations of multiplex PCR, RT-PCR has been limited to the analysis of few targets at a time. However, recent micro- and nanofluidic systems for high throughput qPCR analysis allow measurement of hundreds of transcripts in hundreds or thousands of single cells [47,61]. Furthermore, the development of digital PCR and the implementation of real-time PCR measurement in picoliter droplets have the potential to increase precision, sensitivity and reproducibility of this method even further [62,63]. Despite the technical feasibility of performing RT-PCR on single cells and the availability of high throughput systems able to analyze hundreds of cells at a time, it is not possible to identify the original position of the individual cells in the tissue of origin. Therefore, spatial resolution is what limits this assay when studying tissue heterogeneity. However, subpopulations of cells from the same tissue can be reconstructed based on statistical clustering algorithms that associate cells with similar expression patterns [64].
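The standard-curve approach to absolute quantification mentioned above can be sketched as follows. This is a minimal illustration, not a protocol from this thesis: the dilution series and quantification cycle (Cq) values are invented, and the sketch assumes the usual log-linear relation between Cq and starting copy number, fitted by ordinary least squares.

```python
import math

# Hypothetical dilution series: known starting copies and measured Cq values.
STANDARD_COPIES = [1e2, 1e3, 1e4, 1e5, 1e6]
STANDARD_CQ = [31.36, 28.04, 24.72, 21.40, 18.08]


def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cq) / len(cq)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1 / slope) - 1


def copies_from_cq(cq_value, slope, intercept):
    """Invert the standard curve to estimate starting copies of an unknown."""
    return 10 ** ((cq_value - intercept) / slope)


slope, intercept = fit_standard_curve(STANDARD_COPIES, STANDARD_CQ)
print(slope, intercept, amplification_efficiency(slope))
print(copies_from_cq(24.72, slope, intercept))
```

A slope near -3.32 cycles per tenfold dilution corresponds to near-perfect doubling per cycle, which is why the slope of the curve is commonly reported alongside the quantification itself.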

Before the development of high throughput gene expression analysis platforms, global transcriptome analysis from tissue extracts was enabled by the microarray technology. The first microarray technology was developed already in 1995 by depositing cDNA libraries on 96-well microtiter plates followed by hybridization of fluorescently labelled reverse-transcribed mRNA from Arabidopsis thaliana [65]. The underlying mechanism has not substantially changed since then, still relying on immobilization of template oligonucleotides on a solid support and hybridization of target RNA or DNA molecules combined with fluorescent read-out. Instead, sample preparation protocols have greatly improved, making this technology suitable for single cell analysis [66–68]. Also, array-based methods have expanded to the analysis of single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) [69,70]. Thus, compared to RT-PCR, microarray technologies are highly multiplex assays able to achieve single nucleotide and single cell resolution, although generally lacking spatial information. They are less sensitive and not as accurate in absolute quantification of low-abundance transcripts as qPCR-based assays; their main technical drawback is the low specificity of probe hybridization (background level of hybridization and cross-reactivity), which limits the dynamic range [71]. Also, array design requires prior knowledge of the target regions, making the identification of splice variants or de novo transcriptome analysis challenging [72,73]. These limitations have been partially solved by high throughput RNA sequencing technologies, which are now becoming the method of choice for transcriptome analysis even though microarrays are still a cheaper and reliable alternative in many applications [73].

The foundations for high throughput sequencing of DNA were laid already at the end of the 1980s, and comprised carrying out sequencing outside gel capillary systems and picturing DNA not as a string of nucleotides but as a sequence of overlapping oligomers [74,75]. The first method that followed this logic was the sequencing-by-hybridization (SBH) method, described in 1993 [76]. It took about a decade, however, before the first commercial systems for high throughput analysis of DNA were launched. NGS technologies differ substantially in the sequencing chemistry used and in technical performance. They are comprehensively reviewed in Metzker, 2010 [77]. NGS technologies can be used to sequence the transcriptome or parts of it, a method called RNA sequencing (RNA-seq) [78,79]. Except for the most recent sequencers, which are able to directly read single transcripts, the RNA is extracted from the sample, cDNA is synthesized using random primers or oligo(dT) primers (to capture only the pool of poly(A) transcripts), fragmented and ligated to special adaptors before being sequenced on one of the NGS platforms [80]. Sequencing RNA molecules offers several advantages over array-based methods. First of all, it is an untargeted approach, meaning that prior knowledge of the sequencing targets is not necessary. Second, despite the general concordance between RNA-seq and microarray experiments for transcriptome profiling, the former is superior in sensitivity and has a broader dynamic range because it is less sensitive to background fluorescence [73,81]. Third, it intrinsically has single nucleotide resolution and in most recent technologies even single molecule resolution can be achieved [82,83]. Therefore, most of the sequence variation at RNA level can be detected by this method. Protocols for sample preparation and cDNA synthesis have been adapted to work with just a few picograms of starting material, the mRNA content of a single cell [50,84] or even a fraction of it [85]. Recent technical improvements have enabled robust sequencing of full-length mRNA molecules [86–88], the use of molecular barcodes to simultaneously sequence many individual cells maintaining cellular resolution [89,90], and molecular tagging for precise RNA quantification and tracking of transcription polarity [91–93].
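The idea of picturing DNA as a sequence of overlapping oligomers, which underlies SBH, can be illustrated with a toy Python sketch. It is not the published SBH procedure: the example sequence is invented, and the reconstruction assumes an error-free set of oligomers in which every (k-1)-nucleotide overlap occurs only once, a condition real sequences often violate.

```python
def kmer_spectrum(sequence, k):
    """All overlapping oligomers of length k, in order of occurrence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def reconstruct(kmers):
    """Rebuild a sequence by chaining oligomers that overlap by k-1 bases.

    Toy assumption: every (k-1)-base prefix is unique, so the chain
    through the oligomers is unambiguous.
    """
    by_prefix = {kmer[:-1]: kmer for kmer in kmers}
    if len(by_prefix) != len(kmers):
        raise ValueError("repeated overlaps: reconstruction is ambiguous")
    suffixes = {kmer[1:] for kmer in kmers}
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting oligomer")
    current = starts[0]
    sequence = current
    for _ in range(len(kmers) - 1):
        current = by_prefix[current[1:]]  # next oligomer shares k-1 bases
        sequence += current[-1]
    return sequence


oligomers = kmer_spectrum("ATGGCGTACA", 4)
# The order in which oligomers are detected carries no information,
# so the reconstruction is run on a sorted (scrambled) copy.
print(reconstruct(sorted(oligomers)))
```

The sketch also hints at the limitation of this representation: as soon as an overlap repeats, several sequences are compatible with the same set of oligomers.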

One of the most widely used methods to isolate single cells after enzymatic digestion of tissues is fluorescence activated cell sorting (FACS). Described for the first time in 1969 [94], it is now a fundamental tool for single cell analysis. FACS instruments are multiparametric flow cytometers equipped with lasers for detection of fluorescent markers on the cells. Cells contained in droplets are individually read by the lasers, classified and counted based on fluorescence intensity and morphological parameters and sorted based on a defined gating strategy in plates or tubes. Although mostly used for expression analysis of surface markers labelled with fluorescent antibodies, RNA analysis by FACS and flow cytometry has also been demonstrated [95,96]. The great advantages of FACS analysis are the high flow rate, which allows counting of thousands of cells per second, and the simplicity of the protocols. On the other hand, the fluorescence-based read-out limits the multiplexibility of the method to few markers per cell (up to 12 in some cases) and the sensitivity is quite low compared to other techniques described.

In situ methods

Spatial resolution is an intrinsic property of in situ techniques because the molecular reactions are performed directly on the tissue section and the spatial information is readily visible. The first molecular staining was published in 1969 by Gall and Pardue [97], who for the first time performed in situ hybridization (ISH) of nucleic acids using radiolabelled probes. Since then, radioactive read-out has been replaced with fluorescent or chromogenic substrates and a variety of different molecular techniques have been developed to allow detection, visualization and counting of RNA in situ. Current in situ methods generally share three steps: fixation of the nucleic acids and permeabilization of the sample; detection of the target RNA and signal amplification to achieve the highest signal-to-noise ratio; and imaging and analysis of the stained sample. Based on the modality of target recognition, the methods can be divided into two main approaches: hybridization-based and amplification-based systems.

Fluorescent in situ hybridization (FISH) is the fluorescence version of the more general ISH technique and is certainly the best known and most established of all the in situ assays. The first version of this technique, described in 1980 [98], made use of 3’-end fluorescently-labelled RNA probes to detect specific DNA sequences in cells. mRNA detection came soon after by combining the use of biotinylated oligonucleotides and fluorescent antibodies for detection [99]. The first version of FISH has limited molecular resolution, allowing relative quantification but not absolute counting. Also, because of high fluorescent background, the method can detect only highly abundant transcripts and multiplexibility is limited by the number of resolvable fluorophores, typically 3 or 4. These limitations, except for the latter, have been solved by the second generation of FISH probes, collectively called single-molecule FISH (smFISH). The original kb-long probes are replaced by multi-labelled 50 bp-long probes that hybridize along the target sequence, generating a diffraction-limited spot allowing single molecule quantification in situ [100] (Figure 2). Other improvements in probe design and the use of low-noise charge-coupled device (CCD) cameras contribute to the high resolution of smFISH. Detection of five genes at a time has been demonstrated by this technique in mammalian cells and applied to whole tissue sections [101,102]. However, the variation in fluorescence intensity among detected transcripts affects the sensitivity of the method. To solve this issue, a second, more recent design comprises the use of approximately 50 single-labelled probes, each only 20 nucleotides long [103]. The key advantages of this more optimized protocol are the high sensitivity and improved specificity, which allow robust detection and quantification of transcripts, including products of gene fusions, in a variety of tissue samples [104–106]. Because of the micrometer or even sub-micrometer size of the signals, smFISH inherently reaches single molecule detection at cellular or subcellular resolution. Single nucleotide resolution has been technically demonstrated, but application to detection of clinically relevant mutations or analysis of tissues is currently not feasible with these methods [107,108]. The issue of the low multiplexibility has also been addressed by employing sequential hybridization or coupling combinatorial fluorescent labelling with high resolution imaging techniques [109–111]. High multiplexibility can be achieved, but practical limitations and questions on the scalability when using such methods for tissue analysis still remain.

The introduction of probes containing modified nucleotides confers higher efficiency and specificity to hybridization-based assays, enabling detection of short RNA targets. Two technologies, locked nucleic acids (LNA) and peptide nucleic acids (PNA), are now broadly used, particularly for detection of miRNA or in combination with other classic methods [112–114].

Amplification of the starting target material is a common strategy in nearly all

molecular methods and for in situ analysis as well. Exponential in situ amplification

of DNA and cDNA has been demonstrated previously [115,116].

However, this

approach resulted in lower specificity and diffusion of the amplified target [117,118].

An alternative approach is to amplify the signal after hybridization of the target

RNA molecule. The general idea behind it is to increase the local density of

fluorophores or reporter molecules to increase signal-to-noise ratio and

consequently the sensitivity of the read-out. Branched DNA (bDNA) technology

is one way to obtain signal amplification of hybridization-based methods in

which a number of self-complementary probes build a DNA “tree” upon

recognition of the target molecule, making available hundreds of target

sequences for reporter probes, the “branches”. Sensitivity of detection is greatly

enhanced and even specificity is increased by the requirement of more than one

hybridization event to occur [119–121].

Multiplex read-out using bDNA has been

proposed but not yet shown. Another amplification method based on

hybridization is hybridization chain reaction (HCR) in which an initiator probe

hybridizes to the RNA target and triggers self-assembly of fluorescent RNA

hairpin reporters [122].

With this method five transcripts can be visualized

simultaneously as demonstrated in zebrafish embryos, but HCR requires long

incubation times and does not achieve single molecule resolution.

A well exploited strategy for signal amplification, both in situ and in vitro, is

DNA linear amplification via rolling circle amplification (RCA) by ϕ29 DNA

polymerase [123].

RCA consists of an isothermal amplification of a circular single-stranded

DNA molecule: when a primer with an available 3’-end hybridizes to

the circle, ϕ29 DNA polymerase can extend the primer making a long strand of

complementary DNA. A 100 nt-long circle is replicated several hundred times

per hour thanks to the high processivity of this polymerase (1 × 10³ nt/min) [124].

Some of the characteristics of this mechanism make RCA an exquisite signal

amplification system, especially for in situ applications. First, the reaction is

fast and simple, and does not depend upon temperature cycling.

Second, RCA is a localized amplification that produces a tether of DNA, which

spontaneously collapses into a micrometer-sized bundle of DNA (a rolling circle

product, RCP). If hybridized by fluorescent-labelled probes, a single RCP

appears as an intense spot-like signal easy to identify over the fluorescence

background of the tissue with a conventional epi-fluorescent microscope.

Finally, the RCA reaction generates a clonally-amplified substrate suitable for

sequencing by NGS technologies [125].
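As a rough consistency check on the replication figure quoted above, and assuming that the polymerase sustains the cited rate of about 1 × 10³ nt/min on a 100 nt circle (an idealization that ignores initiation time and any slowdown in fixed tissue), the number of circle copies produced per hour works out as:

```latex
\[
\frac{10^{3}\ \text{nt/min} \times 60\ \text{min/h}}{100\ \text{nt/copy}} = 600\ \text{copies/h}
\]
```

This is in line with the "several hundred" tandem repeats per hour stated in the text.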

RCA is exploited by another in situ RNA analysis technology, the padlock

probe, on which all the work presented in this thesis is based. Described for the

first time by Nilsson et al. in 1994 [126]

as a variant of the oligonucleotide ligation

assay (OLA) [127],

a padlock probe is a single oligonucleotide whose end-sequences

are joined together by a DNA ligase upon hybridization to a

complementary target, resulting in a closed single-stranded DNA circle.

Important features of this mechanism are: i) the specificity given by the high

fidelity of DNA ligases in discriminating mismatches at the ligation site [126,128];

ii) the resulting single nucleotide resolution; iii)

tolerance to washes, because after ligation the padlock probe is topologically

locked to the target [126]; and iv) single molecule resolution, achieved by using

RCA to amplify and detect ligated padlock probes [129]. Because of these

characteristics, padlock probes can be used for genotyping of mitochondrial

DNA (mtDNA) and single mRNA molecules in human cells and tissues [130,131].
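The ligation-based discrimination described above can be illustrated with a toy model. The sketch below is not the probe design procedure used in this work. It assumes that ligation succeeds only when both probe arms are perfectly complementary to adjacent stretches of the target, it ignores hybridization thermodynamics and ligase kinetics, and it uses an invented target sequence. The function names `design_arms` and `padlock_ligates` are hypothetical.

```python
# Toy model of padlock probe circularization (illustrative assumptions only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_arms(target: str, start: int, nick: int, end: int) -> tuple[str, str]:
    """Return (arm_5p, arm_3p) covering target[start:end].

    The 3' terminal base of the probe pairs with target[nick], so a variant
    at that position sits directly at the ligation site.
    """
    arm_5p = reverse_complement(target[start:nick])
    arm_3p = reverse_complement(target[nick:end])
    return arm_5p, arm_3p


def padlock_ligates(target: str, arm_5p: str, arm_3p: str) -> bool:
    """True if both arms pair perfectly and end up adjacent on the target."""
    # Read 5'->3' across the nick, the hybridized probe ends spell arm_3p + arm_5p.
    junction = arm_3p + arm_5p
    return reverse_complement(junction) in target


# Invented 27 nt target (not a real gene) and a single-base variant at index 12.
wild_type = "TTGACCTAGCTGGTGGCGTAGGCAAGA"
mutant = wild_type[:12] + "A" + wild_type[13:]

wt_probe = design_arms(wild_type, start=4, nick=12, end=20)
mut_probe = design_arms(mutant, start=4, nick=12, end=20)

print(padlock_ligates(wild_type, *wt_probe))  # matched probe and target
print(padlock_ligates(mutant, *wt_probe))     # mismatch at the ligation site
```

In this simplified picture each allele needs its own probe, which is why a wild-type and a mutant-specific padlock probe are typically used together and read out in different colors.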

For mRNA detection, cDNA is first synthesized by in situ reverse-transcription

of specific transcripts and then made single stranded by RNaseH treatment. A

padlock probe hybridizes to the cDNA target and is circularized by a DNA

ligase leaving a 3’-end overhang on the cDNA strand that is digested by ϕ29

DNA polymerase before priming RCA of the padlock probe (a mechanism

termed target-primed RCA) [130].

The RCP is stained by hybridization of hundreds

of fluorescently-labelled oligonucleotides, which generates a bright dot clearly

detectable under a fluorescence microscope. Mismatched padlock probes cannot

be circularized and amplified via RCA and therefore no signal is generated [131].

The method is able to detect and quantify 30% of actin beta (ACTB) transcripts

in single cells, although sensitivity is lower in tissue sections or when multiple

padlock probes are combined [131],

in which case FISH, in comparison, proves to

be superior. Another disadvantage is the overall complexity due to the many

enzymatic steps involved and incubations at different temperatures which makes

it difficult to implement the protocol in the existing automated slide stainers.

Originally limited to four distinguishable probes per assay, multiplexibility has

been increased by sequencing RCPs in situ [132].

Figure 2. Single-molecule in situ techniques. In smFISH (I) multi-labelled probes (in gray and red)

hybridize to the target mRNA molecule (in black) generating a diffraction-limited spot visible under a

fluorescence microscope. In a newer version of the method, smFISH (II), multi-labelled reporter probes

are replaced by shorter single-labelled oligonucleotides. In the bDNA technique, oligonucleotides are

added to target a specific RNA region and to build a DNA “tree” (in blue) to which reporter probes

can hybridize. In the padlock probe assay, a series of enzymatic reactions is carried out to synthesize

cDNA from a specific mRNA. Padlock probes (in green) are hybridized and ligated and the circular

padlock probe is amplified by RCA. Fluorescent probes hybridize to the RCA product generating a

diffraction-limited spot. The images of the cell nuclei and fluorescent signals are taken, in order, from

ref. 102, 103, 121 and 131.

Spatially-resolved transcriptome analysis

The most comprehensive description of gene expression heterogeneity in a

tissue sample requires the use of spatially-resolved transcriptomics methods in

which transcriptome analysis is complemented by spatial information. In other

words, gene expression analysis is pushed to include all four parameters

discussed in the previous chapter: analytical resolution, throughput,

multiplexibility and spatial resolution. At present, there is no single method

capable of acquiring the full depth of information in all parameters. Instead,

current approaches combine techniques in order to maximize the amount of

molecular information retrieved from whole tissues and organs (Figure 3).

Figure 3. Comparison of throughput and multiplexibility of the methods used for tissue transcriptome

analysis. The ideal method, in black, maximizes performances for both parameters, maintaining spatial

resolution. In situ-based techniques, in green, generally have a lower multiplexibility but can

simultaneously investigate a large number of cells or the entire tissue sample. Array- and sequencing-

based methods (respectively in blue and yellow) are limited in throughput but can analyze the whole

transcriptome. High-throughput RT-PCR, in red, is in between. The numbers in brackets indicate the

reference number to which each dot refers. The symbol on the in vitro methods indicates how the

single cells have been extracted and, consequently, whether or not the method maintains spatial resolution as

described in the text. & = enzymatic dissociation; # = laser-capture microdissection; ¤ = mechanical

homogenization; £ = performed on cell line.

Spatially-resolved transcriptome analysis is not a new approach. A number of

large projects are on-going with the aim of fine molecular mapping of whole

organs, with particular focus on the brain [133].

This molecular characterization has

long been done by a combination of systematic sampling, called voxelation, and

gene expression analysis methods of lower cellular resolution than the most

recent technologies can offer [134–140].

However, transcriptome-wide, single cell

and single nucleotide resolution will be required to investigate other

mechanisms like allele specific expression or RNA editing. With the constant

improvement of the protocols for sample preparation, it is plausible to envisage

that RNA-seq will be the method of choice for this type of analysis.

Spatial localization of single cell transcriptomes back in the organ of origin is

one of the few technical limitations left for tissue analysis by RNA-seq. RNA

tomography (Tomo-seq) partially circumvents this limitation by cryosectioning

18 µm thick sections of a whole zebrafish embryo and subjecting each one to

RNA-seq [141].

Gene expression patterns are reconstructed along the three main

body axes but the method does not achieve cellular resolution. The improvement

of methods to isolate single cells from tissue sections could solve this problem:

hundreds to thousands of single cells can be isolated from normal tissues and

cell hierarchies can be reconstructed by different means of cluster analysis of the

transcriptome obtained from each cell by single-cell RNA-seq [90,142,143].

In a very

recent work led by Sten Linnarsson, high resolution topographical mapping of

genome-wide gene expression has been demonstrated for the first time. By using

laser-capture microdissection (LCM), a small area of a mouse brain section was

dissected into small “voxels”, 50 × 50 × 50 µm³ in size, and subjected to RNA-seq [144].

By this method, near single cellular resolution is obtained preserving spatial

information, but only a small region of the tissue was sampled. Extending this

method to the whole tissue, in tens of consecutive sections, is practically very

difficult and not cost-effective with the current technology. Once sequencing

costs decrease further, high resolution whole-transcriptome tissue analysis

will be limited only by the number of available reads per sequencing run. Future

sequencing technologies based on single molecule sequencing through

nanopores may be a solution to this issue [145,146].

A new approach has been

proposed by Lundeberg and colleagues, in which a tissue section is deposited

onto a chip array containing high-density oligo(dT). The oligonucleotides are

organized in squares of about the size of a single cell and barcoded in order to

distinguish each individual square. After imaging the tissue, the cells are

permeabilized and the poly(A) fraction of the transcriptome is captured by the

barcoded oligonucleotides. After sequencing the captured transcripts, the spatial

location on the chip can be inferred by the barcodes and, because of limited

diffusion between cells, assigned to a specific cell of the tissue. This approach,

not yet published, has the potential to achieve genome-wide information

maintaining spatial and cellular resolution.
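The principle of recovering position from a barcode can be sketched in a few lines. Since the array-based approach was unpublished at the time of writing, the example below is only a minimal illustration under stated assumptions: the barcode sequences, the chip layout in `BARCODE_TO_SQUARE` and the function `assign_reads` are all hypothetical, and reads are assumed to arrive already paired with a gene name.

```python
from collections import Counter, defaultdict

# Hypothetical layout: each barcode identifies one square (row, column) on the chip.
BARCODE_TO_SQUARE = {
    "AACGT": (0, 0),
    "CGTTA": (0, 1),
    "GTACC": (1, 0),
}


def assign_reads(reads):
    """Group captured transcripts by chip square.

    `reads` is an iterable of (spatial_barcode, gene) pairs. Reads whose
    barcode is not on the chip are counted separately rather than guessed.
    """
    counts = defaultdict(Counter)
    unassigned = 0
    for barcode, gene in reads:
        square = BARCODE_TO_SQUARE.get(barcode)
        if square is None:
            unassigned += 1
            continue
        counts[square][gene] += 1
    return counts, unassigned


# Invented example reads.
reads = [("AACGT", "ACTB"), ("AACGT", "ACTB"), ("CGTTA", "VIM"), ("TTTTT", "HER2")]
counts, unassigned = assign_reads(reads)
for square, genes in sorted(counts.items()):
    print(square, dict(genes))
```

The per-square counts can then be overlaid on the image of the tissue taken before permeabilization, which is the step that links expression to morphology.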

In situ sequencing

An alternative to the extraction of single cells from tissues is sequencing of

transcripts directly on tissue sections [132,147].

The padlock probe assay is able to

sequence the minimum unit of a transcript, a single nucleotide. Expanding the

method to “read” a stretch of nucleotides requires two additional steps: first, a

longer target sequence needs to be clonally amplified via RCA and, second, a

sequencing chemistry able to read such material in situ is required. Solutions for both

of these steps already existed. The gap-fill reaction has been described several

times in the literature as a method to capture and circularize DNA in solution-phase

assays [129,148,149].

The same approach is used in situ although it shows lower

efficiency when performed directly in tissues. Sequencing of RCPs (also called

DNA nano-balls, DNAnb) by sequencing-by-ligation (SBL) chemistry has been

described by Drmanac et al. and is the sequencing technology used by the

private company Complete Genomics [125].

DNA nanoball technology is used to

sequence the entire human genome and comprises fragmentation and capture of

genomic DNA, clonal amplification via RCA and spotting of the RCPs on a

solid surface where the subsequent sequencing reactions and imaging are carried

out. In situ sequencing makes use of the same principles except for the fact that

the surface on which the RCPs are bound is a cell instead of an array (Figure 4).

The same unchained SBL chemistry can be used to sequence RCPs generated in

situ, with a minimal read length of six bases. However, there are three key

differences between the two sequencing methods: first, the size of the

sequencing substrate is <300 nm for DNAnb and about 1 µm for RCPs meaning

that, when imaging, more pixels are occupied by a single RCP; second,

Complete Genomics makes use of a proprietary technology (self-assembling

DNB Arrays) to prepare patterned arrays, spotting ultra-high density of single

DNAnb with 90% efficiency. Instead, because of the nature of the target, in situ

sequencing generates a sort of random array, which limits the density of

the RCPs per area. For comparison, Complete Genomics’ first array reaches a

density of about 530,000 DNAnb/mm² (1 billion DNAnb in a 25 × 75 mm area),

while the estimated maximum for in situ sequencing is 270,000 RCPs/mm². The third

difference is the higher error rate of in situ sequencing (1.0% compared to

0.001% for Complete Genomics) [125,132].

The read length of in situ sequencing, six

bases, is too short to discriminate among the whole set of genes but it offers the

possibility to read 4096 (4⁶) barcodes. Gene-specific padlock probes can be

tagged with a molecular barcode increasing multiplexibility of the assay.

However, this is a targeted approach and given the current size of RCPs and the

imaging system used, only a density of about 100 RCPs per cell can be

achieved.
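The two figures quoted in this paragraph, the array density and the number of distinguishable barcodes, follow from simple arithmetic. The short sketch below reproduces them. The function names are hypothetical, and the barcode count is the theoretical maximum, assuming all four bases can be read at every position and no sequences are excluded for error tolerance.

```python
from itertools import product


def array_density(n_spots: float, width_mm: float, height_mm: float) -> float:
    """Spots per square millimetre on a rectangular surface."""
    return n_spots / (width_mm * height_mm)


def barcode_capacity(read_length: int, alphabet: str = "ACGT") -> int:
    """Number of distinct sequences of the given length."""
    return len(alphabet) ** read_length


def enumerate_barcodes(read_length: int, alphabet: str = "ACGT") -> list[str]:
    """List every possible barcode explicitly (only practical for short reads)."""
    return ["".join(bases) for bases in product(alphabet, repeat=read_length)]


# 1 billion DNA nanoballs on a 25 x 75 mm array, as quoted in the text.
density = array_density(1e9, 25, 75)
print(f"{density:,.0f} DNAnb/mm2")

print(barcode_capacity(6))          # six-base read
print(len(enumerate_barcodes(4)))   # four-base read, enumerated explicitly
```

In practice only a subset of the possible barcodes would be used, so that a single base-calling error does not convert one valid barcode into another.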

Some of these limitations have been addressed by a method described by George

Church and colleagues called Fluorescent In Situ SEQuencing (FISSEQ) [147].

In

this approach, RCPs are generated in the cells after self-circularization of

randomly synthesized cDNA molecules. Sequencing templates are generated in an

untargeted fashion and sequenced by Sequencing by Oligonucleotide Ligation

and Detection (SOLiD) chemistry [150],

achieving a read length of 27 bases. This allows

mapping of most of the sequenced RCPs to a single transcript. The use of

confocal imaging and partition sequencing allows a 4× higher read

density (about 400 RCPs/cell), although more than 50% of the reads map to rRNA.

Figure 4. In situ sequencing. a) In situ synthesis of cDNA is carried out using random or specific

primers followed by RNaseH treatment of the original mRNA molecule (b). Two different strategies

can be used. c) For sequencing of native mRNA molecules, a padlock gap probe is hybridized to the

cDNA and the gap region, in red, is filled by DNA polymerization. d) Alternatively, a multiplexed

padlock probe assay can be used instead, utilizing a classic padlock probe with a gene-specific

molecular barcode, in red, in the backbone. e) In situ RCA of the circularized padlock probes

generates an RCA product containing the clonally-amplified target region or the molecular barcode,

which is subjected to several cycles of sequencing-by-ligation with an imaging step at every cycle (f)

and the images are analyzed for base-calling at every cycle (g). Five RCPs are represented in the

enlargements in (f) and the corresponding base-calling is shown in (g). Reprinted from ref. 132.

Application of in situ sequencing in tumor biology

Next generation sequencing technologies have been widely applied in tumor

biology for molecular characterization of many cancer tissues. At least two

observations seem to be widespread in nearly all cancer types. The first is the

presence of a large spectrum of different somatic mutations including single

nucleotide variations as well as structural variations [151–154].

The second is that the

landscape of mutations differs among patients affected by the same tumor

(intertumor heterogeneity) and even within the same tumor tissue (intratumor

heterogeneity) [34,153,155].

This heterogeneity is not confined to mutations but is also

observed at the level of expression of wild-type genes [156],

a result of cellular

pathways being altered by the presence of mutations and conferring the

neoplastic phenotype to the tumor cells [157].

These two observations have

extremely important implications for the study, diagnosis and treatment of

cancers.

Because cancer is a disease characterized by a high degree of heterogeneity in

which the spatial context and the distribution of this heterogeneity constitute

important information, in situ sequencing is a novel tool that can aid researchers

and clinicians in the molecular analysis of tumor samples and its application in

research and diagnostics will be discussed below.

In situ sequencing for research

Recent literature depicts tumor growth like a tree where a number of mutations

constitute a main dynamic population of neoplastic cells, the trunk, from which

a number of sub-populations branch out as they acquire new

mutations [153,158,159].

This explains the origin of the observed heterogeneity and

may explain the failure of current pharmacological treatments. Multiregional

sampling and sequencing of different tumor areas is a successful approach to

study this occurrence [6,160,161].

On a sample of similar size, the regional

distribution of selected biomarkers of interest can be visualized and quantified in

situ by the multiplex padlock probe assay with cellular resolution and with no

need for sub-sampling to enrich for tumor content, which carries the risk of

introducing biases by random selection of mutant clones (Figure 5). Besides the

neoplastic cells, the role of the tumor microenvironment and of the stromal tissue

surrounding the tumor is increasingly a subject of research because of their

potential contribution to tumor development [162].

Different tissue components

(cancer associated fibroblasts, immune cells, the vasculature and others) can be

visualized in situ by detection of tissue-specific transcripts together with the

molecular composition of local tumor clones. Specific interactions could be

found by analyzing a large collection of samples which is technically feasible by

in situ sequencing. Another fascinating application is the ability to obtain

specific molecular expression or mutational profiles from circulating tumor cells

(CTCs), cell-free circulating DNA or clones from metastatic tumor sites and

trace them back to the primary tumor site to identify potential markers

conferring resistance to therapy or an aggressive phenotype to the cancer [163–165].

These expression profiles may be confined to a small fraction of tumor cells

present in the primary site and may be missed by bulk RNA sequencing. In situ

sequencing data may instead be able to resolve and identify specific patterns

even if these are localized in one or more regions of the tissue sample.

In all these potential applications, padlock probe technology and in situ

sequencing are used to detect or profile known markers, identified before by

other methods. In a reverse approach instead, in situ sequencing can be used to

define sub-clonal populations of cells that, on consecutive tissue material, can be

selected by e.g. laser capture microdissection for molecular characterization at

deeper resolution.

Figure 5. Spatially-resolved gene expression analysis by in situ sequencing. Morphological staining of

a tissue sample is combined with molecular staining obtained with multiple gene-specific padlock

probes. Single or multiple markers can be used to define specific tissue regions, e.g. tumor and stroma

compartments. In addition, gene expression patterns of specific regions can be acquired and linked to

the morphology of the tissue.

Padlock probe assay in the clinic

Molecular biology is increasingly entering the clinical laboratory and molecular

pathology is setting the stage for the future of personalized clinical management

of cancer patients [166].

The more molecular markers are validated and enter

the clinic, the more there will be a need for methods that are able to analyze

samples at the desired level of resolution, including spatial resolution. These

methods should meet the technical standards for diagnostics and have the least

possible impact on the existing laboratories in terms of infrastructure and

management.

Bright-field microscopy still plays a central role in the diagnostic routine.

Visualizing the mutational status of the sample, as an extra level of information,

is convenient and presents several advantages over other methods. Padlock

probe technology achieves single nucleotide resolution and is the only in situ

technique able to robustly and specifically detect point mutations. Also, it can

quantify gene expression and over-expression of tumor markers [167],

providing

spatially-resolved molecular profiles directly overlaid on the tumor tissue

histology (Figure 5). Importantly, padlock probes can be applied to the most

commonly used tissue sample preparations like formalin-fixed paraffin

embedded (FFPE) or fresh-frozen tissue sections, tissue microarrays (TMAs)

and tumor touch imprints [168].

Finally, the assay has a relatively short turnaround

time and, for mutation detection, requires relatively simple data interpretation.

Other in vitro assays have, in principle, similar characteristics. So, what is the

potential advantage of using an in situ analysis?

Mutational analysis of the Kirsten rat sarcoma viral oncogene homolog (KRAS)

gene in colorectal cancer (CRC) patients is recommended before anti-epidermal

growth factor receptor (anti-EGFR) therapy because the treatment is ineffective in the

presence of mutant forms of this gene [169,170].

The most widely used sample, in

CRC diagnostics, is FFPE tissue and a number of different assays are used for

the analysis of DNA extracted from the tissue. The sensitivity of these assays

has been evaluated between 1 and 20% of the tumor content (homozygous tumor

cells/wild-type) [170].

The performance of these assays is highly affected by the

tumor content of the samples, which therefore have to be subjected to

histopathological examination and manual enrichment by microdissection in

case of low tumor content (usually when below 10% of tumor cells) [169].

This

adds an extra step with associated additional risks for errors and costs. The

padlock probe assay has been shown to have comparable sensitivity, down to

1% mutant cells, and high specificity, although this has to be further evaluated in

a clinical trial [168].

The advantage of such an assay is that no extra sample is required

because histopathological and molecular assessment can be done on the same

sample. This also means that spatial information is preserved and can be directly

related to the tissue morphology identifying for example different clones.

In contrast to in vitro measurements, the relative expression in different tumor

regions can be quantified and in the future can provide potential clinical

information.

Recently, a number of multigene expression profile tests that score the risk of

tumor relapse and prognosis have entered clinical practice [166,171,172].

These

tests are bulk analyses performed on the RNA extracted from tumor samples. By

in situ sequencing, the padlock probe assay can be multiplexed to a desired

number of genes, including point mutations and other structural variations, to

visualize expression profiles directly on the tissue samples. A digitalized and

quantifiable measurement of the molecular state of the tissue should result in a

better classification and grading of the tumor, supporting or replacing the

traditional morphological examination. For instance, the Gleason score for

grading prostate cancer, a particularly heterogeneous type of cancer, is based on

the histological evaluation of a number, usually 12, of needle biopsies [173].

Many

lesions are often present in different biopsies and with different levels of

differentiation (scored 1-5) [174].

The grade is assigned based on the areas that

make up most of the cancer. Two limitations of this approach that have been

amply discussed are the subjective interpretation by the pathologist and the fact

that regions with higher grade (poorly differentiated) but under-represented are

not taken into account [175].

Molecular staining of the sample by multiple padlock

probes can, in principle, identify and quantify more accurately the composition

of the tumor samples, returning an objective score based upon digital

information. This approach is not limited to prostate but can be applied to all

tumors for which an association between known markers and the cancer

phenotype exists. For instance, the molecular classification of breast cancers is

based on the expression level of estrogen receptor (ER), progesterone receptor

(PR) and human epidermal growth factor receptor 2 (HER2/neu or HER2) and

other markers (like the marker of proliferation Ki67, cytokeratin CK5/6 and

epidermal growth factor receptor EGFR) [166,176,177].

This molecular classification,

together with other factors, is used to decide on the clinical management of

breast cancer patients. All these and other markers can be implemented in a

multiplex padlock probe-based assay, performed on the same histological

sample and, within some years, in an automated fashion. Other tumors are already

screened for different mutations or molecular signatures of clinical utility and

their number is likely to increase; therefore, methods that can quickly bring

such markers into the clinic, such as the padlock probe assay, are a precious

resource.

Present investigations

In situ mutation detection and visualization of intratumor heterogeneity for cancer research and diagnostics.

In this work we applied padlock probes and RCA for in situ detection of

clinically relevant point mutations in solid tumors.

Before starting EGFR antibody therapy, patients with CRC are tested for

activating mutations in KRAS, which make the therapy ineffective [169].

Seven of

these point mutations are located in codons 12 and 13, representing almost 95% of

all mutations found in KRAS in these patients [178].

The same mutations are also

found in lung adenocarcinoma, where they are associated with poor prognosis, while some

EGFR mutations in the presence of wild-type KRAS indicate that the tumor is more

sensitive to EGFR inhibitors [179].

Although many molecular tests to detect these

variants already exist, new methods able to additionally capture the distribution

of these mutations in different regions of the tissue are needed.

We designed padlock probes to target a total of 14 different point mutations and

the corresponding wild-type alleles in KRAS, EGFR and tumor protein 53

(TP53) genes in colorectal and lung cancers. In this assay, gene specific primers

are used to synthesize cDNA to which complementary padlock probes hybridize

and are ligated. Circularized padlock probes are amplified by in situ RCA and

detected by hybridization of fluorescent-labelled complementary

oligonucleotides. Up to four colors can be easily distinguished. In total, 84

different samples were analyzed, using different combinations of padlock probes

to target different mutations. Of the 84 tumors, 44 represented prospective

samples of unknown molecular KRAS status and were subjected to multiplex

analysis of the seven most frequent KRAS mutations. For all samples tested by

the padlock in situ assay, DNA was extracted from the same tissues and

analyzed by pyrosequencing, one of the standard methods used in clinical

laboratories to detect the presence of KRAS mutations. For all samples tested,

the result of the in situ padlock assay was confirmed by DNA pyrosequencing

demonstrating 100% concordance between these two methods. However, more

information can be obtained by the in situ assay. For example, on the same

tissue section we were able to visualize heterogeneity of KRAS mutant

expression, which related to tumor progression and histological heterogeneity.

The same heterogeneity of expression was observed for an EGFR mutation that

differed in distinctive histological regions. By targeting several mutations in

situ, it was possible to distinguish different patterns of mutations in different

tissue compartments in lung cancer cases as well as loss-of-heterozygosity

(LOH). All this information is of potential clinical relevance, for instance for

monitoring tumor progression or prediction of response to therapy, but is lost by

methods that rely on bulk analysis of DNA extracted from the tumor tissue

without retaining spatial context.
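One way to make such regional differences quantitative is to count mutant and wild-type signals per annotated tissue region. The sketch below is only an illustration of that idea and not the analysis used in this work: the regions are simplified to rectangles, the spot coordinates and allele calls are invented, and the functions `region_of` and `mutant_fraction_by_region` are hypothetical.

```python
from collections import Counter


def region_of(x, y, regions):
    """Return the name of the first rectangular region containing (x, y)."""
    for name, (x_min, y_min, x_max, y_max) in regions.items():
        if x_min <= x < x_max and y_min <= y < y_max:
            return name
    return None


def mutant_fraction_by_region(spots, regions):
    """Fraction of mutant signals among all allele-specific signals per region.

    `spots` is an iterable of (x, y, allele), with allele "mutant" or "wild_type".
    Regions without any signal are left out of the result.
    """
    mutant, total = Counter(), Counter()
    for x, y, allele in spots:
        region = region_of(x, y, regions)
        if region is None:
            continue
        total[region] += 1
        if allele == "mutant":
            mutant[region] += 1
    return {region: mutant[region] / total[region] for region in total}


# Invented annotation (in pixels) and invented RCP calls.
regions = {"tumor": (0, 0, 100, 100), "stroma": (100, 0, 200, 100)}
spots = [
    (10, 10, "mutant"), (20, 30, "mutant"), (50, 50, "wild_type"),
    (40, 80, "mutant"), (150, 20, "wild_type"), (160, 70, "wild_type"),
]
fractions = mutant_fraction_by_region(spots, regions)
print(fractions)
```

A bulk extract of the same invented sample would report a single pooled value and hide the contrast between the two compartments.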

To demonstrate the applicability of the method in a clinical setting, the padlock

probe assay was performed in a variety of different sample types, including

FFPE material, which is the most commonly used specimen in diagnostics, and

tissue microarrays (TMAs), useful for high-throughput screening of point

mutations. Many of the samples included in the study presented a tumor cell

fraction lower than 10% and all were correctly genotyped with no need for

enrichment of the tumor fraction. Finally, by using cytological preparations

made by touch imprints from the fresh tumor, the KRAS mutation status was

obtained on the day of sample arrival. In conclusion, the results of this work

show the potential and applicability of the in situ padlock probe and RCA assay

for research and diagnostics.

In situ sequencing for RNA analysis in preserved tissue and cells

In this work we demonstrated, for the first time, in situ sequencing of single

mRNA molecules, providing a highly multiplex read-out to the padlock probe

assay.

The light spectrum resolved by an epi-fluorescent microscope limits the padlock

probe technology to detect only four targets at a time. Although this level of

multiplexibility is suitable for some applications, for example detection of

specific point mutations, a higher level of multiplexibility is needed to resolve

more complex patterns of gene expression or to allow for screening of many

markers simultaneously. Next generation sequencing or high-multiplex PCR can

be used for genome-wide gene expression analysis with cellular resolution.

Laser-assisted cell extraction from tissue sections preserves the geographical

information but is practically limited in throughput to hundreds of cells and may

not represent differential gene expression in distinct regions of the tissue. Cells

isolated by enzymatic dissociation and FACS analysis, instead, lack the spatial

information. Sequencing RCPs generated in the tissue context allows multiplex

detection of mRNA molecules for direct gene expression analysis preserving

morphological information of the sample.

We used SBL chemistry [125] to sequence the RCPs generated by clonal

amplification via RCA of padlock probes, in breast cancer fresh-frozen tissue

sections. First, we demonstrated sequencing of native transcripts by using

padlock gap probes and gap-fill polymerization to capture a short fragment (four

nucleotides) of the reverse-transcribed mRNA. After circularization and

amplification of the padlock gap probe, the targeted four nucleotide region was

sequenced by repeated cycles of SBL and imaging. In this way we were able to

distinguish HER2 and ACTB expression on a breast cancer section, and showed

that HER2 expression was confined to the tumor region. Because a four-nucleotide

read length is too short to uniquely distinguish a high number of

genes, we used gene-specific molecular barcodes inserted in the padlock probe

sequence to distinguish up to 256 (4⁴) genes. We designed padlock probes to

target 39 transcripts in one ER-negative and two ER-positive breast cancer tissue

sections, including a panel of 21 genes used to predict distant tumor recurrence

in breast cancer patients [180]. After random cDNA synthesis, we applied the

padlock probes and sequenced the RCPs, validating expression data from 31

genes. The data from the three samples corresponded to the known ER status.

We used HER2 and vimentin (VIM) expression as markers to differentiate

between tumor and stromal compartments and used hematoxylin and eosin

(H&E) staining of the same section to confirm the histological localization.

When comparing the expression of genes detected in the HER2 or VIM

compartments from the ER-negative sample with published RNA-seq data, the

highest correlation was found with an ER-negative breast-cancer derived cell

line and normal breast tissue respectively. Later comparison with RNA-seq data

obtained from the same ER-negative tumor sample showed even stronger

correlation (unpublished data). Overall, this work demonstrates the feasibility of

using next generation sequencing to visualize gene expression profiles directly

in cells and tissue sections.
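The barcode capacity quoted above follows from four possible bases at each of four sequenced positions. As a minimal sketch, assuming base calls have already been extracted for each RCP, the following Python snippet enumerates the barcode space and tallies reads per gene. The barcode-to-gene table and the example reads are hypothetical and are not the assignments used in this work.

```python
from itertools import product

BASES = "ACGT"


def barcode_capacity(length):
    """Number of distinct barcodes of the given length over four bases."""
    return len(BASES) ** length


def decode_reads(reads, barcode_to_gene):
    """Count reads per gene; barcodes missing from the table are tallied as 'unassigned'."""
    counts = {}
    for read in reads:
        gene = barcode_to_gene.get(read, "unassigned")
        counts[gene] = counts.get(gene, 0) + 1
    return counts


# Hypothetical barcode assignments, for illustration only.
BARCODE_TO_GENE = {"ACGT": "HER2", "TGCA": "ACTB", "GGAC": "VIM"}

# Every possible four-base barcode.
all_barcodes = ["".join(p) for p in product(BASES, repeat=4)]

# Hypothetical decoded reads; "AAAA" is not in the table above.
example_reads = ["ACGT", "ACGT", "TGCA", "GGAC", "AAAA"]
example_counts = decode_reads(example_reads, BARCODE_TO_GENE)
```

Tallying unknown barcodes separately, rather than discarding them, keeps a simple record of reads that do not match any designed probe.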

In situ sequencing identifies TMPRSS2-ERG fusion transcripts, somatic point mutations and gene expression levels in prostate cancers.

In this work we demonstrated in situ sequencing for simultaneous detection of

point mutations, gene expression and gene fusions and for visualization of intra-

and inter-tumor heterogeneity in prostate cancer (PCa).

Prostate cancers are highly heterogeneous tumors, very often characterized by

multifocality, which refers to the presence of two or more tumors separated by normal

tissue [181,182]. The Gleason grading system assigns an overall score to the prostate

tumor based on morphological examination. Because of heterogeneity between

different tumor foci, the score is made by taking into account the two most

represented grades. Thus, the score of individual tumors does not always

represent the overall score [181]. Different mutations are frequently found in PCa.

Fusion of androgen-regulated transmembrane protease serine 2 (TMPRSS2) with

the ETS-related gene (ERG) transcription factor is found in about 50% of

PCa [183]. Amplification of the androgen receptor (AR) gene is present in about 30% of

the cases [184], while a point mutation is present in TP53 in 10–35% of the cases [185].

Visualization and measurement of heterogeneity at molecular level may improve

grading of PCa and development of therapeutic strategies.

We designed barcoded padlock probes to target the five most frequent

TMPRSS2-ERG fusion variants described in literature: T1/E5, T1/E4, T1/E5,

T2/E4 and T2/E5 (numbers indicate in what exon of the two genes the fusion

junction is located). Furthermore, we designed barcoded padlock probes to

detect AR, a specific point mutation in the TP53 gene and alpha-methylacyl CoA

racemase (AMACR), also found over-expressed in PCa [186]. The pool of probes

and the sequencing of barcodes were tested on VCaP cells carrying T1/E4 fusion

transcripts, AR amplification and a specific TP53 point mutation and LNCaP

cells as negative control for these mutations. We specifically identified the

mutations and increased expression levels of AR in VCaP cells compared to

LNCaP cells. We then tested padlock probes for detection of TMPRSS2-ERG

fusion transcripts in 13 FFPE tumor tissues that had been previously evaluated

by immunohistochemistry staining. We scored seven cases positive for fusion

transcripts and six as negative with 100% concordance between padlock probes

and antibody-based detection of ERG overexpression. By sequencing the

molecular barcodes the identity of the fusion variants could be revealed showing

heterogeneity among different tissue regions. We therefore sequenced fusion

variant-specific padlock probes in fresh-frozen tissue sections from four of the

cases that were positive for TMPRSS2-ERG. Two samples showed a similar pattern

of fusion variants (~90% T1/E4 and ~10% T1/E5 of the total reads)

distributed homogeneously over the tissue. The other two samples showed

different fusion variant compositions: in one PCa, T1/E4 accounted for 92% of

the total reads, while T2/E4 represented 4% of the reads and was found in only one

fifth of the imaged area. In the last sample, T2/E4 was the most prevalent variant

(76% of the total reads), while 13% of the reads were T1/E4 and were found

only in a third of the imaged area. Despite the small number of samples analyzed,

we demonstrated that padlock probes and in situ sequencing can be used to

visualize and quantify intra- and inter-tumor heterogeneity in prostate cancer.
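The two kinds of summary statistics used in this section, the share of reads per fusion variant and the agreement between two scoring methods, can be sketched as follows. This is an illustrative Python example, not the analysis pipeline used in the study; the read counts are hypothetical values chosen to mirror the 92% and 4% figures above, and the "other" category is a placeholder for the remaining reads.

```python
def variant_fractions(read_counts):
    """Return each variant's share of the total reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads to summarize")
    return {variant: n / total for variant, n in read_counts.items()}


def concordance(calls_a, calls_b):
    """Fraction of cases on which two binary scorings agree."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty call lists of equal length")
    agree = sum(a == b for a, b in zip(calls_a, calls_b))
    return agree / len(calls_a)


# Hypothetical read counts, for illustration only.
example_counts = {"T1/E4": 920, "T2/E4": 40, "other": 40}
example_fractions = variant_fractions(example_counts)

# Seven positive and six negative cases scored identically by both methods.
padlock_calls = [True] * 7 + [False] * 6
ihc_calls = [True] * 7 + [False] * 6
example_concordance = concordance(padlock_calls, ihc_calls)
```

Read fractions alone do not capture the spatial component reported above (for example, a variant confined to one fifth of the imaged area); that would additionally require the coordinates of each read.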

Oligonucleotide gap-fill ligation for mutation detection and sequencing in situ.

In this work we demonstrated the use of oligonucleotide gap-fill ligation as a

mechanism for mutation detection and as an alternative to gap-fill

polymerization for targeted in situ sequencing.

Padlock gap probes differ from classic padlock probes in that, upon

hybridization to a template, a gap is left between the ends of the two

target-complementary arms. This gap can be filled either by polymerization or by

ligation of an oligonucleotide matching the gap in size and sequence

complementarity. The padlock gap probe can then be circularized by ligation

and amplified via RCA. When gap-fill ligation is used to circularize the padlock

probe, two separate ligation events need to occur. We investigated the

specificity and efficiency of this mechanism when a pool of oligonucleotides,

called gap probes, was used to circularize padlock gap probes and demonstrated

the feasibility of revealing the identity of the incorporated oligonucleotide by

sequencing it in situ.

To test this approach, we chose a six-nucleotide-long region of the mtDNA

containing a point mutation associated with mitochondrial encephalomyopathy,

lactic acidosis and stroke-like episodes syndrome (MELAS) [187]. We designed a

fully complementary gap probe and other gap probes having a single-nucleotide

mismatch at different positions. When seven gap probes were combined, with only

the perfect match being 5’-phosphorylated (a requirement for DNA ligation

to occur), we were able to distinguish between a wild-type cell line and a mutant

cell line by presence or absence of signals. A parallel reaction, in which instead the

MELAS-specific gap probe was 5’-phosphorylated, showed presence of signal

in the mutant cell line but not in the wild-type. We estimated this reaction to be

about 20% less efficient than a single DNA ligation of a padlock probe. We also

tested the method to genotype mRNA molecules and reveal the identity of the

incorporated gap probe by in situ sequencing. By using two allele-specific gap

probes, both 5’-phosphorylated, we were able to distinguish between human and

mouse fibroblasts and to assess the incorporation of the correct gap probes by

sequencing a targeted SNP on the ACTB transcripts of the two cell lines. This

method has a simpler and cheaper design compared to the use of multiple

padlock probes for detection of several mutations in a short sequence region.

Also, the double ligation provides high specificity to the reaction although the

overall efficiency is lower. If necessary, the identity of the gap probe

incorporated can be revealed by sequencing.
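To illustrate the design space for gap probes with a single-nucleotide mismatch, the following Python sketch enumerates every single-base variant of a six-nucleotide sequence. The sequence used here is a hypothetical placeholder, not the MELAS-associated mtDNA region, and the text does not state which subset of mismatched probes was selected for the seven-probe pool.

```python
BASES = "ACGT"


def single_mismatch_variants(seq):
    """All sequences that differ from seq at exactly one position."""
    variants = []
    for i, ref in enumerate(seq):
        for base in BASES:
            if base != ref:
                variants.append(seq[:i] + base + seq[i + 1:])
    return variants


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical six-nucleotide gap sequence, for illustration only.
perfect_match = "ACGTAC"
mismatch_probes = single_mismatch_variants(perfect_match)
```

A six-nucleotide gap admits 6 × 3 = 18 single-mismatch sequences, so a pool of seven probes samples only part of this space.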


Concluding remarks and future perspectives

In situ sequencing, a padlock probe-based technology described in this thesis,

has enormous potential as the method of choice for tissue analysis in research

and in clinical settings. This molecular tool can detect clinically relevant point

mutations as well as other structural genetic aberrations and resolve gene

expression patterns directly in the context of cells and tissues. This technology

can be used to realize the so-called “digital pathology”, in which optical images are

transformed into digital signals and these in turn are translated into information

of various types [188,189]. In the near future, tissue samples will be automatically

processed for morphological and molecular staining, and images of the whole

slide will be acquired and automatically analyzed through different image

analysis software. To that end, at least three major improvements of in situ

sequencing are required.

First, the procedure has to be automated to increase speed, throughput and

reproducibility. Implementation of the protocol into an existing NGS machine

may enable a higher level of automation. The fluorescent read-out and the

sequencing chemistry used are, in principle, compatible with some of the

existing NGS systems. Automation will be necessary to scale up the number of

samples analyzed and to test the robustness of the method in a large study

cohort. Fluidic systems that require lower reagent volumes will likely

increase the efficiency of the molecular reactions and, at the same time, decrease

the time and cost of the assay.

Second, the sensitivity of in situ sequencing needs to be increased to extend the

analysis to low-abundance genes or to the detection of rare cells, which would be of

great importance for clinical applications. This can be achieved in different

ways. One way is to increase the sensitivity of the detection. In situ padlock

probe assays already exploit the RCA method, but further signal amplification

can be achieved in various ways [190]. To increase the signal-to-noise ratio,

however, it may be necessary to decrease the fluorescent background of the

specimen. Several protocols have been described for decreasing sample

autofluorescence and background, including the most recent tissue clearing

techniques [191,192]. Some of these protocols may be compatible with the padlock

probe assay. Alternatively, one can increase the analytical sensitivity of the

method by amplification of the target material. A recombinase-based method

that allows exponential isothermal amplification of DNA molecules [193] has

been tested in situ in combination with padlock probe detection (unpublished

data). Diffusion of the amplified target between cells, which would alter the spatial

definition of expression profiles, may be problematic; however, spatial resolution

may be of less importance in case of mutation detection where presence or

absence of a clinical mutation is the primary question.

Finally, an extensive use of this method will generate large amounts of data.

With the imaging settings used in this work, imaging 50 mm² of tissue at 20x

magnification generates about 10 gigabytes of data for a single cycle. Imaging

many samples for four sequencing cycles will produce terabytes of

data. High computational power will be needed to store and analyze such data.

Solutions developed to handle data produced by NGS machines can potentially

be used for in situ sequencing as well. To fully integrate in situ sequencing in

digital pathology, however, the method requires the integration of software and

algorithms able to analyze digital information and transform it into

biological information. These may be easy to implement for some applications

like detecting point mutations and scoring of tissues based on the expression of a

specific mutation. More difficult tasks will be the recognition of a specific

pattern based on an a priori hypothesis, finding new patterns from the molecular

profile and relating the molecular information to the morphological structures.

Some work in this direction has already been done, mostly for H&E stained

samples [194–196]. Integrating or developing new software for molecular analysis is

the biggest challenge that the in situ sequencing method will face.
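The data volumes mentioned in this section can be checked with simple arithmetic. The Python sketch below takes the approximate figure from the text (about 10 gigabytes per cycle for 50 mm² of tissue at 20x magnification) and assumes, for illustration, that every sample has the same imaged area, that four cycles are acquired per sample, and that one terabyte equals 1000 gigabytes.

```python
import math

GB_PER_CYCLE = 10.0      # approximate figure from the text (50 mm² at 20x)
CYCLES_PER_SAMPLE = 4    # four sequencing cycles per sample
GB_PER_TB = 1000.0       # assumption: decimal terabytes


def dataset_size_gb(n_samples, cycles=CYCLES_PER_SAMPLE, gb_per_cycle=GB_PER_CYCLE):
    """Estimated raw image volume in gigabytes for a number of samples."""
    return n_samples * cycles * gb_per_cycle


def samples_per_terabyte(cycles=CYCLES_PER_SAMPLE, gb_per_cycle=GB_PER_CYCLE):
    """Smallest number of samples whose raw images reach one terabyte."""
    return math.ceil(GB_PER_TB / (cycles * gb_per_cycle))


one_sample_gb = dataset_size_gb(1)
samples_for_one_tb = samples_per_terabyte()
```

Under these assumptions a single sample yields about 40 gigabytes, and roughly 25 samples are enough to reach one terabyte of raw images.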


Popular science summary

One way to try to determine what a tissue does and how it is doing is to measure the activity of genes in the cells that are the building blocks of the tissue. All cells in the tissue have the same set of genes, but the reason that different cell types look different and have different functions in the tissue lies in which genes are active in each individual cell. Gene activity can be estimated by measuring the amount of mRNA produced from each individual active gene. Traditionally, this analysis has been done by extracting RNA from the tissue and then measuring how much there is of each mRNA. The problem with this type of analysis is that tissues consist of many different cell types, and these can be in different states because of the microenvironment in different parts of the tissue. The information about this tissue heterogeneity is lost when the measurements are blended into an average for the entire piece of tissue analyzed. New technology enables analysis of the total mRNA content (the transcriptome) of individual cells isolated from tissues, but the information about their original location in the tissue is lost, and it is also not possible to use this technology to analyze all cells in a tissue. New methods are therefore needed that add a spatial dimension to the data, in order to link information about mRNA expression in individual cells with tissue structure and morphology.

In this thesis I present the development and application of a class of molecular tools called padlock probes (molecular padlocks) and rolling circle amplification (RCA) for spatially resolved transcriptome analysis. Padlock probes allow in situ (in place in cells) detection of individual mRNA molecules with a resolution sufficient to distinguish single-base differences between mRNA sequences, thereby visualizing molecular information in the tissue. For example, my thesis shows how clinically relevant cancer-causing mutations are distributed in sections from tumors, thereby visualizing the heterogeneity that exists within tumors with respect to these mutations. To spatially map more complex gene expression patterns in tissues, we developed a technique that allows sequencing of RCA products in situ. To do so, we combined our padlock probe technique with a sequencing chemistry used in massively parallel sequencing (next generation sequencing). We showed for the first time that short stretches of mRNA can be sequenced directly in fixed cells and tissues with this method. In situ sequencing makes it possible to read out more complex pools of padlock probes that simultaneously detect and identify many different mRNA molecules in the same cell. We used this strategy to measure the activity of tens of genes in hundreds of thousands of cells, in analyses that include point mutations, fusion transcripts and expression levels.

The molecular tools presented complement conventional transcriptome analysis by adding a spatial dimension to the molecular information. The level of resolution that the new methods provide is important for fully understanding many biological processes and may become important for choosing the right treatment for cancer patients.


Acknowledgments

First of all, I would like to thank all of you who have read my thesis. Many

people contributed, in one way or another, to this work and I would like to thank

all of them, hoping that if I forget someone he/she will forgive me.

I’d like to start with my supervisor, Mats. It’s not easy to express in few

sentences all my gratitude for everything you did for me. Thanks for never

giving up in front of my endless pessimism, for teaching me that there are other

important things in life beside lab work and for showing me that it is possible to

face everyday problems with a smile and calm! I really had a lot of fun working

with you!

I also would like to thank my opponent, Prof. Sunney Xie and the members of

my PhD committee, Prof. Kamila Czene, Prof. Joakim Lundeberg and Prof.

Stefan Nordlund for reading my thesis and finding the time for being here

today to discuss with me.

Half of this work has been done at Uppsala University and I would like to thank

Ulf Landegren for the beautiful years I have spent sharing the lab with his

group and, with Ola and Masood, for creating an exciting scientific

environment. Thank you all!

“Seeing is believing”: a special thanks goes to Carolina Wählby and to her

group in Uppsala, particularly to Alexandra, for making us believe that in situ

sequencing was really working and for being the best collaborators one could

wish to work with!

I want to thank my other collaborators especially for the patience they showed

working with a moody person like me! Rongqin, thanks for the long time spent

together trying to make in situ seq working, for the many talks and all the times

we shared room at retreats and conferences! Now that you are finally with your

family, I wish you to realize all your other dreams! Chatarina, thanks a lot for

introducing me to the lab and for teaching me padlocks! Sara K, thanks for the

patience showed working with me on the prostate paper and for all the times that

you cleaned my messy bench! Ida, thanks for the many nice meetings and

discussions in the lab and outside the lab.

I want to thank the wonderful people that surrounded me every day in the lab

making it a fun place to work in. Anja, how could I have done without you? I

will never thank you enough for everything you did, for all the help at work and

outside work, for the morning therapeutic sessions and for making much nicer

the gray mornings on the train. Thanks for sharing the “up and down” moments


of the PhD, the sugar crisis and for the many laughs…thanks because you made

everything smooth for me! Ivan, thanks for being a good friend and for bringing

me out to concerts and beers from time to time. And also thanks for

demonstrating me that time is truly relative, particularly for Colombians!

Thanks Annika for always taking care of every person in the lab and always

being supportive. Elin L, thanks for taking great care of Philippa and, with

Annika, for being such a nice desk mate! Tom, thanks for the patience

demonstrated having me as supervisor and for being a great colleague

afterwards. And thanks for all the time spent designing the lab website! Di, it’s

great to have you in the group, thanks for all the good discussions, the help in

the lab and for sharing your bright ideas. Malte, thanks for always promoting

social after-work activities and with Thomas for taking care of the beer supplies

in the lab. Thomas, thanks for sharing your curiosity and experience on the

most exotic culinary places in Stockholm. Jessica, thanks for sharing the lab

bench and never get upset for the mess that I always leave! Thanks Pavan for

always having nice words for me and for the great Indian food we had at your

place. Xiaoyan, thanks for the chats around a beer in the lab and for all the time

spent to help me out with image analysis. Amel, thanks for wishing me good

morning, every morning, despite my usual bad morning mood! Thanks Tagrid

for your smile, it cheers me up everytime! Chenling, thanks for your enthusiasm

on the new projects and for the tea you brought to me. Thanks to Mustapha and

all the other students for making the lab such a nice environment!

Anna, thanks for the glasses of wine while discussing about work and life, for

your happy laugh and all the fun we had in the lab and, with Jonas, for giving

me the giant palm tree!

And a big thank you to all of you guys for filling up my small apartment with

balloons and for giving me Philippa!

On the Uppsala side, thanks Elin F, Lotte and Camilla for sharing the office with

me: Elin, I miss the jumping cows on your screen! Lotte, thanks for the nice

discussions during late lunches in Rudbeck and Camilla, thanks a lot for all

your help with the confocal during teaching! Thanks Lucy for the nice chats in

Rudbeck, David for the many stories about travels and different cultures and

Erik, thanks for your “hidden” work and your stimulating questions about

research.

Thanks to all the friends which I have met here in Stockholm for the nice time

spent together: Tim, Anatoly and the people from Achour group that shared the

office space with me. Thanks to Lara, Cecilia and Rnjana for the great party

nights and the good time spent outside the lab. Thanks to the Italian crew in


SciLife: Walter, Claudia and particularly Davide and Elena for the long coffee

breaks, the fun and the nice moments down in the basement! ; )

I had a great time when I was in Uppsala and I would like to thank all the people

I have met and shared the lab with: Christina and Elin E, thanks for taking

good care of the lab; Juhnong, for always updating me on the most recent

gossips! Lei for the many good discussions, Carl-Magnus for sharing the office

with me when I first came and for introducing me to the Swedish culture; Tonge

for your beautiful and contagious laugh and your crazy spirit; Gaelle for the nice

evenings spent outside in Uppsala and Stockholm; Caroline, thanks a lot for all

your help with the thesis! And thanks Liza, Johan O, Karin, Linda, Agata,

Andries, Maria, Abdullah, Johakim, Johan V, Irene and Pier for all the good

time I have spent with you in the lab.

Thanks to Laura, Chiara and Marco for sharing frustrations and complaints

about the weather, food and everything that is different from home! Mr Jeffrey

and Mr Tjerk thanks for the great Valborg we had and for all the crazy nights

spent together at the Nations! Ammar, thanks for being a good friend and with

Hanan thank for your great hospitality and all the chats; Jonathan, buddy,

thanks a lot for being always around, for the many evenings spent together with

Spyros, for all the good time we had and for the tough moments we have been

through together!

I had some tough moments during these years and I want to say thanks to

Alessio for always being there to help and listen and for attending to a

conference in Umeå just to have an excuse to come and visit me! And thanks to

all my good friends in Italy that always supported me!

Two persons have been particularly important for me in these many years.

Spyros, thanks for all the fun, the trips, the long talks and many, many laughs

and from time to time for reminding me about the superiority of the Greek

culture (particularly at complaining!). But, above everything, thanks for always

telling me what you think and not just what I’d like to hear.

Rachel, thanks for every moment we have spent together…you have been my

most demanding reviewer, my most passionate supporter, my best friend.

Thanks, because you can always make me smile! : )

Finally, I want to thank my beloved family that, even if far away, I have always

felt here with me. Thank you Luca, Simona and Alice for being here with me today and for

all the celebrations every time I come home! Thank you Dada for all the moments

you have been close to me in these years. And thank you mum and dad for all

your affection and support. I hope that today all of you can be proud of

me.


References

1. Schwartz, S. M. The definition of cell type. Circ. Res. 84, 1234–1235 (1999).

2. Huang, S. Non-genetic heterogeneity of cells in development: more than just noise. Dev. Camb.

Engl. 136, 3853–3862 (2009).

3. McClellan, J. & King, M.-C. Genetic heterogeneity in human disease. Cell 141, 210–217 (2010).

4. Cai, X. et al. Single-cell, genome-wide sequencing identifies clonal somatic copy-number

variation in the human brain. Cell Rep. 8, 1280–1289 (2014).

5. Kim, J. et al. Somatic deletions implicated in functional diversity of brain cells of individuals

with schizophrenia and unaffected controls. Sci. Rep. 4, 3807 (2014).

6. Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).

7. Burrell, R. A., McGranahan, N., Bartek, J. & Swanton, C. The causes and consequences of

genetic heterogeneity in cancer evolution. Nature 501, 338–345 (2013).

8. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).

9. Piotrowski, A. et al. Somatic mosaicism for copy number variation in differentiated human

tissues. Hum. Mutat. 29, 1118–1124 (2008).

10. Mkrtchyan, H. et al. The human genome puzzle - the role of copy number variation in somatic

mosaicism. Curr. Genomics 11, 426–431 (2010).

11. O’Huallachain, M., Karczewski, K. J., Weissman, S. M., Urban, A. E. & Snyder, M. P. Extensive

genetic variation in somatic human tissues. Proc. Natl. Acad. Sci. U. S. A. 109, 18018–18023

(2012).

12. Zong, C., Lu, S., Chapman, A. R. & Xie, X. S. Genome-wide detection of single-nucleotide and

copy-number variations of a single human cell. Science 338, 1622–1626 (2012).

13. Quinlan, A. R. & Hall, I. M. Characterizing complex structural variation in germline and somatic

genomes. Trends Genet. 28, 43–53 (2012).

14. Sproul, D., Gilbert, N. & Bickmore, W. A. The role of chromatin structure in regulating the

expression of clustered genes. Nat. Rev. Genet. 6, 775–781 (2005).

15. Hayashi, K., Lopes, S. M. C. de S., Tang, F. & Surani, M. A. Dynamic equilibrium and

heterogeneity of mouse pluripotent stem cells with distinct functional and epigenetic states. Cell

Stem Cell 3, 391–401 (2008).

16. Heinz, S. et al. Effect of natural genetic variation on enhancer selection and function. Nature 503,

487–492 (2013).

17. Zopf, C. J., Quinn, K., Zeidman, J. & Maheshri, N. Cell-cycle dependence of transcription

dominates noise in gene expression. PLoS Comput. Biol. 9, e1003161 (2013).

18. Sasagawa, Y. et al. Quartz-Seq: a highly reproducible and sensitive single-cell RNA sequencing

method, reveals non-genetic gene-expression heterogeneity. Genome Biol. 14, R31 (2013).

19. Raj, A. & van Oudenaarden, A. Nature, nurture, or chance: stochastic gene expression and its

consequences. Cell 135, 216–226 (2008).

20. Li, G.-W. & Xie, X. S. Central dogma at the single-molecule level in living cells. Nature 475,

308–315 (2011).

21. Choi, P. J., Xie, X. S. & Shakhnovich, E. I. Stochastic switching in gene networks can occur by a

single-molecule event or many molecular steps. J. Mol. Biol. 396, 230–244 (2010).

22. Oltvai, Z. N. & Barabási, A.-L. Systems biology. Life’s complexity pyramid. Science 298, 763–

764 (2002).

23. Luscombe, N. M. et al. Genomic analysis of regulatory network dynamics reveals large

topological changes. Nature 431, 308–312 (2004).

24. Yu, P. et al. Spatiotemporal clustering of the epigenome reveals rules of dynamic gene regulation.

Genome Res. 23, 352–364 (2013).

25. Biressi, S., Molinaro, M. & Cossu, G. Cellular heterogeneity during vertebrate skeletal muscle

development. Dev. Biol. 308, 281–293 (2007).

26. Rusnakova, V. et al. Heterogeneity of astrocytes: from development to injury - single cell gene

expression. PloS One 8, e69734 (2013).


27. Gordon, S. & Plűddemann, A. Tissue macrophage heterogeneity: issues and prospects. Semin.

Immunopathol. 35, 533–540 (2013).

28. Nolan, D. J. et al. Molecular signatures of tissue-specific microvascular endothelial cell

heterogeneity in organ maintenance and regeneration. Dev. Cell 26, 204–219 (2013).

29. Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis

project. Nat. Genet. 45, 1113–1120 (2013).

30. Michor, F. & Weaver, V. M. Understanding tissue context influences on intratumour

heterogeneity. Nat. Cell Biol. 16, 301–302 (2014).

31. Gottesman, M. M. Mechanisms of cancer drug resistance. Annu. Rev. Med. 53, 615–627 (2002).

32. Kim, J. J. & Tannock, I. F. Repopulation of cancer cells during therapy: an important cause of

treatment failure. Nat. Rev. Cancer 5, 516–525 (2005).

33. Fisher, R., Pusztai, L. & Swanton, C. Cancer heterogeneity: implications for targeted

therapeutics. Br. J. Cancer 108, 479–485 (2013).

34. Bedard, P. L., Hansen, A. R., Ratain, M. J. & Siu, L. L. Tumour heterogeneity in the clinic.

Nature 501, 355–364 (2013).

35. Shah, S. P. et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide

resolution. Nature 461, 809–813 (2009).

36. Walter, M. J. et al. Clonal architecture of secondary acute myeloid leukemia. N. Engl. J. Med.

366, 1090–1098 (2012).

37. La Thangue, N. B. & Kerr, D. J. Predictive biomarkers: a paradigm shift towards personalized

cancer medicine. Nat. Rev. Clin. Oncol. 8, 587–596 (2011).

38. Schink, J. C. et al. Biomarker testing for breast, lung, and gastroesophageal cancers at NCI

designated cancer centers. J. Natl. Cancer Inst. 106, dju256 (2014).

39. Rosenblum, D. & Peer, D. Omics-based nanomedicine: the future of personalized oncology.

Cancer Lett. 352, 126–136 (2014).

40. Huang, M., Shen, A., Ding, J. & Geng, M. Molecularly targeted cancer therapy: some lessons

from the past decade. Trends Pharmacol. Sci. 35, 41–50 (2014).

41. Toriello, N. M. et al. Integrated microfluidic bioprocessor for single-cell gene expression

analysis. Proc. Natl. Acad. Sci. U. S. A. 105, 20173–20178 (2008).

42. Bayraktar, O. A., Fuentealba, L. C., Alvarez-Buylla, A. & Rowitch, D. H. Astrocyte

Development and Heterogeneity. Cold Spring Harb. Perspect. Biol. (2014).

doi:10.1101/cshperspect.a020362

43. Jennings, L., Van Deerlin, V. M., Gulley, M. L. & College of American Pathologists Molecular

Pathology Resource Committee. Recommended principles and practices for validating clinical

molecular pathology tests. Arch. Pathol. Lab. Med. 133, 743–755 (2009).

44. Bonner, R. F. et al. Laser capture microdissection: molecular analysis of tissue. Science 278,

1481–1483 (1997).

45. Kurimoto, K., Yabuta, Y., Ohinata, Y. & Saitou, M. Global single-cell cDNA amplification to

provide a template for representative high-density oligonucleotide microarray analysis. Nat.

Protoc. 2, 739–752 (2007).

46. Suarez-Quian, C. A. et al. Laser capture microdissection of single cells from complex tissues.

BioTechniques 26, 328–335 (1999).

47. White, A. K. et al. High-throughput microfluidic single-cell RT-qPCR. Proc. Natl. Acad. Sci. U.

S. A. 108, 13999–14004 (2011).

48. Yoshimoto, N. et al. An automated system for high-throughput single cell-based breeding. Sci.

Rep. 3, 1191 (2013).

49. Lovatt, D. et al. Transcriptome in vivo analysis (TIVA) of spatially defined single cells in live

tissue. Nat. Methods 11, 190–196 (2014).

50. Kalisky, T., Blainey, P. & Quake, S. R. Genomic analysis at the single-cell level. Annu. Rev.

Genet. 45, 431–445 (2011).

51. Taniguchi, Y. et al. Quantifying E. coli proteome and transcriptome with single-molecule

sensitivity in single cells. Science 329, 533–538 (2010).

52. Hocine, S., Raymond, P., Zenklusen, D., Chao, J. A. & Singer, R. H. Single-molecule analysis of

gene expression using two-color RNA labeling in live yeast. Nat. Methods 10, 119–121 (2013).


53. Hughes, M. et al. High-resolution time course analysis of gene expression from pituitary. Cold

Spring Harb. Symp. Quant. Biol. 72, 381–386 (2007).

54. Bustin, S. A. et al. The MIQE guidelines: minimum information for publication of quantitative

real-time PCR experiments. Clin. Chem. 55, 611–622 (2009).

55. Mullis, K. B. & Faloona, F. A. Specific synthesis of DNA in vitro via a polymerase-catalyzed

chain reaction. Methods Enzymol. 155, 335–350 (1987).

56. Higuchi, R., Fockler, C., Dollinger, G. & Watson, R. Kinetic PCR analysis: real-time monitoring

of DNA amplification reactions. Biotechnol. Nat. Publ. Co. 11, 1026–1030 (1993).

57. Rutledge, R. G. & Cote, C. Mathematics of quantitative kinetic PCR and the application of

standard curves. Nucleic Acids Res. 31, e93 (2003).

58. Rappolee, D. A., Mark, D., Banda, M. J. & Werb, Z. Wound macrophages express TGF-alpha

and other growth factors in vivo: analysis by mRNA phenotyping. Science 241, 708–712 (1988).

59. Delidow, B. C., Peluso, J. J. & White, B. A. Quantitative measurement of mRNAs by polymerase

chain reaction. Gene Anal. Tech. 6, 120–124 (1989).

60. Adamski, M. G., Gumann, P. & Baird, A. E. A method for quantitative analysis of standard and

high-throughput qPCR expression data based on input sample quantity. PLoS ONE 9, e103917

(2014).

61. Citri, A., Pang, Z. P., Südhof, T. C., Wernig, M. & Malenka, R. C. Comprehensive qPCR

profiling of gene expression in single neuronal cells. Nat. Protoc. 7, 118–127 (2012).

62. Beer, N. R. et al. On-chip, real-time, single-copy polymerase chain reaction in picoliter droplets.

Anal. Chem. 79, 8471–8475 (2007).

63. Sanders, R. et al. Evaluation of digital PCR for absolute DNA quantification. Anal. Chem. 83,

6474–6484 (2011).

64. Dalerba, P. et al. Single-cell dissection of transcriptional heterogeneity in human colon tumors.

Nat. Biotechnol. 29, 1120–1127 (2011).

65. Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. Quantitative monitoring of gene expression

patterns with a complementary DNA microarray. Science 270, 467–470 (1995).

66. Tietjen, I. et al. Single-cell transcriptional analysis of neuronal progenitors. Neuron 38, 161–175

(2003).

67. Livesey, F. J. Strategies for microarray analysis of limiting amounts of RNA. Brief. Funct.

Genomic. Proteomic. 2, 31–36 (2003).

68. Kurimoto, K. et al. An improved single-cell cDNA amplification method for efficient high-

density oligonucleotide microarray analysis. Nucleic Acids Res. 34, e42 (2006).

69. Lockwood, W. W., Chari, R., Chi, B. & Lam, W. L. Recent advances in array comparative

genomic hybridization technologies and their applications in human genetics. Eur. J. Hum. Genet.

14, 139–148 (2006).

70. LaFramboise, T. Single nucleotide polymorphism arrays: a decade of biological, computational

and technological advances. Nucleic Acids Res. 37, 4181–4193 (2009).

71. Draghici, S., Khatri, P., Eklund, A. C. & Szallasi, Z. Reliability and reproducibility issues in

DNA microarray measurements. Trends Genet. 22, 101–109 (2006).

72. Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. & Gilad, Y. RNA-seq: an assessment of

technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–

1517 (2008).

73. Malone, J. H. & Oliver, B. Microarrays, deep sequencing and the true measure of the

transcriptome. BMC Biol. 9, 34 (2011).

74. Bains, W. & Smith, G. C. A novel method for nucleic acid sequence determination. J. Theor.

Biol. 135, 303–307 (1988).

75. Strezoska, Z. et al. DNA sequencing by hybridization: 100 bases read by a non-gel-based method.

Proc. Natl. Acad. Sci. U. S. A. 88, 10089–10093 (1991).

76. Drmanac, R. et al. DNA sequence determination by hybridization: a strategy for efficient large-

scale sequencing. Science 260, 1649–1652 (1993).

77. Metzker, M. L. Sequencing technologies - the next generation. Nat. Rev. Genet. 11, 31–46 (2010).

78. Lister, R. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis.

Cell 133, 523–536 (2008).

79. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying

mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).

80. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat.

Rev. Genet. 10, 57–63 (2009).

81. Zhao, S., Fung-Leung, W.-P., Bittner, A., Ngo, K. & Liu, X. Comparison of RNA-Seq and

microarray in transcriptome profiling of activated T cells. PLoS ONE 9, e78644 (2014).

82. Ozsolak, F. et al. Direct RNA sequencing. Nature 461, 814–818 (2009).

83. Sharon, D., Tilgner, H., Grubert, F. & Snyder, M. A single-molecule long-read survey of the

human transcriptome. Nat. Biotechnol. 31, 1009–1014 (2013).

84. Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382

(2009).

85. Grindberg, R. V. et al. RNA-sequencing from single nuclei. Proc. Natl. Acad. Sci. U. S. A. 110,

19802–19807 (2013).

86. Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R. & Siebert, P. D. Reverse transcriptase

template switching: a SMART approach for full-length cDNA library construction.

BioTechniques 30, 892–897 (2001).

87. Ramsköld, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual

circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).

88. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat.

Methods 10, 1096–1098 (2013).

89. Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex

RNA-seq. Genome Res. 21, 1160–1167 (2011).

90. Jaitin, D. A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of

tissues into cell types. Science 343, 776–779 (2014).

91. Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers.

Nat. Methods 9, 72–74 (2012).

92. Shiroguchi, K., Jia, T. Z., Sims, P. A. & Xie, X. S. Digital RNA sequencing minimizes sequence-

dependent bias and amplification noise with optimized single-molecule barcodes. Proc. Natl.

Acad. Sci. U. S. A. 109, 1347–1352 (2012).

93. Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods

11, 163–166 (2014).

94. Hulett, H. R., Bonner, W. A., Barrett, J. & Herzenberg, L. A. Cell sorting: automated separation

of mammalian cells as a function of intracellular fluorescence. Science 166, 747–749 (1969).

95. Klemm, S. et al. Transcriptional profiling of cells sorted by RNA abundance. Nat. Methods 11,

549–551 (2014).

96. Yu, H., Ernst, L., Wagner, M. & Waggoner, A. Sensitive detection of RNAs in single cells by

flow cytometry. Nucleic Acids Res. 20, 83–88 (1992).

97. Gall, J. G. & Pardue, M. L. Formation and detection of RNA-DNA hybrid molecules in

cytological preparations. Proc. Natl. Acad. Sci. U. S. A. 63, 378–383 (1969).

98. Bauman, J. G., Wiegant, J., Borst, P. & van Duijn, P. A new method for fluorescence

microscopical localization of specific DNA sequences by in situ hybridization of

fluorochromelabelled RNA. Exp. Cell Res. 128, 485–490 (1980).

99. Singer, R. H. & Ward, D. C. Actin gene expression visualized in chicken muscle tissue culture by

using in situ hybridization with a biotinated nucleotide analog. Proc. Natl. Acad. Sci. U. S. A. 79,

7331–7335 (1982).

100. Femino, A. M., Fay, F. S., Fogarty, K. & Singer, R. H. Visualization of single RNA transcripts in

situ. Science 280, 585–590 (1998).

101. Capodieci, P. et al. Gene expression profiling in single cells within tissue. Nat. Methods 2, 663–

665 (2005).

102. Raj, A., Peskin, C. S., Tranchina, D., Vargas, D. Y. & Tyagi, S. Stochastic mRNA synthesis in

mammalian cells. PLoS Biol. 4, e309 (2006).

103. Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A. & Tyagi, S. Imaging individual

mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877–879 (2008).

104. Semrau, S. et al. FuseFISH: robust detection of transcribed gene fusions in single cells. Cell Rep.

6, 18–23 (2014).

105. Lyubimova, A. et al. Single-molecule mRNA detection and counting in mammalian tissue. Nat.

Protoc. 8, 1743–1758 (2013).

106. Itzkovitz, S. et al. Single-molecule transcript counting of stem-cell markers in the mouse

intestine. Nat. Cell Biol. 14, 106–114 (2012).

107. Levesque, M. J., Ginart, P., Wei, Y. & Raj, A. Visualizing SNVs to quantify allele-specific

expression in single cells. Nat. Methods 10, 865–867 (2013).

108. Hansen, C. H. & van Oudenaarden, A. Allele-specific detection of single mRNA molecules in

situ. Nat. Methods 10, 869–871 (2013).

109. Levsky, J. M., Shenoy, S. M., Pezo, R. C. & Singer, R. H. Single-cell gene expression profiling.

Science 297, 836–840 (2002).

110. Lubeck, E., Coskun, A. F., Zhiyentayev, T., Ahmad, M. & Cai, L. Single-cell in situ RNA

profiling by sequential hybridization. Nat. Methods 11, 360–361 (2014).

111. Lubeck, E. & Cai, L. Single-cell systems biology by super-resolution imaging and combinatorial

labeling. Nat. Methods 9, 743–748 (2012).

112. Lu, J. & Tsourkas, A. Imaging individual microRNAs in single mammalian cells in situ. Nucleic

Acids Res. 37, e100 (2009).

113. Kim, D. H., Hong, Y. K., Egholm, M. & Strauss, W. M. Non-disruptive PNA-FISH protocol for

formalin-fixed and paraffin-embedded tissue sections. BioTechniques 31, 472, 475–476 (2001).

114. Yaroslavsky, A. I. & Smolina, I. V. Fluorescence imaging of single-copy DNA sequences within

the human genome using PNA-directed padlock probe assembly. Chem. Biol. 20, 445–453

(2013).

115. Nuovo, G. J. In situ PCR: protocols and applications. PCR Methods Appl. 4, S151–167 (1995).

116. Bagasra, O. Protocols for the in situ PCR-amplification and detection of mRNA and DNA

sequences. Nat. Protoc. 2, 2782–2795 (2007).

117. Long, A. A., Komminoth, P., Lee, E. & Wolfe, H. J. Comparison of indirect and direct in-situ

polymerase chain reaction in cell preparations and tissue sections. Detection of viral DNA, gene

rearrangements and chromosomal translocations. Histochemistry 99, 151–162 (1993).

118. Qian, X. & Lloyd, R. V. Recent developments in signal amplification methods for in situ

hybridization. Diagn. Mol. Pathol. 12, 1–13 (2003).

119. Kern, D. et al. An enhanced-sensitivity branched-DNA assay for quantification of human

immunodeficiency virus type 1 RNA in plasma. J. Clin. Microbiol. 34, 3196–3202 (1996).

120. Player, A. N., Shen, L. P., Kenny, D., Antao, V. P. & Kolberg, J. A. Single-copy gene detection

using branched DNA (bDNA) in situ hybridization. J. Histochem. Cytochem. 49, 603–612 (2001).

121. Battich, N., Stoeger, T. & Pelkmans, L. Image-based transcriptomics in thousands of single

human cells at single-molecule resolution. Nat. Methods 10, 1127–1133 (2013).

122. Choi, H. M. T. et al. Programmable in situ amplification for multiplexed imaging of mRNA

expression. Nat. Biotechnol. 28, 1208–1212 (2010).

123. Ali, M. M. et al. Rolling circle amplification: a versatile tool for chemical biology, materials

science and medicine. Chem. Soc. Rev. 43, 3324–3341 (2014).

124. Banér, J., Nilsson, M., Mendel-Hartvig, M. & Landegren, U. Signal amplification of padlock

probes by rolling circle replication. Nucleic Acids Res. 26, 5073–5078 (1998).

125. Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling

DNA nanoarrays. Science 327, 78–81 (2010).

126. Nilsson, M. et al. Padlock probes: circularizing oligonucleotides for localized DNA detection.

Science 265, 2085–2088 (1994).

127. Landegren, U., Kaiser, R., Sanders, J. & Hood, L. A ligase-mediated gene detection technique.

Science 241, 1077–1080 (1988).

128. Luo, J., Bergstrom, D. E. & Barany, F. Improving the fidelity of Thermus thermophilus DNA

ligase. Nucleic Acids Res. 24, 3071–3078 (1996).

129. Lizardi, P. M. et al. Mutation detection and single-molecule counting using isothermal rolling-

circle amplification. Nat. Genet. 19, 225–232 (1998).

130. Larsson, C. et al. In situ genotyping individual DNA molecules by target-primed rolling-circle

amplification of padlock probes. Nat. Methods 1, 227–232 (2004).

131. Larsson, C., Grundberg, I., Söderberg, O. & Nilsson, M. In situ detection and genotyping of

individual mRNA molecules. Nat. Methods 7, 395–397 (2010).

132. Ke, R. et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nat. Methods 10,

857–860 (2013).

133. Sunkin, S. M. Towards the integration of spatially and temporally resolved murine gene

expression databases. Trends Genet. 22, 211–217 (2006).

134. Singh, R. P. et al. High-resolution voxelation mapping of human and rodent brain gene

expression. J. Neurosci. Methods 125, 93–101 (2003).

135. Liu, D. & Smith, D. J. Voxelation and gene expression tomography for the acquisition of 3-D

gene expression maps in the brain. Methods 31, 317–325 (2003).

136. Sforza, D. M. & Smith, D. J. Voxelation methods for genome scale imaging of brain gene

expression. Methods Enzymol. 386, 314–323 (2004).

137. Chin, M. H. et al. A genome-scale map of expression for a mouse brain section obtained using

voxelation. Physiol. Genomics 30, 313–321 (2007).

138. Okamura-Oho, Y. et al. Transcriptome tomography for brain analysis in the web-accessible

anatomical space. PLoS ONE 7, e45373 (2012).

139. Miller, J. A. et al. Transcriptional landscape of the prenatal human brain. Nature 508, 199–206

(2014).

140. Thompson, C. L. et al. A high-resolution spatiotemporal atlas of gene expression of the

developing mouse brain. Neuron 83, 309–323 (2014).

141. Junker, J. P. et al. Genome-wide RNA tomography in the zebrafish embryo. Cell 159, 662–675

(2014).

142. Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-

cell RNA-seq. Nature 509, 371–375 (2014).

143. Pollen, A. A. et al. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity

and activated signaling pathways in developing cerebral cortex. Nat. Biotechnol. 32, 1053–1058

(2014).

144. Zechel, S., Zajac, P., Lönnerberg, P., Ibáñez, C. F. & Linnarsson, S. Topographical transcriptome

mapping of the mouse medial ganglionic eminence by spatially-resolved RNA-seq. Genome Biol.

15, 486 (2014).

145. Laszlo, A. H. et al. Decoding long nanopore sequencing reads of natural DNA. Nat. Biotechnol.

32, 829–833 (2014).

146. Drndić, M. Sequencing with graphene pores. Nat. Nanotechnol. 9, 743 (2014).

147. Lee, J. H. et al. Highly multiplexed subcellular RNA sequencing in situ. Science 343, 1360–1363

(2014).

148. Hardenbol, P. et al. Multiplexed genotyping with sequence-tagged molecular inversion probes.

Nat. Biotechnol. 21, 673–678 (2003).

149. Porreca, G. J. et al. Multiplex amplification of large sets of human exons. Nat. Methods 4, 931–

936 (2007).

150. Valouev, A. et al. A high-resolution, nucleosome position map of C. elegans reveals a lack of

universal sequence-dictated positioning. Genome Res. 18, 1051–1063 (2008).

151. Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000).

152. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724

(2009).

153. Yap, T. A., Gerlinger, M., Futreal, P. A., Pusztai, L. & Swanton, C. Intratumor heterogeneity:

seeing the wood for the trees. Sci. Transl. Med. 4, 127ps10 (2012).

154. Marx, V. Cancer genomes: discerning drivers from passengers. Nat. Methods 11, 375–379

(2014).

155. Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012).

156. Golub, T. R. et al. Molecular classification of cancer: class discovery and class prediction by gene

expression monitoring. Science 286, 531–537 (1999).

157. Vogelstein, B. & Kinzler, K. W. Cancer genes and the pathways they control. Nat. Med. 10, 789–

799 (2004).

158. Gerlinger, M. et al. Cancer: Evolution Within a Lifetime. Annu. Rev. Genet. (2014).

doi:10.1146/annurev-genet-120213-092314

159. Swanton, C. Intratumor heterogeneity: evolution through space and time. Cancer Res. 72, 4875–

4882 (2012).

160. Gerlinger, M. et al. Genomic architecture and evolution of clear cell renal cell carcinomas defined

by multiregion sequencing. Nat. Genet. 46, 225–233 (2014).

161. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion

sequencing. N. Engl. J. Med. 366, 883–892 (2012).

162. Junttila, M. R. & de Sauvage, F. J. Influence of tumour micro-environment heterogeneity on

therapeutic response. Nature 501, 346–354 (2013).

163. Haffner, M. C. et al. Tracking the clonal origin of lethal prostate cancer. J. Clin. Invest. 123,

4918–4922 (2013).

164. Polzer, B. et al. Molecular profiling of single circulating tumor cells with diagnostic intention.

EMBO Mol. Med. 6, 1371–1386 (2014).

165. Diaz, L. A. et al. The molecular evolution of acquired resistance to targeted EGFR blockade in

colorectal cancers. Nature 486, 537–540 (2012).

166. Harris, T. J. R. & McCormick, F. The molecular pathology of cancer. Nat. Rev. Clin. Oncol. 7,

251–265 (2010).

167. Kiflemariam, S. et al. In situ sequencing identifies TMPRSS2-ERG fusion transcripts, somatic

point mutations and gene expression levels in prostate cancers. J. Pathol. 234, 253–261 (2014).

168. Grundberg, I. et al. In situ mutation detection and visualization of intratumor heterogeneity for

cancer research and diagnostics. Oncotarget 4, 2407–2418 (2013).

169. Weichert, W. et al. KRAS genotyping of paraffin-embedded colorectal cancer tissue in routine

diagnostics: comparison of methods and impact of histology. J. Mol. Diagn. 12, 35–42 (2010).

170. Herreros-Villanueva, M., Chen, C.-C., Yuan, S.-S. F., Liu, T.-C. & Er, T.-K. KRAS mutations:

analytical considerations. Clin. Chim. Acta. 431, 211–220 (2014).

171. Ross, J. S., Hatzis, C., Symmans, W. F., Pusztai, L. & Hortobágyi, G. N. Commercialized

multigene predictors of clinical outcome for breast cancer. The Oncologist 13, 477–493 (2008).

172. Raz, D. J. et al. A multigene assay is prognostic of survival in patients with early-stage lung

adenocarcinoma. Clin. Cancer Res. 14, 5565–5570 (2008).

173. Kawachi, M. H. et al. NCCN clinical practice guidelines in oncology: prostate cancer early

detection. J. Natl. Compr. Cancer Netw. 8, 240–262 (2010).

174. Humphrey, P. A. Gleason grading and prognostic factors in carcinoma of the prostate. Mod.

Pathol. 17, 292–306 (2004).

175. Schmidt, C. Gleason scoring system faces change and debate. J. Natl. Cancer Inst. 101, 622–629

(2009).

176. Schnitt, S. J. Classification and prognosis of invasive breast cancer: from morphology to

molecular taxonomy. Mod. Pathol. 23 Suppl 2, S60–64 (2010).

177. Prat, A. & Perou, C. M. Deconstructing the molecular portraits of breast cancer. Mol. Oncol. 5, 5–

23 (2011).

178. Sundström, M. et al. KRAS analysis in colorectal carcinoma: analytical aspects of

Pyrosequencing and allele-specific PCR in clinical practice. BMC Cancer 10, 660 (2010).

179. Siegelin, M. D. & Borczuk, A. C. Epidermal growth factor receptor mutations in lung

adenocarcinoma. Lab. Investig. 94, 129–137 (2014).

180. Sparano, J. A. & Paik, S. Development of the 21-gene assay and its application in clinical practice

and clinical trials. J. Clin. Oncol. 26, 721–728 (2008).

181. Arora, R. et al. Heterogeneity of Gleason grade in multifocal adenocarcinoma of the prostate.

Cancer 100, 2362–2366 (2004).

182. Algaba, F. & Montironi, R. Impact of prostate cancer multifocality on its biology and treatment.

J. Endourol. 24, 799–804 (2010).

183. Tomlins, S. A. et al. Distinct classes of chromosomal rearrangements create oncogenic ETS gene

fusions in prostate cancer. Nature 448, 595–599 (2007).

184. Linja, M. J. et al. Amplification and overexpression of androgen receptor gene in hormone-

refractory prostate cancer. Cancer Res. 61, 3550–3555 (2001).

185. Kumar, A. et al. Exome sequencing identifies a spectrum of mutation frequencies in advanced

and lethal prostate cancers. Proc. Natl. Acad. Sci. U. S. A. 108, 17087–17092 (2011).

186. Jiang, N., Zhu, S., Chen, J., Niu, Y. & Zhou, L. α-Methylacyl-CoA Racemase (AMACR) and

Prostate-Cancer Risk: A Meta-Analysis of 4,385 Participants. PLoS ONE 8, e74386 (2013).

187. Goto, Y., Nonaka, I. & Horai, S. A mutation in the tRNA(Leu)(UUR) gene associated with the

MELAS subgroup of mitochondrial encephalomyopathies. Nature 348, 651–653 (1990).

188. Ghaznavi, F., Evans, A., Madabhushi, A. & Feldman, M. Digital imaging in pathology: whole-

slide imaging and beyond. Annu. Rev. Pathol. 8, 331–359 (2013).

189. Wilbur, D. C. Digital pathology: Get on board-the train is leaving the station. Cancer Cytopathol.

122, 791–795 (2014).

190. Landegren, U., Chen, L., Wu, D., Nong, Y. & Gallant, C. Localised RCA-based amplification

method. Patent WO/2014/076209 (22 May 2014).

191. Poguzhelskaya, E., Artamonov, D., Bolshakova, A., Vlasova, O. & Bezprozvanny, I. Simplified

method to perform CLARITY imaging. Mol. Neurodegener. 9, 19 (2014).

192. Ertürk, A. et al. Three-dimensional imaging of solvent-cleared organs using 3DISCO. Nat.

Protoc. 7, 1983–1995 (2012).

193. Piepenburg, O., Williams, C. H., Stemple, D. L. & Armes, N. A. DNA detection using

recombination proteins. PLoS Biol. 4, e204 (2006).

194. Beck, A. H. et al. Systematic analysis of breast cancer morphology uncovers stromal features

associated with survival. Sci. Transl. Med. 3, 108ra113 (2011).

195. Yuan, Y. et al. Quantitative image analysis of cellular heterogeneity in breast tumors

complements genomic profiling. Sci. Transl. Med. 4, 157ra143 (2012).

196. Langer, L. et al. Computer-aided diagnostics in digital pathology: automated evaluation of early-

phase pancreatic cancer in mice. Int. J. Comput. Assist. Radiol. Surg. (2014). doi:10.1007/s11548-

014-1122-9
