in situ sequencing - diva portalsu.diva-portal.org/smash/get/diva2:768864/fulltext01.pdf · m. in...
TRANSCRIPT
In situ Sequencing Methods for spatially-resolved transcriptome analysis
Marco Mignardi
In situ Sequencing Methods for spatially-resolved transcriptome analysis
Marco Mignardi
Abstract It is well known that cells in tissues display a large heterogeneity in gene
expression due to differences in cell lineage origin and variation in the local
environment at different sites in the tissue, a heterogeneity that is difficult to
study by analyzing bulk RNA extracts from tissue. Recently, genome-wide
transcriptome analysis technologies have enabled the analysis of this variation
with single-cell resolution. In order to link the heterogeneity observed at
molecular level with the morphological context of tissues, new methods are
needed which achieve an additional level of information, such as spatial
resolution.
In this thesis I describe the development and application of padlock probes and
rolling circle amplification (RCA) as molecular tools for spatially-resolved
transcriptome analysis. Padlock probes allow in situ detection of individual
mRNA molecules with single nucleotide resolution, visualizing the molecular
information directly in the cell and tissue context. Detection of clinically
relevant point mutations in tumor samples is achieved by using padlock probes
in situ, allowing visualization of intra-tumor heterogeneity. To resolve more
complex gene expression patterns, we developed in situ sequencing of RCA
products combining padlock probes and next-generation sequencing methods.
We demonstrated the use of this new method by, for the first time, sequencing
short stretches of transcript molecules directly in cells and tissue. By using in
situ sequencing as read-out for multiplexed padlock probe assays, we measured
the expression of tens of genes in hundreds of thousands of cells, including point
mutations, fusions transcripts and gene expression level.
These molecular tools can complement genome-wide transcriptome analyses
adding spatial resolution to the molecular information. This level of resolution is
important for the understanding of many biological processes and potentially
relevant for the clinical management of cancer patients.
Keywords
Padlock probes; rolling circle amplification; in situ sequencing; spatially-resolved transcriptomics;
molecular diagnostics.
To my beloved family,
for every minute spent away from them.
“It's not that I'm so smart, it's just that I stay
with problems longer.”
Albert Einstein
Read more at
http://www.brainyquote.com/quotes/quotes/a/alb
erteins106192.html#cczl3Tfa4iYUq3kL.99
List of Publications
This thesis is based on the following papers:
Grundberg I, Kiflemariam S, Mignardi M, Imgenberg-Kreuz J, Edlund K,
Micke P, Sundström M, Sjöblom T, Botling J, Nilsson M. In situ mutation
detection and visualization of intratumor heterogeneity for cancer research and
diagnostics. Oncotarget 4, 2407-2418 (2013).
Ke R*, Mignardi M*, Pacureanu A, Svedlund J, Botling J, Wählby C, Nilsson
M. In situ sequencing for RNA analysis in preserved tissue and cells. Nat
Methods. 10, 857-860 (2013).
Kiflemariam S*, Mignardi M*, Ali MA, Bergh A, Nilsson M, Sjöblom T. In
situ sequencing identifies TMPRSS2-ERG fusion transcripts, somatic point
mutations and gene expression levels in prostate cancers. J Pathol. 234, 253-61
(2014).
Mignardi M*, Mezger A*, Larsson C and Nilsson M. Oligonucleotide gap-fill
ligation for mutation detection and sequencing in situ. Submitted manuscript.
* These authors contributed equally.
Reprints were made with permission from the respective publishers.
Related work by the author
Mignardi M, Nilsson M. Fourth-generation sequencing in the cell and the
clinic. Genome Med. 6, 31 (2014).
Weibrecht I*, Lundin E*, Kiflemariam S, Mignardi M, Grundberg I, Larsson
C, Koos B, Nilsson M*, Söderberg O*. In situ detection of individual mRNA
molecules and protein complexes or post-translational modifications using
padlock probes combined with the in situ proximity ligation assay. Nat Protoc.
8, 355-372 (2013).
* These authors contributed equally.
Contents
Introduction .................................................................................... 9
Cellular heterogeneity: origins and implications .................................. 10
Classic methods for the analysis of gene expression ............................ 12
Resolution, multiplexibility and throughput of transcriptome analysis .. 12
In vitro methods .......................................................................... 14
In situ methods ........................................................................... 17
Spatially-resolved transcriptome analysis ........................................... 21
In situ sequencing ........................................................................ 23
Application of in situ sequencing in tumor biology ............................... 25
In situ sequencing for research ...................................................... 25
Padlock probe assay in the clinic .................................................... 28
Present investigations ..................................................................... 30
In situ mutation detection and visualization of intratumor
heterogeneity for cancer research and diagnostics. ........................... 30
In situ sequencing for RNA analysis in preserved tissue and cells ........ 31
In situ sequencing identifies TMPRSS2-ERG fusion transcripts, somatic
point mutations and gene expression levels in prostate cancers. ......... 32
Oligonucleotide gap-fill ligation for mutation detection and sequencing
in situ. ....................................................................................... 33
Concluding remarks and future perspectives ...................................... 35
Populärvetenskaplig sammanfattning på svenska ................................ 37
Acknowledgments .......................................................................... 39
References .................................................................................... 42
Abbreviations
ACTB actin beta
AMACR alpha-methylacyl CoA racemase
AR androgen receptor
bDNA branched DNA
cDNA complementary DNA
CNV copy number variation
CRC colorectal cancer
CTC circulating tumor cell
DNAnb DNA nano-ball
dsDNA double-stranded DNA
EGFR epidermal growth factor receptor
ER estrogen receptor
ERG ETS-related gene
FACS fluorescence activated cell sorting
FFPE formalin-fixed paraffin embedded
FISH fluorescent in situ hybridization
FISSEQ Fluorescent In Situ SEQuencing
H&E hematoxylin and eosin
HCR hybridization chain reaction
HER2 human epidermal growth factor receptor 2
ISH in situ hybridization
KRAS kirsten rat sarcoma viral oncogene homolog
LCM laser-capture microdissection
LNA locked nucleic acid
LOH loss-of-heterozygosity
MELAS mitochondrial encephalomyopathy, lactic acidosis
and stroke-like episodes syndrome
mtDNA mitochondrial DNA
NGS next-generation sequencing
nt nucleotide
OLA oligonucleotide ligation assay
PCa prostate cancer
PNA peptide nucleic acid
qPCR quantitative PCR
RCA rolling circle amplification
RCP rolling circle product
RNA-seq RNA sequencing
RT-PCR reverse-transcription qPCR
SBH sequencing-by-hybridization
SBL sequencing-by-ligation
smFISH single-molecule FISH
SNP single nucleotide polymorphism
TMA tissue microarray
TP53 tumor protein 53
VIM vimentin
9
Introduction
The understanding of cellular processes at the molecular level in physiological
and pathological conditions is necessary for the future development of sensitive
diagnostics tools and, ultimately, of personalized treatments of diseases. Since
the first methods were described in 1977, nucleic acids sequencing technologies
have improved at an extraordinary pace. Thanks to the last generations of
sequencing methods, collectively called next-generation sequencing (NGS)
technologies, the full map of the genome and the transcriptome can be obtained
at an unprecedented speed and resolution. Many years after the introduction of
these methods, we are now attempting an even more accurate description of
complex biological processes that takes into account the dynamic expression of
the cellular genome through space and time. This next step will require a further
level of resolution.
My doctoral work focused on the development of the first method for
sequencing nucleic acids directly in cells and tissues. The peculiarity of in situ
sequencing is that it allows the direct observation and measurement of the
fundamental unit of the human transcriptome, a single nucleotide, at the
resolution of single molecules in individual cells. This new molecular method
has the potential to become a fundamental tool for tissue analysis providing
spatial context to the molecular information.
In this thesis, I will briefly describe the known biological causes and
consequences of the heterogeneity observed in human tissues and the most
common methods to study this heterogeneity at the transcriptome level. These
different methods have different magnitude of analysis and I will review their
performance in terms of throughput, cellular and molecular resolution and
spatial information. I will then describe in situ sequencing, and how it differs
from the described methods and how it can complement other molecular
methods for the analysis of tissues. Finally, I will highlight some of the
applications that could benefit from the use of the molecular tools described in
this work, both in basic research and in molecular diagnostics.
10
Cellular heterogeneity: Origins
and implications
The instructions necessary to build human cells and to execute their
physiological functions are enclosed in their genome, 23 pairs of
deoxyribonucleic acid (DNA) called chromosomes. These instructions are used
to synthetize hundreds of thousands of ribonucleic acid (RNA) molecules of
different size and functions which overall, constitute the cellular transcriptome.
Some of these RNA molecules are translated into proteins, the building blocks
of the cell and of the human body. A “mistake” along this flow of information
can affect the structure or the function of the cell, sometimes with dramatic
effects for the whole organism. A collection of cells forms a tissue, which, in
turn, is organized in organs that execute and coordinate the physiological
functions. Cells can be grouped in different “types” according to their
phenotype. The definition of cell type is much in debate1, however it implies that
large variation exists among cells of different tissues and within the same tissue.
In fact, despite sharing the same genome, cells of the human body appear to be
phenotypically extremely heterogeneous. So, how does this diversity arise?
Two sources of variation mainly contribute to generate such cell diversity:
genetic and non-genetic2. An evident manifestation of cellular heterogeneity
arising from genotypic variation among a cell population is in diseased tissues3–
5, particularly in tumor tissues in which genetic heterogeneity is promoted by the
high genome instability of cancer cells6–8
. However, new compelling evidence
indicates that genotypic variation is a widespread event among normal tissues9–
13. Non-genetic heterogeneity is the result of a differential expression of the
genome among a clonal population of cells, due to, for example, epigenetic
regulation14–16
, the cell cycle17,18
and the stochastic nature of gene expression19–
21. This wealth of diversity is reached through a complex scenario of interactions
between genome, transcriptome, proteome, metabolome and the environment
surrounding the cells22
and results in a differential expression of the cellular
genome in space and time23,24
.
Technologies with the necessary resolution to analyze heterogeneity in gene
expression are fundamental tools to understand biological processes such as
development25
, cell fate and differentiation26
, immunity27
and the maintenance
and differentiation of stem cells28
. Transcriptome heterogeneity is of particular
interest in tumor biology. Genotypic and phenotypic variation between tumors
(intertumor heterogeneity) and within the same tumor (intratumor heterogeneity)
is observed across all types of cancer29,30
. The capacity of tumor cells to
differentiate and proliferate through clonal expansion is what currently limits
cancer treatment31–34
. Investigating tumor heterogeneity in greater detail is
11
shedding light on the molecular mechanisms underlying tumor genesis,
progression and metastasis35,36
. This holds great potential for the identification
of novel therapeutic targets and of biomarkers for improved cancer
diagnostics37–40
.
The accurate description of tissue heterogeneity is not a trivial task. It requires
the adaption and tweaking of molecular technologies originally developed for
bulk analysis to achieve higher resolution as well as the introduction of new
methods. Some of these methods and their recent use for spatial transcriptomics
analysis will be discussed in the following chapters.
12
Classic methods for the analysis
of gene expression
Resolution, multiplexibility and throughput of transcriptome analysis
The ability to accurately describe and “measure” heterogeneity is first of all a
technological challenge. For instance, when comparing different tissues an
average obtained by a bulk measurement may represent the difference between
samples but it will fail to detect differences within the same tissue41
. Also,
differences within the same tissue may be localized, constituting regional
variations or be distributed randomly across the tissue. Regional distributions
may reflect important characteristics of the tissue42
, but not all the methods can
characterize such spatial information (Figure 1).
Figure 1. Analysis of tissue heterogeneity. The four samples exemplify four tissue sections composed
of three distinct cell types, represented by different colors, in different proportions. An average
measurement can distinguish sample 1 from the others, but samples 2, 3 and 4 will give the same
result in a bulk analysis although they differ in the tissue organization. To detect and identify this
regional variation, highlighted by the dashed lines, the method must yield spatial resolution.
Nowadays, there are many methods used for transcriptome analysis, all based on
nucleic acid hybridization and enzymatic reactions (i.e. polymerization or
ligation) or most often a combination of the two. There is great difference in the
depth and type of information that each method can provide. Beside their
technical performances (such as analytical sensitivity, specificity, limit of
detection or dynamic range as defined in Jenning et al.43
), there are other
parameters that become of critical importance when analyzing tissue samples.
These parameters are described below.
13
The analytical resolution defines the smallest unit that can be discriminated by
a specific method and can be used in relation to the sample or to the molecular
target. Cellular resolution refers to the sample and therefore subcellular
compartments are the smallest units. Molecular resolution refers to the
molecular target where the smallest unit is a single nucleotide of an individual
RNA molecule. Maximum resolution is achieved by methods that are able to
discriminate single nucleotides with subcellular localization. Some methods are
suitable for detection of transcript abundance from tissue lysates and therefore
the cellular resolution is minimal. Of note, resolution, as intended here, is
distinct from the limit of detection (LOD) which is the lowest amount of analyte
that is statistically distinguishable from background or a negative control43
.
Of consideration is the throughput of the assay, here referred to as the capacity
of a system to analyze a large number of samples, or rather the rate of samples
processed. Translated to tissue transcriptome analysis, throughput can refer to
the number of cells that are effectively analyzed from a single sample. Small
organs or samples of tissue removed from organs are sliced in thin tissue
sections, containing 104-10
6 cells or more. Even though several techniques exist
to extract single cells from tissue samples18,44–49
and to analyze the RNA content
from single cells50
, capture and analysis of many cells from a single section can
be technically challenging (especially if manual manipulation is required),
limited by the cost of the downstream molecular tests, or it can be achieved at
the loss of information (e.g. the localization in the case of bulk analysis, as
described below).
Multiplexibility refers to the amplitude of transcriptome that can be targeted by
a method. Some methods are limited to the analysis of a few transcripts at a
time. For example, fluorescence-based in situ methods are limited by the
resolvable fluorescent spectra. In general, sequencing, meant as an iterative
process of measurement, is a successful way to increase the multiplexibility of
RNA analysis to include the whole transcriptome.
Gene expression is a dynamic process in the cell, subjected to variations in the
four dimensions (three spatial and one temporal) across the tissues. The ability
of a method to capture spatial and temporal resolution is crucial to dissect
tissue heterogeneity. Spatial resolution differs from cellular resolution because it
does not refer to the section of tissue from which information is derived, but
rather to the ability to relate many sections to each other (Figure 1). For
instance, single cell resolution can be achieved but information of which cells
are connected to each other or grouped in the same region may not be known. In
situ techniques intrinsically have spatial resolution and its depth, down to
subcellular compartments, is dictated by the size and density of the signal
generated. In vitro assays maintain spatial resolution by extracting single cells
14
from tissue sections and recording (typically by imaging) the position of the
cells in the section before extraction. The throughput of these methods, however,
is typically low unless enzymatic dissociation is used, in which case spatial
resolution is lost. Time-resolved measurements are perhaps the most difficult to
achieve in tissue analysis. Time-resolved methods usually rely upon insertion of
genetically-engineered vectors for expression of fluorescence-tagged molecules
that can be detected by high-resolution imaging. Because these assays require
living samples, they are typical used for analysis of cell cultures51,52
.
Alternatively, RNA measurements can be performed at different time-points,
although this implies the analysis of different samples53
.
The ideal assay should be able to analyze the whole transcriptome with single
nucleotide resolution of every cell composing a tissue and at the same time
preserve the spatial information and relationship of all the cells analyzed. At
present, such an idealized assay does not exist and therefore a combination of
methods is necessary to explore biological questions. The most common
methods to study gene expression are described below and are for simplicity
grouped into two types: in vitro methods, where nucleic acids are extracted from
the cells and the analysis is performed typically in a test tube or a fluidic system;
or in situ methods, where the analysis is performed directly on preserved cells
and tissue samples.
In vitro methods
The gold standard for RNA analysis is reverse-transcription qPCR (RT-PCR), a
technique based on kinetic PCR, better known as quantitative PCR (qPCR) or
real-time PCR (the nomenclature used in this thesis follow the MIQE guidelines
proposed by Bustin et al.54
). qPCR is an adaption of the polymerase chain
reaction (PCR)55
, in which the accumulation of double-stranded DNA (dsDNA)
is monitored in real time during the course of the reaction56
. Fluorescent
reporters are used to intercalate or hybridize to the newly formed DNA products
and thus the increase in fluorescent intensity over time relates to the amount of
starting target material in the sample57
. In RT-PCR, RNA molecules are first
converted to cDNA by reverse transcription and subsequently amplified and
measured by qPCR58,59
. The main advantages of this method are the high
specificity and analytical sensitivity down to a single molecule of target RNA59
,
which makes it a suitable method to work with minimum amounts of starting
material, like a single cell58
. RT-PCR is also characterized by a large dynamic
range spanning several orders of magnitude. Relative quantification can be
performed comparing RNA extracted from different samples while absolute
quantification can be achieved with the use of internal controls to generate a
standard curve60
. Because based on fluorescent read-out and given the technical
limitations in multiplex PCR, RT-PCR has been limited to the analysis of few
15
targets at a time. However, recent micro- and nanofluidic systems for high
throughput qPCR analysis allow measurement of 100s of transcripts in 100s or
1000s of single cells47,61
. Furthermore, the development of digital PCR and the
implementation of real-time PCR measurement in picoliter droplets have the
potential to increase precision, sensitivity and reproducibility of this method
even further62,63
. Despite the technical feasibility of performing RT-PCR on
single cells and the availability of high throughput systems able to analyze
hundreds of cells at a time, it is not possible to identify the original position of
the individual cells on the tissue of origin. Therefore, spatial resolution is what
limits this assay when studying tissue heterogeneity. However, subpopulations
of cells from the same tissue can be reconstructed based on statistical clustering
algorithms that associate cells with similar expression patterns64
.
Before the development of high throughput gene expression analysis platforms,
global transcriptome analysis from tissue extracts was enabled by the microarray
technology. The first microarray technology was developed already in 1995 by
depositing cDNA libraries on 96-well microtiter plates followed by
hybridization of fluorescent-labelled specific reverse-transcribed mRNA from
Arabidopsis thaliana65
. The underlying mechanism has not substantially
changed since then, still relying on immobilization of template oligonucleotides
on a solid support and hybridization of target RNA or DNA molecules combined
with fluorescent read-out. Instead, sample preparation protocols have greatly
improved making this technology suitable for single cell analysis66–68
. Also,
array-based methods have expanded to the analysis of single nucleotide
polymorphisms (SNPs) and copy number variations (CNVs)69,70
. Compared to
RT-PCR thus, microarray technologies are highly multiplex assays able to
achieve single nucleotide and single cell resolution, although generally lacking
spatial information. Less sensitive and not as accurate in absolute quantification
of low abundant transcripts as qPCR-based assays, the main technical drawback
is represented by low specificity of probe hybridization (background level of
hybridization and cross-reactivity), which limits the dynamic range71
. Also,
array design requires prior knowledge of the target regions, making the
identification of splice variants or de novo transcriptome analysis
challenging72,73
. These limitations have been partially solved by high throughput
RNA sequencing technologies, which are now becoming the method of choice
for transcriptome analysis even though microarrays are still a cheaper and
reliable alternative in many applications73
.
The foundations for high throughput sequencing of DNA were laid already at
the end of the ’80s, and comprised carrying out sequencing outside gel capillary
systems and picturing DNA not as a string of nucleotides but as a sequence of
overlapping oligomers74,75
. The first method that followed this logic was the
sequencing-by-hybridization (SBH) method, described in 199376
. It took about a
decade, however, before the first commercial systems for high throughput
16
analysis of DNA were launched. NGS technologies differ substantially in the
sequencing chemistry used and in technical performances. They are
comprehensively reviewed in Metzker, 201077
. NGS technologies can be used to
sequence the transcriptome or parts of it, a method called RNA sequencing
(RNA-seq)78,79
. Except for the most recent sequencers, which are able to directly
read single transcripts, the RNA is extracted from the sample, cDNA is
synthetized using random primers or oligo(dT) primers (to capture only the pool
of poly(A) transcripts), fragmented and ligated to special adaptors before being
sequenced in one of the NGS platforms80
. Sequencing RNA molecules offers
several advantages over array-based methods. First of all, it is an untargeted
approach meaning that prior knowledge of the sequencing targets is not
necessary. Second, despite the general concordance between RNA-seq and
microarray experiments for transcriptome profiling, the former is superior in
sensitivity and has a broader dynamic range because it is less sensitive to
background fluorescence73,81
. Third, it intrinsically has single nucleotide
resolution and in most recent technologies even single molecule resolution can
be achieved82,83
. Therefore, most of the sequence variation at RNA level can be
detected by this method. Protocols for sample preparation and cDNA synthesis
have been adapted to work with just a few picograms of starting material, the
mRNA content of a single cell50,84
or even a fraction of it85
. Recent technical
improvements have enabled robust sequencing of full-length mRNA
molecules86–88
, the use of molecular barcodes to simultaneously sequence many
individual cells maintaining cellular resolution89,90
, and molecular tagging for
precise RNA quantification and tracking of transcription polarity91–93
.
One of the most widely used methods to isolate single cells after enzymatic
digestion of tissues, is fluorescence activated cell sorting (FACS). Described in
1969 for the first time94
, it is now a fundamental tool for single cell analysis.
FACS instruments are multiparametric flow cytometers equipped with lasers for
detection of fluorescent markers on the cells. Cells contained in droplets are
individually read by the lasers, classified and counted based on fluorescence
intensity and morphological parameters and sorted based on a defined gating
strategy in plates or tubes. Although mostly used for expression analysis of
surface markers labelled with fluorescent-antibodies, RNA analysis by FACS
and flow cytometry has also been demonstrated95,96
. The great advantages of
FACS analysis are the high flow rate, which allows counting of thousands of
cells per second, and the simplicity of the protocols. On the other hand the
fluorescent-based read-out limits the multiplexibility of the method to few
markers per cell (up to 12 in some cases) and the sensitivity is quite low
compared to other techniques described.
17
In situ methods
Spatial resolution is an intrinsic property of in situ techniques because the
molecular reactions are performed directly on the tissue section and the spatial
information is readily visible. The first molecular staining was published in 1969
by Gall and Pardue97
who for the first time performed in situ hybridization (ISH)
of nucleic acids using radiolabelled probes. Since then, radioactive read-out has
been replaced with fluorescent or chromogenic substrates and a variety of
different molecular techniques have been developed to allow detection,
visualization and counting of RNA in situ. Current in situ methods generally
share three steps: fixation of the nucleic acids and permeabilization of the
sample; detection of the target RNA and signal amplification to achieve the
highest signal-to-noise ratio; and imaging and analysis of the stained sample.
Based on the modality of target recognition, the methods can be divided into two
main approaches: hybridization-based and amplification-based systems.
Fluorescent in situ hybridization (FISH) is the fluorescence version of the more
general ISH technique and is certainly the most well known and established of
all the in situ assays. The first version of this technique, described in 198098
made use of 3’-end fluorescently-labelled RNA probes to detect specific DNA
sequences in cells. mRNA detection came soon after by combining the use of
biotinylated oligonucleotides and fluorescent-antibodies for detection99
. The first
version of FISH has limited molecular resolution allowing relative
quantification but not absolute counting. Also, because of high fluorescent
background, the method can detect only highly abundant transcripts and
multiplexibility is limited by the number of resolvable fluorophores, typically 3
or 4. These limitations, except for the latter, have been solved by the second
generation of FISH probes, collectively called single-molecule FISH (smFISH).
The original kb-long probes are replaced by multi-labelled 50 bp-long probes
that hybridize along the target sequence, generating a diffraction-limited spot
allowing single molecule quantification in situ100
(Figure 2). Other
improvements in probe design and the use of low-noise charge-coupled devices
(CCD) cameras contribute to the high resolution of smFISH. Detection of five
genes at a time has been demonstrated by this technique in mammalian cells and
applied to whole tissue sections101,102
. However, the variation in fluorescence
intensity among detected transcripts affects the sensitivity of the method. To
solve this issue, a second more recent design comprises the use of approximately
50 single-labelled probes each only 20 nucleotide-long103
. The key advantages
of this more optimized protocol are the high sensitivity and improved specificity
which allows robust detection and quantification of transcripts, including
products of gene fusions, in a variety of tissue samples104–106
. Because of the
micrometer or even sub-micrometer size of the signals, smFISH inherently
reaches single molecule detection at cellular or subcellular resolution. Single
18
nucleotide resolution has been technically demonstrated, but application to
detection of clinically relevant mutations or analysis of tissues is currently not
feasible with these methods107,108
. Also the issue of the low multiplexibility has
been addressed by employing sequential hybridization or coupling combinatorial
fluorescent labelling with high resolution imaging techniques109–111
. High
multiplexibility can be achieved, but practical limitations and questions on the
scalability when using such methods for tissue analysis still remain.
The introduction of probes containing modified nucleotides confers higher
efficiency and specificity to hybridization-based assays, enabling detection of
short RNA targets. Two technologies, locked nucleic acids (LNA) and peptide
nucleic acids (PNA), are now broadly used, particularly for detection of miRNA
or in combination with other classic methods112–114
.
Amplification of the starting target material is a common strategy in nearly all
molecular methods and for in situ analysis as well. Exponential amplification in
situ of DNA and cDNA have been demonstrated previously115,116
. However, this
approach resulted in lower specificity and diffusion of the amplified target117,118
.
An alternative approach is to amplify the signal after hybridization of the target
RNA molecule. The general idea behind it is to increase the local density of
fluorophores or reporter molecules to increase signal-to-noise ratio and
consequently the sensitivity of the read-out. Branched DNA (bDNA) technology
is one way to obtain signal amplification of hybridization-based methods in
which a number of self-complementary probes build a DNA “tree” upon
recognition of the target molecule, making available hundreds of target
sequences for reporter probes, the “branches”. Sensitivity of detection is greatly
enhanced and even specificity is increased by the requirement of more than one
hybridization event to occur119–121
. Multiplex read-out using bDNA has been
proposed but not yet shown. Another amplification method based on
hybridization is hybridization chain reaction (HCR) in which an initiator probe
hybridizes to the RNA target and triggers self-assembly of florescent RNA
hairpin reporters122
. With this method five transcripts can be visualized
simultaneously as demonstrated in zebrafish embryos, but HCR requires long
incubation times and does not achieve single molecule resolution.
A well exploited strategy for signal amplification, both in situ and in vitro, is
DNA linear amplification via rolling circle amplification (RCA) by ϕ29 DNA
polymerase123
. RCA consists of an isothermal amplification of a circular single-
stranded DNA molecule: when a primer with an available 3’-end hybridizes to
the circle, ϕ29 DNA polymerase can extend the primer making a long strand of
complementary DNA. A 100 nt-long circle is replicated several hundred times
per hour thanks to the high processivity of this polymerase (1x103 nt/min)
124.
Some of the characteristics of this mechanism make RCA an exquisite signal
19
amplification system, especially for in situ applications. First of all, the speed
and ease of the reaction, which does not depend upon temperature cycling.
Second, RCA is a localized amplification that produces a tether of DNA, which
spontaneously collapses into a micrometer-sized bundle of DNA (a rolling circle
product, RCP). If hybridized by fluorescent-labelled probes, a single RCP
appears as an intense spot-like signal easy to identify over the fluorescence
background of the tissue with a conventional epi-fluorescent microscope.
Finally, the RCA reaction generates a clonally-amplified substrate suitable for
sequencing by NGS technologies125
.
RCA is exploited by another in situ RNA analysis technology, the padlock
probe, on which all the work presented in this thesis is based. Described for the
first time by Nilsson et al. in 1994126
as a variant of the oligonucleotide ligation
assay (OLA)127
, a padlock probe is a single oligonucleotide whose end-
sequences are joined together by a DNA ligase upon hybridization on a
complementary target, resulting in a closed single-stranded DNA circle.
Important features of this mechanism are: i) the specificity given by the high
fidelity of DNA ligases in discriminating mismatches at the ligation site126,128
and, because of this, ii) the method achieves single nucleotide resolution; iii)
tolerance to washes because after ligation the padlock probe is topologically
locked to the target126
; and iv) the single molecule resolution, achieved by using
RCA to amplify and detect ligated padlock probes129
. Because of these
characteristics, padlock probes can be used for genotyping of mitochondrial
DNA (mtDNA) and single mRNA molecules in human cells and tissues130,131
.
For mRNA detection, cDNA is first synthesized by in situ reverse-transcription
of specific transcripts and then made single stranded by RNaseH treatment. A
padlock probe hybridizes to the cDNA target and is circularized by a DNA
ligase leaving a 3’-end overhang on the cDNA strand that is digested by ϕ29
DNA polymerase before priming RCA of the padlock probe (a mechanism
termed target-primed RCA)130
. The RCP is stained by hybridization of hundreds
of fluorescently-labelled oligonucleotides which generates a bright dot clearly
detectable under a fluorescent microscope. Mismatched padlock probes cannot
be circularized and amplified via RCA and therefore no signal is generated131
.
The method is able to detect and quantify 30% of actin beta (ACTB) transcripts
in single cells although sensitivity is lower in tissue sections or when multiple
padlock probes are combined131
in which case FISH, in comparison, proves to
be superior. Another disadvantage is the overall complexity due to the many
enzymatic steps involved and incubations at different temperatures which makes
it difficult to implement the protocol in the existing automated slide stainers.
Originally limited to four distinguishable probes per assay, multiplexibility has
been increased by sequencing RCPs in situ132
.
20
Figure 2. Single-molecule in situ techniques. In smFISH (I) multi-labelled probes (in gray and red)
hybridize to the target mRNA molecule (in black) generating a diffraction-limited spot visible under a
fluorescence microscope. In a newer version of the method, smFISH (II) multi-labelled reporter probes
are replaced by shorter single-labelled oligonucleotides. In the bDNA technique, oligonucleotides are
added to target a specific RNA region and to build a DNA “tree” (in blue) to which reporter probes
can hybridize. In the padlock probe assay, a series of enzymatic reactions are carried out to synthesize
cDNA from a specific mRNA. Padlock probes (in green) are hybridized and ligated and the circular
padlock probe is amplified by RCA. Fluorescent probes hybridize to the RCA product generating a
diffraction-limited spot. The images of the cell nuclei and fluorescent signals are taken, in order, from
ref. 102, 103, 121 and 131.
21
Spatially-resolved transcriptome
analysis
The most comprehensive description of gene expression heterogeneity in a
tissue sample requires the use of spatially-resolved transcriptomics methods in
which transcriptome analysis is complemented by spatial information. In other
words, gene expression analysis is pushed to include all four parameters
discussed in the previous chapter: analytical resolution, throughput,
multiplexibility and spatial resolution. At present, there is no single method
capable of acquiring the full depth of information in all parameters. Instead,
current approaches combine techniques in order to maximize the amount of
molecular information retrieved from whole tissues and organs (Figure 3).
Figure 3. Comparison of throughput and multiplexibility of the methods used for tissue transcriptome
analysis. The ideal method, in black, maximizes performances for both parameters, maintaining spatial
resolution. In situ-based techniques, in green, generally have a lower multiplexibility but can
simultaneously investigate a large number of cells or the entire tissue sample. Array- and sequencing-
based methods (respectively in blue and yellow) are limited in throughput but can analyze the whole
transcriptome. High-throughput RT-PCR, in red, is in between. The numbers in brackets indicate the
reference number to which each dot refers. The symbol on the in vitro methods indicates how the
single cells have been extracted. Explicitly, whether or not the method maintains spatial resolution as
described in the text. & = enzymatic dissociation; # = laser-capture microdissection; ¤ = mechanical
homogenization; £ = performed on cell line.
22
Spatially-resolved transcriptome analysis is not a new approach. A number of
large projects are on-going with the aim of fine molecular mapping of whole
organs with particular focus on the brain133
. This molecular characterization has
been long done by a combination of systematic sampling, called voxelation, and
gene expression analysis methods of lower cellular resolution than the most
recent technologies can offer134–140
. However, transcriptome-wide, single cell
and single nucleotide resolution will be required to investigate other
mechanisms like allele specific expression or RNA editing. With the constant
improvement of the protocols for sample preparation, it is plausible to envisage
that RNA-seq will be the method of choice for this type of analysis.
Spatial localization of single cell transcriptomes back in the organ of origin is
one of the few technical limitations left for tissue analysis by RNA-seq. RNA
tomography (Tomo-seq) partially circumvents this limitation by cryosectioning
18 µm thick sections of a whole zebrafish embryo and subjecting each one to
RNA-seq141
. Gene expression patterns are reconstructed along the three main
body axes but the method does not achieve cellular resolution. The improvement
of methods to isolate single cells from tissue sections could solve this problem:
hundreds to thousands of single cells can be isolated from normal tissues and
cell hierarchies can be reconstructed by different means of cluster analysis of the
transcriptome obtained from each cell by single-cell RNA-seq90,142,143
. In a very
recent work lead by Sten Linnarsson, high resolution topographical mapping of
genome-wide gene expression has been demonstrated for the first time. By using
laser-capture microdissection (LCM), a small area of a mouse brain section was
dissected in small “voxels”, 50x50x50 µm3 in size, and subjected to RNA-seq
144.
By this method, near single cellular resolution is obtained preserving spatial
information, but only a small region of the tissue was sampled. Extending this
method to the whole tissue, in tens of consecutive sections, is practically very
difficult and not cost-effective with the current technology. Once the sequencing
costs will decrease further, high resolution whole-transcriptome tissue analysis
will be limited only by the number of available reads per sequencing run. Future
sequencing technologies based on single molecule sequencing through
nanopores may be a solution to this issue145,146
. A new approach has been
proposed by Lundeberg and colleagues, in which a tissue section is deposited
onto a chip array containing high-density oligo(dT). The oligonucleotides are
organized in squares of about the size of a single cell and barcoded in order to
distinguish each individual square. After imaging the tissue, the cells are
permeabilized and the poly(A) fraction of the transcriptome is captured by the
barcoded oligonucleotides. After sequencing the captured transcripts, the spatial
location on the chip can be inferred by the barcodes and, because of limited
diffusion between cells, assigned to a specific cell of the tissue. This approach,
not yet published, has the potential to achieve genome-wide information
maintaining spatial and cellular resolution.
23
In situ sequencing
An alternative to the extraction of single cells from tissues is sequencing of
transcripts directly on tissue sections132,147
. The padlock probe assay is able to
sequence the minimum unit of a transcript, a single nucleotide. Expanding the
method to “read” a stretch of nucleotides requires two additional steps: first, a
longer target sequence needs to be clonally amplified via RCA and, second, a
sequencing chemistry able to sequence such material in situ. Solutions for both
of these steps already existed. The gap-fill reaction has been described several
times in literature as a method to capture and circularize DNA in solution-phase
assays129,148,149
. The same approach is used in situ although it shows lower
efficiency when performed directly in tissues. Sequencing of RCPs (also called
DNA nano-balls, DNAnb) by sequencing-by-ligation (SBL) chemistry has been
described by Drmanac et al. and is the sequencing technology used by the
private company Complete Genomics125
. DNA nanoballs technology is used to
sequence the entire human genome and comprises fragmentation and capture of
genomic DNA, clonal amplification via RCA and spotting of the RCPs on a
solid surface where the subsequent sequencing reactions and imaging are carried
out. In situ sequencing makes use of the same principles except for the fact that
the surface on which the RCPs are bound is a cell instead of an array (Figure 4).
The same unchain SBL chemistry can be used to sequence RCPs generated in
situ, with a minimal read length of six bases. However, there are three key
differences between the two sequencing methods: first, the size of the
sequencing substrate is <300 nm for DNAnb and about 1 µm for RCPs meaning
that, when imaging, more pixels are occupied by a single RCP; second,
Complete Genomics makes use of a proprietary technology (self-assembling
DNB Arrays) to prepare patterned arrays, spotting ultra-high density of single
DNAnb with 90% efficiency. Instead, because of the nature of the target, in situ
sequencing generates a sort of random arrays, which in fact limits the density of
the RCPs per area. For comparison, Complete Genomics’ first array reaches a
density of about 530.000 DNAnb/mm2 (1 billion DNAnb in 25x75 mm area)
while in situ sequencing estimates maximum 270.000 RCPs/mm2. The third
difference is the higher error rate of in situ sequencing (1.0% compared to
0.001% for Complete Genomics)125,132
. The read length of in situ sequencing, six
bases, is too short to discriminate among the whole set of genes but it offers the
possibility to read 4096 (46) barcodes. Gene-specific padlock probes can be
tagged with a molecular barcode increasing multiplexibility of the assay.
However, this is a targeted approach and given the current size of RCPs and the
imaging system used, only a density of about 100 RCPs per cell can be
achieved.
Some of these limitations have been addressed by a method described by George
Church and colleagues called Fluorescent In Situ SEQuencing (FISSEQ)147
. In
24
this approach, RCPs are generated in the cells after self-circularization of
randomly synthesized cDNA molecules. Sequencing is generated in an
untargeted fashion and sequenced by Sequencing by Oligonucleotide Ligation
and Detection (SOLiD) chemistry150
achieving 27 bases read length. This allows
mapping of most of the sequenced RCPs to a single transcript. The use of
confocal imaging and partition sequencing consents to obtain 4x higher read
density (about 400 RCPs/cell) although more than 50% mapping to rRNA.
Figure 4. In situ sequencing. a) In situ synthesis of cDNA is carried out using random or specific
primers followed by RNaseH treatment of the original mRNA molecule (b). Two different strategies
can be used. c) For sequencing of native mRNA molecules, a padlock gap probe is hybridized to the
cDNA and the gap region, in red, is filled by DNA polymerization. d) Alternatively, a multiplexed
padlock probe assay can be used instead, utilizing a classic padlock probe with a gene-specific
molecular barcode, in red, in the backbone. e) In situ RCA of the circularized padlock probes
generates a RCA product containing the clonally-amplified target region or the molecular barcode
which is subjected to several cycles of sequencing-by-ligation with an imaging step at every cycle (f)
and the images are analyzed for base-calling at every cycle (g). Five RCPs are represented in the
enlargements in (f) and the corresponding base-calling is showed in (g). Reprinted from ref. 132.
25
Application of in situ sequencing
in tumor biology Next generation sequencing technologies have been widely applied in tumor
biology for molecular characterization of many cancer tissues. At least two
observations seem to be widespread in nearly all cancer types. The first is the
presence of a large spectrum of different somatic mutations including single
nucleotide variations as well as structural variations151–154
. The second is that the
landscape of mutations differs among patients affected by the same tumor
(intertumor heterogeneity) and even within the same tumor tissue (intratumor
heterogeneity)34,153,155
. This heterogeneity is not confined to mutations but is also
observed at the level of expression of wild-type genes156
, a result of cellular
pathways being altered by the presence of mutations and conferring the
neoplastic phenotype to the tumor cells157
. These two observations have
extremely important implications for the study, diagnosis and treatment of
cancers.
Because cancer is a disease characterized by a high degree of heterogeneity in
which the spatial context and the distribution of this heterogeneity constitute
important information, in situ sequencing is a novel tool that can aid researchers
and clinicians in the molecular analysis of tumor samples and its application in
research and diagnostics will be discussed below.
In situ sequencing for research
Recent literature depicts tumor growth like a tree where a number of mutations
constitute a main dynamic population of neoplastic cells, the trunk, from which
a number of sub-populations branches out as they acquire new
mutations153,158,159
. This explains the origin of the observed heterogeneity and
may explain the failure of current pharmacological treatments. Multiregional
sampling and sequencing of different tumor areas is a successful approach to
study this occurrence6,160,161
. On a sample of similar size, the regional
distribution of selected biomarkers of interest can be visualized and quantified in
situ by the multiplex padlock probe assay with cellular resolution and with no
need for sub-sampling to enrich for tumor content, which carries the risk of
introducing biases by random selection of mutant clones (Figure 5). Beside the
neoplastic cells, the role of tumor microenvironment and of the stromal tissue
surrounding the tumor is increasingly a subject of research because of their
potential contribution to the tumor development162
. Different tissue components
(cancer associated fibroblasts, immune cells, the vasculature and others) can be
visualized in situ by detection of tissue-specific transcripts together with the
26
molecular composition of local tumor clones. Specific interactions could be
found by analyzing a large collection of samples which is technically feasible by
in situ sequencing. Another fascinating application is the ability to obtain
specific molecular expression or mutational profiles from circulating tumor cells
(CTCs), cell-free circulating DNA or clones from metastatic tumor sites and
trace them back to the primary tumor site to identify potential markers
conferring resistance to therapy or aggressive phenotype to the cancer163–165
.
These expression profiles may be confined to a small fraction of tumor cells
present in the primary site and may be missed by bulk RNA sequencing. In situ
sequencing data may instead be able to resolve and identify specific patterns
even if these are localized in one or more regions of the tissue sample.
In all these potential applications, padlock probe technology and in situ
sequencing are used to detect or profile known markers, identified before by
other methods. In a reverse approach instead, in situ sequencing can be used to
define sub-clonal populations of cells that, on consecutive tissue material, can be
selected by e.g. laser capture microdissection for molecular characterization at
deeper resolution.
27
Figure 5. Spatially-resolved gene expression analysis by in situ sequencing. Morphological staining of
a tissue sample is combined with molecular staining obtained with multiple gene-specific padlock
probes. Single or multiple markers can be used to define specific tissue regions, e.g. tumor and stroma
compartments. In addition, gene expression patterns of specific regions can be acquired and linked to
the morphology of the tissue.
28
Padlock probe assay in the clinic
Molecular biology is increasingly entering the clinical laboratory and molecular
pathology is setting the stage for the future of personalized clinical management
of cancer patients166
. The more molecular markers will be validated and enter
the clinic, the more there will be a need for methods that are able to analyze
samples at the desired level of resolution, including spatial resolution. These
methods should meet the technical standards for diagnostics and have the least
possible impact on the existing laboratories in terms of infrastructure and
management.
Bright-field microscopy still plays a central role in the diagnostic routine.
Visualizing the mutational status of the sample, as an extra level of information,
is convenient and presents several advantages over other methods. Padlock
probe technology achieves single nucleotide resolution and is the only in situ
technique able to robustly and specifically detect point mutations. Also, it can
quantify gene expression and over-expression of tumor markers167
providing
spatially-resolved molecular profiles directly overlaid to the tumor tissue
histology (Figure 5). Importantly, padlock probes can be applied to the most
commonly used tissue sample preparations like formalin-fixed paraffin
embedded (FFPE) or fresh-frozen tissue sections, tissue microarrays (TMAs)
and tumor touch imprints168
. Finally, the assay has a relatively short turnaround
time and, for mutation detection, requires relatively simple data interpretation.
Other in vitro assays have, in principle, similar characteristics. So, what is the
potential advantage of using an in situ analysis?
Mutational analysis of the kirsten rat sarcoma viral oncogene homolog (KRAS)
gene in colorectal cancer (CRC) patients is recommended before anti-epidermal
growth factor receptor (anti-EGFR) because treatment is ineffective in the
presence of mutant forms of this gene169,170
. The most widely used sample, in
CRC diagnostics, is FFPE tissue and a number of different assays are used for
the analysis of DNA extracted from the tissue. The sensitivity of these assays
has been evaluated between 1 and 20% of the tumor content (homozygous tumor
cells/wild-type)170
. The performance of these assays is highly affected by the
tumor content of the samples, which therefore has to be subjected to
histopathological examination and manually enrichment by microdissection in
case of low tumor content (usually when below 10% of tumor cells)169
. This
adds an extra step with associated additional risks for errors and costs. The
padlock probe assay has been shown to have comparable sensitivity, down to
1% mutant cells and high specificity although this has to be further evaluated in
a clinical trial168
. The advantage of such assay is that no extra sample is required
because histopathological and molecular assessment can be done on the same
sample. This also means that spatial information is preserved and can be directly
29
related to the tissue morphology identifying for example different clones.
Conversely to in vitro measurements, the relative expression of different tumor
regions can be quantified and in the future can provide potential clinical
information.
Recently, a number of multigene expression profile tests that score the risk of
tumor relapse and prognosis have entered the clinical practice166,171,172
. These
tests are bulk analyses performed on the RNA extracted from tumor samples. By
in situ sequencing, the padlock probe assay can be multiplexed to a desired
number of genes, including point mutations and other structural variations, to
visualize expression profiles directly on the tissue samples. A digitalized and
quantifiable measurement of the molecular state of the tissue should result in a
better classification and grading of the tumor, supporting or replacing the
traditional morphological examination. For instance, the Gleason score for
grading prostate cancer, a particularly heterogeneous type of cancer, is based on
the histological evaluation of a number, usually 12, of needle biopsies173
. Many
lesions are often present in different biopsies and with different levels of
differentiation (scored 1-5)174
. The grade is assigned based on the areas that
make up most of the cancer. Two limitations of this approach that have been
amply discussed are the subjective interpretation by the pathologist and the fact
that regions with higher grade (poorly differentiated) but under-represented are
not taken into account175
. Molecular staining of the sample by multiple padlock
probes can, in principle, identify and quantify more accurately the composition
of the tumor samples, returning an objective score based upon digital
information. This approach is not limited to prostate but can be applied to all
tumors for which an association between known markers and the cancer
phenotype exists. For instance, the molecular classification of breast cancers is
based on the expression level of estrogen receptor (ER), progesterone receptor
(PR) and human epidermal growth factor receptor 2 (HER2/neu or HER2) and
other markers (like the marker of proliferation Ki67, cytokeratin CK5/6 and
epidermal growth factor receptor EGFR)166,176,177
. This molecular classification,
together with other factors, is used to decide on the clinical management of
breast cancer patients. All these and other markers can be implemented in a
multiplex padlock probe-based assay, performed on the same histological
sample and, in some years, in automatic fashion. Other tumors are already
screened for different mutations or molecular signatures of clinical utility and
their number is likely to increase; therefore, methods that can quickly implement
their use in the clinic, as for example the padlock probe assay, are a precious
resource.
30
Present investigations
In situ mutation detection and visualization of
intratumor heterogeneity for cancer research and diagnostics.
In this work we applied padlock probes and RCA for in situ detection of
clinically relevant point mutations in solid tumors.
Before starting EGFR antibody therapy, patients with CRC are tested for
activating mutations in KRAS, which make the therapy ineffective169
. Seven of
these point mutations are located in codon 12 and 13 representing almost 95% of
all mutations found in KRAS in these patients178
. The same mutations are also
found in lung adenocarcinoma, associated with poor prognosis, while some
EGFR mutations in presence of wild-type KRAS indicate that the tumor is more
sensitive to EGFR inhibitors179
. Although many molecular tests to detect these
variants already exist, new methods able to additionally capture the distribution
of these mutations in different regions of the tissue are needed.
We designed padlock probes to target a total of 14 different point mutations and
the corresponding wild-type alleles in KRAS, EGFR and tumor protein 53
(TP53) genes in colorectal and lung cancers. In this assay, gene specific primers
are used to synthetize cDNA to which complementary padlock probes hybridize
and are ligated. Circularized padlock probes are amplified by in situ RCA and
detected by hybridization of fluorescent-labelled complementary
oligonucleotides. Up to four colors can be easily distinguished. In total, 84
different samples were analyzed, using different combinations of padlock probes
to target different mutations. Of the 84 tumors, 44 represented prospective
samples of unknown molecular KRAS status and were subjected to multiplex
analysis of the seven most frequent KRAS mutations. For all samples tested by
the padlock in situ assay, DNA was extracted from the same tissues and
analyzed by pyrosequencing, one of the standard methods used in clinical
laboratories to detect the presence of KRAS mutations. For all samples tested,
the result of the in situ padlock assay was confirmed by DNA pyrosequencing
demonstrating 100% concordance between these two methods. However, more
information can be obtained by the in situ assay. For example, on the same
tissue section we were able to visualize heterogeneity of KRAS mutant
expression, which related to tumor progression and histological heterogeneity.
The same heterogeneity of expression was observed for an EGFR mutation that
differed in distinctive histological regions. By targeting several mutations in
situ, it was possible to distinguish different patterns of mutations in different
tissue compartments in lung cancer cases as well as loss-of-heterozygosity
31
(LOH). All this information is of potential clinical relevance, for instance for
monitoring tumor progression or prediction of response to therapy, but is lost by
methods that rely on bulk analysis of DNA extracted from the tumor tissue
without retaining spatial context.
To demonstrate the applicability of the method in a clinical setting, the padlock
probe assay was performed in a variety of different sample types, including
FFPE material, which is the most commonly used specimen in diagnostics and
tissue microarrays (TMAs), useful for high-throughput screening of point
mutations. Many of the samples included in the study presented a tumor cell
fraction lower than 10% and all were correctly genotyped with no need for
enrichment of the tumor fraction. Finally, by using cytological preparations
made by touch imprints from the fresh tumor, the KRAS mutation status was
obtained on the day of sample arrival. In conclusion the result of this work
shows the potential and applicability of the in situ padlock probe and RCA assay
for research and diagnostics.
In situ sequencing for RNA analysis in preserved tissue
and cells
In this work we demonstrated, for the first time in situ sequencing of single
mRNA molecules, providing a highly multiplex read-out to the padlock probe
assay.
The light spectrum resolved by an epi-fluorescent microscope limits the padlock
probe technology to detect only four targets at a time. Although this level of
multiplexibility is suitable for some applications, for example detection of
specific point mutations, a higher level of multiplexibility is needed to resolve
more complex patterns of gene expression or to allow for screening of many
markers simultaneously. Next generation sequencing or high-multiplex PCR can
be used for genome-wide gene expression analysis with cellular resolution.
Laser-assisted cell extraction from tissue section preserves the geographical
information but is practically limited in throughput to hundreds of cells and may
not represent differential gene expression in distinct regions of the tissue. Cells
isolated by enzymatic dissociation and FACS analysis, instead, lack the spatial
information. Sequencing RCPs generated in the tissue context allows multiplex
detection of mRNA molecules for direct gene expression analysis preserving
morphological information of the sample.
We used SBL chemistry125
to sequence the RCPs generated by clonal
amplification via RCA of padlock probes, in breast cancer fresh-frozen tissue
sections. First, we demonstrated sequencing of native transcripts by using
padlock gap probes and gap-fill polymerization to capture a short fraction, (four
32
nucleotides) of the retro-transcribed mRNA. After circularization and
amplification of the padlock gap probe, the targeted four nucleotide region was
sequenced by repeated cycles of SBL and imaging. In this way we were able to
distinguish HER2 and ACTB expression on a breast cancer section, and showed
that HER2 expression was confined to the tumor region. Because a four-
nucleotides read length is too short to uniquely distinguish a high number of
genes, we used gene-specific molecular barcodes inserted in the padlock probe
sequence to distinguish up to 256 (44) genes. We designed padlock probes to
target 39 transcripts in one ER-negative and two ER-positive breast cancer tissue
sections, including a panel of 21 genes used to predict distant tumor recurrence
in breast cancer patients180
. After random cDNA synthesis, we applied the
padlock probes and sequenced the RCPs, validating expression data from 31
genes. The data from the three samples corresponded to the known ER status.
We used HER2 and vimentin (VIM) expression as a marker to differentiate
between tumor and stromal compartments and used hematoxylin and eosin
(H&E) staining of the same section to confirm the histological localization.
When comparing the expression of genes detected in the HER2 or VIM
compartments from the ER-negative sample with published RNA-seq data, the
highest correlation was found with an ER-negative breast-cancer derived cell
line and normal breast tissue respectively. Later comparison with RNA-seq data
obtained from the same ER-negative tumor sample showed even stronger
correlation (unpublished data). Overall, this work demonstrates the feasibility of
using next generation sequencing to visualize gene expression profiles directly
in cells and tissue sections.
In situ sequencing identifies TMPRSS2-ERG fusion transcripts, somatic point mutations and gene expression levels in prostate cancers.
In this work we demonstrated in situ sequencing for simultaneous detection of
point mutations, gene expression and gene fusions and for visualization of intra
and inter-tumor heterogeneity in prostate cancer (PCa).
Prostate cancers are highly heterogeneous tumors, very often characterized by
multifocality which refers to the presence of two tumors separated by normal
tissue181,182
. The Gleason grading system assigns an overall score to the prostate
tumor based on morphological examination. Because of heterogeneity between
different tumor foci, the score is made by taking into account the two most
represented grades. Thus, the score of individual tumors does not always
represent the overall score181
. Different mutations are frequently found in PCa.
Fusion of androgen-regulated transmembrane protease serine 2 (TMPRSS2) with
the ETS-related gene (ERG) transcription factor is found in about 50% of
PCa183
. Amplification of androgen receptor (AR) gene is present in about 30% of
33
the cases184
, while a point mutation is present in TP53 in 10-35% of the cases185
.
Visualization and measurement of heterogeneity at molecular level may improve
grading of PCa and development of therapeutic strategies.
We designed barcoded padlock probes to target the five most frequent
TMPRSS2-ERG fusion variants described in literature: T1/E5, T1/E4, T1/E5,
T2/E4 and T2/E5 (numbers indicate in what exon of the two genes the fusion
junction is located). Furthermore, we designed barcoded padlock probes to
detect AR, a specific point mutation in TP53 gene and alpha-methylacyl CoA
racemase (AMACR), also found over-expressed in PCa186
. The pool of probes
and the sequencing of barcodes were tested on VCaP cells carrying T1/E4 fusion
transcripts, AR amplification and a specific TP53 point mutation and LNCaP
cells as negative control for these mutations. We specifically identified the
mutations and increased expression levels of AR in VCaP cells compared to
LNCaP cells. We then tested padlock probes for detection of TMPRSS2-ERG
fusion transcripts in 13 FFPE tumor tissues that had been previously evaluated
by immunohistochemistry staining. We scored seven cases positive for fusion
transcripts and six as negative with 100% concordance between padlock probes
and antibody-based detection of ERG overexpression. By sequencing the
molecular barcodes the identity of the fusion variants could be revealed showing
heterogeneity among different tissue regions. We therefore sequenced fusion
variant-specific padlock probes in fresh-frozen tissue sections from four of the
cases being positive for TMPRSS2-ERG. Two samples showed a similar pattern
of fusion variants (~90% T1/E4 and ~10% T1/E5 of the total reads)
homogeneously distributing over the tissue. The other two samples showed
different fusion variant compositions: in one PCa T1/E4 was present at 92% of
the total reads and T2/E4 representing 4% of the reads but only found in one
fifth of the imaged area. In the last sample, T2/E4 was the most prevalent variant
(76% of the total reads), while 13% of the reads resulted T1/E4 and were found
only in a third of the imaged area. Despite the small number of sample analyzed,
we demonstrated that padlock probes and in situ sequencing can be used to
visualize and quantify intra and inter-tumor heterogeneity in prostate cancer.
Oligonucleotide gap-fill ligation for mutation detection
and sequencing in situ.
In this work we demonstrated the use of oligonucleotide gap-fill ligation as a
mechanism for mutation detection and as an alternative to gap-fill
polymerization for targeted in situ sequencing.
Padlock gap probes differ from classic padlock probes in the fact that upon
hybridization to a template, a gap is left between the ends of the two target-
complementary arms. This gap can be either filled by polymerization or by
34
ligation of an oligonucleotide matching the gap in size and sequence
complementarity. The padlock gap probe can then be circularized by ligation
and amplified via RCA. When gap-fill ligation is used to circularize the padlock
probe, two separate ligation events need to occur. We investigated the
specificity and efficiency of this mechanism when a pool of oligonucleotides,
called gap probes, was used to circularize padlock gap probes and demonstrated
the feasibility to reveal the identity of the incorporated oligonucleotide by
sequencing it in situ.
To test this approach, we chose a six nucleotide-long region on the mtDNA
containing a point mutation associated with mitochondrial encephalomyopathy,
lactic acidosis and stroke-like episodes syndrome (MELAS)187
. We designed a
fully complementary gap probe and other gap probes having a single nucleotide
mismatch at different positions. When seven gap probes were combined, only
the perfect match being 5’-phosphorylated (a requirement for the DNA ligation
to occur) we were able to distinguish between a wild-type cell line and a mutant
cell line by presence or absence of signals. A parallel reaction where instead the
MELAS specific gap probe was 5’-phosphorylated, showed presence of signal
in the mutant cell line but not in the wild-type. We estimated this reaction to be
about 20% less efficient than a single DNA ligation of a padlock probe. We also
tested the method to genotype mRNA molecules and reveal the identity of the
incorporated gap probe by in situ sequencing. By using two allele-specific gap
probes, both 5’-phosphorylated, we were able to distinguish between human and
mouse fibroblasts and to assess the incorporation of the correct gap probes by
sequencing a targeted SNP on the ACTB transcripts of the two cell lines. This
method has a simpler and cheaper design compared to the use of multiple
padlock probes for detection of several mutations in a short sequence region.
Also, the double ligation provides high specificity to the reaction although the
overall efficiency is lower. If necessary, the identity of the gap probe
incorporated can be revealed by sequencing.
35
Concluding remarks and future
perspectives
In situ sequencing, a padlock probe-based technology described in this thesis,
has enormous potential as the method of choice for tissue analysis in research
and in clinical settings. This molecular tool can detect clinically relevant point
mutations as well as other structural genetic aberrations and resolve gene
expression patterns directly in the context of cells and tissues. This technology
can be used to realize the so called “digital pathology” where optical images are
transformed into digital signals and these in turn are translated into information
of various types188,189
. In the near future, tissue samples will be automatically
processed for morphological and molecular staining, and images of the whole
slide will be acquired and automatically analyzed through different image
analysis software. To that end, at least three major improvements of in situ
sequencing are required.
First, the procedure has to be automated to increase speed, throughput and
reproducibility. Implementation of the protocol into an existing NGS machine
may enable a higher level of automation. The fluorescent read-out and the
sequencing chemistry used are, in principle, compatible with some of the
existing NGS systems. Automation will be necessary to scale up the number of
samples analyzed and to test the robustness of the method in a large study
cohort. Fluidic systems that necessitate lower volume of reagents will likely
increase the efficiency of the molecular reactions and at the same time, decrease
time and cost of the assay.
Second, the sensitivity of in situ sequencing needs to be increased to extend the
analysis to low abundant genes or the detection of rare cells, which would be of
great importance for clinical applications. This can be achieved in different
ways. One way is to increase the sensitivity of the detection. In situ padlock
probe assays already exploit the RCA method, but further signal amplification
can be achieved in various ways190
. To increase the signal-to-noise ratio,
however, it may be necessary to decrease the fluorescent background of the
specimen. Several protocols have been described for decreasing sample
autofluorescence and background, including the most recent tissue clarification
techniques191,192
. Some of these protocols may be compatible with the padlock
probe assay. Alternatively, one can increase the analytical sensitivity of the
method by amplification of the target material. A method based on recombinases
which allows exponential isothermal amplification of DNA molecules193
has
been tested in situ in combination with padlock probe detection (unpublished
data). Diffusion of the amplified target between cells thereby altering the spatial
36
definition of expression profiles may be problematic; however, spatial resolution
may be of less importance in case of mutation detection where presence or
absence of a clinical mutation is the primary question.
Finally, an extensive use of this method will generate large amounts of data.
With the imaging setting used in this work, imaging 50 mm2 of tissue at 20x
magnification generates about 10 gigabytes of data for one single cycle. Imaging
many samples for four rounds of sequencing cycles will produce terabytes of
data. High computational power will be needed to store and analyze such data.
Solutions developed to handle data produced by NGS machines can potentially
be used for in situ sequencing as well. To fully integrate in situ sequencing in
digital pathology, however, the method requires the integration of software and
algorithms able to analyze digital information and transform them into
biological information. These may be easy to implement for some applications
like detecting point mutations and scoring of tissues based on the expression of a
specific mutation. More difficult tasks will be the recognition of a specific
pattern based on an a priori hypothesis, finding new patterns from the molecular
profile and relating the molecular information to the morphological structures.
Some work in this direction has already been done, mostly for H&E stained
samples194–196
. Integrating or developing new software for molecular analysis is
the biggest challenge that in situ sequencing method will face.
37
Populärvetenskaplig
sammanfattning på svenska
Ett sätt att försöka fastställa vad en vävnad gör och hur den mår är att mäta
aktiviteten av gener hos cellerna som utgör vävnadens byggstenar. Alla celler i
vävnaden har samma uppsättning gener, men skälet till att olika celltyper ser
olika ut och har olika funktion i vävnaden beror på vilka gener som är aktiva i
varje enskild cell. Genaktiviteten kan uppskattas genom att mäta mängden
mRNA som produceras från varje enskild aktiv gen. Traditionellt har man gjort
denna analys genom att extrahera RNA från vävnaden och sedan mäta hur
mycket det finns av varje mRNA. Problemet med den typen av analys är att
vävnader består av en mängd olika celltyper och dessa kan befinna sig i olika
tillstånd på grund av mikromiljön i olika delar av vävnaden. Informationen om
denna vävnadsheterogenitet går förlorad då mätvärdena blandas ihop till ett
genomsnitt för hela den vävnadsdel som analyserats. Ny teknologi möjliggör
analys av det totala mRNA innehållet (transkriptomet) i enskilda celler som
isolerats från vävnader, men informationen om den ursprungliga placeringen i
vävnaden går förlorad, och det är heller inte möjligt att använda denna teknologi
för att analysera alla celler i en vävnad. Alltså behövs nya metoder som tillför en
spatiell dimension till data för att kunna koppla ihop informationen om mRNA-
uttryck i enskilda celler med vävnadstruktur och morfologi.
I denna avhandling presenterar jag utveckling och tillämpning av en klass av
molekylära redskap som kallas padlock prober (molekylära hänglås) och
rullande cirkel amplifiering (RCA) för spatiellt upplöst transkriptomanalys.
Padlock prober tillåter in situ (på plats i celler) detektion av enskilda mRNA
molekyler med en upplösning som är tillräcklig för att särskilja enskilda bas-
skillnader mellan mRNA sekvenser, och på så sätt visualisera molekylär
information i vävnaden. Till exempel visas i min avhandling hur kliniskt
relevanta cancerframkallande mutationer är utbredda i snitt från tumörer, och
visualiserar därmed den heterogenitet som finns inom tumörer med avseende på
dessa mutationer. För att spatiellt kunna kartlägga mer komplexa
genuttrycksmönster i vävnader, har vi utvecklat en teknik som tillåter
sekvensering av RCA producter in situ. Vi kombinerade då vår padlock prob
teknik med en sekvenseringskemi som används inom massiv parallel
sekvensering (next generation sequencing). Vi visade för första gången att det
går att sekvensera korta bitar av mRNA direkt i fixerade celler och vävnader
med metoden. Genom att använda in situ sekvensering går det att läsa av mer
komplexa pooler av padlock prober som samtidigt detekterar och identifierar
många olika mRNA molekyler i samma cell. Vi använde denna strategi för att
38
mäta aktiviteten av tiotals gener i hundratusentals celler, i analyser som
inkluderar punktmutationer, fusionstranskript samt uttrycksnivå.
De presenterade molekylära verktygen komplementerar vanlig
transkriptomanalys genom att tillföra en spatiell dimension till den molekylära
informationen. Den nivå av upplösning som de nya metoderna ger är viktig för
att fullt ut förstå många biologiska processer och kan komma att bli viktig för att
välja rätt behandling till cancerpatienter.
39
Acknowledgments
First of all, I would like to thank all of you who have read my thesis. Many
people contributed, in a way or another, to this work and I would like to thank
all of them, hoping that if I forget someone he/she will forgive me.
I’d like to start with my supervisor, Mats. It’s not easy to express in few
sentences all my gratitude for everything you did for me. Thanks for never
giving up in front of my endless pessimism, for teaching me that there are other
important things in life beside lab work and for showing me that it is possible to
face everyday problems with a smile and calm! I really had a lot of fun working
with you!
I also would like to thank my opponent, Prof. Sunney Xie and the members of
my PhD committee, Prof. Kamila Czene, Prof. Joakim Lundeberg and Prof.
Stefan Nordlund for reading my thesis and finding the time for being here
today to discuss with me.
Half of this work has been done at Uppsala University and I would like to thank
Ulf Landegren for the beautiful years I have spent sharing the lab with his
group and, with Ola and Masood, for creating an exciting scientific
environment. Thank you all!
“Seeing is believing”: a special thanks goes to Carolina Wählby and to her
group in Uppsala, particularly to Alexandra, for making us believe that in situ
sequencing was really working and for being the best collaborators one could
wish to work with!
I want to thank my other collaborators especially for the patience they showed
working with a moody person like me! Rongqin, thanks for the long time spent
together trying to make in situ seq working, for the many talks and all the times
we shared room at retreats and conferences! Now that you are finally with your
family, I wish you to realize all your other dreams! Chatarina, thanks a lot for
introducing me to the lab and for teaching me padlocks! Sara K, thanks for the
patience showed working with me on the prostate paper and for all the times that
you cleaned my messy bench! Ida, thanks for the many nice meetings and
discussions in the lab and outside the lab.
I want to thank the wonderful people that surrounded me every day in the lab
making it a fun place to work in. Anja, how could I have done without you? I
will never thank you enough for everything you did, for all the help at work and
outside work, for the morning therapeutic sessions and for making much nicer
the gray mornings on the train. Thanks for sharing the “up and down” moments
40
of the PhD, the sugar crisis and for the many laughs…thanks because you made
everything smooth for me! Ivan, thanks for being a good friend and for bringing
me out to concerts and beers from time to time. And also thanks for
demonstrating me that time is truly relative, particularly for Colombians!
Thanks Annika for always taking care of every person in the lab and always
being supportive. Elin L, thanks for taking great care of Philippa and, with
Annika, for being such a nice desk mate! Tom, thanks for the patience
demonstrated having me as supervisor and for being a great colleague
afterwards. And thanks for all the time spent designing the lab website! Di, it’s
great to have you in the group, thanks for all the good discussions, the help in
the lab and for sharing your bright ideas. Malte, thanks for always promoting
social after-work activities and with Thomas for taking care of the beer supplies
in the lab. Thomas, thanks for sharing your curiosity and experience on the
most exotic culinary places in Stockholm. Jessica, thanks for sharing the lab
bench and never get upset for the mess that I always leave! Thanks Pavan for
always having nice words for me and for the great Indian food we had at your
place. Xiaoyan, thanks for the chats around a beer in the lab and for all the time
spent to help me out with image analysis. Amel, thanks for wishing me good
morning, every morning, despite my usual bad morning mood! Thanks Tagrid
for your smile, it cheers me up everytime! Chenling, thanks for your enthusiasm
on the new projects and for the tea you brought to me. Thanks to Mustapha and
all the other students for making the lab such a nice environment!
Anna, thanks for the glasses of wine while discussing about work and life, for
your happy laugh and all the fun we had in the lab and, with Jonas, for giving
me the giant palm tree!
And a big thank to all of you guys for filling up my small apartment with
balloons and for giving me Philippa!
On the Uppsala side, thanks Elin F, Lotte and Camilla for sharing the office with
me: Elin, I miss the jumping cows on your screen! Lotte, thanks for the nice
discussions during late lunches in Rudbeck and Camilla, thanks a lot for all
your help with the confocal during teaching! Thanks Lucy for the nice chats in
Rudbeck, David for the many stories about travels and different cultures and
Erik, thanks for your “hidden” work and your stimulating questions about
research.
Thanks to all the friends which I have met here in Stockholm for the nice time
spent together: Tim, Anatoly and the people from Achour group that shared the
office space with me. Thanks to Lara, Cecilia and Rnjana for the great party
nights and the good time spent outside the lab. Thanks to the Italian crew in
41
SciLife: Walter, Claudia and particularly Davide and Elena for the long coffee
breaks, the fun and the nice moments down in the basement! ; )
I had a great time when I was in Uppsala and I would like to thank all the people
I have met and shared the lab with: Christina and Elin E, thanks for taking
good care of the lab; Juhnong, for always updating me on the most recent
gossips! Lei for the many good discussions, Carl-Magnus for sharing the office
with me when I first came and for introducing me to the Swedish culture; Tonge
for your beautiful and contagious laugh and your crazy spirit; Gaelle for the nice
evenings spent outside in Uppsala and Stockholm; Caroline, thanks a lot for all
your help with the thesis! And thanks Liza, Johan O, Karin, Linda, Agata,
Andries, Maria, Abdullah, Johakim, Johan V, Irene and Pier for all the good
time I have spent with you in the lab.
Thanks to Laura, Chiara and Marco for sharing frustrations and complaints
about the weather, food and everything that is different from home! Mr Jeffrey
and Mr Tjerk thanks for the great Valborg we had and for all the crazy nights
spent together at the Nations! Ammar, thanks for being a good friend and with
Hanan thank for your great hospitality and all the chats; Jonathan, buddy,
thanks a lot for being always around, for the many evenings spent together with
Spyros, for all the good time we had and for the tough moments we have been
through together!
I had some tough moments during these years and I want to say thanks to
Alessio for always being there to help and listen and for attending to a
conference in Umeå just to have an excuse to come and visit me! And thanks to
all my good friends in Italy that always supported me!
Two persons have been particularly important for me in these many years.
Spyros, thanks for all the fun, the trips, the long talks and many, many laughs
and from time to time for reminding me about the superiority of the Greek
culture (particularly at complaining!). But, above everything, thanks for always
telling me what you think and not just what I’d like to hear.
Rachel, thanks for every moment we have spent together…you have been my
most demanding reviewer, my most passionate supporter, my best friend.
Thanks, because you can always make me smile! : )
Finally, I want to thank my beloved family that, even if far away, I have always
felt here with me. Grazie Luca, Simona e Alice per essere qui con me oggi e per
tutte le feste fatte ogni volta che torno a casa! Grazie dada per tutti i momenti in
cui mi sei stata vicina in questi anni. E grazie mamma e papa’ per tutto il
vostro affetto e supporto. Spero che oggi tutti voi possiate essere orgogliosi di
me.
42
References 1. Schwartz, S. M. The definition of cell type. Circ. Res. 84, 1234–1235 (1999).
2. Huang, S. Non-genetic heterogeneity of cells in development: more than just noise. Dev. Camb.
Engl. 136, 3853–3862 (2009).
3. McClellan, J. & King, M.-C. Genetic heterogeneity in human disease. Cell 141, 210–217 (2010).
4. Cai, X. et al. Single-cell, genome-wide sequencing identifies clonal somatic copy-number
variation in the human brain. Cell Rep. 8, 1280–1289 (2014).
5. Kim, J. et al. Somatic deletions implicated in functional diversity of brain cells of individuals
with schizophrenia and unaffected controls. Sci. Rep. 4, 3807 (2014).
6. Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).
7. Burrell, R. A., McGranahan, N., Bartek, J. & Swanton, C. The causes and consequences of
genetic heterogeneity in cancer evolution. Nature 501, 338–345 (2013).
8. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).
9. Piotrowski, A. et al. Somatic mosaicism for copy number variation in differentiated human
tissues. Hum. Mutat. 29, 1118–1124 (2008).
10. Mkrtchyan, H. et al. The human genome puzzle - the role of copy number variation in somatic
mosaicism. Curr. Genomics 11, 426–431 (2010).
11. O’Huallachain, M., Karczewski, K. J., Weissman, S. M., Urban, A. E. & Snyder, M. P. Extensive
genetic variation in somatic human tissues. Proc. Natl. Acad. Sci. U. S. A. 109, 18018–18023
(2012).
12. Zong, C., Lu, S., Chapman, A. R. & Xie, X. S. Genome-wide detection of single-nucleotide and
copy-number variations of a single human cell. Science 338, 1622–1626 (2012).
13. Quinlan, A. R. & Hall, I. M. Characterizing complex structural variation in germline and somatic
genomes. Trends Genet. 28, 43–53 (2012).
14. Sproul, D., Gilbert, N. & Bickmore, W. A. The role of chromatin structure in regulating the
expression of clustered genes. Nat. Rev. Genet. 6, 775–781 (2005).
15. Hayashi, K., Lopes, S. M. C. de S., Tang, F. & Surani, M. A. Dynamic equilibrium and
heterogeneity of mouse pluripotent stem cells with distinct functional and epigenetic states. Cell
Stem Cell 3, 391–401 (2008).
16. Heinz, S. et al. Effect of natural genetic variation on enhancer selection and function. Nature 503,
487–492 (2013).
17. Zopf, C. J., Quinn, K., Zeidman, J. & Maheshri, N. Cell-cycle dependence of transcription
dominates noise in gene expression. PLoS Comput. Biol. 9, e1003161 (2013).
18. Sasagawa, Y. et al. Quartz-Seq: a highly reproducible and sensitive single-cell RNA sequencing
method, reveals non-genetic gene-expression heterogeneity. Genome Biol. 14, R31 (2013).
19. Raj, A. & van Oudenaarden, A. Nature, nurture, or chance: stochastic gene expression and its
consequences. Cell 135, 216–226 (2008).
20. Li, G.-W. & Xie, X. S. Central dogma at the single-molecule level in living cells. Nature 475,
308–315 (2011).
21. Choi, P. J., Xie, X. S. & Shakhnovich, E. I. Stochastic switching in gene networks can occur by a
single-molecule event or many molecular steps. J. Mol. Biol. 396, 230–244 (2010).
22. Oltvai, Z. N. & Barabási, A.-L. Systems biology. Life’s complexity pyramid. Science 298, 763–
764 (2002).
23. Luscombe, N. M. et al. Genomic analysis of regulatory network dynamics reveals large
topological changes. Nature 431, 308–312 (2004).
24. Yu, P. et al. Spatiotemporal clustering of the epigenome reveals rules of dynamic gene regulation.
Genome Res. 23, 352–364 (2013).
25. Biressi, S., Molinaro, M. & Cossu, G. Cellular heterogeneity during vertebrate skeletal muscle
development. Dev. Biol. 308, 281–293 (2007).
26. Rusnakova, V. et al. Heterogeneity of astrocytes: from development to injury - single cell gene
expression. PloS One 8, e69734 (2013).
43
27. Gordon, S. & Plűddemann, A. Tissue macrophage heterogeneity: issues and prospects. Semin.
Immunopathol. 35, 533–540 (2013).
28. Nolan, D. J. et al. Molecular signatures of tissue-specific microvascular endothelial cell
heterogeneity in organ maintenance and regeneration. Dev. Cell 26, 204–219 (2013).
29. Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis
project. Nat. Genet. 45, 1113–1120 (2013).
30. Michor, F. & Weaver, V. M. Understanding tissue context influences on intratumour
heterogeneity. Nat. Cell Biol. 16, 301–302 (2014).
31. Gottesman, M. M. Mechanisms of cancer drug resistance. Annu. Rev. Med. 53, 615–627 (2002).
32. Kim, J. J. & Tannock, I. F. Repopulation of cancer cells during therapy: an important cause of
treatment failure. Nat. Rev. Cancer 5, 516–525 (2005).
33. Fisher, R., Pusztai, L. & Swanton, C. Cancer heterogeneity: implications for targeted
therapeutics. Br. J. Cancer 108, 479–485 (2013).
34. Bedard, P. L., Hansen, A. R., Ratain, M. J. & Siu, L. L. Tumour heterogeneity in the clinic.
Nature 501, 355–364 (2013).
35. Shah, S. P. et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide
resolution. Nature 461, 809–813 (2009).
36. Walter, M. J. et al. Clonal architecture of secondary acute myeloid leukemia. N. Engl. J. Med.
366, 1090–1098 (2012).
37. La Thangue, N. B. & Kerr, D. J. Predictive biomarkers: a paradigm shift towards personalized
cancer medicine. Nat. Rev. Clin. Oncol. 8, 587–596 (2011).
38. Schink, J. C. et al. Biomarker testing for breast, lung, and gastroesophageal cancers at NCI
designated cancer centers. J. Natl. Cancer Inst. 106, dju256 (2014).
39. Rosenblum, D. & Peer, D. Omics-based nanomedicine: the future of personalized oncology.
Cancer Lett. 352, 126–136 (2014).
40. Huang, M., Shen, A., Ding, J. & Geng, M. Molecularly targeted cancer therapy: some lessons
from the past decade. Trends Pharmacol. Sci. 35, 41–50 (2014).
41. Toriello, N. M. et al. Integrated microfluidic bioprocessor for single-cell gene expression
analysis. Proc. Natl. Acad. Sci. U. S. A. 105, 20173–20178 (2008).
42. Bayraktar, O. A., Fuentealba, L. C., Alvarez-Buylla, A. & Rowitch, D. H. Astrocyte
Development and Heterogeneity. Cold Spring Harb. Perspect. Biol. (2014).
doi:10.1101/cshperspect.a020362
43. Jennings, L., Van Deerlin, V. M., Gulley, M. L. & College of American Pathologists Molecular
Pathology Resource Committee. Recommended principles and practices for validating clinical
molecular pathology tests. Arch. Pathol. Lab. Med. 133, 743–755 (2009).
44. Bonner, R. F. et al. Laser capture microdissection: molecular analysis of tissue. Science 278,
1481–1483 (1997).
45. Kurimoto, K., Yabuta, Y., Ohinata, Y. & Saitou, M. Global single-cell cDNA amplification to
provide a template for representative high-density oligonucleotide microarray analysis. Nat.
Protoc. 2, 739–752 (2007).
46. Suarez-Quian, C. A. et al. Laser capture microdissection of single cells from complex tissues.
BioTechniques 26, 328–335 (1999).
47. White, A. K. et al. High-throughput microfluidic single-cell RT-qPCR. Proc. Natl. Acad. Sci. U.
S. A. 108, 13999–14004 (2011).
48. Yoshimoto, N. et al. An automated system for high-throughput single cell-based breeding. Sci.
Rep. 3, 1191 (2013).
49. Lovatt, D. et al. Transcriptome in vivo analysis (TIVA) of spatially defined single cells in live
tissue. Nat. Methods 11, 190–196 (2014).
50. Kalisky, T., Blainey, P. & Quake, S. R. Genomic analysis at the single-cell level. Annu. Rev.
Genet. 45, 431–445 (2011).
51. Taniguchi, Y. et al. Quantifying E. coli proteome and transcriptome with single-molecule
sensitivity in single cells. Science 329, 533–538 (2010).
52. Hocine, S., Raymond, P., Zenklusen, D., Chao, J. A. & Singer, R. H. Single-molecule analysis of
gene expression using two-color RNA labeling in live yeast. Nat. Methods 10, 119–121 (2013).
44
53. Hughes, M. et al. High-resolution time course analysis of gene expression from pituitary. Cold
Spring Harb. Symp. Quant. Biol. 72, 381–386 (2007).
54. Bustin, S. A. et al. The MIQE guidelines: minimum information for publication of quantitative
real-time PCR experiments. Clin. Chem. 55, 611–622 (2009).
55. Mullis, K. B. & Faloona, F. A. Specific synthesis of DNA in vitro via a polymerase-catalyzed
chain reaction. Methods Enzymol. 155, 335–350 (1987).
56. Higuchi, R., Fockler, C., Dollinger, G. & Watson, R. Kinetic PCR analysis: real-time monitoring
of DNA amplification reactions. Biotechnol. Nat. Publ. Co. 11, 1026–1030 (1993).
57. Rutledge, R. G. & Cote, C. Mathematics of quantitative kinetic PCR and the application of
standard curves. Nucleic Acids Res. 31, e93 (2003).
58. Rappolee, D. A., Mark, D., Banda, M. J. & Werb, Z. Wound macrophages express TGF-alpha
and other growth factors in vivo: analysis by mRNA phenotyping. Science 241, 708–712 (1988).
59. Delidow, B. C., Peluso, J. J. & White, B. A. Quantitative measurement of mRNAs by polymerase
chain reaction. Gene Anal. Tech. 6, 120–124 (1989).
60. Adamski, M. G., Gumann, P. & Baird, A. E. A method for quantitative analysis of standard and
high-throughput qPCR expression data based on input sample quantity. PLoS ONE 9, e103917
(2014).
61. Citri, A., Pang, Z. P., Südhof, T. C., Wernig, M. & Malenka, R. C. Comprehensive qPCR
profiling of gene expression in single neuronal cells. Nat. Protoc. 7, 118–127 (2012).
62. Beer, N. R. et al. On-chip, real-time, single-copy polymerase chain reaction in picoliter droplets.
Anal. Chem. 79, 8471–8475 (2007).
63. Sanders, R. et al. Evaluation of digital PCR for absolute DNA quantification. Anal. Chem. 83,
6474–6484 (2011).
64. Dalerba, P. et al. Single-cell dissection of transcriptional heterogeneity in human colon tumors.
Nat. Biotechnol. 29, 1120–1127 (2011).
65. Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. Quantitative monitoring of gene expression
patterns with a complementary DNA microarray. Science 270, 467–470 (1995).
66. Tietjen, I. et al. Single-cell transcriptional analysis of neuronal progenitors. Neuron 38, 161–175
(2003).
67. Livesey, F. J. Strategies for microarray analysis of limiting amounts of RNA. Brief. Funct.
Genomic. Proteomic. 2, 31–36 (2003).
68. Kurimoto, K. et al. An improved single-cell cDNA amplification method for efficient high-
density oligonucleotide microarray analysis. Nucleic Acids Res. 34, e42 (2006).
69. Lockwood, W. W., Chari, R., Chi, B. & Lam, W. L. Recent advances in array comparative
genomic hybridization technologies and their applications in human genetics. Eur. J. Hum. Genet.
14, 139–148 (2006).
70. LaFramboise, T. Single nucleotide polymorphism arrays: a decade of biological, computational
and technological advances. Nucleic Acids Res. 37, 4181–4193 (2009).
71. Draghici, S., Khatri, P., Eklund, A. C. & Szallasi, Z. Reliability and reproducibility issues in
DNA microarray measurements. Trends Genet. 22, 101–109 (2006).
72. Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. & Gilad, Y. RNA-seq: an assessment of
technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–
1517 (2008).
73. Malone, J. H. & Oliver, B. Microarrays, deep sequencing and the true measure of the
transcriptome. BMC Biol. 9, 34 (2011).
74. Bains, W. & Smith, G. C. A novel method for nucleic acid sequence determination. J. Theor.
Biol. 135, 303–307 (1988).
75. Strezoska, Z. et al. DNA sequencing by hybridization: 100 bases read by a non-gel-based method.
Proc. Natl. Acad. Sci. U. S. A. 88, 10089–10093 (1991).
76. Drmanac, R. et al. DNA sequence determination by hybridization: a strategy for efficient large-
scale sequencing. Science 260, 1649–1652 (1993).
77. Metzker, M. L. Sequencing technologies - the next generation. Nat. Rev. Genet. 11, 31–46 (2010).
78. Lister, R. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis.
Cell 133, 523–536 (2008).
45
79. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying
mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
80. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat.
Rev. Genet. 10, 57–63 (2009).
81. Zhao, S., Fung-Leung, W.-P., Bittner, A., Ngo, K. & Liu, X. Comparison of RNA-Seq and
microarray in transcriptome profiling of activated T cells. PloS One 9, e78644 (2014).
82. Ozsolak, F. et al. Direct RNA sequencing. Nature 461, 814–818 (2009).
83. Sharon, D., Tilgner, H., Grubert, F. & Snyder, M. A single-molecule long-read survey of the
human transcriptome. Nat. Biotechnol. 31, 1009–1014 (2013).
84. Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382
(2009).
85. Grindberg, R. V. et al. RNA-sequencing from single nuclei. Proc. Natl. Acad. Sci. U. S. A. 110,
19802–19807 (2013).
86. Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R. & Siebert, P. D. Reverse transcriptase
template switching: a SMART approach for full-length cDNA library construction.
BioTechniques 30, 892–897 (2001).
87. Ramsköld, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual
circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).
88. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat.
Methods 10, 1096–1098 (2013).
89. Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex
RNA-seq. Genome Res. 21, 1160–1167 (2011).
90. Jaitin, D. A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of
tissues into cell types. Science 343, 776–779 (2014).
91. Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers.
Nat. Methods 9, 72–74 (2012).
92. Shiroguchi, K., Jia, T. Z., Sims, P. A. & Xie, X. S. Digital RNA sequencing minimizes sequence-
dependent bias and amplification noise with optimized single-molecule barcodes. Proc. Natl.
Acad. Sci. U. S. A. 109, 1347–1352 (2012).
93. Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods
11, 163–166 (2014).
94. Hulett, H. R., Bonner, W. A., Barrett, J. & Herzenberg, L. A. Cell sorting: automated separation
of mammalian cells as a function of intracellular fluorescence. Science 166, 747–749 (1969).
95. Klemm, S. et al. Transcriptional profiling of cells sorted by RNA abundance. Nat. Methods 11,
549–551 (2014).
96. Yu, H., Ernst, L., Wagner, M. & Waggoner, A. Sensitive detection of RNAs in single cells by
flow cytometry. Nucleic Acids Res. 20, 83–88 (1992).
97. Gall, J. G. & Pardue, M. L. Formation and detection of RNA-DNA hybrid molecules in
cytological preparations. Proc. Natl. Acad. Sci. U. S. A. 63, 378–383 (1969).
98. Bauman, J. G., Wiegant, J., Borst, P. & van Duijn, P. A new method for fluorescence
microscopical localization of specific DNA sequences by in situ hybridization of
fluorochromelabelled RNA. Exp. Cell Res. 128, 485–490 (1980).
99. Singer, R. H. & Ward, D. C. Actin gene expression visualized in chicken muscle tissue culture by
using in situ hybridization with a biotinated nucleotide analog. Proc. Natl. Acad. Sci. U. S. A. 79,
7331–7335 (1982).
100. Femino, A. M., Fay, F. S., Fogarty, K. & Singer, R. H. Visualization of single RNA transcripts in
situ. Science 280, 585–590 (1998).
101. Capodieci, P. et al. Gene expression profiling in single cells within tissue. Nat. Methods 2, 663–
665 (2005).
102. Raj, A., Peskin, C. S., Tranchina, D., Vargas, D. Y. & Tyagi, S. Stochastic mRNA synthesis in
mammalian cells. PLoS Biol. 4, e309 (2006).
103. Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A. & Tyagi, S. Imaging individual
mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877–879 (2008).
104. Semrau, S. et al. FuseFISH: robust detection of transcribed gene fusions in single cells. Cell Rep.
6, 18–23 (2014).
46
105. Lyubimova, A. et al. Single-molecule mRNA detection and counting in mammalian tissue. Nat.
Protoc. 8, 1743–1758 (2013).
106. Itzkovitz, S. et al. Single-molecule transcript counting of stem-cell markers in the mouse
intestine. Nat. Cell Biol. 14, 106–114 (2012).
107. Levesque, M. J., Ginart, P., Wei, Y. & Raj, A. Visualizing SNVs to quantify allele-specific
expression in single cells. Nat. Methods 10, 865–867 (2013).
108. Hansen, C. H. & van Oudenaarden, A. Allele-specific detection of single mRNA molecules in
situ. Nat. Methods 10, 869–871 (2013).
109. Levsky, J. M., Shenoy, S. M., Pezo, R. C. & Singer, R. H. Single-cell gene expression profiling.
Science 297, 836–840 (2002).
110. Lubeck, E., Coskun, A. F., Zhiyentayev, T., Ahmad, M. & Cai, L. Single-cell in situ RNA
profiling by sequential hybridization. Nat. Methods 11, 360–361 (2014).
111. Lubeck, E. & Cai, L. Single-cell systems biology by super-resolution imaging and combinatorial
labeling. Nat. Methods 9, 743–748 (2012).
112. Lu, J. & Tsourkas, A. Imaging individual microRNAs in single mammalian cells in situ. Nucleic
Acids Res. 37, e100 (2009).
113. Kim, D. H., Hong, Y. K., Egholm, M. & Strauss, W. M. Non-disruptive PNA-FISH protocol for
formalin-fixed and paraffin-embedded tissue sections. BioTechniques 31, 472, 475–476 (2001).
114. Yaroslavsky, A. I. & Smolina, I. V. Fluorescence imaging of single-copy DNA sequences within
the human genome using PNA-directed padlock probe assembly. Chem. Biol. 20, 445–453
(2013).
115. Nuovo, G. J. In situ PCR: protocols and applications. PCR Methods Appl. 4, S151–167 (1995).
116. Bagasra, O. Protocols for the in situ PCR-amplification and detection of mRNA and DNA
sequences. Nat. Protoc. 2, 2782–2795 (2007).
117. Long, A. A., Komminoth, P., Lee, E. & Wolfe, H. J. Comparison of indirect and direct in-situ
polymerase chain reaction in cell preparations and tissue sections. Detection of viral DNA, gene
rearrangements and chromosomal translocations. Histochemistry 99, 151–162 (1993).
118. Qian, X. & Lloyd, R. V. Recent developments in signal amplification methods for in situ
hybridization. Diagn. Mol. Pathol. 12, 1–13 (2003).
119. Kern, D. et al. An enhanced-sensitivity branched-DNA assay for quantification of human
immunodeficiency virus type 1 RNA in plasma. J. Clin. Microbiol. 34, 3196–3202 (1996).
120. Player, A. N., Shen, L. P., Kenny, D., Antao, V. P. & Kolberg, J. A. Single-copy gene detection
using branched DNA (bDNA) in situ hybridization. J. Histochem. Cytochem. 49, 603–612 (2001).
121. Battich, N., Stoeger, T. & Pelkmans, L. Image-based transcriptomics in thousands of single
human cells at single-molecule resolution. Nat. Methods 10, 1127–1133 (2013).
122. Choi, H. M. T. et al. Programmable in situ amplification for multiplexed imaging of mRNA
expression. Nat. Biotechnol. 28, 1208–1212 (2010).
123. Ali, M. M. et al. Rolling circle amplification: a versatile tool for chemical biology, materials
science and medicine. Chem. Soc. Rev. 43, 3324–3341 (2014).
124. Banér, J., Nilsson, M., Mendel-Hartvig, M. & Landegren, U. Signal amplification of padlock
probes by rolling circle replication. Nucleic Acids Res. 26, 5073–5078 (1998).
125. Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling
DNA nanoarrays. Science 327, 78–81 (2010).
126. Nilsson, M. et al. Padlock probes: circularizing oligonucleotides for localized DNA detection.
Science 265, 2085–2088 (1994).
127. Landegren, U., Kaiser, R., Sanders, J. & Hood, L. A ligase-mediated gene detection technique.
Science 241, 1077–1080 (1988).
128. Luo, J., Bergstrom, D. E. & Barany, F. Improving the fidelity of Thermus thermophilus DNA
ligase. Nucleic Acids Res. 24, 3071–3078 (1996).
129. Lizardi, P. M. et al. Mutation detection and single-molecule counting using isothermal rolling-
circle amplification. Nat. Genet. 19, 225–232 (1998).
130. Larsson, C. et al. In situ genotyping individual DNA molecules by target-primed rolling-circle
amplification of padlock probes. Nat. Methods 1, 227–232 (2004).
131. Larsson, C., Grundberg, I., Söderberg, O. & Nilsson, M. In situ detection and genotyping of
individual mRNA molecules. Nat. Methods 7, 395–397 (2010).
47
132. Ke, R. et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nat. Methods 10,
857–860 (2013).
133. Sunkin, S. M. Towards the integration of spatially and temporally resolved murine gene
expression databases. Trends Genet. 22, 211–217 (2006).
134. Singh, R. P. et al. High-resolution voxelation mapping of human and rodent brain gene
expression. J. Neurosci. Methods 125, 93–101 (2003).
135. Liu, D. & Smith, D. J. Voxelation and gene expression tomography for the acquisition of 3-D
gene expression maps in the brain. Methods 31, 317–325 (2003).
136. Sforza, D. M. & Smith, D. J. Voxelation methods for genome scale imaging of brain gene
expression. Methods Enzymol. 386, 314–323 (2004).
137. Chin, M. H. et al. A genome-scale map of expression for a mouse brain section obtained using
voxelation. Physiol. Genomics 30, 313–321 (2007).
138. Okamura-Oho, Y. et al. Transcriptome tomography for brain analysis in the web-accessible
anatomical space. PloS One 7, e45373 (2012).
139. Miller, J. A. et al. Transcriptional landscape of the prenatal human brain. Nature 508, 199–206
(2014).
140. Thompson, C. L. et al. A high-resolution spatiotemporal atlas of gene expression of the
developing mouse brain. Neuron 83, 309–323 (2014).
141. Junker, J. P. et al. Genome-wide RNA tomography in the zebrafish embryo. Cell 159, 662–675
(2014).
142. Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-
cell RNA-seq. Nature 509, 371–375 (2014).
143. Pollen, A. A. et al. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity
and activated signaling pathways in developing cerebral cortex. Nat. Biotechnol. 32, 1053–1058
(2014).
144. Zechel, S., Zajac, P., Lönnerberg, P., Ibáñez, C. F. & Linnarsson, S. Topographical transcriptome
mapping of the mouse medial ganglionic eminence by spatially-resolved RNA-seq. Genome Biol.
15, 486 (2014).
145. Laszlo, A. H. et al. Decoding long nanopore sequencing reads of natural DNA. Nat. Biotechnol.
32, 829–833 (2014).
146. Drndić, M. Sequencing with graphene pores. Nat. Nanotechnol. 9, 743 (2014).
147. Lee, J. H. et al. Highly multiplexed subcellular RNA sequencing in situ. Science 343, 1360–1363
(2014).
148. Hardenbol, P. et al. Multiplexed genotyping with sequence-tagged molecular inversion probes.
Nat. Biotechnol. 21, 673–678 (2003).
149. Porreca, G. J. et al. Multiplex amplification of large sets of human exons. Nat. Methods 4, 931–
936 (2007).
150. Valouev, A. et al. A high-resolution, nucleosome position map of C. elegans reveals a lack of
universal sequence-dictated positioning. Genome Res. 18, 1051–1063 (2008).
151. Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000).
152. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724
(2009).
153. Yap, T. A., Gerlinger, M., Futreal, P. A., Pusztai, L. & Swanton, C. Intratumor heterogeneity:
seeing the wood for the trees. Sci. Transl. Med. 4, 127ps10 (2012).
154. Marx, V. Cancer genomes: discerning drivers from passengers. Nat. Methods 11, 375–379
(2014).
155. Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012).
156. Golub, T. R. et al. Molecular classification of cancer: class discovery and class prediction by gene
expression monitoring. Science 286, 531–537 (1999).
157. Vogelstein, B. & Kinzler, K. W. Cancer genes and the pathways they control. Nat. Med. 10, 789–
799 (2004).
158. Gerlinger, M. et al. Cancer: Evolution Within a Lifetime. Annu. Rev. Genet. (2014).
doi:10.1146/annurev-genet-120213-092314
159. Swanton, C. Intratumor heterogeneity: evolution through space and time. Cancer Res. 72, 4875–
4882 (2012).
48
160. Gerlinger, M. et al. Genomic architecture and evolution of clear cell renal cell carcinomas defined
by multiregion sequencing. Nat. Genet. 46, 225–233 (2014).
161. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion
sequencing. N. Engl. J. Med. 366, 883–892 (2012).
162. Junttila, M. R. & de Sauvage, F. J. Influence of tumour micro-environment heterogeneity on
therapeutic response. Nature 501, 346–354 (2013).
163. Haffner, M. C. et al. Tracking the clonal origin of lethal prostate cancer. J. Clin. Invest. 123,
4918–4922 (2013).
164. Polzer, B. et al. Molecular profiling of single circulating tumor cells with diagnostic intention.
EMBO Mol. Med. 6, 1371–1386 (2014).
165. Diaz, L. A. et al. The molecular evolution of acquired resistance to targeted EGFR blockade in
colorectal cancers. Nature 486, 537–540 (2012).
166. Harris, T. J. R. & McCormick, F. The molecular pathology of cancer. Nat. Rev. Clin. Oncol. 7,
251–265 (2010).
167. Kiflemariam, S. et al. In situ sequencing identifies TMPRSS2-ERG fusion transcripts, somatic
point mutations and gene expression levels in prostate cancers. J. Pathol. 234, 253–261 (2014).
168. Grundberg, I. et al. In situ mutation detection and visualization of intratumor heterogeneity for
cancer research and diagnostics. Oncotarget 4, 2407–2418 (2013).
169. Weichert, W. et al. KRAS genotyping of paraffin-embedded colorectal cancer tissue in routine
diagnostics: comparison of methods and impact of histology. J. Mol. Diagn. 12, 35–42 (2010).
170. Herreros-Villanueva, M., Chen, C.-C., Yuan, S.-S. F., Liu, T.-C. & Er, T.-K. KRAS mutations:
analytical considerations. Clin. Chim. Acta. 431, 211–220 (2014).
171. Ross, J. S., Hatzis, C., Symmans, W. F., Pusztai, L. & Hortobágyi, G. N. Commercialized
multigene predictors of clinical outcome for breast cancer. The Oncologist 13, 477–493 (2008).
172. Raz, D. J. et al. A multigene assay is prognostic of survival in patients with early-stage lung
adenocarcinoma. Clin. Cancer Res. 14, 5565–5570 (2008).
173. Kawachi, M. H. et al. NCCN clinical practice guidelines in oncology: prostate cancer early
detection. J. Natl. Compr. Cancer Netw. 8, 240–262 (2010).
174. Humphrey, P. A. Gleason grading and prognostic factors in carcinoma of the prostate. Mod.
Pathol. 17, 292–306 (2004).
175. Schmidt, C. Gleason scoring system faces change and debate. J. Natl. Cancer Inst. 101, 622–629
(2009).
176. Schnitt, S. J. Classification and prognosis of invasive breast cancer: from morphology to
molecular taxonomy. Mod. Pathol. 23 Suppl 2, S60–64 (2010).
177. Prat, A. & Perou, C. M. Deconstructing the molecular portraits of breast cancer. Mol. Oncol. 5, 5–
23 (2011).
178. Sundström, M. et al. KRAS analysis in colorectal carcinoma: analytical aspects of
Pyrosequencing and allele-specific PCR in clinical practice. BMC Cancer 10, 660 (2010).
179. Siegelin, M. D. & Borczuk, A. C. Epidermal growth factor receptor mutations in lung
adenocarcinoma. Lab. Investig. 94, 129–137 (2014).
180. Sparano, J. A. & Paik, S. Development of the 21-gene assay and its application in clinical practice
and clinical trials. J. Clin. Oncol. 26, 721–728 (2008).
181. Arora, R. et al. Heterogeneity of Gleason grade in multifocal adenocarcinoma of the prostate.
Cancer 100, 2362–2366 (2004).
182. Algaba, F. & Montironi, R. Impact of prostate cancer multifocality on its biology and treatment.
J. Endourol. Endourol. Soc. 24, 799–804 (2010).
183. Tomlins, S. A. et al. Distinct classes of chromosomal rearrangements create oncogenic ETS gene
fusions in prostate cancer. Nature 448, 595–599 (2007).
184. Linja, M. J. et al. Amplification and overexpression of androgen receptor gene in hormone-
refractory prostate cancer. Cancer Res. 61, 3550–3555 (2001).
185. Kumar, A. et al. Exome sequencing identifies a spectrum of mutation frequencies in advanced
and lethal prostate cancers. Proc. Natl. Acad. Sci. U. S. A. 108, 17087–17092 (2011).
186. Jiang, N., Zhu, S., Chen, J., Niu, Y. & Zhou, L. A-Methylacyl-CoA Racemase (AMACR) and
Prostate-Cancer Risk: A Meta-Analysis of 4,385 Participants. PLoS ONE 8, e74386 (2013).
49
187. Goto, Y., Nonaka, I. & Horai, S. A mutation in the tRNA(Leu)(UUR) gene associated with the
MELAS subgroup of mitochondrial encephalomyopathies. Nature 348, 651–653 (1990).
188. Ghaznavi, F., Evans, A., Madabhushi, A. & Feldman, M. Digital imaging in pathology: whole-
slide imaging and beyond. Annu. Rev. Pathol. 8, 331–359 (2013).
189. Wilbur, D. C. Digital pathology: Get on board-the train is leaving the station. Cancer Cytopathol.
122, 791–795 (2014).
190. Landegren, U., Chen L., Wu D., Nong Y. & Gallant C., Localised rca-based amplification
method. Patent WO/2014/076209. 22 May 2014. Print.
191. Poguzhelskaya, E., Artamonov, D., Bolshakova, A., Vlasova, O. & Bezprozvanny, I. Simplified
method to perform CLARITY imaging. Mol. Neurodegener. 9, 19 (2014).
192. Ertürk, A. et al. Three-dimensional imaging of solvent-cleared organs using 3DISCO. Nat.
Protoc. 7, 1983–1995 (2012).
193. Piepenburg, O., Williams, C. H., Stemple, D. L. & Armes, N. A. DNA detection using
recombination proteins. PLoS Biol. 4, e204 (2006).
194. Beck, A. H. et al. Systematic analysis of breast cancer morphology uncovers stromal features
associated with survival. Sci. Transl. Med. 3, 108ra113 (2011).
195. Yuan, Y. et al. Quantitative image analysis of cellular heterogeneity in breast tumors
complements genomic profiling. Sci. Transl. Med. 4, 157ra143 (2012).
196. Langer, L. et al. Computer-aided diagnostics in digital pathology: automated evaluation of early-
phase pancreatic cancer in mice. Int. J. Comput. Assist. Radiol. Surg. (2014). doi:10.1007/s11548-
014-1122-9