Cancer · IFT6299 H2014 · UdeM · Miklós Csűrös

CANCER GENOMICS



Origin of cancer


Cancer = a disease of the genome: accumulation of mutations in stem cells ⇒ aberrant somatic cells (that evade cell death).

Suppose that one initial cell divides exponentially to produce $N$ stem cells that seed a renewing tissue such as the skin or colon. The classical Luria-Delbrück distribution(34,35) describes the probability that a frequency $x$ of those initial stem cells carries a mutation or, equivalently, that a total of $Nx$ initial stem cells carry mutations.

A mutation in the early rounds of exponential cell growth carries forward to all descendants, causing a high frequency of mutated cells. For this reason, the frequency of mutated cells can occasionally be very high, causing rare individuals to carry the same mutation in a large fraction of initial stem cells. By contrast, mutations during the linear history of stem cell division remain localized in a single compartment, unless cancer causes invasive growth.
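The "jackpot" effect of early mutations can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions of our own (synchronous doubling, no cell death, one per-daughter mutation probability `u`); the function name and parameter values are hypothetical, and it is not the model used by Frank & Nowak.

```python
import numpy as np


def simulate_mutant_fractions(generations: int, u: float, replicates: int,
                              seed: int = 0) -> np.ndarray:
    """Toy Luria-Delbrück-style growth (a simplification, not the paper's model).

    Start from one unmutated cell and double synchronously `generations`
    times. Each daughter of an unmutated cell independently acquires the
    mutation with probability `u`; mutated cells pass it on to both daughters.
    Returns the mutated fraction x observed in each replicate "individual".
    """
    rng = np.random.default_rng(seed)
    fractions = np.empty(replicates)
    for r in range(replicates):
        normal, mutant = 1, 0
        for _ in range(generations):
            # Daughters of unmutated cells that pick up a new mutation.
            new_mutants = int(rng.binomial(2 * normal, u))
            mutant = 2 * mutant + new_mutants
            normal = 2 * normal - new_mutants
        fractions[r] = mutant / (normal + mutant)
    return fractions


if __name__ == "__main__":
    x = simulate_mutant_fractions(generations=20, u=1e-6, replicates=2000)
    # Most individuals carry few or no mutated cells; a rare early
    # mutation yields a large mutated fraction in a few individuals.
    print("median x:", np.median(x), " mean x:", x.mean(), " max x:", x.max())
```

Comparing the median with the maximum across replicates shows the heavy right tail that the Luria-Delbrück distribution describes.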

The number of rounds of cellular division to produce $N$ stem cells from one initial cell is approximately $\ln(N)$, ignoring cell death. To make $N = 10^7$ stem cells requires cellular lineages with, on average, $\ln(N) \approx 16$ cell divisions back to the initial progenitor cell; for $N = 10^9$, there are approximately $\ln(N) \approx 21$ cell divisions per lineage.

What is the probability that a stem cell carries a mutation at the end of exponential growth and before the linear phase of division and tissue renewal? If the mutation rate per cell division during exponential growth is $u_e$, then the probability of mutation is roughly the mutation rate multiplied by the number of cell divisions, $\bar{x} \approx u_e \ln(N)$. The probability that any particular initial stem cell has a mutation is small, but the average number of initial stem cells with mutations, $N\bar{x}$, can be significant.

Put another way, we can expect roughly $N\bar{x}$ compartments to begin life with mutated stem cells. Those mutated compartments begin one step further along in the progression to cancer than compartments that begin with pristine stem cells. Although initially mutated compartments are only a small fraction of the total compartments, those compartments with initial mutations may contribute significantly to cancers later in life because of their much higher risk of transformation.(33)
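The approximations $\ln(N)$, $\bar{x} \approx u_e \ln(N)$ and $N\bar{x}$ can be evaluated directly. A minimal sketch; the value `U_E = 1e-7` is a hypothetical placeholder chosen for illustration, not a rate given in the source.

```python
import math


def expected_mutated_stem_cells(n_stem_cells: float, u_e: float) -> tuple[float, float, float]:
    """Return (ln N, x_bar, N * x_bar) following the approximations in the text:
    ln(N) divisions per lineage, x_bar ~ u_e * ln(N) per stem cell, and
    N * x_bar expected mutated initial stem cells (i.e. compartments)."""
    divisions = math.log(n_stem_cells)
    x_bar = u_e * divisions
    return divisions, x_bar, n_stem_cells * x_bar


U_E = 1e-7  # hypothetical per-division mutation rate, for illustration only
for n in (1e7, 1e9):
    divisions, x_bar, n_mutated = expected_mutated_stem_cells(n, U_E)
    print(f"N = {n:.0e}: ln(N) = {divisions:.1f}, x_bar = {x_bar:.1e}, "
          f"N*x_bar = {n_mutated:.0f}")
# N = 1e+07: ln(N) = 16.1, x_bar = 1.6e-06, N*x_bar = 16
# N = 1e+09: ln(N) = 20.7, x_bar = 2.1e-06, N*x_bar = 2072
```

Even with a per-cell probability on the order of $10^{-6}$, the expected number of initially mutated compartments is in the tens to thousands.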

Do mutation rates rise and then fall during transformation to cancerous growth?

Cancer progression requires broad changes in cellular physiology and often demands rapid adaptation to novel environments. This demand for change suggests that the

Figure 1. The structure of epithelial tissue compartments influences the accumulation of somatic mutations and progression to cancer. This figure illustrates a crypt, the compartmental unit of colon tissue. The stem cells reside at the base of the crypt. Each stem cell division typically gives rise to one stem cell that remains at the crypt base and one transit cell that moves up. The transit cell then divides several times, pushing the cells above toward the colon surface, where the surface cells undergo apoptosis and are shed. The stem cells form the only long-lived cell lineages, from which other crypt cells derive. Thus, cancer progression mostly follows the accumulation of mutations to stem cell lineages.

Figure 2. The phases of cellular growth in epithelial tissues. Cell populations expand exponentially during development, shown by a branching phase of division. At the end of development, stem cells differentiate in each tissue compartment. Stem cells renew each compartment by dividing to form a nearly linear cellular history: each stem cell division gives rise to one daughter stem cell that continues to renew the tissue and one daughter transit cell that divides rapidly to produce a short-lived transit lineage that fills the tissue.

Problems and paradigms

294 BioEssays 26.3


Frank & Nowak, BioEssays 26:291 (2004)


Somatic mutations


Types of mutations in the genome of a cancer cell:
• point mutations (missense, nonsense) and small indels (frameshift)

a function of the number of somatic cell divisions prior to initiation of the tumor, the exposure to environmental mutagens (notably UV radiation and tobacco leaf by-products) and, in some cancers, altered fidelity of the tumor DNA replication system. At the low end of the range are pediatric cancers, followed by adult leukemia and adult solid tumors. Tumors that exceed 10 coding mutations per megabase pair (Mbp) are often found deficient in mismatch repair, either through mutation or epigenetic silencing of MLH1 or one of the other mismatch repair enzymes. Tumors with coding mutation frequencies of 100 per Mbp or greater are mutated in the exonuclease domain of POLE, one of two DNA replicative enzymes of the cells (The Cancer Genome Atlas Research Network 2013). These patterns may have important implications for clinical testing in that colorectal patients with high rates of mutation due to mismatch repair deficiency (Walther et al. 2009) or replicative dysfunction tend to have improved survival compared to their lower mutation rate counterparts for the same tumor type. At the other end of the scale, many pediatric patients have so few coding mutations that DNA sequencing sheds less light on the etiology or prognosis of their disease. For the broad range of adult cancers with intermediate rates, mutation discovery is becoming increasingly important in subclassifying disease for prognosis and treatment (e.g., Patel et al. 2012).

Figure 2. Frequencies of somatic mutations in cancer patients. All data represent primary tumors. Only nonsilent mutations (missense, nonsense, frameshift, and splice site) were counted. (A) Overall frequencies of somatic mutations. Each black dot represents a tumor. The light blue shaded group indicates pediatric tumors, and the deeper blue shaded group indicates adult tumors. Red horizontal lines within each cluster of points indicate the median value of the mutation frequency of each tumor type. (ALL) Acute lymphoblastic leukemia; (AML) acute myeloid leukemia; (C) carcinoma; (GCT) germ cell tumor; (CRC) colorectal carcinoma; (MSI) microsatellite instability; (MSS) microsatellite stable; (POLE) patients with somatic mutation in the nuclease (proofreading) domain of the POLE gene. The outlier low-grade glioma patient with >100 mutations per Mb is also POLE-mutated. (B) Frequency classification of tumors. The pie charts divide the patients into three groups based on frequency of nonsilent mutation: 0 detectable somatic mutations, fewer than 30, and greater than or equal to 30, for selected representative tumor types (30 mutations represent a frequency of 1 per Mbp in A). The nested histograms below the pie charts show the percentage of patients with no significantly mutated genes (SMG, calculated by MutSig, q ≤ 0.1), no cancer census genes (CGC), or no mutations at all. The sequencing data for all the pediatric tumors, CRC, and hepatocellular carcinoma were generated at the Human Genome Sequencing Center at Baylor College of Medicine. The sequencing data for all other adult tumors were from the TCGA Genome Data Analysis Center (https://confluence.broadinstitute.org/display/GDAC/Home). Pediatric AML, ALL, and Wilms' tumor data were obtained from the TARGET project (http://www.targetproject.net/).
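The thresholds quoted above can be turned into a small classification helper. A minimal sketch: the 30 Mbp target size is our inference from the caption ("30 mutations represent a frequency of 1 per Mbp"), the function names are hypothetical, and the labels are heuristic readings of the text, not diagnostic rules.

```python
def mutations_per_mbp(n_nonsilent: int, target_mbp: float = 30.0) -> float:
    """Nonsilent coding mutations per Mbp. The default 30 Mbp target size is
    inferred from the caption; it is an assumption, not a stated exome size."""
    return n_nonsilent / target_mbp


def pie_chart_group(n_nonsilent: int) -> str:
    """The three groups of panel B: 0, fewer than 30, at least 30."""
    if n_nonsilent == 0:
        return "0"
    return "<30" if n_nonsilent < 30 else ">=30"


def frequency_note(rate_per_mbp: float) -> str:
    """Rough reading of the thresholds quoted in the text (heuristic only)."""
    if rate_per_mbp >= 100:
        return "range reported for POLE exonuclease-domain mutants"
    if rate_per_mbp > 10:
        return "range often associated with mismatch-repair deficiency"
    return "below the mismatch-repair-deficiency range"


for n in (0, 12, 450, 4500):
    rate = mutations_per_mbp(n)
    print(n, pie_chart_group(n), f"{rate:.1f}/Mbp:", frequency_note(rate))
```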

Wheeler and Wang

1056 Genome Research www.genome.org

Downloaded from genome.cshlp.org on December 13, 2013. Published by Cold Spring Harbor Laboratory Press.

• rearrangements (fusions, translocations, duplications), changes in ploidy
• epigenomic changes (hypermethylation)

Role of mutations:
• driver: confers an advantage on the clonal lineage (positive selection)
• passenger: a "hitchhiker" with no particular advantage

Wheeler & Wang, Genome Research 23:1054 (2013)


Analysis by sequencing


Genome alterations can be detected by sequencing:

Transformation assay: the measurement of cell phenotypes to assess oncogenic changes.

Digital karyotyping: a method to quantify DNA copy number. Short sequence-derived tags that cover the genome are used to read out relative copy number.

alteration functionally significant? Across a set of samples, statistical significance of a gene (or an alteration) can be assessed by comparison to the sample-specific background mutation rates in the specific nucleotide context (for example, the background rate of C-to-T transitions in CG dinucleotides often differs from the rate of mutations in A nucleotides(27,28)) and correcting for multiple hypothesis testing (that is, the higher chance of observing unlikely events when looking across many genes). Various computational tools have been developed to attempt to assess the functional significance of a mutation. These tools predict the effect of an amino acid change on protein structure and function, and some tools (for example, SIFT, CanPredict, PolyPhen and CHASM(81–85)) aim to distinguish 'driver' from 'passenger' alterations. In general, experimental validation of the function of mutations by approaches such as transformation assays(86) is the most powerful method; however, functional validation is limited because there is no suite of functional assays suitable for assessing all the types of pathways that can be altered in cancer.
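The significance test described above can be sketched as a per-gene comparison against a background rate followed by a multiple-testing correction. This is a minimal sketch, not MutSig or any published tool: it assumes a single uniform background rate (the text stresses that rates depend on nucleotide context), a Poisson model for mutation counts, and Benjamini-Hochberg correction as one common choice; gene names and numbers are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i
        # Stop once past the mode and the remaining terms are negligible.
        if i > lam and term <= 1e-16 * total:
            break
    return min(1.0, total)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        j = order[rank - 1]
        running_min = min(running_min, pvalues[j] * m / rank)
        q[j] = running_min
    return q


def gene_significance(observed, coding_bp, background_rate, n_samples):
    """Test each gene for more mutations than expected from one background rate."""
    genes = list(observed)
    pvals = [poisson_sf(observed[g], background_rate * coding_bp[g] * n_samples)
             for g in genes]
    return dict(zip(genes, benjamini_hochberg(pvals)))


# Hypothetical cohort: 100 tumours, background rate of 1 mutation per Mbp.
observed = {"GENE_A": 8, "GENE_B": 0, "GENE_C": 1, "GENE_D": 2}
coding_bp = {"GENE_A": 1_200, "GENE_B": 3_000, "GENE_C": 5_000, "GENE_D": 10_000}
q = gene_significance(observed, coding_bp, background_rate=1e-6, n_samples=100)
for gene, qval in q.items():
    print(gene, f"q = {qval:.2g}")
```

A context-aware version would compute the expected count as a sum over nucleotide contexts, each with its own background rate.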

Copy number. Array-based measurements have proven to be a powerful approach to determine the pattern of copy number alterations in cancer, from the gain or loss of chromosome arms to focal amplifications and deletions that might range from tens of kilobases to tens of megabases in size(87–94). Sequence-based approaches to copy number were applied even before the development of second-generation sequencing technologies, using digital karyotyping approaches(95,96), which are based on sequencing large numbers of short sequence tags(97).

Second-generation sequencing methods offer substantial benefits for copy number analysis, including higher resolution (up to the level of the single-base insertion or deletion) and precise delineation of the breakpoints of copy number changes(22,26,98). The digital nature of second-generation sequencing allows us to estimate the tumour-to-normal copy number ratio at a genomic locus by counting the number of reads in both tumour and normal samples at this locus. Unlike array-based measurements, counting sequences does not suffer from saturation and therefore allows accurate estimation of high copy number levels. It is, however, affected by sequencing biases caused by sequence context, such as GC content.
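The read-counting estimate of the tumour-to-normal copy number ratio can be sketched per genomic bin. A minimal sketch under stated assumptions: a pure tumour sample, a diploid normal, normalization by total library size only, and no GC-content or mappability correction; the function name is hypothetical.

```python
def copy_number_from_counts(tumour_counts, normal_counts, ploidy=2):
    """Per-bin copy number estimate from read counts (minimal sketch).

    Counts are normalized by each library's total read count, and the
    tumour/normal ratio is scaled by the normal ploidy. Assumes a pure
    tumour sample and no GC-content or mappability correction.
    """
    t_total, n_total = sum(tumour_counts), sum(normal_counts)
    estimates = []
    for t, n in zip(tumour_counts, normal_counts):
        if n == 0:
            estimates.append(float("nan"))  # no normal reads: ratio undefined
            continue
        ratio = (t / t_total) / (n / n_total)
        estimates.append(ploidy * ratio)
    return estimates


print(copy_number_from_counts([100, 100, 200, 0], [100, 100, 100, 100]))
# [2.0, 2.0, 4.0, 0.0]
```

The four bins read as diploid, diploid, a two-fold gain and a homozygous deletion. Because counts are never capped, the ratio keeps growing with amplification level, which is the saturation advantage mentioned above; dividing by total read count is itself a simplification that large amplifications can skew.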

Genome-wide sequence-based methods are particularly valuable for copy number changes of between approximately 100 and 1,000 bases, reflecting the maximum size that can be easily detected by PCR-based locus-specific sequencing and the minimum current resolution limit of array technologies, respectively. Copy number measurements by sequencing also allow definition of the sequence on the other side of the breakpoint.

[Figure: reads aligned to a reference sequence (Chr 1, Chr 5), with panels labelled point mutation, indel, copy number alterations (gain, hemizygous deletion, homozygous deletion), translocation breakpoint, and pathogen (non-human sequence).]

Figure. Types of genome alterations that can be detected by second-generation sequencing. Sequenced fragments are depicted as bars with coloured tips representing the sequenced ends and the unsequenced portion of the fragment in grey. Reads are aligned to the reference genome (for example, mostly chromosome 1 in this example). The colours of the sequenced ends show where they align. Different types of genomic alterations can be detected, from left to right: point mutations (in this example A to C) and small insertions and deletions (indels; in this example a deletion shown by a dashed line) are detected by identifying multiple reads that show non-reference sequence; changes in sequencing depth (relative to a normal control) are used to identify copy number changes (shaded boxes represent absent or decreased reads in the tumour sample); paired ends that map to different genomic loci (in this case, chromosome 5) are evidence of rearrangements; and sequences that map to non-human sequences are evidence for the potential presence of genomic material from pathogens.

REVIEWS

NATURE REVIEWS | GENETICS VOLUME 11 | OCTOBER 2010

© 2010 Macmillan Publishers Limited. All rights reserved.

Meyerson, Gabriel & Getz, Nature Reviews Genetics 11:685 (2010)


History


to be involved in tumorigenesis, were revealed for a cancer. The fact that the most frequently mutated genes they observed, APC, TP53, and KRAS for colon cancer and TP53 for breast cancer, recapitulated what was already known validated the approach and paved the way for expanded application of genome-scale sequencing.

The introduction of DNA sequence enrichment technologies from NimbleGen and Agilent (Albert et al. 2007; Gnirke et al. 2009) enabled WES on large scales. WES has additional advantages over WGS in that the average depth of coverage is about fivefold greater, and the costs of sequencing, data processing and storage are all much lower. Given the relative tractability of interpreting variation in the coding sequence compared to intergenic or intronic mutations, the period between 2004 and 2013 has seen a profusion of tumor types analyzed in large cohorts (100–500 patients), mainly by WES (see http://www.sanger.ac.uk/genetics/CGP/cosmic/papers/ for a comprehensive listing). WGS for a variety of tumors has also been reported and, in spite of the smaller numbers of patients, has led to surprising insights into cancer biology, based largely on analysis of structural variation in tumor genomes. Using WGS, genetic alterations observed in the DNA of the cancer cell span six orders of magnitude, from single-base point mutations to chromosome-scale amplification, using different modes of sequence analysis (see Chin et al. 2011) available today.

With these tools in hand, The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/), the Cancer Genome Project (http://www.sanger.ac.uk/genetics/CGP/), the International Cancer Genome Consortium (ICGC) (Hudson et al. 2010), Therapeutically Applicable Research to Generate Effective Treatments (http://target.cancer.gov/), and other privately funded large-scale projects (Downing et al. 2012) began in earnest to systematically catalog all the mutations in a wide variety of adult and pediatric cancers (see Garraway and Lander 2013 for a recent tally of large-scale projects).

WGS and WES sequencing have been augmented by cDNA sequencing (referred to as RNA-seq) to explore alterations to the transcriptome. RNA-seq provides not only gene expression levels, but also aberrant splicing, chimeric gene fusion transcripts characteristic of cancer cells and expressed somatic mutations (Bainbridge et al. 2006; Dong et al. 2009; Maher et al. 2009; Shah et al. 2009; Berger et al. 2010; Tuch et al. 2010; Wang et al. 2012). Analysis of chromatin modification is in its infancy as applied to the cancer cell, but the recent reporting of the ENCODE Project Consortium's genome-wide results (The ENCODE Project Consortium 2012) may provide the tools and technologies to enable new approaches. The technology behind DNA sequencing is improving rapidly in accuracy, cost reduction, and speed, making advances in cancer biology and clinical testing, all based on analysis of the primary sequence of the tumor genome, an essential strategy in the war on cancer. However, the coordinated acquisition and integrated interpretation of all this data has been possible because of a reference genome for comparison. What have we learned so far?

Mutation frequencies

By patient

The median frequency of point mutation varies over more than three orders of magnitude across human tumors; within a given tumor type, the variation in frequency is about one order of magnitude (Fig. 2A). The variation in mutation frequency is

Figure 1. Major events in a decade of cancer genomics. (Dark blue) Major advances in massively parallel sequencing platforms and targeted enrichment technologies; (black) major large-scale projects designed to catalog genomic variations of normal human individuals; (red) cancer genomics. (dbSNP) Database of single nucleotide polymorphism; (HapMap) haplotype map of the human genome; (ENCODE) Encyclopedia of DNA Elements; (COSMIC) Catalog of Somatic Mutations in Cancer; (TCGA) The Cancer Genome Atlas; (GA) genome analyzer; (CRC) colorectal carcinoma; (WES) whole-exome sequencing; (ICGC) International Cancer Genome Consortium; (TSP) tumor sequencing project; (AML) acute myeloid leukemia; (WGS) whole-genome sequencing; (OSCC) ovarian small cell carcinoma.


Wheeler & Wang, Genome Research 23:1054 (2013)


Inference


In general, we want to sequence both the cancer genome and the normal genome for comparison.

[Figure: tumour material and matched normal (blood) → DNA isolation → tumour or normal DNA (pond) + gene-specific oligonucleotides (baits) → hybridization → elute → sequencing → alignment to the gene (reference sequence); somatic mutation 'A': evidence in tumour, none in normal.]

Figure. Sequence capture for cancer genomics. A schematic diagram of hybrid selection to capture specific regions of the genome from tumour DNA (left panel, blue) and normal DNA (right panel, red). DNA from the starting material (the 'pond') is sheared and hybridized to oligonucleotides that are specific for the regions of interest (for example, exons in genes from a particular pathway or the whole exome; the 'baits'). The baits have a tag that allows them to be isolated (for example, by immobilization on beads). The captured DNA is eluted, prepared into sequencing libraries, sequenced and aligned to the bait sequences. Because this technique allows greater depth of coverage for the regions of interest, somatic mutations in the tumour DNA can be detected from admixed populations containing tumour and normal DNA-derived reads.


Difficulties:

• the matched normal contains somatic cells (→ mutations in the germline?)
• the tumour sample is mixed (normal cells + clonal lineages)
• loss of heterozygosity
• mapper errors

Meyerson, Gabriel & Getz, Nature Reviews Genetics 11:685 (2010)
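Why a mixed tumour sample and loss of heterozygosity complicate calling can be seen from the expected variant allele fraction (VAF). This sketch uses the standard purity-adjusted formula, which is our addition rather than something stated on the slide; it assumes a clonal variant and contaminating normal cells that carry no copy of it, and the function name is hypothetical.

```python
def expected_vaf(purity, mutated_copies=1, tumour_copy_number=2, normal_copy_number=2):
    """Expected fraction of reads carrying a somatic variant in an admixed sample.

    Assumes the variant is clonal (present in every tumour cell with
    `mutated_copies` copies) and that contaminating normal cells carry none.
    """
    tumour_alleles = purity * tumour_copy_number
    normal_alleles = (1.0 - purity) * normal_copy_number
    return purity * mutated_copies / (tumour_alleles + normal_alleles)


print(expected_vaf(1.0))   # pure diploid tumour, heterozygous mutation
print(expected_vaf(0.4))   # 60% normal contamination dilutes the signal
print(expected_vaf(1.0, mutated_copies=1, tumour_copy_number=1))  # LOH: other allele lost
```

With 60% normal contamination a heterozygous somatic mutation is expected in only about one read in five, while loss of the reference allele makes it look homozygous.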


Joint genotyping: JointSNVMix


A. Roth et al., Bioinformatics, pp. 907–913

Screening the set of predicted SNVs in a tumour against databases such as dbSNP (Sherry et al., 2001) provides one method to address this issue. The challenge with this approach is that there are 3–15 million SNVs per individual; early results from the 1000 Genomes Project indicate that 10–50% of these are novel events (Durbin et al., 2010). This suggests that possibly millions of SNVs in a single individual will be uncatalogued in polymorphism databases. These SNVs will be falsely identified as somatic mutations if the primary strategy for distinguishing somatic and germline events is screening against public databases. In the future, as SNV databases become more comprehensive, the fraction of novel SNVs found in an individual will decrease. However, even if databases were to capture 99% of all germline SNVs present in an individual and that individual carried 5 million SNVs, 50 000 SNVs would remain uncatalogued. This number is likely on the same order as the number of somatic mutations present in a tumour. Hence, there is a danger that the somatic mutation signal in a dataset could be overwhelmed by the signal from these germline events.
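The back-of-the-envelope estimate in this paragraph is a single multiplication; the sketch below reproduces it and evaluates the two ends of the quoted ranges. The function name is hypothetical.

```python
def uncatalogued_germline_snvs(snvs_per_individual: float, db_coverage: float) -> int:
    """Germline SNVs missing from a database that captures `db_coverage` of an
    individual's germline SNVs; database-only screening would call them somatic."""
    return round(snvs_per_individual * (1.0 - db_coverage))


print(uncatalogued_germline_snvs(5_000_000, 0.99))   # 50000, as in the text
# Ends of the quoted ranges: 3 million SNVs with 10% novel,
# and 15 million SNVs with 50% novel.
print(uncatalogued_germline_snvs(3_000_000, 0.90))
print(uncatalogued_germline_snvs(15_000_000, 0.50))
```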

A more robust approach to identifying somatic mutations is to sequence a paired sample of DNA from normal and tumour tissue from the same patient. The normal tissue can then act as a control against which SNVs detected in the tumour can be screened. A number of methods for discovering SNVs in NGS data have been developed (DePristo et al., 2011; Goya et al., 2010; Koboldt et al., 2009; McKenna et al., 2010). Tools specifically tailored to somatic mutation discovery in normal/tumour pairs are under-represented in the literature [although we note very recent exceptions (Ding et al., 2012; Larson et al., 2012)]. As such, ad hoc approaches for detecting somatic mutations involve using standard SNV discovery tools on the normal and tumour samples separately and then contrasting the results post hoc using so-called 'subtractive' analysis. However, due to technical sources of noise, variant alleles in both tumour and normal samples can be observed at frequencies that are less than expected and can be difficult to detect. We show that ad hoc methods would result in premature thresholding of real signals and, in particular, result in loss of specificity when detecting somatic mutations. We propose that simultaneous analysis of tumour and normal datasets from the same individual will likely result in an increased ability to detect shared signals (arising from germline polymorphisms or technical noise). Moreover, we expect that real somatic mutations that emit weak observed signals can be more readily detected if there is strong evidence of a non-variant genotype in the normal sample. Therefore, our hypothesis is that joint modelling of a tumour–normal pair will result in increased specificity and sensitivity compared with independent analysis.

To address this question, we developed a novel probabilistic framework called JointSNVMix to jointly analyse tumour–normal pair sequence data for cancer studies and a suite of more standard comparison methods based on independent analyses and frequentist statistical approaches. We show how the JointSNVMix method allows us to better capture the shared signal between samples and remove false positive predictions caused by miscalled germline events, owing to statistical strength that can be borrowed between datasets. The article outline is as follows: in Sections 2.1–2.4 we formulate the problem, describe the JointSNVMix probabilistic model and discuss our implementation of the learning algorithm. Section 2.5 describes synthetic benchmark datasets and data obtained from 12 previously published diffuse large B-cell lymphoma (DLBCL) cases using a tumour–normal pair experimental design (Morin et al., 2011). Ten of these cases were sequenced to ∼30× aligned coverage in tumour and normal using whole genome shotgun sequencing. For the remaining two samples, ∼8 GB were sequenced in tumour and normal using exon capture sequencing. Section 2.6 describes the comparison methods we implemented in this study. Section 3 shows how our approach results in increased specificity without loss of sensitivity when compared with independent standard analysis. Finally, in Section 4, we discuss limitations to our method and propose future directions for the approach of simultaneous analysis of multiple related NGS cancer samples.

2 METHODS

2.1 Problem formulation

Given tumour–normal paired allelic counts obtained from NGS sequence data aligned to the human reference genome, we focus on the problem of identifying the joint genotype (see below) of the samples at every location in the data with coverage. For simplicity, and following standard convention, we imagine that each position has only two possible alleles, A and B. The allele A indicates that the nucleotide at a position matches the reference genome and B indicates that the nucleotide is a mismatch. In NGS data, we can measure the presence of these alleles using binary count data that examines all reads at a given site $i$ and counts the number of matches, $a_i$, and mismatches, $b_i$ (Goya et al., 2010). In Figure 1, we see how this formalism can be extended to tumour–normal paired samples.

For a diploid genome, we consider all pairs of alleles, which give rise to the set $G = \{AA, AB, BB\}$ of diploid genotypes. Now given two diploid samples, the set of possible joint genotypes consists of all combinations of diploid genotypes, which is equivalent to the Cartesian product of $G$ with itself, i.e. $G \times G = \{(g_N, g_T) : g_N, g_T \in G\}$.

We assume the joint genotype of a given position can be mapped onto the more biologically interpretable set of marginal genotypes according to Table 1. This can be done by assigning the joint genotype to the most probable state, or by marginalizing together the joint genotype probabilities for a given state. As an example of marginalization, we compute $P(\text{Somatic}) = P((AA,AB)) + P((AA,BB))$, i.e. the sum of probabilities of a wild-type genotype in the normal data and a variant genotype in the tumour data.

Fig. 1. Hypothetical example of the JointSNVMix analysis process. Reads are first aligned to the reference genome (green). Next the allelic counts, which are the number of matches and depth of reads at each position, are tabulated. Allelic count information can then be used to identify germline (blue) and somatic positions (red). At the bottom of the figure, we show the hypothetical probabilities of the nine joint genotypes based on the count data for the somatic position (AA, AB).

Downloaded from http://bioinformatics.oxfordjournals.org/ at Université de Montréal on April 10, 2014.

2.2 JointSNVMix models

JointSNVMix1 and JointSNVMix2 are generative probabilistic models that describe the joint emission of the allelic count data observed at position $i$ in the normal and tumour samples. Figure 2 shows the graphical models representing JointSNVMix1 and JointSNVMix2. A complete description of the notation and model parameters is given in Table 2.

Table 1. The nine possible joint genotypes and their associated mappings onto biologically interpretable marginal genotypes

| gN \ gT | AA | AB | BB |
|---|---|---|---|
| AA | Wild-type | Somatic | Somatic |
| AB | LOH | Germline | LOH |
| BB | Error(a) | Error | Germline |

Wild-type [no change: P(AA,AA)], Somatic [wild-type normal and variant tumour: P(AA,AB) + P(AA,BB)], Germline [variant normal and tumour: P(AB,AB) + P(BB,BB)] and loss of heterozygosity [LOH, heterozygous normal and homozygous tumour: P(AB,AA) + P(AB,BB)].
(a) We treat the joint genotypes (BB,AB) and (BB,AA) as errors since this would imply that a homozygous variant mutates back to the reference base, which is possible but unlikely. It is more plausible that these cases are simply errors due to alignment or base calling.

We introduce a random variable $G_i$ as a Multinomial indicator vector representing the joint genotype of the samples. More explicitly, $G_i = (G_i^{(AA,AA)}, G_i^{(AA,AB)}, \ldots, G_i^{(BB,BB)})$, where $G_i^{(g_N,g_T)} = 1$ if the joint genotype of position $i$ is $(g_N, g_T)$, and $G_i^{(g_N,g_T)} = 0$ otherwise. We assume the count data from the two samples are jointly emitted from $G_i$, thus capturing correlations between the variables and allowing statistical strength to be borrowed across the samples. This is the key insight that differentiates this model from running an independent analysis of each sample and joining the inferred genotypes post hoc.

Given the joint genotype of the sample, we model the normal and tumour sample as being conditionally independent. For JointSNVMix1, the conditional distribution for each sample is modelled as a three-component mixture of Binomial densities, where the densities correspond to the genotypes AA, AB, BB. These conditional densities are the same as used by the SNVMix1 model (Goya et al., 2010). For JointSNVMix2, the conditional densities are the same as SNVMix2 (Goya et al., 2010), which allows for the incorporation of base and mapping quality information. A complete description of the model is available in the Supplementary Material.
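To make the conditional independence concrete, here is a minimal Python sketch of a JointSNVMix1-style posterior at a single position. It assumes a Binomial density for the number of reference-matching reads under each genotype; the per-genotype reference probabilities `mu` and the uniform prior over the nine joint genotypes in the usage example are hypothetical values, and the SNVMix2-style base and mapping quality terms are not modelled.

```python
import math
from itertools import product

GENOTYPES = ("AA", "AB", "BB")


def log_binom_pmf(a, d, mu):
    """log P(a reference-matching reads out of depth d) under Binomial(d, mu)."""
    return (math.lgamma(d + 1) - math.lgamma(a + 1) - math.lgamma(d - a + 1)
            + a * math.log(mu) + (d - a) * math.log(1.0 - mu))


def joint_posterior(a_n, d_n, a_t, d_t, mu_n, mu_t, prior):
    """Posterior over the nine joint genotypes (g_N, g_T) at one position.

    a_*: reads matching the reference; d_*: depth; mu_*[g]: probability that a
    read matches the reference under genotype g (0 < mu < 1);
    prior[(g_N, g_T)] > 0. Normal and tumour counts are conditionally
    independent given the joint genotype, so their log-likelihoods add.
    """
    log_post = {
        (g_n, g_t): math.log(prior[(g_n, g_t)])
        + log_binom_pmf(a_n, d_n, mu_n[g_n])
        + log_binom_pmf(a_t, d_t, mu_t[g_t])
        for g_n, g_t in product(GENOTYPES, repeat=2)
    }
    top = max(log_post.values())  # log-sum-exp for numerical stability
    z = sum(math.exp(v - top) for v in log_post.values())
    return {k: math.exp(v - top) / z for k, v in log_post.items()}


if __name__ == "__main__":
    mu = {"AA": 0.99, "AB": 0.5, "BB": 0.01}  # hypothetical parameters
    prior = {k: 1 / 9 for k in product(GENOTYPES, repeat=2)}
    # Normal: 30/30 reads match the reference; tumour: 15/30.
    post = joint_posterior(30, 30, 15, 30, mu, mu, prior)
    print(max(post, key=post.get))  # ('AA', 'AB'), a candidate somatic site
```

Summing the returned dictionary over the classes of Table 1 gives P(Somatic), P(LOH) and so on.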

2.3 Inference and parameter estimation

We use the expectation maximisation (EM) algorithm to perform maximum a posteriori (MAP) estimation of the values of the model parameters and latent variables. One could hand-set parameters of the model to intuitive values; however, we expect that fitting the model will allow for sample-specific adjustments to inter-experimental technical variability and inter-sample variation from tumour–normal admixture (so-called tumour cellularity) in the tumour samples. A full derivation of the update equations for JointSNVMix1 and JointSNVMix2 is given in the Supplementary Material.
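A compact EM sketch for a JointSNVMix1-style model is shown below. It is written under our own assumptions rather than from the paper's update equations (those are in its Supplementary Material): the MAP step uses a Dirichlet pseudocount for the prior over joint genotypes and hypothetical Beta priors on each genotype's reference-read probability. The Beta priors keep the three genotype components from collapsing onto one another when a genotype is rare in the data.

```python
import math
from itertools import product

GENOTYPES = ("AA", "AB", "BB")
JOINT = list(product(GENOTYPES, repeat=2))  # the nine (g_N, g_T) states

# Hypothetical Beta(alpha, beta) priors on mu_g, the probability that a read
# matches the reference under genotype g (illustrative values only).
BETA_PRIOR = {"AA": (1000.0, 2.0), "AB": (500.0, 500.0), "BB": (2.0, 1000.0)}


def log_binom(a, d, mu):
    """log Binomial(a; d, mu)."""
    return (math.lgamma(d + 1) - math.lgamma(a + 1) - math.lgamma(d - a + 1)
            + a * math.log(mu) + (d - a) * math.log(1.0 - mu))


def e_step(data, pi, mu_n, mu_t):
    """Responsibilities r_i[k] = P(G_i = k | counts, current parameters)."""
    resp = []
    for a_n, d_n, a_t, d_t in data:
        logs = [math.log(pi[k]) + log_binom(a_n, d_n, mu_n[k[0]])
                + log_binom(a_t, d_t, mu_t[k[1]]) for k in JOINT]
        top = max(logs)  # log-sum-exp trick
        w = [math.exp(v - top) for v in logs]
        s = sum(w)
        resp.append({k: wk / s for k, wk in zip(JOINT, w)})
    return resp


def m_step(data, resp, pseudocount=1.0):
    """MAP updates: Dirichlet pseudocounts for pi, Beta-prior modes for mu."""
    totals = {k: sum(r[k] for r in resp) + pseudocount for k in JOINT}
    z = sum(totals.values())
    pi = {k: v / z for k, v in totals.items()}
    mu_n, mu_t = {}, {}
    for g in GENOTYPES:
        alpha, beta = BETA_PRIOR[g]
        num_n = den_n = num_t = den_t = 0.0
        for (a_n, d_n, a_t, d_t), r in zip(data, resp):
            w_n = sum(r[(g, h)] for h in GENOTYPES)  # P(g_N = g)
            w_t = sum(r[(h, g)] for h in GENOTYPES)  # P(g_T = g)
            num_n += w_n * a_n
            den_n += w_n * d_n
            num_t += w_t * a_t
            den_t += w_t * d_t
        # Mode of the Beta posterior given the expected read counts.
        mu_n[g] = (num_n + alpha - 1) / (den_n + alpha + beta - 2)
        mu_t[g] = (num_t + alpha - 1) / (den_t + alpha + beta - 2)
    return pi, mu_n, mu_t


def fit(data, n_iter=30):
    """data: list of (a_N, d_N, a_T, d_T) tuples, one per position."""
    pi = {k: 1.0 / len(JOINT) for k in JOINT}
    mu_n = {g: (a - 1) / (a + b - 2) for g, (a, b) in BETA_PRIOR.items()}
    mu_t = dict(mu_n)
    for _ in range(n_iter):
        resp = e_step(data, pi, mu_n, mu_t)
        pi, mu_n, mu_t = m_step(data, resp)
    return pi, mu_n, mu_t, e_step(data, pi, mu_n, mu_t)


if __name__ == "__main__":
    # Hypothetical counts: 20 wild-type, 5 somatic-like, 5 germline-het positions.
    data = ([(40, 40, 40, 40)] * 20 + [(40, 40, 20, 40)] * 5
            + [(20, 40, 20, 40)] * 5)
    pi, mu_n, mu_t, resp = fit(data)
    print(max(resp[20], key=resp[20].get))  # ('AA', 'AB')
```

Because `mu_t` is re-estimated per sample, enough data can pull the tumour parameters away from the prior modes; this is one way such a model can absorb sample-specific effects like tumour cellularity.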


Fig. 2. Probabilistic graphical model representing the (a) JointSNVMix1 and (b) JointSNVMix2 model. Shaded nodes represent observed or fixed values, while the values of unshaded nodes are learned using EM. Only the distributions for the normal are shown below; the tumour distributions are the same. We have defined f(q|a,z)=z[qa+(1−q)(1−a)]+0.5(1−z) and g(r|z)=zr+(1−z)(1−r). Description of all random variables is given in Table 2.



- LOH = loss of heterozygosity
- inference by the posterior probability of the joint state
- ignores impurity
- ignores QUAL
- ignores MAPQ

Roth et al., Bioinformatics 28:907 (2012)


Somatic genotyping


Large differences between tools

designed to remove sequencing and mapping errors. Sites were removed if any of the following criteria were met (considering variant bases in the cancer sample for somatic candidates and variant bases in the normal sample for LOH candidates): variant bases emanate exclusively from one strand; mean variant base quality is less than 15; no variant has base quality over 30; mean variant mapping quality is less than 15; no variant has mapping quality over 40; more than two candidate SNVs (identified by any algorithm) are within 50 bp either side; spanning deletions contribute >20% to the overall depth in either sample; or candidates are immediately adjacent to indels in >20% of reads in either sample. A detailed breakdown of filtering results is available in the Supplementary Information.

By far the most significant filtering metric was the requirement to have at least one relevant variant base on each strand, with 5437 out of 10 438 candidates removed for having 100% strand bias. Although this filter was designed to remove systematic sequencing errors, complete strand bias can occur solely as a result of random sampling between the two strands, especially at low depths. For a site with a total of v variant reads, the chance that all of those reads occur on one strand is 2 × 0.5^v. Using these per-site strand bias probabilities for this dataset, the expected number of sites with 100% strand bias is 1272, less than a quarter of the number actually observed. When applying the same methods to whole genome sequencing cancer–normal data (not shown), the expected number of sites with 100% strand bias was less than half the number observed. The relative extent of strand bias may be greater in exome sequencing because of the exome capture design tending to cover sequences just outside the targeted regions from one direction (and one strand) only. However, the profusion of strand biased variants in both exome and whole genome data supports the descriptions of systematic errors of sequencing by Nakamura et al. (2011) and Meacham et al. (2011).

There were significant differences between the filter pass rates of the four output sets, as presented in Table 1. The raw output from Strelka was least susceptible to these indicators of sequencing and mapping error, while the output from JSM2 was most susceptible.
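The expected-count argument for strand bias is short enough to check directly. This Python sketch assumes, as the text does, that each variant read lands on either strand independently with probability 0.5; the per-site variant-read counts in the usage example are hypothetical.

```python
def prob_complete_strand_bias(v):
    """Chance that all v variant reads fall on one strand: 2 * 0.5**v.

    Assumes each read is independently on either strand with probability 0.5.
    """
    if v < 1:
        raise ValueError("a site needs at least one variant read")
    return 2 * 0.5 ** v


def expected_complete_strand_bias(variant_read_counts):
    """Expected number of sites with 100% strand bias from random sampling alone."""
    return sum(prob_complete_strand_bias(v) for v in variant_read_counts)


def has_complete_strand_bias(n_forward, n_reverse):
    """The filter criterion: all variant reads come from a single strand."""
    return (n_forward == 0) != (n_reverse == 0)


if __name__ == "__main__":
    counts = [1, 2, 4, 10]  # hypothetical variant-read counts at four sites
    print(expected_complete_strand_bias(counts))  # 1.626953125
```

A site with a single variant read always shows 100% strand bias, which is why the filter is especially harsh at low depth; comparing this expectation with the observed count of fully strand-biased sites is what exposes the excess of systematic errors.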

3.3 Comparison and characterization of candidate sites

After filtering, 2920 candidate sites remained, including 812 somatic and 475 LOH VarScan candidates; 862 somatic and 85 LOH SomaticSniper candidates; 470 somatic and 455 LOH JSM2 candidates; and 268 somatic and 28 LOH Strelka candidates, with overlaps between the four sets of somatic candidates illustrated in Figure 3.

Figures 4 and 5 present some effects of increasing each algorithm's somatic probability score threshold for inclusion to one, and thus reducing the number of candidate sites in their output to the top calls.

Figure 4 illustrates that the proportion of somatic sites found by any other caller (at any probability threshold) improves as the candidate sets are reduced to their top calls. VarScan's top 22 somatic candidates were all returned with probability 1.00 by all four callers. Of SomaticSniper's top 49 candidates, 11 were not returned at any probability level by the other three algorithms,

Fig. 4. Proportion of somatic sites found by multiple callers as the probability score threshold of each caller is increased to 1.0 and the number of candidate sites reduces

Fig. 3. Overlaps between somatic SNV candidate sets in the filtered output for the CML exome

Table 1. Pass rates (%) of candidate sites (somatic and LOH) through the strand bias filter and the combination of all other filters

Algorithm name   Strand bias   All other filters
VarScan          50.3          58.3
SomaticSniper    50.4          66.2
JSM2             45.8          47.9
Strelka          70.9          89.9

The combined filters for variant base and mapping quality, nearby SNVs, spanning deletions and adjacent indels were applied after the removal of sites with 100% strand bias. All differences are significant, except between VarScan and SomaticSniper with the strand bias filter.


N.D.Roberts et al.

Roberts et al., Bioinformatics 29:2223 (2013)


Large-scale studies


overexpression, we systematically searched for mutually exclusive genomic events using the MEMo method (29). We found a pattern of near exclusivity (corrected P < 0.01) of IGF2 overexpression with genomic events known to activate the PI3K pathway (mutations of PIK3CA and PIK3R1 or deletion/mutation of PTEN; Fig. 3c and Supplementary Table 5). The IRS2 gene, encoding a protein linking IGF1R (the receptor for IGF2) with PI3K, is on chromosome 13, which is frequently gained in CRC. The cases with the highest IRS2 expression were mutually exclusive of the cases with IGF2 overexpression (P = 0.04) and also lacked mutations in the PI3K pathway (P = 0.0001; Fig. 3c). These results strongly suggest that the IGF2–IGF1R–IRS2 axis signals to PI3K in CRC and imply that therapeutic targeting of the pathway could act to block PI3K activity in this subset of patients.
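MEMo itself is not reproduced here. As a much simpler stand-in, a one-sided Fisher exact test asks whether two alterations co-occur in fewer tumours than expected if they were independent, given how often each one occurs. This Python sketch tests a single pair of events with no multiple-testing correction, unlike the corrected P values quoted above; the alteration vectors in the usage example are hypothetical.

```python
from math import comb


def exclusivity_p_value(altered_a, altered_b):
    """One-sided Fisher exact test for mutual exclusivity of two alterations.

    altered_a, altered_b: equal-length sequences of booleans, one per tumour.
    Returns P(overlap <= observed) under a hypergeometric null that keeps the
    number of tumours carrying each alteration fixed; small values mean the
    two alterations co-occur less often than expected by chance.
    """
    if len(altered_a) != len(altered_b):
        raise ValueError("both vectors must cover the same tumours")
    n = len(altered_a)
    k_a, k_b = sum(altered_a), sum(altered_b)
    observed = sum(1 for a, b in zip(altered_a, altered_b) if a and b)
    lowest = max(0, k_a + k_b - n)  # smallest overlap that is possible
    tail = sum(comb(k_a, x) * comb(n - k_a, k_b - x)
               for x in range(lowest, observed + 1))
    return tail / comb(n, k_b)


if __name__ == "__main__":
    # Hypothetical: event A in tumours 1-5, event B in tumours 6-10.
    a = [True] * 5 + [False] * 5
    b = [False] * 5 + [True] * 5
    print(exclusivity_p_value(a, b))  # 1/252, about 0.004
```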

Translocations

To identify new chromosomal translocations, we performed low-pass, paired-end, whole-genome sequencing on 97 tumours with matched normal samples. In each case we achieved sequence coverage of ~3–4-fold and a corresponding physical coverage of 7.5–10-fold. Despite the low genome coverage, we detected 250 candidate interchromosomal translocation events (range, 0–10 per tumour). Among these events, 212 had one or both breakpoints in an intergenic region, whereas the remaining 38 juxtaposed coding regions of two genes in putative fusion events, of which 18 were predicted to code for in-frame events (Supplementary Table 6). We found three separate cases in which the first two exons of the NAV2 gene on chromosome 11 are joined with the 3′ coding portion of TCF7L1 on chromosome 2 (Supplementary Fig. 5). TCF7L1 encodes TCF3, a member of the TCF/LEF class of transcription factors that heterodimerize with nuclear β-catenin to enable β-catenin-mediated transcriptional regulation. Intriguingly, in all three cases, the predicted structure of the NAV2–TCF7L1 fusion protein lacks the TCF3 β-catenin-binding domain. This translocation is similar to another recurrent translocation identified in CRC, a fusion in which the amino terminus of VTI1A is joined to TCF4, which is encoded by TCF7L2, a homologue of TCF7L1 that is deleted or mutated in 12% of non-hypermutated tumours (4). We also observed 21 cases of translocation involving TTC28 located on chromosome 22 (Supplementary Table 6). In all cases the fusions predict inactivation of TTC28, which has been identified as a target of P53 and an inhibitor of tumour cell growth (30). Eleven of the 19 (58%) gene–gene translocations were validated by obtaining PCR products or, in some cases, sequencing the junction fragments (Supplementary Fig. 5).

Altered pathways in CRC

Integrated analysis of mutations, copy number and mRNA expression changes in 195 tumours with complete data enriched our understanding of how some well-defined pathways are deregulated. We grouped samples by hypermutation status and identified recurrent alterations in the WNT, MAPK, PI3K, TGF-β and p53 pathways (Fig. 4, Supplementary Fig. 6 and Supplementary Table 1).

We found that the WNT signalling pathway was altered in 93% of all tumours, including biallelic inactivation of APC (Supplementary Table 7) or activating mutations of CTNNB1 in ~80% of cases. There were also mutations in SOX9 and mutations and deletions in TCF7L2, as well as the DKK family members and AXIN2, FBXW7 (Supplementary Fig. 7), ARID1A and FAM123B (the latter is a negative regulator of WNT–β-catenin signalling (12) found mutated in Wilms' tumour (31)). A few mutations in FAM123B have previously been described in CRC (32). SOX9 has been suggested to have a role in cancer, but no mutations have previously been described. The WNT receptor frizzled (FZD10) was overexpressed in ~17% of samples, in some instances at levels of 100× normal. Altogether, we found 16 different altered WNT pathway genes, confirming the importance of this pathway in CRC. Interestingly, many of these alterations were found in tumours that harbour APC mutations, suggesting that multiple lesions affecting the WNT signalling pathway confer selective advantage.

Genetic alterations in the PI3K and RAS–MAPK pathways are common in CRC. In addition to IGF2 and IRS2 overexpression, we found mutually exclusive mutations in PIK3R1 and PIK3CA as well as deletions in PTEN in 2%, 15% and 4% of non-hypermutated tumours, respectively. We found that 55% of non-hypermutated tumours have alterations in KRAS, NRAS or BRAF, with a significant pattern of mutual exclusivity (Supplementary Fig. 6 and Supplementary Table 1). We also evaluated mutations in the erythroblastic leukemia viral oncogene homolog (ERBB) family of receptors because of the translational relevance of such mutations. Mutations or amplifications in

[Figure 4, pathway panels: alteration frequency per gene, non-hypermutated / hypermutated tumours]
WNT signalling (altered in 92% / 97%): APC 81% / 53%; FZD10 19% / 13%; FBXW7 10% / 43%; FAM123B 7% / 37%; TCF7L2 12% / 30%; ARID1A 5% / 37%; AXIN2 4% / 23%; DKK1-4 4% / 33%; SOX9 4% / 7%; CTNNB1 5% / 7%
TGF-β signalling (altered in 27% / 87%): TGFBR1 <1% / 17%; TGFBR2 2% / 43%; ACVR2A 1% / 60%; ACVR1B 4% / 20%; SMAD2 6% / 13%; SMAD3 2% / 17%; SMAD4 15% / 20%
PI3K signalling (altered in 50% / 53%): IGF2 22% / 0%; IRS2 7% / 3%; PIK3CA 15% / 37%; PIK3R1 2% / 17%; PTEN 4% / 20%
RTK–RAS signalling (altered in 59% / 80%): ERBB2 6% / 13%; ERBB3 4% / 20%; NRAS 10% / 10%; KRAS 43% / 30%; BRAF 3% / 47%
P53 signalling (altered in 64% / 47%): ATM 7% / 37%; TP53 59% / 17%
[Figure 4, bottom panel: per-sample pattern of activation or inactivation of the WNT, TP53, RTK/RAS, PI3K and TGF-β pathways in non-hypermutated and hypermutated tumours]

Figure 4 | Diversity and frequency of genetic changes leading to deregulation of signalling pathways in CRC. Non-hypermutated (nHM; n = 165) and hypermutated (HM; n = 30) samples with complete data were analysed separately. Alterations are defined by somatic mutations, homozygous deletions, high-level focal amplifications, and, in some cases, by significant up- or downregulation of gene expression (IGF2, FZD10, SMAD4). Alteration frequencies are expressed as a percentage of all cases. Red denotes activated genes and blue denotes inactivated genes. Bottom panel shows for each sample if at least one gene in each of the five pathways described in this figure is altered.
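The bottom-panel rule, under which a pathway counts as altered in a sample when at least one of its genes is altered, translates directly into code. In this Python sketch the gene sets are small subsets of the genes named in Figure 4 and the sample data are made up; the actual analysis also counts expression changes for IGF2, FZD10 and SMAD4, which this sketch does not model.

```python
def pathway_status(sample_alterations, pathways):
    """Per-sample flags: is at least one gene of each pathway altered?

    sample_alterations: dict sample id -> set of altered gene symbols
    pathways: dict pathway name -> set of member gene symbols
    """
    return {
        sample: {name: bool(altered & genes) for name, genes in pathways.items()}
        for sample, altered in sample_alterations.items()
    }


def pathway_frequencies(sample_alterations, pathways):
    """Fraction of samples in which each pathway is altered."""
    status = pathway_status(sample_alterations, pathways)
    n = len(status)
    return {name: sum(flags[name] for flags in status.values()) / n
            for name in pathways}


if __name__ == "__main__":
    pathways = {  # hypothetical subsets of the Figure 4 gene lists
        "WNT": {"APC", "CTNNB1", "TCF7L2"},
        "RTK-RAS": {"KRAS", "NRAS", "BRAF"},
        "PI3K": {"PIK3CA", "PTEN"},
        "P53": {"TP53", "ATM"},
    }
    samples = {"S1": {"APC", "KRAS"}, "S2": {"TP53"},
               "S3": {"CTNNB1", "PIK3CA"}, "S4": set()}
    print(pathway_frequencies(samples, pathways))
    # {'WNT': 0.5, 'RTK-RAS': 0.25, 'PI3K': 0.25, 'P53': 0.25}
```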


Nature 487, 19 July 2012, p. 333


TCGA, Nature 487:330 (2012)


Genomic rearrangements


breakage/fusion/bridge

B/F/B cycles can loop out to form DM (double minute) chromosomes.(34,40) The amplified regions can then be transferred to other chromosomes, since DM chromosomes can reintegrate at other locations.(43,104,105) Amplified regions can also be transferred to other chromosomes following fusion of the chromosome containing the amplified DNA with other chromosomes (Fig. 2).(21) As a result, many amplification events resulting from B/F/B cycles in cancer cells will not be located on the end of the chromosome on which they originated, and therefore will not be recognizable as having originated through B/F/B cycles. The initiation of gene amplification by B/F/B cycles would explain why inverted repeats are commonly observed in amplified regions in human cancer cells.(106)

A fourth type of rearrangement caused by telomere loss is translocation, which can be either duplicative or nonreciprocal, both of which are commonly associated with human cancer.(73,86) The analysis of cells actively undergoing B/F/B cycles demonstrates that these translocations are one of the most common mechanisms for telomere addition during B/F/B cycles (unpublished observation). The presence of short

Figure 2. Types of chromosome rearrangements resulting from telomere loss and B/F/B cycles. The first event following telomere loss is degradation of the end of the chromosome, which is then followed by either the addition of a new telomere producing relatively small terminal deletions, or fusion of sister chromatids after DNA replication. Due to the presence of two centromeres, the fused sister chromatids then break during anaphase, leading to the formation of inverted repeats on the end of the chromosome in one daughter cell and a terminal deletion in the other. If the chromosome fails to acquire a new telomere, these chromosomes will again be without a telomere in the next cell cycle, leading to additional B/F/B cycles and further DNA amplification. B/F/B cycles can lead to nonreciprocal translocations, which result in the loss of the telomere on the donor chromosome, transferring the B/F/B cycles to these chromosomes. Fusion of the marker chromosome with other chromosomes can also result in transfer of the amplified region to these chromosomes, which can then also initiate B/F/B cycles. Finally, looping out of the amplified DNA as a result of the inherent instability of inverted repeats can also lead to the formation of DM chromosomes that can be involved in high-copy gene amplification or reintegrate back into other chromosomes.
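The fusion-and-breakage step can be played out on a toy chromosome. This Python sketch is a deliberate simplification of our own, not the review's model: an arm is a list of labelled segments ("-X" marks an inverted copy of segment X), end degradation and telomere healing are ignored, and in `simulate_bfb` each bridge breaks at a uniformly random point. Even so, it reproduces the two outcomes named in the caption: inverted repeats at the end of one daughter and a terminal deletion in the other.

```python
import random


def invert(segments):
    """Reverse the order of segments and flip the orientation of each one."""
    return [s[1:] if s.startswith("-") else "-" + s for s in reversed(segments)]


def fuse_and_break(arm, cut):
    """One toy breakage/fusion/bridge step on a chromosome arm lacking a telomere.

    arm[0] lies next to the centromere and arm[-1] is the unprotected end.
    After replication the sister chromatids fuse end to end, giving the
    dicentric sequence arm + invert(arm); at anaphase the bridge breaks after
    position `cut`. Each daughter is read outward from its own centromere.
    """
    dicentric = arm + invert(arm)
    if not 1 <= cut < len(dicentric):
        raise ValueError("the break must fall between the two centromeres")
    return dicentric[:cut], invert(dicentric[cut:])


def simulate_bfb(arm, n_cycles, seed=0):
    """Follow one daughter through n_cycles, assuming it never regains a telomere."""
    rng = random.Random(seed)
    for _ in range(n_cycles):
        cut = rng.randint(1, 2 * len(arm) - 1)  # uniform break point (assumption)
        arm, _other = fuse_and_break(arm, cut)
    return arm


if __name__ == "__main__":
    d1, d2 = fuse_and_break(["A", "B", "C", "D"], 6)
    print(d1)  # ['A', 'B', 'C', 'D', '-D', '-C']: ends in an inverted repeat
    print(d2)  # ['A', 'B']: terminal deletion
```

Repeating the step on a daughter that stays telomere-free, as `simulate_bfb` does, lets duplicated inverted segments accumulate, which is the amplification route described in the caption.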


1170 BioEssays 26.11

Murnane & Sabatier, BioEssays 26:1164 (2004)


End-sequence profiling


1. sequence the ends of bacterial artificial chromosomes (BACs) carrying fragments of the tumour genome
2. map the end sequences to the reference genome
3. infer the rearrangements

B.Raphael et al.

[Fig. 1, panels (a)–(c): dot plots of BES pair coordinates (x, y) in the concatenated genome coordinate system (axes in bp, up to 3 × 10^9); (b) and (c) are enlarged views]

Fig. 1. (a) ESP data from the MCF7 tumor genome (June 1, 2003 dataset). Each point (x, y) corresponds to a BES pair, where x and y are the genomic coordinates in the human genome of the first and second read from the pair. The chromosomes are concatenated to form a single coordinate system, and for illustrative purposes, points are drawn in exaggerated scale. With maximum insert length L = 200 kb, 5856 out of a total of 6239 BES pairs satisfy the BAC length constraint (black points), while the remaining 383 invalid BES pairs correspond to composite and chimeric BACs. 256 out of 383 invalid BES pairs are isolated (red points), while 127 of the remaining invalid BES pairs form 30 clusters that suggest composite BACs (shown by slightly enlarged blue points that represent two or more invalid BES pairs). 5 out of 30 clusters (containing 15 BES pairs) are located near the main diagonal with y − x varying from L = 200 kb to 1.2 Mb. Such BES pairs may be signs of microrearrangements in the tumor genome (compare with Pevzner and Tesler (2003a)). Note that due to scaling issues, multiple clusters may appear as a single blue point in the figure. (b) Expanded view of region from chromosome 3. The two blue points form a cluster that appeared as a single blue point in (a). (c) Expanded view of a cluster of BES pairs that indicate a chromosome 1;17 translocation.
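The BAC length constraint in the caption amounts to a simple filter on each BES pair. This Python sketch makes simplifying assumptions of our own: x and y are positions in the concatenated genome coordinate system, read orientation is ignored, and a pair is valid when its ends lie within L of each other. L = 200 kb and the 1.2 Mb near-diagonal bound are taken from the caption; the example pairs are hypothetical.

```python
def classify_bes_pairs(pairs, max_insert=200_000):
    """Split BES pairs (x, y) into valid and invalid pairs.

    A pair is treated as valid when its two ends map within max_insert (L)
    of each other in the concatenated coordinate system; orientation ignored.
    """
    valid, invalid = [], []
    for x, y in pairs:
        if abs(y - x) <= max_insert:
            valid.append((x, y))
        else:
            invalid.append((x, y))
    return valid, invalid


def near_diagonal(invalid, max_insert=200_000, max_span=1_200_000):
    """Invalid pairs whose ends are between L and max_span apart
    (candidate microrearrangements, in the caption's terms)."""
    return [(x, y) for x, y in invalid if max_insert < abs(y - x) <= max_span]


if __name__ == "__main__":
    pairs = [(1_000, 150_000), (1_000, 500_000), (5_000_000, 2_000_000_000)]
    valid, invalid = classify_bes_pairs(pairs)
    print(len(valid), len(invalid))  # 1 2
    print(near_diagonal(invalid))  # [(1000, 500000)]
```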

[Fig. 2, panels (a) and (b): schematic of a genome G with blocks A–E and BES pairs (x1, y1), (x2, y2), (x3, y3), and of the rearranged genome G′ = ρG obtained by two reversals ρ1 and ρ2]

Fig. 2. Schematic view of ESP data with three BES pairs shown as arcs connecting BES elements represented as colored squares. (a) A sequence of two reversals ρ1, ρ2 produces G′ = ρG with all BES pairs valid, where ρ = ρ1 · ρ2. (b) A different sequence of two reversals ρ1, ρ2 produces G′ = ρG with all BES pairs valid, where ρ = ρ1 · ρ2. Thus, more than one rearrangement scenario is consistent with ESP data.

invalid BES pair in a single step. We say that BES pairs (x1, y1) and (x2, y2) are correlated if |x2 − x1| + |y2 − y1| ≤ 2L. It is easy to see that if (x1, y1) and (x2, y2) are invalid BES pairs and there exists ρ = ρs,t such that (ρx1, ρy1) and (ρx2, ρy2) are valid, then (x1, y1) and (x2, y2) are correlated. The converse is also true in certain cases, and one can construct examples of chains of correlated pairs. Our heuristic approach to analyzing MCF7 data is based on the observation that composite BACs sharing rearrangement breakpoints for the same reversals/translocations typically produce correlated BES pairs.† If an ESP project has a high BAC coverage, most pairs of rearrangement breakpoints are covered by two or more BACs, providing the possibility to recover pairs of breakpoints that describe all (or almost all) individual rearrangements. However, information about pairs of 'correlated' breakpoints is not sufficient for reconstruction of genomic architecture. For example, a correlated pair
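The correlation criterion can be turned into a grouping step: join any two invalid pairs that are correlated and take connected components, so that chains of correlated pairs end up in the same group. This is a hypothetical single-linkage sketch in Python, not the authors' heuristic; its all-pairs loop is quadratic and is only meant for small invalid sets such as the 383 reported for MCF7.

```python
def correlated(p, q, max_insert):
    """BES pairs p = (x1, y1) and q = (x2, y2) are correlated if
    |x2 - x1| + |y2 - y1| <= 2L."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1]) <= 2 * max_insert


def cluster_invalid_pairs(invalid, max_insert=200_000):
    """Connected components of the 'correlated' relation (union-find).

    Singleton groups correspond to isolated invalid pairs.
    """
    parent = list(range(len(invalid)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(invalid)):
        for j in range(i + 1, len(invalid)):
            if correlated(invalid[i], invalid[j], max_insert):
                parent[find(i)] = find(j)

    groups = {}
    for i, pair in enumerate(invalid):
        groups.setdefault(find(i), []).append(pair)
    return list(groups.values())


if __name__ == "__main__":
    invalid = [(1_000_000, 50_000_000), (1_100_000, 50_050_000),
               (900_000_000, 2_000_000_000)]  # hypothetical invalid pairs
    print(sorted(len(c) for c in cluster_invalid_pairs(invalid)))  # [1, 2]
```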

† At the time of writing this paper, the complexity status of the ESP Sorting Problem remains unknown. However, the sparse nature of the existing ESP data allowed us to come up with a heuristic that works well and leads to provably optimal solutions for existing ESP data. At the same time, our heuristic approach might fail in the case of ESP experiments with tens/hundreds of thousands of BACs, and more rearranged genomes than MCF7.



[Fig. 6: the nine rearranged human chromosomes (1, 2, 3, 9, 10, 14, 17, 20 and 22; left) and the MCF7 tumour chromosomes (right), drawn as sequences of numbered, oriented genomic blocks]

Fig. 6. The nine human chromosomes (left) that are rearranged in the MCF7 tumor genome (right), as derived from current ESP data. Other chromosomes are unchanged. Genomic blocks are color coded and oriented (indicated by arrows) according to their chromosomal locations and orientation in the human genome. Using the GRIMM program (Tesler, 2002), we find a parsimonious rearrangement scenario that produces the MCF7 genome by a sequence of 5 reversals and 15 translocations.

the locations of MCF7 breakpoints are correlated with the breakpoints in the human-mouse genomes. As a first step, we compared the 22 breakpoints that we derived in the MCF7 genome with those in the human-mouse genome comparisons. Some of the tumor breakpoints fall inside or close to the human-mouse breakpoint regions. However, with the small number of MCF7 breakpoints, it remains to be seen whether this is a real or chance correlation, and further investigation is required.



Algorithmic problem: find a rearrangement scenario that transforms the reference into the tumour genome
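The operations such a scenario is built from can be written down on a genome encoded as in Fig. 6, that is, as chromosomes made of oriented blocks. This Python sketch only applies operations it is given; it does not search for a shortest sequence of them, which is what GRIMM does. Chromosome names and block labels are illustrative, "-" marks a block in reverse orientation, and translocation is shown only in its tail-exchange form.

```python
def flip(block):
    """Reverse the orientation of one block ("1.2" <-> "-1.2")."""
    return block[1:] if block.startswith("-") else "-" + block


def reversal(genome, chrom, i, j):
    """Reverse blocks i..j-1 of one chromosome, flipping their orientation.

    genome: dict chromosome name -> list of signed block labels (not modified).
    """
    g = {c: list(blocks) for c, blocks in genome.items()}
    g[chrom][i:j] = [flip(b) for b in reversed(g[chrom][i:j])]
    return g


def translocation(genome, c1, i, c2, j):
    """Reciprocal translocation exchanging the tails c1[i:] and c2[j:]."""
    g = {c: list(blocks) for c, blocks in genome.items()}
    g[c1], g[c2] = g[c1][:i] + g[c2][j:], g[c2][:j] + g[c1][i:]
    return g


if __name__ == "__main__":
    ref = {"chr1": ["1.1", "1.2", "1.3"], "chr20": ["20.1", "20.2", "20.3"]}
    # A two-step scenario: one reversal, then one translocation.
    g = reversal(ref, "chr1", 1, 3)
    g = translocation(g, "chr1", 2, "chr20", 1)
    print(g)
    # {'chr1': ['1.1', '-1.3', '20.2', '20.3'], 'chr20': ['20.1', '-1.2']}
```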

Raphael et al., Bioinformatics 19:ii162 (2003)


Rearrangement inference


the breakpoint could be associated with the BAC. Out of 552 pooled BACs, at least one breakpoint could be assigned to 316 (57%) of them. The remaining BACs fall into the following two groups: first, in 129 (23%) cases, breakpoint assignment was inconclusive due to ambiguous mapping of reads onto the reference genome, mostly due to repetitive DNA regions, apparent overlaps between BACs, and other causes; second, in 107 (20%) cases, a single outlining block connected BAC ends, thus indicating lack of any rearrangement, contrary to previous reports (Volik et al. 2003, 2006).

To examine the source of the disagreement with the previous reports, the 107 disagreements were examined in detail. Most of the disagreements could be explained either by the differences between reference genome assemblies used in the previous and current studies or by mismapping of BAC-end sequence reads or by a combination of the two factors. Assemblies used in the previous studies were NCBI Build 30 of June 2002 (Volik et al. 2003) and NCBI Build 34 of July 2003 (Volik et al. 2006), while our study employed NCBI Build 36 of March 2006. The newer assembly is more likely to be more correct and complete, but some of the disagreements may also be explained by the presence of different structural alleles at sites of structural polymorphisms. The disagreements tended to occur in regions containing low copy repeats (LCRs). For example, Volik et al. (2003) identified MCF-7 BAC 9I10 as bridging apparent translocation t(11;11)(p11.12;q14.3) and apparently confirmed the rearrangement by fluorescent in situ hybridization (FISH). Examination of Build 36 reveals copies of an LCR at both 11p11.12 and 11q14.3. The LCR was absent from Builds 30 and 34, thus explaining the aberrant BAC-end sequence mapping and even the erroneous "confirmation" by FISH.

Examination of breakpoint sequences reveals signatures of DSB repair

To examine breakpoints at the sequence level, all the 157 breakpoint-spanning amplicons were used as substrates for sequencing from both ends. Most amplicons were of small enough size (less than 1 kb on average), allowing the Sanger read from at least one of the ends to reach the breakpoint. Difficulty of sequencing across breakpoints has been documented (Lee et al. 2007; Liu and Carson 2007), especially in repeat-rich regions. To ameliorate the problem, we sequenced DNA from specific BAC pools and employed nested sequencing primers in cases of first-pass sequencing failures. Breakpoint-straddling sequence could be obtained from 86 (55%) amplicons and could not be obtained for the remaining 71 (45%). Many of the failures were due to inability to design unique primers for sequencing across breakpoints that fall within repeat-rich regions.

Examination of 86 breakpoints that could be resolved to the base pair level (summarized in the chart in the middle of Fig. 2B)

Figure 1. (A) An illustration of the principle of the method. Breakpoints within a BAC containing segments from chromosomes 20, 3, and 17 are detected using a combination of "bridging" and "outlining" steps. The bridging step maps fosmid end-sequences onto the reference genome. The outlining step maps short tags (labeled "PyroSeqs") using 454 technology from the BAC (in practice a pool of BACs) onto the reference genome. The results of bridging and outlining jointly allow precise mapping of breakpoints and reconstruction of rearranged BACs. (B) Organization of the mapping experiment. The nonredundant collection of 552 rearrangement-containing BACs, 17 normal BAC negative controls, and seven positive controls was arrayed in six 96-well plates and pooled as indicated. Three 454 sequencing reactions (involving BACs pooled from plate pairs) produced tags for the purpose of outlining. Six fosmid libraries (one from each 96-well plate pool of BACs) were constructed for Sanger-based sequencing of fosmid ends and bridging. (C) Bar charts detailing the classification of detected MCF-7 breakpoints.

A sequence-level map of breakpoints in MCF-7

Genome Research 169, www.genome.org

Downloaded from genome.cshlp.org on January 5, 2014 - Published by Cold Spring Harbor Laboratory Press

Figure 2. (A) Circular visualization of the MCF-7 genome obtained using Circos software. Chromosomes are individually colored with centromeres in white and LCR regions in black. MCF-7 BAC array comparative genome hybridization data (Jonsson et al. 2007) are plotted with gains in green and losses in red using log2 ratio. The inner chromosome annotations depict 157 somatic MCF-7 breast tumor chromosomal rearrangements associated with LCRs (black) and breakpoints not associated with LCRs (green). Chromosomal rearrangements are depicted on each side of the MCF-7 breakpoints; intrachromosomal rearrangements (blue) are located outside and interchromosomal rearrangements (red) are located in the center of the circle. (B) Bar charts indicating classification of somatic breakpoints in MCF-7.

Hampton et al.


(blue, red: intra- and interchromosomal rearrangements; green, blue: rearrangements between repeated or non-repeated regions)

Hampton et al., Genome Research 19:167 (2009)


Cancer evolution


Mutational signatures: Patterns of mutations that are characteristic of a type of cancer or that are indicative of a specific process.

Chromothripsis: A single event that causes genome shattering and reassembly, resulting in a characteristic pattern of oscillating copy number and up to several hundred genomic rearrangements localized to one or a few chromosomes.

time points in the life history of a cancer: for example, at diagnosis, relapse and metastasis (7). A limited number of published studies have included samples that are separated by both space and time (9). The biological question posed and the clinical feasibility largely determine the sampling strategy.

Single-cell sequencing. Single-cell sequencing is a potentially useful approach towards the study of cancer evolution and is the ultimate resolution of the multi-sampling approach. In proof-of-principle studies, this approach has been successfully applied to generate catalogues of point mutations in protein-coding regions and copy number changes (10,13,14). These approaches have a requirement for whole-genome amplification of the genome of each cell, and this introduces several biases, with the potential for both false-positive and false-negative mutation calls. For haematological malignancies, in situ hybridization techniques allow single cells to be studied for cytogenetic abnormalities (15), and it is feasible that in the future, microfluidic techniques will allow cells to be isolated and analysed in one step for solid tumour samples as well (16,17). The ability to make inferences about phylogenetic structure using single-cell sequencing will, however, still be fundamentally limited by how representative the biopsy sample is of the whole-tumour bulk and by how many cells are individually analysed.

Mathematical algorithms. Mathematical models have been widely applied in an attempt to unpick the complex and multifactorial influences on cancer progression18–20.

Massively parallel sequencing data are particularly amenable to mathematical analysis because they represent a random sample of DNA molecules, and hence of individual cancer cell genomes, within a tumour specimen (BOX 2). Statistical algorithms for exploiting these properties have been developed, providing important insights into the clonal mix of the sample sequenced. For example, using the fraction of reads reporting a point mutation, the copy number at that locus and the level of normal cell contamination, we can work out whether the mutation is likely to be clonal or subclonal and whether the mutation has been duplicated by a subsequent copy number change7,21,22–24. Within a given copy number segment, this mandates a clear temporal precedence. The earliest mutations are those that are subsequently duplicated, followed by those that are clonal but that are present on a single copy of the locus and then by those that are subclonal. This allows inferences about the relative timing of the copy number gain and about the changing mutational signatures that are operative in the different epochs22,25.
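The clonal/subclonal reasoning above can be sketched in a few lines. This is a minimal illustration, not the algorithm of any cited study: it assumes a tumour of purity p with integer tumour copy number N_t at the locus, a normal copy number of 2, and a mutation carried on m of those copies in a fraction f of cancer cells, so that the expected variant allele fraction (VAF) is p·f·m / (p·N_t + 2(1 − p)). The function names and the tolerance `tol` are hypothetical choices.

```python
def expected_vaf(ccf, mult, purity, cn_tumour, cn_normal=2):
    """Expected variant allele fraction (VAF) under a simple tumour/normal mixture.

    ccf: fraction of cancer cells carrying the mutation (cancer cell fraction)
    mult: number of tumour copies of the locus carrying the mutation (multiplicity)
    """
    copies = purity * cn_tumour + (1 - purity) * cn_normal
    return purity * ccf * mult / copies


def classify_mutation(vaf, purity, cn_tumour, cn_normal=2, tol=0.15):
    """Return (label, multiplicity, ccf) for one point mutation (illustrative sketch).

    For each possible multiplicity m = 1..cn_tumour, invert expected_vaf to get
    the implied CCF and keep the m whose CCF is closest to 1 (i.e. clonal).
    The cut-off `tol` is an arbitrary assumption.
    """
    if cn_tumour < 1:
        raise ValueError("no tumour copies at this locus")
    copies = purity * cn_tumour + (1 - purity) * cn_normal
    candidates = [(m, vaf * copies / (purity * m)) for m in range(1, cn_tumour + 1)]
    mult, ccf = min(candidates, key=lambda mc: abs(mc[1] - 1))
    if ccf < 1 - tol:
        label = "subclonal"          # latest: present in only some cancer cells
    elif mult > 1:
        label = "clonal-duplicated"  # earliest: already present before the copy gain
    else:
        label = "clonal-single"      # clonal, but on a single copy of the locus
    return label, mult, ccf


if __name__ == "__main__":
    purity, cn = 0.8, 3  # hypothetical sample: 80% tumour cells, 3-copy gain
    for ccf, mult in [(1.0, 2), (1.0, 1), (0.4, 1)]:
        vaf = expected_vaf(ccf, mult, purity, cn)
        print(round(vaf, 3), classify_mutation(vaf, purity, cn))
```

With purity 0.8 and a three-copy gain, a clonal mutation on two copies, a clonal mutation on one copy and a subclonal mutation in 40% of cancer cells have expected VAFs of about 0.57, 0.29 and 0.11, and the sketch recovers the three labels in the temporal order described in the text. Real read counts are binomially noisy, so a practical version would compare likelihoods rather than point estimates.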

With the exception of more complex processes such as chromothripsis (discussed below), genomic rearrangements generally represent simple events (such as deletions or inversions), occurring over the evolutionary time course of a cancer. Mathematically, these rearrangements can be considered as sequential selections from a known library of genomic transformations. Remarkably, the constraints imposed by the simplicity of the repertoire of possible rearrangement types, the genome-wide, allele-specific copy number data and the observed breakpoints mean that even deeply complex clusters of rearrangements can be disentangled to yield both the final genomic configuration of segments and the temporal order in which the rearrangements occurred26.

Mutations occur in a given genomic context, and this can also be exploited to understand cancer evolution. In particular, mutations can be ‘phased’ with nearby heterozygous germline SNPs, allowing haplotype-specific analysis of clonal and subclonal mutations24. Furthermore, pairs of mutations can be phased relative to one another, allowing patterns of branching and subclonal evolution to be delineated5,24 (BOX 2). Although such approaches are currently limited to samples with hypermutable regions or with a high mutation burden, the increasing read lengths coming in future generations of single-molecule sequencers will vastly expand the power of this approach.
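The pairwise phasing idea can be illustrated with a small, hypothetical helper. It assumes every read covers both mutation sites and has already been assigned to the same parental haplotype (for instance through a nearby heterozygous germline SNP), and it uses an arbitrary `min_reads` threshold as a crude error filter; the returned labels are illustrative, not a published classification.

```python
from collections import Counter


def order_from_phased_reads(reads, min_reads=2):
    """Relate two somatic mutations A and B using reads that cover both sites.

    reads: iterable of (has_a, has_b) booleans, one per read, all assumed to come
    from the same parental haplotype. A read pattern counts as observed only if it
    has at least `min_reads` supporting reads (a crude guard against errors).
    """
    counts = Counter(reads)
    seen = {pattern for pattern, n in counts.items() if n >= min_reads}
    both = (True, True) in seen
    a_only = (True, False) in seen
    b_only = (False, True) in seen
    if both and a_only and b_only:
        return "inconsistent"  # not expected on one haplotype under infinite sites
    if both and a_only:
        return "B_after_A"     # B arose in a cell that already carried A
    if both and b_only:
        return "A_after_B"     # A arose in a cell that already carried B
    if both:
        return "cooccur"       # always together: order cannot be resolved
    if a_only and b_only:
        return "branching"     # never together: distinct subclones
    return "unresolved"


if __name__ == "__main__":
    reads = [(True, True)] * 5 + [(True, False)] * 4 + [(False, False)] * 6
    print(order_from_phased_reads(reads))
```

The "inconsistent" case could reflect sequencing error, a recurrent mutation or a later loss of one allele; a real analysis would model these explicitly rather than rely on a read-count threshold.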

The heterogeneous cancer genome

The cancer genome is characterized by heterogeneity that is seen across tumour types, among cases of a particular tumour type and even within an individual cancer. This heterogeneity reflects the action of the twin evolutionary forces of variation generation and selection. The extent of genomic variability is testament to the diverse and dynamic nature of these forces.

The heterogeneity of cancer genes. Massively parallel sequencing has enabled us to construct nearly comprehensive catalogues of every mutation within an

Figure 1 | The evolution of clonal populations. Cancers are genomically diverse and dynamic entities. Unique clones (represented by different coloured bubbles) emerge as a consequence of accumulating driver mutations in the progeny of a single most recent common ancestor (MRCA) cell. Ongoing linear and branching evolution results in multiple simultaneous subclones that may individually be capable of giving rise to episodes of disease relapse and metastasis. The dynamic clonal architecture is shaped by mutation and competition between subclones in light of environmental selection pressures, including those that are exerted by cancer treatments.

[Figure 1 panel labels: normal cell, MRCA cell, driver mutations, distant metastasis; time axis with time point X (diagnosis and treatment initiation) and time point Y (distant and local relapse).]

REVIEWS

796 | NOVEMBER 2012 | VOLUME 13 www.nature.com/reviews/genetics

© 2012 Macmillan Publishers Limited. All rights reserved

How can it be inferred?
1. samples across time and space
2. single-cell samples

Yates & Campbell, Nature Reviews Genetics 13:795 (2012)


Example: single cells


are believed to play an important role in cancer pathogenesis (31). Specifically, we focused on exon 8, which has been reported to contain a "mutational hotspot" in codon 270 (32). We amplified and sequenced a 240 bp fragment spanning the mutational hotspot and analyzed the sequences. The tail-clipping DNA contained no mutations, whereas some of the single cells contained the same specific C→T mutation in codon 270 (Fig. 4A). In the subset of cells obtained from tumor foci, 10 cells failed to amplify completely, 5 cells showed only the mutant genotype, 6 showed only the wild-type (wt) genotype, and in 2 cells (M-3 and M-14), both mutant and wt genotypes were detected (Fig. 5). In the subset of cells isolated from normal epithelium, five cells failed to amplify completely, eight showed only the wt genotype, and cell R-10, which was originally labeled as "normal" but which likely represents a case of "mistaken identity" (see Discussion), was heterozygous for the mutation, displaying both the mutant and wt genotypes. There are several possible explanations for the finding of identical mutations in some, but not all, tumor cells. The mutation may have arisen as multiple independent events or as a single early event. It may be present in all tumor cells but only detectable in some cells due to allele dropout, or conversely, it may be present in only a subset of cells; in the latter scenario, the subset may be a subclone of the tumor or a nonclonal subset. The wt genotypes may represent nonmutated alleles or alleles that reverted to the wt state by a second mutation. Of all the possibilities outlined above, the most likely is that the mutation arose as a single event in one of the alleles of the primary tumor cell, was present in all tumor cells in the heterozygous state, but was not detected in all cells due to allele dropout. A discussion of the likelihood of all the possibilities and a statistical analysis supporting the most likely possibility are presented in Supplementary Text S4.
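The allele-dropout explanation can be checked against the counts quoted above with a toy model. This is an assumption of ours, not the authors' Supplementary Text S4 analysis: every tumor cell is heterozygous, and each of the two alleles independently fails to amplify with the same probability d. The function name is hypothetical.

```python
def fit_dropout(n_fail, n_mut_only, n_wt_only, n_both):
    """Maximum-likelihood per-allele dropout rate for cells assumed heterozygous.

    Toy model: each allele fails to amplify independently with probability d, so
    P(no product) = d**2, P(mutant only) = P(wt only) = d*(1-d), P(both) = (1-d)**2.
    Setting the derivative of the multinomial log-likelihood to zero gives
    d = dropped alleles / total alleles.
    """
    n = n_fail + n_mut_only + n_wt_only + n_both
    d = (2 * n_fail + n_mut_only + n_wt_only) / (2 * n)
    expected = {
        "fail": n * d * d,
        "mut_only": n * d * (1 - d),
        "wt_only": n * d * (1 - d),
        "both": n * (1 - d) ** 2,
    }
    return d, expected


if __name__ == "__main__":
    # Counts for the cells from tumor foci, as quoted in the excerpt.
    d, expected = fit_dropout(n_fail=10, n_mut_only=5, n_wt_only=6, n_both=2)
    print(round(d, 3), {k: round(v, 1) for k, v in expected.items()})
```

For the 23 tumor-focus cells (10, 5, 6 and 2), this gives d ≈ 0.67 and expected counts of about 10.4, 5.1, 5.1 and 2.4, close to the observed ones. Under this toy model the data are therefore compatible with every tumor cell being heterozygous, although the model ignores cell-to-cell variation in amplification efficiency.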

Discussion

This work shows a new approach for studying cancer. Analysis of 37 single cancer and adjacent normal cells was sufficient to establish the monoclonal origin of the tumor cells, calculate depth of cancerous cells and normal lung epithelium cells, calculate the

Figure 4. Cancer cell lineage reconstruction. A, reconstructed cell lineage tree. All cells from the mediastinal tumor mass (M; red), and from cancer foci in the right (R; orange) and left (L; yellow) lungs are clustered on the same subtree (highlighted gray), whereas most cells from the normal epithelium of the right (R; blue) and left (L; blue) lungs are clustered outside this subtree. Cells R-10 and R-22, which were positioned on the border region between the tumor focus and adjacent normal tissue in the right lung, are colored gray. Founder cell of the cancer subtree is indicated. The vertical axis on the left represents estimated cell depth (number of cell divisions since the zygote). B, correlation between lineage and physical distances. The mean lineage distance between cells from the same tissue section is the smallest, followed by cells from different sections in the same focus, and cells from different foci. C, mean depth of cancer cells is almost twice the mean depth of normal lung epithelial cells but slightly less than the expected depth of B lymphocytes in an animal of the same age. D, tissue section from the right lung. Dashed white line, the border between the cancerous and normal tissue. Cells R-24, R-25, R-26, and R-27 are outlined.

Cancer Research

Cancer Res 2008; 68: (14). July 15, 2008 5928 www.aacrjournals.org

1. microdissection of isolated cells
2. cloning + genotyping (120 loci)
3. phylogenetic tree (neighbor-joining)

Frumkin et al., Cancer Research 68:14 (2008)
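Step 3 above relies on neighbor-joining. The sketch below is a plain Python version of the classic Saitou–Nei algorithm applied to a precomputed distance matrix between cells; how Frumkin et al. derived distances from their 120 loci is not reproduced here, and the function names and internal node labels are hypothetical.

```python
from itertools import combinations


def neighbor_joining(dist, names):
    """Neighbor-joining (Saitou & Nei, 1987) on a symmetric distance matrix.

    Returns the unrooted tree as a list of edges (node, node, branch_length).
    Internal nodes are named "_n0", "_n1", ... (assumed not to clash with leaves).
    """
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    active, edges, counter = list(names), [], 0
    while len(active) > 2:
        n = len(active)
        r = {a: sum(d[a, x] for x in active) for a in active}
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        u = f"_n{counter}"
        counter += 1
        li = d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, li), (u, j, d[i, j] - li)]
        active = [x for x in active if x not in (i, j)]
        for x in active:  # distances from the new internal node to the remaining nodes
            d[u, x] = d[x, u] = (d[i, x] + d[j, x] - d[i, j]) / 2
        d[u, u] = 0.0
        active.append(u)
    a, b = active
    edges.append((a, b, d[a, b]))
    return edges


def tree_distance(edges, src, dst):
    """Sum of branch lengths on the path between two nodes of the tree."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    stack = [(src, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == dst:
            return total
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, total + w))
    raise KeyError(dst)


if __name__ == "__main__":
    # Toy additive distance matrix between five hypothetical cells.
    names = ["a", "b", "c", "d", "e"]
    dist = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]
    for edge in neighbor_joining(dist, names):
        print(edge)
```

On this toy additive matrix, path lengths in the returned tree reproduce every input distance, which is the property neighbor-joining guarantees for tree-like distances. With real genotype data the distances are noisy, and NJ can return slightly negative branch lengths that usually need clamping.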


Example: multiple regions


Intratumor Heterogeneity Revealed by Multiregion Sequencing

n engl j med 366;10 nejm.org march 8, 2012 887

tion through loss of SETD2 methyltransferase function driven by three distinct, regionally separated mutations on a background of ubiquitous loss of the other SETD2 allele on chromosome 3p.

Convergent evolution was observed for the X-chromosome–encoded histone H3K4 demethylase KDM5C, harboring disruptive mutations in R1 through R3, R5, and R8 through R9 (missense and frameshift deletion) and a splice-site mutation in the metastases (Fig. 2B and 2C).

mTOR Functional Intratumor Heterogeneity

The mammalian target of rapamycin (mTOR) kinase carried a kinase-domain missense mutation (L2431P) in all primary tumor regions except R4. All tumor regions harboring mTOR (L2431P) had

[Figure 2 (Gerlinger et al.): A, biopsy sites in the primary tumor (regions R1 to R9, graded G1 to G4) and in the perinephric (M1), chest-wall (M2a, M2b) and lung metastases; B, regional distribution of mutations across normal tissue, PreP, PreM, R1 to R5, R8, R9, M1, M2a and M2b, by category (ubiquitous, shared primary, shared metastasis, private); C, phylogenetic relationships of tumor regions, with VHL on the trunk and branch mutations in SETD2 (missense, splice site, frameshift), KDM5C (missense and frameshift, splice site) and mTOR (missense); D, ploidy profiling by propidium iodide staining (number of cells; DI = 1.43, DI = 1.81, tetraploid).]

The New England Journal of Medicine Downloaded from nejm.org at UNIVERSITE DE MONTREAL on December 13, 2013. For personal use only. No other uses without permission.

Copyright © 2012 Massachusetts Medical Society. All rights reserved.

1. multi-region sampling
2. whole-exome sequencing
3. phylogenetic analysis

Gerlinger et al., New England Journal of Medicine 366:883 (2012)
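The mutation categories of panel B can be derived mechanically from the set of regions in which each mutation is detected. The sketch below assumes our own reading of the legend (ubiquitous = every region, private = exactly one region, shared primary = several primary regions only, shared metastasis = shared and including a metastasis); the paper's exact definitions may differ, and the function name and example data are hypothetical.

```python
def classify_by_regions(presence, primary, metastases):
    """Label each mutation by the set of tumour regions in which it was detected.

    presence: mutation -> set of region names where it was called
    primary, metastases: sets of region names
    The category definitions are an illustrative reading of the figure legend.
    """
    all_regions = primary | metastases
    labels = {}
    for mutation, regions in presence.items():
        if not regions:
            labels[mutation] = "absent"
        elif regions >= all_regions:
            labels[mutation] = "ubiquitous"         # trunk of the phylogeny
        elif len(regions) == 1:
            labels[mutation] = "private"            # a single region
        elif regions <= primary:
            labels[mutation] = "shared primary"     # several primary regions only
        else:
            labels[mutation] = "shared metastasis"  # shared, including a metastasis
    return labels


if __name__ == "__main__":
    primary, mets = {"R1", "R2", "R3"}, {"M1"}
    presence = {  # hypothetical mutations and region sets
        "m1": {"R1", "R2", "R3", "M1"},
        "m2": {"R2"},
        "m3": {"R1", "R3"},
        "m4": {"R3", "M1"},
    }
    print(classify_by_regions(presence, primary, mets))
```

In practice, a mutation missed in one region because of low coverage would be misclassified, so real pipelines typically re-examine read evidence at every called position in every region before assigning categories.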


Multiple regions



(blue = absent; grey = present)


Gerlinger et al., New England Journal of Medicine 366:883 (2012)
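A presence/absence matrix like this one can be checked for tree-compatibility before building a phylogeny. Under an infinite-sites assumption with no back-mutation and a mutation-free root (normal tissue), two mutations fit on one rooted tree only if the sets of regions carrying them are nested or disjoint; this is a classic perfect-phylogeny test, not necessarily the method used in the paper. The function names and example data are hypothetical.

```python
from itertools import combinations


def compatible(s, t):
    """Rooted perfect-phylogeny condition for two mutations: region sets nested or disjoint."""
    return s <= t or t <= s or not (s & t)


def conflicts(presence):
    """Pairs of mutations (by name) whose region sets cannot both lie on one rooted tree."""
    items = sorted(presence.items(), key=lambda kv: kv[0])
    return [(a, b) for (a, s), (b, t) in combinations(items, 2) if not compatible(s, t)]


def parents(presence):
    """Assuming no conflicts: parent of each mutation = smallest strict superset (None = root)."""
    out = {}
    for m, s in presence.items():
        above = [(len(t), o) for o, t in presence.items() if o != m and s < t]
        out[m] = min(above)[1] if above else None
    return out


if __name__ == "__main__":
    presence = {  # hypothetical mutations and the regions where they are present
        "trunk": {"R1", "R2", "M1"},
        "x": {"R1", "R2"},
        "y": {"R1"},
        "z": {"M1"},
    }
    print(conflicts(presence), parents(presence))
```

Convergent evolution such as the three distinct SETD2 mutations is not a conflict at the mutation level: each distinct mutation is its own character, and the convergence only becomes visible when mutations are grouped by gene.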


Chemotherapy


Understanding the effect of treatments: selection acting on clonal lineages within the tumour

(8 patients with acute myeloid leukaemia; whole-genome sequencing)

be present in virtually all the tumour cells at presentation and at relapse, as the variant frequency of these mutations is ~40–50%. Clone 2 (with cluster 2 mutations) and clone 3 (with cluster 3 mutations) must be derived from clone 1, because virtually all the cells in the sample contain the cluster 1 mutations (Fig. 2a). It is likely that a single cell from clone 3 gained a set of mutations (cluster 4) to form clone 4: these survived chemotherapy and evolved to become the dominant clone at relapse. We do not know whether any of the cluster 4 mutations conferred chemotherapy resistance; although none had translational consequences, we cannot rule out a relevant regulatory mutation in this cluster.

Assuming that all the mutations detected are heterozygous in the primary tumour sample (with a malignant cellular content at 93.72% for the primary bone marrow sample, see Supplementary Information), we were able to calculate the fraction of total malignant cells in each clone. Clone 1 is the founding clone; 12.74% of the tumour cells contain only this set of mutations. Clones 2, 3 and 4 evolved from clone 1. The additional mutations in clones 2 and 3 may have provided a growth or survival advantage, as 53.12% and 29.04% of the tumour cells belonged to these clones, respectively. Only 5.10% of the tumour cells were from clone 4, indicating that it may have arisen last (Fig. 2a). However, the relapse clone evolved from clone 4. A single clone containing all of the cluster 5 mutations was detected in the relapse sample; clone 5 evolved from clone 4, but gained 78 new somatic alterations after sampling at day 170. As all mutations in clone 5 appear to be present in all relapse tumour cells, we suspect that one or more of the mutations in this clone provided a strong selective advantage that contributed to relapse. The ETV6 mutation, the MYO18B mutation, and/or the WNK1-WAC fusion are the most likely candidates, as ETV6, MYO18B and WAC are recurrently mutated in AML.
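The clone fractions quoted above follow from a simple conversion, sketched here under the excerpt's own assumption that all mutations are heterozygous. We additionally assume copy-neutral loci and a known clone tree (clones 2 and 3 derived from clone 1, clone 4 from clone 3, as stated in the text). The function name is hypothetical and the VAF values in the example are illustrative, not the paper's data.

```python
def clone_fractions(cluster_vaf, parent, purity):
    """Prevalence and exclusive fraction of each clone among malignant cells.

    cluster_vaf: clone -> mean VAF of the mutation cluster that defines it
    parent: clone -> parent clone (None for the founding clone)
    purity: fraction of malignant cells in the sample
    Assumes heterozygous mutations at copy-neutral loci, so a cluster carried by a
    fraction c of malignant cells has VAF ~ purity * c / 2.
    """
    prevalence = {c: min(1.0, 2 * v / purity) for c, v in cluster_vaf.items()}
    exclusive = dict(prevalence)
    for child, par in parent.items():
        if par is not None:
            # Cells of a child clone also carry the parent's mutations,
            # so they are removed from the parent's "only this set" fraction.
            exclusive[par] -= prevalence[child]
    return prevalence, exclusive


if __name__ == "__main__":
    vafs = {1: 0.45, 2: 0.20, 3: 0.15, 4: 0.05}  # hypothetical cluster VAFs
    tree = {1: None, 2: 1, 3: 1, 4: 3}           # topology described in the text
    prev, excl = clone_fractions(vafs, tree, purity=0.9)
    for clone in sorted(excl):
        print(clone, round(100 * prev[clone], 1), round(100 * excl[clone], 1))
```

The exclusive fractions always sum to the founding clone's prevalence (here 100%), which is consistent with the four percentages in the text (12.74 + 53.12 + 29.04 + 5.10 = 100).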

We evaluated the mutation clusters in the seven additional primary tumour–relapse pairs by assessing peaks of allele frequency using kernel density estimation (Supplementary Fig. 11 and Supplementary Information). We thus inferred the numbers and malignant fractions of clones in each primary tumour and relapse sample. Similar to UPN 933124, multiple mutation clusters (2–4) were present in each of the primary tumours from four patients (UPN 869586, UPN 426980, UPN 452198 and UPN 758168). However, only one major cluster was detected in each of the primary tumours from the three other patients (UPN 804168, UPN 573988 and UPN 400220) (Fig. 1c and Supplementary Table 10). Importantly, all eight patients gained relapse-specific mutations, although the number of clusters in the relapse samples varied (Fig. 1).
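Finding mutation clusters as peaks of the allele-frequency distribution can be illustrated with a one-dimensional Gaussian kernel density estimate. This is a minimal pure-Python sketch: the kernel, the fixed bandwidth and the grid step are our assumptions (the paper's actual settings are in its supplement), and the VAF values are hypothetical.

```python
import math


def kde_peaks(vafs, bandwidth=0.03, step=0.005):
    """Positions of local maxima of a Gaussian kernel density estimate on [0, 1].

    bandwidth and step are hand-picked assumptions, not values from the paper.
    """
    n_grid = int(round(1 / step)) + 1
    grid = [i * step for i in range(n_grid)]
    scale = 1.0 / (len(vafs) * bandwidth * math.sqrt(2 * math.pi))
    density = [scale * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in vafs)
               for x in grid]
    # A grid point is a peak if it rises above its left neighbour
    # and is not below its right neighbour (avoids double-counting plateaus).
    return [grid[i] for i in range(1, n_grid - 1)
            if density[i] > density[i - 1] and density[i] >= density[i + 1]]


if __name__ == "__main__":
    # Hypothetical VAFs forming two clusters (around 0.20 and 0.45).
    vafs = [0.18, 0.19, 0.20, 0.21, 0.22, 0.43, 0.44, 0.45, 0.46, 0.47]
    print([round(p, 3) for p in kde_peaks(vafs)])
```

The bandwidth controls how many peaks appear: too small and every mutation becomes its own cluster, too large and distinct clones merge. Real analyses also correct VAFs for local copy number before clustering.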

[Figure 2 (Ding et al.): a, clonal fractions in AML1/UPN 933124 at initial diagnosis, day 170 and first relapse (12.74%, 53.12%, 29.04% and 5.10% of tumour cells), with founding mutations in DNMT3A, NPM1, FLT3, PTPRT and SMC3 and relapse alterations in ETV6, MYO18B and a WNK1-WAC fusion; legend: founding (cluster 1), primary specific (cluster 2), relapse enriched (clusters 3 and 4) and relapse specific (cluster 5) mutations, random mutations in HSCs. b, relapse versus primary tumour variant frequency (%) for AML40/UPN 804168 (Model 1: UPNs 400220, 573988, 804168) and AML43/UPN 869586 (Model 2: UPNs 426980, 452198, 758168, 869586, 933124).]

Figure 2 | Graphical representation of clonal evolution from the primary tumour to relapse in UPN 933124, and patterns of tumour evolution observed in eight primary tumour and relapse pairs. a, The founding clone in the primary tumour in UPN 933124 contained somatic mutations in DNMT3A, NPM1, PTPRT, SMC3 and FLT3 that are all recurrent in AML and probably relevant for pathogenesis; one subclone within the founding clone evolved to become the dominant clone at relapse by acquiring additional mutations, including recurrent mutations in ETV6 and MYO18B, and a WNK1-WAC fusion gene. HSC, haematopoietic stem cell. b, Examples of the two major patterns of tumour evolution in AML. Model 1 shows the dominant clone in the primary tumour evolving into the relapse clone by gaining relapse-specific mutations; this pattern was identified in three primary tumour and relapse pairs (UPN 804168, UPN 573988 and UPN 400220). Model 2 shows that a minor clone carrying the vast majority of the primary tumour mutations survived and expanded at relapse. This pattern was observed in five primary tumour and relapse pairs (UPN 933124, UPN 452198, UPN 758168, UPN 426980 and UPN 869586).

RESEARCH LETTER

508 | NATURE | VOL 481 | 26 JANUARY 2012

Macmillan Publishers Limited. All rights reserved©2012

Ding et al., Nature 481:506 (2012)


Different histories


The history can be linear or divergent; convergence is frequent (similar selection effects).

Box 1 | Phylogenetic cancer trees

A phylogenetic tree is a pictorial representation of how a tumour is inferred to have evolved. As discussed in the text, these inferences can be based on a wide range of molecular biology and sampling techniques coupled with existing and new bioinformatics algorithms for reconstructing the tree. Several key properties of the evolution of a tumour are coded in the tree and provide important biological information about the genetic diversity of a cancer and clonal mix.

All trees have a shared ‘trunk’, which represents the complement of mutations shared by all malignant cells within the cancer. Because these mutations are fully clonal, there must have been a single ancestral cell that carried all of these mutations and through which all extant tumour cells can trace their lineage; we denote this cell the ‘most recent common ancestor’, borrowing the term from population genetics. Emergence of this cell initiated the final complete selective sweep within the cancer: all clonal expansions thereafter are, by definition, incomplete. All mutations that occur after the most recent appearance of a common ancestor are subclonal.

The length of individual branches (and the trunk) denotes the number of mutations that occur in that lineage: a so-called ‘molecular clock’. If mutation rates per unit time were constant, then this would correlate with chronological time. However, for many cancers, this assumption is probably invalid (as discussed in the text), and molecular time is likely to be a poor proxy for chronological time.

The branching structure of the tree captures the number of subclonal populations within the cancer samples and their genetic relationships. For example, both linear and branching patterns of evolution have been described in a range of cancers. Linear evolution (panel a of the figure) was described in acute myeloid leukaemia (AML) and identifies the post-treatment relapse clone as a direct descendant of the major clone. The tree in panel b demonstrates branching evolution and, specifically, convergent evolution, in which the same genetic consequence emerges independently in separate clades of the phylogenetic tree (highlighted by green boxes containing recurrently mutated genes). Brown circles represent cytogenetically distinct populations, and the numbers represent the number of copies of each adjacent gene. Solid lines represent the most likely ancestral origin of subclones, whereas dashed lines suggest alternative origins.

As sequencing goes genome-wide, phylogenies have been constructed for single-tumour samples that are composed of multiple constituent cellular subclones. The identification of tens of thousands of mutations genome-wide permits the delineation of distinct clusters of mutations — these clusters consist of groups of mutations that share similar mutant allele frequencies (corrected for local copy number). In the tree in panel c, we present a phylogenetic tree in which the variable thicknesses of the branches reflect the numbers of mutations within each distinct mutation ‘cluster’. This gives an indication of the patterns of subclonal importance and dominance within the cancer population. Chr, chromosome; ETV6, ETS variant 6; F, ETV6–RUNX1 fusion gene; GATA3, GATA-binding protein 3; IDH2, isocitrate dehydrogenase 2; PAX5, paired box 5; PIK3CA, phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha; NCOR1, nuclear receptor co-repressor 1; MLL3, myeloid/lymphoid or mixed-lineage leukaemia 3; NPM1, nucleophosmin (nucleolar phosphoprotein B23, numatrin); RUNX1, runt-related transcription factor 1; SMAD4, SMAD family member 4; STOX2, storkhead box 2. Panel a is adapted, with permission, from REF. 21 © (2012) Macmillan Publishers Ltd. Panel b is adapted, with permission, from REF. 15 © (2011) Macmillan Publishers Ltd. All rights reserved. Panel c is adapted, with permission, from REF. 24 © (2012) Cell Press.

[Box 1 figure (Nature Reviews Genetics): a, relapsed AML (linear evolution): primary tumour, chemotherapy and relapse, with six coding mutations including NPM1 and IDH2 and eight coding mutations including STOX2; b, childhood acute lymphoblastic leukaemia (branching evolution): cytogenetically distinct subpopulations annotated with their frequencies and with copy numbers of the ETV6–RUNX1 fusion (F), RUNX1, ETV6 and PAX5; c, breast cancer (branching evolution): from the fertilized egg, 27,000 mutations including PIK3CA, TP53, GATA3, NCOR1, SMAD4 and MLL3, followed by mutation clusters A to D.]

REVIEWS

NATURE REVIEWS | GENETICS VOLUME 13 | NOVEMBER 2012 | 797

© 2012 Macmillan Publishers Limited. All rights reserved

Box 1 | Phylogenetic cancer trees

A phylogenetic tree is a pictorial representation of how a tumour is inferred to have evolved. As discussed in the text, these inferences can be based on a wide range of molecular biology and sampling techniques coupled with existing and new bioinformatics algorithms for reconstructing the tree. Several key properties of the evolution of a tumour are coded in the tree and provide important biological information about the genetic diversity of a cancer and clonal mix.

All trees have a shared ‘trunk’, which represents the complement of mutations shared by all malignant cells within the cancer. Because these mutations are fully clonal, there must have been a single ancestral cell that carried all of these mutations and through which all extant tumour cells can trace their lineage; we denote this cell the ‘most recent common ancestor’, borrowing the term from population genetics. Emergence of this cell initiated the final complete selective sweep within the cancer: all clonal expansions thereafter are, by definition, incomplete. All mutations that occur after the most recent appearance of a common ancestor are subclonal.

The length of individual branches (and the trunk) denotes the number of mutations that occurs in that lineage: a so-called ‘molecular clock’. If mutation rates per unit time were constant, then this would correlate with chronological time. However, for many cancers, this assumption is probably invalid (as discussed in the text), and molecular time is likely to be a poor proxy for chronological time.


Box 1 | Phylogenetic cancer trees

A phylogenetic tree is a pictorial representation of how a tumour is inferred to have evolved. As discussed in the text, these inferences can be based on a wide range of molecular biology and sampling techniques, coupled with existing and new bioinformatics algorithms for reconstructing the tree. Several key properties of a tumour's evolution are encoded in the tree and provide important biological information about the genetic diversity and clonal mix of a cancer.
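One classical way to reconstruct such a tree from presence/absence data is the perfect-phylogeny test under the infinite-sites assumption: each mutation arises once and is never lost, and the root (the normal cell) carries none of them. The Python sketch below is illustrative only and is not the method behind the panels of this box. It assumes a hypothetical input `carriers` that maps each mutation to the set of sampled regions or subclones in which it was detected. Two mutations fit on one rooted tree exactly when their carrier sets are nested or disjoint; when every pair passes, each group of mutations hangs below the smallest group that strictly contains it.

```python
from itertools import combinations
from typing import Optional

def compatible(a: frozenset, b: frozenset) -> bool:
    """Under infinite sites with an unmutated root, two mutations can share
    one tree only if their carrier sets are nested or disjoint."""
    return a <= b or b <= a or not (a & b)

def perfect_phylogeny_exists(carriers: dict[str, set[str]]) -> bool:
    """True if every pair of mutations passes the compatibility test."""
    sets = [frozenset(s) for s in carriers.values()]
    return all(compatible(a, b) for a, b in combinations(sets, 2))

def build_tree(carriers: dict[str, set[str]]):
    """Group mutations with identical carrier sets into one branch and link each
    branch to the smallest branch whose carrier set strictly contains it.
    Assumes perfect_phylogeny_exists(carriers) is True and no set is empty."""
    branches: dict[frozenset, list[str]] = {}
    for mutation, samples in carriers.items():
        branches.setdefault(frozenset(samples), []).append(mutation)
    parent: dict[frozenset, Optional[frozenset]] = {}
    for s in branches:
        # In a compatible family the strict supersets of s form a chain,
        # so the smallest one is the unique direct parent.
        supersets = [t for t in branches if s < t]
        parent[s] = min(supersets, key=len) if supersets else None
    return branches, parent

# Hypothetical detections of five mutations across three biopsy regions.
carriers = {
    "m1": {"R1", "R2", "R3"}, "m2": {"R1", "R2", "R3"},
    "m3": {"R1", "R2"}, "m4": {"R3"}, "m5": {"R1"},
}
print(perfect_phylogeny_exists(carriers))                          # True
print(perfect_phylogeny_exists({**carriers, "m6": {"R2", "R3"}}))  # False: m6 overlaps m3 without nesting
```

Here `m1` and `m2` share a carrier set and end up on the same (trunk) branch, `m3` and `m4` hang directly below it, and `m5` hangs below `m3`.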

All trees have a shared ‘trunk’, which represents the complement of mutations shared by all malignant cells within the cancer. Because these mutations are fully clonal, there must have been a single ancestral cell that carried all of them and through which all extant tumour cells can trace their lineage; we call this cell the ‘most recent common ancestor’, borrowing the term from population genetics. The emergence of this cell initiated the final complete selective sweep within the cancer: all clonal expansions thereafter are, by definition, incomplete. All mutations that occur after the emergence of the most recent common ancestor are subclonal.
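With multi-region sequencing, the trunk/branch distinction is often approximated from where each mutation is observed. The sketch below assumes a hypothetical `presence` map from each mutation to the set of regions in which it was detected: a mutation found in every sampled region is labelled trunk, one found in several but not all regions a shared branch, and one found in a single region private. Being present in every *sampled* region only suggests clonality; a region that was not sampled could still lack the mutation.

```python
def classify_mutations(presence: dict[str, set[str]], regions: set[str]) -> dict[str, str]:
    """Label each mutation by the sampled regions in which it was detected."""
    labels = {}
    for mutation, found_in in presence.items():
        n = len(found_in & regions)
        if n == len(regions):
            labels[mutation] = "trunk"          # detected in every sampled region
        elif n > 1:
            labels[mutation] = "shared branch"  # in some, but not all, regions
        elif n == 1:
            labels[mutation] = "private"        # in a single region only
        else:
            labels[mutation] = "absent"         # not detected in any sampled region
    return labels

regions = {"R1", "R2", "R3"}  # hypothetical biopsy regions
presence = {"m1": {"R1", "R2", "R3"}, "m2": {"R1", "R2"}, "m3": {"R3"}}
print(classify_mutations(presence, regions))
```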

The length of individual branches (and of the trunk) denotes the number of mutations that occur in that lineage: a so-called ‘molecular clock’. If the mutation rate per unit time were constant, branch length would be expected to scale linearly with chronological time. However, for many cancers this assumption is probably invalid (as discussed in the text), and molecular time is likely to be a poor proxy for chronological time.
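Under a strict molecular clock, a branch carrying m mutations accumulated at a constant rate μ per year spans roughly t ≈ m/μ years. The sketch below uses made-up numbers to show how a rate change breaks this reading: a hypothetical lineage spends 40 years at 10 mutations per year, then 5 years at 100 per year (a hypothetical late burst of mutagenesis). Dating the resulting 900 mutations with the baseline rate gives 90 years, twice the true 45.

```python
def mutations_accumulated(phases: list[tuple[float, float]]) -> float:
    """Total mutations over consecutive phases of (duration in years, mutations per year)."""
    return sum(duration * rate for duration, rate in phases)

def clock_age(mutations: float, assumed_rate: float) -> float:
    """Molecular-clock age estimate t = m / mu, valid only if the rate is constant."""
    return mutations / assumed_rate

# Hypothetical lineage: 40 years at baseline, then a 5-year burst at 10x the rate.
phases = [(40, 10), (5, 100)]
m = mutations_accumulated(phases)               # 400 + 500 = 900 mutations
true_age = sum(d for d, _ in phases)            # 45 years
estimated_age = clock_age(m, assumed_rate=10)   # 90.0 years
print(m, true_age, estimated_age)               # 900 45 90.0
```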

The branching structure of the tree captures the number of subclonal populations within the cancer samples and their genetic relationships. Both linear and branching patterns of evolution have been described in a range of cancers. Linear evolution (panel a of the figure) was described in acute myeloid leukaemia (AML), where the post-treatment relapse clone was identified as a direct descendant of the major clone. The tree in panel b shows branching evolution and, specifically, convergent evolution, in which the same genetic consequence emerges independently in separate clades of the phylogenetic tree; these events are highlighted by green boxes containing the recurrently mutated genes. Brown circles represent cytogenetically distinct populations, and the numbers give the copy number of each adjacent gene. Solid lines represent the most likely ancestral origin of subclones, whereas dashed lines indicate alternative origins.
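Once a tree is available, the two patterns can be told apart mechanically. In this Python sketch, node and gene names are hypothetical; `parent` maps each node to its parent, and `gained` maps each node to the genes newly hit on the branch leading to it. A tree is linear when no node has more than one child, and a gene counts as convergent when it is hit on two branches neither of which lies below the other, that is, in separate clades. This is a simplification: it works at the gene level and ignores which exact mutation or copy-number change occurred.

```python
from itertools import combinations
from typing import Optional

Parent = dict[str, Optional[str]]  # node -> parent node (None for the root)

def is_linear(parent: Parent) -> bool:
    """Linear evolution: no node appears as the parent of more than one subclone."""
    parents = [p for p in parent.values() if p is not None]
    return len(parents) == len(set(parents))

def ancestors(node: str, parent: Parent) -> set[str]:
    """All nodes on the path from `node` up to the root, excluding `node` itself."""
    found = set()
    p = parent[node]
    while p is not None:
        found.add(p)
        p = parent[p]
    return found

def convergent_genes(parent: Parent, gained: dict[str, set[str]]) -> set[str]:
    """Genes gained on two nodes, neither of which descends from the other."""
    nodes_by_gene: dict[str, list[str]] = {}
    for node, genes in gained.items():
        for gene in genes:
            nodes_by_gene.setdefault(gene, []).append(node)
    result = set()
    for gene, nodes in nodes_by_gene.items():
        for a, b in combinations(nodes, 2):
            if a not in ancestors(b, parent) and b not in ancestors(a, parent):
                result.add(gene)
    return result

# Hypothetical trees: a chain (as in a primary -> relapse series) and a branching tree.
linear_tree = {"founder": None, "primary": "founder", "relapse": "primary"}
branching_tree = {"root": None, "A": "root", "B": "root", "A1": "A"}
gained = {"root": {"geneZ"}, "A": {"geneX"}, "B": {"geneX", "geneY"}, "A1": {"geneY"}}
print(is_linear(linear_tree), is_linear(branching_tree))  # True False
print(sorted(convergent_genes(branching_tree, gained)))   # ['geneX', 'geneY']
```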

As sequencing becomes genome-wide, phylogenies have been constructed for single tumour samples composed of multiple cellular subclones. Identifying tens of thousands of mutations genome-wide makes it possible to delineate distinct clusters of mutations, that is, groups of mutations that share similar mutant allele frequencies (corrected for local copy number). Panel c shows a phylogenetic tree in which the thickness of each branch reflects the number of mutations in the corresponding mutation ‘cluster’; this indicates which subclones are important and dominant within the cancer population. Chr, chromosome; ETV6, ETS variant 6; F, ETV6–RUNX1 fusion gene; GATA3, GATA-binding protein 3; IDH2, isocitrate dehydrogenase 2; PAX5, paired box 5; PIK3CA, phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha; NCOR1, nuclear receptor co-repressor 1; MLL3, myeloid/lymphoid or mixed-lineage leukaemia 3; NPM1, nucleophosmin (nucleolar phosphoprotein B23, numatrin); RUNX1, runt-related transcription factor 1; SMAD4, SMAD family member 4; STOX2, storkhead box 2. Panel a is adapted, with permission, from REF. 21 © (2012) Macmillan Publishers Ltd. Panel b is adapted, with permission, from REF. 15 © (2011) Macmillan Publishers Ltd. All rights reserved. Panel c is adapted, with permission, from REF. 24 © (2012) Cell Press.
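The copy-number correction can be sketched by converting each mutant allele frequency (VAF) into a cancer cell fraction (CCF) before clustering. The model assumed here: a mutation carried on `multiplicity` copies in cancer cells whose local copy number is `tumour_cn`, in a sample of tumour purity `purity` whose normal cells are diploid, has expected VAF = purity × multiplicity × CCF / (purity × tumour_cn + 2(1 − purity)); the first function inverts this. The gap-based grouping that follows is a deliberate simplification (published pipelines typically fit probabilistic mixture models to read counts), and none of this is claimed to be the method behind panel c. All parameter values are hypothetical.

```python
def cancer_cell_fraction(vaf: float, purity: float, tumour_cn: int,
                         multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Invert expected VAF = purity*multiplicity*CCF / (purity*tumour_cn + (1-purity)*normal_cn)."""
    total_copies = purity * tumour_cn + (1 - purity) * normal_cn
    return vaf * total_copies / (purity * multiplicity)

def cluster_by_gap(values: list[float], max_gap: float = 0.1) -> list[list[float]]:
    """Sort values and start a new cluster wherever consecutive values differ by more than max_gap."""
    clusters: list[list[float]] = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters

# Hypothetical mutations in diploid regions of a sample with 80% tumour purity.
vafs = [0.40, 0.39, 0.41, 0.20, 0.21, 0.05]
ccfs = [cancer_cell_fraction(v, purity=0.8, tumour_cn=2) for v in vafs]
clusters = cluster_by_gap(ccfs)
print([len(c) for c in clusters])  # [1, 2, 3]; the cluster near CCF = 1 would be clonal (trunk)
```

With purity 0.8, a VAF of 0.2 in a diploid region gives a CCF of 0.5, whereas the same VAF in a region with four tumour copies gives 0.9; this is why allele frequencies must be corrected for local copy number before mutations are clustered.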

[Figure, three panels. a Relapsed AML (linear evolution): primary tumour, chemotherapy, relapse; labels for six coding mutations (including NPM1 and IDH2) and eight coding mutations (including STOX2). b Childhood acute lymphoblastic leukaemia (branching evolution): subclones labelled with the copy numbers of F (ETV6–RUNX1), RUNX1, ETV6 and PAX5, and with their percentages. c Breast cancer (branching evolution): from the fertilized egg, a trunk of 27,000 mutations including PIK3CA, TP53, GATA3, NCOR1, SMAD4 and MLL3, followed by mutation clusters A to D.]

REVIEWS

NATURE REVIEWS | GENETICS VOLUME 13 | NOVEMBER 2012 | 797

© 2012 Macmillan Publishers Limited. All rights reserved

Yates & Campbell, Nature Reviews Genetics 13:795 (2012)


The future is now


• thousands of genomes ⇒ classification and diagnosis
• sequencing of multiple regions at multiple time points ⇒ modelling the ecology of the tumour
• single-cell sequencing