advances 30 april, 2009 cancer genomics...sean grimmond april 30. th , 2008 ... • every cancer...

Sponsored by:

Science Webinar Series
Advances in Cancer Genomics
30 April, 2009

Brought to you by the Science/AAAS Business Office

Participating Experts:

Sean Grimmond, Ph.D., Institute for Molecular Bioscience, University of Queensland, Australia

David Wheeler, Ph.D., Baylor College of Medicine, Houston, Texas

John McPherson, Ph.D., Ontario Institute for Cancer Research, Toronto, Canada

Studying cancer transcriptomes at single nucleotide resolution

Expression Genomics Laboratory

http://www.expressiongenomics.org

Sean Grimmond, April 30th, 2009

Cancer Transcriptomics:

Over the last decade, transcriptomics has revolutionized our ability to capture the genes and pathways driving biological processes and pathological states.

Cancer transcriptomics is moving to massive-scale, sequence-based analyses for surveying: i) locus activity, ii) transcript-specific expression, and iii) sequence content.

SQRL profiling is quantitative (SOLiD vs Illumina array)
Comparison of RNAseq & array-based gene expression profiling:

[Figure: EB vs ES gene expression, microarray profiling compared with RNAseq profiling]

Red: >2x up regulated in EB, Green: <2x down regulated in ES, Grey: marginal detection (<0.95 detection score for Illumina or 50 tags for SQRL)
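The colour legend for the EB/ES comparison amounts to a simple rule applied to two expression values per gene. The following is a minimal sketch of such a rule for sequence tag counts, not the method used in the talk. It assumes the 50-tag marginal-detection cutoff and the 2-fold threshold from the legend; the function name, the pseudocount, the "unchanged" label, the example counts and the reading of "Green" as more than 2-fold lower in EB are all assumptions made for illustration.

```python
def classify_gene(eb_tags, es_tags, min_tags=50, fold=2.0):
    """Assign a legend colour to one gene from its EB and ES tag counts."""
    # Grey: marginal detection in both samples (assumed cutoff: 50 tags).
    if eb_tags < min_tags and es_tags < min_tags:
        return "grey"
    # A pseudocount of 1 avoids division by zero when a count is 0.
    ratio = (eb_tags + 1) / (es_tags + 1)
    if ratio > fold:
        return "red"        # more than 2x higher in EB
    if ratio < 1 / fold:
        return "green"      # more than 2x lower in EB
    return "unchanged"


# Hypothetical tag counts (EB, ES) for four genes.
counts = {
    "geneA": (400, 100),
    "geneB": (60, 300),
    "geneC": (10, 20),
    "geneD": (120, 100),
}
for gene, (eb, es) in counts.items():
    print(gene, classify_gene(eb, es))
```

Because both platforms are reduced to the same three labels, the labels from array data and from tag counts can be compared gene by gene.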

Defining transcript-specific expression by “diagnostic” features:

[Figure: alternative polyadenylated (AAA) transcripts from one locus]

Survey exon activity

Survey exon junction usage
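One way to read "diagnostic" features is as exon junctions used by exactly one transcript of a locus, so that reads spanning them can be assigned unambiguously. The sketch below illustrates that idea under this assumption; the transcript names, exon coordinates and junction reads are hypothetical, and the slides do not describe the actual procedure.

```python
from collections import Counter


def junctions(exons):
    """Exon junctions of a transcript as (donor_end, acceptor_start) pairs."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def diagnostic_junctions(transcripts):
    """Map each transcript to the junctions that no other transcript uses."""
    usage = Counter(j for exons in transcripts.values() for j in set(junctions(exons)))
    return {
        name: [j for j in junctions(exons) if usage[j] == 1]
        for name, exons in transcripts.items()
    }


# Hypothetical locus: exon (start, end) coordinates are made up for illustration.
transcripts = {
    "full_length": [(100, 200), (300, 400), (500, 600)],
    "exon_skipped": [(100, 200), (500, 600)],
    "alt_3prime": [(100, 200), (300, 400), (550, 600)],
}
diagnostic = diagnostic_junctions(transcripts)
print(diagnostic)

# Hypothetical junction-spanning reads, counted on diagnostic junctions only.
junction_reads = [(200, 300), (200, 300), (200, 500), (400, 500), (400, 500), (400, 550)]
read_counts = Counter(junction_reads)
support = {name: sum(read_counts[j] for j in juncs) for name, juncs in diagnostic.items()}
print(support)
```

The junction (200, 300) is shared by two transcripts, so reads on it count towards locus activity but not towards any one transcript.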

Complex transcriptional output from VEGFR1

Canonical ORF & mRNA (black arrow)

Secreted decoy receptor 1 (common)

Secreted decoy receptor 2 (rare)

Surveying known and novel transcript expression:

Known complexity
• Alternative splicing
• Alternative promoter usage
• 3’UTR switching

Theoretical complexity
• Novel alternative splicing
• Detection of gene fusions

Transcriptome discovery
• Novel alternative splicing
• Detection of gene fusions

Screening for expressed SNPs, mutations, RNA editing

Variants expressed relative to the reference genome

All tags
• Align, ID and QC, call SNPs
• Map to genome
• Determine if SNP is in dbSNP
• ORF, UTR, Syn/Non
• Rank SNPs (PolyPhen, CanPredict)

Aligned Reads

ACGATATTACACGTACACTCAAGTCGTTCGGAACCT
ACGATATTACACGTACATTCAAATCGT
ACGATATTACACGTACATTCAACTCGT
ACGATATTACACGCACATTCAAGTCGT
 CGATATTACACGTACATTCAAGTCGTT
   ATATTTCACGTACATTCAAGTCGTTCG
   ATATTAAACGTACATTCAAGTCGTTCG
     ATTACACGTACATTCAAGTCGTTCGGA
     ATTACACGTACATTCACGTCGTTCGGA
         CACGTACATTCAAGTCGTTCGGAACCT
-----------------T------------------  SNP call
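The SNP call in the aligned-reads figure can be reproduced with a simple pileup: count the bases seen at each position and report a non-reference base that enough reads agree on. This is a rough sketch, not the pipeline used in the talk. The read offsets are assumptions (the slide's spacing was lost, so each read is placed where it best matches the 36-base sequence, which is treated here as the reference), and the thresholds and function name are also assumptions.

```python
from collections import Counter

REFERENCE = "ACGATATTACACGTACACTCAAGTCGTTCGGAACCT"

# (assumed offset on the reference, read sequence from the slide)
READS = [
    (0, "ACGATATTACACGTACATTCAAATCGT"),
    (0, "ACGATATTACACGTACATTCAACTCGT"),
    (0, "ACGATATTACACGCACATTCAAGTCGT"),
    (1, "CGATATTACACGTACATTCAAGTCGTT"),
    (3, "ATATTTCACGTACATTCAAGTCGTTCG"),
    (3, "ATATTAAACGTACATTCAAGTCGTTCG"),
    (5, "ATTACACGTACATTCAAGTCGTTCGGA"),
    (5, "ATTACACGTACATTCACGTCGTTCGGA"),
    (9, "CACGTACATTCAAGTCGTTCGGAACCT"),
]


def call_snps(reference, reads, min_alt_reads=2, min_alt_fraction=0.5):
    """Return (1-based position, ref base, alt base, supporting reads) tuples."""
    # One base counter per reference position.
    pileup = [Counter() for _ in reference]
    for offset, seq in reads:
        for i, base in enumerate(seq):
            pileup[offset + i][base] += 1
    calls = []
    for pos, (ref_base, column) in enumerate(zip(reference, pileup)):
        depth = sum(column.values())
        for alt, n in column.items():
            # Require a minimum read count and a minimum fraction of the depth.
            if alt != ref_base and n >= min_alt_reads and n / depth >= min_alt_fraction:
                calls.append((pos + 1, ref_base, alt, n))
    return calls


print(call_snps(REFERENCE, READS))
```

The isolated mismatches in single reads fall below the two-read minimum and are treated as sequencing errors, which is why only the C to T change is reported.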

MPP6: (W-260-stop)

p55 MAGUK family member: Tumour suppressor

Profiling the small RNA Transcriptome:

[Figure: miR17-5p shown with the genes APBB2, APP, BCL2L11, CCND1, CCND2, CCNG2, CDKN1A, CRK, CUL3, DMTF1, E2F1, E2F3, E2F5, EREG, FOXO1A, GAB1, HAS2, HIF1A, IRF1, KHDRBS1, KPNA2, MAP3K8, MAPK9, MYCN, NCOA3, NR4A3, PCAF, PDGFRA, PKD1, PKD2, PPARA, RB1, RBBP7, RBL1, RBL2, STAT3, TP53INP1, TSG101, TXNIP, WEE1]


WTseq is a powerful tool for monitoring gene activity, transcript-specific expression, and transcript discovery.

WTseq can also be used to study the sequence content of RNAs. This allows one to study expressed mutations, RNA editing events and allele-specific expression.

Sequence-based transcriptomics can also be applied to the small RNA fraction to perform similar studies in microRNAs.

Nicole Cloonan, Gabe Kolle, Brooke Gardiner, Geoff Faulkner, Darrin Taylor, Eshan Nourbakhsh, Keerthana Krishna, Shivangi Wani, Alan Robertson, David Tang, Christina Xu, Yunshan Xiao, Megan Vardy [Al Forrest, Graham Bethel, Tina Maguire].

Kevin McKernan, Gina Costa, Catalin Barbacioru, Scott Kuersten, Jian Gu

Cancer Genomics: Impact of Next-Generation Sequencing Platforms

John D. McPherson, Ph.D.
Director, Cancer Genomics
Senior Principal Investigator
Ontario Institute for Cancer Research

April 30, 2009
www.oicr.on.ca

Ontario Institute for Cancer Research

Themes
• Prevention
• Early Diagnosis
• Cancer Targets
• New Therapeutics

Innovation Programs
• Ontario Cancer Cohort
• One Millimetre Cancer Challenge
• Cancer Stem Cells
• International Cancer Genome Consortium
• Selective Agents (Terry Fox Research Institute - Ontario Node)
• Immuno- and Bio-therapeutics

Innovation Platforms
• Imaging and Interventions
• Bio-repositories and Pathology
• Genomics and High Throughput Screening
• Medicinal Chemistry
• Informatics and Bio-computing

Translation Programs
• Patents to Products
• High Impact Clinical Trials
• Cancer Care and Services (including Health Promotion)

www.oicr.on.ca

Advantages of Next-Gen Platforms

• No sub-cloning, no need for a bacterial host.
  – less cloning bias
  – bulk libraries
• Vast improvements in amounts of data generated.
  – quantification is possible through “counting” of “unique” reads
  – enhanced dynamic range
  – detection of rare variants
• Readily adapted to a variety of applications.
  – genome, transcriptome, epigenome
• Dramatic decrease in the cost of data generation and increase in its speed.
  – Huge amounts of data per run

Next(Now)-generation sequencers

[Figure: bases per machine run (1 Mb to 100 Gb) plotted against read length (10 bp to 1,000 bp)]

• ABI capillary sequencer (0.04-0.08 Mb in 450-800 bp reads, 96 reads, 1-3 hours)
• 454 GS FLX pyrosequencer (100-500 Mb in 100-400 bp reads, 0.5-1M reads, 5-10 hours)
• AB/SOLiDv3, Illumina/GAII, Helicos short-read sequencers (10+Gb in 30, 50-100 bp reads, >100M reads, 7-10 days)
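The per-run yields quoted for each class of sequencer follow, to a first approximation, from multiplying the number of reads by the read length. The short sketch below works through that arithmetic with the figures from the slide; the function name and the pairing of low and high ends are assumptions, and real yields also depend on filtering and run configuration, so the products only approximate the quoted ranges.

```python
def run_yield_mb(n_reads, read_length_bp):
    """Bases per machine run, in megabases, as reads x read length."""
    return n_reads * read_length_bp / 1e6


# (reads, read length) at the low and high ends quoted on the slide.
platforms = {
    "ABI capillary": [(96, 450), (96, 800)],
    "454 GS FLX": [(500_000, 100), (1_000_000, 400)],
    "short-read (SOLiD/GAII/Helicos)": [(100_000_000, 30), (100_000_000, 100)],
}
for name, ((lo_reads, lo_len), (hi_reads, hi_len)) in platforms.items():
    low = run_yield_mb(lo_reads, lo_len)
    high = run_yield_mb(hi_reads, hi_len)
    print(f"{name}: {low:g} - {high:g} Mb per run")
```

For the capillary instrument this gives roughly 0.04 to 0.08 Mb, matching the slide; for the short-read instruments, 100 million reads of 100 bp give 10 Gb.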

More reads or longer reads?

[Figure: increasing throughput against increasing read length, with relative cost marked from $ to $$$]

OICR Cancer Genomics Platform
• ~800 billion bases/month
• 21 flow cells
• 1.2 PB storage, 1600 cores
• Matching applications to platforms: read mapping, variant detection, PE, MP …

Next-Gen Applications at OICR
• Whole genome sequencing
• Targeted genomic sequencing
• Structural variation
  – Rearrangements, copy number
• SNP/indel discovery
• Copy number variation
  – Microarray and beadstation still excellent options
• Whole transcriptome sequencing
• Small RNA discovery/sequencing
• Epigenomics
  – Chromatin IP transcription factor binding (ChIP-seq)
  – Nucleosome positioning

Structural variants
• Mate-pair and paired-end reads can be used to detect structural variants

Mate-Pairs (1 - 20 kb)
Genomic DNA → Fragmentation & circularization to an internal adaptor → Shear → Isolate internal adaptors and fragment ends → Add amplification and sequencing adaptors → Sequence

Paired-Ends (200 - 500 bp)
Genomic DNA → Fragmentation → Add amplification and sequencing adaptors → Sequence

Mapping of read pairs to reference

Clusters of aberrantly aligned read pairs
• Spanning unexpected distance
• Unexpected orientation

[Figure: histogram of fragment number against fragment size; read-pair signatures (sequenced vs mapped) for concordant pairs, insertion, deletion (del), inversion (inv) and translocation (ChrA, ChrB)]
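The read-pair signatures described above (unexpected distance, unexpected orientation) can be turned into a per-pair label. The sketch below is one simplified way to do so, not the method used at OICR. It assumes a paired-end library where concordant pairs map to opposite strands of the same chromosome 200 to 500 bp apart; the class, field names and example pairs are hypothetical. In practice a call would require a cluster of pairs with the same label at the same location rather than a single pair.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, min_insert=200, max_insert=500):
    """Label one read pair by how its two ends map to the reference."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    if pair.strand1 == pair.strand2:
        return "inversion"      # unexpected orientation
    span = abs(pair.pos2 - pair.pos1)
    if span > max_insert:
        return "deletion"       # ends map further apart than the fragment size
    if span < min_insert:
        return "insertion"      # ends map closer together than the fragment size
    return "concordant"


# Hypothetical read pairs.
pairs = [
    ReadPair("chrA", 1000, "+", "chrA", 1350, "-"),
    ReadPair("chrA", 2000, "+", "chrA", 5200, "-"),
    ReadPair("chrA", 3000, "+", "chrA", 3080, "-"),
    ReadPair("chrA", 4000, "+", "chrA", 4300, "+"),
    ReadPair("chrA", 5000, "+", "chrB", 700, "-"),
]
labels = [classify_pair(p) for p in pairs]
print(Counter(labels))
```

Mate-pair libraries would use the same logic with a larger expected size range (1 to 20 kb on the slide) and the orientation convention of that library type.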

Direct Selection
(M. Lovett et al. 1991, “Direct selection: a method for the isolation of cDNAs encoded by large genomic regions”)

• “Hybrid selection”; “Genome partitioning”
• Solid support capture
  – NimbleGen (Roche), Agilent
• In-solution capture oligos
  – Agilent
• Regional or entire exome

[Figure: sheared DNA hybridized to a microarray; elute and sequence]

NimbleGen Sequence Capture of a 600kb region

[Figure tracks: read depth, targets, repeats, %GC, oligo (Tm)]

Agilent SureSelect Capture of exon targets

[Figure tracks: read depth, exons, repeats]

www.opengenomics.com

Epigenomics
• ChIP-seq
  – Histone modifications
  – DNA binding sites
• Methylation
  – Genome-wide analyses
  – Correlation with expression studies

[Figure: ChIP-seq workflow: modified histones in chromatin; DNA fragments linked to nucleosomes; immunoprecipitation of modified histones; isolation of DNA fragments and ligation of adaptors]

Epigenomics

[Figure: read depth across genes in State 1 and State 2]

International Cancer Genome Consortium

• To obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe.

• Every cancer genome project should state a clear rationale for its choice of sample size, in terms of the desired sensitivity to detect mutations. The target number of 500 samples per tumor type/subtype is set as a minimum, pending further information to be provided by ICGC members proposing to tackle specific cancer types/subtypes.

“50 different tumor types and/or subtypes”

“500 samples per tumor”

50,000 Human Genome Projects
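The step from the two quoted targets to "50,000 Human Genome Projects" is not spelled out on the slide. One reading, shown below as a short worked calculation, is that each tumor sample is sequenced together with a matched normal genome from the same patient; the factor of two is an assumption here, not something the slide states.

```python
tumor_types = 50          # "50 different tumor types and/or subtypes"
samples_per_type = 500    # "500 samples per tumor"
genomes_per_sample = 2    # assumption: tumor plus matched normal per patient

tumors = tumor_types * samples_per_type
genomes = tumors * genomes_per_sample
print(f"{tumors:,} tumors, {genomes:,} genomes")
```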

www.icgc.org

International Cancer Genome Consortium: World Map of Comprehensive Cancer Genome Projects

ICGC Cancer Genome Projects
• Ontario/Canada: Pancreas
• Japan: Liver
• Spain: CLL
• India: Oral Cavity
• UK: Breast
• France: Liver & Breast
• China: Stomach
• Australia: Pancreas
• EU: TBD

TCGA Pilot Projects
• US: GBM, Ovary & Lung

www.icgc.org

Data analysis

People to thank

• Cancer Genomics: John McPherson, Tom Hudson, Kamran Shazand, Johar Ali, Vanya Peltekova, Philip Zuzarte, Michelle Sam, April Cockburn, Ada Wong, Lee Timms, Tanja Durbic, David D’Souza, Stacey Quinn, Melissa Bernard
• Informatics and Biocomputing: Lincoln Stein, Francis Ouellette, Arek Kasprzyk, Vincent Ferretti, Mathieu Lemire, Tim Beck, Quang Trinh, Michelle Chan-Seng-Yue, Richard De Borja, Dave Sutton, Greg Whynott, Tim Brown, Victor Gu
• DCC: Christina Yung, Jianxin Wang, Junjun Zhang
• OICR Faculty: Nizar Batada, Lakshmi Muthuswamy
• OICR Fellow: Paul Boutros
• ICGC: Jennifer Jennings, Vanessa Ballin

www.oicr.on.ca

The Cancer Genome using Next-generation Sequencing Technology

David A. Wheeler, Ph.D.
Director, Bioinformatics and Cancer Genomics
Human Genome Sequencing Center

Cost of a genome

Sequencer           Date   HGSC Sequencing Capacity (billions)   Human Genomes per Year   Cost per genome
First Generation    2003   16.2                                  0.04                     $3,000,000,000
                    2004   21                                    0.05                     $250,000,000
                    2005   30                                    0.07                     $100,000,000
                    2006   38                                    0.08                     $25,000,000
Second Generation   2007   240                                   5                        $2,000,000
                    2008   2,040                                 45                       $350,000
                    2009   3,660                                 81                       $100,000
Third Generation    2010   7,200                                 160                      $10,000
                    ?      14,000                                311                      $1,000
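The capacity and genomes-per-year columns of the table can be related by a simple division. The sketch below assumes that "Sequencing Capacity (billions)" means billions of bases per year and that a human genome is about 3 billion bases; neither is stated on the slide, so the resulting fold coverage is only an illustration of how the two columns might be connected.

```python
GENOME_SIZE_GB = 3.0  # assumption: ~3 billion bases per human genome

# (year, capacity in billions of bases, human genomes per year) from the table
rows = [
    (2007, 240, 5),
    (2008, 2040, 45),
    (2009, 3660, 81),
    (2010, 7200, 160),
]


def implied_coverage(capacity_gb, genomes_per_year):
    """Fold coverage per genome implied by the two table columns."""
    return capacity_gb / genomes_per_year / GENOME_SIZE_GB


for year, capacity, genomes in rows:
    print(year, round(implied_coverage(capacity, genomes), 1))
```

Under these assumptions the second- and third-generation rows all work out to roughly 15- to 16-fold coverage per genome.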


DNA Sequencing in Cancer
• Somatic mutation in DNA
  – Scale of variation: single base to whole chromosome
  – Variety of next-generation instruments
• Epigenetic changes in DNA
  – ChIP-seq
  – reduced representation
  – whole genome
• Expression
  – RNA abundance
  – Splice variants
    • aberrant splicing
    • fusion transcripts


Short read mapping software
• Public-domain
  – MAQ
  – MOSAIK
  – SOAP
  – Bowtie
  – TopHat (RNA-seq: splice junctions)
• Corporate
  – Corona Lite (AB/SOLiD)
  – Mapper (454)
  – ELAND (Illumina)

see also: http://en.wikipedia.org/wiki/List_of_sequence_alignment_software

Size Scale of Somatic Variation

[Figure: type of event against length in bases (1 to 10^8): SNPs, insertion - deletion (indel), Mini Sat, Focal Amp, CNA, translocation, aneuploidy; sequence data used for each: reads, p.e., cov]


Copy Number Alteration by Sequence Coverage

• count number of reads per unit length of DNA
• compare tumor and normal tissue from same patient

Amplification of 7p11.2: EGFR

Deletion at 9p21.3: CDKN2A
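The two steps listed for copy number (count reads per unit length, then compare tumor with normal) can be sketched as binned read counts followed by a log2 ratio. This is a minimal illustration, not the HGSC pipeline. The bin size, pseudocount, library-size normalization, the thresholds of plus and minus 1, and the example counts are all assumptions.

```python
import math


def bin_counts(read_starts, bin_size, n_bins):
    """Count reads per fixed-size bin along a chromosome."""
    counts = [0] * n_bins
    for pos in read_starts:
        idx = pos // bin_size
        if idx < n_bins:
            counts[idx] += 1
    return counts


def log2_ratios(tumor_counts, normal_counts, pseudocount=1):
    """Per-bin log2(tumor/normal) after scaling each sample by its total reads."""
    t_total = sum(tumor_counts)
    n_total = sum(normal_counts)
    ratios = []
    for t, n in zip(tumor_counts, normal_counts):
        t_norm = (t + pseudocount) / t_total
        n_norm = (n + pseudocount) / n_total
        ratios.append(math.log2(t_norm / n_norm))
    return ratios


def label_bins(ratios, gain=1.0, loss=-1.0):
    """Label each bin from its log2 ratio using assumed thresholds."""
    return ["amplified" if r > gain else "deleted" if r < loss else "neutral" for r in ratios]


# Hypothetical per-bin read counts from the same patient.
tumor = [100, 100, 400, 100, 10]
normal = [100, 100, 100, 100, 100]
labels = label_bins(log2_ratios(tumor, normal))
print(labels)
```

Using the matched normal as the denominator cancels out much of the region-to-region variation in coverage that is common to both samples.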

Mutation Validation

Discovery (Tumor – Normal)
Putative Mutation List
• Sanger/PCR (Auto + visual)
• Biotage (Single base variants)
• 454 (Single base variants, indels)
• PCR Gel-Sizing (large rearrangements)
Quality Control
Quality Assurance
Released Mutation List

Multi-Platform Sequencing Strategy

WGS Sequencing: Sequence Twice

                 SOLiD            454
Coverage         20-30X           6-10X
Align Reads      Corona Lite      BLAT/Crossmatch, Mosaik
SNP discovery    AB SNP Caller    HGSC AtlasSNP
                 Validation       Validation


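SOLiD: mapping, mutations and validation pipeline

[Figure: tumor reads and normal reads each pass through Corona Lite mapping & variant detection to give tumor variants and normal variants; each variant set yields a probe list for e-Genotyping against 454 tumor reads and 454 normal reads, giving valid tumor var and valid normal var, and finally valid mutations. Coverage labels: 15X Cov, 12X Cov]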

Single base variation (SOLiD platform)
• 4.4 million variants (low stringency)
• 2.2 million variants (high stringency)
  – Allele must be seen at least 2X

eGenotyping Validation
• 5,385 somatic mutations
• 105 missense mutations
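The high-stringency rule (allele seen at least 2X) and the tumor-versus-normal comparison can be expressed as two small filters. The sketch below is an illustration under assumptions, not the e-Genotyping software: variants are keyed by a hypothetical (chromosome, position, ref, alt) tuple with the number of supporting reads as the value, and a variant is treated as a somatic candidate only if it is never seen in the normal sample.

```python
def high_stringency(allele_counts, min_reads=2):
    """Keep variants whose allele was seen in at least min_reads reads."""
    return {var for var, n in allele_counts.items() if n >= min_reads}


def somatic_candidates(tumor_counts, normal_counts, min_reads=2):
    """Variants passing the filter in tumor and absent from normal."""
    tumor = high_stringency(tumor_counts, min_reads)
    return sorted(v for v in tumor if normal_counts.get(v, 0) == 0)


# Hypothetical variant observations: (chrom, pos, ref, alt) -> supporting reads.
tumor_counts = {
    ("chr7", 100, "C", "T"): 5,
    ("chr7", 200, "G", "A"): 1,   # seen once only: fails the 2X rule
    ("chr9", 300, "A", "G"): 4,   # also in normal: germline, not somatic
}
normal_counts = {
    ("chr9", 300, "A", "G"): 6,
}
print(somatic_candidates(tumor_counts, normal_counts))
```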


GBM missense mutations
• 7 with possible cancer connection
  – Growth factors, tumor suppressors, cell proliferation

Gene      Ref   Var   Codon   Gene Name
HDGF2     Arg   Trp   CGG     Hepatoma-derived growth factor-related protein 2
PALLD     Glu   Gln   GAA     Palladin, cytoskeletal associated protein
IL1B      Phe   Ser   TTT     Interleukin 1, beta
IL4I1     Ser   Ala   TCG     Interleukin 4 induced 1
SIPA1L1   Lys   Arg   AAA     Signal-induced proliferation associated 1 like 1
MUC16     Asn   Asp   AAT     Mucin 16, cell surface associated
DDX18     Ala   Thr   GCA     DEAD (Asp-Glu-Ala-Asp) box polypeptide 18

Baylor HGSC NimbleGen Approach to Exome Sequencing

gDNA (Exon 1, Exon 2, Exon 3, Exon 4, Exon 5) → Fragment and anneal to NimbleGen capture array → Elute → Sequencing → Analyze exon sequences

Coverage Profile over Capture Target

[Figure: coverage (y-axis, 0 to 2,500,000) across the buffer (-500 to -100), the target (0 to 100) and the buffer (100 to 500)]

Whole Exome Capture Chip

• Pancreatic Adenocarcinoma
  – SOLiD Single Slide Tumor and Normal
  – 3180 missense and nonsense mutations
  – 3 found in COSMIC
    • NF2, neurofibromin 2
    • PTCH1, patched homolog 1 (tumor suppressor)
    • HEY1, hairy/enhancer-of-split related with YRPW motif 1


Summary
• Deep sequence coverage by next-generation sequencing methods is accurately discovering mutations related to cancer
• Multi-platform approach yields rapid validation
• e-Genotyping rapidly and efficiently assesses raw sequencing data for known SNPs and mutations


Look out for more webinars in the series at:

www.sciencemag.org/webinar

For related information on this webinar topic, go to:

solid.appliedbiosystems.com

To provide feedback on this webinar, please e-mail your comments to [email protected]
