Transcriptomics
Jiri Zavadil, PhD
Molecular Mechanisms and Biomarkers
International Agency for Research on Cancer, Lyon
Transcriptomics - Definitions
Transcriptome - the complete set of RNA transcripts produced by the genome at a given time
Transcriptome is highly dynamic and complex in comparison to the relatively stable genome
Transcriptomics - the global study of gene expression at the RNA level
- can include genes for ncRNAs (microRNAs, etc.)
Blood spots, cord blood, whole blood: genetic, epigenetic, transcriptomic analyses (nucleic acids); proteomic analysis, serological and chemical analyses
Urine: chemical, proteomic and nucleic acid analysis
Tumor cells, tissues
Biospecimens in I4C Mother-Child and Infant-Child Cohorts
Case for Integrated Omics Analyses
DNA methylation
Histone modification
The prospective biospecimen collection and retrospective case analysis will yield interconnected results
Epigenetics → gene regulation → RNA and protein markers
Studied by transcriptomics
Transcriptomics – Applications for I4C
Specific gene expression: Genes and signatures determined by particular genetic and epigenetic regulatory factors and environmental exposures
Exploratory approaches: Not hypothesis-driven, e.g. global gene expression in tumors versus healthy tissues, differential responses to distinct environmental exposures
Disease etiology and classification: Patterns/signatures rather than single markers can improve knowledge about etiology and diagnosis
DNA Microarray Platforms
Affymetrix GeneChip
Illumina BeadArray
Workflow: Reverse transcription, IVT with labeled nucleotides, array hybridization, staining, washing, scanning
Pros/Cons: Rapid and streamlined protocols, standardized analysis; biased target collection, abundance levels but limited sequence information
miRNA TLDA Array
742 total target miRs
Quantile Normalization
ABI 7900 SDS HT
MicroRNA - TaqMan Low Density Array
Total RNA Sample
Pros/Cons: Quantitative abundance analysis; biased target collection
Stratton, MR. Science 331, 1553 (2011)
Cancer Genome Sequencing
Integrated Molecular Profiling By MPS
Massively Parallel Sequencing (MPS) - powerful nucleic acid analysis tool providing base-pair resolution information at the genome scale
emPCR
Massively Parallel Sequencing
Accuracy < 99.99%
Throughput/Day <10–15 Gb
Throughput/Run <90 Gb or >1.4 B reads (paired-end or mate-paired runs)
Samples/Run
• 1 genome
• 12 exomes
• 6 transcriptomes
ABI SOLiD 5500
Bridge amplification, clonal expansion
Massively Parallel Sequencing
6 human genomes at 30x
64 transcriptomes at 20M mapped reads/sample
Illumina HiSeq2000/2500
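The per-run capacities quoted for this platform can be checked with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the human genome size of about 3.1 Gb is an assumption that does not appear on the slide.

```python
# Back-of-the-envelope sequencing budget for the capacities quoted on the slide.
# HUMAN_GENOME_BP is an assumed approximate haploid genome size, not a slide value.
HUMAN_GENOME_BP = 3_100_000_000


def reads_required(n_samples, mapped_reads_per_sample):
    """Total mapped reads needed to profile n_samples transcriptomes."""
    return n_samples * mapped_reads_per_sample


def bases_required(n_genomes, genome_size_bp, fold_coverage):
    """Total sequenced bases needed for n_genomes at the given fold coverage."""
    return n_genomes * genome_size_bp * fold_coverage


# 64 transcriptomes at 20M mapped reads per sample
total_reads = reads_required(64, 20_000_000)

# 6 human genomes at 30x coverage
total_bases = bases_required(6, HUMAN_GENOME_BP, 30)

print(f"{total_reads / 1e9:.2f} billion mapped reads")
print(f"{total_bases / 1e9:.0f} Gb of sequence")
```

Under these assumptions, the transcriptome configuration needs about 1.28 billion mapped reads per run and the genome configuration about 558 Gb of sequence.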
mRNA Abundance Analysis
RPKM (Reads Per Kilobase per Million mapped reads)
FPKM (Fragments Per Kilobase per Million mapped reads)
Methods of quantifying gene expression levels from RNA-seq data by normalizing for transcript length and for the total number of mapped sequencing reads (RPKM) or, for paired-end (PE) reads, fragments (FPKM).
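A minimal sketch of the RPKM/FPKM calculation, assuming the standard definition: counts divided by transcript length in kilobases and by library size in millions of mapped reads (or fragments). The function names and the example numbers are hypothetical and are not taken from the slides.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Same formula as RPKM, but counting fragments (read pairs) instead of reads."""
    return rpkm(fragment_count, transcript_length_bp, total_mapped_fragments)


# Hypothetical example: 500 reads on a 2,000 bp transcript
# in a library of 20 million mapped reads.
example = rpkm(500, 2_000, 20_000_000)
print(example)
```

Because both the transcript length and the library size are in the denominator, a longer transcript or a deeper library with the same read count gives a lower value. This is what makes values comparable across genes and samples.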
[Figure: distributions of log2(RPKM) (axis range -4 to 4) for samples A1, A2 and A3 in three panels: Unnormalized data; Scaling Normalization (equivalent distribution); Quantile Normalization (identical distribution: spread, range and median)]
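The difference between the two normalization strategies in the figure can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the pipeline used for the slides. Scaling is modeled here as a median shift on the log2 scale, quantile normalization replaces each value by the mean of the values sharing its rank across samples, and the sample values are made up.

```python
from statistics import median


def scale_normalize(samples):
    """Shift each log-scale sample so that all samples share a common median."""
    medians = [median(s) for s in samples]
    target = sum(medians) / len(medians)
    return [[x - m + target for x in s] for s, m in zip(samples, medians)]


def quantile_normalize(samples):
    """Map every sample onto one identical distribution (the rank-wise mean)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must contain the same number of genes")
    sorted_samples = [sorted(s) for s in samples]
    # Mean of the i-th smallest value across all samples
    rank_means = [sum(s[i] for s in sorted_samples) / len(samples) for i in range(n)]
    result = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # gene indices by rank
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result


# Hypothetical log2(RPKM) values for three samples (A1, A2, A3), four genes each
samples = [
    [-2.0, 0.0, 1.0, 3.0],
    [-1.0, 1.0, 2.0, 4.0],
    [-3.0, -1.0, 0.5, 2.0],
]
scaled = scale_normalize(samples)
quantile = quantile_normalize(samples)
```

After scaling, the samples share a median but keep their own spread. After quantile normalization, every sample contains exactly the same set of values, so spread, range and median are identical, and only the assignment of values to genes differs.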
Differential mRNA Abundance Analysis
ACSL5 – normalized differential abundance ratio = 8.4
Non-syn SNV/mutation identified at both DNA/RNA levels
DNA
RNA
Single Nucleotide Variant Analysis
Acceptor Splice Sites Mutated in UUC
[Diagram: 5’ Exon N - GU ------ A ----- AG - Exon N+1 3’]
Tumor RNA
Tumor DNA
Normal DNA
mRNA Splicing Aberrations
Mutant Allele Frequency (17,000-50,000x coverage)

Patient ID   Mutation    Diagnosis   Relapse
719515       p.R238W     0.01%       27%
737185       p.R238W     0           18%
761159       p.R238W     0           31%
756421       p.R367Q     0.02%       25%
716996       p.K404KD    0           55%
763368       p.S408R     0           50%
769886       p.S445F     0           25%
728610       p.L626F     0           49%
726584       p.E274Q     0           19%
728610       p.S171I     0           43%
728610       p.M244L     0.38%       47%
726584       p.A53V      0           20%
6 matched diagnosis/relapse pediatric ALL samples (n=12)
RNA-seq to discover novel mutations specific to relapse disease
Targeted amplicon resequencing at ultra-deep coverage
Stage-Specific RNA Aberrations in ALL
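The mutant allele frequencies above are ratios of mutant-supporting reads to total reads at a position. The sketch below shows why ultra-deep coverage matters for the low diagnosis-stage frequencies. The function names are hypothetical, and the calculation ignores sequencing error, which in practice sets the real detection limit.

```python
def mutant_allele_frequency(mutant_reads, total_reads):
    """Fraction of reads at a position that carry the mutant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mutant_reads <= total_reads:
        raise ValueError("mutant_reads must lie between 0 and total_reads")
    return mutant_reads / total_reads


def expected_mutant_reads(frequency, coverage):
    """Expected number of mutant-supporting reads at a given depth of coverage."""
    return frequency * coverage


# A 0.01% subclone (the lowest non-zero diagnosis value in the table)
# at the two ends of the quoted coverage range.
low_depth = expected_mutant_reads(0.0001, 17_000)
high_depth = expected_mutant_reads(0.0001, 50_000)
print(low_depth, high_depth)
```

At 17,000x, a 0.01% allele is expected in fewer than two reads, and at 50,000x in about five. Standard-depth sequencing would therefore be unlikely to capture such a subclone at all.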
Microarray and RNA-seq Transcriptome Profiling
• Possible with >10 picograms total RNA
• Degraded samples, RIN scores >2.0
• Formalin-fixed, paraffin-embedded (FFPE) samples
• Whole blood
• Direct cell lysate from the equivalent of a single or a few cells
microRNA Profiling
• megaplex amplification protocols - 1-350 ng total RNA
• non-amplification-based for 350–1000 ng total RNA
Solutions for Low Yield Samples
MPS costs are falling, technologies are becoming more advanced and powerful, and platforms are developing rapidly: a strong case for transcriptomics within integrated omics approaches applied to large cohorts such as I4C.
The Future of MPS-based OMICS
The Economist, 2011
• Low yield samples (blood spots, extracellular microRNAs) might require application of amplification methods
• Tissue and cell specificity of gene expression (e.g. cord blood vs leukemic clone) – need for carefully matched controls
• Only genes and RNAs expressed at the time of sampling are detected
• Depth of coverage needs for RNAseq affect cost-related decisions
• Specific disease progression stages might mask etiology-associated aberrations
• Bioinformatics: limited standards for complex data processing and analysis (RNAseq); more benchmarking studies are needed using data from consortium-like efforts (FDA’s SEQC), along with data storage and access solutions.
Considerations for I4C Transcriptomics
Thank you….