NGS in diagnostics
Implementation of NGS in clinical testing
Test Development · Validation · Routine Testing
Diagnostic Strategy · Quality Management · Proficiency Testing
• Test Development: clinical guidelines; NGS diagnostic approaches; considerations (Dx yield); NGS test types; workflow (Dx); data management
• Validation: platform; pipeline; test; reference materials; limitations; specifications
• Routine Testing: quality controls; reporting; proficiency tests
Implementation of NGS in clinical testing: Test Development
Test development: NGS clinical guidelines
• NGS tests should not be transferred into clinical practice without adequate validation according to the emerging guidelines.
Test development: NGS diagnostic approaches
• Different types of NGS assays for diagnostics:
Mutation scanning: analysis of individual genes or small gene sets (e.g., BRCA1/BRCA2). The NGS test should have at least the same sensitivity and specificity as the current diagnostic method.
Mutation screening: targeted analysis of known genes (gene panel). New aspects to define: design, limitations, sensitivity, specificity and possible adverse effects.
Exome sequencing: extensive targeted analysis (large gene panel, “Mendeliome”). Trio analysis for the identification of de novo variants. Additional aspects such as unsolicited findings and informed consent procedures.
Test development: Considerations for NGS testing
The diagnostic yield is the probability, calculated per patient cohort, that a disease-causing variant is identified and a molecular diagnosis can be made (Weiss, Van de Zwaag et al. 2012).
• Consider the diagnostic yield of each NGS test.
• Consider the frequency of other disease-causing pathomechanisms that cannot be covered by NGS.
Examples:
Cardiomyopathy / epilepsy: >50 associated genes (gene panel testing)
Cystic fibrosis: >98% of patients carry mutations in the CFTR gene
Hereditary spastic paraplegia (HSP): SPAST deletions in 20% of cases
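The diagnostic yield defined above is a ratio per cohort, and the second bullet bounds what a given test can reach. A minimal Python sketch under stated assumptions: `diagnostic_yield` and `yield_ceiling` are hypothetical helper names, only the 20% SPAST-deletion share comes from the slide, and the 40% share of sequence variants is invented for illustration.

```python
def diagnostic_yield(n_diagnosed: int, n_patients: int) -> float:
    """Share of a patient cohort in which a molecular diagnosis was made."""
    if n_patients <= 0:
        raise ValueError("the cohort must contain at least one patient")
    return n_diagnosed / n_patients


def yield_ceiling(mechanism_share: dict[str, float], detectable: set[str]) -> float:
    """Upper bound on the yield of a test: the summed cohort shares of the
    pathomechanisms that the test is able to detect."""
    return sum(share for name, share in mechanism_share.items() if name in detectable)


# Hypothetical HSP cohort: the 20% SPAST-deletion share is from the slide,
# the 40% share of sequence variants is an illustrative assumption.
hsp_mechanisms = {"SPAST deletions": 0.20, "sequence variants in panel genes": 0.40}

# Assume the test has no CNV analysis, so the deletions are not detected.
print(yield_ceiling(hsp_mechanisms, {"sequence variants in panel genes"}))  # 0.4
print(diagnostic_yield(30, 100))  # 0.3
```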
Test development: Gene panels
• Gene panels should only contain genes clearly associated with the disease.
• Panels should be adjusted to the latest scientific discoveries (to maintain the Dx yield).
• “Core disease gene lists” are established by clinical and laboratory experts.
Test development: NGS test types
• Different diagnostic types (quality levels) depending on the clinical requirements
Type | Genes / panels | Coverage of ROI (>30X) | Sequencing precision | Additional analysis (CNV, repeat expansion) | Sanger to fill the gaps
A | DMD, BRCA1, BRCA2 | 100% | 99.9% | Yes | Yes
B | Gene panels | >98% (*) | 99.9% | Upon request (if available) | Upon request (selected genes)
C | Whole exome / Mendeliome | >98% | 99.9% | No | No
(*) Test includes detailed reporting of low-coverage regions.
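The coverage column of the table can be checked per sample. A sketch, assuming per-base depths over the ROI are available as a list, and reading “>98%” as at least 98% and “>30X” as a depth of at least 30 (both interpretations are assumptions); the function names are hypothetical.

```python
# Minimum fraction of ROI bases at >30X per test type, from the table above.
# Reading ">98%" as ">= 0.98" and ">30X" as "depth >= 30" are assumptions.
MIN_FRACTION_30X = {"A": 1.00, "B": 0.98, "C": 0.98}


def fraction_covered(depths: list[int], min_depth: int = 30) -> float:
    """Fraction of ROI positions with a depth of at least min_depth."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


def low_coverage_positions(depths: list[int], roi_start: int, min_depth: int = 30) -> list[int]:
    """Positions to report as low coverage (type B) or to fill by Sanger (type A)."""
    return [roi_start + i for i, d in enumerate(depths) if d < min_depth]


def meets_spec(test_type: str, depths: list[int]) -> bool:
    return fraction_covered(depths) >= MIN_FRACTION_30X[test_type]


# Hypothetical per-base depths for a 100-bp ROI starting at position 1000.
depths = [45] * 99 + [12]
print(meets_spec("B", depths), meets_spec("A", depths))  # True False
print(low_coverage_positions(depths, roi_start=1000))    # [1099]
```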
Test development: Workflow
• The workflow must adapt to the laboratory demand to avoid bottlenecks.
Figure: diagnostic workflow with QC (quality control) steps throughout
Laboratory workflow: DNA extraction → library preparation → sequencing
Data analysis: mapping → variant calling → annotation; quality/coverage analysis; additional analysis
Assessment/reporting: variant classification → clinical interpretation (clinical case) → clinical report
Data management: patient data, LIMS, gene panel/exome analysis, gene database
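One way to keep QC steps from becoming bottlenecks is to check each gate as soon as a step finishes and stop a failing sample early. A minimal sketch; the `QCGate` structure, the metric names and the thresholds are hypothetical, not values from this workflow.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QCGate:
    """One quality control check attached to a workflow step."""
    metric: str
    passes: Callable[[float], bool]
    description: str


def run_qc(step_metrics: dict[str, dict[str, float]],
           gates: dict[str, list[QCGate]]) -> str:
    """Check the steps in workflow order and stop at the first failed gate,
    so a bad sample is not carried on into later (costlier) steps."""
    for step, metrics in step_metrics.items():
        for gate in gates.get(step, []):
            value = metrics[gate.metric]
            if not gate.passes(value):
                return f"QC failed at {step}: {gate.description} (value {value})"
    return "all QC gates passed"


# Hypothetical thresholds, for illustration only.
gates = {
    "DNA extraction": [QCGate("ng_per_ul", lambda v: v >= 10, "DNA concentration >= 10 ng/ul")],
    "Sequencing": [QCGate("pct_duplicates", lambda v: v <= 20, "duplicates <= 20%")],
}
sample = {
    "DNA extraction": {"ng_per_ul": 25.0},
    "Library preparation": {"fragment_bp": 320.0},
    "Sequencing": {"pct_duplicates": 34.0},
}
print(run_qc(sample, gates))  # QC failed at Sequencing: duplicates <= 20% (value 34.0)
```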
Test development: Data management
Panels:
Breast Cancer - major genes
Breast Cancer - expanded panel
Ovarian Cancer - major genes
Ovarian Cancer - expanded panel
Polyposis Coli
Hereditary Nonpolyposis Colon Cancer
Colorectal Cancer - expanded panel
Pancreatic Cancer
Gastric Cancer
Gastrointestinal stromal tumor - major genes
Gastrointestinal stromal tumor - expanded panel
Pheochromocytoma-Paraganglioma syndrome
Renal Cancer
Thyroid Cancer - major genes
Thyroid Cancer - expanded panel
Genes: AIP, ALK, APC, ATM, BAP1, BLM, BMPR1A, BRCA1, BRCA2, BRIP1, BUB1B, CDC73, CDH1, CDK4, CDKN1C, CDKN2A, CEBPA, CEP57, CHEK2, CYLD, DDB2, DICER1, DIS3L2, EGFR, EPCAM, ERCC2, ERCC3, ERCC4, ERCC5, EXT1, EXT2, EZH2, FANCA, FANCB, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCI, FANCL, FANCM, FH, FLCN, GATA2, GPC3, HNF1A, HRAS, KIT, MAX, MEN1, MET, MLH1, MSH2, MSH6, MUTYH, NBN, NF1, NF2, NSD1, PALB2, PHOX2B, PMS1, PMS2, PRF1, PRKAR1A, PTCH1, PTEN, RAD51C, RAD51D, RB1, RECQL4, RET, RHBDF2, RUNX1, SBDS, SDHAF2, SDHB, SDHC, SDHD, SLX4, SMAD4, SMARCB1, STK11, SUFU, TMEM127, TP53, TSC1, TSC2, VHL, WRN, WT1, XPA, XPC
Transcript: LRGt1
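Panels like those listed above are often defined as gene subsets of one capture kit. A sketch of two checks that support panel maintenance, assuming panels are stored as gene sets; the panel contents and versions are hypothetical, and POLE is used only as an example of a gene absent from the kit list.

```python
def missing_from_kit(panel_genes: set[str], kit_genes: set[str]) -> set[str]:
    """Panel genes that the capture kit does not cover (should be empty)."""
    return panel_genes - kit_genes


def version_diff(old: set[str], new: set[str]) -> tuple[set[str], set[str]]:
    """Genes added and removed between two versions of a panel."""
    return new - old, old - new


# Excerpt of the kit gene list above; panel contents and versions are hypothetical.
kit = {"APC", "BMPR1A", "EPCAM", "MLH1", "MSH2", "MSH6", "MUTYH", "PMS2", "SMAD4", "STK11"}
crc_v1 = {"APC", "MUTYH", "MLH1", "MSH2", "MSH6", "PMS2"}
crc_v2 = crc_v1 | {"EPCAM", "POLE"}

print(version_diff(crc_v1, crc_v2))   # added EPCAM and POLE, nothing removed
print(missing_from_kit(crc_v2, kit))  # {'POLE'}: not on the kit, needs another method
```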
Test development: Database
Patient and laboratory data: patient data, family history, phenotype/HPO, additional analysis, additional diagnosis; laboratory data, quality criteria, kits/reagents, quality parameters, SOP, results, report
Genes, panels and analysis: genes/HGNC, isoform, associated disease, MOI; panel genes/version, panel names/disease, panel kit; patients: mapping, coverage, variants, interpretation, classification, literature info, user; clinical DBs/release, population DBs/release, assembly
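Storing the assembly, panel version and database releases with each result makes a re-analysis traceable. A minimal data-model sketch, assuming a subset of the fields listed on the slide; the class, the function and the release labels are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AnalysisContext:
    """Versioned resources a result depends on (fields follow the database slide)."""
    assembly: str
    panel_name: str
    panel_version: str
    clinical_db_release: str
    population_db_release: str


def changed_resources(old: AnalysisContext, new: AnalysisContext) -> list[str]:
    """Names of the resources that differ between two analyses of a sample."""
    a, b = asdict(old), asdict(new)
    return [name for name in a if a[name] != b[name]]


# Hypothetical release labels.
first = AnalysisContext("GRCh37", "Breast Cancer - major genes", "v1", "clinical-2016-01", "population-0.3")
rerun = AnalysisContext("GRCh37", "Breast Cancer - major genes", "v1", "clinical-2016-06", "population-0.3")
print(changed_resources(first, rerun))  # ['clinical_db_release']
```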
Implementation of NGS in clinical testing: Validation
Validation: Goal
• Prove the ability of the diagnostic test to detect variants in the regions defined during the development of the assay (define the reportable range).
Figure: ROI (region of interest) / clinical target, technical target, reportable range, limitations
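The reportable range can be thought of as interval arithmetic: the clinical target intersected with the technical target, minus regions with known limitations. A sketch under that assumption, using half-open coordinates on a single chromosome and hypothetical positions.

```python
Interval = tuple[int, int]  # half-open [start, end) on one chromosome


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Overlap of two lists of non-overlapping intervals."""
    a, b = sorted(a), sorted(b)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Parts of the intervals in a that are not covered by any interval in b."""
    out = []
    for start, end in sorted(a):
        cur = start
        for b_start, b_end in sorted(b):
            if b_end <= cur or b_start >= end:
                continue  # no overlap with the remaining part
            if b_start > cur:
                out.append((cur, b_start))
            cur = max(cur, b_end)
        if cur < end:
            out.append((cur, end))
    return out


# Hypothetical coordinates.
clinical = [(100, 200), (300, 400)]
technical = [(90, 180), (310, 420)]
limited = [(150, 160)]  # e.g. a pseudogene-homologous stretch
print(subtract(intersect(clinical, technical), limited))
# [(100, 150), (160, 180), (310, 400)]
```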
Validation: Platform
Platform workflow: DNA isolation → fragmentation → library preparation → enrichment → sequencing
QC parameters: DNA quality, DNA concentration, fragment size, sequencing quality, %GC, number of reads passing filter, % duplicates
Specifications / limitations
• Specify the components and their specifications (parameters/QC, method, equipment): robot, kit, sequencer, software
• Specimens: blood, FFPE, frozen tissue, cfDNA
• Define accuracy and precision: repeatability (3 samples, same preparation) and reproducibility (3 replicates, different preparations); >95% concordance
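The >95% concordance criterion needs a concrete definition of concordance. A sketch assuming concordance is the number of shared calls divided by all calls seen in either replicate (other definitions are possible); the variant positions are hypothetical.

```python
from itertools import combinations

Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def concordance(a: set[Variant], b: set[Variant]) -> float:
    """Shared calls divided by all calls seen in either replicate
    (one possible definition; an assumption, not taken from the slide)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def min_pairwise_concordance(replicates: list[set[Variant]]) -> float:
    """Worst concordance over all pairs of replicates."""
    return min(concordance(x, y) for x, y in combinations(replicates, 2))


# Hypothetical replicates: the third replicate misses one of 20 calls.
calls = {("chr17", 41_200_000 + i, "A", "G") for i in range(20)}
replicates = [calls, set(calls), calls - {("chr17", 41_200_000, "A", "G")}]
worst = min_pairwise_concordance(replicates)
print(worst, worst > 0.95)  # 0.95 False
```

With this definition a single missed call out of 20 already fails a strict “>95%” criterion.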
Validation: DNA fragmentation vs. tagmentation
• Adapt the QC parameters: DNA concentration, enzyme concentration, incubation time.
• Tagmentation (enzymatic): 3 different Nextera kits (same lot number), reference material DNA, automated (Beckman)
• Fragmentation (mechanical): Covaris method, reference material DNA
Validation: DNA tagmentation
• DNA tagmentation in FFPE material: mechanical fragmentation (Covaris; FFPE DNA extraction + fragmentation) vs. Nextera tagmentation (figure: FFPE sample #3)
• Over-fragmentation = loss of material → lower library complexity → more PCR duplicates → lower coverage. Downstream effects!
Specimen | % duplicates
Blood | 3-20%
Fresh frozen (FF) | 30%
FFPE | 60-85%
DNA fragmentation | % duplicates
Mechanical | 3-10%
Enzymatic | 19-30%
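The duplicate percentages above translate directly into lost coverage. A simplified sketch of duplicate counting (reads sharing chromosome, start and strand) and of the resulting unique depth; real duplicate marking is more involved, and the read data are toy values.

```python
from collections import Counter

ReadKey = tuple[str, int, str]  # (chrom, alignment start, strand)


def duplicate_rate(reads: list[ReadKey]) -> float:
    """Fraction of reads that repeat an earlier read's (chrom, start, strand).
    Simplified single-end logic; tools such as Picard MarkDuplicates also take
    the mate position of paired reads into account."""
    if not reads:
        return 0.0
    counts = Counter(reads)
    duplicates = sum(n - 1 for n in counts.values())  # keep one copy per key
    return duplicates / len(reads)


def effective_depth(raw_depth: float, dup_rate: float) -> float:
    """Depth left after duplicate removal, assuming duplicates are spread evenly."""
    return raw_depth * (1 - dup_rate)


reads = [("chr1", 100, "+")] * 4 + [("chr1", 150, "+"), ("chr1", 100, "-")]
print(duplicate_rate(reads))  # 0.5
# With an FFPE-like 70% duplicate rate (range on the slide: 60-85%),
# a raw depth of 200X leaves roughly 60X of unique reads.
print(round(effective_depth(200, 0.70), 1))  # 60.0
```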
Validation: Enrichment kits
• Custom enrichment kits vs. non-custom kits: Agilent SureSelect custom kit vs. TruSight™ Cancer kit
Figure: average coverage plot, 500 samples, coding exons of the ATM gene
Validation: Custom enrichment kits
AGRN - congenital myasthenic syndrome
Kit version | % bases >30X | Coverage variance
Master v.01 | 74.26 | 2.959
Master v.02 | 90.57 | 1.530
Master v.03 | 98.75 | 0.066
Master v.04 | 98.84 | 0.045
Figure: AGRN exon 1 and exon 2 coverage with kit versions v01, v02 and v04
• Improve coverage (e.g., GC-rich regions such as exon 1).
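GC-rich exons such as AGRN exon 1 tend to be under-covered. A sketch that flags exons that are both GC-rich and below a coverage target; the 70% GC and 95% cut-offs are hypothetical, and the sequences are toy values, not AGRN.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_rich_low_coverage(exons: list[tuple[str, str, float]],
                         gc_cutoff: float = 0.7, min_pct_30x: float = 95.0) -> list[str]:
    """Exons that are GC-rich and below the coverage target.
    exons: (name, sequence, % bases >30X); both cut-offs are hypothetical."""
    return [name for name, seq, pct in exons
            if gc_fraction(seq) >= gc_cutoff and pct < min_pct_30x]


# Toy sequences and coverage values, not real AGRN data.
exons = [("exon 1", "GCGCGGCCGA", 60.0), ("exon 2", "ATGCATTACG", 99.0)]
print(gc_rich_low_coverage(exons))  # ['exon 1']
```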
Validation: Custom enrichment kits
AMT - glycine encephalopathy. Custom track: bait design
• Include intronic regions with known pathogenic variants to increase the diagnostic yield.
Figure tracks: custom bait design, LOVD
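Adding intronic positions with known pathogenic variants to the bait design amounts to padding those positions and merging them with the exon targets. A sketch under that assumption; the 25-bp padding, the coordinates and the variant positions are hypothetical.

```python
Interval = tuple[int, int]  # half-open [start, end)


def add_intronic_sites(targets: list[Interval], sites: list[int], pad: int = 25) -> list[Interval]:
    """Add a padded window around each known intronic variant position
    and merge overlapping or touching intervals into one bait target list."""
    intervals = sorted(list(targets) + [(p - pad, p + pad + 1) for p in sites])
    if not intervals:
        return []
    merged = [list(intervals[0])]
    for start, end in intervals[1:]:
        if start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


# Hypothetical exon targets and two intronic variant positions (e.g. taken from LOVD).
exons = [(1000, 1200), (2000, 2150)]
print(add_intronic_sites(exons, [1220, 1800]))
# [(1000, 1246), (1775, 1826), (2000, 2150)]
```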
Validation: Library pool
• Measure the library pool concentration (Bioanalyzer traces) before loading the flow cell.
Flow-cell overload → higher error rate → lower sequencing quality
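Bioanalyzer output (concentration and mean fragment size) is usually converted to molarity before the pool is diluted for loading. A sketch using the common approximation of 660 g/mol per base pair of dsDNA; the input values and the 2 nM target are hypothetical.

```python
def molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration to nM, using ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660 * mean_fragment_bp)


def stock_volume_ul(stock_nm: float, target_nm: float, final_volume_ul: float) -> float:
    """Volume of library needed for a dilution (C1 * V1 = C2 * V2)."""
    return target_nm * final_volume_ul / stock_nm


# Hypothetical values: 3.3 ng/ul with a mean fragment size of 500 bp.
stock = molarity_nm(3.3, 500)
print(round(stock, 2))                           # 10.0
print(round(stock_volume_ul(stock, 2, 50), 2))   # 10.0
```

Loading too much of an under-estimated pool is one route to the flow-cell overload described above.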
Validation: Sequencing quality
• Good quality is not always “good quality”.
Figure panels: blood DNA (150 cycles); cffDNA (33 cycles); cffDNA (33 cycles), mapping
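Base-quality summaries such as the percentage of bases at or above Q30 are computed from Phred scores and, as the cffDNA example shows, say nothing about how well the reads map. A small sketch decoding FASTQ quality strings, assuming the Phred+33 offset; the function names are hypothetical.

```python
def decode_qualities(qual_string: str, offset: int = 33) -> list[int]:
    """Phred scores from a FASTQ quality string (offset 33 assumed)."""
    return [ord(ch) - offset for ch in qual_string]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def pct_at_least(quals: list[int], threshold: int = 30) -> float:
    """Percentage of bases with a quality of at least the threshold."""
    return 100 * sum(q >= threshold for q in quals) / len(quals)


quals = decode_qualities("IIII5")   # 'I' = Q40, '5' = Q20
print(quals, pct_at_least(quals))   # [40, 40, 40, 40, 20] 80.0
print(error_probability(30))        # one error in about 1000 bases
```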
Validation: Pipeline
• Venn diagrams summarizing the concordance of variants (SNVs / indels) called by different callers (Hwang et al. 2015).
Validation: Components of an NGS pipeline
SNV/indel branch: alignment (GRCh37/hg19) → SNV/indel variant calling → variant filter → variant annotation (RefSeq) → population DBs variant filter → select SNVs in target regions → NGS DB
CNV branch: CNV variant calling (normalization against an intra-run CNV-negative reference set) → select variants in target regions → variant filter (custom) → variant annotation
Variant interpretation: disease-associated DBs, clinical DBs, pathogenicity prediction, HPO
Run statistics: % mapped, % duplicates, % reads on target, % target bases covered, % base quality
Variant statistics: # variants hom vs. het, # substitutions, # indels, Ts vs. Tv
Coverage statistics: coverage per base, coverage per exon (RefSeq isoform), coverage plots of the ROI, analyzable region
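The variant statistics listed above can be derived from the call set itself. A sketch assuming simple (ref, alt, genotype) records; a Ts/Tv ratio far from the value expected for the target region can point to false-positive calls. The function name and records are hypothetical.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine-purine, pyrimidine-pyrimidine


def variant_statistics(variants: list[tuple[str, str, str]]) -> dict[str, float]:
    """Summary counts for (ref, alt, genotype) records; genotype as '0/1', '1/1', '0|1', ..."""
    s = {"substitutions": 0, "indels": 0, "other": 0, "ts": 0, "tv": 0, "het": 0, "hom": 0}
    for ref, alt, gt in variants:
        if len(ref) == 1 and len(alt) == 1:
            s["substitutions"] += 1
            s["ts" if frozenset((ref, alt)) in TRANSITIONS else "tv"] += 1
        elif len(ref) != len(alt):
            s["indels"] += 1
        else:
            s["other"] += 1  # multi-nucleotide substitutions
        a1, a2 = gt.replace("|", "/").split("/")
        s["hom" if a1 == a2 else "het"] += 1
    s["ts_tv"] = s["ts"] / s["tv"] if s["tv"] else float("inf")
    return s


calls = [("A", "G", "0/1"), ("C", "T", "1/1"), ("G", "T", "0/1"), ("AT", "A", "0/1")]
stats = variant_statistics(calls)
print(stats["substitutions"], stats["indels"], stats["het"], stats["hom"], stats["ts_tv"])
# 3 1 3 1 2.0
```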
Validation: Pipeline
• Specify all pipeline components (e.g., third-party software, in-house software, versions).
• Record any modifications made to open-source software.
• Describe the criteria used for variant annotation and filtering.
• Validate the pipeline accuracy for each variant type (SNV, CNV, indel, …).
• Define the analytical performance (sensitivity, specificity, PPV).
• Define specifications (e.g., average depth of coverage, minimum depth of coverage).
• Define limitations (e.g., indel size, large CNVs, structural variants, mosaicism).
• Different pipelines will require different specifications.
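The first bullets ask for every component, version and modification to be recorded. A sketch of a machine-readable manifest; the version numbers, the in-house component name and the specification keys are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Component:
    name: str
    version: str
    origin: str  # "third-party" or "in-house"
    modifications: list[str] = field(default_factory=list)


def pipeline_manifest(components: list[Component], specifications: dict) -> str:
    """Serialize the validated pipeline configuration so each report can cite it."""
    return json.dumps({"components": [asdict(c) for c in components],
                       "specifications": specifications}, indent=2, sort_keys=True)


# Hypothetical versions, in-house tool name and specification values.
manifest = pipeline_manifest(
    [Component("BWA-MEM", "0.7.12", "third-party"),
     Component("GATK", "3.5", "third-party"),
     Component("coverage-report", "1.2", "in-house", ["custom ROI handling"])],
    {"reference": "GRCh37/hg19", "min_depth_x": 20, "variant_types": ["SNV", "indel", "CNV"]},
)
print(manifest)
```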
Validation: Pipeline
• Sensitivity (true positive rate): TPR = TP / (TP + FN); TPR reflects the frequency of FNs.
• Specificity (true negative rate): TNR = TN / (TN + FP); TNR reflects the frequency of FPs.
• Precision (positive predictive value): PPV = TP / (TP + FP); PPV is the likelihood that a variant call is a TP.
• Do not include “no calls” or “invalid calls” in TPR, TNR or PPV calculations.
• Do not use calls in regions that are known a priori to be limiting (pseudogenes, paralogs, low-complexity regions).
“Concordance/discrepancy in variant calling compared to a reference dataset”
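The formulas above, with no-calls and a-priori limited regions left out, can be written as follows. A sketch assuming each evaluated position is reduced to (truth, call, excluded) with hypothetical call labels; the counts are toy values.

```python
def confusion_counts(sites: list[tuple[bool, str, bool]]) -> tuple[int, int, int, int]:
    """sites: (truth_is_variant, call, in_excluded_region) per evaluated position,
    where call is 'variant', 'reference', 'no_call' or 'invalid'. Returns (TP, FP, TN, FN)."""
    tp = fp = tn = fn = 0
    for truth, call, excluded in sites:
        # Skip no-calls/invalid calls and regions with a-priori limitations.
        if excluded or call in ("no_call", "invalid"):
            continue
        if call == "variant":
            if truth:
                tp += 1
            else:
                fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def performance(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    return {
        "TPR": tp / (tp + fn),  # sensitivity
        "TNR": tn / (tn + fp),  # specificity
        "PPV": tp / (tp + fp),  # precision
    }


sites = (
    [(True, "variant", False)] * 98 + [(True, "reference", False)] * 2
    + [(False, "reference", False)] * 900 + [(False, "variant", False)]
    + [(True, "no_call", False), (True, "variant", True)]  # both ignored
)
print(performance(*confusion_counts(sites)))
```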
Validation: Reference materials
• FFPE, frozen tumor, cffDNA, structural and multiplex HDx reference materials ($)
• Genome in a Bottle (CEPH/Utah, Ashkenazi trio, East Asian): data from 12 NGS technologies (WGS and WES); high-confidence variant set (SNVs, indels and large structural variants); reference data sets publicly available
• 1000 Genomes Pilot 3: deep exon sequencing of 906 genes (HapMap 3)
• Platinum Genomes (17 individuals, CEPH pedigree 1463)
• HapMap and CEPH population cell lines and DNA
Validation: Variant calling
• Analysis of 2986 variants (SNVs, indels): in-house pipeline vs. two reference data sets, the 1000 Genomes data set (NA12889, NA12890) and the GIAB data set RM8398 (NA12878).
• All variants in clinically relevant genes; coverage: 98% of the ROI >20X.
• Sensitivity: 99.89%; PPV: 99.83%
Figure: Venn diagrams (TP, FP, FN) of in-house calls vs. each reference data set; 2 FPs indicated
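Comparing an in-house call set with a reference set such as GIAB is usually restricted to the reference's high-confidence regions. A simplified exact-match sketch with toy data; exact matching misses complex variants that are represented differently, which is why dedicated tools such as hap.py or RTG vcfeval compare haplotypes instead.

```python
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt)
Region = tuple[str, int, int]        # (chrom, start, end), half-open


def in_regions(v: Variant, regions: list[Region]) -> bool:
    """True if the variant position lies inside any of the regions."""
    return any(chrom == v[0] and start <= v[1] < end for chrom, start, end in regions)


def benchmark(calls: set[Variant], truth: set[Variant], confident: list[Region]) -> dict:
    """Exact-match comparison of a call set against a truth set, restricted to
    high-confidence regions; variants and regions must use the same coordinates."""
    called = {v for v in calls if in_regions(v, confident)}
    expected = {v for v in truth if in_regions(v, confident)}
    tp, fp, fn = len(called & expected), len(called - expected), len(expected - called)
    return {
        "TP": tp, "FP": fp, "FN": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "PPV": tp / (tp + fp) if tp + fp else None,
    }


# Toy data; the variant at position 900 lies outside the confident region and is ignored.
truth = {("chr13", 100, "A", "G"), ("chr13", 200, "C", "T"), ("chr13", 900, "G", "A")}
calls = {("chr13", 100, "A", "G"), ("chr13", 250, "T", "C"), ("chr13", 900, "G", "A")}
print(benchmark(calls, truth, confident=[("chr13", 0, 500)]))
# {'TP': 1, 'FP': 1, 'FN': 1, 'sensitivity': 0.5, 'PPV': 0.5}
```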
Validation: Variant calling
• Orthogonal Sanger sequencing validation of NGS variants has limited utility.
Figure: number of FPs, FNs and failed Sanger reactions (1000G vs. in-house calls), with Sanger confirmation
Validation: Variant calling
• Influence of coverage on the overall performance of variant calling: the mapping data were randomly downsampled to the defined coverage thresholds.
“Do not rely on regions with a coverage of less than 20X”
Figure: FNs and FPs at min. 30X, min. 20X, min. 15X and min. 10X
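Random downsampling to a target coverage can be sketched as keeping each read with probability target/current depth. The sketch decides per read name so that mates stay together, comparable in spirit to `samtools view -s`; the seed handling and the function names are assumptions.

```python
import random


def keep_read(read_name: str, fraction: float, seed: int = 1) -> bool:
    """Deterministic per-name decision, so both mates of a pair are kept or dropped together."""
    return random.Random(f"{seed}:{read_name}").random() < fraction


def downsample(read_names: list[str], current_depth: float, target_depth: float) -> list[str]:
    """Randomly keep about target/current of the reads (no-op if already at or below target)."""
    if target_depth >= current_depth:
        return list(read_names)
    fraction = target_depth / current_depth
    return [name for name in read_names if keep_read(name, fraction)]


names = [f"read{i}" for i in range(10_000)]
subset = downsample(names, current_depth=120, target_depth=20)
print(len(subset))  # close to 10_000 * 20 / 120, i.e. about 1667
```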
Validation: Variant calling
• Assessment of coverage and quality specifications: Sanger analysis of 572 variants.
a: LBQ, 236 variants with low base quality (Q<30)
b: LBC, 224 variants with low base coverage (<20X)
c: LBQ & LBC, 112 variants with both (Q<30, <20X)
“Base quality is the determining factor in variant detection”
Figure: Sanger confirmation per category
Category | Confirmed | Not confirmed
a: LBQ | 11 (19%) | 48 (81%)
b: LBC | 46 (82%) | 10 (18%)
c: LBQ & LBC | 4 (14%) | 24 (85%)
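The three categories of the Sanger study can be assigned automatically to decide which calls need orthogonal confirmation. A sketch using the thresholds from the slide (Q<30, <20X); using the mean base quality of the supporting reads per call is an assumption, and the calls are hypothetical.

```python
def quality_class(base_quality: float, depth: int, min_q: int = 30, min_depth: int = 20) -> str:
    """Place a variant call into the categories of the Sanger study:
    LBQ (Q < 30), LBC (< 20X), both, or 'pass'."""
    low_q = base_quality < min_q
    low_cov = depth < min_depth
    if low_q and low_cov:
        return "LBQ & LBC"
    if low_q:
        return "LBQ"
    if low_cov:
        return "LBC"
    return "pass"


# Hypothetical calls: (variant id, mean base quality of supporting reads, depth).
calls = [("var1", 36.0, 45), ("var2", 24.5, 60), ("var3", 33.0, 14), ("var4", 22.0, 9)]
for name, q, depth in calls:
    print(name, quality_class(q, depth))
```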
Validation: Limitations / Pseudogenes
• 1556 genes (3876 kb) associated with Mendelian disorders.
Figure: pseudogene overlap by number of genes and by kb of sequence, according to Pseudogene.org, Ensembl and UCSC (values shown: 9.5%, 26.2%)
Analysis limitations!
Validation: Pipeline / Pseudogenes
• Influence of “complex” regions on pipeline validation
“Sensitivity drops noticeably when including limiting regions into the validation data”
Validation: Pipeline / Mosaicism
PIK3CA (AD) NM_006218:c.3145C>G:p.Gly1049Ser chr3:178952090-178952090
PIK3CA (AD) NM_006218:c.2740G>A:p.Gly914Arg chr3:178947865-178947865
• Mutant allele frequencies range from 3 to 30%.
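Whether a low-level mosaic variant is detected depends on depth. A sketch assuming variant reads follow a binomial distribution (no sequencing error, no capture bias) and a hypothetical minimum of 5 variant reads; it searches for the smallest depth that gives a 95% detection probability for the allele fractions on the slide.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) when reads are drawn as
    Binomial(depth, vaf); sequencing errors and capture bias are ignored."""
    p_fewer = sum(comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
                  for k in range(min_alt_reads))
    return 1 - p_fewer


def min_depth(vaf: float, min_alt_reads: int, target: float = 0.95, max_depth: int = 5000):
    """Smallest depth at which the detection probability reaches the target."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= target:
            return depth
    return None


# Allele fractions from the slide (3% and 30%); requiring 5 variant reads is an assumption.
for vaf in (0.30, 0.03):
    print(vaf, min_depth(vaf, min_alt_reads=5))
```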
Validation: Pipeline / Annotation
• Complex variants have multiple alignment representations, e.g., at chr1:114841792-114841797:
delCAGTGA, insTCTCT
delCAGTGAinsTCTCT
delTinsTC, G>C, A>T
insT, delCT, G>C, A>T
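A first step towards comparing representations is to trim bases shared by REF and ALT. A sketch of that trimming only; full normalization as in `bcftools norm` or `vt normalize` also left-aligns indels against the reference sequence, which this sketch does not do. The positions and alleles are hypothetical.

```python
def trim_alleles(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Remove bases shared by REF and ALT (suffix first, then prefix),
    keeping at least one base in each allele, and shift the position accordingly."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Two hypothetical representations of the same single-base change.
print(trim_alleles(100, "CAGTGA", "CAGTCA"))  # (104, 'G', 'C')
print(trim_alleles(103, "TG", "TC"))          # (104, 'G', 'C')
```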
Validation: Test
• Define for which regions of the gene panel the diagnostic test will be able to detect variants, and for which it will not because of technical limitations.
Figure: ROI (region of interest) / clinical target, technical target, reportable range
Validation: Test limitations / Technical target
• Part of the clinical target is not included in the technical target.
PTCH1 gene: basal cell nevus syndrome
Technical target: includes only NM_000264 (LRGt1) (Illumina TruSight Cancer kit)
Figure: clinical target vs. technical target of PTCH1
Validation: Test limitations / Technical target
• Is the test able to analyze the disease-causing pathomechanism?
EPCAM - Lynch syndrome: germline deletions of the 3' region of EPCAM inactivate MSH2 in about 1% of individuals with Lynch syndrome.
Validation: Test limitations / Technical target
• Are non-coding exons relevant? APC: familial adenomatous polyposis (FAP)
Validation: Test limitations / Low complexity
ARX - epileptic encephalopathy
• Pathogenic variants located in low-complexity regions
Kit version | % ARX >30X | Coverage variance
Master v.01 | 53.41 | 3.574
Master v.02 | 77.35 | 2.807
Master v.03 | 73.32 | 0.809
Master v.04 | 77.41 | 0.385
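Low-complexity stretches can be flagged with a simple composition measure. A sketch using Shannon entropy over sliding windows; the window size and the cut-off are hypothetical, and established tools such as DUST use a different, triplet-based score.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq: str) -> float:
    """Base-composition entropy in bits (2.0 = all four bases equally frequent)."""
    n = len(seq)
    return -sum(c / n * log2(c / n) for c in Counter(seq).values())


def low_complexity_windows(seq: str, window: int = 12, max_entropy: float = 1.5) -> list[int]:
    """Start offsets of windows whose entropy falls below a hypothetical cut-off."""
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) < max_entropy]


# Toy sequence: mixed bases followed by a GCG repeat, like the GC-rich
# repeats (alanine codons GCN) that encode polyalanine tracts.
seq = "ACGTACGTACGT" + "GCG" * 4
flagged = low_complexity_windows(seq)
print(flagged)
```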
Validation: Test limitations / Pseudogenes
PMS2: colorectal cancer, Lynch syndrome
Figure: PMS2 vs. PMS2CL, WT vs. mut; variant located in PMS2
Validation: Test limitations / Pseudogenes
c.2186_2187delinsG: variant is not located in PMS2 (figure: WT vs. mut)
Validation: Test limitations / Pseudogenes
ABCC6: arterial calcification of infancy
Figure: BLAT results for ABCC6 and its pseudogene ABCC6P1
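Whether reads from a gene can be placed uniquely depends on how much of its sequence is shared with a pseudogene or paralog; reads from fully shared segments typically receive low mapping quality. A toy sketch counting shared k-mers; real analyses would use the reference sequences (e.g. ABCC6 vs. ABCC6P1), both strands and mismatches, and the sequences here are invented.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of a sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(gene_seq: str, homolog_seq: str, k: int = 31) -> float:
    """Fraction of the gene's k-mers that also occur in a homologous sequence
    (pseudogene or paralog). Only the forward strand is compared here."""
    gene_kmers = kmers(gene_seq, k)
    if not gene_kmers:
        return 0.0
    return len(gene_kmers & kmers(homolog_seq, k)) / len(gene_kmers)


# Toy sequences, not real ABCC6 / ABCC6P1 sequence.
gene = "ACGTACGGTCAT"
pseudogene = "ACGTACGGAAAA"
print(shared_kmer_fraction(gene, pseudogene, k=5))  # 0.5
```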
Validation: Test limitations / Paralogs
COL6A3: Ullrich congenital muscular dystrophy, dystonia, Bethlem myopathy
Figure: BLAT
Validation: Test limitations / Paralogs
PKD1: adult polycystic kidney disease
Validation: Test
Genetic testing for PKD1 (total no. of laboratories: 67)
Method | Laboratories
Del/Dup (CNV) | 26
Mutation scanning | 1
Sanger sequencing | 17
Next-gen sequencing | 23
Validation: Paralogs
KCNJ18 gene (Kir2.6 channel), associated with susceptibility to thyrotoxic periodic paralysis (TPP); potassium channel paralogs Kir2.2 (KCNJ12) and Kir2.5 (KCNJ17): >99% identity
Local realignment: BWA-MEM/GATK
Validation: Test
Genetic testing for KCNJ12 and KCNJ18
Method | KCNJ12 | KCNJ18
Del/Dup (CNV) | 2 | 2
Mutation scanning | 3 | 0
Sanger sequencing | 1 | 4
Next-gen sequencing | 5 | 5
Total no. of laboratories | 11 | 11
Questions?