Investigating rare diseases with Agilent NGS solutions
Chitra Kotwaliwale, Ph.D.
April 27, 2017
Research use only. Not for use in diagnostic purposes.
Rare diseases affect 350 million people worldwide
• 60 million affected in the US and Europe
• 50% of affected individuals are children
• 7,000 rare diseases
• 80% are genetic
Rare diseases can have a devastating impact on health
Cystic Fibrosis
• Excessive mucus in lungs and pancreas causes respiratory failure and inability to digest food
• Median survival age is 40 years
• Affects more than 30,000 people in the US; 70,000 worldwide
Leukodystrophy
• Progressive diseases of the brain, spinal cord, and peripheral nerves that impair movement, vision, hearing, balance, the ability to eat, and more
• Children affected with leukodystrophy live 5-10 years
• Affects ~60,000 people in the US
Retinitis Pigmentosa
• Retinal degeneration ultimately causes blindness
• Most people with RP are legally blind by age 40
• Affects ~100,000 people in the US; 1.5 million worldwide
Genetic causes of rare diseases can be complex
[Figure: genetic complexity ranges from a single gene to many genes]
• Cystic Fibrosis: 1 gene (wildtype CFTR vs. mutant CFTR chloride (Cl-) channel)
• Leukodystrophy: 30 genes (healthy neuron vs. damaged neuron)
• Retinitis Pigmentosa: 77 genes (rods and cones in a healthy retina vs. in RP)
Complexity of symptoms makes it difficult to detect rare diseases
Complex genetics and complex phenotypes
• Average physician visits before receiving a diagnosis = 7
• Average time from symptom onset to accurate diagnosis = 4.8 years
• Percentage of rare disease cases that are undiagnosed = ?
Rare diseases are progressive: faster diagnosis → early intervention → improved quality of life
Source: Engel et al., Journal of Rare Disorders
Exome sequencing has enhanced our understanding of rare diseases
[Figure: number of novel rare disease genes identified, 2009-2017; annotation: ~130 genes]
• 3,710 genes with phenotype-causing mutations in OMIM
• 197,952 mutations in HGMD
Source: Boycott et al., Nature Reviews Genetics
Present since the inception of exome sequencing
Agilent launched the first whole exome sequencing kit
[Timeline, 2009-2017: first whole exome sequencing kit launched; exome customization enabled; Clinical Research Exome v2; SureSelect Human All Exon V6; Focused Exome]
Agilent pioneered the whole exome sequencing workflow
DNA extraction → Library prep → Target enrichment → Sequencing → Data analysis
SureSelect baits are generated using a high-fidelity oligo synthesis process
[Figure: oligo synthesis cycle, repeated n times: 1) coupling, 2) oxidation, 3) deblock, with depurination as a side reaction; reagents delivered by inkjet and flood]
Long-length synthesis is achieved by improved cycle yield:
• ↑ Coupling efficiency
• ↓ Depurination
• ↑ Consistency
Synthesized oligos → RNA baits
High-fidelity process ensures superior-quality baits for target enrichment
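%FL = (CY × DY)^nt × 100
where %FL = percent full-length oligos, CY = synthesis cycle yield, DY = depurination cycle yield (both per-cycle fractions), and nt = oligo length in nucleotides (one synthesis cycle per nucleotide)
[Figure: oligo synthesis fidelity, errors per kb]
No need to QC individual oligos
More accurate capture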
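The full-length equation compounds a per-cycle loss over every nucleotide added. A minimal Python sketch of that arithmetic follows; the function name, the yield values, and the 120-nt bait length are illustrative assumptions, not Agilent specifications.

```python
def full_length_fraction(cycle_yield: float, depurination_yield: float, length_nt: int) -> float:
    """Expected fraction of full-length oligos, (CY * DY) ** nt.

    Yields are per-cycle fractions between 0 and 1; each cycle adds one nucleotide.
    """
    for name, value in (("cycle_yield", cycle_yield), ("depurination_yield", depurination_yield)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if length_nt < 0:
        raise ValueError("length_nt must be non-negative")
    return (cycle_yield * depurination_yield) ** length_nt


if __name__ == "__main__":
    # Hypothetical yields and an assumed 120-nt bait length, for illustration only.
    for cy in (0.990, 0.995, 0.999):
        fl = full_length_fraction(cy, 1.0, 120)
        print(f"CY = {cy:.3f}, DY = 1.000, 120 nt -> {100 * fl:.1f}% full length")
```

With DY held at 1, a 99.0% cycle yield leaves about 30% of 120-mers full length, whereas 99.9% leaves about 89%, which is why small per-cycle gains matter for long baits.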
Agilent SureSelect provides the most versatile platform for target enrichment
• Exomes
• Custom panels
• Catalog panels
Three pillars that guide Agilent exomes
• Performance
• Content
• Flexibility
Agilent SureSelect Exomes
SureSelect Clinical Research Exome V2 (CREv2) (New!)
• Comprehensive exome optimized for rare & inherited disorders
SureSelect Human All Exon V6
• Comprehensive exome for translational and clinical research
SureSelect Focused Exome
• Targeted exome with optimized coverage of only the disease-associated genes
Performance, Content, Flexibility in CREv2
• Performance: enhanced coverage of disease-associated genes
• Content: optimized content, including non-coding regions associated with disease
• Flexibility: design customizability to further enhance your exome
CREv2 provides enhanced coverage of disease-associated genes
Performance
• 5,109 disease-associated genes
• 100x average sequencing depth; 67.3 Mb design; 6.5 Gb sequencing
CREv2 provides high SNP and indel concordance in targeted regions
Sample     SNP concordance (Hom)   SNP concordance (Het)   Indel concordance
Sample 1   99.91%                  99.41%                  97.15%
Sample 2   99.91%                  99.29%                  96.63%
Sample 3   99.91%                  99.36%                  96.95%
Sample 4   99.93%                  99.40%                  97.48%
Sample 5   99.95%                  99.48%                  97.19%
Sample 6   99.91%                  99.29%                  96.5%
Sample 7   99.95%                  99.46%                  97%
Sample 8   99.92%                  99.42%                  97.32%
SNP concordance calculated using HapMap data
Indel concordance calculated using dbSNP data
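The slide does not define concordance precisely. A common definition is the fraction of overlapping sites at which a genotype call agrees with a reference data set, split by whether the reference genotype is homozygous or heterozygous. The Python sketch below assumes that definition and a simple in-memory representation of calls; the function names are hypothetical, and how the indel comparison against dbSNP was computed is not stated on the slide.

```python
from typing import Dict, Optional, Tuple

# A site is (chromosome, position); genotypes are VCF-style strings such as "0/1".
Site = Tuple[str, int]


def is_het(genotype: str) -> bool:
    """True when the genotype carries two different alleles (e.g. "0/1" or "0|1")."""
    alleles = genotype.replace("|", "/").split("/")
    return len(set(alleles)) > 1


def genotype_concordance(calls: Dict[Site, str], truth: Dict[Site, str],
                         zygosity: Optional[str] = None) -> float:
    """Fraction of shared sites where the called genotype equals the reference genotype.

    zygosity may be "het" or "hom" to restrict the comparison to sites that are
    heterozygous or homozygous in the reference set; None compares all shared sites.
    """
    if zygosity == "het":
        truth = {site: gt for site, gt in truth.items() if is_het(gt)}
    elif zygosity == "hom":
        truth = {site: gt for site, gt in truth.items() if not is_het(gt)}
    elif zygosity is not None:
        raise ValueError("zygosity must be 'het', 'hom' or None")
    shared = calls.keys() & truth.keys()
    if not shared:
        raise ValueError("no overlapping sites to compare")
    matches = sum(1 for site in shared if calls[site] == truth[site])
    return matches / len(shared)
```

Genotype strings are compared literally, so phased ("0|1") and unphased ("0/1") calls should be normalized before comparison.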
Agilent exomes provide uniform coverage regardless of GC content
[Figure: average coverage (Avg_Cov) and normalized coverage by GC bin (bins 1-10, high to low GC) for the Agilent CREv2 and Vendor ID exomes; panel annotations: Pearson's r = 0.6 (Agilent CREv2), Pearson's r = 0.27 (Vendor ID)]
All exomes sequenced to the same average sequencing depth
Exons were divided into deciles based on GC content to calculate normalized coverage
Smaller deviation from the mean across GC bins in CREv2
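The GC analysis described above can be sketched as follows: rank exons by GC content, split them into ten equal-count bins, and divide each bin's mean coverage by the overall mean, so that perfectly uniform coverage gives 1.0 in every bin. The input format (a list of (GC fraction, mean coverage) pairs), the unweighted mean over exons, and the mean-absolute-deviation summary are assumptions for illustration; the slide does not specify how deviation was scored.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def gc_decile_normalized_coverage(exons: Sequence[Tuple[float, float]],
                                  n_bins: int = 10) -> List[float]:
    """Mean coverage per GC bin divided by the overall mean coverage.

    exons: (gc_fraction, mean_coverage) pairs. Bins are equal-count quantiles of GC,
    returned from lowest to highest GC. Exons are weighted equally (not by length).
    """
    if len(exons) < n_bins:
        raise ValueError("need at least one exon per bin")
    ranked = sorted(exons, key=lambda exon: exon[0])
    overall = mean(cov for _, cov in exons)
    n = len(ranked)
    normalized = []
    for i in range(n_bins):
        chunk = ranked[i * n // n_bins:(i + 1) * n // n_bins]
        normalized.append(mean(cov for _, cov in chunk) / overall)
    return normalized


def deviation_from_mean(normalized: Sequence[float]) -> float:
    """Mean absolute deviation of the bins from 1.0 (perfectly uniform coverage)."""
    return mean(abs(value - 1.0) for value in normalized)
```

Comparing `deviation_from_mean` between two exomes sequenced to the same average depth gives one simple number for the "smaller deviation from the mean" comparison.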
CREv2 provides consistent coverage in high and low GC regions
All exomes sequenced to the same average sequencing depth
[Figure: Agilent CREv2 vs. Vendor ID coverage of exons with 41%, 25%, 29%, and 45% GC]
Low coverage of AT-rich exons in the Vendor ID exome (two examples)
CREv2 provides the most comprehensive coverage of disease-associated regions
Content
Disease association information available with the exome!
• Optimized coverage of disease-associated genes
Plus:
• Coverage of splice sites and deep intronic regions
• Coverage of other non-coding regions associated with disease
Curated in collaboration with Dr. Madhuri Hegde, Emory University
CREv2 provides superior disease-relevant content
[Figure: Agilent CREv2 vs. Vendor ID coverage at the 5' UTR of GJC2]
A pathogenic A>G SNV in the 5' UTR of GJC2, associated with leukodystrophy, was detectable with Agilent CREv2 but not with the Vendor ID exome
Mutations that cause retinitis pigmentosa frequently occur in non-coding regions
[Figure: two examples comparing the Agilent CREv2, Vendor ID, and Vendor R exomes]
CREv2 covers more disease-associated regions
ClinVar pathogenic/likely pathogenic variants covered:
Exome           Leukodystrophy   Retinitis pigmentosa
Agilent CREv2   98.1%            95.3%
Competitor ID   90%              87.9%
Competitor R    90.7%            94.6%
Accelerate the detection of disease-causing mutations with CREv2
[Figure: number of methods and cost ($) to detect leukodystrophy: minimum, average, and maximum]
Average time to detect leukodystrophy = 8 years
Comprehensive coverage of disease-associated regions means:
• Fewer method iterations
• Lower cost
• Faster detection
Source: Richards et al., Neurology
Build your perfect exome with SureSelect customization capability
Flexibility
• Unmatched flexibility in customization of content and formats
• Use existing designs as a base to optimize the exome for your research needs
Copy number changes in rare and inherited disorders
CNVs account for 10-15% of pathogenic variants in rare diseases
10-15% of pathogenic variants associated with rare disease are copy number changes
[Figure: copy number variants in the HGMD database]
Some samples have multiple underlying pathogenic variants
• ~5% of samples have multiple pathogenic variants
• ~12% of samples with dual variants carry a combination of CNVs and SNVs, which are missed if exome sequencing or CNV analysis is performed alone
Detect CNVs, LOH, SNVs & indels in one NGS assay
OneSeq Target Enrichment: One Assay, All Variants
1) Evenly spaced genome-wide baits → copy number & LOH
2) High-density baits in ClinGen disease-associated regions → copy number & LOH
3) User-defined baits in exonic regions (e.g., Gene A, Gene B) → SNVs & indels
OneSeq: Tailored for your needs
Parameter                         OneSeq High Resolution      OneSeq Low Resolution
CNV resolution, genome-wide       300 kb                      1 Mb
CNV resolution, ClinGen regions   25-50 kb                    1 Mb
LOH                               5 Mb                        10 Mb
Region targeted by CNV backbone   12 Mb                       2.7 Mb
Sequencer recommendation          High or medium throughput   High, medium or benchtop
SNVs & indels (both): combine the OneSeq CNV backbone with any SureSelect exome, ClearSeq gene panel or SureSelect custom region
Can OneSeq detect all the important CNVs?
OneSeq 300 kb (25-50 kb resolution in ClinGen regions) and 1 Mb backbones have sufficient resolution to detect most CNVs in the ClinGen database
CNVs in the ClinGen database
1) 4,579 CNVs
2) 93% are > 300 kb
3) 81% are > 1 Mb
[Figure legend: pathogenic, likely pathogenic, likely benign, benign]
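To make the resolution figures concrete, the sketch below checks whether a CNV of a given size meets the nominal resolution of each OneSeq backbone, using the values from the OneSeq comparison table and treating 1 Mb as 1,000 kb. The function names, the use of the upper 50 kb bound for ClinGen regions, and the pure size threshold are simplifying assumptions; real detection also depends on bait placement, depth, and the calling algorithm.

```python
from typing import Iterable, Tuple

# Nominal CNV resolution in kb from the OneSeq table: (genome-wide, ClinGen regions).
# For the high-resolution backbone the upper 50 kb bound of "25-50 kb" is used (assumption).
RESOLUTION_KB = {
    "high": (300, 50),
    "low": (1000, 1000),
}


def meets_resolution(cnv_size_kb: float, in_clingen_region: bool, backbone: str) -> bool:
    """True when a CNV is at least as large as the backbone's nominal resolution."""
    genome_wide_kb, clingen_kb = RESOLUTION_KB[backbone]
    limit_kb = clingen_kb if in_clingen_region else genome_wide_kb
    return cnv_size_kb >= limit_kb


def fraction_meeting_resolution(cnvs: Iterable[Tuple[float, bool]], backbone: str) -> float:
    """Fraction of (size_kb, in_clingen_region) CNVs that meet the nominal resolution."""
    cnv_list = list(cnvs)
    if not cnv_list:
        raise ValueError("no CNVs given")
    hits = sum(meets_resolution(size, in_clingen, backbone) for size, in_clingen in cnv_list)
    return hits / len(cnv_list)


if __name__ == "__main__":
    # Hypothetical CNVs (size in kb, inside a ClinGen region?), for illustration only.
    toy_cnvs = [(400, False), (100, False), (60, True), (2000, False)]
    for backbone in ("high", "low"):
        print(backbone, fraction_meeting_resolution(toy_cnvs, backbone))
```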
OneSeq can reliably detect CNVs identified by microarrays
Chromosome   Aberration type   CGH aberration size (kb)   OneSeq aberration size (kb)   OneSeq avg log2 ratio
chr13        del               12427                      13335                         -0.89
chr15        del               2240                       1667                          -0.37
chr16        del               772                        863                           -0.43
chr14        amp               987                        544                           0.54
chr6         amp               370                        372                           0.61
chr2         del               828                        307                           -0.46
chr17        amp               163                        201                           0.49
chr22        amp               172                        191                           3.00
Detection of 8 CNVs >150 kb with both OneSeq 300 kb and CGH+SNP 4x180K microarrays in Coriell sample NA08254
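The OneSeq avg log2 ratio column compares sample depth with a reference. For a diploid genome the ideal ratio for a CNV is log2(copy number / 2), so a one-copy deletion gives -1 and a one-copy gain about +0.58. The sketch below computes these ideal values, optionally for a CNV present in only a fraction of cells, and applies hypothetical ±0.3 thresholds; the function names and thresholds are assumptions, and this is not the OneSeq calling algorithm.

```python
import math


def expected_log2_ratio(copy_number: int, ploidy: int = 2, cell_fraction: float = 1.0) -> float:
    """Ideal log2(sample depth / reference depth) for a CNV carried by a fraction of cells."""
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must be between 0 and 1")
    average_copies = ploidy + cell_fraction * (copy_number - ploidy)
    if average_copies <= 0:
        return float("-inf")  # complete loss of all copies in every cell
    return math.log2(average_copies / ploidy)


def call_from_log2(ratio: float, loss_threshold: float = -0.3, gain_threshold: float = 0.3) -> str:
    """Naive segment call; the +/-0.3 thresholds are hypothetical, not OneSeq defaults."""
    if ratio <= loss_threshold:
        return "del"
    if ratio >= gain_threshold:
        return "amp"
    return "neutral"


if __name__ == "__main__":
    for cn in (0, 1, 2, 3, 4):
        print(cn, round(expected_log2_ratio(cn), 3))
    # Averaged log2 ratios taken from the table above, run through the naive caller.
    for ratio in (-0.89, -0.37, 0.54, 3.00):
        print(ratio, call_from_log2(ratio))
```

Several observed ratios in the table (e.g., -0.37 for the chr15 deletion) are smaller in magnitude than the ideal -1. Mosaicism is one factor that shrinks the ratio, as the `cell_fraction` parameter shows, but the slide does not state the cause.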
OneSeq can detect intergenic CNVs: duplication upstream of SOX9
[Figure: customer-generated microarray data (4x180K catalog design) compared with OneSeq data]
OneSeq can detect CNVs in non-coding regions
Case published in Vetro et al., EJHG (2014), 1-8, 1018-4813/14
OneSeq can detect uniparental disomy
Detection of uniparental disomy 15 in Coriell sample NA20409
[Figure: OneSeq data showing copy number (log2 ratio) and LOH (B allele frequency), with CGH+SNP microarray confirmation data; a known common CNV is marked]
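B allele frequency (BAF) at SNP positions is near 0.5 for heterozygous sites and near 0 or 1 for homozygous sites, so a region with almost no heterozygous SNPs indicates LOH. Uniparental isodisomy appears as LOH without a copy number change, which is why copy number (log2 ratio) and BAF together can flag UPD. The sketch below encodes that logic for one genomic window; the window representation, function names, and all thresholds are hypothetical.

```python
from typing import Sequence


def het_fraction(bafs: Sequence[float], low: float = 0.2, high: float = 0.8) -> float:
    """Fraction of SNP positions whose B allele frequency looks heterozygous."""
    if not bafs:
        raise ValueError("no B allele frequencies given")
    return sum(1 for b in bafs if low <= b <= high) / len(bafs)


def classify_window(mean_log2: float, bafs: Sequence[float],
                    neutral_band: float = 0.2, max_het_fraction: float = 0.05) -> str:
    """Combine copy number and BAF for one genomic window (hypothetical thresholds)."""
    loh = het_fraction(bafs) <= max_het_fraction
    copy_neutral = abs(mean_log2) <= neutral_band
    if loh and copy_neutral:
        return "copy-neutral LOH"  # pattern expected for uniparental isodisomy
    if loh:
        return "LOH with copy number change"  # e.g. a single-copy deletion
    return "no LOH"
```

Uniparental heterodisomy does not produce LOH, so this logic would not flag it.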
Data analysis is the bottleneck in the NGS workflow
Exome sequencing
FASTQ → BAM → VCF (~20,000 variants) → disease-associated variant
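Narrowing ~20,000 exome variants toward a disease-associated candidate typically starts with simple filters, such as keeping only calls inside regions of interest. The sketch below shows only that first step on plain VCF text; the function name and region list are hypothetical, positions are treated as 1-based inclusive like the VCF POS column, and real pipelines apply many further annotation and filtering steps.

```python
from typing import Iterable, List, Tuple

# (chromosome, start, end) with 1-based, inclusive coordinates, matching the VCF POS column.
# BED files use 0-based, half-open coordinates and would need start + 1 before use here.
Region = Tuple[str, int, int]


def variants_in_regions(vcf_lines: Iterable[str], regions: List[Region]) -> List[str]:
    """Keep VCF data lines whose CHROM/POS fall inside any region of interest."""
    kept = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, "##" meta-information and the "#CHROM" header
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if any(chrom == c and start <= pos <= end for c, start, end in regions):
            kept.append(line.rstrip("\n"))
    return kept


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.",
        "chr1\t500\t.\tC\tT\t50\tPASS\t.",
        "chr2\t42\t.\tG\tA\t50\tPASS\t.",
    ]
    # Hypothetical regions of interest, for illustration only.
    regions = [("chr1", 50, 150), ("chr2", 40, 45)]
    for variant in variants_in_regions(vcf, regions):
        print(variant)
```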
NGS workflow needs substantial compute infrastructure
Sample prep → Library prep → Target enrichment → Sequencing → Data analysis
• Compute intensive
• Disparate tools
• Time-consuming
Alissa software platform – from raw data to answer
Make your work flow with Agilent Alissa Clinical Informatics for NGS
• A single platform from raw reads to draft lab reports
• Comprehensive QC metrics at your fingertips
• A team of experts to support you along the way
Agilent NGS solutions for rare and inherited disorders
Sample QC