Structural Variation Detection
Table of contents
• Detection of structural DNA variation from next generation sequencing data: a review of informatic approaches
• The software pipeline digit
Detection of structural DNA variation from next generation sequencing data: a review of informatic approaches
Authors: Haley J. Abel (1), Eric J. Duncavage (2)
(1) Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
(2) Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
Definition
Structural DNA variation is generally defined as variation in a DNA region larger than 1 kb and includes several classes such as translocations, inversions, insertions/deletions and copy number variations (CNVs).
Methods
• Cytogenetics: unbiased, BUT limited resolution/sensitivity (350-500 band level)
• FISH (fluorescence in situ hybridization): increased resolution, ability to test fixed interphase cells, faster turnaround time, greater sensitivity, BUT evaluation of multiple loci requires multiple probes/assays ⇒ increasing complexity
• Microarrays: especially reliable for CNV and loss of heterozygosity, BUT unable to detect balanced translocations
• Next Generation Sequencing: ability to detect the full range of genetic variation ⇒ potential to streamline testing by using a single analysis platform, BUT dependent on coverage ⇒ susceptible to GC bias
NGS - Methods
• Depth of coverage analysis
• Discordant read pair analysis
• Split read analysis
Tools
Translocation and Inversion Detection
Discordant pair analysis:
• Sensitive, but low breakpoint resolution and low specificity
• Repetitive regions are a source of false positives, but they also drive real translocations, which makes true events difficult to separate from false positives
• Many methods use heuristic cutoffs to improve specificity:
  • VariationHunter and Hydra consider multiple high-scoring mappings if available
  • GASVPro tries to improve specificity by combining discordant pair and coverage analysis
Split read analysis: excellent breakpoint resolution (up to single-base resolution), but requires much higher coverage.
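To make the idea of a discordant pair concrete, here is a minimal sketch of how a single read pair might be labeled. It assumes a standard paired-end library with forward/reverse orientation (mate-pair libraries have a different expected orientation), and the `ReadPair` type, the `classify_pair` function and the insert-size bounds are hypothetical names and values chosen for illustration, not taken from any of the tools above.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify_pair(pair, min_insert=200, max_insert=600):
    """Label a read pair as concordant or name the discordance it shows.

    Assumption: forward/reverse paired-end orientation; the insert-size
    bounds are illustrative and would normally come from the library.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"  # candidate translocation
    # Order the two reads by mapping position.
    (lpos, lstrand), (rpos, rstrand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if lstrand == rstrand:
        return "same_strand"  # candidate inversion
    if (lstrand, rstrand) == ("-", "+"):
        return "everted"  # candidate tandem duplication
    span = rpos - lpos
    if span > max_insert:
        return "span_too_large"  # candidate deletion
    if span < min_insert:
        return "span_too_small"  # candidate insertion
    return "concordant"
```

A single discordant pair is weak evidence, which is one reason the specificity of this approach is low; callers typically require several pairs supporting the same event.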
Copy Number Variation Detection
Discordant pair analysis:
• Performs best on large deletions; struggles with duplications
• Cannot detect large insertions with the usual strategy because pairs do not span the duplication
• Pindel pieces translocation calls together via a pattern growth algorithm to find large insertions
Depth of coverage analysis:
• DNA
  • The main problem is accounting for factors that modify read depth, such as GC bias
  • Event-wise testing (EWT) algorithms rely purely on deviations in coverage from the sample's mean depth. GC content is addressed by analysing the genome bin-wise.
  • SegSeq, CNVnator, CNAseg and CNV-seq compare the same region across multiple samples (control samples). These methods also make use of bins/partitions and rely on coverage ratios, which permit finer CNV mapping.
• Exome
  • Target-capture data increases GC bias
  • The small size of targets makes paired normals or population controls a requirement
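The bin-wise handling of GC bias and the use of coverage ratios against a control can be sketched as follows. This is an illustrative simplification, not the algorithm of any tool named above: the function names, the GC stratum width and the pseudocount are assumptions.

```python
import math
from collections import defaultdict
from statistics import median


def gc_normalize(counts, gc_fractions, gc_step=0.05):
    """Rescale per-bin read counts so bins with similar GC content share a median.

    Assumption: GC strata of width `gc_step`; real tools use their own schemes.
    """
    strata = defaultdict(list)
    for count, gc in zip(counts, gc_fractions):
        strata[int(gc / gc_step)].append(count)
    stratum_median = {key: median(values) for key, values in strata.items()}
    global_median = median(counts)
    normalized = []
    for count, gc in zip(counts, gc_fractions):
        m = stratum_median[int(gc / gc_step)]
        normalized.append(count * global_median / m if m > 0 else 0.0)
    return normalized


def log2_ratios(sample, control, pseudocount=1.0):
    """Per-bin log2 coverage ratio of a sample against a control."""
    return [
        math.log2((s + pseudocount) / (c + pseudocount))
        for s, c in zip(sample, control)
    ]
```

With this convention, a bin with a log2 ratio near 0 looks copy-neutral, while clearly negative or positive values suggest a loss or gain; deciding what counts as "clearly" is where the individual methods differ.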
• Exome methods call local CNVs first and then merge them using various strategies
  • CONTRA: uses circular binary segmentation for merging
  • CoNVEX: denoises coverage ratios with a discrete wavelet transform and then uses a Hidden Markov Model to identify gains and losses
  • ExomeCNV: models B-allele frequencies to detect loss of heterozygosity
• Some methods try to find sporadic CNVs in population exome data by normalizing read counts with principal component analysis
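The "call locally, then merge" pattern can be illustrated with a deliberately naive merge that joins runs of adjacent targets carrying the same call. This is not circular binary segmentation, a wavelet transform or an HMM; it only shows the shape of the merging step, and the function name and state labels are hypothetical.

```python
def merge_calls(targets):
    """Merge adjacent per-target calls with the same state into segments.

    targets: list of (chrom, start, end, state) sorted by chromosome and
    start, where state is "gain", "loss" or "neutral" (labels assumed here).
    Returns the merged non-neutral segments.
    """
    segments = []
    current = None
    for chrom, start, end, state in targets:
        if current and current[0] == chrom and current[3] == state:
            current[2] = end  # extend the running segment
        else:
            if current and current[3] != "neutral":
                segments.append(tuple(current))
            current = [chrom, start, end, state]
    if current and current[3] != "neutral":
        segments.append(tuple(current))
    return segments
```

A run-based merge like this breaks a segment at every noisy target, which is the kind of weakness that segmentation and denoising approaches are designed to avoid.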
Insertion and Deletion Detection
• Alignment based:
  • Offered by many packages: SAMtools, GATK, VarScan
  • Usually rely on probabilistic models to make indel calls
  • Dindel and Stampy rely on these methods but employ filters to differentiate common errors from true indels
  • All of these methods require considerable validation
  • Insertion detection is limited to 15% of total read length
• Split read based:
  • Suitable for medium-sized indels
  • High false-positive rate, because no probabilistic models discriminate between alignment errors and true events
Conclusion
• There is currently no single informatic method capable of identifying the full range of structural DNA variation.
• Multiple complementary tools are required for robust variant detection.
• Since methods can perform differently based on assay design, extensive validation is required for clinical use.
digit - A tool for detection and identification of genomic inter-chromosomal translocations
Authors: Richard Meier (1,4), Stefan Graw (1,4), Julian R Molina (3), Peter Beyerlein (1), Devin Koestler (2), Jeremy Chien (4)
(1) Technical University of Applied Sciences Wildau, 15745 Wildau, Germany
(2) Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS 66160
(3) Department of Medical Oncology, Mayo Clinic, Rochester, MN 55905
(4) Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS 66160
Goals of the project
• Interchromosomal translocation detection utilizing mate-pair sequencing data
• Handle artifacts and robustly remove false positive calls
• Investigate translocation profiles of populations / trait-associated groups
Mate-pair sequencing
[Figure: mate-pair library preparation. Labels: genome / chromosome, template, fragmentation, adapter ligation, circularisation, fragmentation, terminal fragment, sequencing, read1, read2.]
digit overview
[Figure: digit workflow. Inset: density plot of MVM values for concordant and discordant pairs, with a threshold separating rejected from approved pairs. Further labels: chromosome_1, chromosome_2, read_1, read_2, cluster_A, cluster_B, sample_1, sample_4, sample_5, sample_9, discordant read pair cluster, group associated super cluster.]
Workflow steps shown in the figure:
• Preprocessed read pairs
• Retain discordantly mapping read pairs
• Find read pair clusters
• Calculate MVMs for each pair and filter out low value pairs
• Recluster remaining read pairs
• Compare samples and search for group associations
• Called translocations, for example:
  chr14:1573290-158941 & chr22:2732247-2735312
  chr2:11002738-11002738 & chr3:3763766-3766175
  chr11:1573290-158941 & chr17:1147275-11149839
  chr5:25819112-25821940 & chr9:5151006-5154147
  ...
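The "find read pair clusters" step could look roughly like the greedy sketch below. The slides do not describe the clustering procedure digit actually uses, so the seed-based grouping, the `window` and `min_support` parameters and both function names are assumptions made for illustration only.

```python
def cluster_pairs(pairs, window=2000, min_support=2):
    """Greedily group discordant pairs whose two ends map close together.

    pairs: iterable of (chrom_a, pos_a, chrom_b, pos_b).
    A pair joins the first cluster whose seed pair lies on the same two
    chromosomes with both ends within `window` bases (assumed value).
    Clusters with fewer than `min_support` pairs are dropped.
    """
    clusters = []
    for pair in sorted(pairs):
        chrom_a, pos_a, chrom_b, pos_b = pair
        for cluster in clusters:
            seed = cluster[0]
            if (
                (seed[0], seed[2]) == (chrom_a, chrom_b)
                and abs(pos_a - seed[1]) <= window
                and abs(pos_b - seed[3]) <= window
            ):
                cluster.append(pair)
                break
        else:
            clusters.append([pair])
    return [c for c in clusters if len(c) >= min_support]


def describe(cluster):
    """Format a cluster like the called translocations listed in the figure."""
    a = [p[1] for p in cluster]
    b = [p[3] for p in cluster]
    return f"{cluster[0][0]}:{min(a)}-{max(a)} & {cluster[0][2]}:{min(b)}-{max(b)}"
```

Running the same function again after low-MVM pairs have been filtered out would correspond to the reclustering step in the workflow.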
Mapping validity measure (MVM)
[Figure: example read pair with nucleotide sequences. Labels: chromosome A, chromosome B, 2kb, "mapper assigns read to region" (once per read).]
• The two reads of a read pair are remapped to both regions that the mapping software originally assigned them to.
• If a read maps equally well to both regions, it is impossible to resolve the read pair's origin and the pair is rejected.
• The MVM judges how ambiguous the mappability of a read pair is.
• The MVM distribution of concordant (well-behaved) read pairs in a sample is used as an internal standard to determine a filtering threshold.
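The slides do not state how the MVM is computed or how the threshold is derived from the concordant distribution. As a sketch of the "internal standard" idea only, the code below assumes that MVM values are already available and, purely as an assumption, takes a low quantile of the concordant values as the cutoff; `mvm_threshold`, `filter_pairs` and the `quantile` parameter are hypothetical.

```python
def mvm_threshold(concordant_mvms, quantile=0.05):
    """Derive a cutoff from the MVM values of a sample's concordant pairs.

    Assumption: a low empirical quantile is used; the actual rule in
    digit is not given in the slides.
    """
    ordered = sorted(concordant_mvms)
    index = int(quantile * (len(ordered) - 1))
    return ordered[index]


def filter_pairs(discordant, threshold):
    """Keep the ids of discordant pairs whose MVM reaches the threshold.

    discordant: list of (pair_id, mvm). Low values are treated as
    ambiguous and rejected, matching "filter out low value pairs".
    """
    return [pair_id for pair_id, mvm in discordant if mvm >= threshold]
```

Because the threshold comes from each sample's own concordant pairs, it adapts to sample-specific differences in read quality without needing an external reference.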
Simulated data
Real data
Samples achieved a good separation between ambiguous and distinct read pairs via MVM thresholds across the board.
[Figure: density plots of MVM values (x-axis 1.0 to 2.5, y-axis density) for concordant and discordant read pairs with the filtering threshold, one panel per sample: LU526 (N = 749, bandwidth = 0.02034), LU748 (N = 461, bandwidth = 0.04017), LU271 (N = 641, bandwidth = 0.01287), LU820 (N = 534, bandwidth = 0.02189), LU1160 (N = 268, bandwidth = 0.05798), LU1184 (N = 370, bandwidth = 0.04009), LU1434 (N = 391, bandwidth = 0.02477), LU1466 (N = 585, bandwidth = 0.01317).]
• We processed 20 patient samples from a non-cancer background and 35 patient samples with a lung cancer background.
• After comparing the two populations, we retrieved 218 sample-specific events, 160 of which were from cancer.
• 328 translocation calls were shared between 2 or more samples.
• 16 translocations were shared between cancer samples exclusively.
• 13 translocations shared between cancer and normal samples were labeled potentially disease relevant.
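The categories in the bullets above (sample-specific, shared only among cancer samples, shared across both groups) can be derived with simple set operations once calls from different samples have been matched. The sketch assumes that matching calls already carry identical keys, which in practice requires comparing genomic regions; `compare_groups` and the category names are hypothetical, and the criterion used to label shared calls as potentially disease relevant is not given in the slides and is not modeled here.

```python
def compare_groups(calls_by_sample, cancer_samples):
    """Sort translocation calls into categories by which samples carry them.

    calls_by_sample: dict mapping sample name to a set of call keys.
    cancer_samples: set of sample names with a cancer background.
    Assumption: equal keys mean the same translocation.
    """
    carriers = {}
    for sample, calls in calls_by_sample.items():
        for call in calls:
            carriers.setdefault(call, set()).add(sample)

    summary = {
        "sample_specific": [],
        "cancer_only_shared": [],
        "normal_only_shared": [],
        "shared_mixed": [],
    }
    for call, samples in sorted(carriers.items()):
        if len(samples) == 1:
            summary["sample_specific"].append(call)
        elif samples <= cancer_samples:
            summary["cancer_only_shared"].append(call)
        elif samples.isdisjoint(cancer_samples):
            summary["normal_only_shared"].append(call)
        else:
            summary["shared_mixed"].append(call)
    return summary
```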
Translocations exclusively found in cancer
Translocations enriched in cancer
Conclusion
• The method successfully reduces the false positive rate.
• Group comparison and population analysis are working, but will require more samples to make reliable judgements in the future.
• Comparisons with other tools are running as we speak.
• Combining strategies from different tools might be valuable to look into in future projects.
Questions?