Variant Quality Score Recalibration -...
TRANSCRIPT
![Page 1: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/1.jpg)
Variant Quality Score Recalibration
Assigning accurate confidence scores to each putative mutation call
![Page 2: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/2.jpg)
You are here in the GATK Best Practices workflow for germline variant discovery
[Figure: GATK Best Practices workflow in three phases; non-GATK steps are marked in the diagram.
Data Pre-processing: Raw Reads ➔ Map to Reference (BWA mem) ➔ Mark Duplicates & Sort (Picard) ➔ Indel Realignment ➔ Base Recalibration ➔ Analysis-Ready Reads.
Variant Discovery: Analysis-Ready Reads ➔ Var. Calling (HC in ERC mode) ➔ Genotype Likelihoods ➔ Joint Genotyping ➔ Raw Variants (SNPs, Indels) ➔ Variant Recalibration (separately per variant type) ➔ Analysis-Ready Variants (SNPs, Indels).
Callset Refinement: Analysis-Ready Variants (SNPs & Indels) ➔ Genotype Refinement ➔ Variant Annotation ➔ Variant Evaluation ➔ look good? ➔ use in project / troubleshoot.]
![Page 3: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/3.jpg)
Raw, high-sensitivity callsets contain many false positives
• Mutation calling algorithms are very permissive by design
• How to filter?
  – Hand-tuned hard-filtering requires time and expertise
  – Better to learn what the filters should be from the data itself
• Must enable analysts to trade off sensitivity and specificity depending on project goals
✓ Building a model of what true genetic variation looks like will allow us to rank-order variants based on their likelihood of being real
![Page 4: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/4.jpg)
From annotations to mixture models
• Each variant has a diverse set of statistics associated with it.
• These annotations tend to form Gaussian clusters
• We can fit a "Gaussian mixture model" to the annotations of known variants in our dataset.
• Any new variant can be scored by evaluating the associated annotations in this model.
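To make the idea of fitting and scoring concrete, here is a minimal sketch of a Gaussian mixture fitted by plain EM on a single annotation. It is deliberately simplified and is not how VQSR is implemented: the real tool models several annotations jointly, and the QD values, starting means and function names below are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(values, initial_means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, one component per initial mean."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: how strongly each component "claims" each data point.
        resp = []
        for x in values:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, 1e-3)  # floor keeps a component from collapsing
    return weights, means, variances

def log_likelihood(x, model):
    """Log density of one annotation value under the fitted mixture."""
    weights, means, variances = model
    density = sum(w * gaussian_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances))
    return math.log(max(density, 1e-300))  # guard against log(0) far from all clusters

# Hypothetical QD values for known variants, forming two clusters.
known_qd = [11.0, 12.5, 13.0, 11.8, 12.2, 24.0, 25.5, 26.0, 25.1, 24.6]
model = fit_gmm_1d(known_qd, initial_means=[10.0, 28.0])

# A new variant near a cluster scores higher than one between the clusters.
near_cluster = log_likelihood(12.0, model)
between_clusters = log_likelihood(18.0, model)
```

A new variant whose annotation falls inside one of the learned clusters gets a high log-likelihood; one that falls between or far from the clusters gets a low one.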
![Page 5: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/5.jpg)
Variant annotations are the "features" of the model
VCF record for an A/G SNP at 22:49582364:
22 49582364 . A G 198.96 . AC=3;AF=0.50;AN=6;DP=87;MLEAC=3;MLEAF=0.50;MQ=71.31;MQ0=22;QD=2.29;SB=-31.76 GT:DP:GQ 0/1:12:99 0/1:11:89 0/1:28:37
INFO field annotations:
AC      No. chromosomes carrying alt allele
AN      Total no. of chromosomes
AF      Allele frequency
DP      Depth of coverage
MLEAC   Max likelihood AC
MLEAF   Max likelihood AF
MQ      RMS MAPQ of all reads
MQ0     No. of MAPQ 0 reads at locus
QD      QUAL score over depth
Note that VQSR will only look at INFO annotations; sample-level annotations (genotype, AD etc) are not used.
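As an illustration of what "only INFO annotations" means in practice, the sketch below pulls the INFO column out of the record shown on this slide and turns a few annotations into a numeric feature vector. The helper name `parse_info` and the choice of QD, MQ and DP as features are assumptions made for the example.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict of annotation name -> value."""
    annotations = {}
    for entry in info.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        key, _, value = entry.partition("=")
        # Flag-style annotations carry no value; record them as True.
        annotations[key] = value if value else True
    return annotations

# The record from the slide, written tab-separated as in a real VCF.
record = "\t".join([
    "22", "49582364", ".", "A", "G", "198.96", ".",
    "AC=3;AF=0.50;AN=6;DP=87;MLEAC=3;MLEAF=0.50;MQ=71.31;MQ0=22;QD=2.29;SB=-31.76",
    "GT:DP:GQ", "0/1:12:99", "0/1:11:89", "0/1:28:37",
])

columns = record.split("\t")
info = parse_info(columns[7])        # column 8 of a VCF line is INFO
feature_names = ["QD", "MQ", "DP"]   # hypothetical choice of annotations
features = [float(info[name]) for name in feature_names]
```

The per-sample columns (`GT:DP:GQ` and the genotype fields after it) are never read, which mirrors the note above.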
![Page 6: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/6.jpg)
Two steps: (1) train a model then (2) apply to callset
Modified from DePristo et al. Nature Genetics. 2011
(1) Train model using HapMap (2) Apply model to callset
Basic idea: training on high-confidence known sites to determine the probability that other sites are true
![Page 7: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/7.jpg)
(1) Training the model
(1) Train model using e.g. HapMap
• We choose a training set
• Variants that are both in the training set and in our callset are selected.
• We train the model using the annotations of the selected variants
• This tells us what good variants look like
• A similar model for the variants in our callset that least look like good variants is also created (bad model, no biscuit!)
• All variants can now be ranked based on the ratio between their scores in the good model and the bad model (= VQSLOD)
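The selection steps above can be sketched as two small set operations. This is an illustration only: the sites (apart from the one shown in the earlier VCF record), the annotation values, the scores and the `fraction` parameter are hypothetical, and the real tool decides how many poorly scoring variants go into the bad model in its own way.

```python
def select_positive_training(callset, training_sites):
    """Keep callset variants that also appear in the training resource."""
    return {site: ann for site, ann in callset.items() if site in training_sites}

def select_negative_training(positive_scores, fraction=0.25):
    """Pick the sites that look least like good variants (lowest scores)."""
    n_worst = max(1, int(len(positive_scores) * fraction))
    ranked = sorted(positive_scores, key=positive_scores.get)  # worst first
    return ranked[:n_worst]

# Callset keyed by (chrom, pos, ref, alt); annotation values are made up.
callset = {
    ("22", 49582364, "A", "G"): {"QD": 2.29, "MQ": 71.31},
    ("22", 49600112, "C", "T"): {"QD": 18.4, "MQ": 59.8},
    ("22", 49611870, "G", "A"): {"QD": 21.0, "MQ": 60.0},
    ("22", 49650003, "T", "G"): {"QD": 1.1, "MQ": 31.2},
}
# Hypothetical training resource; the last site is absent from our callset.
training_sites = {
    ("22", 49582364, "A", "G"),
    ("22", 49611870, "G", "A"),
    ("22", 50000000, "T", "C"),
}

positive_training = select_positive_training(callset, training_sites)

# Hypothetical scores of every callset variant under the good model.
positive_scores = {
    ("22", 49582364, "A", "G"): -3.0,
    ("22", 49600112, "C", "T"): -1.2,
    ("22", 49611870, "G", "A"): -0.8,
    ("22", 49650003, "T", "G"): -9.5,
}
negative_training = select_negative_training(positive_scores)
```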
![Page 8: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/8.jpg)
(2) Applying the model to our callset
(2) Apply model to callset
• Using the ranking produced by the model, filtering variants is as easy as setting a single threshold value
• Any variant whose score falls below the threshold is filtered out
But how do we set that threshold?
![Page 9: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/9.jpg)
There are in fact two components to the model
• A negative model is also built during training
• It represents the probability of variants to be false positives
[Figure labels: Negative Model, Positive Model]
![Page 10: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/10.jpg)
The VQSLOD threshold is a tradeoff between TP and FP
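[Figure: overlapping Negative Model (q) and Positive Model (p) curves with a threshold. Variants above the threshold are considered positive, those below are considered negative; the area above the threshold splits into true positives and false positives.]
VQSLOD(x) = log(p(x) / q(x)), where p is the positive model and q is the negative model
(VQSLOD is distinct from QUAL!)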
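A small worked example of the log-odds formula may help. It assumes each model is a single 1-D Gaussian over one annotation and uses the natural logarithm; both are simplifications chosen for the sketch, and the model parameters and threshold are hypothetical.

```python
import math

def gaussian_log_pdf(x, mean, sd):
    """Log density of a 1-D normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def vqslod(x, positive, negative):
    """log(p(x) / q(x)), computed as log p(x) - log q(x) for stability."""
    return gaussian_log_pdf(x, *positive) - gaussian_log_pdf(x, *negative)

# Hypothetical (mean, sd) of the QD annotation under each model.
positive_model = (20.0, 5.0)   # what good variants look like
negative_model = (5.0, 5.0)    # what bad variants look like

score_good = vqslod(20.0, positive_model, negative_model)   # well inside p
score_bad = vqslod(5.0, positive_model, negative_model)     # well inside q

# Filtering is then a single comparison against a threshold (value assumed).
threshold = 0.0
kept = [qd for qd in (20.0, 5.0, 15.0)
        if vqslod(qd, positive_model, negative_model) >= threshold]
```

With these two equal-width Gaussians the score works out to 0.6·x − 7.5, so a variant sitting at the centre of the positive model scores +4.5 and one at the centre of the negative model scores −4.5.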
![Page 11: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/11.jpg)
Role of training and truth resources
Calibrating Sensitivity
Truth set is used for translating VQSLOD values into sensitivity "tranches"
Training set is used for building the model
![Page 12: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/12.jpg)
We set the threshold based on sensitivity to truth data
[Figure: Calibrating Sensitivity, showing the density of sites in the truth set]
What threshold do we need to set to capture X% of the sites in the truth set?
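The question on this slide can be answered by ranking the truth-set sites by score and reading off the score of the last site that must be kept. The sketch below does that with made-up VQSLOD values; the function name and site labels are hypothetical.

```python
import math

def threshold_for_sensitivity(truth_scores, target):
    """VQSLOD cutoff that keeps at least `target` (0-1) of the truth sites."""
    ranked = sorted(truth_scores, reverse=True)
    # Number of truth sites that must stay at or above the cutoff; the small
    # epsilon absorbs floating-point fuzz in target * n.
    n_keep = max(1, math.ceil(target * len(ranked) - 1e-9))
    return ranked[n_keep - 1]

# Hypothetical VQSLOD scores of callset variants that are in the truth set.
truth_vqslod = [8.1, 6.4, 5.9, 5.2, 4.8, 3.7, 2.9, 1.5, 0.3, -2.6]

cutoff_90 = threshold_for_sensitivity(truth_vqslod, 0.90)
cutoff_100 = threshold_for_sensitivity(truth_vqslod, 1.00)

# Apply the 90% cutoff to every variant in the callset (scores made up).
callset_vqslod = {"siteA": 7.0, "siteB": 0.1, "siteC": -5.0}
passing = [site for site, score in callset_vqslod.items() if score >= cutoff_90]
```

Raising the target sensitivity lowers the cutoff, which lets in more true variants and more false positives alike.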
![Page 13: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/13.jpg)
Variant Recalibration steps & tools
• Build and Apply the models (from resources and callset) ➔ VariantRecalibrator
• Use VQSLOD to filter variants and write a new annotated VCF ➔ ApplyRecalibration
[Figure: multi-panel plot (panels A to E). Tranche plots for HiSeq, Exome and Low-pass data with x-axis "Number of Novel Variants (1000s)", per-tranche Ti/Tv and FDR values, and bars for cumulative TPs, tranche-specific TPs, tranche-specific FPs and cumulative FPs per analysis tranche. Further panels: "HiSeq: training on HapMap" and "HiSeq: evaluating novel variants" (Gaussian mixture model fits, More Bias / Less Bias, likely dbSNP errors), and NR sensitivity (%) for HM3 and 1KG Trio, split into heterozygous and homozygous variants, with imputation and NGS only.]
![Page 14: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/14.jpg)
NOTE: SNPs and Indels must be recalibrated separately!
Pro-tip: Run VQSR twice in succession according to this workflow. That way you avoid having to split them, recalibrate and combine them again.
Original SNPs + original Indels
➔ VariantRecalibrator + ApplyRecalibration (first pass in SNP mode, Indels will be left untouched)
➔ Recal SNPs + original Indels
➔ VariantRecalibrator + ApplyRecalibration (second pass in INDEL mode, SNPs will be left untouched)
➔ Recal SNPs + Recal Indels
![Page 15: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/15.jpg)
Variant Recalibration workflow
Original VCF file + Resources ➔ VariantRecalibrator ➔ Recalibration file, Tranches file + recalibration plots ➔ ApplyRecalibration ➔ Recalibrated VCF file
![Page 16: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/16.jpg)
![Page 17: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/17.jpg)
VariantRecalibrator
• Build the Gaussian mixture model using the variants in the input callset which overlap the training data
TOOL TIPS
java -jar GenomeAnalysisTK.jar -T VariantRecalibrator \
    -R human.fasta \
    -input raw.SNPs.vcf \
    -resource:{see next slide} \
    -an DP -an QD -an FS -an MQRankSum {…} \
    -mode SNP \
    -recalFile raw.SNPs.recal \
    -tranchesFile raw.SNPs.tranches \
    -rscriptFile recal.plots.R
SNP example – see documentation for indel recommendations
![Page 18: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/18.jpg)
VQSR resources
TOOL TIPS
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf
-resource:omni,known=false,training=true,truth=false,prior=12.0 omni2.5.b37.sites.vcf
-resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G.b37.sites.vcf
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_137.b37.vcf
SNP example – see documentation for indel recommendations
![Page 19: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/19.jpg)
Understanding "resources"
TOOL TIPS
• Prior – Phred-scaled estimate of data accuracy
• Training – use input variants that overlap with these training sites to build the model
• Truth – use these truth sites to determine where to set the cutoff in VQSLOD sensitivity.
• Known – only for reporting purposes, not used in any calculations
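Because the prior is Phred-scaled, it can be converted to a plain probability with the standard Phred relation Q = -10·log10(error). The sketch below applies that conversion to the prior values from the resource lines of the SNP example; reading the result as "probability that a site in the resource is correct" is the usual interpretation of a Phred-scaled accuracy and is stated here as an assumption.

```python
def phred_to_probability(q):
    """Convert a Phred-scaled accuracy Q into a probability of being correct."""
    return 1.0 - 10 ** (-q / 10.0)

# Priors taken from the -resource lines of the SNP example.
priors = {"hapmap": 15.0, "omni": 12.0, "1000G": 10.0, "dbsnp": 2.0}

accuracy = {name: round(phred_to_probability(q), 4) for name, q in priors.items()}
# accuracy == {"hapmap": 0.9684, "omni": 0.9369, "1000G": 0.9, "dbsnp": 0.369}
```

The ordering reflects how much each resource is trusted: HapMap at about 97%, down to dbSNP at about 37%.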
![Page 20: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/20.jpg)
![Page 21: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/21.jpg)
Recalibration plots show aspects of the model (for each possible pair of annotations)
![Page 22: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/22.jpg)
Tranches are slices of the data corresponding to pre-set sensitivity threshold values
[Figure: Calibrating Sensitivity; axis: Truth sensitivity (%), marked at 90, 99, 99.9 and 100]
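One way to picture tranches is as a lookup from a variant's VQSLOD to the tightest sensitivity slice it falls into. The sketch below uses the four sensitivity levels from the figure; the VQSLOD cutoffs attached to them are invented for illustration, and the function name is hypothetical.

```python
def assign_tranche(vqslod, cutoffs):
    """Return the lowest truth-sensitivity tranche (%) whose cutoff is met.

    `cutoffs` is a list of (truth_sensitivity_percent, min_vqslod) pairs,
    ordered from the strictest tranche to the most permissive one.
    """
    for sensitivity, min_score in cutoffs:
        if vqslod >= min_score:
            return sensitivity
    return None  # below even the most permissive tranche

# Sensitivity levels from the figure; VQSLOD cutoffs are made up.
cutoffs = [(90.0, 5.0), (99.0, 1.2), (99.9, -1.5), (100.0, -40.0)]

tranches = [assign_tranche(score, cutoffs) for score in (6.0, 2.0, -1.0, -10.0, -50.0)]

# Choosing a filter level of 99.0 keeps every variant in the 99.0 tranche or better.
filter_level = 99.0
passes = [t is not None and t <= filter_level for t in tranches]
```

Picking a filter level is then just choosing which slices to keep, which is the sensitivity/specificity tradeoff expressed in tranche terms.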
![Page 23: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/23.jpg)
![Page 24: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/24.jpg)
ApplyRecalibration
• Executes the desired sensitivity/specificity tradeoff by applying filters to the input callset
• Creates a new, filtered, analysis-worthy VCF file.
• Additionally every variant is now annotated with its VQSLOD score.
TOOL TIPS
java -jar GenomeAnalysisTK.jar -T ApplyRecalibration \
    -R human.fasta \
    -input raw.vcf \
    -mode SNP \
    -recalFile raw.SNPs.recal \
    -tranchesFile raw.SNPs.tranches \
    -o recal.SNPs.vcf \
    -ts_filter_level 99.0
SNP example – see documentation for indel recommendations
![Page 25: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/25.jpg)
![Page 26: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/26.jpg)
![Page 27: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/27.jpg)
VQSR output VCF (vs. Hard Filter)
• Before VQSR (input vcf):
• After VQSR (output vcf):
• Hard filtering vcf:
![Page 28: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/28.jpg)
Did the recalibration work properly?
• Common error modes:
  – "No data found" -> too few variants in callset, see docs for workarounds
  – "Annotation X not found for any variant" -> annotation not present in file, use VariantAnnotator
     -> if related to InbreedingCoefficient: probably fewer than 10 samples, cannot use this annotation
  – Screwed-up tranche plots, low novel TiTv -> wrong dbSNP version, use the one in the bundle
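Since a low novel Ti/Tv is one of the symptoms listed above, it helps to recall how the ratio is computed: transitions are purine-to-purine or pyrimidine-to-pyrimidine changes (A<->G, C<->T), and every other SNP is a transversion. The sketch below counts both over a hypothetical list of (ref, alt) pairs; the function name is an assumption.

```python
# Transitions keep the base class (purine <-> purine, pyrimidine <-> pyrimidine).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def ti_tv_ratio(snps):
    """Transition/transversion ratio for a list of (ref, alt) SNP alleles."""
    ti = sum(1 for ref, alt in snps if (ref, alt) in TRANSITIONS)
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")

# Hypothetical novel SNPs: four transitions and two transversions.
novel_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = ti_tv_ratio(novel_snps)
```

Errors behave like random substitutions, where transversions outnumber transitions two to one, so a set of novel calls contaminated with false positives drifts toward a ratio of 0.5.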
![Page 29: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/29.jpg)
NOTES
![Page 30: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/30.jpg)
Variant Recalibration (VQSR) on WEx data
• Smaller number of variants per sample compared to WGS -> typically insufficient to build a robust recalibration model if running on only a few samples
  ➔ Analyze samples jointly in cohorts of at least 30
  ➔ If necessary, add exomes from 1000G Project
• What to look for in samples for padding a cohort:
  – Similar technical generation (technology, capture, read length, depth)
  – Similar ethnic background
![Page 31: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/31.jpg)
How to add exomes from 1000G to your analysis
• ALWAYS do this (✓): BAMs + 1000G BAMs ➔ HC ➔ per-sample GVCFs ➔ GenotypeGVCFs ➔ Raw VCF ➔ VQSR ➔ Filtered VCF
• NEVER do this (✗): BAMs ➔ HC ➔ GenotypeGVCFs ➔ Raw VCF + 1000G VCF ➔ VQSR ➔ Filtered VCF
![Page 32: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/32.jpg)
When should you NOT run VQSR?
• Non-human organisms where known resources are unavailable or insufficiently curated
• RNAseq data ➔ see RNAseq-specific filtering
• Cohort is too small and no other samples are available for "padding" the cohort
➔ Use manual filtering recommendations instead
![Page 33: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/33.jpg)
You are here in the GATK Best Practices workflow for germline variant discovery
[Figure: the GATK Best Practices workflow diagram (Data Pre-processing >> Variant Discovery >> Callset Refinement), repeated from the start of the deck.]
![Page 34: Variant Quality Score Recalibraon - UCLAqcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr... · 2017. 2. 24. · MLEAC Max likelihood AC Note that VQSR will only look at INFO](https://reader033.vdocuments.us/reader033/viewer/2022060910/60a55d1b943b6452e92a3ac2/html5/thumbnails/34.jpg)
Further reading
http://www.broadinstitute.org/gatk/guide/best-practices
http://www.broadinstitute.org/gatk/guide/article?id=39
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantrecalibration_VariantRecalibrator.html
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantrecalibration_ApplyRecalibration.html