Variant Analysis Introduction
Deanna M. Church, Staff Scientist, NCBI
@deannachurch
Short Course in Medical Genetics 2013
[Figure: data size in gigabytes (log scale, 1 to 100,000) by component: sequences, alignments, genotype likelihoods, individual variants]
1092 genomes (low coverage + exome)
38.2M SNPs, 3.9M short indels, and 14K deletions
[Figure: data flow diagram with file-format labels FASTQ, BAM, and VCF]
Steve Sherry, NCBI
http://www.bioplanet.com/gcat
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
Variation Databases
http://www.ncbi.nlm.nih.gov/snp
Collection of small nucleotide variation (SNVs)
Typically <50 bp
Some are polymorphic
Some are rare
Some are errors
Submissions clustered to make reference variants (rsIDs)
Variation Databases
Blue variants are all T insertions
Submitters submit in different parts of the polyT tract
Need additional analysis to cluster these
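One way to make such submissions comparable is to shift each insertion to its leftmost equivalent position before clustering. The sketch below is only an illustration of that idea, not dbSNP's actual clustering method. It assumes 0-based positions (the sequence is inserted before ref[pos]), and the helper name left_align_insertion is hypothetical.

```python
def left_align_insertion(ref: str, pos: int, ins: str) -> int:
    """Return the leftmost equivalent position for inserting `ins` before ref[pos].

    Positions are 0-based. Illustrative helper only, not a dbSNP routine.
    """
    # Moving an insertion one base to the left gives the same final sequence
    # whenever the base just before the insertion point equals the last base
    # of the inserted sequence. The inserted sequence is rotated at each step.
    while pos > 0 and ref[pos - 1] == ins[-1]:
        ins = ins[-1] + ins[:-1]
        pos -= 1
    return pos


# Three hypothetical submissions of a T insertion at different places
# in the same polyT tract (TTTTT at positions 3-7).
reference = "ACGTTTTTGA"
submitted_positions = [3, 5, 8]
normalized = {left_align_insertion(reference, p, "T") for p in submitted_positions}
print(normalized)
```

After normalization all three submissions share one position, so they can be grouped as the same variant.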
Variation Databases
Collection of large-scale variation
Breakpoint ambiguity
Complex variants (chromothripsis)
Challenging to compare variants from different methods
No reference variants (yet)
http://www.ncbi.nlm.nih.gov/dbvar
Variant Call Ambiguity
[Figure: a variant call with ambiguous breakpoints. Labels: start, stop; inner start, inner stop; outer start, outer stop; probes with decreased signal intensity; probes with expected signal intensity; breakpoint]
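The inner and outer coordinates put bounds on the size of a call. The sketch below shows that arithmetic. It assumes 1-based inclusive coordinates; the AmbiguousCall record and its field names are hypothetical and do not reflect the dbVar schema.

```python
from dataclasses import dataclass


@dataclass
class AmbiguousCall:
    """A call whose breakpoints are only known to lie within intervals."""
    outer_start: int  # leftmost position the variant could begin
    inner_start: int  # first position known to be inside the variant
    inner_stop: int   # last position known to be inside the variant
    outer_stop: int   # rightmost position the variant could end

    def size_range(self):
        """Return (minimum, maximum) length implied by the coordinates."""
        min_len = self.inner_stop - self.inner_start + 1
        max_len = self.outer_stop - self.outer_start + 1
        return min_len, max_len


# Hypothetical coordinates, for illustration only.
call = AmbiguousCall(outer_start=1000, inner_start=1500,
                     inner_stop=9000, outer_stop=9800)
print(call.size_range())  # (7501, 8801)
```

Two calls from different methods can then be compared by asking whether their ranges overlap, rather than by requiring identical start and stop values.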
Variant Call Ambiguity
[Figure: fosmid clone end placements, with inner start/stop and outer start/stop labels]
Fosmid clone (40 Kb +/- 1 Kb)
20 Kb: clone has an insertion relative to the genome
60 Kb: clone has a deletion relative to the genome
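The fosmid logic can be sketched as a comparison between the expected clone size and the distance between the clone's end placements on the reference. This reads the 20 Kb and 60 Kb labels as that mapped distance, which is an assumption about the figure. The function name and the simple fixed tolerance are illustrative; real pipelines usually set thresholds from the observed insert-size distribution.

```python
def classify_clone(mapped_span_kb: float,
                   expected_kb: float = 40.0,
                   tolerance_kb: float = 1.0) -> str:
    """Classify a clone by how far apart its ends map on the reference."""
    if mapped_span_kb < expected_kb - tolerance_kb:
        # Ends map closer together than the clone is long:
        # the clone carries sequence the reference lacks.
        return "insertion"
    if mapped_span_kb > expected_kb + tolerance_kb:
        # Ends map farther apart than the clone is long:
        # the reference carries sequence the clone lacks.
        return "deletion"
    return "concordant"


for span in (20, 40.5, 60):
    print(span, classify_clone(span))
```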
Variation Databases
http://www.ncbi.nlm.nih.gov/clinvar
How confident am I that my variant call is correct?
http://www.bioplanet.com/gcat
Fonseca et al., 2012
Available NGS Aligners
(already out of date)
Alignment Test
Align back to the source
Simulated Reads
Good: know where the reads go
Not so good: hard to simulate real data
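Because the origin of every simulated read is known, an alignment test reduces to counting how many reads land back where they came from. The sketch below assumes read placements are given as dictionaries and that a small positional tolerance is acceptable. The function name, data layout, and default tolerance are illustrative choices, not the scoring used by GCAT.

```python
def score_alignments(truth, aligned, tolerance=5):
    """Compare aligner output with the known origin of each simulated read.

    truth:   dict of read_id -> (chrom, pos) the read was simulated from
    aligned: dict of read_id -> (chrom, pos) reported by the aligner;
             reads missing from this dict count as unaligned
    """
    counts = {"correct": 0, "incorrect": 0, "unaligned": 0}
    for read_id, (true_chrom, true_pos) in truth.items():
        hit = aligned.get(read_id)
        if hit is None:
            counts["unaligned"] += 1
        elif hit[0] == true_chrom and abs(hit[1] - true_pos) <= tolerance:
            counts["correct"] += 1
        else:
            counts["incorrect"] += 1
    return counts


# Hypothetical reads, for illustration only.
truth = {"r1": ("chr1", 100), "r2": ("chr1", 500),
         "r3": ("chr2", 42), "r4": ("chr2", 900)}
aligned = {"r1": ("chr1", 101), "r2": ("chr3", 500), "r3": ("chr2", 42)}
print(score_alignments(truth, aligned))
```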
Variant Calling Test
Variant Calling Test
Transition/Transversion ratio (Ti/Tv)
[Figure: the four bases A, C, G, T, with transitions (A<->G, C<->T) and transversions (all other changes) marked]
Random: 0.5
Whole genome: 2.0-2.1
Exome: 3-3.5
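The random expectation of 0.5 follows from counting: of the 12 possible single-base substitutions, 4 are transitions and 8 are transversions. The sketch below computes Ti/Tv from a list of (ref, alt) pairs. The function name and input layout are assumptions made for illustration.

```python
from itertools import permutations

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Compute the transition/transversion ratio for (ref, alt) base pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a simple single-base substitution.
        if ref == alt or ref not in "ACGT" or alt not in "ACGT" \
                or len(ref) != 1 or len(alt) != 1:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("nan")


# Every possible substitution once: 4 transitions, 8 transversions.
all_substitutions = list(permutations("ACGT", 2))
print(ti_tv_ratio(all_substitutions))  # 0.5
```

A call set whose ratio drifts toward 0.5 is behaving more like random error than like real variation, which is why the ratio works as a quick sanity check.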
Variant Calling Test
Note: It is difficult to test variant calling independently of the aligner, as the two are often coupled.
Variant Calling Test
Benchmarking on known samples
NA12878
NA19240
http://www.ncbi.nlm.nih.gov/variation/tools/get-rm
[Figure: GeT-RM browser view with tracks labeled Calls, Tests, and cSRA; calls are marked Concordant, Discordant, or NA]
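Benchmarking against a known sample amounts to comparing genotypes site by site. The sketch below labels each reference site Concordant, Discordant, or NA. Treating NA as "no call made at that site" is an assumption about the figure's legend, and the function names and data layout are illustrative rather than anything provided by GeT-RM.

```python
from collections import Counter


def normalize(genotype):
    """Make 'A/G' and 'G/A' compare equal by sorting the alleles."""
    return tuple(sorted(genotype.split("/")))


def compare_to_reference(reference_calls, test_calls):
    """Label each reference site by agreement with the test call set.

    Both arguments map a site, e.g. (chrom, pos), to a genotype like 'A/G'.
    """
    summary = Counter()
    for site, ref_gt in reference_calls.items():
        test_gt = test_calls.get(site)
        if test_gt is None:
            summary["NA"] += 1  # assumption: NA means no call at this site
        elif normalize(test_gt) == normalize(ref_gt):
            summary["Concordant"] += 1
        else:
            summary["Discordant"] += 1
    return summary


# Hypothetical genotypes, for illustration only.
reference_calls = {("1", 100): "A/G", ("1", 200): "C/C", ("1", 300): "T/C"}
test_calls = {("1", 100): "G/A", ("1", 200): "C/T"}
print(compare_to_reference(reference_calls, test_calls))
```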
Target audience: Clinical testing labs
Submissions from: Clinical and Research labs
https://main.g2.bx.psu.edu/
Variant Analysis Pipelines: Galaxy
Variant Analysis Pipelines: Galaxy
Workflows
Save them
Share them
Can run on Amazon Cloud
Large community
Reproducibility
Annotating Variants
NC_000001.10:g.170508656G>T
NC_000001.10:g.170508561T>A
NC_000001.10:g.170508573T>C
NC_000001.10:g.170508724T>C
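Expressions like the ones above can be split into accession, position, and alleles before annotation. The sketch below handles only simple genomic substitutions of the form shown here. It is not a general HGVS parser, and the function name is hypothetical.

```python
import re

# Matches only simple genomic substitutions such as NC_000001.10:g.170508656G>T.
HGVS_G_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+):g\.(?P<pos>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_g_substitution(expression):
    """Return (accession, position, ref, alt) or raise ValueError."""
    match = HGVS_G_SUBSTITUTION.match(expression.strip())
    if match is None:
        raise ValueError(f"not a simple g. substitution: {expression!r}")
    return (match["accession"], int(match["pos"]), match["ref"], match["alt"])


expressions = [
    "NC_000001.10:g.170508656G>T",
    "NC_000001.10:g.170508561T>A",
    "NC_000001.10:g.170508573T>C",
    "NC_000001.10:g.170508724T>C",
]
for expression in expressions:
    print(parse_g_substitution(expression))
```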
Annotating Variants
Molecular Consequences (often predicted)
Damaging amino acid change
Affect a splice site
Change a regulatory feature
Functional Consequences (typically asserted)
Experiments show the change affects expression
Allele associated with a disorder
Allele shown to affect some function
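A predicted molecular consequence for a coding substitution can be derived by translating the reference and variant codons and comparing the amino acids. The sketch below uses the standard genetic code and covers only single-codon changes. The function name and category labels are illustrative, and real annotation tools also account for transcripts, strand, and splice context.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def molecular_consequence(ref_codon, alt_codon):
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "missense"


for ref_codon, alt_codon in [("GAG", "GTG"), ("GAA", "GAG"), ("TGG", "TGA")]:
    print(ref_codon, alt_codon, molecular_consequence(ref_codon, alt_codon))
```

This yields a prediction only. Whether a missense change is actually damaging is a functional question that needs the kind of evidence listed under functional consequences.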
Annotating Variants
[Figure: genome browser view with gene labels MAPKAPK2 and DYRK3]
Annotating Variants
http://www.ensembl.org/info/docs/variation/vep/index.html
Upload your list of variants, get back:
Is the variant known?
Is the variant predicted to be deleterious to a protein? (SIFT, PolyPhen)
Overlap with predicted regulatory region
HGVS expressions
http://www.ncbi.nlm.nih.gov/variation/tools/reporter
Annotating Variants
Upload your list of variants, get back:
Is the variant known?
Does the allele have a molecular consequence? (change AA, nonsynonymous)
HGVS expressions
ClinVar information
Available genetic tests
Publications
Take home messages
Lots of methods for sequence alignment
Lots of methods for variant calling
Typically developed to use a particular aligner
Different data sources can affect your annotation