church_ncbivariation2013
NCBI Variation resources for CSHL Genome Access Course.
Deanna M. Church, Staff Scientist, NCBI
@deannachurch
Variation Resources at NCBI
Variation Resources Team at NCBI
Ming Ward, Lon Phan, Brad Holmes, Anna Glodek, Michael Kholodov, Rama Maiti, Juliana Sampson, David Shao, Eugene Shekhtman, Qiang Wang, Hua Zhang
Donna Maglott, Melissa Landrum, Jennifer Lee, George Riley, Ray Tully, Craig Wallin, Shanmuga Chitipiralla, Douglas Hoffman, Wonhee Jang, Ken Katz, Michael Ovetsky, Ricardo Villamarin
Tim Hefferon, John Lopez, John Garner, Chao Chen
Key Collaborators
Heidi Rehm, Harvard Partners
Christa Lese Martin, Geisinger
Sherri Bale, GeneDx
Lisa Kalman, CDC
Birgit Funke, Harvard Partners
Madhuri Hegde, Emory
Figure credit: http://itknowledgeexchange.techtarget.com/
dbSNP, dbVar
ClinVar, GTR
Quality Control, Ref variants, References
Annotations
Visualization Tools
Data from external sources
Variant Definitions: Location, Evidence, Methodology
Variant Annotations: Phenotypes, Consequences, Tests, Other Biology
dbSNP, dbVar
ClinVar, GTR
dbSNP
GenBank vs RefSeq
GenBank: Submitter owned; redundant; updated rarely; INSDC
RefSeq: RefSeq owned; non-redundant; curated; not INSDC
BRCA1 in GenBank: 83 genomic records, 31 mRNA records, 27 protein records
BRCA1 in RefSeq: 3 genomic records, 5 mRNA records, 1 RNA record, 5 protein records
Genome Res. 1999. 9: 677-679
http://www.ncbi.nlm.nih.gov/snp
>gnl|dbSNP|ss76078129|allelePos=17|len=33|alleles='A/G'
GTGGCAGAGA CTGAAT R AAGGGTTGAC CCAGGG
SNPs defined by flanking position
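The header fields of a submission record like ss76078129 can be read mechanically. The following is a minimal sketch, assuming the header layout shown on the slide (`allelePos`, `len`, `alleles`); the function name `parse_dbsnp_fasta` and the small `IUPAC` table are illustrative, not part of any NCBI tool.

```python
import re

# IUPAC ambiguity codes for two-allele substitutions (illustrative subset).
IUPAC = {"R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
         "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"}}

def parse_dbsnp_fasta(record):
    """Split a dbSNP submission FASTA record into flanks and variant code."""
    header, *seq_lines = record.strip().splitlines()
    fields = header.lstrip(">").split("|")
    info = {"ss_id": fields[2]}
    for field in fields[3:]:
        key, _, value = field.partition("=")
        # Strip straight or curly quotes around the allele string.
        info[key] = value.strip("'\u2019")
    # Remove the display spacing inside the sequence.
    seq = re.sub(r"\s+", "", "".join(seq_lines)).upper()
    pos = int(info["allelePos"])  # 1-based position of the variant
    return {
        "ss_id": info["ss_id"],
        "alleles": set(info["alleles"].split("/")),
        "five_prime": seq[:pos - 1],
        "variant_code": seq[pos - 1],
        "three_prime": seq[pos:],
        "length_ok": len(seq) == int(info["len"]),
    }

record = """>gnl|dbSNP|ss76078129|allelePos=17|len=33|alleles='A/G'
GTGGCAGAGA CTGAAT R AAGGGTTGAC CCAGGG"""
snp = parse_dbsnp_fasta(record)
print(snp["ss_id"], snp["variant_code"], sorted(snp["alleles"]))
```

The variant itself carries no coordinate here: it is identified only by the 16 bases on either side of the ambiguity code.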
>gnl|dbSNP|ss3354770|allelePos=499|len=661|alleles='T/C'
actattcaca atagcaaaga cttggaacca acccaaatgt ccaacaatga tagactggat taagaaaatg tggcacatat acaccatgga atactaggca TTCCATTCTA CTGTGCACGA GTCACTGCAA ACTCAAGCAT TTCCAGAGTT CTGAAAGCTC AACTAAGAAC CAAGCCTACT CATTCAACAT CAACACACAC AGCACCCTGA GCGTCCAAAA CCACGGGGGT TATGTTCTAG ACCACAGGAC TGGCTACCTG GCCCTGCTCA AGGCGGCAGG ATCAATGGGC AAGAATGTGC AAGAATTTAC CACAACTCAG CCTTGCTGTG TCAACCACAG AGGCCAAGTA CCCCTAACAC CCAGATAGAG TAATTGTGCC TTACTTCTTT GTTCATTCCC ACCATTACAT TTTGTAAATT GGAACTTCTA GGAGGTTAGA AGGATATGCT GATCAAAAAA AGGGGACATA TTCAAGGAGT GTCCCTGGGT CAACCCTT Y ATTCAGTCTC TGCCACATGT CTAGTAACTG TGAGTGATGG GTGCATCAGT ATAATCCTGA GCCTCCCAAG GTACAGCCTT TCACTACTAT TCATCATATT GGCTAAGGTA TTCATCATAT TGGCTAAGGT ATTCACCAAC AGGGCTCATT TTCTATCAGA CC
ss76078129 (aligns to plus strand)
ss76078129 'A/G'
ss3354770 'T/C'
ss3354770 (aligns to minus strand)
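The two submissions describe the same variant from opposite strands, which can be checked by reverse-complementing one flank and looking for it in the other. This is a small sketch; `reverse_complement` and `complement_alleles` are illustrative helpers, and `ss3354770_excerpt` is only the stretch of the 661 bp record around its `Y` code.

```python
# Complement table including the IUPAC two-base ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")

def reverse_complement(seq):
    """Reverse-complement a DNA string that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def complement_alleles(alleles):
    """Convert an allele string such as 'A/G' to the opposite strand."""
    return "/".join(reverse_complement(a) for a in alleles.split("/"))

# Full 33 bp sequence of ss76078129 (display spacing removed).
ss76078129 = "GTGGCAGAGACTGAATRAAGGGTTGACCCAGGG"
# Excerpt of ss3354770 around its variant position (display spacing removed).
ss3354770_excerpt = "GTCCCTGGGTCAACCCTTYATTCAGTCTCTGCCACATGT"

flipped = reverse_complement(ss76078129)
print(flipped in ss3354770_excerpt)   # the short flank sits inside the long one
print(complement_alleles("A/G"))      # alleles as seen from the other strand
```

Once the strand is accounted for, 'A/G' and 'T/C' are the same pair of alleles, which is why both submissions can be clustered together.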
ss76078129 (33bp)
ss3354770 (661bp)
rs397515413
rs397515413
NC_000016.9 (chr16)
NW_003871055.3 (chr1 fix patch)
Hydin
Hydin2
Defines variant by location rather than flanking sequence
VCF (Variant Call Format)
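In VCF the variant is anchored to a sequence name and a 1-based position rather than to flanking bases. The sketch below reads the first five fixed columns of a data line (CHROM, POS, ID, REF, ALT); the record in `example_line` is invented purely for illustration and does not describe a real variant.

```python
from typing import NamedTuple, Tuple

class VcfRecord(NamedTuple):
    chrom: str
    pos: int              # 1-based position on the reference sequence
    var_id: str
    ref: str
    alts: Tuple[str, ...]

def parse_vcf_line(line):
    """Parse the first five tab-separated columns of a VCF data line."""
    chrom, pos, var_id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return VcfRecord(chrom, int(pos), var_id, ref, tuple(alt.split(",")))

# Hypothetical record: sequence name, position, and alleles are made up.
example_line = "chrExample\t12345\t.\tA\tG,T\t.\t.\t."
rec = parse_vcf_line(example_line)
print(rec.chrom, rec.pos, rec.ref, rec.alts)
```

Because the location depends on the reference sequence, the same variant gets a different record on each assembly or patch it is placed on.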
Clustering microsatellites
rs62645748
To be replaced by a Variation Viewer
To be replaced by a link to ClinVar
rs62645748 (NCBI Homo sapiens annotation run 104)
http://www.ncbi.nlm.nih.gov/dbvar
Submitter Information: Contact and author information
Study Information: Study meta-data (description, PMID, ProjectID, etc.)
Sample/Sampleset data: Sample IDs (if samples are consented); Sampleset ID for pooled samples (case v control sets)
Experiment data: Assay method (sequencing, array); platform and analysis information
Variants: Variant definitions
Variant Call Ambiguity
start, stop
Inner start, Inner stop
Outer start, Outer stop
Probes with decreased signal intensity
Probes with expected signal intensity
breakpoint, breakpoint
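One way to picture the inner and outer coordinates is as a pair of intervals that bracket each uncertain breakpoint. The sketch below is an assumption-laden illustration, not dbVar's schema: the class name `Placement`, its methods, and the coordinates are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Placement:
    """Imprecise variant placement; all coordinates are 1-based, inclusive."""
    outer_start: int
    inner_start: int
    inner_stop: int
    outer_stop: int

    def is_consistent(self):
        # The inner interval must sit inside the outer interval.
        return (self.outer_start <= self.inner_start
                <= self.inner_stop <= self.outer_stop)

    def breakpoint_ambiguity(self):
        # Width of the region in which each breakpoint may lie.
        return (self.inner_start - self.outer_start,
                self.outer_stop - self.inner_stop)

    def size_range(self):
        # Smallest and largest variant length compatible with the call.
        return (self.inner_stop - self.inner_start + 1,
                self.outer_stop - self.outer_start + 1)

# Hypothetical array-based deletion call.
call = Placement(outer_start=1000, inner_start=1500,
                 inner_stop=9000, outer_stop=9800)
print(call.is_consistent(), call.breakpoint_ambiguity(), call.size_range())
```

For an array experiment, the inner coordinates would correspond to the outermost probes with changed signal and the outer coordinates to the nearest probes with expected signal.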
Fosmid clone (40 Kb +/- 1 Kb)
20 Kb: Clone has an insertion relative to the genome
60 Kb: Clone has a deletion relative to the genome
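The fosmid logic compares the distance between the mapped clone ends with the expected insert size. A minimal sketch follows, assuming the 40 Kb +/- 1 Kb window from the slide is used directly as the threshold (a real pipeline would likely use a statistically derived cutoff); `classify_fosmid` is an illustrative name.

```python
def classify_fosmid(mapped_span_kb, expected_kb=40.0, tolerance_kb=1.0):
    """Classify a clone by the genomic distance between its mapped ends."""
    if mapped_span_kb < expected_kb - tolerance_kb:
        # Ends map closer together than the clone is long:
        # the clone carries sequence the reference lacks.
        return "insertion in clone relative to genome"
    if mapped_span_kb > expected_kb + tolerance_kb:
        # Ends map farther apart than the clone is long:
        # the clone lacks sequence present in the reference.
        return "deletion in clone relative to genome"
    return "concordant"

for span in (20, 40.5, 60):
    print(span, "Kb:", classify_fosmid(span))
```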
http://www.ncbi.nlm.nih.gov/clinvar
ClinVar data model and display
SCV
RCV
Variant, Phenotype, Submitter
Allele, Variant
Variant Phenotype
SCV SCV SCV SCV
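The relationship between the two accession types can be sketched as a grouping step: each SCV is one submitter's assertion about a variant and phenotype, and SCVs sharing the same variant and phenotype roll up into one RCV. The accessions, variant names, and field names below are placeholders invented for illustration, not real ClinVar records or ClinVar's actual schema.

```python
from collections import defaultdict

def aggregate_scvs(scvs):
    """Group submitted records (SCV) by their (variant, phenotype) pair."""
    groups = defaultdict(list)
    for scv in scvs:
        groups[(scv["variant"], scv["phenotype"])].append(scv["accession"])
    return dict(groups)

# Placeholder submissions from three hypothetical labs.
submissions = [
    {"accession": "SCV_A", "variant": "var1", "phenotype": "condition X", "submitter": "Lab 1"},
    {"accession": "SCV_B", "variant": "var1", "phenotype": "condition X", "submitter": "Lab 2"},
    {"accession": "SCV_C", "variant": "var1", "phenotype": "condition Y", "submitter": "Lab 3"},
]
rcv_like = aggregate_scvs(submissions)
for key, members in rcv_like.items():
    print(key, members)
```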
Allele summary
• Gene
• Variant type
• Genomic location
• HGVS expressions*
• Molecular consequence*
• Links*
• Frequency*
Phenotype summary
• Names
• Links*
• Age of onset*
• Prevalence*
Interpretation
• Significance
• Review status*
• Accession.version*
* May be provided by NCBI
ClinVar RCV report - Overview
ClinVar RCV report – Summary of assertions
• Each submission is accessioned and versioned
• Terms provided by the submitter are mapped to controlled values
• Method of review is clearly reported so primary data can be distinguished from that reported in the literature
ClinVar RCV report - Evidence
Under active review
Allele report – available December
http://www.ncbi.nlm.nih.gov/refseq/rsg
http://www.lrg-sequence.org/
RefSeqGene
LRG
http://www.ncbi.nlm.nih.gov/genome/tools/remap
From Assembly 1 <-> Assembly 2
Assembly <-> RefSeqGene/LRG
Primary Assembly <-> Alternate loci
1:215844373
http://www.ncbi.nlm.nih.gov/variation/tools/reporter
This new look coming next month
http://www.ncbi.nlm.nih.gov/variation/view
http://www.ncbi.nlm.nih.gov/variation/tools/get-rm
Calls
Tests
cSRA
Concordant, Discordant, NA
Target audience: Clinical testing labs
Submissions from: Clinical and Research labs
Twelve submitting labs to date
Twelve custom scripts to regularize data
Defined formats here: http://www.ncbi.nlm.nih.gov/projects/variation/get-rm
NA12878 Tests by Platform (bar chart; platforms: HiSeq 2000, HiSeq 2500, MiSeq, Ion Torrent, Sanger, 454; vertical axis from 0 to 30)
Lab Provided Validation
Variants validated in this sample using another platform
Variants validated in another sample using another platform
Variants seen in other samples from submitting lab using this platform
Variants seen in public data set
Variants that are novel
Variants that were not assessed
Based on May 2013 Data release
http://www.ncbi.nlm.nih.gov/variation/tools/get-rm
Gene level concordance
$\frac{\sum_{v} \max_{i} x_{v,i}}{\sum_{v} T_{v}}$
v = variant
i = genotype call
x = count per call for each variant
T = total genotype calls per variant
Sums are taken over all variants in a gene.
Tested regions taken into account
Phasing ignored
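The gene level concordance above can be worked through numerically. This sketch reads the formula as the sum of the most frequent genotype call count per variant, divided by the sum of all genotype calls per variant; the function name and the example counts are hypothetical, and the handling of tested regions is not modeled.

```python
def gene_level_concordance(variant_call_counts):
    """Compute sum(max call count) / sum(total calls) over a gene's variants.

    variant_call_counts: one dict per variant, mapping genotype call
    (e.g. "0/1") to the number of tests that reported that call.
    """
    agreeing = sum(max(counts.values()) for counts in variant_call_counts)
    total = sum(sum(counts.values()) for counts in variant_call_counts)
    return agreeing / total if total else float("nan")

# Hypothetical counts for three variants in one gene.
gene_counts = [
    {"0/1": 10, "1/1": 2},   # 10 of 12 tests agree
    {"0/1": 12},             # all 12 tests agree
    {"0/0": 5, "0/1": 7},    # 7 of 12 tests agree
]
score = gene_level_concordance(gene_counts)
print(round(score, 3))
```

With these made-up counts the numerator is 10 + 12 + 7 = 29 and the denominator is 36, so a gene where every test agrees at every variant would score 1.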