church gia13
TRANSCRIPT
Converting from Analog to Digital
Integrating the historical archive of human variation in an NGS world
Deanna M. Church Staff Scientist, NCBI
@deannachurch Genome Informatics Alliance 2013
Acknowledgements
GeT-RM: Lisa Kalman (CDC), Birgit Funke (Harvard), Madhuri Hegde (Emory), Maryam Halavi, Chao Chen, Jon Trow, Douglas Slotta, Peter Meric, Daniel Frishberg, Victor Ananiev
ClinVar: Alex Astashyn, Shanmuga Chitipiralla, Douglas Hoffman, Wonhee Jang, Brandi Kattman, Melissa Landrum, Jennifer Lee, Adriana Malheiro, Wendy Rubinstein, George Riley, Amanjeev Sethi, Ricardo Villamarin
ISCA: Christa Lese Martin (Geisinger), Erin Riggs (Geisinger), Jose Mena, Mike Feolo, Tim Hefferon, John Garner, John Lopez
GRC: Valerie Schneider (NCBI), The Genome Institute at Washington University, The Wellcome Trust Sanger Institute, The European Bioinformatics Institute
Variation
[Diagram labels: Clinical Labs; Phenotypes; Variant Call (dbVar submission); Array data files; QC Analysis; Curation; Data regularization; dbGaP; Controlled Access; NCBI; Approved Users; BioProject ID; Web access; FTP Access; Assembly Remapping; dbVar; ISCA; UCSC; DGV; DGVa; ClinVar]
dbGaP projects need a sponsoring NIH institute to run the DAC (NICHD)
ASD: Atrial Septum Defect? Autism Spectrum Disorder?
No HPO: 1,814
HPO: 6,770
Riggs et al, 2012
~2 HPO terms/case (max of 16)
The Human Phenotype Ontology
http://www.ncbi.nlm.nih.gov/medgen
Variation
[Figure: size (gigabytes), axis ticks from 1 to 100,000, by component: sequences, alignments, genotype likelihoods, individual variants]
1092 genomes (low coverage + exome)
38.2M SNPs, 3.9M Short Indels and 14K Deletions
[Diagram labels: FASTQ, BAM, VCF]
Steve Sherry, NCBI
http://www.bioplanet.com/gcat
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
http://genomereference.org
GRCh37
Dennis et al., 2012
1q32 1q21 1p21
1p21 patch alignment to chromosome 1
Hydin: chr16 (16q22.2)
Hydin2: chr1 (1q21.1)
Missing in NCBI35
Unlocalized in NCBI36/GRCh37
Finished in GRCh38
Alignment to Hydin2 Genomic, 300 Kb, 99.4% ID
Alignment to Hydin1 CHM1_1.0, >99.9% ID
Doggett et al., 2006
Kidd et al, 2007 APOBEC cluster
Part of chr22 assembly
Alternate locus for chr22
White: Insertion
Black: Deletion
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
Human Resolved for GRCh38
http://genomereference.org
GRCh38 is coming (September, 2013)
http://www.ncbi.nlm.nih.gov/variation/tools/get-rm
Calls
Tests
cSRA
Concordant / Discordant / NA
Target audience: Clinical testing labs
Submissions from: Clinical and Research labs
Reporting Standards: Not standard
Twelve submitting labs to date
Twelve custom scripts to regularize data
Despite defined formats here: http://www.ncbi.nlm.nih.gov/projects/variation/get-rm
What are the issues?
Reporting Standards: Not standard
What are the issues?
Better Example: QUAL*
*Required sixth column in VCF file
10.01-18357.112.6-21.20-21.220-3070Allele string34.79-44624.03None20-46006
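The QUAL entries above include text such as "Allele string" and "None" where VCF expects a number. A minimal sketch of a check on the sixth column follows. It assumes tab-delimited data lines, treats "." as the VCF missing-value marker, and assumes that a Phred-scaled score cannot be negative. The function names are hypothetical and are not part of any GeT-RM tooling.

```python
import math


def parse_qual(field):
    """Return QUAL as a float, None if it is missing ("."), else raise ValueError."""
    if field == ".":
        return None  # VCF marker for a missing value
    value = float(field)  # raises ValueError for text such as "Allele string"
    if math.isnan(value) or math.isinf(value) or value < 0:
        # Assumption: a Phred-scaled score is finite and non-negative.
        raise ValueError(f"QUAL out of range: {field!r}")
    return value


def check_qual(vcf_line):
    """Classify the sixth column of one tab-delimited VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    if len(columns) < 6:
        return "too few columns"
    try:
        qual = parse_qual(columns[5])
    except ValueError:
        return "invalid"
    return "missing" if qual is None else "ok"
```

With this sketch, a line whose sixth column is "34.79" is classified as "ok", while "Allele string" and "None" are both classified as "invalid".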
Reporting Standards: Not standard
What are the issues?
Lab reporting a single nucleotide change (C->T) het change as:
c.1956+15C>CT
The HGVS standard says this should be reported as:
c.1956+15C>T[=]
Lab reporting a single nucleotide change (A->G) hom change as:
c.670+9A>G
The HGVS standard says this should be reported as:
c.[670+9A>G];[670+9A>G]
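The two reporting problems above can be detected mechanically. The following is a rough sketch, not an HGVS parser: it only handles simple coding-DNA substitutions with an optional intronic offset, and the function names are hypothetical. It flags expressions that write both alleles in the alt field (as in "C>CT") and rewrites a bare substitution into the bracketed homozygous form shown on the slide.

```python
import re

# Simple substitution: position, optional intronic offset, one ref base, one alt base.
SIMPLE_SUB = re.compile(r"^c\.(\d+(?:[+-]\d+)?)([ACGT])>([ACGT])$")
# Non-standard form: the alt field holds two bases, e.g. "C>CT" for a het call.
GENOTYPE_IN_ALT = re.compile(r"^c\.(\d+(?:[+-]\d+)?)([ACGT])>([ACGT]{2})$")


def classify(expr):
    """Give a coarse label for a reported c. expression."""
    if SIMPLE_SUB.match(expr):
        return "simple substitution (zygosity not stated)"
    m = GENOTYPE_IN_ALT.match(expr)
    if m and m.group(2) in m.group(3):
        return "non-standard: genotype written in alt field"
    return "unrecognized"


def homozygous(expr):
    """Rewrite a simple substitution as a homozygous allele pair."""
    if not SIMPLE_SUB.match(expr):
        raise ValueError(f"not a simple c. substitution: {expr!r}")
    change = expr[2:]  # drop the leading "c."
    return f"c.[{change}];[{change}]"
```

For example, `homozygous("c.670+9A>G")` returns `c.[670+9A>G];[670+9A>G]`, and `classify("c.1956+15C>CT")` reports the non-standard genotype form.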
Defining a reference sequence: Data validation
Reported as: NM_007171.3:c.942T>C
Base in transcript is a ‘C’ not a ‘T’
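This kind of mismatch can be caught by comparing the stated reference base with the transcript sequence. The sketch below assumes a simple exonic c. substitution and a transcript supplied as a string together with the 0-based index of the A of the start codon. The toy sequence and the accession NM_000000.0 are hypothetical placeholders, not real records, and the function name is illustrative only.

```python
import re

# Accession, then a simple exonic coding substitution such as c.942T>C.
SUB = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+\.\d+):c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def check_reference_base(hgvs, transcript_seq, cds_start):
    """Return (matches, actual_base) for the reference base stated in hgvs.

    cds_start is the 0-based index of the A of the ATG start codon, so
    coding position c.N sits at index cds_start + N - 1.
    """
    m = SUB.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported expression: {hgvs!r}")
    index = cds_start + int(m["pos"]) - 1
    if not 0 <= index < len(transcript_seq):
        raise ValueError("position falls outside the transcript")
    actual = transcript_seq[index].upper()
    return actual == m["ref"], actual


# Hypothetical toy transcript: 2-base 5' UTR, then ATG CCC TAA.
TOY_SEQ = "GGATGCCCTAA"
TOY_CDS_START = 2
```

With the toy data, `check_reference_base("NM_000000.0:c.4T>C", TOY_SEQ, TOY_CDS_START)` returns `(False, "C")`, the same situation as the slide: the submitted reference base is T but the transcript has C.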
http://www.ncbi.nlm.nih.gov/clinvar
Standardize data: what is the variation?
607008.0001
985A>G
985A>G (K304E)
A985G
ACADM, LYS304GLU
K304E
K304E (985 A->G)
K304E (K329E)
K304E only
K329E
K329E(985A>G)
LYS304GLU
Mutation c.985A>G (p.K304E)
c.985A>G
c.985A>G (p.K304E)
c.985A>G (p.Lys304Glu
includes: K304E (985A>G)
p.K304E
p.Lys329Glu
previously known as p.Lys329Glu
Analysis of ACADM 985A>G mutation
NC_000001.10:g.76226846A>G
NG_007045.1:g.41804A>G
NM_000016.4:c.985A>G
NP_000007.1:p.Lys329Glu
rs77931234
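Many of the submitted names above embed the same nucleotide change in different spellings. As an illustration only, the sketch below pulls a cDNA substitution out of a free-text name and groups names by it. It is not how ClinVar processes submissions, the function names are hypothetical, and it assumes that a token like "A985G" denotes nucleotides, although the same token could be read as one-letter amino acid codes. Names that give only a protein change are left ungrouped under `None`.

```python
import re

# "985A>G", "985 A->G", "c.985A>G": position, ref base, arrow, alt base.
POS_FIRST = re.compile(r"(\d+)\s*([ACGT])\s*-?>\s*([ACGT])")
# "A985G": ref base, position, alt base (assumed to be nucleotides).
REF_FIRST = re.compile(r"\b([ACGT])(\d+)([ACGT])\b")


def cdna_change(name):
    """Return a 'c.<pos><ref>><alt>' string found in name, or None."""
    m = POS_FIRST.search(name)
    if m:
        pos, ref, alt = m.groups()
        return f"c.{pos}{ref}>{alt}"
    m = REF_FIRST.search(name)
    if m:
        ref, pos, alt = m.groups()
        return f"c.{pos}{ref}>{alt}"
    return None


def group_by_change(names):
    """Group free-text names by the cDNA change they mention."""
    groups = {}
    for name in names:
        groups.setdefault(cdna_change(name), []).append(name)
    return groups
```

Applied to "985A>G", "A985G", "K304E (985 A->G)", and "c.985A>G (p.K304E)", all four names fall under the key `c.985A>G`, while "p.Lys329Glu" and "ACADM, LYS304GLU" return `None` and would need a separate protein-level mapping.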
Miki et al, 1994