church gia13


Upload: deanna-church

Post on 24-Jun-2015


Page 1: Church gia13

Converting from Analog to Digital

Integrating the historical archive of human variation in an NGS world

Deanna M. Church Staff Scientist, NCBI

@deannachurch Genome Informatics Alliance 2013

Page 2: Church gia13

Acknowledgements

GeT-RM: Lisa Kalman (CDC), Birgit Funke (Harvard), Mahduri Hegde (Emory), Maryam Halavi, Chao Chen, Jon Trow, Douglas Slotta, Peter Meric, Daniel Frishberg, Victor Ananiev

ClinVar: Alex Astashyn, Shanmuga Chitipiralla, Douglas Hoffman, Wonhee Jang, Brandi Kattman, Melissa Landrum, Jennifer Lee, Adriana Malheiro, Wendy Rubinstein, George Riley, Amanjeev Sethi, Ricardo Villamarin

ISCA: Christa Lese Martin (Geisinger), Erin Riggs (Geisinger), Jose Mena, Mike Feolo, Tim Hefferon, John Garner, John Lopez

GRC: Valerie Schneider (NCBI), The Genome Institute at Washington University, The Wellcome Trust Sanger Institute, The European Bioinformatics Institute

Page 3: Church gia13
Page 4: Church gia13

Variation

Phenotypes

Page 5: Church gia13

Phenotypes

Page 6: Church gia13

Variant Call (dbVar submission)

Array data files

Clinical Labs

QC Analysis

Curation

Data regularization

dbGaP

Controlled Access

Web access

FTP Access

Assembly

Remapping

dbVar

ISCA

UCSC

DGV

DGVa

NCBI

Approved Users

BioProject ID

ClinVar

dbGaP projects need a sponsoring NIH institute to run the DAC (NICHD)

Page 7: Church gia13

ASD

Atrial Septum Defect

Autism Spectrum Disorder

??

No HPO 1,814

HPO 6,770

Riggs et al., 2012

~2 HPO terms/case (max of 16)

The Human Phenotype Ontology

Page 8: Church gia13

http://www.ncbi.nlm.nih.gov/medgen

Page 9: Church gia13

Variation

Page 10: Church gia13

[Figure: size (gigabytes), log scale from 1 to 100,000, by component: sequences, alignments, genotype likelihoods, individual variants]

1092 genomes (low coverage + exome)

38.2M SNPs, 3.9M short indels, and 14K deletions

FASTQ

BAM

VCF

VCF

Steve Sherry, NCBI

Page 11: Church gia13

http://www.bioplanet.com/gcat

Page 12: Church gia13

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

Page 13: Church gia13

http://genomereference.org

GRCh37

Page 14: Church gia13

Dennis et al., 2012

1q32 1q21 1p21

1p21 patch alignment to chromosome 1

Page 15: Church gia13

Hydin: chr16 (16q22.2)

Hydin2: chr1 (1q21.1)

Missing in NCBI35

Unlocalized in NCBI36/GRCh37

Finished in GRCh38

Alignment to Hydin2 Genomic, 300 Kb, 99.4% ID

Alignment to Hydin1 CHM1_1.0, >99.9% ID


Doggett et al., 2006

Page 16: Church gia13
Page 17: Church gia13

Kidd et al., 2007

APOBEC cluster

Part of chr22 assembly

Alternate locus for chr22

White: Insertion

Black: Deletion

Page 18: Church gia13

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

Page 19: Church gia13

Human Resolved for GRCh38

http://genomereference.org

Page 20: Church gia13

GRCh38 is coming (September, 2013)

Page 21: Church gia13

http://www.ncbi.nlm.nih.gov/variation/tools/get-rm

Calls

Tests

cSRA

Concordant

Discordant

NA

Target audience: Clinical testing labs

Submissions from: Clinical and Research labs

Page 22: Church gia13

Reporting Standards: Not standard

Twelve submitting labs to date

Twelve custom scripts to regularize data

Despite defined formats here: http://www.ncbi.nlm.nih.gov/projects/variation/get-rm

What are the issues?

Page 23: Church gia13

Reporting Standards: Not standard

What are the issues?

Better Example: QUAL*

*Required sixth column in VCF file

10.01-18357.112.6-21.20-21.220-3070

Allele string

34.79-44624.03

None

20-46006
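A check of this kind can be sketched in a few lines. The VCF specification defines QUAL as a Phred-scaled number, with `.` as the missing value, so entries such as "Allele string" or "None" fail at parse time. The function names `parse_qual` and `qual_from_vcf_line` are hypothetical, and the example record simply reuses the rs77931234 coordinates shown later in this deck; this is a minimal sketch, not the GeT-RM regularization code.

```python
import math


def parse_qual(field):
    """Return QUAL as a float, or None for the VCF missing value '.'.

    Raises ValueError for anything that is not a finite number or '.'.
    """
    if field == ".":
        return None
    try:
        value = float(field)
    except ValueError:
        raise ValueError(f"QUAL is not a number or '.': {field!r}") from None
    if not math.isfinite(value):
        raise ValueError(f"QUAL is not finite: {field!r}")
    return value


def qual_from_vcf_line(line):
    """Extract and validate QUAL (sixth column) from one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:  # VCF has eight fixed, tab-separated columns
        raise ValueError("expected at least 8 tab-separated columns")
    return parse_qual(fields[5])


# Illustrative record only; the QUAL value is taken from the slide's list.
record = "1\t76226846\trs77931234\tA\tG\t10.01\tPASS\t."
print(qual_from_vcf_line(record))
```

Rejecting bad values at load time, rather than coercing them, makes each lab's deviation visible before any per-lab regularization script runs.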

Page 24: Church gia13

Reporting Standards: Not standard

What are the issues?

Lab reporting a single nucleotide change (C->T) het change as:

c.1956+15C>CT

HGVS standards say this should be reported as:

c.1956+15C>T[=]

Lab reporting a single nucleotide change (A->G) hom change as:

c.670+9A>G

HGVS standards say this should be reported as:

c.[670+9A>G];[670+9A>G]
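The two lab reports on this slide can be regularized with a small amount of pattern matching. The sketch below assumes that a two-base alternate such as `C>CT` is a genotype-style report of a heterozygous change, and it builds only the homozygous allele form shown on the slide. The function names are hypothetical, and only simple substitutions are handled.

```python
import re

# Simple coding-DNA substitution, e.g. "c.670+9A>G", or the genotype-style
# "c.1956+15C>CT" where the alternate lists both alleles.
_SUBST = re.compile(
    r"^c\.(?P<pos>[0-9*+\-]+)(?P<ref>[ACGT])>(?P<alt>[ACGT]{1,2})$"
)


def interpret_report(reported):
    """Return (change, zygosity) where zygosity is 'het' or None (not stated)."""
    m = _SUBST.match(reported)
    if m is None:
        raise ValueError(f"unsupported expression: {reported!r}")
    pos, ref, alt = m["pos"], m["ref"], m["alt"]
    if len(alt) == 2:
        # Assumption: "C>CT" means one reference allele plus one T allele.
        if ref not in alt or alt[0] == alt[1]:
            raise ValueError(f"cannot interpret alternate {alt!r}")
        other = alt.replace(ref, "", 1)
        return f"c.{pos}{ref}>{other}", "het"
    return f"c.{pos}{ref}>{alt}", None


def homozygous_form(change):
    """Write a change on both alleles, as in c.[670+9A>G];[670+9A>G]."""
    body = change[2:]  # drop the leading "c."
    return f"c.[{body}];[{body}]"


print(interpret_report("c.1956+15C>CT"))
print(homozygous_form("c.670+9A>G"))
```

Zygosity is returned separately from the change so that a lab's bare `c.670+9A>G` is not silently treated as homozygous; that information has to come from the submission itself.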

Page 25: Church gia13

Defining a reference sequence: Data validation

Reported as:

NM_007171.3:c.942T>C

Base in transcript is a ‘C’ not a ‘T’
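This kind of validation amounts to looking up the stated position in the reference sequence and comparing bases. The sketch below assumes the caller already has the transcript's coding sequence, numbered from the A of the start codon. The function name, the placeholder accession `NM_000000.0`, and the nine-base toy sequence are all hypothetical; no real NM_007171.3 sequence is used.

```python
import re

# Exonic coding-DNA substitution on a versioned accession, e.g. NM_007171.3:c.942T>C
_HGVS_SUB = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+\.\d+):c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def check_reference_base(hgvs, cds):
    """Compare the reference base in an HGVS substitution with the sequence.

    cds is the coding sequence, where position 1 is the A of the start codon.
    Returns (matches, actual_base).
    """
    m = _HGVS_SUB.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported expression: {hgvs!r}")
    pos = int(m["pos"])
    if not 1 <= pos <= len(cds):
        raise ValueError(f"position {pos} is outside the coding sequence")
    actual = cds[pos - 1].upper()  # HGVS positions are 1-based
    return actual == m["ref"], actual


# Toy data: a made-up accession and sequence, chosen so that the stated
# reference base (T) disagrees with the base actually present (C).
toy_cds = "ATGGCCTAA"
print(check_reference_base("NM_000000.0:c.5T>C", toy_cds))  # (False, 'C')
```

A mismatch like this usually means the submitter used a different transcript version or swapped the alleles, so the record is best flagged for follow-up rather than corrected automatically.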

Page 26: Church gia13

http://www.ncbi.nlm.nih.gov/clinvar

Page 27: Church gia13

Standardize data: what is the variation?

607008.0001

985A>G
985A>G (K304E)
A985G
ACADM, LYS304GLU
K304E
K304E (985 A->G)
K304E (K329E)
K304E only
K329E
K329E(985A>G)
LYS304GLU
Mutation c.985A>G (p.K304E)
c.985A>G
c.985A>G (p.K304E)
c.985A>G (p.Lys304Glu
includes: K304E (985A>G)
p.K304E
p.Lys329Glu
previously known as p.Lys329Glu
Analysis of ACADM 985A>G mutation

NC_000001.10:g.76226846A>G

NG_007045.1:g.41804A>G

NM_000016.4:c.985A>G

NP_000007.1:p.Lys329Glu

rs77931234
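Part of this standardization can be approximated by pulling the nucleotide change out of each submitted name. The sketch below is a heuristic, not the ClinVar pipeline: the function name `cdna_change` is hypothetical, it recognizes only single-base substitutions, and it returns no transcript accession. Protein-only names such as K304E return `None` and would need a separate mapping. A legacy name like A985G is assumed to be nucleotide notation, although the same pattern could be an amino acid change (A and G are also one-letter codes for alanine and glycine).

```python
import re

# "985A>G", "c.985A>G", "985 A->G"
_ARROW = re.compile(
    r"(?<![0-9A-Za-z])(?:c\.)?(\d+)\s*([ACGT])\s*-?>\s*([ACGT])(?![A-Za-z])"
)
# Legacy "A985G" (reference base, position, alternate base)
_LEGACY = re.compile(r"(?<![A-Za-z0-9])([ACGT])(\d+)([ACGT])(?![A-Za-z0-9])")


def cdna_change(name):
    """Return a 'c.<pos><ref>><alt>' string found in a free-text name, else None."""
    m = _ARROW.search(name)
    if m:
        pos, ref, alt = m.groups()
        return f"c.{pos}{ref}>{alt}"
    m = _LEGACY.search(name)
    if m:
        ref, pos, alt = m.groups()
        return f"c.{pos}{ref}>{alt}"
    return None


submitted = [
    "985A>G",
    "A985G",
    "K304E (985 A->G)",
    "K329E(985A>G)",
    "Mutation c.985A>G (p.K304E)",
    "Analysis of ACADM 985A>G mutation",
    "K304E",
]
for name in submitted:
    print(name, "->", cdna_change(name))
```

Names that share an extracted change become candidates for the same record; a full HGVS expression still requires choosing the reference sequence, as in the accessioned forms listed above.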

Page 28: Church gia13
Page 29: Church gia13

Miki et al., 1994

Page 30: Church gia13