mar2013 performance metrics working group

David Jenkins on behalf of Justin H. Johnson Director of Bioinformatics Performance Metric & Figures of Merit

Upload: genomeinabottle

Post on 10-May-2015


TRANSCRIPT

Page 1: Mar2013 Performance Metrics Working Group

David Jenkins on behalf of

Justin H. Johnson

Director of Bioinformatics

Performance Metric & Figures of Merit

Page 3: Mar2013 Performance Metrics Working Group

Who are we?

• Justin Johnson

– Managing Director of Services

– Director of Bioinformatics

– 10 Years at JCVI before EdgeBio

– Project Manager - Archon Genomics XPrize

• EdgeBio

– CLIA Lab

– Illumina HiSeq & MiSeq, Ion Proton & PGM

Page 4: Mar2013 Performance Metrics Working Group

Overview – GIAB as I See It.

• Which genomes?

• How do we sequence them?

• How do we analyze them?

• How do we enable their usage?

Page 5: Mar2013 Performance Metrics Working Group

Overview

Experimental Data

• Sequence Data & Variation

• Metadata

Database

• RM vs. Reference

• Every Base

Visualize and Filter

• Browser over DB

• Query by Experiment Data

Compare and Report

• Single Genome Browser

• ValidationProtocol.org

Refine and Feedback

Experimental Data = Combination of Prep / Sequencing / Analysis

Bioinformatics Data Integration / Representation

Page 6: Mar2013 Performance Metrics Working Group

Experimental Data

• GetRM Model for Collection

– http://www.ncbi.nlm.nih.gov/projects/variation/get-rm/

• Preparation

– Link to published prep protocol

– ROI in BED/GFF/GBK format

• Sequencing

– Platform Information (Minimally - Name)

– Chemistry (Minimally - Version)

• Analysis

– Link to published analysis protocol or best practices

– Read Data (fastq, sra, hdf5, others)

– Alignment/Assembly Data (bam): Minimal Tag Set TBD

– Variation (VCF or gVCF): Minimal Tag Set TBD in INFO field of VCF or define external XSD (https://sites.google.com/site/gvcftools/home/about-gvcf)
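The collection model above (prep + sequencing + analysis per experiment) could be captured as one record per submission. The following is only a minimal sketch: the class name, every field name, and the choice of which fields count as required are assumptions made for illustration, since the slides leave the minimal tag set TBD.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ExperimentRecord:
    """One experiment = prep + sequencing + analysis (all field names hypothetical)."""
    prep_protocol_url: str        # link to published prep protocol
    roi_file: str                 # regions of interest, e.g. a BED/GFF/GBK file
    platform_name: str            # sequencing platform, minimally its name
    chemistry_version: str        # chemistry, minimally its version
    analysis_protocol_url: str    # link to published analysis protocol
    read_files: list = field(default_factory=list)       # fastq, sra, hdf5, ...
    alignment_files: list = field(default_factory=list)  # bam
    variant_files: list = field(default_factory=list)    # VCF or gVCF

    def missing_fields(self):
        """Return the names of required string fields that are empty."""
        required = [
            "prep_protocol_url", "roi_file", "platform_name",
            "chemistry_version", "analysis_protocol_url",
        ]
        return [name for name in required if not getattr(self, name)]


# Example submission with the chemistry version left blank.
record = ExperimentRecord(
    prep_protocol_url="https://example.org/prep-protocol",
    roi_file="targets.bed",
    platform_name="Illumina HiSeq",
    chemistry_version="",
    analysis_protocol_url="https://example.org/analysis-protocol",
    variant_files=["sample.g.vcf"],
)
print(record.missing_fields())   # ['chemistry_version']
print(json.dumps(asdict(record), indent=2))
```

A flat record like this is easy to serialize next to the data files; whether the real tag set lives in the VCF header, the INFO field, or an external schema is left open in the slides.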

Page 7: Mar2013 Performance Metrics Working Group

gVCF

https://sites.google.com/site/gvcftools/home/about-gvcf

Page 8: Mar2013 Performance Metrics Working Group

Meta Data

• All required fields in VCF 4.1

• Others (Examples)

– AA : ancestral allele

– AC : allele count in genotypes, for each ALT allele, in the same order as listed

– AF : allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not called genotypes

– AN : total number of alleles in called genotypes

– BQ : RMS base quality at this position

– CIGAR : cigar string describing how to align an alternate allele to the reference allele

– DB : dbSNP membership

– DP : combined depth across samples, e.g. DP=154

– END : end position of the variant described in this record (for use with symbolic alleles)

– H2 : membership in hapmap2

– VALIDATED : validated by follow-up experiment

• Reference Block Implementations

• Handle Indel Conflicts and Resolution

• Genotype Quality for non-variant sites (GQX)
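As an illustration of how the INFO tags listed above might be read, here is a small sketch that parses an INFO column by hand and uses END to find the span of a gVCF reference block. The helper names are hypothetical, the values are kept as strings, and a production pipeline would more likely rely on an established VCF library.

```python
def parse_info(info):
    """Parse a VCF INFO column into a dict; flags without '=' map to True."""
    fields = {}
    if info in ("", "."):
        return fields
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def record_span(pos, ref, info):
    """Return the 1-based inclusive (start, end) covered by one record.

    If END is present (as in a gVCF reference block) it defines the end;
    otherwise the span is derived from the length of the REF allele.
    """
    if "END" in info:
        return pos, int(info["END"])
    return pos, pos + len(ref) - 1


info = parse_info("DP=154;DB;AF=0.5")
print(info)                                            # {'DP': '154', 'DB': True, 'AF': '0.5'}
print(record_span(100, "A", parse_info("END=150")))    # (100, 150)
print(record_span(100, "ACG", {}))                     # (100, 102)
```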

Page 9: Mar2013 Performance Metrics Working Group

Database

• Store Each Base + Meta of RM versus Reference for each Experiment from gVCF

– Distinguish missing versus homozygous reference

– Include copy number and phasing when available, not required

• Engine that drives front end visualization (Genome Browser)

• Build on GetRM/NCBI Database Work
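The distinction between missing and homozygous reference can be made concrete with a per-base expansion. This is a sketch under assumptions: records are reduced to hypothetical (start, end, is_variant) tuples with 1-based inclusive coordinates, and a real store would not hold one Python object per base.

```python
from enum import Enum


class BaseState(Enum):
    MISSING = "missing"    # no record covers this base
    HOM_REF = "hom_ref"    # covered by a reference block
    VARIANT = "variant"    # covered by a variant record


def per_base_states(region_start, region_end, records):
    """Map every position in the region to a BaseState.

    records: iterable of (start, end, is_variant), 1-based inclusive.
    Positions not covered by any record stay MISSING.
    """
    states = {pos: BaseState.MISSING
              for pos in range(region_start, region_end + 1)}
    for start, end, is_variant in records:
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi + 1):
            if is_variant:
                # A variant call takes precedence over a reference block.
                states[pos] = BaseState.VARIANT
            elif states[pos] is BaseState.MISSING:
                states[pos] = BaseState.HOM_REF
    return states


states = per_base_states(1, 6, [(1, 3, False), (5, 5, True)])
for pos, state in states.items():
    print(pos, state.value)
```

In this example positions 1 to 3 are homozygous reference, position 5 is a variant, and positions 4 and 6 remain missing, which is exactly the case a plain VCF cannot express.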

Page 10: Mar2013 Performance Metrics Working Group

Visualize and Filter

• Build on GetRM/NCBI Browser Work

• Single RM -> Many Experiments

• Not all metadata will be visual, but most/all will be filterable

• Filter data to generate ROI or VOI

– Canned: e.g. Intersect of All Platforms + Analysis, All OMIM SNPs, Clinical Cert SNV List, etc.

– Dynamic: allowing people to explore prep, sequence, or analysis bias

• Slice, dice, and export VOI to the compare and reporting SW

• Allow user defined tracks

• Byproduct is a community educational resource

– I have an ROI for a test and want to know what platform, prep, exome kit version, etc. covers it best. What do I do?
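A canned filter such as "Intersect of All Platforms" amounts to intersecting interval sets. The sketch below assumes a single chromosome and sorted, non-overlapping, half-open (start, end) intervals per platform; the function names are illustrative, and in practice a dedicated interval tool would likely do this work.

```python
from functools import reduce


def intersect_two(a, b):
    """Intersect two sorted, non-overlapping lists of half-open intervals."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def intersect_all(interval_sets):
    """Intersect one or more interval lists (raises TypeError if given none)."""
    return reduce(intersect_two, interval_sets)


# Hypothetical covered regions for three platform/analysis combinations.
platform_a = [(0, 100), (200, 300)]
platform_b = [(50, 250)]
platform_c = [(80, 220)]
print(intersect_all([platform_a, platform_b, platform_c]))  # [(80, 100), (200, 220)]
```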

Page 11: Mar2013 Performance Metrics Working Group

Parallel Database, Filter Effort (Gemini)

Quinlan Lab at UVA - https://github.com/arq5x/gemini

• Gemini – simple, flexible, and powerful framework for exploring genetic variation

• Basic browser capabilities being developed

• Flexible custom annotation and metadata addition to DB

• Leverage the expressive power of SQL while overcoming fundamental challenges associated with using databases for very large datasets
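To give a feel for SQL-based variant exploration, here is a small self-contained sketch using Python's built-in sqlite3 module. The table layout, column names, and thresholds are invented for this example and are not Gemini's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per variant call per platform.
conn.execute(
    """CREATE TABLE variants (
           chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,
           qual REAL, depth INTEGER, platform TEXT)"""
)
rows = [
    ("chr1", 100, "A", "G", 50.0, 30, "HiSeq"),
    ("chr1", 100, "A", "G", 45.0, 25, "Proton"),
    ("chr1", 200, "C", "T", 60.0, 40, "HiSeq"),
    ("chr1", 300, "G", "A", 20.0, 5, "Proton"),
    ("chr1", 300, "G", "A", 55.0, 35, "HiSeq"),
]
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Variants passing quality and depth filters on at least two platforms.
query = """
    SELECT chrom, pos, ref, alt, COUNT(DISTINCT platform)
    FROM variants
    WHERE qual >= ? AND depth >= ?
    GROUP BY chrom, pos, ref, alt
    HAVING COUNT(DISTINCT platform) >= 2
    ORDER BY chrom, pos
"""
shared = conn.execute(query, (30, 10)).fetchall()
print(shared)  # [('chr1', 100, 'A', 'G', 2)]
```

The variant at position 300 drops out because only its HiSeq call passes the filters, which is the kind of platform bias question a dynamic filter is meant to expose.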

Page 12: Mar2013 Performance Metrics Working Group

Gemini

http://dl.dropbox.com/u/515640/posters_and_slides/Quinlan-Gemini-Poster.pdf


Page 15: Mar2013 Performance Metrics Working Group

Compare and Reporting

• Take in ROI or VOI from the visualize and filter stage

• Take in user defined VOI or VOI + ROI

• Leverage SW under ValidationProtocol.org to generate reports and files including, but not limited to:

– Summary of completeness, accuracy, phasing

– Discordant variants in VCF

– Concordant variants in VCF

– Phasing errors in VCF

• Provide an intuitive way to feed these results into downstream analysis SW (VariantViz, IO8) or back into the browser (User Defined Track)
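The concordant/discordant split can be sketched with plain set operations. This is not the ValidationProtocol.org software: the function name is hypothetical, the definitions of completeness and accuracy used here are simplifying assumptions for illustration, and phasing is not handled.

```python
def compare_calls(reference, test):
    """Compare a test call set against a reference call set.

    reference, test: dicts mapping (chrom, pos, ref, alt) -> genotype string.
    Assumed definitions for this sketch only:
      completeness = reference variants also present in the test set
      accuracy     = test calls whose genotype matches the reference
    """
    ref_keys, test_keys = set(reference), set(test)
    shared = ref_keys & test_keys
    concordant = {k for k in shared if reference[k] == test[k]}
    return {
        "concordant": concordant,
        "genotype_mismatch": shared - concordant,  # same site, other genotype
        "missed": ref_keys - test_keys,            # in reference only
        "extra": test_keys - ref_keys,             # in test only
        "completeness": len(shared) / len(ref_keys) if ref_keys else 0.0,
        "accuracy": len(concordant) / len(test_keys) if test_keys else 0.0,
    }


a = ("chr1", 100, "A", "G")
b = ("chr1", 200, "C", "T")
c = ("chr1", 300, "G", "A")
d = ("chr1", 400, "T", "C")
reference_calls = {a: "0/1", b: "1/1", c: "0/1"}
test_calls = {a: "0/1", b: "0/1", d: "0/1"}

report = compare_calls(reference_calls, test_calls)
print(sorted(report["concordant"]))
print(sorted(report["genotype_mismatch"] | report["missed"] | report["extra"]))
print(report["completeness"], report["accuracy"])
```

Each of the returned sets could then be written out as its own VCF or loaded back into the browser as a user defined track.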

Page 16: Mar2013 Performance Metrics Working Group

• $10 million prize competition to showcase whole genome sequencing technology

• Award to the team(s) who can most completely, accurately and affordably sequence 100 human genomes in 30 days or less

• Competing Teams will sequence the genomes of the 100 centenarians who have evaded the usual diseases of aging such as heart disease, diabetes, cancer and Alzheimer’s

Page 17: Mar2013 Performance Metrics Working Group

AGXP Validation Study Overview

Page 18: Mar2013 Performance Metrics Working Group

AGXP Validation Study Analysis

• 2 Major Phases using NA19239 and NA12878

– Develop Reference Standards

• Fosmid Reconstruction, Variation Discovery

• Technology Comparison and Bias Removal

– Develop Performance Metrics

• Software Development

• Help labs use the data

Page 19: Mar2013 Performance Metrics Working Group

Compare and Report

• The validationprotocol.org website provides a simple way for anyone to compare their variant calls against the public reference genomes.

• Encourages submission and analysis in public tools like Galaxy through transparent interoperability with GenomeSpace.

Page 20: Mar2013 Performance Metrics Working Group

Compare and Report

Page 21: Mar2013 Performance Metrics Working Group

Compare and Report

Page 22: Mar2013 Performance Metrics Working Group

Compare and Report

Page 23: Mar2013 Performance Metrics Working Group

Follow On

• Export different categories (Concordant/Discordant/Phasing Error) variants to VariantViz IO8

• Visualize quality, allele frequencies, depth, etc. to detect patterns within and between variant categories

Page 24: Mar2013 Performance Metrics Working Group

Concordant SNPs

Potential false positives

Page 25: Mar2013 Performance Metrics Working Group
Page 26: Mar2013 Performance Metrics Working Group
Page 27: Mar2013 Performance Metrics Working Group

Xprize Team

• Justin H. Johnson and Team - EdgeBio

• Brad Chapman, Harvard: automated high-throughput analysis pipelines with custom visualization and processing tools

• Gabor Marth, Boston College: read mapping, single-nucleotide and insertion-deletion polymorphism detection, and discovery of structural variants

• Aaron Quinlan, University of Virginia: structural variation (SV)

• Granger Sutton, JCVI: Oversight Committee

• Victor Jongeneel, University of Illinois and NCSA: Oversight Committee

• Larry Kedes, UCLA: Oversight Committee

Page 28: Mar2013 Performance Metrics Working Group

EdgeBio Team

• LAB

– Joy Adigun

– Ryan Mease

– Jennifer Sheffield

– Aaron Johnson

– Jackie Jackson

• IFX

– David Jenkins

– Anju Varadarajan

– Vani Rajan

– Karthik Kota

– Phil Dagasto

• Adam Bennett

• Isabel Llorente

Page 29: Mar2013 Performance Metrics Working Group

More info available at

http://bit.ly/agxpval

http://www.genomeinabottle.org

Thank You!