Variant Analysis Introduction




TRANSCRIPT

Page 1: Variant Analysis Introduction

Variant Analysis Introduction

Deanna M. Church, Staff Scientist, NCBI

@deannachurch

Short Course in Medical Genetics 2013

Page 2: Variant Analysis Introduction

[Figure: chart of data size in gigabytes (axis from 1 to 100,000) by component: sequences (FASTQ), alignments (BAM), genotype likelihoods (VCF), individual variants (VCF)]

1092 genomes (low coverage + exome)

38.2M SNPs, 3.9M short indels, and 14K deletions

Steve Sherry, NCBI

Page 3: Variant Analysis Introduction

http://www.bioplanet.com/gcat

Page 4: Variant Analysis Introduction

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

Page 5: Variant Analysis Introduction

Variation Databases

http://www.ncbi.nlm.nih.gov/snp

Collection of small nucleotide variation (SNVs)
Typically <50 bp
Some are polymorphic
Some are rare
Some are errors
Submissions clustered to make reference variants (rsIDs)

Page 6: Variant Analysis Introduction

Variation Databases

Blue variants are all T insertions
Submitters place the insertion in different parts of the polyT tract
Need additional analysis to cluster these
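The polyT problem above can be illustrated with left-alignment, one common way to normalize insertions so that equivalent submissions land on the same coordinate. This is a minimal sketch, assuming 0-based positions and a plain string as the reference; the function name `left_align_insertion` and the toy sequence are hypothetical, and this is not a description of how dbSNP itself clusters submissions.

```python
def left_align_insertion(ref: str, pos: int, inserted: str) -> int:
    """Return the leftmost position that yields the same inserted allele.

    `pos` is the 0-based index in `ref` before which `inserted` is placed.
    """
    while pos > 0 and ref[pos - 1] == inserted[-1]:
        # Rotate the inserted sequence one base to the left; the resulting
        # full sequence is unchanged, only the reported position moves.
        inserted = ref[pos - 1] + inserted[:-1]
        pos -= 1
    return pos


# Toy reference (hypothetical): a run of six T bases at indices 2..7.
toy_reference = "GATTTTTTCA"

# Three submitters report the same T insertion at different places in the tract.
submitted_positions = [3, 5, 8]
normalized = {left_align_insertion(toy_reference, p, "T") for p in submitted_positions}
print(normalized)  # all three collapse to a single position
```

After normalization, the three submissions share one coordinate and can be grouped by a simple equality test.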

Page 7: Variant Analysis Introduction

Variation Databases

Collection of large-scale variation
Breakpoint ambiguity
Complex variants (chromothripsis)
Challenging to compare variants from different methods
No reference variants (yet)

http://www.ncbi.nlm.nih.gov/dbvar

Page 8: Variant Analysis Introduction

Variant Call Ambiguity

[Figure: a variant call with ambiguous breakpoints. Labels: start, stop; inner start, inner stop; outer start, outer stop; breakpoint; probes with decreased signal intensity; probes with expected signal intensity]

Page 9: Variant Analysis Introduction

Variant Call Ambiguity

[Figure: fosmid clone ends placed on the genome, with outer start and outer stop marked]

Fosmid clone (40 Kb +/- 1 Kb)

20 Kb: Clone has an insertion relative to the genome

60 Kb: Clone has a deletion relative to the genome
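The fosmid reasoning on this slide can be sketched as a comparison between the distance separating the clone ends on the reference and the expected clone size. This is a minimal illustration, assuming the 40 Kb +/- 1 Kb figure from the slide as the default; the function name and return labels are hypothetical, and real analyses typically require several supporting clones before making a call.

```python
def classify_fosmid_span(mapped_span_kb: float,
                         expected_kb: float = 40.0,
                         tolerance_kb: float = 1.0) -> str:
    """Classify a clone by how far apart its ends map on the reference genome."""
    if mapped_span_kb < expected_kb - tolerance_kb:
        # Ends map too close together: the clone carries sequence
        # that the reference lacks.
        return "insertion in clone"
    if mapped_span_kb > expected_kb + tolerance_kb:
        # Ends map too far apart: the clone lacks sequence
        # that the reference has.
        return "deletion in clone"
    return "concordant"


for span in (20.0, 40.5, 60.0):
    print(span, classify_fosmid_span(span))
```

Note that the end placements only bound the event (outer start and outer stop); they do not reveal where inside the clone the breakpoints fall.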

Page 10: Variant Analysis Introduction

Variation Databases

http://www.ncbi.nlm.nih.gov/clinvar

Page 11: Variant Analysis Introduction
Page 12: Variant Analysis Introduction

How confident am I that my variant call is correct?

Page 13: Variant Analysis Introduction

http://www.bioplanet.com/gcat

Page 14: Variant Analysis Introduction

Fonseca et al., 2012

Available NGS Aligners (already out of date)

Page 15: Variant Analysis Introduction

http://www.bioplanet.com/gcat

Alignment Test

Align back to the source

Simulated Reads
Good: know where the reads go
Not so good: hard to simulate real data
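The "align back to the source" idea can be sketched as follows: sample reads from known positions in a reference, run an aligner, and count how many reads come back to where they started. This is a toy sketch only; `simulate_reads`, `fraction_correct`, and the exact-match stand-in aligner are hypothetical, and the reads here are error-free, which is exactly the "hard to simulate real data" limitation noted above.

```python
import random


def simulate_reads(reference: str, read_length: int, n_reads: int, seed: int = 0):
    """Sample error-free reads, remembering the true start of each one."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        reads.append((start, reference[start:start + read_length]))
    return reads


def fraction_correct(reads, align) -> float:
    """Fraction of reads that `align` places at their true start position."""
    correct = sum(1 for true_start, seq in reads if align(seq) == true_start)
    return correct / len(reads)


# Toy reference (hypothetical): random bases, fixed seed for repeatability.
base_rng = random.Random(1)
toy_reference = "".join(base_rng.choice("ACGT") for _ in range(2000))
toy_reads = simulate_reads(toy_reference, read_length=50, n_reads=100)

# Stand-in "aligner": exact substring search, first hit only.
score = fraction_correct(toy_reads, toy_reference.find)
print(f"fraction placed correctly: {score:.2f}")
```

A real evaluation would swap the stand-in for an actual aligner's output and add sequencing errors and variants to the simulated reads.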

Page 16: Variant Analysis Introduction

http://www.bioplanet.com/gcat

Page 17: Variant Analysis Introduction

Variant Calling Test

Page 18: Variant Analysis Introduction

Variant Calling Test

Transition/Transversion ratio (Ti/Tv)

[Figure: the four bases A, C, G, T with arrows labeled Transitions and Transversions]

Random: 0.5
Whole Genome: 2.0–2.1
Exome: 3–3.5
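The Ti/Tv check can be sketched in a few lines. Transitions are changes within a base class (A<->G, C<->T) and transversions are changes between classes; of the 12 possible single-base substitutions, 4 are transitions and 8 are transversions, which is where the "Random: 0.5" figure comes from. The function names below are hypothetical, and the input is assumed to be a list of (ref, alt) pairs already extracted from a call set.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {ref, alt}
    return pair == PURINES or pair == PYRIMIDINES


def ti_tv_ratio(snvs) -> float:
    """Compute Ti/Tv from an iterable of (ref, alt) single-base pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in BASES or alt not in BASES:
            raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


# Every possible substitution once: 4 transitions, 8 transversions.
every_substitution = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
print(ti_tv_ratio(every_substitution))  # 0.5
```

A call set whose ratio sits well below the expected range for its target (whole genome or exome) is a hint that it contains many false positives.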

Page 19: Variant Analysis Introduction

Variant Calling Test

Note: Difficult to test variant calling independently from the aligner as they are often coupled.

Page 20: Variant Analysis Introduction

Variant Calling Test

Benchmarking on known samples

NA12878
NA19240
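Benchmarking against a well-characterized sample can be sketched as a set comparison between a pipeline's calls and a trusted call set for the same sample. This is a minimal sketch: the function name is hypothetical, the variants below are invented placeholders rather than real calls from either sample, and a real comparison would first normalize variant representation and restrict to regions where the trusted set is confident.

```python
def compare_calls(calls, truth) -> dict:
    """Compare two collections of (chrom, pos, ref, alt) tuples."""
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)   # called and in the trusted set
    fp = len(calls - truth)   # called but not in the trusted set
    fn = len(truth - calls)   # in the trusted set but missed
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Placeholder data (hypothetical), not real genotypes.
pipeline_calls = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 300, "G", "A")]
trusted_calls = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 400, "T", "C")]

summary = compare_calls(pipeline_calls, trusted_calls)
print(summary)
```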

Page 21: Variant Analysis Introduction

http://www.ncbi.nlm.nih.gov/variation/tools/get-rm

[Figure: browser view with tracks labeled Calls, Tests, and cSRA; legend: Concordant, Discordant, NA]

Target audience: Clinical testing labs
Submissions from: Clinical and Research labs

Page 22: Variant Analysis Introduction

https://main.g2.bx.psu.edu/

Variant Analysis Pipelines: Galaxy

Page 23: Variant Analysis Introduction

Variant Analysis Pipelines: Galaxy

Workflows
Save them
Share them

Can run on Amazon Cloud

Large community

Reproducibility

Page 24: Variant Analysis Introduction

Annotating Variants

NC_000001.10:g.170508656G>T

NC_000001.10:g.170508561T>A

NC_000001.10:g.170508573T>C

NC_000001.10:g.170508724T>C
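The four expressions above are HGVS genomic substitutions on NC_000001.10, the GRCh37 sequence for chromosome 1. Before annotation they usually need to be split into accession, position, reference base, and alternate base. The sketch below handles only this simple `g.` substitution form; the pattern and function name are hypothetical, and full HGVS syntax (indels, duplications, coding coordinates) needs a dedicated parser.

```python
import re

# Matches only simple genomic substitutions such as NC_000001.10:g.170508656G>T
HGVS_G_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+):g\."
    r"(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_genomic_substitution(expression: str) -> dict:
    """Split an HGVS g. substitution into its parts (position is 1-based)."""
    match = HGVS_G_SUBSTITUTION.match(expression)
    if match is None:
        raise ValueError(f"not a simple g. substitution: {expression!r}")
    return {
        "accession": match["accession"],
        "position": int(match["position"]),
        "ref": match["ref"],
        "alt": match["alt"],
    }


slide_expressions = [
    "NC_000001.10:g.170508656G>T",
    "NC_000001.10:g.170508561T>A",
    "NC_000001.10:g.170508573T>C",
    "NC_000001.10:g.170508724T>C",
]
parsed = [parse_genomic_substitution(e) for e in slide_expressions]
for record in parsed:
    print(record)
```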

Page 25: Variant Analysis Introduction

Annotating Variants

Molecular Consequences (often predicted)
Damaging amino acid change
Affect a splice site
Change a regulatory feature

Functional Consequences (typically asserted)
Experiments show the change affects expression
Allele associated with a disorder
Allele shown to affect some function

Page 26: Variant Analysis Introduction

Annotating Variants

[Figure labels: MAPKAPK2, DYRK3]

Page 27: Variant Analysis Introduction

Annotating Variants

http://www.ensembl.org/info/docs/variation/vep/index.html

Upload your list of variants, get back:
Is the variant known?
Is the variant predicted to be deleterious to a protein (SIFT, PolyPhen)?
Overlap with predicted regulatory region
HGVS expressions

Page 28: Variant Analysis Introduction

http://www.ncbi.nlm.nih.gov/variation/tools/reporter

Annotating Variants

Upload your list of variants, get back:
Is the variant known?
Does the allele have a molecular consequence (change AA, nonsynonymous)?
HGVS expressions
ClinVar information
Available Genetic Tests
Publications

Page 29: Variant Analysis Introduction

Take home messages

Lots of methods for sequence alignment
Lots of methods for variant calling
Typically developed to use a particular aligner
Different data sources can affect your annotation