sept2016 sv nhgri_repeats

Post on 17-Jan-2017


Effect of Repeats on the Characterization of Structural Variation

Nancy F. Hansen, Ph.D.

September 15, 2016

Outline of my talk

• Description of “PBRefine” callset
• Refinement of regions by alignment of PacBio assemblies to the human reference (Build37) with nucmer (MUMmer3.23, Kurtz et al., Genome Biology (2004))
• Characterization of SVs using mummerplot dot plots
• Role of repeats in curation of structural variation
• Ambiguities in the positions of insertions and deletions due to repeats
• “Correct” answer can be dependent on alignment algorithm
• Evidence from different technological platforms can point to different breakpoints

The PBRefine Pipeline

1. Extract reference sequence surrounding variant predictions from reference
2. Align reference sequence to PB assembly* with MUMmer
3. Count end-to-end alignments
4. More than 2: discard region as repetitive
5. 2 or fewer: align assembly region back to reference with MUMmer, then characterize variants

* CA and hybrid Falcon assemblies for all three trio members
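The branch on the number of end-to-end alignments can be sketched as follows. This is a minimal illustration, not the PBRefine code: the `Alignment` record, the `tolerance` parameter, and the rule that an alignment counts as "end-to-end" when it reaches both ends of the extracted reference region are assumptions made for the example. Only the threshold (more than 2 is discarded, 2 or fewer is kept) comes from the slide.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one alignment of the extracted region to a contig."""
    contig: str      # assembly contig name
    ref_start: int   # 0-based start on the extracted reference region
    ref_end: int     # end (exclusive) on the extracted reference region


def count_end_to_end(alignments: List[Alignment], region_length: int,
                     tolerance: int = 0) -> int:
    """Count alignments that span the region to within `tolerance` bases
    at both ends (an assumed definition of "end-to-end")."""
    return sum(
        1 for a in alignments
        if a.ref_start <= tolerance and a.ref_end >= region_length - tolerance
    )


def classify_region(alignments: List[Alignment], region_length: int,
                    tolerance: int = 0, max_alignments: int = 2) -> str:
    """Apply the slide's rule: more than 2 end-to-end alignments means the
    region is discarded as repetitive; 2 or fewer goes on to realignment.
    The slide does not say how zero alignments are handled, so this sketch
    simply follows the "2 or fewer" branch for that case."""
    n = count_end_to_end(alignments, region_length, tolerance)
    return "discard_as_repetitive" if n > max_alignments else "characterize"
```

A threshold of two fits a diploid sample, where at most two haplotypes are expected to span any region.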

Why long read assemblies for structural variant prediction?

• Continuity
• Consensus accuracy

Why not long read assemblies?

• Often assemblers will miss the second haplotype for diploid organisms

Accurate positions, accurate consensus for novel inserted sequences

Inaccurate genotypes for heterozygotes labeled as homozygotes

How often are variants confirmed?

1. Consider only SVs for which there are one or two contigs found in the assembly
2. Require consistent position and variant type

Variant Type  Total Calls  Assembler         Variants confirmed in HG002  Variants confirmed in HG003  Variants confirmed in HG004
Overall       6,784        Mt. Sinai/Falcon  1,851 (27.3%)                1,729 (25.5%)                1,708 (25.2%)
Overall       6,784        NHGRI/CA          1,808 (26.7%)                1,565 (23.1%)                1,545 (22.8%)
Insertions    743          Mt. Sinai/Falcon  171 (23.0%)                  157 (21.1%)                  156 (21.0%)
Insertions    743          NHGRI/CA          155 (20.9%)                  134 (18.0%)                  130 (17.5%)
Deletions     6,041        Mt. Sinai/Falcon  1,680 (27.8%)                1,572 (26.0%)                1,552 (25.7%)
Deletions     6,041        NHGRI/CA          1,653 (27.4%)                1,431 (23.7%)                1,415 (23.4%)
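Each percentage in the table is the confirmed count divided by the total calls for that variant type. The short sketch below reproduces that arithmetic for the HG002 column; the function and variable names are illustrative only, and the counts are copied from the table.

```python
def confirmation_rate(confirmed: int, total_calls: int) -> float:
    """Percentage of calls confirmed, rounded to one decimal place."""
    return round(100.0 * confirmed / total_calls, 1)


# Total calls per variant type, from the table.
total_calls = {"Insertions": 743, "Deletions": 6041}

# Variants confirmed in HG002, keyed by assembler, from the table.
confirmed_hg002 = {
    "Mt. Sinai/Falcon": {"Insertions": 171, "Deletions": 1680},
    "NHGRI/CA": {"Insertions": 155, "Deletions": 1653},
}

for assembler, counts in confirmed_hg002.items():
    # The "Overall" row is the sum of the insertion and deletion rows.
    overall_confirmed = sum(counts.values())
    overall_total = sum(total_calls.values())
    print(assembler, overall_confirmed,
          confirmation_rate(overall_confirmed, overall_total))
```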

(Mummerplot, Adam Phillippy)

Simple deletion

[Figure: mummerplot dot plot titled “Simple deletion”; axes: Reference, Assembly; marked distance: Dr]

Size of deletion = Dr

Deletion flanked by repeated sequence

[Figure: mummerplot dot plot; axes: Reference, Assembly; marked distances: Dr, Dc]

Size of deletion = Dr - Dc
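The two size formulas (Dr for a simple deletion, Dr - Dc when the deletion is flanked by repeated sequence) can be computed from one pair of flanking alignments. The sketch below is an interpretation, not the speaker's definition. It assumes that Dr is the signed distance on the reference from the end of the left alignment to the start of the right alignment, and that Dc is the same signed distance on the assembly contig, which becomes negative when the two alignments overlap in a repeat copy. The `Alignment` record and the half-open coordinates are hypothetical.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical alignment record with 0-based, half-open coordinates."""
    ref_start: int
    ref_end: int
    asm_start: int
    asm_end: int


def deletion_size(left: Alignment, right: Alignment) -> int:
    """Size of deletion = Dr - Dc, under the assumed signed definitions:
    Dr is the distance between the alignments on the reference, and
    Dc is the distance between them on the assembly contig."""
    d_r = right.ref_start - left.ref_end
    d_c = right.asm_start - left.asm_end  # negative if the alignments overlap
    return d_r - d_c


# Simple deletion: 200 bp missing from the contig, no repeat, so Dc = 0.
simple = deletion_size(Alignment(0, 1000, 0, 1000),
                       Alignment(1200, 2200, 1000, 2000))

# Reference A-R-B-R-C versus contig A-R-C, with a 50 bp repeat R and a
# 200 bp segment B. Both alignments extend through the single contig copy
# of R, so they overlap by 50 bp on the contig.
flanked = deletion_size(Alignment(0, 1050, 0, 1050),
                        Alignment(1250, 2300, 1000, 2050))
```

Under these assumptions the repeat-flanked case gives 250 bp (one repeat copy plus the segment between the copies). The deleted interval could be placed anywhere within one repeat length, which is the breakpoint ambiguity named in the outline.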

Simple insertion with duplication of flanking sequence

[Figure: mummerplot dot plot titled “Simple insertion”; axes: Reference, Assembly]

Insertion of an additional copy of a tandem repeat

[Figure: mummerplot dot plot titled “Tandem insertion”]

Inversions

[Figure: mummerplot dot plot titled “Inversion”]

Deletion of one copy of a tandem inverted repeat

[Figure: mummerplot dot plot titled “Tandem inverted repeat deletion”]

• Thank you!

• Jim Mullikin
• Adam Phillippy
• Sergey Koren
• Brian Walenz
• Ali Bashir
