Effect of Repeats on the Characterization of Structural Variation, Nancy F. Hansen, Ph.D., September 15, 2016


Page 1: Sept2016 sv nhgri_repeats

Effect of Repeats on the Characterization of Structural Variation

Nancy F. Hansen, Ph.D.

September 15, 2016

Page 2: Sept2016 sv nhgri_repeats

Outline of my talk

• Description of “PBRefine” callset
  • Refinement of regions by alignment of PacBio assemblies to the human reference (Build37) with nucmer (MUMmer3.23, Kurtz et al., Genome Biology (2004))
  • Characterization of SVs using mummerplot dot plots
• Role of repeats in curation of structural variation
  • Ambiguities in the positions of insertions and deletions due to repeats
  • “Correct” answer can be dependent on alignment algorithm
  • Evidence from different technological platforms can point to different breakpoints

Page 3: Sept2016 sv nhgri_repeats

The PBRefine Pipeline

1. Extract reference sequence surrounding variant predictions from reference
2. Align reference sequence to PB assembly* with MUMmer
3. Count end-to-end alignments
   • More than 2: Discard region as repetitive
   • 2 or fewer: Align assembly region back to reference with MUMmer, then characterize variants

* CA and hybrid Falcon assemblies for all three trio members
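The branching step of the flowchart can be sketched in Python as below. This is only an illustration, not the PBRefine code: the `Alignment` record, the `is_end_to_end` and `triage_region` helpers, and the `slop` tolerance are hypothetical names, and the alignments are assumed to have been parsed from MUMmer output already.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record: one alignment of the extracted reference region to a contig."""
    contig: str
    region_start: int  # first aligned base of the region, 0-based
    region_end: int    # one past the last aligned base of the region


def is_end_to_end(aln: Alignment, region_length: int, slop: int = 0) -> bool:
    """True if the alignment spans the whole region, allowing `slop` bases at each end."""
    return aln.region_start <= slop and aln.region_end >= region_length - slop


def triage_region(alignments: List[Alignment], region_length: int, slop: int = 0) -> str:
    """Apply the flowchart rule: more than 2 end-to-end alignments means repetitive."""
    n_end_to_end = sum(is_end_to_end(a, region_length, slop) for a in alignments)
    if n_end_to_end > 2:
        return "discard_repetitive"
    return "align_back_and_characterize"
```

A region covered end to end by three contigs would be discarded, while one covered by one or two would go on to the align-back step. The cutoff of two presumably reflects the two haplotypes of a diploid sample, though the slides do not state this.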

Page 4: Sept2016 sv nhgri_repeats

Why long read assemblies for structural variant prediction?

• Continuity
• Consensus accuracy

Why not long read assemblies?

• Often assemblers will miss the second haplotype for diploid organisms

Accurate positions, accurate consensus for novel inserted sequences

Inaccurate genotypes for heterozygotes labeled as homozygotes

Page 5: Sept2016 sv nhgri_repeats

How often are variants confirmed?

1. Consider only SVs for which there are one or two contigs found in the assembly
2. Require consistent position and variant type

| Variant Type | Total Calls | Assembler        | Variants confirmed in HG002 | Variants confirmed in HG003 | Variants confirmed in HG004 |
|--------------|-------------|------------------|-----------------------------|-----------------------------|-----------------------------|
| Overall      | 6,784       | Mt. Sinai/Falcon | 1,851 (27.3%)               | 1,729 (25.5%)               | 1,708 (25.2%)               |
|              |             | NHGRI/CA         | 1,808 (26.7%)               | 1,565 (23.1%)               | 1,545 (22.8%)               |
| Insertions   | 743         | Mt. Sinai/Falcon | 171 (23.0%)                 | 157 (21.1%)                 | 156 (21.0%)                 |
|              |             | NHGRI/CA         | 155 (20.9%)                 | 134 (18.0%)                 | 130 (17.5%)                 |
| Deletions    | 6,041       | Mt. Sinai/Falcon | 1,680 (27.8%)               | 1,572 (26.0%)               | 1,552 (25.7%)               |
|              |             | NHGRI/CA         | 1,653 (27.4%)               | 1,431 (23.7%)               | 1,415 (23.4%)               |
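The percentages in this table are confirmed calls divided by total calls for each variant type. The short Python sketch below recomputes them from the counts on the slide and checks that the insertion and deletion rows add up to the overall row. The dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
SAMPLES = ("HG002", "HG003", "HG004")

# Counts copied from the slide.
TOTAL_CALLS = {"Overall": 6784, "Insertions": 743, "Deletions": 6041}
CONFIRMED = {
    ("Overall", "Mt. Sinai/Falcon"): (1851, 1729, 1708),
    ("Overall", "NHGRI/CA"): (1808, 1565, 1545),
    ("Insertions", "Mt. Sinai/Falcon"): (171, 157, 156),
    ("Insertions", "NHGRI/CA"): (155, 134, 130),
    ("Deletions", "Mt. Sinai/Falcon"): (1680, 1572, 1552),
    ("Deletions", "NHGRI/CA"): (1653, 1431, 1415),
}


def percent(count: int, total: int) -> float:
    """Confirmation rate as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


def rates(variant_type: str, assembler: str) -> dict:
    """Per-sample confirmation rates for one table row."""
    total = TOTAL_CALLS[variant_type]
    counts = CONFIRMED[(variant_type, assembler)]
    return {sample: percent(c, total) for sample, c in zip(SAMPLES, counts)}


def overall_is_sum(assembler: str) -> bool:
    """Check that insertions plus deletions equal the overall counts."""
    ins = CONFIRMED[("Insertions", assembler)]
    dels = CONFIRMED[("Deletions", assembler)]
    return tuple(i + d for i, d in zip(ins, dels)) == CONFIRMED[("Overall", assembler)]
```

For example, `rates("Overall", "Mt. Sinai/Falcon")` reproduces the first row of percentages.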

Page 6: Sept2016 sv nhgri_repeats

(Mummerplot, Adam Phillippy)

Simple deletion

[Dot plot "Simple deletion": Reference vs. Assembly, with the distance Dr marked]

Size of deletion = Dr

Page 7: Sept2016 sv nhgri_repeats

Simple deletion

Simple deletion

Page 8: Sept2016 sv nhgri_repeats

Deletion flanked by repeated sequence

[Dot plot "Deletion flanked by repeated sequence": Reference vs. Assembly, with the distances Dr and Dc marked]

Size of deletion = Dr - Dc
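The formula above can be worked through with a small Python sketch. It rests on an assumption the slide text does not spell out: Dr and Dc are taken to be signed offsets from the end of the first alignment to the start of the second, measured in reference and contig coordinates respectively, so Dc is negative when the two alignments overlap in the contig. The function name, the half-open coordinates, and the example numbers are all hypothetical.

```python
def deletion_size(ref_end_1: int, ref_start_2: int,
                  ctg_end_1: int, ctg_start_2: int) -> int:
    """Deletion size from two forward-strand alignments flanking the event.

    Coordinates are half-open: *_end_1 is one past the last base of the first
    alignment and *_start_2 is the first base of the second alignment.
    """
    d_r = ref_start_2 - ref_end_1   # offset between the alignments in the reference
    d_c = ctg_start_2 - ctg_end_1   # offset in the contig; negative if they overlap
    return d_r - d_c


# Hypothetical layout: reference = A(100) R(20) B(50) R(20) C(100),
# contig = A(100) R(20) C(100). The first alignment covers A+R and the
# second covers R+C, so both claim the single contig copy of R.
simple = deletion_size(120, 170, 120, 120)    # no repeat: Dc = 0
flanked = deletion_size(120, 170, 120, 100)   # 20 bp overlap in the contig
```

Under these assumptions the simple case reduces to Dr alone (50 bases), while the repeat-flanked case gives 70 bases: the unique segment plus one copy of the repeat, which matches the length difference between the two sequences.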

Page 9: Sept2016 sv nhgri_repeats

Deletion flanked by repeated sequence
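When a deletion is flanked by repeated sequence, its position is ambiguous: several different start coordinates produce exactly the same deleted allele. The Python sketch below enumerates those equivalent positions for a toy sequence. The function name and the example sequence are made up for illustration, and this is a generic shifting check, not a step taken from PBRefine.

```python
def equivalent_deletion_starts(ref: str, start: int, length: int) -> range:
    """All 0-based start positions where deleting `length` bases gives the same sequence."""
    left = start
    # Shifting left by one is neutral if the base entering the deletion on the
    # left equals the base leaving it on the right.
    while left > 0 and ref[left - 1] == ref[left - 1 + length]:
        left -= 1
    right = start
    # Shifting right by one is neutral under the mirror-image condition.
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return range(left, right + 1)


def delete(ref: str, start: int, length: int) -> str:
    """Return `ref` with `length` bases removed beginning at `start`."""
    return ref[:start] + ref[start + length:]


# Toy reference: unique flank, repeat, unique segment, repeat again, unique flank.
repeat = "GATTACA"
ref = "AAAA" + repeat + "CCCCC" + repeat + "TTTT"
starts = equivalent_deletion_starts(ref, start=11, length=12)
alleles = {delete(ref, s, 12) for s in starts}
```

Here the 12-base deletion can start anywhere from position 4 to position 11, one more position than the length of the repeat, and every choice yields the same allele. Which coordinate gets reported depends on the aligner or caller convention.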

Page 10: Sept2016 sv nhgri_repeats

Simple insertion with duplication of flanking sequence

[Dot plot "Simple insertion": Reference vs. Assembly]

Page 11: Sept2016 sv nhgri_repeats

Simple insertion with duplication of flanking sequence

Simple insertion

Page 12: Sept2016 sv nhgri_repeats

Insertion of an additional copy of a tandem repeat

Tandem insertion

Page 13: Sept2016 sv nhgri_repeats

Insertion of an additional copy of a tandem repeat

Tandem insertion

Page 14: Sept2016 sv nhgri_repeats

Inversions

Inversion

Page 15: Sept2016 sv nhgri_repeats

Inversions

Inversion
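In a dot plot an inversion appears as a segment whose diagonal runs in the opposite direction from its neighbors. A minimal Python sketch of that idea follows, assuming alignment coordinates in the style of MUMmer's show-coords output, where a reverse-strand match is reported with a query start greater than its query end. The `Coords` record and helper names are hypothetical, and real calls would also need checks on contig identity and breakpoint adjacency.

```python
from typing import List, NamedTuple


class Coords(NamedTuple):
    """Hypothetical record holding one alignment's reference and query coordinates."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int


def is_reverse(aln: Coords) -> bool:
    """Reverse-strand matches have descending query coordinates (assumed convention)."""
    return aln.qry_start > aln.qry_end


def inverted_segments(alignments: List[Coords]) -> List[Coords]:
    """Return reverse-strand alignments flanked on both sides by forward-strand ones."""
    ordered = sorted(alignments, key=lambda a: a.ref_start)
    hits = []
    for left, mid, right in zip(ordered, ordered[1:], ordered[2:]):
        if is_reverse(mid) and not is_reverse(left) and not is_reverse(right):
            hits.append(mid)
    return hits
```

Given three alignments where only the middle one has descending query coordinates, the function returns that middle segment as the candidate inversion.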

Page 16: Sept2016 sv nhgri_repeats

Deletion of one copy of a tandem inverted repeat

Tandem inverted repeat deletion

Page 17: Sept2016 sv nhgri_repeats

Deletion of one copy of a tandem inverted repeat

Tandem inverted repeat deletion

Page 18: Sept2016 sv nhgri_repeats

• Thank you!

• Jim Mullikin
• Adam Phillippy
• Sergey Koren
• Brian Walenz
• Ali Bashir