![Page 1: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/1.jpg)
Reconstruction of Haplotype Spectra from NGS Data
Ion Mandoiu
UTC Associate Professor in Engineering Innovation
Department of Computer Science & Engineering
University of Connecticut
![Page 2: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/2.jpg)
Haplotype Spectra Reconstruction
• Given NGS reads, reconstruct:
  – Full length sequences
  – Sequence frequencies
• Example applications:
  – Single individual haplotyping
  – Allele specific transcriptome reconstruction
  – Viral quasispecies reconstruction
![Page 3: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/3.jpg)
Single Individual Haplotyping
• Somatic cells are diploid, containing two nearly identical copies of each autosomal chromosome
  – Heterozygous loci found by mapping reads to reference genome
  – Long haplotype fragments can be generated by sequencing fosmid pools [Duitama et al. 2012]
![Page 4: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/4.jpg)
RefHap Algorithm [Duitama et al. 12]
• Reduce the problem to Max-Cut
• Solve Max-Cut
• Build haplotypes according to the cut
Locus  1 2 3 4 5
f1     * 0 1 1 0
f2     1 1 0 * 1
f3     1 * * 0 *
f4     * 0 0 * 1

[Figure: graph on fragments f1, f2, f3, f4 with edge weights 3, 1, 1, -1, -1]

h1  00110
h2  11001
Chr. 22, 32k SNPs, 14k fragments
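
The three steps can be tried on the four-fragment example from this slide. The following is a minimal sketch, not the RefHap implementation: it assumes that `*` marks a locus a fragment does not cover, that an edge weight is the number of mismatches minus the number of matches over shared loci (which reproduces the weights 3, 1, 1, -1, -1 in the figure), and it solves Max-Cut by exhaustive search, which is only feasible for a handful of fragments. The function names are illustrative.

```python
from itertools import combinations

# Fragment matrix from the slide; '*' is assumed to mean "locus not covered".
fragments = {"f1": "*0110", "f2": "110*1", "f3": "1**0*", "f4": "*00*1"}

def edge_weight(a, b):
    """Mismatches minus matches over loci covered by both fragments (assumed scoring)."""
    w = 0
    for x, y in zip(a, b):
        if x != "*" and y != "*":
            w += 1 if x != y else -1
    return w

names = sorted(fragments)
weights = {(u, v): edge_weight(fragments[u], fragments[v])
           for u, v in combinations(names, 2)}

def cut_value(side):
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in weights.items() if (u in side) != (v in side))

def best_cut():
    """Exhaustive Max-Cut; the first fragment is pinned to one side to skip mirror cuts."""
    first, rest = names[0], names[1:]
    best = None
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            side = frozenset((first,) + extra)
            val = cut_value(side)
            if best is None or val > best[1]:
                best = (side, val)
    return best

def build_haplotypes(side):
    """Majority vote per locus; fragments outside `side` vote with flipped alleles."""
    n_loci = len(next(iter(fragments.values())))
    h1 = []
    for i in range(n_loci):
        votes = 0  # positive favours allele 1 on h1, negative favours allele 0
        for name, row in fragments.items():
            if row[i] == "*":
                continue
            allele = int(row[i])
            if name not in side:
                allele = 1 - allele
            votes += 1 if allele == 1 else -1
        h1.append("-" if votes == 0 else ("1" if votes > 0 else "0"))
    h1 = "".join(h1)
    h2 = "".join({"0": "1", "1": "0"}.get(c, "-") for c in h1)
    return h1, h2

side, value = best_cut()
print(sorted(side), value, build_haplotypes(side))
```

On this input the best cut separates f1 from the other three fragments, and the vote recovers the haplotypes h1 and h2 shown on the slide. For instances of the size quoted here (thousands of fragments), exhaustive search is not an option and a heuristic Max-Cut solver would be needed.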
![Page 5: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/5.jpg)
Haplotype Spectra Reconstruction
• Given short sequence fragments, reconstruct:
  – Full length sequences
  – Sequence frequencies
• Example applications:
  – Single individual haplotyping
  – Allele specific transcriptome reconstruction
  – Viral quasispecies reconstruction
![Page 6: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/6.jpg)
Transcriptome Reconstruction Challenge: Alternative Splicing
[Griffith and Marra 07]
![Page 7: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/7.jpg)
[Figure: gene with exons 1 2 3 4 5 6 7 and four alternatively spliced transcripts]
t1 : 1 2 3 4 5 6 7
t2 : 1 3 4 5 6 7
t3 : 1 2 3 4 5 7
t4 : 1 3 4 5 7
![Page 8: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/8.jpg)
TRIP: Transcriptome Reconstruction using Integer Programming
• Map the RNA-Seq reads to genome
• Construct Splice Graph - G(V,E)
  – V: exons
  – E: splicing events
• Generate candidate transcripts
  – Depth-first-search (DFS)
• Filter candidate transcripts
  – Fragment length distribution (FLD)
  – Integer programming
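
The candidate-generation step can be illustrated with a small depth-first search. This is a sketch under stated assumptions, not TRIP's code: the splice graph below is hypothetical (seven exons, with exons 2 and 6 optional), and it assumes every transcript runs from a single first exon to a single last exon.

```python
# Hypothetical splice graph: vertices are exons, edges are splicing events.
splice_graph = {
    1: [2, 3],
    2: [3],
    3: [4],
    4: [5],
    5: [6, 7],
    6: [7],
    7: [],
}

def candidate_transcripts(graph, source, sink):
    """Enumerate all source-to-sink exon paths with an iterative depth-first search."""
    paths = []
    stack = [(source, [source])]
    while stack:
        exon, path = stack.pop()
        if exon == sink:
            paths.append(path)
            continue
        # Push successors in reverse so that paths come out in lexicographic order.
        for nxt in reversed(graph[exon]):
            stack.append((nxt, path + [nxt]))
    return paths

for transcript in candidate_transcripts(splice_graph, 1, 7):
    print(transcript)
```

The number of paths can grow exponentially with the number of optional exons, which is why a filtering step follows candidate generation.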
![Page 9: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/9.jpg)
How to filter?
• Select the smallest set of putative transcripts that yields a good statistical fit between
  – the fragment length distribution empirically determined during library preparation
  – the fragment lengths implied by “mapping” read pairs
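[Figure: candidate transcripts t1, t2, t3 built from exons 1, 2, 3 (segment lengths 200); a mapped read pair implies a fragment length of 500 or 300 depending on the transcript; fragment length distribution with mean 500, std. dev. 50]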
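
How a read pair "implies" a fragment length can be made concrete with a little arithmetic. The sketch below is illustrative only: the exon coordinates, the candidate names, and the read-pair endpoints are hypothetical, chosen so that the two implied lengths are the 500 and 300 that appear in the figure. Only the mean of 500 and standard deviation of 50 come from the slide.

```python
# Hypothetical half-open genomic coordinates for three exons of length 200.
exons = {1: (0, 200), 2: (1000, 1200), 3: (2000, 2200)}
# Hypothetical candidate transcripts, given as ordered exon lists.
candidates = {"with_exon_2": [1, 2, 3], "skipping_exon_2": [1, 3]}

FLD_MEAN, FLD_SD = 500.0, 50.0  # values shown on the slide

def transcript_offset(transcript, pos):
    """Translate a genomic position into an offset along the spliced transcript."""
    offset = 0
    for e in transcript:
        start, end = exons[e]
        if start <= pos <= end:
            return offset + (pos - start)
        offset += end - start
    raise ValueError("position not on transcript")

def implied_fragment_length(transcript, frag_start, frag_end):
    """Length of the fragment after introns (and skipped exons) are spliced out."""
    return transcript_offset(transcript, frag_end) - transcript_offset(transcript, frag_start)

def z_score(length):
    """Deviation from the library's fragment length distribution, in standard deviations."""
    return (length - FLD_MEAN) / FLD_SD

# A read pair whose fragment spans genomic positions 50 to 2150 (hypothetical).
for name, transcript in candidates.items():
    length = implied_fragment_length(transcript, 50, 2150)
    print(name, length, z_score(length))
```

Under these assumptions the transcript that includes exon 2 implies a length of 500 (z = 0), while the one that skips it implies 300 (z = -4), so this read pair is far better explained by the first candidate. A filter of the kind described above would aggregate such evidence over all read pairs.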
![Page 10: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/10.jpg)
Allele Specific Expression
![Page 12: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/12.jpg)
RNA Virus Replication
High mutation rate (~10^-4)
Lauring & Andino, PLoS Pathogens 2011
![Page 13: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/13.jpg)
Shotgun vs. Amplicon Reads
• Shotgun reads: starting positions distributed ~uniformly
• Amplicon reads: have predefined start/end positions covering fixed overlapping windows
![Page 14: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/14.jpg)
Reconstruction from Shotgun Reads: ViSpA
Shotgun reads
→ Read Error Correction
→ Read Alignment
→ Preprocessing of Aligned Reads
→ Read Graph Construction
→ Contig Assembly
→ Frequency Estimation
→ Quasispecies sequences w/ frequencies
![Page 15: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/15.jpg)
Reconstruction from Amplicon Reads: VirA
Reference in FASTA format + Error-corrected SAM/BAM read data
→ Estimate Amplicons
→ Amplicon Read Graph
→ Max-Bandwidth Paths
→ Frequency Estimation
→ Viral population variants with frequencies
![Page 16: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/16.jpg)
Amplicon Read Graph
• K amplicons represented by K-layer read graph
• Vertices ⇔ distinct reads
• Edges ⇔ reads with consistent overlap
• Vertices have count function c(v)
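
A K-layer read graph and a max-bandwidth path through it can be sketched as follows. The slides name max-bandwidth paths without defining them, so this sketch assumes the bandwidth of a path is the smallest count c(v) among its vertices. The reads, start positions, and counts are made up for illustration, and the function names are not taken from VirA.

```python
# Hypothetical input: one list per amplicon; each read is (start, sequence, count).
layers = [
    [(0, "ACGTAC", 60), (0, "ACGTTC", 40)],
    [(4, "ACGGA", 55), (4, "TCGGA", 30), (4, "TCGCA", 15)],
    [(7, "GATT", 70), (7, "CATT", 30)],
]

def consistent(u, v):
    """True if reads u and v overlap and agree on every position they both cover."""
    (su, seq_u, _), (sv, seq_v, _) = u, v
    lo, hi = max(su, sv), min(su + len(seq_u), sv + len(seq_v))
    return hi > lo and seq_u[lo - su:hi - su] == seq_v[lo - sv:hi - sv]

def max_bandwidth_path(layers):
    """Layer-by-layer DP keeping, per read, the widest path that ends at it."""
    best = [(read[2], [read]) for read in layers[0]]
    for layer in layers[1:]:
        nxt = []
        for v in layer:
            preds = [(bw, path) for bw, path in best
                     if path and consistent(path[-1], v)]
            if preds:
                bw, path = max(preds, key=lambda t: t[0])
                nxt.append((min(bw, v[2]), path + [v]))
            else:
                nxt.append((0, []))  # no consistent predecessor: unreachable
        best = nxt
    return max(best, key=lambda t: t[0])

def assemble(path):
    """Merge the reads on a path into one sequence, skipping overlapped prefixes."""
    start = path[0][0]
    seq = ""
    for s, read, _ in path:
        seq += read[start + len(seq) - s:]
    return seq

bandwidth, path = max_bandwidth_path(layers)
print(bandwidth, assemble(path))
```

Because edges only connect consecutive layers, the graph is acyclic and a single pass over the layers suffices.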
![Page 17: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/17.jpg)
Read Graph Transformation
• Heuristic to reduce edges in dense graphs
• Replace bipartite cliques with star subgraphs
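
The edge saving is easy to see in code: a bipartite clique between vertex sets A and B has |A|·|B| edges, while routing it through one new hub vertex needs only |A|+|B|. The sketch below is one possible heuristic, not necessarily the one used in the talk: it only detects bicliques formed by source vertices that share an identical successor set, and the hub naming is hypothetical.

```python
from collections import defaultdict

def bicliques_to_stars(edges):
    """Replace each detected biclique A x B with a star through a new hub vertex."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    # Group sources by successor set; each group with its targets is a biclique.
    groups = defaultdict(list)
    for u, targets in succ.items():
        groups[frozenset(targets)].append(u)

    new_edges = []
    hub_id = 0
    for targets, sources in groups.items():
        if len(sources) * len(targets) > len(sources) + len(targets):
            hub = ("hub", hub_id)
            hub_id += 1
            new_edges += [(u, hub) for u in sources]
            new_edges += [(hub, v) for v in sorted(targets)]
        else:
            # Too small to benefit from a hub: keep the original edges.
            new_edges += [(u, v) for u in sources for v in sorted(targets)]
    return new_edges

# Three reads in one layer, each consistent with the same three reads in the next.
edges = [(a, b) for a in ("a1", "a2", "a3") for b in ("b1", "b2", "b3")]
edges.append(("a4", "b4"))
print(len(edges), len(bicliques_to_stars(edges)))
```

Every original edge (u, v) survives either as itself or as the two-step route u → hub → v, so the set of paths through the layers is unchanged while the edge count drops.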
![Page 18: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/18.jpg)
Challenges
• Scalability
  – Exploit inherent sparsity of biological instances
  – E.g., exact scaffolding algorithm using non-serial dynamic programming based on SPQR trees
• Flexibility
  – Long (noisy) reads + short reads
  – Heterogeneous data, e.g., RNA-Seq + TSSeq + PolyA-Seq
• Quantifying reconstruction uncertainty
  – Compute intensive, e.g., bootstrapping
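
As a rough illustration of why bootstrapping is compute intensive, the sketch below resamples reads with replacement and re-estimates sequence frequencies on every replicate. It is a toy under stated assumptions: the estimator is a simple stand-in (fraction of reads assigned to each sequence), whereas a real analysis would re-run the whole reconstruction pipeline per replicate. The names and default parameters are illustrative.

```python
import random
from collections import Counter

def estimate_frequencies(assignments):
    """Stand-in estimator: fraction of reads assigned to each reconstructed sequence."""
    counts = Counter(assignments)
    total = len(assignments)
    return {h: c / total for h, c in counts.items()}

def bootstrap_intervals(assignments, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for each sequence frequency."""
    rng = random.Random(seed)
    n = len(assignments)
    samples = {h: [] for h in set(assignments)}
    for _ in range(n_boot):
        # Resample the reads with replacement and re-estimate.
        resample = [assignments[rng.randrange(n)] for _ in range(n)]
        freqs = estimate_frequencies(resample)
        for h in samples:
            samples[h].append(freqs.get(h, 0.0))
    intervals = {}
    for h, values in samples.items():
        values.sort()
        lo = values[int((alpha / 2) * n_boot)]
        hi = values[int((1 - alpha / 2) * n_boot) - 1]
        intervals[h] = (lo, hi)
    return intervals

# Hypothetical read-to-sequence assignments: 70 reads to h1, 30 to h2.
reads = ["h1"] * 70 + ["h2"] * 30
print(bootstrap_intervals(reads, n_boot=200))
```

The cost scales with the number of replicates times the cost of one reconstruction, which is what makes this approach expensive for full pipelines.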
![Page 19: Reconstruction of Haplotype Spectra from NGS Data](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816384550346895dd46ad7/html5/thumbnails/19.jpg)
Acknowledgements
Jorge Duitama
Sahar Al Seesi
Mazhar Kahn
Rachel O’Neill

Alexander Artyomenko
Adrian Caciula
Nicholas Mancuso
Serghei Mangul
Bassam Tork
Alex Zelikovsky
Irina Astrovskaya
Pavel Skums