![Page 1: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/1.jpg)
By Alfonso Farrugio, Hieu Nguyen, and Antony Vydrin
Sequencing Technologies and Human Genetic Variation
![Page 2: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/2.jpg)
Overview
Introduction
Simulating genomic variation and sequencing
Analyzing and comparing different sequencing technologies
Algorithms for detecting human genetic variation
![Page 3: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/3.jpg)
Introduction
Different people have different mutations in their genomes
A recent study (Nature 453, 56-64, 5/1/2008) compared 8 human genomes and found 1,695 structural variants
![Page 4: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/4.jpg)
Whole-genome shotgun sequencing allows for fast and relatively cheap sequencing of human genomes
New technologies are being developed to allow for accurate detection of human genomic variation
Most of these technologies use short paired reads.
How long should the reads be to optimize the detection of human genomic variation?
What algorithms can be used to detect variations in a new individual’s genome?
![Page 5: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/5.jpg)
Simulating Genomic Variation
Program to take a human genome and add randomly-distributed inversions, insertions, deletions, and SNPs
The number of mutations (and their mean lengths) can be controlled by the user
To simplify, no two mutations can overlap each other (the SNPs are an exception)
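The non-overlap rule above can be sketched with simple rejection sampling. This is a minimal illustration, not the authors' program: the function names `draw_length` and `place_intervals` are hypothetical, and the exponential length distribution is an assumption, since the slides only say that the mean length is user-controlled.

```python
import random


def draw_length(mean_length, rng):
    """Draw a mutation length with the given mean (exponential is an assumption)."""
    return max(1, round(rng.expovariate(1.0 / mean_length)))


def place_intervals(genome_len, lengths, rng, max_tries=10_000):
    """Pick a start for each length so that no two intervals overlap.

    Intervals are half-open (start, end) pairs in genome coordinates.
    """
    placed = []
    for length in lengths:
        if length > genome_len:
            raise ValueError("interval longer than genome")
        for _ in range(max_tries):
            start = rng.randrange(0, genome_len - length + 1)
            end = start + length
            # Accept only if the candidate is disjoint from every placed interval.
            if all(end <= s or start >= e for s, e in placed):
                placed.append((start, end))
                break
        else:
            raise RuntimeError("could not place interval without overlap")
    return sorted(placed)
```

Rejection sampling is adequate when the mutations cover only a small fraction of the genome; a denser layout would need a different placement strategy.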
![Page 6: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/6.jpg)
Figure: original genome → inversions, insertions, deletions → “intermediate” mutated genome
![Page 7: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/7.jpg)
Figure: “intermediate” mutated genome → subtract deletions → “intermediate” mutated genome
![Page 8: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/8.jpg)
Figure: “intermediate” mutated genome → SNPs → output mutated genome
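The staged process on these slides (structural mutations first, SNPs last) might look like the following sketch. The event tuple layout, the function names, and the choice to model an inversion as a reverse complement are all assumptions made for illustration; the slides do not specify them.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement a DNA string (non-ACGT characters are left as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_structural(genome, events):
    """Apply non-overlapping events given in original-genome coordinates.

    Each event is (kind, start, end, payload); insertions have start == end
    and carry the inserted sequence as payload.
    """
    out = []
    cursor = 0
    for kind, start, end, payload in sorted(events, key=lambda e: e[1]):
        out.append(genome[cursor:start])  # unchanged stretch before the event
        if kind == "inversion":
            out.append(reverse_complement(genome[start:end]))
        elif kind == "insertion":
            out.append(payload)
        elif kind == "deletion":
            pass  # deleted bases are simply skipped
        else:
            raise ValueError(f"unknown event kind: {kind}")
        cursor = end
    out.append(genome[cursor:])
    return "".join(out)


def apply_snps(genome, n_snps, rng):
    """Substitute n_snps distinct positions with a different base."""
    bases = list(genome)
    for pos in rng.sample(range(len(bases)), n_snps):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


events = [
    ("inversion", 0, 4, None),
    ("insertion", 8, 8, "TT"),
    ("deletion", 12, 16, None),
]
intermediate = apply_structural("AAAACCCCGGGGTTTT", events)
mutated = apply_snps(intermediate, 3, random.Random(1))
```

Applying SNPs last means they may land inside an inverted or inserted region, which matches the slides' note that SNPs are exempt from the non-overlap rule.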
![Page 9: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/9.jpg)
Simulating Genomic Sequencing
Program to take a human genome and create paired reads (output read pairs to a file)
The read lengths are all identical, and the separation between the reads in a pair is drawn from a normal distribution
The program can simulate sequencing errors when creating the paired reads
![Page 10: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/10.jpg)
Simulating Genomic Sequencing
The user can control the total number of reads, read lengths, the mean of the read separations, and sequencing error rate
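A minimal sketch of such a read-pair simulator follows. The function name `simulate_read_pairs`, the `sd_sep` parameter (the slides list only the mean as user-controlled), the output layout, and the convention of storing a "-" read as its reverse complement are all assumptions for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def simulate_read_pairs(genome, n_pairs, read_len, mean_sep, sd_sep, rng):
    """Create n_pairs read pairs with fixed read length and normal separation."""
    if mean_sep < 0 or 2 * read_len + mean_sep > len(genome):
        raise ValueError("genome too short for these parameters")
    pairs = []
    while len(pairs) < n_pairs:
        # Uniformly distributed start of the first read.
        start = rng.randrange(len(genome) - 2 * read_len + 1)
        # Separation d between the two reads, normally distributed.
        sep = round(rng.gauss(mean_sep, sd_sep))
        second = start + read_len + sep
        if sep < 0 or second + read_len > len(genome):
            continue  # pair would not fit; draw again
        reads = []
        for pos in (start, second):
            seq = genome[pos:pos + read_len]
            strand = rng.choice("+-")  # random direction for each read
            reads.append((seq if strand == "+" else reverse_complement(seq), strand))
        pairs.append({"start": start, "separation": sep, "reads": reads})
    return pairs
```

Redrawing pairs that run off the end of the genome slightly under-samples the last positions; for a genome much longer than the mean separation this effect is small.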
![Page 11: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/11.jpg)
Genome to be sequenced
Choose uniformly distributed random locations
![Page 12: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/12.jpg)
Genome to be sequenced
Create read pair at each location. Choose random direction for each read
Figure: a read pair with two reads of length L separated by distance d1, with the read direction marked
L is a constant while d is random (normally distributed)
![Page 13: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/13.jpg)
Figure: further read pairs (read length L, read direction marked) with separations d2 and d3
![Page 14: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/14.jpg)
Figure: resulting paired reads (read length L; separations d1, d2, d3)
![Page 15: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/15.jpg)
Figure: paired reads with simulated sequencing errors (read length L; separations d1, d2, d3)
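One simple way to simulate sequencing errors is to substitute each base independently with a fixed probability. The sketch below assumes exactly that error model (substitutions only, uniform over the three other bases); the slides do not say which model the authors used, and the function name is hypothetical.

```python
import random


def add_sequencing_errors(read, error_rate, rng):
    """Return a copy of read where each base is miscalled with probability error_rate."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            # Miscall: replace the base with one of the other three.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


noisy = add_sequencing_errors("ACGTACGTAC", 0.1, random.Random(3))
```

With this model the expected number of errors per read is the read length times the error rate, so longer reads carry proportionally more errors.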
![Page 16: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/16.jpg)
program runtime ~window size
1
![Page 17: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/17.jpg)
![Page 18: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/18.jpg)
![Page 19: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/19.jpg)
Figure legend: 50 insertions, 100 insertions, 500 insertions
![Page 20: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/20.jpg)
![Page 21: Sequencing Technologies and Human Genetic Variation](https://reader036.vdocuments.us/reader036/viewer/2022062323/56816150550346895dd0d7fa/html5/thumbnails/21.jpg)