2nd (Next) Generation Sequencing
2/2/2018
Why do we want to sequence a genome?
- To see the sequence (assembly)
- To validate an experiment (insert or knockout)
- To compare to another genome and find variations (cancer, populations)
The problem: We cannot sequence the genome from start to end. We need to shear the DNA into smaller fragments and sequence those smaller pieces.
Sanger sequencing is slow and not high throughput: 13 years for a human genome.
Next Generation Sequencing (NGS) can produce millions to billions of short reads in just a few days, at lower cost.
You can sequence your genome at 30X depth for 1000-1500 USD.
Example platforms: Roche 454, Ion Torrent, Illumina HiSeq 2500
Surya Saha, Boyce Thompson Institute, Ithaca, NY (BTI plant bioinformatics course)
Alex Sanchez, Statistics and Bioinformatics Research Group, Statistics Department, Universitat de Barcelona
1. Reads come from molecule fragments
2. Read length is the same for an entire dataset (e.g. 101 bases long)
3. Either single or paired-end reads
4. Mate reads
5. Physical coverage and depth
6. Number of reads
7. Duplicates (PCR or sequence)
8. Dark matter (PCR cannot find repeats)
[Figure: a fragment with its two paired reads, pair 1 and pair 2]
Lex Nederbragt, SeRC Nordic Assembly Workshop in Stockholm, Sweden, May 14th 2014
[Figure: genome view of chr22:11M-12M with RepeatMasker and Gap tracks]
➔ Illumina sequencers can only sequence DNA fragments up to ~300 nt long
➔ DNA must be size-selected, usually by gel cut
➔ A ~200-300 nt band is cut, purified, and prepared for sequencing
➔ Fragment length follows a normal distribution around the target cut size
➔ Each sequencing run generates a certain # of total reads
➔ # of reads per sample ~= # total reads/number of samples
➔ # of reads for one sample: library size
➔ Can choose target library size for your instrument based on:
◆ Desired depth
◆ Desired coverage
For more see https://genohub.com/recommended-sequencing-coverage-by-application/
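The relationship between target depth and library size can be sketched with a little arithmetic: the average depth is roughly (number of reads × read length) / genome size. The sketch below assumes this simple relation; the function names and the example values (a 3.1 Gbp genome, 150 bp reads, a 400-million-read run) are illustrative assumptions, not figures from the slides.

```python
import math


def reads_needed(genome_size_bp, target_depth, read_length_bp):
    """Number of reads so that each base is sequenced `target_depth` times on average."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


def reads_per_sample(total_reads_per_run, n_samples):
    """Approximate library size when a run is split evenly across samples."""
    return total_reads_per_run // n_samples


# Assumed example values: 3.1 Gbp genome, 30X depth, 150 bp reads.
print(reads_needed(3_100_000_000, 30, 150))   # 620000000

# Assumed example values: a run of 400 million reads shared by 8 samples.
print(reads_per_sample(400_000_000, 8))       # 50000000
```

For paired-end data the same number of reads corresponds to half as many fragments (read pairs).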
Single End
Paired End
● Question: “Given a read and a reference sequence, where, if anywhere, in the reference does the read sequence occur?”
● E.g. chr3:2,358,092-2,358,193
● More on this next lecture
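To make the mapping question concrete, here is a minimal sketch that reports every position where a read occurs exactly in a reference. It is a naive exact-match search on a made-up toy reference, not how production aligners work (they use indexes and tolerate mismatches); the function name is hypothetical.

```python
def find_exact_matches(read, reference):
    """Return every 0-based start position where `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping occurrences are also found.
        start = reference.find(read, start + 1)
    return positions


reference = "ACGTACGTTAGCACGT"  # toy reference, made up for illustration
print(find_exact_matches("ACGT", reference))  # [0, 4, 12]
print(find_exact_matches("GGGG", reference))  # [] (the read does not occur)
```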
[Figure: mapped (aligned) reads over a genome locus]
Coverage: fraction of the genomic locus covered by at least one read
Depth: number of sequenced bases that map to a given location
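The two definitions can be illustrated with a small sketch that computes per-base depth and the covered fraction from aligned read intervals. The interval representation (0-based, half-open start/end pairs) and the toy alignments are assumptions made for this example.

```python
def per_base_depth(alignments, locus_length):
    """Depth at each position, given (start, end) 0-based half-open read intervals."""
    depth = [0] * locus_length
    for start, end in alignments:
        # Clip each read to the locus before counting.
        for pos in range(max(start, 0), min(end, locus_length)):
            depth[pos] += 1
    return depth


def coverage_fraction(depth):
    """Fraction of positions covered by at least one read."""
    return sum(1 for d in depth if d > 0) / len(depth)


alignments = [(0, 5), (3, 8), (4, 9)]  # three toy reads of length 5
depth = per_base_depth(alignments, locus_length=12)
print(depth)                      # [1, 1, 1, 2, 3, 2, 2, 2, 1, 0, 0, 0]
print(coverage_fraction(depth))   # 0.75
print(sum(depth) / len(depth))    # 1.25 (mean depth)
```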
➔ Illumina is now the most common sequencer.
➔ Its errors are roughly uniformly distributed (~0.1%) and are mostly substitutions (indels are rare).
➔ Older Illumina machines had a drop in quality towards the end of the read.
➔ Fragment (insert) size follows a truncated normal distribution
➔ Sequencing depth is defined by the number of fragments covering a base pair of the DNA, not the number of reads. Use "read depth" to refer to the latter.
➔ Physical coverage is the amount of the genome expected to be covered. However, "coverage" is usually used to mean depth!
➔ Coverage is modeled by a Poisson (or, when overdispersed, Negative Binomial) distribution with lambda = physical depth.
➔ Read length is a fixed number for Illumina reads. Error is usually higher toward the ends → trimming
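Under the Poisson model, the probability that a position has depth k is e^(-lambda) · lambda^k / k!, so the expected fraction of the genome with at least one read is 1 - e^(-lambda). The sketch below works this out for a few depths; it assumes the idealized Poisson case (uniform, independent sampling), and the function names are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a position has depth exactly k when the mean depth is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_fraction_covered(lam):
    """Expected fraction of positions with depth >= 1, i.e. 1 - P(depth = 0)."""
    return 1.0 - poisson_pmf(0, lam)


for lam in (1, 5, 30):
    covered = expected_fraction_covered(lam)
    print(f"mean depth {lam}X: expected fraction covered = {covered:.6f}")
```

Even at a mean depth of 1X, about 37% of positions are expected to receive no read at all, which is one reason much higher depths are targeted in practice.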
[Figure: good coverage vs. bad coverage]
- The machines output files containing short reads in FASTQ format.
- For each read there are 4 lines:
    @read_header comment
    Read_sequence
    + [read_header]
    Quality_string (in ASCII)
- Scores estimate the probability that a base is called incorrectly.
- Q30 means 99.9% accuracy.
- Reads are short, so we need a “reference sequence” to resolve where they come from (resequencing).
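The link between a quality score and accuracy is the Phred relation Q = -10 · log10(P), where P is the probability that the base call is wrong. The sketch below converts scores to error probabilities and decodes an ASCII quality string; the offset of 33 is an assumption (the Sanger / Illumina 1.8+ encoding), since the slides do not state which encoding is used.

```python
def phred_to_error_prob(q):
    """Probability that a base with Phred score q is called incorrectly."""
    return 10 ** (-q / 10)


def decode_quality(quality_string, offset=33):
    """Convert an ASCII quality string to Phred scores (offset 33 is assumed)."""
    return [ord(ch) - offset for ch in quality_string]


print(phred_to_error_prob(30))   # 0.001, i.e. 99.9% accuracy
print(decode_quality("#<BF"))    # [2, 27, 33, 37]
```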
Example FASTQ records, annotated field by field (the end of the first quality string was lost in extraction and is shown as …):

    @SRR1997412.1 1 length=125
    NTTGTAGCTGAGGAAACTGAGGCTCAGGAGGACAAGTGGCCTGCCAAAGGTACCAGCACTCAGATGGAATGGTTTTGAACTCAGTCCATTTGAACTCAGTTTGAACCTGTCTCTTATACACATCT
    +SRR1997412.1 1 length=125
    #<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<…
    @SRR1997412.2 2 length=125
    NTATTTAGTCATGTAAGACTCCTTAACCAGCTAACTTAAGAAAGACTTCTAGGACAGAATAGGTTACACTAGTTATAATTTTATCTTTCTTCTACTCACTTGCTTCTCAATTGAAAGAGCGGAAA
    +SRR1997412.2 2 length=125

- "@": start new read
- "SRR1997412.1": unique read header
- "1 length=125": comments separated by space, could be anything
- Line 2: sequence of the read
- "+": start quality line
- "SRR1997412.1 1 length=125" after the "+": repeat of read header and comment, not required
- Line 4: quality sequence of the read, in ASCII
- "@SRR1997412.2 ...": next read
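The four-line record structure annotated above can be read with a few lines of code. This is a minimal sketch that assumes strictly four lines per record (no wrapped sequences); the function name and the toy records are made up for illustration, and real pipelines typically rely on an established parser.

```python
import io


def read_fastq(handle):
    """Yield (header, comment, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header_line = handle.readline()
        if header_line == "":          # end of file
            return
        sequence = handle.readline().rstrip("\n")
        plus_line = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header_line.startswith("@") or not plus_line.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Header = text up to the first space; the rest of the line is the comment.
        header, _, comment = header_line[1:].rstrip("\n").partition(" ")
        yield header, comment, sequence, quality


# Toy records (made up); the second one repeats the header after '+', which is optional.
toy = io.StringIO(
    "@read1 1 length=8\nNTTGTAGC\n+\n#<<BBFFF\n"
    "@read2 2 length=8\nNTATTTAG\n+read2 2 length=8\nFFFFFFFF\n"
)
records = list(read_fastq(toy))
for header, comment, sequence, quality in records:
    print(header, comment, sequence, quality)
```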
Quail, Michael A., et al. "A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers." BMC Genomics 13.1 (2012): 341.
➔ NCBI (https://www.ncbi.nlm.nih.gov/sra)
➔ Illumina basespace (https://basespace.illumina.com/home/index)
➔ Google genomics cloud (https://console.cloud.google.com/genomics/)
➔ Genome In A Bottle (GIAB) (http://jimb.stanford.edu/giab/)
➔ REPOSITIVE (https://discover.repositive.io/datasets/)
➔ GDC (https://portal.gdc.cancer.gov/)
➔ Seven Bridges (https://igor.sbgenomics.com/)
ART: WGS simulator
WGSIM: WGS simulator
PBSIM: PacBio simulator
See more on OMIC tools (https://omictools.com/read-simulators-category)
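To show what a read simulator does in principle, here is a toy sketch that draws fragments from a reference, takes a read from each end, and adds substitution errors. It is not ART, WGSIM, or PBSIM and does not mirror their options; every name and default below (read length 50, fragment mean 200 and SD 20, error rate 0.001) is an assumption chosen for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def add_substitutions(seq, error_rate, rng):
    """Replace each base with a different base with probability `error_rate`."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def simulate_pairs(reference, n_pairs, read_length=50, mean_fragment=200,
                   sd_fragment=20, error_rate=0.001, seed=0):
    """Return (start, fragment_length, read1, read2) tuples of toy paired-end reads."""
    if read_length > len(reference):
        raise ValueError("reference is shorter than the read length")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Rejection sampling gives a truncated normal fragment-size distribution.
        while True:
            frag_len = round(rng.gauss(mean_fragment, sd_fragment))
            if read_length <= frag_len <= len(reference):
                break
        start = rng.randrange(len(reference) - frag_len + 1)
        fragment = reference[start:start + frag_len]
        read1 = add_substitutions(fragment[:read_length], error_rate, rng)
        # Read 2 comes from the opposite strand at the other end of the fragment.
        read2 = add_substitutions(
            reverse_complement(fragment[-read_length:]), error_rate, rng)
        pairs.append((start, frag_len, read1, read2))
    return pairs


ref_rng = random.Random(1)
reference = "".join(ref_rng.choice("ACGT") for _ in range(2000))  # random toy genome
for start, frag_len, read1, read2 in simulate_pairs(reference, n_pairs=3):
    print(start, frag_len, read1[:20], read2[:20])
```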