nextgeneraonsequencing - cbs · presentation1.pptx author: mette voldby larsen created date:...
TRANSCRIPT
Next Generation Sequencing
Mette Voldby Larsen, PhD, associate professor
Outline
• DNA sequencing – Exercise: Assembling NGS data
• From DNA to protein – Exercise: Predicting genes using Prodigal
• PathogenFinder – Exercise: Identifying immunologically relevant target proteins using PathogenFinder
Deoxyribose, phosphate, base (Adenine, Thymine, Guanine, Cytosine)
Double helix: Via hydrogen bonds, A pairs with T and C with G.
The two strands are antiparallel.
DNA: Deoxyribonucleic acid
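The pairing rules and the antiparallel orientation can be illustrated with a short Python sketch. The function name and the example sequence are illustrative only and are not taken from the slides.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5'->3'.

    The strand is complemented base by base and then reversed,
    because the two strands run in opposite directions.
    """
    return strand.upper().translate(COMPLEMENT)[::-1]

print(reverse_complement("GACCC"))  # GGGTC
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.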
Organization of DNA
Ribosome
History of sequencing
1977 Fred Sanger develops sequencing by enzymatic synthesis using chain-terminating inhibitors. First generation sequencing.
1982 GenBank is established as a collection of all publicly available DNA sequences.
1990 The Human Genome Project is launched, planned to take 15 years.
1995 The first genome of a free-living organism, the bacterium Haemophilus influenzae (1.8 Mb) (PMID: 7542800).
1996 The first genome of a eukaryote, Saccharomyces cerevisiae (12.1 Mb) (PMID: 8849441).
1996 Pyrosequencing is developed. Next (second) generation sequencing.
1998 The first genome of an animal, the nematode Caenorhabditis elegans (97 Mb) (PMID: 9851916).
2001 The first drafts of the human genome (3 Gb) (PMID: 11237011 and PMID: 11181995).
2005 Launch of the GS20 sequencer (454/Roche) using pyrosequencing.
2006 Launch of the Genome Analyzer (Solexa/Illumina) using cyclic reversible terminator sequencing. Next (second) generation sequencing.
2010 Ion Torrent benchtop sequencing machines. Next (second) generation sequencing.
2011 Pacific Biosciences RS machine capable of single-molecule sequencing. Third generation sequencing.
Sanger sequencing
No -OH at the 3’ position – additional nucleotides cannot be added.
Template strand
Primer
DNA with unknown sequence is mixed with primers, DNA polymerase, dNTPs and fluorescent ddNTPs.
Each of the four ddNTPs is bound to a different fluorescent dye.
Synthesis stops when a fluorescent ddNTP is added. Fragments of different lengths are made, each ending with a ddNTP.
The fragments fluoresce with different colors, identifying the ddNTP that terminated each fragment.
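The read-out logic of chain termination can be sketched in Python. This is a simplified model, assuming exactly one terminated fragment per position and perfect separation by length; the dye colors in `DYE` are a hypothetical assignment, and all names are illustrative.

```python
import random

# Hypothetical dye colors; real chemistries use their own assignments.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}

def sanger_fragments(new_strand: str) -> list[str]:
    """One fragment per position where a ddNTP could terminate synthesis."""
    return [new_strand[: i + 1] for i in range(len(new_strand))]

def read_sequence(fragments: list[str]) -> str:
    """Sort fragments by length and read the terminal (dye-labelled) base of each."""
    return "".join(f[-1] for f in sorted(fragments, key=len))

fragments = sanger_fragments("ATGCGT")
random.Random(0).shuffle(fragments)          # fragments are mixed in the tube
for f in sorted(fragments, key=len):
    print(len(f), f[-1], DYE[f[-1]])         # length, terminal base, dye color
print(read_sequence(fragments))              # ATGCGT
```

Sorting by length stands in for the size separation step: the shortest fragment reveals the first base, the next shortest the second base, and so on.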
Next generation sequencing
Pyrosequencing (Roche/454)
Cyclic reversible terminator sequencing
Comparison of sequencing methods
Example of data file from next gen sequencer: short (raw) reads in FASTQ format
@M10_0139:1:2:18915:1321#ATCACG/1
TATCAAGAAAGATTTTAACAGCATTGACTCTGTTATCGAGTTTCATTTTAAACATAGTTTCCAGTGGT
+M10_0139:1:2:18915:1321#ATCACG/1
_bbeeeccgfgecgiiiihfhchiiiiiiiiihhfhhh^dghhhhf_fffghhhhhhacgeeghgbb]
@M10_0139:1:2:18915:1321#ATCACG/2
AGTTCATAGTGACAAGGTAATATTTGTCAAATTATATCGACCTAAAACGGTAGGATATATAACAAAAT
+M10_0139:1:2:18915:1321#ATCACG/2
a__eceeeeggffhihe^bhfiifh_edeg_agbgd]dd`g`fgdhedffaedadhhchhfhiicfhX
@M10_0139:1:2:12256:1321#ATCACG/1
ACGGGTGAACTGTACGGCATCGAAGCCCTTGCGCGCTGGCACGATCCCCAGCATGGTCATGCCCCCTC
+M10_0139:1:2:12256:1321#ATCACG/1
___`c_c`egge[bfghdeghfhhhhhfiii_ffhhN`ghhfddbcddadcddbccb_bbbcbc^aac
@M10_0139:1:2:12256:1321#ATCACG/2
AATCCGGAAAAGCCCGTACCAAAATCATCTACCGATAAGCCCACGCCCATATCACGCAGGATGAATCG
+M10_0139:1:2:12256:1321#ATCACG/2
a_ZcccWHO_bgadgc_WbaceZefda^f`egd`HO[ega\G\b`F_dggeca_cad`Y]^b__bKYZ
.
.
.
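A FASTQ record has four lines: a header starting with `@`, the bases, a separator starting with `+`, and one quality character per base. The sketch below parses such records in Python. It assumes strictly four-line records and a Phred+64 quality offset, which the lowercase quality characters in the reads suggest but which is not stated on the slide; the toy record and all names are made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple

class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: list[int]

def parse_fastq(lines: Iterable[str], offset: int = 64) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ input (header, bases, '+', qualities).

    offset=64 is an assumption (older Illumina encoding); newer files use 33.
    """
    it = (line.strip() for line in lines)
    for header in it:
        if not header:
            continue                      # skip blank lines between records
        sequence = next(it)
        next(it)                          # '+' separator line, ignored
        quality = next(it)
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])

# Toy record, not taken from the data file.
toy = ["@read1/1", "ACGT", "+read1/1", "hgfB"]
for record in parse_fastq(toy):
    print(record.name, record.sequence, record.qualities)
```

With a file on disk, the same function can be fed an open file handle, since iterating over a file yields its lines.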
Reference assembly
[Figure: short reads aligned to a reference sequence; label: Contig]
Coverage/depth
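Coverage (depth) is the number of reads that cover a given position. A minimal Python sketch, assuming each aligned read is described by a 0-based start position and a length; the alignment values are hypothetical.

```python
def depth_per_position(reference_length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Count how many reads cover each position of the reference."""
    depth = [0] * reference_length
    for start, length in alignments:
        for pos in range(start, min(start + length, reference_length)):
            depth[pos] += 1
    return depth

# Hypothetical alignments as (start, read_length) on a 10 bp reference.
alignments = [(0, 5), (3, 5), (4, 4)]
depth = depth_per_position(10, alignments)
print(depth)                              # [1, 1, 1, 2, 3, 2, 2, 2, 0, 0]
print(sum(depth) / len(depth))            # mean coverage: 1.4
```

The mean coverage equals the total number of aligned bases divided by the reference length, here (5 + 5 + 4) / 10.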
De novo assembly
[Figure: short reads assembled de novo]
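In de novo assembly no reference is available, so reads are joined through their overlaps. The Python sketch below uses a simple greedy merge of the best-overlapping pair. It is a teaching simplification, not the method of any particular assembler (real tools typically build overlap or de Bruijn graphs and handle errors and repeats); the reads and function names are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break                          # no overlaps left: several contigs remain
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads

# Toy reads, not taken from the slides.
print(greedy_assemble(["ATGCGTAC", "GTACGTT", "CGTTAG"]))  # ['ATGCGTACGTTAG']
```

If some reads share no overlap of at least `min_len` bases, the function returns more than one sequence, corresponding to separate contigs.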
Exercise
Assembling NGS data