Page 1: Genomic Sequence Questions - UWI St. Augustine · sta.uwi.edu/fst/dms/icgeb/documents/SequencingTechnologies.pdf · Introduction to next-gen sequencing bioinformatics.ca Genomic Sequence

Introduction to next-gen sequencing bioinformatics.ca

Genomic Sequence Questions

• How are sequence maps of genomes produced?
• How is the information in the genome deciphered?
• What can comparative genomics reveal about genome structure and evolution?
• How does the availability of genomic sequence affect what we can do/ask?

The human nuclear genome viewed as a set of labeled DNA

Chapter 13 Opener

3,072 characters/page · 1,953 reams of paper to print out the DNA · DNA = 1.8 meters!

[Slide background: this example sequence, repeated to fill the page]
AGCTAGTACAGCAGCTAGGCCGCATCATTAATTCGTATATATATATTCTCTCTCTAGAGCATCACATGCTACTAGCTGATATTCCTTCCGCGCGGCCGGCGAATCATTTACGTAAAAAAATTTTTCGC
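The slide's paper figure can be checked with a few lines of arithmetic. This is a sketch, not the slide author's calculation: it assumes a haploid genome of about 3,000,000,000 bp and a standard ream of 500 sheets (neither is stated on the slide), and takes the 3,072 characters per page from the slide.

```python
import math

# Assumptions (not stated on the slide): a ~3,000,000,000 bp haploid genome
# and 500 sheets per ream. The 3,072 characters per page comes from the slide.
GENOME_BP = 3_000_000_000
CHARS_PER_PAGE = 3072
SHEETS_PER_REAM = 500


def reams_needed(genome_bp: int, chars_per_page: int, sheets_per_ream: int) -> float:
    """Reams needed to print a genome at one base per character, one side per sheet."""
    pages = math.ceil(genome_bp / chars_per_page)  # a partial last page still uses a sheet
    return pages / sheets_per_ream


if __name__ == "__main__":
    reams = reams_needed(GENOME_BP, CHARS_PER_PAGE, SHEETS_PER_REAM)
    print(f"about {reams:,.0f} reams")  # about 1,953 reams
```

Under these assumptions the genome fills 976,563 pages, which rounds to the 1,953 reams on the slide.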

“Normal” DNA synthesis without dideoxy terminators

Figure 7-15

The structure of 2’,3’-dideoxynucleotides

The dideoxy sequencing method

Figure 20-16a

The dideoxy sequencing method

Figure 20-16b

Lecture 3.0 8

Principles of DNA Sequencing

5’ Primer
3’ Template: G C A T G C

Four reactions, each containing all four dNTPs plus one dideoxy terminator:
• dATP dCTP dGTP dTTP + ddATP → GCddA
• dATP dCTP dGTP dTTP + ddCTP → GddC, GCATGddC
• dATP dCTP dGTP dTTP + ddTTP → GCAddT
• dATP dCTP dGTP dTTP + ddGTP → ddG, GCATddG
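The four reactions above can be modeled in a few lines. This is an idealized sketch: it assumes every possible termination product is present in its reaction (in practice the ddNTP:dNTP ratio controls how many are), and it takes the newly synthesized strand to be the GCATGC read off the slide's products. The function name `terminated_fragments` is hypothetical.

```python
def terminated_fragments(new_strand: str) -> dict[str, list[str]]:
    """Chain-terminated products of each ddNTP reaction (idealized).

    In the reaction that contains ddXTP, some chains stop at every position
    where base X is added, so there is one product ending at each X in the
    newly synthesized strand. Products are written as on the slide, e.g. "GCddA".
    """
    products: dict[str, list[str]] = {f"dd{b}TP": [] for b in "ACGT"}
    for i, base in enumerate(new_strand):
        # Bases already extended, followed by the terminating dideoxy base.
        products[f"dd{base}TP"].append(new_strand[:i] + "dd" + base)
    return products


if __name__ == "__main__":
    for reaction, frags in terminated_fragments("GCATGC").items():
        print(reaction, frags)
```

For GCATGC this reproduces the slide: ddATP gives GCddA, ddCTP gives GddC and GCATGddC, ddGTP gives ddG and GCATddG, and ddTTP gives GCAddT.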

Principles of DNA Sequencing

[Figure: gel with lanes G, C, T and A run between the – and + electrodes; reading the bands from short to long gives G C A T G C]
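Reading the gel amounts to merging the four lanes and ordering the bands by fragment length. A minimal sketch, assuming each band is represented by its fragment length in bases; `read_gel` and its input format are illustrative, not from the slides.

```python
def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read a sequence off a four-lane gel.

    `lanes` maps each base to the lengths of the fragments in that lane.
    Shorter fragments migrate further toward the + electrode, so reading
    the bands from shortest to longest gives the sequence in order.
    """
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    observed = [length for length, _ in bands]
    if len(set(observed)) != len(observed):
        raise ValueError("two lanes have a band at the same length")
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    # Band lengths for the slide's example, one band per position.
    print(read_gel({"G": [1, 5], "C": [2, 6], "T": [4], "A": [3]}))  # GCATGC
```

The duplicate-length check reflects that, in an ideal gel, each length can appear in only one lane; a clash signals a misread.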

Multiplexed Capillary Electrophoresis (CE) with Fluorescent Detection

ABI 3700: 96 capillaries × 700 bases

[Figure: separation of small fragments and large fragments]

Types of data generated

•  Mated read pairs (forward and reverse)
•  Insert size
•  Chromatograms
   – Sequence
   – Quality scores
•  Assembly (often multiple versions)
   – Depth (coverage)
   – Gaps (sequence and physical)
   – Scaffolds (100 N convention)
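Under the 100 N convention, a gap of unknown size between two contigs in a scaffold is written as a run of 100 Ns. A minimal sketch that recovers the contigs, assuming any run of at least 100 Ns marks such a gap; the threshold is a parameter, and `split_scaffold` is a hypothetical helper.

```python
import re


def split_scaffold(scaffold: str, min_gap: int = 100) -> list[str]:
    """Split a scaffold into contigs at runs of at least `min_gap` Ns.

    Shorter runs of N (e.g. single ambiguous bases) are left inside contigs.
    """
    gap = re.compile(f"[Nn]{{{min_gap},}}")  # e.g. "[Nn]{100,}"
    return [contig for contig in gap.split(scaffold) if contig]


if __name__ == "__main__":
    scaffold = "ACGTAC" + "N" * 100 + "GGNCC" + "N" * 100 + "TTAG"
    print(split_scaffold(scaffold))  # ['ACGTAC', 'GGNCC', 'TTAG']
```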

Shotgun Sequencing

Isolate chromosome → Shear DNA into fragments → Clone into sequencing vectors → Sequence

Shotgun Sequencing

Sequence chromatogram → Send to computer → Assembled sequence
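The shotgun idea, reads drawn from random positions along the DNA, can be simulated to produce test data for assembly. A minimal sketch under idealized assumptions (uniform random start positions, one strand, no sequencing errors or cloning bias); `shotgun_reads` and the example sequence are hypothetical.

```python
import random


def shotgun_reads(genome: str, n_reads: int, read_len: int, seed: int = 0) -> list[tuple[int, str]]:
    """Sample fixed-length reads from uniformly random start positions.

    Returns (start, read) pairs. Idealized: forward strand only, no
    sequencing errors and no cloning bias.
    """
    if not 0 < read_len <= len(genome):
        raise ValueError("read_len must be between 1 and len(genome)")
    rng = random.Random(seed)  # seeded so runs are reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((start, genome[start:start + read_len]))
    return reads


if __name__ == "__main__":
    genome = "ATGGCGTGCAATCCGTAGCTTAGGCA"
    for start, read in shotgun_reads(genome, n_reads=5, read_len=8):
        print(start, read)
```

Keeping the true start positions makes it easy to check an assembler's output against the known answer.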

Shotgun Sequencing

•  Very efficient process for small-scale (~10 kb) sequencing (preferred method)

•  First applied to whole genome sequencing in 1995 (H. influenzae)

•  Now standard for all prokaryotic genome sequencing projects

•  Successfully applied to D. melanogaster
•  Moderately successful for H. sapiens

Genome sequencing is now automated

Figure 13-3

There are two main approaches to genome sequencing

• Laboratory intensive
  – Physical maps
  – Chromosome isolation
  – “Walking”
  – Slow, but you always know where you are

• Computationally intensive
  – Fast to generate data: use any of many technologies to randomly generate sequence from a variety of sources
  – Slow to put all the pieces back together

Chromosome walking

Figure 20-13

A physical map puts clones in order

Figure 13-7a

Strategy for ordered-clone sequencing

Figure 13-8

The logic of creating a sequence map of the genome

Figure 13-2

End reads from multiple inserts may be overlapped to produce a contig

Figure 13-4
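How overlapping end reads merge into a contig can be illustrated with a greedy overlap assembler. This is a teaching sketch, not how Phrap or any production assembler works: it assumes error-free reads from one strand and will collapse or misjoin repeats longer than the overlaps. The helper names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedy overlap assembly: repeatedly merge the pair with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that lie entirely inside another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no overlaps left: each string is a contig
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTGCA", "TGCAATC", "AATCCGT"]))
```

With these four reads each consecutive pair overlaps by four bases, and the function returns the single contig ATGGCGTGCAATCCGT.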

To sequence a genome, plasmids with different, but known, insert sizes are required:

• Small-insert plasmid library: ~2 kb ± 100 bp
• Medium-insert plasmid library: ~10 kb ± 500 bp
• Large-insert library: ~50 kb ± 1 kb

Note: Regardless of the insert size in the library, all clones are sequenced using “mated” (paired-end) sequencing.

[Figure: example clones with 2 kb, 10 kb and 50 kb inserts]
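Because each library has a known insert size, a mated pair whose reads land on the same contig can be checked against its library. A minimal sketch, assuming the forward read maps to the plus strand, its mate maps downstream on the minus strand, and coordinates are 0-based with exclusive ends; the `Library` type and function names are hypothetical, while the sizes and tolerances are the slide's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    name: str
    mean_insert: int  # bp
    tolerance: int    # bp, the "+/-" on the slide


# Insert sizes and tolerances from the slide.
LIBRARIES = (
    Library("small", 2_000, 100),
    Library("medium", 10_000, 500),
    Library("large", 50_000, 1_000),
)


def implied_insert(fwd_start: int, rev_end: int) -> int:
    """Insert length spanned by a forward read starting at `fwd_start` and its
    reverse mate ending at `rev_end` (0-based, end-exclusive, same contig)."""
    return rev_end - fwd_start


def is_consistent(lib: Library, fwd_start: int, rev_end: int) -> bool:
    """True if the pair's implied insert lies within mean +/- tolerance."""
    return abs(implied_insert(fwd_start, rev_end) - lib.mean_insert) <= lib.tolerance


if __name__ == "__main__":
    print(is_consistent(LIBRARIES[0], 100, 2150))  # True: 2,050 bp is within 2 kb +/- 100 bp
```

Pairs that fail this check often point to misassemblies, which is one reason several libraries with different insert sizes are used.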

Strategy for whole-genome shotgun sequencing assembly

What do you do if you encounter a “gap”, a region that is missing from the contigs? (Note: contig = contiguous sequence.)

Paired-end reads may be used to join two sequence contigs
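When a forward read lies near the end of one contig and its mate lies near the start of the next, the library insert size gives both the order of the two contigs and an estimate of the gap between them. A minimal sketch, assuming contig A precedes contig B in forward orientation and that `insert_size` is the library mean; the function and its argument layout are hypothetical. Each pair satisfies insert = (len(A) - fwd_start) + gap + rev_end, which is solved for the gap.

```python
from statistics import mean


def estimate_gap(pairs: list[tuple[int, int]], contig_a_len: int, insert_size: int) -> float:
    """Estimate the gap between contig A and contig B from spanning mate pairs.

    Each pair is (fwd_start_in_a, rev_end_in_b): the forward read starts at
    fwd_start_in_a on contig A (0-based) and its mate's reverse read ends at
    rev_end_in_b on contig B (end-exclusive).
    """
    gaps = [insert_size - (contig_a_len - a_start) - b_end for a_start, b_end in pairs]
    return mean(gaps)  # averaging smooths out the library's size spread


if __name__ == "__main__":
    # Hypothetical 5 kb contig A and a 2 kb library.
    print(estimate_gap([(4000, 500), (4100, 400)], contig_a_len=5000, insert_size=2000))  # 600
```

A negative estimate suggests that the two contigs actually overlap and should be merged rather than joined across a gap.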

Anatomy of a WGS Assembly

[Slide: several thousand bases of raw, unannotated sequence, beginning AAGCTTCGCCAGGCTGTAAATCCCGTGAGTCGTCCTCACAAATCATCAAGCAGGTGTCCTCAGG…]

The Bioinformatic Pipeline

•  There are many software packages; the most widely used free suite is Phred-Phrap-Consed
•  Quality scores are obtained and files are generated
•  Vector sequences are removed
•  A repeat library is constructed and repeat sequences are masked
•  Reads are assembled, viewed and assessed
•  Primers are designed to close gaps
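Phred reports each base call's quality as Q = -10·log10(P), where P is the estimated probability that the call is wrong, so Q20 means a 1% error chance and Q30 a 0.1% chance. The sketch below converts between the two and clips low-quality read ends; the end-clipping rule and the Q20 threshold are simple illustrative choices, not Phred's own trimming algorithm, and the function names are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def trim_low_quality_ends(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Clip bases from both ends until a base with quality >= min_q is reached."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    start = 0
    while start < len(seq) and quals[start] < min_q:
        start += 1
    end = len(seq)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


if __name__ == "__main__":
    print(trim_low_quality_ends("ACGTACGT", [5, 10, 30, 30, 30, 30, 12, 8]))  # GTAC
```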

Genomic Sequence

•  What was the sequencing strategy?
•  What is the genome size? Repeat content?
•  What “fold” coverage exists? 1X? 10X?
•  Have host and vector contamination been removed?
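Fold coverage is the total number of sequenced bases divided by the genome size, c = N·L/G for N reads of length L on a genome of length G. Under the Lander-Waterman model (reads placed uniformly at random, so per-base depth is Poisson), the expected fraction of the genome covered by no read is e^(-c), and the number of gaps is roughly N·e^(-c) when the minimum detectable overlap is ignored. A minimal sketch; the read count, read length and genome size in the example are hypothetical.

```python
import math


def fold_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average depth c = N * L / G."""
    return n_reads * read_len / genome_len


def fraction_uncovered(c: float) -> float:
    """Expected fraction of bases covered by no read, e^-c (Poisson model)."""
    return math.exp(-c)


def expected_gaps(n_reads: int, c: float) -> float:
    """Approximate number of gaps, N * e^-c, ignoring the minimum overlap needed to join reads."""
    return n_reads * math.exp(-c)


if __name__ == "__main__":
    c = fold_coverage(100_000, 500, 5_000_000)
    print(f"{c:.1f}X coverage, {fraction_uncovered(c):.1e} of bases expected uncovered")
```

At 1X about 37% of bases are expected to be missed; at 10X the expected uncovered fraction falls to roughly 0.005%.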

The Plasmodium falciparum Genome

•  Approximately 30 million bp in size, distributed among 14 chromosomes

•  The genome project is an internationally funded effort (NIH, Wellcome Trust, Burroughs Wellcome Fund)

•  Sequence is being generated at three different sites (Sanger Centre, Stanford, TIGR)

•  Sequence is nearly complete in terms of total coverage but unfinished in terms of assembly

•  Sequence is nearly 80% A/T in composition
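A composition check of the kind behind the ~80% A/T figure takes only a few lines. A minimal sketch; it ignores ambiguity codes such as N, and `at_content` is a hypothetical helper.

```python
def at_content(seq: str) -> float:
    """Fraction of A/T among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return (seq.count("A") + seq.count("T")) / acgt


if __name__ == "__main__":
    print(at_content("AATTAATTGC"))  # 0.8
```

Strongly skewed composition like this matters in practice: low-complexity, A/T-rich stretches are harder to clone, sequence and assemble uniquely.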

The Sequencing Strategy

•  Separate chromosomes on a pulsed-field gel
•  In some cases, make chromosome-specific BACs or YACs
•  Shotgun sequence smaller plasmids
•  Remove contaminants (vector, E. coli, yeast)
•  Assemble “contigs”
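Contaminant removal is usually done by aligning reads against vector and host sequences. As a simpler stand-in, the sketch below flags reads that share exact k-mers, on either strand, with a contaminant sequence; the k = 12 default, the one-hit threshold and the example "vector" fragment are arbitrary assumptions, not values from the slides, and the function names are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def flag_contaminated(reads: dict[str, str], contaminant: str, k: int = 12, min_hits: int = 1) -> list[str]:
    """IDs of reads sharing at least `min_hits` k-mers with the contaminant (either strand)."""
    ref = kmers(contaminant, k) | kmers(revcomp(contaminant), k)
    return [rid for rid, seq in reads.items() if len(kmers(seq, k) & ref) >= min_hits]


if __name__ == "__main__":
    vector = "GATCCTCTAGAGTCGACCTGCAGG"  # hypothetical vector fragment
    reads = {
        "r1": "TTTTGATCCTCTAGAGAAAA",      # carries vector on the forward strand
        "r2": "ACGTACGTACGTACGTACGT",      # clean
        "r3": "GGCTCTAGAGGATCGG",          # carries vector on the reverse strand
    }
    print(flag_contaminated(reads, vector))  # ['r1', 'r3']
```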

P. falciparum Statistics (3D7)

[Table not recoverable from the extracted text; surviving values: 11, 13, 10]

Add consed picture

Contig Assembly

Chromosome → BACs/YACs → Shotgun clones (plasmids) → Contiguated clones

Contig Assembly Problems

[Figure: chromosome and BAC/YAC clones with a break (X) where no clone exists]

Physical gap: no cloned DNA exists. Approaches: PCR; library walking.

Contig Assembly Problems

Sequence gap (X): a clone exists, but no sequence read covers the region

Contig Assembly Problems

Repetitive DNA elements

The Nature of Unfinished Unannotated Sequence

•  Fragmented
•  May contain vector or library host DNA
•  May have sequence gaps
•  May be mis-assembled
•  Genes and features are not identified
•  Probably will NEVER be “finished”

Module 1 Introduction to next-gen sequencing

FRANCIS OUELLETTE

History of DNA Sequencing

1870   Miescher: discovers DNA
1940s  Avery: proposes DNA as the ‘genetic material’
1953   Watson & Crick: double-helix structure of DNA
1965   Holley: sequences yeast tRNA-Ala
1970   Wu: sequences λ cohesive-end DNA
1977   Sanger: dideoxy chain termination; Gilbert: chemical degradation
1980   Messing: M13 cloning
1986   Hood et al.: partial automation
Later improvements: cycle sequencing; improved sequencing enzymes; improved fluorescent detection schemes
Next-generation sequencing: improved enzymes and chemistry; new image processing

Efficiency (bp/person/year) over this period: 1; 15; 150; 1,500; 15,000; 25,000; 50,000; 200,000; 50,000,000; 100,000,000,000 (2009)

[The figure's time axis also marks 1990 and 2002.]

Adapted from Eric Green, NIH; adapted from Messing & Llaca, PNAS (1998)

Why are we sequencing?

•  Before next-generation:
   – Reductionist perspective on life
   – DNA, RNA, (proteins), (populations): sampling, averages, consensus
•  Problems: sampling, averages, consensus
•  After next-generation:
   – We are still reductionist, but better
   – Genome sequence and structure
   – Less cloning/PCR
   – Single molecules (for some)

Sanger (old-gen) sequencing vs. now-gen sequencing

•  Whole genome
   – Sanger: human (early drafts), model organisms, bacteria, viruses and mitochondria (chloroplast), low coverage
   – Now-gen: new human (!), individual genomes, 1,000 normal, 25,000 cancer matched control pairs, rare samples
•  RNA
   – Sanger: cDNA clones, ESTs, full-length insert cDNAs, other RNAs
   – Now-gen: RNA-Seq: digitization of the transcriptome, alternative splicing events, miRNA
•  Communities
   – Sanger: environmental sampling, 16S rRNA populations, ocean sampling
   – Now-gen: human microbiome, deep environmental sequencing, Bar-Seq
•  Other
   – Now-gen: epigenome, rearrangements, ChIP-Seq

Differences between the various platforms:

•  Nanotechnology used
•  Resolution of the image analysis
•  Chemistry and enzymology
•  Signal-to-noise detection in the software
•  Software/images/file size/pipeline
•  Cost $$$

Next Generation DNA Sequencing Technologies
Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk

Human genome: 6 GB = 6,000 MB

                             3730                  454        Illumina
Required coverage            6                     12         30
bp/read                      600                   400        2 × 75
Reads/run                    96                    500,000    100,000,000
bp/run                       57,600                0.5 GB     15 GB
Runs required                625,000               144        12
Runs/day                     2                     1          0.1
Machine days/human genome    312,500 (856 years)   144        120
Cost/run                     $48                   $6,800     $9,300
Total cost                   $15,000,000           $979,200   $111,600
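The derived rows of this table (runs required, machine days, total cost) follow from the input rows. The sketch below recomputes them from the printed coverage, bp/run, runs/day and cost/run, treating "GB" as 10^9 bp; the constant and function names are hypothetical. Redoing the arithmetic exposes two inconsistencies in the printed table: 625,000 runs at $48 per run is $30,000,000, not $15,000,000 (the printed figure equals 312,500 machine-days × $48), and 500,000 reads × 400 bp is 0.2 GB, not the 0.5 GB per run used for the 454 column. The sketch keeps the printed bp/run values as given.

```python
import math
from fractions import Fraction

GENOME_BP = 6_000_000_000  # "Human Genome 6GB" from the table

# platform: (required coverage, bp per run, runs per day, USD per run), as printed
PLATFORMS = {
    "3730": (6, 57_600, Fraction(2), 48),
    "454": (12, 500_000_000, Fraction(1), 6_800),
    "Illumina": (30, 15_000_000_000, Fraction(1, 10), 9_300),
}


def genome_cost(coverage: int, bp_per_run: int, runs_per_day: Fraction, usd_per_run: int,
                genome_bp: int = GENOME_BP) -> tuple[int, Fraction, int]:
    """Runs, machine-days and total cost to reach `coverage` on one genome."""
    runs = math.ceil(Fraction(genome_bp * coverage, bp_per_run))  # exact integer ceiling
    days = runs / runs_per_day  # Fraction keeps e.g. 12 / 0.1 exact
    return runs, days, runs * usd_per_run


if __name__ == "__main__":
    for name, params in PLATFORMS.items():
        runs, days, cost = genome_cost(*params)
        print(f"{name}: {runs:,} runs, {float(days):,.0f} machine-days, ${cost:,}")
```

Exact fractions are used so that 0.1 runs/day does not introduce floating-point rounding into the day counts.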

URLs

•  http://454.com/ •  http://illumina.com/ •  http://appliedbiosystems.com/

•  http://pacificbiosciences.com/ •  http://helicosbio.com
