
Upload: truongngoc

Post on 19-Mar-2018


Page 1: Genomic Sequence Questions - UWI St. Augustinesta.uwi.edu/fst/dms/icgeb/documents/SequencingTechnologies.pdf · Introduction to next-gen sequencing bioinformatics.ca Genomic Sequence

Introduction to next-gen sequencing bioinformatics.ca

Genomic Sequence Questions

• How are sequence maps of genomes produced?
• How is the information in the genome deciphered?
• What can comparative genomics reveal about genome structure and evolution?
• How does the availability of genomic sequence affect what we can do/ask?

Page 2

The human nuclear genome viewed as a set of labeled DNA

Chapter 13 Opener

Page 3

3,072 characters per page; 1,953 reams of paper to print out. DNA = 1.8 meters!

[Slide background: the page is filled with repeated copies of a short DNA sequence.]
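The paper figure on this slide can be checked with simple arithmetic. The sketch below assumes a haploid human genome of about 3 billion bases and 500 single-sided sheets per ream; neither number appears in the transcript, so both are assumptions.

```python
# Back-of-the-envelope check of the "reams of paper" figure.
GENOME_BP = 3_000_000_000   # assumed haploid human genome size (not on the slide)
CHARS_PER_PAGE = 3072       # from the slide
SHEETS_PER_REAM = 500       # assumed: standard ream, one printed page per sheet


def reams_needed(genome_bp=GENOME_BP,
                 chars_per_page=CHARS_PER_PAGE,
                 sheets_per_ream=SHEETS_PER_REAM):
    """Number of reams needed to print one base per character."""
    pages = genome_bp / chars_per_page
    return pages / sheets_per_ream


print(round(reams_needed()))  # 1953
```

Under these assumptions the result agrees with the 1,953 reams quoted on the slide.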

Page 4

“Normal” DNA synthesis without dideoxy terminators

Figure 7-15

Page 5

The structure of 2’,3’-dideoxynucleotides

Page 6

The dideoxy sequencing method

Figure 20-16a

Page 7

The dideoxy sequencing method

Figure 20-16b

Page 8

Lecture 3.0 8

Principles of DNA Sequencing

5’ Primer
3’ Template
Sequence: G C A T G C

Four reactions, each with all four dNTPs plus one dideoxy terminator:

dATP dCTP dGTP dTTP + ddATP: GCddA
dATP dCTP dGTP dTTP + ddCTP: GddC, GCATGddC
dATP dCTP dGTP dTTP + ddGTP: ddG, GCATddG
dATP dCTP dGTP dTTP + ddTTP: GCAddT
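The terminated products on this slide can be reproduced with a few lines of code. This is a minimal sketch, not a model of the chemistry: it assumes that every position of the newly synthesized strand gets terminated in some molecule, and the function name `termination_fragments` is made up for illustration.

```python
def termination_fragments(new_strand):
    """Group every chain-terminated product by the dideoxy base that ends it."""
    reactions = {"A": [], "C": [], "G": [], "T": []}
    for end in range(1, len(new_strand) + 1):
        fragment = new_strand[:end]          # synthesis stopped after `end` bases
        reactions[fragment[-1]].append(fragment)
    return reactions


for base, fragments in termination_fragments("GCATGC").items():
    print(f"dd{base}TP reaction: {fragments}")
```

For the sequence GCATGC, the ddCTP reaction yields GC and GCATGC, the ddGTP reaction yields G and GCATG, and the ddATP and ddTTP reactions yield GCA and GCAT, matching the fragments drawn on the slide.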

Page 9

Principles of DNA Sequencing

[Gel diagram: lanes G, C, T, A between electrodes marked + and −; reading the bands from short to long fragments gives G C A T G C]
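Reading the gel amounts to sorting all bands by fragment length and noting which lane each band sits in. A small sketch follows; the lane contents are the fragment lengths implied by the example sequence G C A T G C, and `read_gel` is a hypothetical helper name.

```python
def read_gel(lanes):
    """lanes maps a terminator base to the fragment lengths seen in its lane."""
    bands = [(length, base)
             for base, lengths in lanes.items()
             for length in lengths]
    bands.sort()  # shortest fragments first
    return "".join(base for _, base in bands)


lanes = {"G": [1, 5], "C": [2, 6], "T": [4], "A": [3]}
print(read_gel(lanes))  # GCATGC
```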

Page 11

Multiplexed capillary electrophoresis (CE) with fluorescent detection

ABI 3700: 96 capillaries x 700 bases

Page 12

[Figure labels: Small Fragments; Large Fragments]

Page 13

Types of data generated

• Mated read pairs (Forward and Reverse)
• Insert size
• Chromatograms
  – Sequence
  – Quality scores
• Assembly (often multiple versions)
  – Depth (coverage)
  – Gaps (sequence and physical)
  – Scaffolds (100 N convention)
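The "100 N convention" can be illustrated as follows, assuming it refers to the common practice of joining ordered contigs with a run of 100 Ns to mark a gap of unknown length. The helper names are hypothetical.

```python
import re

GAP_PLACEHOLDER = "N" * 100  # assumed meaning of the "100 N convention"


def build_scaffold(contigs):
    """Join ordered contigs, marking each gap of unknown size with 100 Ns."""
    return GAP_PLACEHOLDER.join(contigs)


def split_scaffold(scaffold):
    """Recover the contigs by splitting on runs of at least 100 Ns."""
    return re.split(r"N{100,}", scaffold)


contigs = ["ACGTACGT", "GGCCTTAA", "TTGACA"]
scaffold = build_scaffold(contigs)
print(len(scaffold))             # 222 = 22 bases of sequence + 2 gaps of 100 Ns
print(split_scaffold(scaffold))  # ['ACGTACGT', 'GGCCTTAA', 'TTGACA']
```

The placeholder length says nothing about the true gap size; it only records that the two contigs are ordered and linked.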

Page 14

Shotgun Sequencing

Isolate Chromosome

Shear DNA into Fragments

Clone into Seq. Vectors

Sequence

Page 15

Shotgun Sequencing

Sequence Chromatogram

Send to Computer

Assembled Sequence
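The "send to computer" step can be sketched as a greedy overlap assembly: repeatedly merge the two reads that share the longest suffix-to-prefix overlap. This is only an illustration, not the algorithm of any particular assembler. It ignores sequencing errors, reverse-complement reads and repeats, and the 30-base "chromosome" and the fragment positions are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps: what remains are separate contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


genome = "ATGGCGTACCTGAAGTCTAGGCTTAACGGA"  # made-up 30-base "chromosome"
# Four overlapping fragments, deliberately listed out of order.
reads = [genome[16:28], genome[0:12], genome[22:30], genome[8:20]]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGAAGTCTAGGCTTAACGGA']
```

With reads that do not overlap, the function returns several contigs instead of one, which is the situation the later gap slides deal with.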

Page 16

Shotgun Sequencing

•  Very efficient process for small-scale (~10 kb) sequencing (preferred method)

•  First applied to whole genome sequencing in 1995 (H. influenzae)

•  Now standard for all prokaryotic genome sequencing projects

• Successfully applied to D. melanogaster

• Moderately successful for H. sapiens

Page 17

Genome sequencing is now automated

Figure 13-3

Page 18

There are two main approaches to genome sequencing

• Laboratory Intensive
  – Physical maps
  – Chromosome isolation
  – “Walking”
  – Slow, but you always know where you are

• Computationally Intensive
  – Fast to generate data: use any of many technologies to randomly generate sequence from a variety of sources
  – Slow to put all the pieces back together

Page 19

Chromosome walking

Figure 20-13

Page 20

A physical map puts clones in order

Figure 13-7a

Page 21

Strategy for ordered-clone sequencing

Figure 13-8

Page 22

The logic of creating a sequence map of the genome

Figure 13-2

Page 23

End reads from multiple inserts may be overlapped to produce a contig

Figure 13-4

Page 24

To sequence a genome, plasmids with different, but known, insert sizes are required

• Small insert plasmid library: ~2 kb +/- 100 bp
• Medium insert plasmid library: ~10 kb +/- 500 bp
• Large insert library: ~50 kb +/- 1 kb

Note: Regardless of the insert size in the library, all clones are sequenced using “mated” or paired-end sequencing

[Figure labels: 2 kb, 10 kb, 50 kb]
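Because the insert sizes are known, each mated pair constrains an assembly: the two reads of a clone should end up roughly one insert length apart. The sketch below checks that constraint using the library sizes and tolerances from this slide. The function name and the coordinate convention (start of the forward read, end of the reverse read, both on the same contig) are assumptions made for illustration.

```python
# Library sizes from the slide: (mean insert in bp, tolerance in bp)
LIBRARIES = {
    "small": (2_000, 100),
    "medium": (10_000, 500),
    "large": (50_000, 1_000),
}


def mates_consistent(forward_start, reverse_end, library):
    """True if the span covered by a mated pair fits the library's insert size."""
    mean, tolerance = LIBRARIES[library]
    span = reverse_end - forward_start
    return abs(span - mean) <= tolerance


print(mates_consistent(1_000, 3_050, "small"))  # True: span is 2,050 bp
print(mates_consistent(1_000, 9_000, "small"))  # False: span is 8,000 bp
```

A pair that fails the check hints at a mis-assembly or a collapsed repeat between the two reads.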

Page 25

Strategy for whole-genome shotgun sequencing assembly

Page 26

What do you do if you encounter a “GAP”, a region that is missing from the contigs? (Note: Contig = Contiguous sequence)

Paired-end reads may be used to join two sequence contigs
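When the two reads of one clone land on different contigs, the known insert size also gives a rough estimate of the gap between them. This is a one-pair sketch under stated assumptions: the contigs are already oriented left to right, positions are 0-based, and the insert size is exact. A real scaffolder would combine many pairs and account for insert-size variation.

```python
def estimate_gap(insert_size, left_contig_len, forward_start, reverse_end):
    """
    Estimate the gap between two contigs bridged by one mated pair.

    forward_start: position on the left contig where the forward read begins
    reverse_end:   position on the right contig where the reverse read ends
    """
    bases_on_left = left_contig_len - forward_start  # insert covered by left contig
    bases_on_right = reverse_end                     # insert covered by right contig
    return insert_size - bases_on_left - bases_on_right


# A 2 kb clone: 800 bp fall on the left contig, 700 bp on the right one.
print(estimate_gap(2_000, left_contig_len=5_000, forward_start=4_200, reverse_end=700))  # 500
```

A negative estimate would suggest that the two contigs actually overlap.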

Page 27

Anatomy of a WGS Assembly

Page 28

AAGCTTCGCCAGGCTGTAAATCCCGTGAGTCGTCCTCACAAATCATCAAGCAGGTGTCCTCAGGGAGACTGCCTGACTGAGTTATGCTAATTCCTTTCTACTTTGGCGTGGTCACGTGTAACCATATCCGAATCATTTCTCTAGCCCTACGAACAGGTAAGAGCGCTAGGGATGTCCGTGGAGTAGTGTGCTTACTCGATAATATTCAGTTGGGACTACCAGCGAGGCGCTCGCTTTGCTCACGCAATGCCTGAGACAGTTGCAGAATGAATGGTAACCGACAAACGCGTTCATATGCGTTTTCAAACTTAGTAGACGCGTACTGTCTGAAACTGGCGGTCACAGGCACCAGATAACGCCCTTGGCATCGGCATGTCTCGTACAGAGGTCCGTATGTAGTGCCACGACTTCTAAATCCGGCGACAGGCTGGTCTTTTGTCTTACCACGTATTAGCCCGCGTGCGATTTCTCGGAGCGCACCTGTTCAACACTAGAAAACGGAGTTTCCTGATCGAGAAGCCACCACCTTTCCAGAAGTTGAACGCTAGCATGTCATTCGATTTTCACCCCCCGCGTAGTTCCTGTGTGTCATTCGTTGTCGAGACAACTCTGTCCCGCCCCGGTGCTGTTCCATATGCGTGACTTTCCCGCAATTTTTTCAGACTTTCAGGAAAGACAGGCTCCGGAACGATCTCGTCCATGACTGGTAAATCCACGACACCGCAATGGCCCCCAGCACCTCTATCTCTCGTGCCAGGGGACTAACGTTGTATGCGTCTGCGTCTTGTCTTTTTGCATTCGCTTTCCAAAAAAGAGAGCCATCCGTTCCCCCGCACATTCAACGCCGCGAGTGCGGTTTTTGTCTTTTTTGAGTGGTAGGACGCTTTTCATGCGCGAACTACGTGGACATTAAGTTCCATTCTCTTTTTCGACAGCACGAAACCTTGCATTCAAACCCGCCCGCGGAAGATCCGATCTTGCTGCTGTTCGCAGTCCCAGTAGCGTCCTGTCGGCCGCGCCGTCTCTGTTGGTGGGCAGCCGCTACACCTGTTATCTGACTGCCGTGCGCGAAAATGACGCCATTTTTGGGAAAATCGGGGAACTTCATTCTTTAAAAGTATGCGGAGGTTTCCTTTTTCTTCTGTTCGTTTCTTTTTCTCGGGTTTGATAACCGTGTTCGATGTAAGCACTTTCCGTCTCTCCTCCGTGCTTTGTTCGACATCGAGACCAGGTGTGCAGATCCTTCGCTTGTCGATCCGGAGACGCGTGTCTCGTAGAACCTTTTCATTTTACCACACGGCAGTGCGGAGCACTGCTCTGAGTGCAGCAGGGACGGGTGAAGTTTCGCTTTAGTAGTGCGTTTCTGCTCTACGGGGCGTTGTCGTGTCTGGGAAGATGCAGAAACCGGTGTGTCTGGTCGTCGCGATGACCCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCCACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCCTGAACGGGTGGCTTCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATGGGACGGAAAACCTGGGAAAGCATGCCTCGAAAGTTTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCCTGTGAGCACACACAGTAGTCGCCACACGCTGTTTGAGACGTGTCAATCTCCAAGAGTGTGGACGCTGTTCCACGTCTTCAAATGTTTCCCAACATCCGTCGTCTAGTAGACACACCAACAAAAAGCACACGGCGAATCTGCTCATCGGAGGGAGGAGCCGGGGGGCACACAACTATCCTCAACTCTCGAACGAACATATCCGGGGCCGCGAAGACGTCCAGTCTCTCAAATCCAACCCGGAACGCAAACATTTCTGCATCAAGTCACGATTGCGCCGGTACCTCCATGT
GTAAGCAGTTCCATGAAACCTCCGATATTACACACGACTGTGGATATGAATTATATGCAGATGCATATATACTGAGACGCCGATGCAACTATAGGTTTCCTGGCCCTCCATGGATATTTCAGACCTTCCTCTCACATTTGGTTTGCCCGTACACCTCCGTTACGCTTTTTTTCTGGCTTTCTTCTTCGTCTCTGTTTATCAGCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAGTCTGTGCTTCACTCCCAGCAGCTCTCAGCCTTCTGGAGGAAGAGTACAAGGATTCTGTCGACCAGATTTTTGTCGTGGGTATGTTGTCCTAAACTCCTTGGAACTCCATTCTTGGTCAGAAACGTACTGAAACTGTATACATGTATATACAGATGTATGGATAATATCTAGAGAAGATACAGGGAAGACTGGCAAGGATGAAAAGACATGCAGCTTTAACGAAGCAGAGGGCATTGGCGAGAGGGACGCCCGTTATGCTGTGTGATGTGGCTGTGAATCTTACCTCGCCGTTTGACTTGCTGCAGCGCTTTGTCCACTTGAACGTGACTTCTTGTTTCTACCTTCCCCAACGCCTTCTATTCCCTTCACTGCGAAAGCGCGCTCAGTGGGCCGTCACCGAACACCCTTGGTTCTTTCGTTCAGCTGTTGTCCTCTTTCTCGCGTTGCTTCCTGTGGCGTCGTGGCTCGGCTTCTCTCTCTTTCCTGTTGGTGCGTCCAGACTATGTCGCCTGTTTCCCCACCCTTCTCGGCTTGTGCTTTCAGGAGGAGCGGGACTGTACGAGGCAGCGCTGTCTCTGGGCGTTGCCTCTCACCTGTACATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCCCTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGAGTCTGTGTTCGTTCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCAGACAACGGGGTACCCTACGACTTTGTGGTTCTCGAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGGTAAGAGGCAACCGAAGCGCGTAGATAAGAAAAACAACAAAGAGAAGGTGAAACACGAAGAGAAGGGAAAATGCGGAGAAACCGTGGATTTACAAAGATATCAAGAGCAATGCTTTGTGGAGATTTTTTTTAATTCAGTAGAGACACCCGCCGTGCGAGGTGTGTAGAAATAACTGCGACCCTGGAGACAGAGATGCCGCGAGTACACCACTTGTCGTTTTTCCTCCTATGTTCATGACGGGTGCTGAACGTCTATCGTACTTAATTGGAGGAGTCGTCTCCGAAGCAGCTTTGGCTGGCCATCCGTGTGTTTGCCTTGTTCCTGAAAAGCCAGAAGGCGCTCCACAGTGAGGCGATATACAGGGACGCCTACCGGAGCCCCGTTTTCTGCCTTTGTCGACTCTTGCAGAGCAACGCAATGAGCTCCTTGACGTCCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCATTGCCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTGATTCGGGCCGTTCCGCATGTTCACTTTAGAGGCCATGAAGAATTCCAGTACCTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATGACCGAACGGGTAACGGCGACTGCGAGAAAAAGCCACACCGTTTTCTCCTGTGATTCTGTCCGCAAGCCCTCTTTTGCTTCATCCACCCTTTGCTATTCTCCGCCGCCTTCCTTTTCTGCTCCATGTTCAATTCGTTCGCTTCTTCAGTCTTTCCATCTTCCCCTGTTACCTCTGTCATTCGTTTTCTTGCCTCTATTTAACTGTGTTCTACTCA
CAGTCTGCATTCCGCGATAGACGAGCTTCCACGTCTTGCGTCTCGACAAGCAACTGTCATTTGTACGCGCCTCCCTCCACCGTGAATCGGATTGTCGGTTCGCCGGTTCCTGGGTCAGAAAAGGCCTGCGCCAGTATTCTGAATAATACCCTTCGCCATTGTAAAGAGGCGAAGGAACAAAGAGATATTTCGGCGCATCTTTTGTGCGGCGCGTTTCCTCGTGCTTCACACCGATGCCCTTCTGTGCATGTCTTCTGCTCCTCGTCCTTCTCTCTTTTTCCCTGTTTAGGCGTTGGTGTCATCTCCAAATTCGGCTGCACTATGCGCTACTCGCTGGATCAGGCCTTTCCACTTCTCACCACAAAGCGTGTGTTCTGGAAAGGGTAAGGGCGTCTTCAGTGAATGCATATATTTGACTTCAGACATTCTTAACTGTTTGACAACCAACGTACAAATTTGTTTGTCCGTGTGCGTGTTCGACATGTCAAGTATGTGAAGAGTCGCTACTGTAGACTAACGCACGAACCAGATTTGTTTATCTGCATGCGCTGTGCACCCGTTTCTGAGTGTCTGGAGTTTCCGCAACCTTCCTTTGAATTTCTGGGTTCGTTTTTTTATGCGCGCACTGGTTTGCATGTGGCCTGAGAGAGCACAGATCGAAGGTGGGGTGATGTGGCGTCGCTGCAGAGAAACTCCGGCGAAGGCGACAGATAAAGGAGAGTGGAAATCATTGAACAGTGTCGGTCGTCTGTTGTTTCGCAGGGTCCTCGAAGAGTTGCTGTGGTTCATTCGCGGCGACACGAACGCAAACCATCTTTCTGAGAAGGGCGTGAAGGCAAGTCTACGTTGTACCTCTTGTCTCTGCCGAAGCTCAGATGTCTCCACGGCGTTGGTTTCTTTTCGTTTTTGCTTTCGTGGCATTACCATCGAGTCACCACTCATAGTTGCGTGTGTCTACATGTTTTCTAGAACGTCCGTTGTGTTGCCTCGTGGCGACC

Page 29

The Bioinformatic Pipeline

• Many software packages; the most widely used free suite is Phred-Phrap-Consed
• Quality scores are obtained and files generated
• Vector sequences are removed
• A repeat library is constructed and sequences are masked
• Reads are assembled, viewed and assessed
• Primers are designed to close gaps
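The quality values produced in the first step are Phred scores, defined as Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The sketch below converts between the two and trims a low-quality read tail. The threshold of 20 and the trimming rule are illustrative assumptions, not the behaviour of Phred, Phrap or Consed.

```python
import math


def error_probability(q):
    """Probability that a base with Phred quality q is miscalled."""
    return 10 ** (-q / 10)


def phred_quality(p_error):
    """Phred quality for a given miscall probability."""
    return -10 * math.log10(p_error)


def trim_low_quality_tail(seq, quals, min_q=20):
    """Drop trailing bases whose quality is below min_q (illustrative rule)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


print(error_probability(20))   # about 0.01, i.e. 1 error in 100 calls
print(phred_quality(0.001))    # about 30
print(trim_low_quality_tail("ACGTAC", [40, 38, 35, 30, 12, 8]))  # ACGT
```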

Page 30

Genomic Sequence

• What was the sequencing strategy?
• What is the genome size? Repeat content?
• What “fold” coverage exists? 1X? 10X?
• Has host and vector contamination been removed?
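Fold coverage is the total number of sequenced bases divided by the genome size. Under the simplifying assumption that reads land uniformly at random (a Poisson, Lander-Waterman style model), the chance that a given base is hit by no read at all is e^(-coverage). The numbers in the example are invented for illustration.

```python
import math


def fold_coverage(n_reads, read_len, genome_size):
    """Average number of reads covering each base."""
    return n_reads * read_len / genome_size


def fraction_unsequenced(coverage):
    """Poisson estimate of the fraction of bases covered by no read."""
    return math.exp(-coverage)


c = fold_coverage(n_reads=20_000, read_len=500, genome_size=1_000_000)
print(c)                                   # 10.0
print(round(fraction_unsequenced(1), 3))   # 0.368: at 1X over a third is missing
print(fraction_unsequenced(c) < 1e-4)      # True: at 10X almost nothing is missing
```

This is why 1X and 10X assemblies of the same genome answer the questions above very differently.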

Page 31

The Plasmodium falciparum Genome

•  Approx 30 million bp in size, distributed in 14 chromosomes

• Genome project is an internationally funded effort (NIH, Wellcome Trust, Burroughs Wellcome Fund)

• Sequence is being generated at 3 different sites (Sanger Centre, Stanford, TIGR)

•  Sequence is nearly complete in terms of total coverage but unfinished in terms of assembly

•  Sequence is nearly 80% A/T in composition
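Base composition is easy to measure directly from sequence. A minimal sketch follows; the example string is made up and is not P. falciparum sequence.

```python
def at_fraction(seq):
    """Fraction of A and T among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # ignore N and other symbols
    if not bases:
        return 0.0
    return sum(b in "AT" for b in bases) / len(bases)


print(at_fraction("AATTAATTGC"))  # 0.8
```

A composition this skewed matters in practice because long A/T-rich stretches are low in complexity and harder to clone and assemble.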

Page 32

The Sequencing Strategy

• Separate chromosomes on a pulsed-field gel
• In some cases, make chromosome-specific BACs or YACs
• Shotgun sequence smaller plasmids
• Remove contaminants (vector, E. coli, yeast)
• Assemble “contigs”

Page 33

P. falciparum Statistics (3D7)

[Statistics table not recovered in the transcript; only the values 11, 13 and 10 survive]

Page 34

Add consed picture

Page 35

Contig Assembly

Chromosome

BACs YACs

Shotgun Clones (Plasmids)

Contiguated Clones

Page 36

Contig Assembly Problems

Chromosome

BACs YACs

X

Physical gap: no cloned DNA exists

PCR, Library, Walking

Page 37

Contig Assembly Problems

Sequence gap: clone exists but no sequence read

X

Page 38

Contig Assembly Problems

Repetitive DNA elements

Page 39

The Nature of Unfinished Unannotated Sequence

• Fragmented
• May contain vector or library host DNA
• May have sequence gaps
• May be mis-assembled
• Genes and features are not identified
• Probably will NEVER be “finished”

Page 44

Module 1 Introduction to next-gen sequencing

FRANCIS OUELLETTE

Page 45

History of DNA Sequencing

1870       Miescher: Discovers DNA
1940       Avery: Proposes DNA as ‘Genetic Material’
1953       Watson & Crick: Double Helix Structure of DNA
1965       Holley: Sequences Yeast tRNA(Ala)
1970       Wu: Sequences λ Cohesive End DNA
1977       Sanger: Dideoxy Chain Termination; Gilbert: Chemical Degradation
1980       Messing: M13 Cloning
1986       Hood et al.: Partial Automation
1990-2002  Cycle Sequencing; Improved Sequencing Enzymes; Improved Fluorescent Detection Schemes
2009       Next Generation Sequencing; Improved enzymes and chemistry; New image processing

Efficiency (bp/person/year), values along the axis: 1; 15; 150; 1,500; 15,000; 25,000; 50,000; 200,000; 50,000,000; 100,000,000,000 (2009)

Adapted from Eric Green, NIH; Adapted from Messing & Llaca, PNAS (1998)

Page 46

Why are we sequencing?

• Before Next-generation:
  – Reductionist perspective on life
  – DNA, RNA, (proteins), (populations), sampling, averages, consensus
• Problems: sampling, averages, consensus.
• After Next-generation:
  – We are still reductionist, but better
  – Genome sequence and structure
  – Less cloning/PCR
  – Single molecules (for some)

Page 47

Columns: Sanger (old-gen) Sequencing | Now-Gen Sequencing

Whole Genome
  Sanger (old-gen): Human (early drafts), model organisms, bacteria, viruses and mitochondria (chloroplast), low coverage
  Now-Gen: New human (!), individual genome, 1,000 normal, 25,000 cancer matched control pairs, rare-samples

RNA
  Sanger (old-gen): cDNA clones, ESTs, Full Length Insert cDNAs, other RNAs
  Now-Gen: RNA-Seq: Digitization of transcriptome, alternative splicing events, miRNA

Communities
  Sanger (old-gen): Environmental sampling, 16S rRNA populations, ocean sampling
  Now-Gen: Human microbiome, deep environmental sequencing, Bar-Seq

Other
  Epigenome, rearrangements, ChIP-Seq

Page 48

Differences between the various platforms:

• Nanotechnology used.
• Resolution of the image analysis.
• Chemistry and enzymology.
• Signal to noise detection in the software
• Software/images/file size/pipeline
• Cost $$$

Page 49

Next Generation DNA Sequencing Technologies
Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk

Human Genome 6GB == 6000 MB

                            3730                  454        Illumina
Req’d coverage              6                     12         30
bp/read                     600                   400        2X75
reads/run                   96                    500,000    100,000,000
bp/run                      57,600                0.5 GB     15 GB
# runs req’d                625,000               144        12
runs/day                    2                     1          0.1
Machine days/human genome   312,500 (856 years)   144        120
Cost/run                    $48                   $6,800     $9,300
Total cost                  $15,000,000           $979,200   $111,600
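The derived rows of this comparison follow from the others: runs required = genome size x coverage / bp per run, machine days = runs / runs per day, and total cost = runs x cost per run. The sketch below recomputes them from the slide's own inputs; the dictionary layout and function name are illustrative choices.

```python
GENOME_BP = 6_000_000_000  # "Human Genome 6GB" on the slide

platforms = {
    # name: (required coverage, bp per run, runs per day, cost per run in $)
    "3730":     (6,  57_600,         2,   48),
    "454":      (12, 500_000_000,    1,   6_800),
    "Illumina": (30, 15_000_000_000, 0.1, 9_300),
}


def summarize(coverage, bp_per_run, runs_per_day, cost_per_run):
    """Recompute the derived rows of the table for one platform."""
    runs = GENOME_BP * coverage / bp_per_run
    return {
        "runs": runs,
        "machine_days": runs / runs_per_day,
        "total_cost": runs * cost_per_run,
    }


for name, params in platforms.items():
    s = summarize(*params)
    print(f"{name}: {s['runs']:,.0f} runs, "
          f"{s['machine_days']:,.0f} machine days, ${s['total_cost']:,.0f}")
```

The run counts and machine days reproduce the slide (625,000 / 144 / 12 runs and 312,500 / 144 / 120 days), as do the 454 and Illumina totals. For the 3730, 625,000 runs at $48 per run comes to $30,000,000 rather than the $15,000,000 shown; the transcript does not say how the slide's figure was derived.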

Page 50

URLs

• http://454.com/
• http://illumina.com/
• http://appliedbiosystems.com/
• http://pacificbiosciences.com/
• http://helicosbio.com
