LECTURE 2. DNA Sequencing and Structural Genomics
Sequencing with DNA Polymerases and Chain Terminators (Sanger sequencing)
1980 Nobel Prize
Synthesize new DNA using cloned DNA as template. Depends on hybridization of a primer to the DNA template.
Fred Sanger
Manual Sanger Sequencing
Properties of DNA Pols used for Sequencing

Enzyme                  3' exo   Processivity*   Rate of polymerase#
Klenow                  (+)      10-50           45
Reverse Transcriptase   (-)      10              5
T7 Sequenase**          (-)      2000-3000       300
Taq                     (-)      7500            35-100
Major Problem with Sanger sequencing:
Secondary structures form in single-stranded (ss) DNA through intramolecular Watson-Crick base pairs.
These cause stops and compressions: gel artifacts in which bands run closer together than normal spacing. This is especially a problem in GC-rich regions (which form stable "hairpins").
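The hairpin problem can be illustrated with a small Python sketch that scans a single-stranded template for inverted repeats, i.e. a stem followed, after a short loop, by its own reverse complement. The function names, the stem length and the loop-size limits are illustrative assumptions, not part of the lecture. Real secondary-structure prediction uses thermodynamic models rather than exact matching.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def find_hairpins(seq, stem=6, min_loop=3, max_loop=8):
    """Find candidate hairpins by exact matching.

    A candidate is a stem of `stem` bases that is followed, after a loop of
    min_loop..max_loop bases, by the reverse complement of that stem.
    Returns a list of (stem_start, loop_length, stem_sequence, gc_fraction).
    All three size parameters are illustrative defaults.
    """
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                gc = sum(base in "GC" for base in left) / stem
                hits.append((i, loop, left, gc))
    return hits


# Hypothetical GC-rich template: GCGCGC ... TTTT ... GCGCGC can fold back on itself.
template = "ATATGCGCGCTTTTGCGCGCATAT"
for start, loop, stem_seq, gc in find_hairpins(template):
    print(start, loop, stem_seq, f"GC={gc:.2f}")
```

Overlapping windows of the same hairpin are reported separately, so one stable stem usually shows up as several adjacent hits.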
STRATEGIES for DNA SEQUENCING
-DIRECTED SEQUENCING
Start at the ends of the cloned DNA molecule using UNIVERSAL PRIMER SITES present in the vector sequence. Design a new sequencing primer based on the first round of sequence to continue the job: PRIMER WALKING
USED FOR SMALLER DNAs: cDNAs: <10 KB
-RANDOM SEQUENCING
Fragment the cloned DNA randomly and subclone the pieces into a vector. Sequence all clones using a UNIVERSAL PRIMER. Use a computer to align sequence overlaps and determine the entire sequence of the starting DNA
USE FOR LONG DNAs: BACs, etc. (GENOMIC)
PRIMER WALKING
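Primer walking can be sketched as a loop: read from the universal primer site, design a new primer near the end of the known sequence, read again, and repeat until the insert is finished. The simulation below is only a sketch. The function name, the 500-base read length, the 20-base primer and the 50-base back-off from the read end are assumptions chosen for illustration, and the reads are taken to be error-free.

```python
import random


def primer_walk(template, read_len=500, primer_len=20, back_off=50):
    """Simulate directed sequencing by primer walking.

    template   : the cloned insert (known here only to the simulation)
    read_len   : bases obtained per sequencing reaction (assumed)
    primer_len : length of each newly designed primer (assumed)
    back_off   : distance from the end of known sequence at which the next
                 primer ends, so it sits in reliable sequence (assumed)

    Returns (assembled_sequence, list_of_primers_designed).
    """
    if read_len <= back_off + primer_len:
        raise ValueError("read_len must exceed back_off + primer_len")

    # Round 1: read from the universal primer site in the vector.
    assembled = template[:read_len]
    primers = []
    while len(assembled) < len(template):
        # Design the next primer from sequence that is already determined.
        site = len(assembled) - back_off - primer_len
        primers.append(assembled[site:site + primer_len])
        # The new read starts right after the primer.
        start = site + primer_len
        read = template[start:start + read_len]
        # Keep only the bases that extend beyond what is already known.
        assembled += read[len(assembled) - start:]
    return assembled, primers


random.seed(0)
insert = "".join(random.choice("ACGT") for _ in range(3000))  # hypothetical 3 kb cDNA
sequence, primers = primer_walk(insert)
print(sequence == insert, len(primers))
```

Each round needs a newly synthesized primer, which is why this strategy is reserved for smaller DNAs.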
RANDOM SEQUENCING
BAC clone
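The "use a computer to align sequence overlaps" step of random sequencing can be sketched as a greedy merge: repeatedly join the two reads with the longest suffix-to-prefix overlap. This is a toy illustration, not the software used by any genome project. The function names and the minimum overlap are assumptions, and the sketch ignores sequencing errors, reverse-strand reads and repeats.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=4):
    """Merge reads greedily by best overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        # Discard reads fully contained in another read.
        reads = [r for r in reads if not any(r != s and r in s for s in reads)]
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the rest are separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


# Hypothetical starting DNA and three overlapping subclone reads.
target = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
reads = [target[15:31], target[0:12], target[7:20]]
contigs = greedy_assemble(reads)
print(contigs == [target])
```

Greedy merging can misassemble repeated sequence, which is one reason real assemblers use more careful methods.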
Gene counts shown in the figure: 4100 genes; 6000 genes; 18,000 genes; 14,000 genes; 35-70,000 genes?; 50 genes
Genomes are LARGE and impractical to sequence by manual methods
BOTTLENECKS IN LARGE SCALE AUTOMATED SEQUENCING:
-sub-cloning of target DNA into appropriate vectors
-preparation of DNA of quality suitable for sequencing
-setting up sequencing reactions
-pouring and loading sequencing gels
-GEL ELECTROPHORESIS ARTIFACTS (due to secondary DNA structures).
ALTERNATIVES to gels for separating sequencing products:
-sequencing by HYBRIDIZATION
-Mass Spectrometry: Matrix-Assisted Laser Desorption/Ionization Time of Flight Mass Spectrometry (MALDI-TOF MS)
-capillary electrophoresis
Capillary dimensions: 50-100 µm (diameter) x 40 cm (length)
1. Ultra-thin, long gels can be run at very high voltages (2 kV to 10 kV): short runs, theoretically good separation
2. Samples can be loaded directly from 96-well plate format by electrophoresis: easy to automate
3. Use non-polymerized gel media: can be automatically removed and replaced between runs, so you don't have to take apart and make sequencing gels
4. Capillaries can be clustered: new automated model has arrays of 96 capillaries.
The ABI 3700 Automated Sequencer: Quick, Cheap Genome Sequencing
Emission Spectra of dyes used with the ABI3700
Front View
Fully Automated System that Requires 5 min of manpower per run:
Example: Let's say that a 9 kV run reliably gives us 600 bp per capillary
4 runs (10 hr day) X 96 capillaries X 600 bp = 230,400 bp per day!
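The throughput estimate above can be written as a short calculation. All numbers come from the example. The function and variable names are illustrative, and reading "96" as the number of capillaries per run is an interpretation of the example.

```python
def daily_throughput(runs_per_day, capillaries, bp_per_capillary):
    """Bases read per instrument per day."""
    return runs_per_day * capillaries * bp_per_capillary


runs_per_day = 4          # four runs in a 10 hr day (from the example)
capillaries = 96          # one 96-capillary array per run (interpretation)
bp_per_capillary = 600    # reliable read length at 9 kV (from the example)
hands_on_min_per_run = 5  # manpower per run (from the slide)

bp_per_day = daily_throughput(runs_per_day, capillaries, bp_per_capillary)
hands_on_min_per_day = runs_per_day * hands_on_min_per_run

print(bp_per_day)            # 230400
print(hands_on_min_per_day)  # 20
```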
Human Genome Project Goals: Three Orderly Steps to Complete the Genome Sequence
1) Complete Genetic Map
The 1999 map is based on 42,000 STSs and ESTs (representing 30,000 genes) and 1102 informative microsatellite markers http://www.ncbi.nlm.nih.gov/genemap/
Currently, ~4.8 million Single Nucleotide Polymorphisms (SNPs) are mapped.
1 SNP every 1200 bp, on average
~25,000 associated with genes
2) Physical Map is largely assembled
BAC Contigs for the Human Genome
3) As of 25 May 1999, ~19% of the genome sequenced (+63% in “draft”) http://www.ncbi.nlm.nih.gov/genome/seq/
Goal: to finish entire sequence by 2003 (original goal was 2005)
Cost: $3 billion
Shotgun Sequencing the Human Genome: >90% of the genome has been completed since Spring 2000 by Celera
Venter JC, Adams MD, Sutton GG, Kerlavage AR, Smith HO, Hunkapiller M. 1998. Shotgun sequencing of the human genome. Science 280:1540-1542.
Human Genome Plan is ordered: genetic map, contig, completely sequence the BACs that make up the contigs
Shotgun Approach (already proven successful for many bacterial genomes and, in 2000, for Drosophila):
-just start sequencing random clones without bothering to order them
-sequence them only from the ends (not completely)
-sequence enough random clones this way and you will cover the entire genome
-use sophisticated computer programs to put the genome back together
Covering the genome. A 100-kbp portion of the genome showing expected clone coverage needed for shotgun sequencing.
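How many random clones are "enough" can be explored with a small simulation: place clones at random on a 100-kbp region and count the bases hit at least once. The result is compared with the standard Poisson (Lander-Waterman style) estimate, 1 - e^(-c) for mean coverage c, which is not quoted in the lecture. The 2-kbp clone length, the clone count and the wrap-around placement (used to avoid edge effects) are assumptions made for illustration.

```python
import math
import random


def simulate_coverage(region_len, clone_len, n_clones, seed=0):
    """Place clones at random and return (fraction_covered, mean_depth)."""
    rng = random.Random(seed)
    depth = [0] * region_len
    for _ in range(n_clones):
        start = rng.randrange(region_len)
        for k in range(clone_len):
            # Wrap around the end of the region to avoid edge effects.
            depth[(start + k) % region_len] += 1
    covered = sum(1 for d in depth if d > 0) / region_len
    mean_depth = sum(depth) / region_len
    return covered, mean_depth


def expected_covered(coverage):
    """Poisson estimate of the fraction of bases hit at least once."""
    return 1.0 - math.exp(-coverage)


covered, mean_depth = simulate_coverage(100_000, 2_000, 250, seed=1)
print(f"mean depth {mean_depth:.1f}X")
print(f"simulated fraction covered {covered:.4f}")
print(f"Poisson estimate           {expected_covered(mean_depth):.4f}")
```

Even at several-fold coverage a few gaps are expected, which is consistent with the later point that gaps are left for others to fill.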
Shotgun Approach: Randomly sequence clones from different types of libraries
35 billion bases to be sequenced
Time: less than 1 year
Cost: ~$250 million
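To get a feel for the scale of this plan, the 35 billion bases can be combined with the per-instrument throughput from the ABI 3700 example (4 runs X 96 X 600 bp per day). This is a back-of-the-envelope sketch. It assumes every read yields 600 bp, every machine runs every day, and "less than 1 year" is taken as 365 days. None of these assumptions comes from Celera.

```python
import math

BP_PER_MACHINE_DAY = 4 * 96 * 600   # 230,400 bp per day, from the ABI 3700 example
TOTAL_BASES = 35_000_000_000        # bases to be sequenced (shotgun plan)
READ_LENGTH = 600                   # assumed: same reliable read length as the example
DAYS_AVAILABLE = 365                # assumed: "less than 1 year" taken as one full year

reads_needed = math.ceil(TOTAL_BASES / READ_LENGTH)
machine_days = TOTAL_BASES / BP_PER_MACHINE_DAY
machines_needed = math.ceil(machine_days / DAYS_AVAILABLE)

print(f"reads needed:    {reads_needed:,}")
print(f"machine-days:    {machine_days:,.0f}")
print(f"machines needed: {machines_needed}")
```

Under these assumptions the project needs tens of millions of reads and several hundred instruments running in parallel.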
April 2000: Celera finishes sequencing phase of the project: 11X coverage of the genome of four to five individuals
September 2000: Initial assembly of the human genome completed (using sequences in public databases as well)
October 2000: Sequencing phase of mouse genome project completed; ~9 billion base pairs.
Problems with this approach:
-only 90-95% of the genome can be sequenced: many gaps for others to fill
-Sequence will not be annotated and may not be released in a timely fashion: in fact, you need to subscribe to Celera for this info
Cost: $450,000 minimum per University
-Are they doing this just to get a jump on patenting genes? Ethical problems??
Whose DNA was sequenced? Craig Venter (Celera)
What about the Genome Consortium?
May 1999: ~19% sequenced (+63% in “draft”)
Sept 2000: ~24% sequenced (+66% in “draft”)
Oct 18, 2001: ~47% sequenced (+51% in “draft”)
Genome Watch, 23 Oct 2002
Draft: 5.8%
Finished: 92.8%
Total: 98.6%
Was Shotgun Sequencing of the Human Genome Successful?
The Celera assembly depended on BAC tiles in the public database; gaps in the Celera sequence were filled with sequence obtained from the public database
Waterston RH, Lander ES, Sulston JE. 2002. On the sequencing of the human genome. Proc Natl Acad Sci U S A 99:3712-3716.
NO!
Myers EW, Sutton GG, Smith HO, Adams MD, Venter JC. 2002. On the sequencing and assembly of the human genome. Proc Natl Acad Sci U S A 99:4145-4146.
SORE LOSERS!
The Truth: Both Approaches are Required To Sequence Large Genomes!
Where are we now?
Estimates are that 2-20% of the genome still remains to be sequenced
Completion of the genome is likely still 2-5 years away
Gaps in BACs to fill; “unclonable” sequences?
For example, there is still controversy over how many genes are encoded in the human genome: 30,000 or 70,000?
Chr 21 BAC/gene map Chr 15 BAC/gene map
See http://www.ncbi.nih.gov/cgi-bin/Entrez/hum_srch