Genomics Part 1
Human Genome Project
• Goal: to identify the DNA sequence of every gene in humans
• Genome: all the DNA in one cell of an organism
• Will provide scientists with an encyclopedia of information and a better understanding of humans
Genomes That Have Been Sequenced
• RNA virus MS2 (1976)
• DNA virus φX174 – 5386 base pairs (bp) (1977)
• Bacterium H. influenzae – 1.8 million bp (1995)
• Yeast S. cerevisiae (first eukaryote) – 12 million bp (1997)
• Fruit fly – 130 million bp (2000)
• First plant (Arabidopsis thaliana) – 120 million bp (2000)
• Human – 3 billion bp. “Working draft” announced 2000; 99% complete, 2003; complete, 2006. DNA of a single individual sequenced (2007).
• Other animals: dog, horse, cat, mouse, chimpanzee, rat, chicken, pufferfish, mosquito, and many more
• Now about 900 genomes have been sequenced.
http://www.genomesonline.org/
Sequencing the genomes of many organisms:
• Provides information to help understand many aspects of biology
• Can help us understand human genes, which are usually similar to genes in other organisms
• Provides information on evolution
Sequencing DNA
• Various methods have been used over the years.
• Some newer methods involve copying the DNA to give pieces of different lengths, each ending in a nucleotide that carries a fluorescent label of a different color for each base.
• When the pieces are separated by size, reading the sequence of colors gives the sequence of the DNA.
• Dideoxy chain-termination method for sequencing DNA
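The read-out step in the bullets above can be sketched in a few lines of Python. This is a simplified illustration, not an instrument's actual software: it assumes each piece is recorded as a (length, dye color) pair, and the color-to-base table `DYE_TO_BASE` is a hypothetical assignment chosen for the example. Separating pieces by size is modeled as sorting by length.

```python
# Hypothetical dye-to-base assignment for illustration only;
# real instruments and dye chemistries use their own conventions.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def read_colors(fragments):
    """Read a sequence from (length, dye) pairs.

    Separating pieces by size is modeled as sorting by length, shortest
    first; the dye on each piece names the base at its last position.
    """
    by_size = sorted(fragments, key=lambda piece: piece[0])
    return "".join(DYE_TO_BASE[dye] for _, dye in by_size)


if __name__ == "__main__":
    # Pieces listed in arbitrary order, as they might come out of the reaction.
    pieces = [(3, "yellow"), (1, "green"), (4, "red"), (2, "blue")]
    print(read_colors(pieces))  # ACGT
```

The sequence read this way is that of the newly made copies, which are complementary to the strand that was copied.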
Figure 20.12 (image not reproduced): a DNA template strand (CTGACTTCGACAA) with a primer and DNA polymerase; deoxyribonucleotides (dATP, dCTP, dTTP, dGTP) and fluorescently tagged dideoxyribonucleotides (ddATP, ddCTP, ddTTP, ddGTP); the resulting labeled strands, from ddCTGTT up to ddGACTGAAGCTGTT, shown moving past a laser and detector.
APPLICATION The sequence of nucleotides in any cloned DNA fragment up to about 800 base pairs in length can be determined rapidly with specialized machines that carry out sequencing reactions and separate the labeled reaction products by length.
TECHNIQUE This method synthesizes a nested set of DNA strands complementary to the original DNA fragment. Each strand starts with the same primer and ends with a dideoxyribonucleotide (ddNTP), a modified nucleotide. Incorporation of a ddNTP terminates a growing DNA strand because it lacks a 3’-OH group, the site for attachment of the next nucleotide (see Figure 16.12). In the set of strands synthesized, each nucleotide position along the original sequence is represented by strands ending at that point with the complementary ddNTP. Because each type of ddNTP is tagged with a distinct fluorescent label, the identity of the ending nucleotides of the new strands, and ultimately the entire original sequence, can be determined.
RESULTS The color of the fluorescent tag on each strand indicates the identity of the nucleotide at its end. The results can be printed out as a spectrogram, and the sequence, which is complementary to the template strand, can then be read from bottom to top. (Notice that the sequence here begins after the primer.)
Sequence read in the results: GACTGAAGC
http://en.wikipedia.org/wiki/File:Sanger_sequencing_read_display.gif
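The TECHNIQUE and RESULTS text can be made concrete with a small simulation of an idealized chain-termination reaction. This is a sketch under stated assumptions, not a model of real reaction chemistry: it assumes the template is written 3'→5' so the new strand pairs with it position by position, that the primer covers the first `primer_len` positions, and that exactly one strand length is produced for every position past the primer. The function name `nested_strands` and the 8-base template are hypothetical and are not the template shown in Figure 20.12.

```python
# Watson-Crick base pairing
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nested_strands(template_3_to_5, primer_len):
    """Simulate the nested set of strands from a chain-termination reaction.

    Assumes the template is written 3'->5' so that the new strand, written
    5'->3', pairs with it position by position, and that the primer covers
    the first primer_len positions. Each returned (length, ddNTP) pair is one
    strand that stopped where a ddNTP (no 3'-OH) was added; there is exactly
    one strand length for every position past the primer.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    return [
        (end, "dd" + new_strand[end - 1] + "TP")
        for end in range(primer_len + 1, len(new_strand) + 1)
    ]


if __name__ == "__main__":
    # Hypothetical 8-base template; the primer pairs with its first 3 bases.
    strands = nested_strands("TACGGTCA", primer_len=3)
    print(strands)
    # Sorting by length and reading each terminal ddNTP gives the sequence
    # complementary to the template, starting after the primer.
    print("".join(dd[2] for _, dd in sorted(strands)))  # CCAGT
```

As in the figure, the bases read this way are complementary to the template and begin after the primer.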
Sequencing DNA
• An automated sequencing machine can analyze about 1000 samples in a day, determining sequences of 300 to 1000 bp for each.
• The cost of sequencing per nucleotide has dropped steadily, from about $10 per bp in 1990 to about 1/10 of a cent per bp today.
• The cost is expected to drop even further, allowing affordable sequencing of individual genomes.
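The throughput and cost figures above can be combined in a quick back-of-the-envelope calculation. This sketch assumes each of the ~3 billion bases is read exactly once by a single machine; real projects read each base many times, so actual costs and times are higher. Prices are stored in tenths of a cent so the integer arithmetic stays exact; the function names are hypothetical.

```python
# Prices in thousandths of a dollar (tenths of a cent) per base pair,
# so every step stays in exact integer arithmetic.
PRICE_1990 = 10_000   # about $10 per bp in 1990
PRICE_NOW = 1         # about 1/10 of a cent per bp
GENOME_BP = 3_000_000_000


def genome_cost_dollars(genome_bp, price_per_bp):
    """Cost of reading each base once (no repeat coverage), in dollars."""
    return genome_bp * price_per_bp // 1000


def days_one_machine(genome_bp, samples_per_day, bp_per_sample):
    """Days for one machine to read each base once (ceiling division)."""
    bp_per_day = samples_per_day * bp_per_sample
    return -(-genome_bp // bp_per_day)


if __name__ == "__main__":
    print(PRICE_1990 // PRICE_NOW)                        # 10000 (fold drop)
    print(genome_cost_dollars(GENOME_BP, PRICE_1990))     # 30000000000
    print(genome_cost_dollars(GENOME_BP, PRICE_NOW))      # 3000000
    print(days_one_machine(GENOME_BP, 1000, 1000))        # 3000
    print(days_one_machine(GENOME_BP, 1000, 300))         # 10000
```

So the drop from $10 to 1/10 of a cent per bp is a 10,000-fold decrease: a single pass over a human genome goes from about $30 billion to about $3 million, and one machine at 1000 samples per day would need roughly 3,000 to 10,000 days.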
Two Strategies for Genome Sequencing
• Method 1: use genetics to find the locations of many genes on the chromosomes; cut chromosomes into pieces containing these genes; sequence small pieces; assemble the sequences
Method 1 (figure, image not reproduced):
Cytogenetic map: chromosome banding pattern and location of specific genes by fluorescence in situ hybridization (FISH)
1. Genetic (linkage) mapping: ordering of genetic markers such as RFLPs, simple sequence DNA, and other polymorphisms (about 200 per chromosome)
2. Physical mapping: ordering of large overlapping fragments cloned in YAC and BAC vectors, followed by ordering of smaller fragments cloned in phage and plasmid vectors
3. DNA sequencing: determination of the nucleotide sequence of each small fragment and assembly of the partial sequences into the complete genome sequence
(Figure labels: chromosome bands; genes located by FISH; genetic markers; overlapping fragments; …GACTTCATCGGTATCGAACT…)
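A toy version of the ordering idea in Method 1 is sketched below: once markers have been ordered on the genetic map, cloned fragments can be placed along the chromosome by the markers they contain, and neighbors that share a marker are likely to overlap. Everything here is a simplifying assumption for illustration: the marker names, map positions, and clone names are hypothetical, and real physical mapping combines far more markers and overlap evidence.

```python
def order_clones(marker_map, clone_markers):
    """Order clones along a chromosome by the mapped markers they contain.

    marker_map: marker name -> position on the genetic map (hypothetical units)
    clone_markers: clone name -> set of marker names found in that clone
    Clones are sorted by their leftmost mapped marker; clones with no
    mapped marker cannot be placed and are returned separately.
    """
    placed, unplaced = [], []
    for clone, markers in clone_markers.items():
        positions = [marker_map[m] for m in markers if m in marker_map]
        if positions:
            placed.append((min(positions), max(positions), clone))
        else:
            unplaced.append(clone)
    placed.sort()
    return [clone for _, _, clone in placed], sorted(unplaced)


def neighbor_overlaps(order, clone_markers):
    """Adjacent clones sharing at least one marker (evidence they overlap)."""
    return [
        (a, b) for a, b in zip(order, order[1:])
        if clone_markers[a] & clone_markers[b]
    ]


if __name__ == "__main__":
    marker_map = {"m1": 0.0, "m2": 4.5, "m3": 9.0, "m4": 15.0}
    clones = {
        "BAC_C": {"m3", "m4"},
        "BAC_A": {"m1", "m2"},
        "BAC_B": {"m2", "m3"},
        "BAC_X": {"m9"},  # carries no mapped marker
    }
    order, unplaced = order_clones(marker_map, clones)
    print(order, unplaced)  # ['BAC_A', 'BAC_B', 'BAC_C'] ['BAC_X']
    print(neighbor_overlaps(order, clones))  # [('BAC_A', 'BAC_B'), ('BAC_B', 'BAC_C')]
```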
Two Strategies for Genome Sequencing
• Method 2: “Shotgun” approach: the entire chromosome is cut into random pieces; the pieces are sequenced; computer programs then assemble the resulting very large number of overlapping short sequences into a single continuous sequence.
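• Two rival groups used these different strategies in sequencing the human genome: the publicly funded Human Genome Project (Method 1) and the private company Celera Genomics (Method 2).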
Method 2 (figure, image not reproduced):
1. Cut the DNA from many copies of an entire chromosome into overlapping fragments short enough for sequencing.
2. Clone the fragments in plasmid or phage vectors.
3. Sequence each fragment.
4. Order the sequences into one overall sequence with computer software.
(Example fragments: CGCCATCAGT, AGTCCGCTATACGA, ACGATACTGGT; assembled sequence: …ATCGCCATCAGTCCGCTATACGATACTGGTCAA…)
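Step 4 of Method 2 can be illustrated with a greedy overlap assembler run on the three example fragments from the figure. This is a deliberately simple sketch and not the software used by either genome project: it repeatedly merges the pair of fragments with the longest suffix/prefix overlap, assumes error-free reads from one strand with no fragment contained inside another, and uses a hypothetical minimum overlap of 3 bases chosen to suit these short fragments.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the two fragments with the largest overlap."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: keep the remaining pieces separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["ACGATACTGGT", "CGCCATCAGT", "AGTCCGCTATACGA"]
    print(greedy_assemble(reads))  # ['CGCCATCAGTCCGCTATACGATACTGGT']
```

The assembled result is a stretch of the figure's overall sequence …ATCGCCATCAGTCCGCTATACGATACTGGTCAA…. Real shotgun data also contain sequencing errors and repeated sequences, which is why practical assemblers are far more elaborate than this greedy merge.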
The Human Genome
• About 3 billion bp
• Current estimates are that the human genome contains about 25,000 genes.
• Only about 1.5% of the genome codes for proteins.
• The rest is involved in regulation, or is “junk.”
• The number of genes is not much different from that in many other “simpler” organisms.
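The figures in the list above can be combined into a quick calculation of how much of the genome is protein-coding. This sketch simply multiplies the slide's numbers (3 billion bp, 1.5%, about 25,000 genes); the per-gene figure is an average over the estimate, not a measured gene length, and percentages are stored as parts per thousand to keep the integer arithmetic exact.

```python
GENOME_BP = 3_000_000_000   # about 3 billion bp
GENE_COUNT = 25_000         # estimate quoted on the slide
CODING_PER_MILLE = 15       # 1.5% written as 15 per thousand (exact integers)

# Base pairs that code for proteins, and the rest of the genome.
coding_bp = GENOME_BP * CODING_PER_MILLE // 1000
noncoding_bp = GENOME_BP - coding_bp
# Average coding length if the coding DNA were spread evenly over all genes.
coding_bp_per_gene = coding_bp // GENE_COUNT

if __name__ == "__main__":
    print(coding_bp)            # 45000000
    print(noncoding_bp)         # 2955000000
    print(coding_bp_per_gene)   # 1800
```

In other words, about 45 million of the 3 billion bp code for proteins, roughly 1,800 coding bp per gene on average.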
Genome sequences provide clues to important biological questions
• In genomics, scientists study whole sets of genes and their interactions.
• Computer analysis of genome sequences helps researchers identify sequences that are likely to encode proteins.
• Comparison of the sequences of “new” genes with those of known genes in other species may help identify new genes.
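One simple way a computer can flag sequences that are likely to encode proteins is to look for open reading frames (ORFs): a start codon (ATG) followed, in the same reading frame, by a stop codon (TAA, TAG, or TGA) after a reasonably long stretch. The sketch below scans only the three forward reading frames of one strand and uses a hypothetical minimum length `min_codons`; real gene-finding programs also scan the other strand and must handle introns and other signals, so treat this as an illustration of the idea rather than a gene finder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find open reading frames in the three forward frames of seq.

    An ORF is ATG ... stop codon in the same frame, with at least
    min_codons codons before the stop (the ATG counts as one).
    Returns (start, end) index pairs; end is exclusive and includes the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # the stop codon closes the frame
    return sorted(orfs)


if __name__ == "__main__":
    dna = "CCATGAAACCCGGGTAGTT"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])  # 2 17 ATGAAACCCGGGTAG
```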