


Chapter 3. The Beginnings of Genomic Biology – Molecular Genetics

Contents

3. The beginnings of Genomic Biology – molecular genetics
3.1. DNA is the Genetic Material
3.2. Watson & Crick – The structure of DNA
3.3. Chromosome structure
3.3.1. Prokaryotic chromosome structure
3.3.2. Eukaryotic chromosome structure
3.3.3. Heterochromatin & Euchromatin
3.4. DNA Replication
3.4.1. DNA replication is semiconservative
3.4.2. DNA polymerases
3.4.3. Initiation of replication
3.4.4. DNA replication is semidiscontinuous
3.4.5. DNA replication in Eukaryotes
3.4.6. Replicating ends of chromosomes
3.5. Transcription
3.5.1. Cellular RNAs are transcribed from DNA
3.5.2. RNA polymerases catalyze transcription
3.5.3. Transcription in Prokaryotes
3.5.4. Transcription in Prokaryotes – Polycistronic mRNAs are produced from operons
3.5.5. Beyond Operons – Modification of expression in Prokaryotes
3.5.6. Transcription in Eukaryotes
3.5.7. Processing primary transcripts into mature mRNA
3.6. Translation
3.6.1. The Nature of Proteins
3.6.2. The Genetic Code
3.6.3. tRNA – The decoding molecule
3.6.4. Peptides are synthesized on Ribosomes
3.6.5. Translation initiation, elongation, and termination
3.6.6. Protein Sorting in Eukaryotes


CONCEPTS OF GENOMIC BIOLOGY Page 1

CHAPTER 3. THE BEGINNINGS OF GENOMIC BIOLOGY – MOLECULAR GENETICS

3.1. DNA IS THE GENETIC MATERIAL.

Frederick Griffith (1879-1941)

As classical genetics developed from Mendel in 1866 through the early part of the 20th century, it became understood that Mendel’s factors that produced traits were carried on chromosomes, and that the genetic information from 2 parents could assort in a vast number of ways in each generation. This assortment produced the genetic variety demanded by Darwin’s theory of the “origin of species”, on which natural selection acted. This understanding gave rise to the study of the behavior of genes underlying more complex traits and to an understanding of genes in populations.

At the same time, a quest was ongoing for the material inside a cell, perhaps a subcomponent of a chromosome, that carried the genetic instructions that make organisms what they are.

In 1928, the British scientist Frederick Griffith published work showing that live, rough, avirulent bacteria could be transformed into smooth, virulent bacteria by a “principle” found in dead, smooth, virulent bacteria. This meant that the bacterial traits of rough versus smooth and avirulence versus virulence were controlled by a substance that could carry the phenotype from dead cells to live cells.

Griffith’s observations on Pneumococcus were controversial, to say the least, and inspired a spirited debate and much experimentation aimed at determining whether the “transforming principle” was protein or nucleic acid, the two main components of chromosomes identified early in the 20th century, well before Griffith’s experiments. This debate continued until Oswald Avery and his colleagues Colin MacLeod and Maclyn McCarty published their work in 1944 showing that DNA was, in fact, Griffith’s transforming principle. This completely revolutionized genetics and is considered the founding observation of molecular genetics.

Oswald T. Avery Colin MacLeod Maclyn McCarty

In 1952, more evidence that DNA is the genetic material came from the work of Alfred Hershey and Martha Chase on E. coli infected with bacteriophage T2. In their experiment, T2 proteins were labeled with the radioisotope ³⁵S, and T2 DNA was labeled with the radioisotope ³²P (DNA contains phosphorus but no sulfur, whereas proteins contain sulfur). The labeled viruses were then mixed separately with the E. coli host, and after a short time, phage attachment was disrupted with a kitchen blender and the location of the label was determined. The ³⁵S-labeled protein was found outside the infected cells, while the ³²P-labeled DNA was inside the E. coli cells, indicating that DNA carried the information needed for viral infection.

Once it was established that DNA was the genetic material carrying the instructions for life, so to speak, attention turned to the question “How could a molecule carry genetic information?” The answer became clear with a detailed understanding of the structure of the DNA molecule, which was worked out by two scientists at Cambridge University, James Watson and Francis Crick.

Figure 3.1. An electron micrograph of bacteriophage T2 (left), and a sketch showing the structures present in the virus (right). The head consists of a DNA molecule surrounded by proteins, while the core, sheath, and tail fibers are all made of protein. Only the DNA molecule enters the cell.


3.2. WATSON & CRICK – THE STRUCTURE OF DNA.

The basic laboratory observations that led to the formulation of a structure for DNA did not come from biologists. Rather, Erwin Chargaff, an analytical organic chemist, and the physical scientists Rosalind Franklin and Maurice Wilkins made the laboratory observations that led to the solution of the structure of DNA.

Chargaff studied the 4 different nitrogen bases found in DNA molecules: the purines, adenine (A) and guanine (G), and the pyrimidines, cytosine (C) and thymine (T). He purified DNA from a number of different sources so that he could examine the quantitative relationships of A, T, G, and C. He concluded that in all DNA molecules, the mole-percentage of A was nearly equal to the mole-percentage of T, while the mole-percentage of G was nearly equal to the mole-percentage of C. Equivalently, the mole-percentage of pyrimidine bases equaled the mole-percentage of purine bases. These observations became known as Chargaff’s rules.

Rosalind Franklin, a young x-ray crystallographer working at King’s College London, where Maurice Wilkins also studied DNA, used a technique known as x-ray diffraction to generate images of DNA molecules. These images showed that DNA had a helical structure with repeating structural elements every 0.34 nm and every 3.4 nm along the axis of the molecule.

Rosalind Franklin Maurice Wilkins

Figure 3.2. X-ray diffraction image of a DNA molecule showing a helical structure with repeating structural elements.


These astute observations allowed Watson and Crick to assemble a 3-dimensional model of a DNA molecule with all of these essential features. The structure was published in 1953 and immediately generated much excitement, culminating in the 1962 Nobel Prize in Physiology or Medicine, awarded to Watson, Crick, and Wilkins. Franklin had died in 1958, and the prize is not awarded posthumously.

The key elements of this structure are:

• Double helical structure – the backbone of each helix is made of alternating deoxyribose sugar and phosphate groups derived from deoxynucleotides, the monomeric units that make up polymeric nucleic acid molecules. Each nucleotide in each chain consists of a nitrogen base of either the purine type (adenine or guanine) or the pyrimidine type (cytosine or thymine) attached to the 1’-position of a 2’-deoxyribose sugar, and a phosphate group esterified by a phosphoester bond to the 5’-position of the sugar.

Figure 3.3. Watson & Crick’s DNA structure. Their model consisted of a double helical structure with the sugars and phosphates forming the two helices on the outside of the structure. The sugars were linked by 3’-5’-phosphodiester bonds. The bases pair on the inside of the molecule, with A always pairing with T and G always pairing with C. This pairing accounts for Chargaff’s observations about bases in DNA.

Figure 3.4. Structures of purine and pyrimidine bases in DNA, and structure of 2’-deoxyribose sugar.


Figure 3.7. Strand of DNA showing the 3’,5’-phosphodiester bonds holding nucleotides together.

• The 2 polynucleotide chains of the double helix are held together by hydrogen bonds between the adenines in one strand and the thymines in the other strand, and between the guanines in one strand and the cytosines in the other strand.

• The nucleotides are held together in sequence order along the length of the polynucleotide chain by 3’-5’-phosphodiester bonds, and each strand has a polarity: the 5’ end of a polynucleotide strand is chemically distinct from the 3’-OH at the other end. Often, but not always, the 5’ end of a strand carries a phosphate group.
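The pairing and polarity rules in the bullets above can be illustrated with a short Python sketch (our own illustration, not from the original text). It builds the partner strand of a made-up sequence by complementing each base and reversing the order, since the strands are antiparallel, and then checks the duplex against Chargaff’s rules. The function names and the example sequence are hypothetical.

```python
from collections import Counter

# Watson-Crick pairing rules: A pairs with T, G pairs with C
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def partner_strand(strand_5to3: str) -> str:
    """Return the complementary strand, also written 5'->3'.

    Because the two strands are antiparallel, the partner is the
    complement read in reverse order (the "reverse complement").
    """
    return "".join(PAIR[base] for base in reversed(strand_5to3))

def duplex_composition(strand_5to3: str) -> Counter:
    """Count every base in both strands of the double helix."""
    return Counter(strand_5to3) + Counter(partner_strand(strand_5to3))

top = "ATGGCATTAG"  # hypothetical single-strand sequence
counts = duplex_composition(top)
print(partner_strand(top))                                    # CTAATGCCAT
print(counts["A"] == counts["T"], counts["G"] == counts["C"])  # True True
```

Any input sequence gives equal A and T counts, and equal G and C counts, once both strands are included, which is why base pairing accounts for Chargaff’s observations; a single strand on its own need not obey the rule.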

Figure 3.5. The building bocks of nucleic acids are nucleotides and nucleosides. Any base together with a deoxyribose sugar forms a deoxyribonucleoside, while if the sugar is ribose a ribonucleoside is formed (not shown). Addition of a phosphate on the 5’ position of the sugar froms nucleotides from nucleosides.

Figure 3.6. Base pairing between A and T involves two hydrogen bonds, and pairing between G and C involves 3 hydrogen bonds. This means that the forces holding strands together in G=C base pair-rich regions are stronger than in A=T base pair-rich regions.
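As a rough illustration of Figure 3.6, the sketch below counts the hydrogen bonds in a perfectly paired duplex, assuming 2 per A=T pair and 3 per G=C pair. This is a simple count, not a full model of duplex stability, and the two example sequences and function names are made up for illustration.

```python
# Hydrogen bonds per base pair, as stated in Figure 3.6
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def duplex_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds in a perfectly paired duplex.

    Each base in one strand defines one base pair, so summing over a
    single strand counts every pair exactly once.
    """
    return sum(H_BONDS[base] for base in strand.upper())

def gc_fraction(strand: str) -> float:
    """Fraction of base pairs that are G=C pairs."""
    strand = strand.upper()
    return sum(base in "GC" for base in strand) / len(strand)

at_rich = "ATTATAAT"  # hypothetical 8-bp A=T-rich region
gc_rich = "GCGGCCGC"  # hypothetical 8-bp G=C-rich region
print(duplex_hydrogen_bonds(at_rich), duplex_hydrogen_bonds(gc_rich))  # 16 24
print(gc_fraction(at_rich), gc_fraction(gc_rich))                      # 0.0 1.0
```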


To give the molecule a uniform diameter and properly align the nucleotide pairs in the middle of the helix, the strands must be oriented in antiparallel fashion, i.e. with the polarity of each strand of the double helix running in the opposite direction (one strand runs 3’ -> 5’ while the other runs 5’ -> 3’).

The truly elegant aspect of this solution to DNA structure is that it produces a spacing of 0.34 nm between adjacent base pairs in the molecule, with 10 base pairs per complete turn of the helix. This corresponds precisely to Rosalind Franklin’s x-ray diffraction measurements of repeating units of 0.34 nm and 3.4 nm (10 × 0.34 nm), and to her measurement of 2 nm for the diameter of the double helix.
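The geometry described above (0.34 nm per base pair, 10 base pairs per turn) lends itself to simple arithmetic. This Python sketch assumes an idealized, straight double helix; the function name is hypothetical, and the 4.6-million-bp input is an approximate size for the E. coli chromosome, used only for illustration.

```python
RISE_NM_PER_BP = 0.34  # distance between adjacent base pairs (from the text)
BP_PER_TURN = 10       # base pairs per complete turn in the Watson-Crick model

def helix_geometry(n_bp: int) -> dict:
    """Length and number of turns of an idealized double helix of n_bp base pairs."""
    length_nm = n_bp * RISE_NM_PER_BP
    return {
        "length_nm": length_nm,
        "length_mm": length_nm / 1e6,              # 1 mm = 1,000,000 nm
        "turns": n_bp / BP_PER_TURN,
        "pitch_nm": BP_PER_TURN * RISE_NM_PER_BP,  # the 3.4 nm repeat
    }

geo = helix_geometry(4_600_000)  # approximate E. coli chromosome size (assumption)
print(f"{geo['length_mm']:.2f} mm, {geo['turns']:.0f} turns")  # 1.56 mm, 460000 turns
```

Under these assumptions, a 4.6-million-bp molecule would be about 1.56 mm long, far longer than the cell that contains it (compare Figure 3.8).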

It is also noteworthy that Watson and Crick pointed out that their proposed structure suggested a clear way for the two strands of the DNA molecule to be duplicated while maintaining the fidelity of the sequence of bases along each chain as DNA is synthesized inside a cell. Each strand can serve as a template for a new partner strand, which provides a mechanism for the faithful transfer of information from one cell generation to the next.


3.3. CHROMOSOME STRUCTURE.

The DNA inside a cell seldom exists as a simple, “naked” DNA molecule. Because DNA molecules are long molecules with an overall negative charge derived from the phosphate groups of the backbone, positively charged ionic species within cells are attracted to them. These positively charged species can be small ions such as K+ and Mg++, or larger positively charged proteins and/or other large molecules. These ionic interactions play an important role in the folding and packaging required to fit the large molecule inside the microscopic cell.

Positively charged proteins can interact with DNA through general ionic interactions, but they can also interact in sequence-specific ways, i.e. specific proteins bind only to specific sequences of bases in the DNA strand. Thus, the molecular interactions that ionic substances, particularly proteins, have with DNA molecules play important roles in determining how the information carried in the DNA molecule is expressed. It will become obvious as we proceed through our study of genomic biology that such DNA-protein interactions are as critical to describing “genetic information” as are the base sequences of the DNA molecules themselves. This was clear well before the first genomic DNA sequences were obtained, but it has become even more apparent and significant now that the DNA sequences of many genomes are available. Thus, genomic biology is not merely the study of DNA nucleotide sequences; it also involves the study of the structure of the genetic material, such as chromosomes and chromatin.

3.3.1. Prokaryotic chromosome structure

Most Prokaryotes (e.g. bacteria) have a single, circular chromosome, although some have more than one chromosome, and some have linear rather than circular chromosomes. The best-studied bacterium, Escherichia coli, has a single circular chromosome that can exist in either a relaxed or a supercoiled state.

Supercoiling involves breaking one of the 2 strands of the circular helix and then rotating the broken ends either in the direction of the helix (+ supercoil) or in the opposite direction (- supercoil) before the break is resealed. As supercoiling is added to the DNA molecule, it becomes “tightly” coiled (see Figure 3.8) and can therefore be compacted more easily. This permits the packaging of the large DNA molecule into the relatively small cells in which it must exist and function.


Additional packaging results from the supercoiled DNA being looped onto a scaffold of proteins, producing an organized intracellular structure that remains easily accessible while also keeping the DNA from twisting and being damaged during normal cellular processes.

3.3.2. Eukaryotic chromosome structure

In general, Eukaryotes have much larger genomes than do Archaea and other Prokaryotes. However, the relationship between genome size and the complexity of the organism does not hold as well among species within the Eukaryota. This lack of correlation between organismal complexity and genome size (the amount of DNA in a haploid genome, called the C-value) is referred to as the C-value paradox (Table 3.1). The C-value paradox results from great variation in the nature of DNA in different Eukaryotes. Some eukaryotes contain substantial amounts of DNA that appears to have limited or no function, while others have a gene density in their genomes resembling that of the Prokaryotes (e.g. the yeasts and malarial parasite in Table 3.1). The majority of Eukaryotes fall somewhere in between these extremes, but are highly variable in their DNA contents. For now, we need to appreciate that this variation in DNA content and type appears to have a relationship to chromosome structure. The nature of this relationship will be considered further once we learn more about DNA sequencing and examine fully sequenced genomes.

Figure 3.8. An E. coli cell lysed open showing the expanse of its DNA molecule (left). Note that this entire molecule must be folded and packaged inside the cell in the picture. On the right are two electron micrographs showing circular DNA molecules in either a relaxed (top) or supercoiled (bottom) state.

Figure 3.9. Diagram of DNA organizational structure in prokaryotes. Supercoiled DNA is looped and attached to scaffold proteins.

In eukaryotes, there are multiple levels of chromosomal organization that we will need to consider. Observations using powerful electron microscopes demonstrated that in Eukaryotes, the DNA molecules in chromosomes are organized like beads on a string. These structures have subsequently been named nucleosomes. Investigation of the nature of nucleosomes has shown that they are made from several types of basic (positively charged) proteins, called histone proteins.

Figure 3.10. Electron micrograph showing the nucleosome structure of Eukaryotic DNA. The DNA molecule is barely visible, but it connects the beads of protein that the DNA wraps around, creating the appearance of beads on a string.

The core of each nucleosome consists of eight histone proteins, two each of histones H2A, H2B, H3, and H4. DNA is wrapped around this histone core, producing the bead-like appearance observed in the electron microscope. Once the nucleosomes are formed, they can condense or decondense depending on their interaction with another histone, histone H1.

During prophase of mitosis or meiosis, the nucleosome structure of chromatin condenses further into a so-called solenoid structure, which is approximately 30 nm in diameter. This solenoid form is not visible in a light microscope but can be viewed in an electron microscope. This appears to be the form DNA assumes when chromosomes condense during mitosis, but in this form the DNA is not as accessible for use by the cell as it is during interphase, when the chromatin is decondensed.

Figure 3.11. Nucleosomes are formed when DNA wraps around a histone complex. Nucleosomes can exist in either a more condensed or a decondensed state, depending on the state of the genetic material in a cell.


The solenoid structures are subsequently looped and fastened to chromosome scaffold proteins generating a structure that is visible in a light microscope that we know as a chromosome.

While this may seem like an elaborate structure involving several sets of structural proteins, such a structure appears to be required for the appropriate assembly and assortment of the genetic material during mitosis in the cell cycle. Without this structural organization, cellular DNA would likely become a hopeless tangle, and cellular reproduction would be severely hampered, likely requiring too much time and effort to be successful.

3.3.3. Heterochromatin & Euchromatin.

The cell cycle affects DNA packing into chromatin: chromatin condenses for mitosis and meiosis and then decondenses during interphase, being most dispersed during S-phase. However, cytogeneticists have observed that there are two differently staining forms of chromatin, called Euchromatin and heterochromatin. Euchromatin condenses and decondenses with the cell cycle. Euchromatin accounts for most of the active genome in dividing cells and bears most of the protein-coding DNA sequences. Heterochromatin remains condensed throughout the cell cycle and is believed to be relatively inactive. There are two types of heterochromatin based on activity, i.e. constitutive heterochromatin, which is tightly condensed in virtually all cell types, and facultative heterochromatin, which varies between cell types and/or developmental stages.

Figure 3.12. Condensation of chromatin leads to the careful packaging of DNA into so-called solenoid structures. These structures ultimately form chromosomes.

Figure 3.13. Loop-folding of the 30 nm solenoid structure yields packaged DNA that is visible in a light microscope in each Eukaryotic chromosome.

Other methods of characterizing types of DNA suggest that some DNA sequences occur in many copies in the genome. A given sequence may be present only once in the genome, or it may occur tens of thousands of times or more. Sequences can be categorized into:

• Unique-sequence DNA, present in one or a few copies per genome.

• Moderately repetitive DNA, present in a few to 10⁵ copies per genome

• Highly repetitive DNA, present in about 10⁵–10⁷ copies per genome

Observations about repetitive DNA sequences such as those described above have been known for decades. It was initially shown that Prokaryotic DNA was mostly unique-sequence DNA, with little or no repetitive sequence, whereas Eukaryotes have a mix of unique and repetitive types of DNA.

• Unique-sequence DNA includes most of the genes that encode proteins, and Euchromatin is rich in unique-sequence DNA.

• Repetitive-sequence DNA includes the moderately and highly repeated sequences. They may be dispersed throughout the genome or clustered in tandem repeats. Heterochromatin is rich in moderate and highly repetitive DNA.

• Human DNA contains about 65% unique sequences, while unique-sequence DNA makes up a much lower percentage of the genome in organisms with the unexpectedly large genomes (C-values) discussed earlier in this section.
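The three-way classification above can be written as a small decision function. In this Python sketch, the upper bound of 10⁵ copies for moderately repetitive DNA comes from the list; the cut-off of 10 copies for “one or a few” is our assumption, as are the function name and the example copy numbers.

```python
# Copy-number boundaries. MODERATE_MAX follows the ranges given in the text;
# UNIQUE_MAX is a hypothetical cut-off for "one or a few copies".
UNIQUE_MAX = 10
MODERATE_MAX = 10**5

def repeat_class(copies: int) -> str:
    """Assign a sequence to a repetition class from its copies per genome."""
    if copies < 1:
        raise ValueError("copy number must be at least 1")
    if copies <= UNIQUE_MAX:
        return "unique"
    if copies <= MODERATE_MAX:
        return "moderately repetitive"
    return "highly repetitive"

# Hypothetical copy numbers, one from each class
for n in (1, 2_000, 1_000_000):
    print(n, repeat_class(n))
```

Because the text’s ranges meet at 10⁵ copies, the sketch has to assign that boundary to one class; it places it in the moderately repetitive class.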


3.4. DNA REPLICATION.

As Watson and Crick were solving the structure of DNA, they realized the general mechanism by which the molecule could be copied while maintaining the fidelity of its sequence. From that beginning, the duplication of the DNA molecules of a cell became a subject of intense investigation that led to a number of Nobel Prize awards. Moreover, understanding DNA replication was critical to the development of the technologies needed for molecular genetics and, ultimately, for genomic biology research.

3.4.1. DNA Replication is semiconservative.

Among the earliest experiments on how DNA replicates were the studies of Matthew Meselson and Frank Stahl. Meselson, while a Ph.D. student, designed an experiment that used a so-called “heavy” isotope of nitrogen. Isotopes of an element are atoms having the same number of protons but different numbers of neutrons. For example, nitrogen normally has 7 protons and 7 neutrons, giving it an atomic mass of 14 (written ¹⁴N), but it is possible to find atoms with 7 protons and 8 neutrons, having an atomic mass of 15 (written ¹⁵N). If bacterial cells are grown on a nitrogen source enriched in ¹⁵N, the DNA molecules purified from those cells have a greater density (they are heavier). Meselson and Stahl grew cells on ¹⁵N, transferred them to a medium containing only ¹⁴N, purified DNA after each round of DNA replication, and determined the density of the DNA molecules using density gradient centrifugation. They showed that the first round of DNA synthesis produced molecules with a hybrid density intermediate between light and heavy DNA, while a subsequent round of replication produced light and hybrid molecules. Such a pattern of ¹⁵N labeling was consistent only with the semiconservative replication of DNA.

Figure 3.14. Diagram showing the predicted outcome of conservative, semiconservative, and dispersive DNA replication. Original strands are shown in red while newly made DNA is shown in blue.
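The logic of the Meselson-Stahl experiment can be checked with a small simulation. The Python sketch below, written for illustration only, starts from one fully ¹⁵N-labeled duplex, replicates it in ¹⁴N under each of the three models in Figure 3.14, and reports the density bands expected after each round. Representing each strand by its fraction of ¹⁵N, spreading parental material evenly in the dispersive model, and the function names are all simplifying assumptions.

```python
from collections import Counter
from fractions import Fraction

HEAVY, LIGHT = Fraction(1), Fraction(0)

def replicate(duplex, mode):
    """Return the two daughter duplexes from one round of replication.

    A duplex is a pair of strands; each strand is stored as the fraction of
    its nitrogen that is 15N (1 = fully heavy, 0 = fully light). All new
    material is made from 14N, as after the shift to light medium.
    """
    a, b = duplex
    if mode == "semiconservative":
        # each parental strand is kept and paired with a new light strand
        return [(a, LIGHT), (b, LIGHT)]
    if mode == "conservative":
        # the parental duplex stays intact; the daughter duplex is all new
        return [(a, b), (LIGHT, LIGHT)]
    if mode == "dispersive":
        # parental material is spread evenly over all four daughter strands
        share = (a + b) / 4
        return [(share, share), (share, share)]
    raise ValueError(f"unknown replication mode: {mode}")

def band(duplex):
    """Name the density band a duplex would form in the gradient."""
    density = sum(duplex) / 2
    if density == 1:
        return "heavy"
    if density == Fraction(1, 2):
        return "hybrid"
    if density == 0:
        return "light"
    return "intermediate"

def simulate(mode, generations):
    """Band counts after growing fully 15N-labeled DNA for n rounds in 14N."""
    population = [(HEAVY, HEAVY)]
    for _ in range(generations):
        population = [d for parent in population for d in replicate(parent, mode)]
    return Counter(band(d) for d in population)

for mode in ("semiconservative", "conservative", "dispersive"):
    print(mode, dict(simulate(mode, 1)), dict(simulate(mode, 2)))
```

After one round, both the semiconservative and the dispersive models give only hybrid DNA, while the conservative model gives heavy and light bands; after two rounds, only the semiconservative model gives the hybrid-plus-light pattern that Meselson and Stahl observed.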

3.4.2. DNA Polymerases.

The enzyme that replicates the DNA double helix is called DNA polymerase. The enzyme is difficult to work with because only a few copies of it are needed per cell, and they are required only in S-phase of the cell cycle. In spite of these limitations, Arthur Kornberg won the Nobel Prize in 1959 for the first purification and characterization of an enzyme that makes DNA. Kornberg’s enzyme was purified from the bacterium E. coli, and besides the enzyme, 4 additional components were required to make DNA in a test tube. These components included a template DNA (Kornberg used E. coli DNA) and the four deoxynucleoside triphosphates (dNTPs), i.e. dATP, dGTP, dCTP, and dTTP. Note that these are the deoxy NTPs, and not the ribose-containing NTPs. The remaining requirements for DNA polymerase are magnesium ion (Mg++) and a primer. The primer is a short single strand of DNA that base-pairs with the template to form a short double-stranded region. DNA polymerase then adds nucleotides to the free 3’-end of this primer; without the primer, DNA polymerase is unable to make a DNA strand. The new strand grows from its 5’-end toward its 3’-end, with each nucleotide chosen according to the sequence of the strand being copied. This copied strand is referred to as the template strand.
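The template and primer requirements can be illustrated with a minimal Python sketch. Sequences are plain strings written 5’ to 3’; the function name is hypothetical, and the model ignores everything except base pairing and the direction of synthesis (no proofreading, no enzyme kinetics).

```python
# Watson-Crick complements used to choose each incoming nucleotide
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_primer(template_5to3: str, primer_5to3: str) -> str:
    """Extend a primer along a template, returning the new strand 5'->3'.

    The primer must already be base-paired with the 3' end of the template.
    Nucleotides are added one at a time to the primer's free 3' end while
    the template is read from its 3' end toward its 5' end.
    """
    if not primer_5to3:
        raise ValueError("no primer: DNA polymerase cannot start a new strand")
    template_3to5 = template_5to3[::-1]
    n = len(primer_5to3)
    if n > len(template_3to5):
        raise ValueError("primer is longer than the template")
    # Check that each primer base pairs with the template base opposite it
    for template_base, primer_base in zip(template_3to5, primer_5to3):
        if COMPLEMENT[template_base] != primer_base:
            raise ValueError("primer does not pair with the 3' end of the template")
    new_strand = list(primer_5to3)
    for template_base in template_3to5[n:]:
        new_strand.append(COMPLEMENT[template_base])  # grow at the 3' end
    return "".join(new_strand)

print(extend_primer("ATGCGT", "ACG"))  # ACGCAT
```

Calling the function with an empty primer raises an error, mirroring the observation that DNA polymerase cannot start a strand on its own.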

All DNA polymerases studied to date make DNA using the general principles established for Kornberg’s enzyme, but there are significant differences between them in other respects. For example, in E. coli there are five different DNA polymerases. Kornberg’s enzyme is now known as DNA polymerase I, but there are also DNA polymerases II, III, IV, and V. DNA polymerases II, IV, and V are not involved in the DNA replication process; they have specialized functions in repairing damaged DNA under specific circumstances. DNA polymerases I and III are the DNA polymerases involved in the replication of cellular DNA. Both of these DNA polymerases have a 3’ -> 5’ exonuclease activity that proofreads the newly made DNA strand and removes any mistakes. Only DNA polymerase I has a 5’ -> 3’ exonuclease activity, and we will revisit this function below when the role of DNA polymerase I in DNA replication is considered.

Figure 3.15. Note that the template strand is read from its 3’-end to its 5’-end while the antiparallel new DNA strand is made from the 5’-end to the 3’-end.

3.4.3. Initiation of replication.

Replication initiates at a specific sequence in the genome called an origin of replication. Replication starts when the strands of the helix at the origin are forced apart to expose the bases, creating a replication bubble with two replication forks. Replication is usually bidirectional from the origin, with the two forks enlarging the bubble in both directions. E. coli has a single origin, called oriC, with the following properties:

• A minimal sequence of about 245 bp required for initiation.

• Three copies of a 13-bp AT-rich sequence.

• Four copies of a 9-bp sequence.
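Why an AT-rich stretch is a natural place to open the helix follows from Figure 3.6: A=T pairs are held by two hydrogen bonds rather than three. The Python sketch below scans a sequence with a 13-bp window, echoing the 13-bp repeats, and reports windows whose A+T fraction reaches a cut-off. The 0.75 cut-off, the function name, and the toy sequence are assumptions, and the sketch says nothing about how the initiator protein or helicase actually recognize oriC.

```python
def at_rich_windows(seq: str, width: int = 13, min_at: float = 0.75):
    """Yield (start, window, AT fraction) for windows with at least min_at A+T.

    width=13 mirrors the 13-bp repeats of oriC; min_at is an assumed cut-off.
    Start positions are 0-based offsets into seq.
    """
    seq = seq.upper()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        at = sum(base in "AT" for base in window) / width
        if at >= min_at:
            yield start, window, at

# Made-up sequence: a 13-bp all-A/T block flanked by G/C; not real oriC DNA
toy = "GCGCGC" + "AATTATAAATTTA" + "GCGCGC"
for start, window, at in at_rich_windows(toy):
    print(start, window, round(at, 2))
```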

Figure 3.16. Initiation of DNA replication in E. coli at oriC. Note the 9- and 13-bp repeats where DNA helicase binds and activates replication through the action of DNA primase.


From a series of in vitro studies it has been shown in E. coli that the following steps are involved in initiating replication:

1) Initiator proteins attach to oriC (E. coli’s initiator protein is the DnaA protein, encoded by the dnaA gene).

2) DNA helicase (from the dnaB gene) binds the initiator proteins on the DNA and denatures the AT-rich 13-bp region, using ATP as an energy source.

3) DNA primase (from the dnaG gene) binds helicase to form a primosome, which synthesizes a short (5–10 nt) RNA primer.

3.4.4. DNA Replication is Semidiscontinuous

When DNA denatures (its strands separate) at the origin, replication forks are formed. DNA replication is usually bidirectional; here we will consider events at just one replication fork, but remember that a similar set of events is occurring at the other replication fork of the bubble. The events occurring at each fork are:

1) Single-strand DNA-binding proteins (SSBs) bind the ssDNA formed by helicase, preventing reannealing.

2) Primase synthesizes a primer on each template strand.

3) DNA polymerase III adds nucleotides to the 3’-end of the primer, synthesizing a new strand complementary to the template and displacing the SSBs. At each fork, DNA is made in opposite directions on the two template strands, since DNA polymerase adds nucleotides only to a free 3’-end.

4) On one template strand, the new strand is made 5’-to-3’ in the same direction as the replication fork moves, i.e. DNA polymerase III moves continuously toward the fork. This new strand is the “leading strand”. On the other template strand, the new strand must be made in the opposite direction, because it too must be made 5’ -> 3’.

5) This means that on this “lagging strand”, primase must add each RNA primer very close to the replication fork, and DNA polymerase III moves away from the fork rather than toward it, as it does on the leading strand.

6) The leading strand needs only one primer and is made continuously, while the lagging strand requires a series of RNA primers, and only a limited number of DNA nucleotides are added by DNA polymerase III before the previously made fragment is encountered.

7) Thus, the leading strand is synthesized continuously, while the lagging strand is synthesized discontinuously as shorter pieces of DNA, each beginning with an RNA primer, called Okazaki fragments. DNA replication is therefore semidiscontinuous.

8) As DNA helicase denatures (unwinds) the strands and the bubble enlarges, other parts of the circular chromosome become more tightly wound. An enzyme called DNA gyrase relieves the tension created in the molecule.

9) As Okazaki fragments accumulate on the lagging strand, DNA polymerase I binds, removes the RNA primers with its 5’ -> 3’ exonuclease activity, and replaces them with DNA nucleotides.

Figure 3.17. DNA replication at a replication fork showing continuous DNA synthesis on the lower strand and discontinuous DNA synthesis on the upper strand where Okazaki fragments are

Figure 3.18. Removal of the RNA primers by the 5’-> 3’ exonuclease of DNA polymerase I, and replacement with DNA nucleotides on the lagging strand.

10) The Okazaki fragments, now lacking RNA primers, are joined together by an enzyme called DNA ligase, which seals the remaining gaps (nicks) on the lagging strand.
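The contrast between the two strands in steps 6 and 7 can be summarized by counting RNA primers at one fork. The sketch below is a simplification: it assumes, purely for illustration, that every Okazaki fragment has the same length (the parameter `okazaki_len` is hypothetical; real fragment lengths vary) and that the leading strand needs a single primer laid down at the origin.

```python
import math


def primers_at_one_fork(fork_travel_nt: int, okazaki_len: int) -> dict[str, int]:
    """Count RNA primers laid down at a single replication fork.

    fork_travel_nt: distance the fork has moved from the origin (nt).
    okazaki_len: assumed, uniform Okazaki fragment length (nt).
    """
    if fork_travel_nt <= 0 or okazaki_len <= 0:
        raise ValueError("lengths must be positive")
    return {
        # Leading strand: one primer, then continuous 5'->3' synthesis toward the fork.
        "leading": 1,
        # Lagging strand: a new primer each time about okazaki_len nt of template is exposed.
        "lagging": math.ceil(fork_travel_nt / okazaki_len),
    }


# Hypothetical numbers: the fork travels 10,000 nt with 1,500-nt fragments.
print(primers_at_one_fork(10_000, 1_500))  # {'leading': 1, 'lagging': 7}
```

Every lagging-strand primer counted here must later be removed by DNA polymerase I (step 9), and the resulting fragments sealed by DNA ligase (step 10), which is why the lagging strand needs far more enzymatic processing than the leading strand.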

3.4.5. DNA replication in Eukaryotes.

Enzymes of eukaryotic DNA replication are not as well characterized as their prokaryotic counterparts. Fifteen DNA polymerases are known in mammalian cells, for example, and three of them are used to replicate nuclear DNA. Pol α extends the 10-nt RNA primer by about 30 nt. Pol δ and Pol ε then extend the RNA/DNA primers, one on the leading strand and the other on the lagging strand; current evidence indicates that Pol ε synthesizes mainly the leading strand and Pol δ mainly the lagging strand.

Primer removal differs from that in prokaryotes. Pol δ continues extending the newer Okazaki fragment, displacing the RNA primer of the fragment ahead of it and producing a flap that is removed by nucleases. The Okazaki fragments can then be joined by DNA ligase.

Other DNA polymerases replicate mitochondrial or chloroplast DNA, or function in DNA repair. The mechanisms involved are broadly similar to the prokaryotic system described in detail above.

3.4.6. Replicating ends of chromosomes.

Replicating the ends of linear (non-circular) chromosomes presents a unique problem. When the RNA primer at the 5’-end of a newly made strand is removed, the resulting gap cannot be filled by existing DNA polymerases, because they can only add nucleotides to a free 3’-OH and no upstream primer is available at the very end of the chromosome. If this gap were not addressed, chromosomes would become shorter each time DNA replicates. A different mechanism is therefore required to complete the ends of the chromosome; this is accomplished by the telomerase system.

Most eukaryotic chromosomes have short, species-specific sequences tandemly repeated at their telomeres. It has been shown that chromosome lengths are maintained by telomerase, which adds telomere repeats without using the cell’s regular replication machinery. In humans, the telomere repeat sequence is 5’-TTAGGG-3’.

Figure 3.19. DNA ligase joins an opening in a DNA strand, remaking a complete phosphodiester-linked polynucleotide chain.

Telomerase, an enzyme containing both protein and RNA, includes an 11-nt RNA sequence that serves as the template for synthesizing new telomere repeat DNA. Because it uses an RNA template to make DNA, telomerase functions as a reverse transcriptase; its catalytic protein subunit is called TERT (telomerase reverse transcriptase). The 3’-end of the template contains the sequence 3’-CAAUC, which base-pairs with the 5’-GTTAG-3’ at the end of the chromosome’s 3’ overhang, positioning telomerase to add the nucleotides GGTTAG and complete a TTAGGG repeat. Additional rounds of telomerase activity lengthen the chromosome by adding more telomere repeats. The 3’ overhang at the end of telomere DNA usually loops back and invades the double-stranded telomeric repeats, forming a loop (T-loop) that contains a D-loop. After telomerase adds telomere sequences, the complementary strand is completed by the normal replication machinery. Any shortening of the chromosome ends is thus compensated for by the addition of telomere repeats.
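The template-pairing cycle described above can be traced with a short string model. This is a minimal sketch that assumes the 11-nt template of human telomerase RNA is 3’-CAAUCCCAAUC-5’ (consistent with the 3’-CAAUC end and the TTAGGG repeat in the text); it models only base pairing, synthesis, and translocation, not the protein, and the function name is hypothetical.

```python
# Human telomerase RNA template region, written 3'->5' (an assumption stated above).
TEMPLATE_3_TO_5 = "CAAUCCCAAUC"

# Pairing of each RNA template base with the DNA base added opposite it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def telomerase_extend(overhang: str, rounds: int) -> str:
    """Extend a 3' overhang (written 5'->3') by repeated telomerase cycles."""
    # Complementing the template read 3'->5' gives the DNA it specifies, 5'->3'.
    specified = "".join(RNA_TO_DNA[b] for b in TEMPLATE_3_TO_5)  # "GTTAGGGTTAG"
    align, added = specified[:5], specified[5:]  # GTTAG pairs with 3'-CAAUC; GGTTAG is new
    for _ in range(rounds):
        if not overhang.endswith(align):
            raise ValueError("overhang end does not pair with the template")
        overhang += added  # reverse transcription of the rest of the template
        # Translocation: the new 3' end again ends in GTTAG, ready for another round.
    return overhang


print(telomerase_extend("TTAGGGTTAG", 2))  # TTAGGGTTAGGGTTAGGGTTAG
```

Each round adds six nucleotides, exactly one TTAGGG repeat's worth, which is why repeated rounds lengthen the chromosome end by whole repeats.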

Telomere length may vary, but organisms and cell types have characteristic telomere lengths, resulting from many levels of regulation of telomerase. Mutants affecting telomere length have been identified, and data indicate that progressive shortening of telomeres eventually leads to cell death. Loss of telomerase activity allows only a limited number of cell divisions before the cells die.
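Why loss of telomerase limits the number of divisions is a matter of simple arithmetic. The sketch below uses purely hypothetical lengths (not measured values) and assumes a constant loss per division, a constant telomerase gain, and a fixed critical length; real telomere dynamics are more variable.

```python
from typing import Optional


def divisions_before_critical(initial_bp: int, loss_per_division_bp: int,
                              critical_bp: int,
                              telomerase_gain_bp: int = 0) -> Optional[int]:
    """Count divisions before telomeres would drop below critical_bp.

    Returns None when telomerase replaces at least as much as is lost,
    so in this simple model telomere length never limits division.
    """
    net_loss = loss_per_division_bp - telomerase_gain_bp
    if net_loss <= 0:
        return None
    divisions = 0
    length = initial_bp
    while length - net_loss >= critical_bp:
        length -= net_loss
        divisions += 1
    return divisions


# Hypothetical numbers chosen for easy arithmetic.
print(divisions_before_critical(10_000, 100, 5_000))       # 50
print(divisions_before_critical(10_000, 100, 5_000, 100))  # None
```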

Figure 3.20. The dilemma of how the 3’ overhangs are replicated at each end of the chromosome to duplicate a chromosome and make sister chromatids.

Figure 3.21. Replication of chromosome ends using telomerase.

3.5. Transcription

In cells, the genetic information carried in the DNA nucleotide sequence becomes functional information that gives characteristics to cells, ultimately specifying traits. This conversion of DNA sequence information into functional information begins with the creation of cellular RNAs from one of the two strands of DNA. This process is called transcription. The mechanism by which cellular RNAs are transcribed from DNA is presented in this section, while the regulation of these processes will be covered later.

3.5.1. Cellular RNAs are transcribed from DNA

Ribosomal RNAs

The most abundant type of RNA in most cells is ribosomal RNA (rRNA), a structural component of the ribosome, the cellular particle on which proteins are synthesized. Because ribosomes have two subunits, a large subunit and a small subunit, they also have two major types of ribosomal RNA. These are described in detail in Table 3.2. In addition to the largest ribosomal RNAs, there are smaller ribosomal RNAs as well. Note that the sizes and nature of all of these ribosomal RNAs differ between Prokaryotes and Eukaryotes.

In prokaryotes, the small 30S ribosomal subunit contains the 16S ribosomal RNA. The large 50S ribosomal subunit contains two rRNA species (the 5S and 23S ribosomal RNAs). Bacterial 16S, 23S, and 5S rRNA genes are typically organized as a co-transcribed unit (operon). There may be one or more copies of the operon dispersed in the genome (for example, Escherichia coli has seven). Archaea contain either a single rDNA operon or multiple copies of the operon.

In Eukaryotes, the cytoplasmic small ribosomal subunit (40S) contains an 18S rRNA, while the large ribosomal subunit (60S) contains the 28S, 5S, and 5.8S rRNAs. As in Prokaryotes, these rRNAs are structural components of ribosomes, where they perform essential functions. In mammals, the 28S, 5.8S, and 18S rRNAs are encoded by a single nuclear transcription unit (45S). Two internal transcribed spacers separate the three rRNA species in the 45S transcript. Generally, there are many copies of the 45S rDNA, organized in clusters in the nuclear genome; in humans, for example, a total of 300–400 repeats are found in five clusters. The 5S rDNA is not transcribed as part of the 45S transcript, but occurs in tandem arrays (~200–300 5S genes) located in the mammalian genome independently of the 45S rDNA genes.

Mammalian mitochondria have only two mitochondrial rRNA molecules (12S and 16S) and do not contain 5S rRNA. These ribosomal RNAs are transcribed from the mitochondrial genome. This is also the case for plant mitochondrial rRNAs, although plant mitochondria contain a more prokaryote-like set of ribosomal RNAs, i.e. a 16S, a 26S, and a 5S rRNA. Plants also contain chloroplast ribosomal RNAs (16S, 23S, and 5S) produced by transcription from the chloroplast genome.

Messenger RNAs – mRNAs

All organisms (and mitochondria and chloroplasts) produce a type of RNA that codes for the amino acid sequence of proteins. This RNA is a copy of the DNA sequence of the gene and is transcribed from one of the two DNA strands of each gene. By reproducing the DNA sequence as an mRNA copy the sequence information for the gene is faithfully maintained allowing the generation of many gene “copies” that can be used to produce even more protein copies from each gene.

Transfer RNAs - tRNA

Transfer RNAs (tRNAs) are smaller (~90 nt) RNA molecules that are transcribed from genes scattered throughout both Prokaryotic and Eukaryotic genomes, including mitochondrial and chloroplast genomes. These molecules are the “decoding” molecules that determine which amino acids are put into proteins, in the order specified by the nucleotide sequence of the mRNA. They are highly structured RNA molecules, and there is at least one, often several, tRNAs for each of the twenty amino acids found in proteins. Each tRNA is processed from a precursor-tRNA transcribed from a specific tRNA gene, and typically only one tRNA is produced per tRNA gene.

In Eukaryotes, tRNA genes are scattered across all chromosomes, and there are separate sets of tRNA genes in each of the organelle genomes present in eukaryotes.

Other Non-protein-coding Transcribed RNAs

More recently, additional types of RNAs that perform vital functions in cells have been described. Most of these were discovered in Eukaryotes as eukaryotic genomes were sequenced and characterized.

Small nuclear RNAs (snRNA) are smaller RNAs (typically ~ 150 nt) transcribed from nuclear DNA in eukaryotic cells. snRNAs are structurally part of small nuclear ribonucleoprotein particles (snRNPs) that are involved in processing mRNAs in the nucleus of cells. Typically there are but a handful of different snRNAs made in each species and these are highly conserved among eukaryotes.

Small nucleolar RNAs (snoRNAs) are a class of small RNA molecules that guide the modification of other types of RNA, mostly rRNA, tRNA, and snRNA. One of the main functions of snoRNAs is modification of the 45S ribosomal RNA precursor so that it can be further processed to generate the 18S, 5.8S, and 28S rRNAs.

Small regulatory RNAs (srRNAs) are found in prokaryotes, where they are involved in the regulation of gene expression, but they are best known for the roles they play in transcriptional, posttranscriptional, and translational control of gene expression in Eukaryotes. In Eukaryotes, these molecules are an array of 20–30 nt RNAs transcribed in various ways from genes in the genomes of organisms. There are two main types of srRNAs, microRNAs (miRNAs) and short interfering RNAs (siRNAs); the particular srRNAs differ among organisms, and there are likely thousands of genes transcribed to produce them.

3.5.2. RNA polymerases catalyze transcription

RNA polymerase is the enzyme responsible for copying a DNA sequence into an RNA sequence during the process of transcription. As a complex molecule composed of protein subunits, RNA polymerase controls the process of transcription, during which the information stored in a molecule of DNA is copied into a molecule of cellular RNA.

The chemical reaction catalyzed by RNA polymerase is shown in Figure 3.22.

Multisubunit RNA polymerases exist in all species, but the number and composition of these proteins vary across taxa. For instance, bacteria contain a single type of RNA polymerase that transcribes mRNA, tRNA, and all rRNAs. Eukaryotes contain three (animals and fungi) to five (plants) distinct types of RNA polymerases. Each of these RNA polymerases transcribes different species of RNA, as shown in Table 3.3.

Figure 3.22. The chemical reaction catalyzed by RNA polymerases, showing both the reactants and products and the specificity of base pair addition. Note the antiparallel nature of the RNA strand relative to the DNA strand being transcribed. RNA polymerase makes a phosphodiester bond between the innermost 5’-phosphate (the phosphate closest to the ribose sugar) of the incoming ribonucleotide and the 3’-OH at the 3’-end of the growing RNA strand.
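The base-pairing and antiparallel rules in Figure 3.22 can be expressed as a short sequence transformation. In this sketch (the function name and the example sequences are hypothetical), both DNA strands are written 5’->3’, as is conventional; because RNA polymerase reads the template 3’->5’ while building RNA 5’->3’, the transcript is the reverse complement of the template, with U in place of T.

```python
# Base RNA polymerase adds opposite each DNA template base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (5'->3') made from a DNA template strand given 5'->3'."""
    # Walk the template from its 3' end so the RNA is built 5'->3' (antiparallel).
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5_to_3.upper()))


coding = "ATGGCTTAA"    # hypothetical nontemplate (coding) strand, 5'->3'
template = "TTAAGCCAT"  # its complementary template strand, 5'->3'
rna = transcribe(template)
print(rna)  # AUGGCUUAA
```

The result matches the coding strand with T replaced by U, which is why the nontemplate strand is often used to represent a gene's RNA sequence.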

In spite of these differences, there are striking similarities among the transcriptional mechanisms of all RNA polymerases. For example, transcription is divided into three steps in both bacteria and eukaryotes: initiation, elongation, and termination. Elongation is highly conserved between bacteria and eukaryotes, but initiation and termination are somewhat different.

All species require a mechanism by which transcription can be regulated in order to achieve spatial and temporal changes in gene expression. Proteins that interact with the core RNA polymerase and recognize specific sequences in the DNA mediate these initial regulatory steps during transcription initiation. However, the types and nature of these interacting proteins are quite distinct in Prokaryotes compared to Eukaryotes, so transcription initiation at each gene locus is described separately for Prokaryotes and Eukaryotes below.

3.5.3. Transcription in Prokaryotes

The bacterium Escherichia coli will be used as a model of prokaryotic transcription; the mechanisms described are similar in nearly all Prokaryotes.

A prokaryotic gene is a DNA sequence in the chromosome. The gene has three regions, each with a function in transcription (see Figure 3.23.). These are:

1) A promoter sequence that attracts RNA polymerase to begin transcription at a site specified by the promoter. Some genes use one strand of DNA as the template; other genes use the other strand.

2) The transcribed sequence, called the RNA-coding sequence. The sequence of this DNA corresponds with the RNA sequence of the transcript.

3) A terminator region that specifies where transcription will stop.

Figure 3.23. Prokaryotic genes all have promoter regions upstream (toward the 5’-end of the mRNA) of the protein coding gene and terminator regions downstream (toward the 3’-end of the mRNA). These regions are located at the 3’-end (promoter) and the 5’-end (terminator) of the template strand of DNA. Typically the nucleotide where RNA polymerase begins transcribing is designated the +1 nucleotide position, and sequences in the promoter are designated as (-) nt positions.
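The numbering convention in Figure 3.23 has one subtlety: there is no position 0, so the base just upstream of +1 is -1. The small hypothetical helper below converts a 0-based index in a sequence into this notation, given the index of the transcription start site.

```python
def position_label(index: int, tss_index: int) -> str:
    """Label a 0-based sequence index relative to the start site (+1).

    Bases upstream of the start site are negative; there is no position 0.
    """
    offset = index - tss_index
    return f"+{offset + 1}" if offset >= 0 else str(offset)


# Hypothetical layout: the start site is the 36th base (index 35).
print([position_label(i, 35) for i in (0, 25, 34, 35, 36)])
# ['-35', '-10', '-1', '+1', '+2']
```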

The process of transcription initiation in E. coli is shown in Figure 3.24. It involves two DNA sequences in the promoter region of the gene, centered at about -35 bp and -10 bp upstream of the +1 transcription start site. In E. coli, the consensus sequences are 5’-TTGACA-3’ at the -35 region and 5’-TATAAT-3’ at the -10 region (also known as the Pribnow box), but the actual sequences can vary according to the organism and the gene within the organism.
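A simple way to see how closely a promoter resembles the consensus is to count mismatches against both hexamers. The sketch below is an illustration under stated assumptions, not a real promoter-prediction method: it scores only exact-position mismatches, and the allowed spacer of 16–18 bp between the two hexamers is an assumption not given in the text. The function names and the test sequence are hypothetical.

```python
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_promoter_match(seq: str, spacers=range(16, 19)):
    """Find the -35/-10 hexamer pair closest to the consensus.

    seq is the nontemplate strand, 5'->3'. Returns a tuple
    (total mismatches, -35 start, -10 start), or None if seq is too short.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 6 + 1):
        for gap in spacers:
            j = i + 6 + gap  # start of the -10 hexamer
            if j + 6 > len(seq):
                continue
            score = mismatches(seq[i:i + 6], MINUS35) + mismatches(seq[j:j + 6], MINUS10)
            if best is None or score < best[0]:
                best = (score, i, j)
    return best


# Made-up sequence containing perfect consensus hexamers 17 bp apart.
demo = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
print(best_promoter_match(demo))  # (0, 2, 25)
```

Natural promoters rarely match both hexamers perfectly, which fits the observation below that promoters deviating from the consensus are transcribed at different levels.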

Transcription initiation requires the RNA polymerase holoenzyme to bind to the promoter DNA sequence (bacteria have only one type of core RNA polymerase). The holoenzyme consists of:

1) Core enzyme of RNA polymerase, containing five polypeptides (two alpha, one beta, one beta’ and an omega; written as α2ββ’ω).

2) One of several sigma factors (σ-factor) that binds the core enzyme and confers ability to recognize specific gene promoters.

RNA polymerase holoenzyme binds the promoter in two steps that involve the sigma factor (Figure 3.24). First, it binds loosely to the -35 sequence of the double-stranded DNA, forming a closed promoter complex (Figure 3.24a). Second, it binds tightly to the -10 sequence (Figure 3.24b), untwisting about 17 bp of DNA at the site. At this point RNA polymerase is in position to begin transcription (open promoter complex).

Figure 3.24. Prokaryotic (E. coli) transcription initiation. a) RNA polymerase holoenzyme is “recruited” to the promoter by a specific σ-factor (sigma factor); b) the strands of the DNA are separated, exposing the template strand for copying; c) nucleotides are polymerized as RNA polymerase moves along the template, and the σ-factor leaves the complex; d) as elongation continues, the newly made mRNA exits the enzyme, and the transcription “bubble” moves along the DNA.

Promoters often deviate from the consensus sequences at -35 and -10, and the associated genes show different levels of transcription, corresponding with the σ-factor’s ability to recognize their sequences. E. coli has several sigma factors with important roles in gene regulation. Each sigma factor can bind a molecule of core RNA polymerase and guide its choice of genes to transcribe, but each has a different affinity for specific promoters.

Most E. coli genes have a σ70 promoter, and σ70 is usually the most abundant σ-factor in the cell. σ70 recognizes the sequence TTGACA at -35, and TATAAT at -10. Other sigma factors may be produced in response to changing conditions, and each can bind the core RNA polymerase, enabling holoenzyme to recognize different promoters. An example is σ32, which arises in response to heat shock and other forms of stress and recognizes a sequence at -39 bp and -15 bp. E. coli has additional sigma factors with various roles (Table 3.4), and other bacterial species also have multiple similar and additional sigma factors.

Many bacterial genes are controlled by regulatory proteins that interact with regulatory sequences near the promoter. There are two classes of regulatory proteins: activators, which stimulate transcription by facilitating RNA polymerase activity, and repressors, which inhibit transcription by decreasing RNA polymerase binding or elongation of RNA.

TABLE 3.4. E. coli σ-factors and their functions

σ-factor (gene)    | Function
σ70 (rpoD) = σA    | the “housekeeping” or primary sigma factor; transcribes most genes in growing cells. Every cell has a “housekeeping” sigma factor.
σ19 (fecI)         | the ferric citrate sigma factor; regulates the fec genes for iron transport
σ24 (rpoE)         | the extracytoplasmic/extreme heat stress sigma factor
σ28 (rpoF)         | the flagellar sigma factor
σ32 (rpoH)         | the heat shock sigma factor; it is turned on when the bacteria are exposed to heat. Because it is then expressed at higher levels, it is more likely to bind the core enzyme, directing expression of other heat shock proteins that enable the cell to survive higher temperatures. Some of the enzymes expressed upon activation of σ32 are chaperones, proteases, and DNA-repair enzymes.
σ38 (rpoS)         | the starvation/stationary phase sigma factor
σ54 (rpoN)         | the nitrogen-limitation sigma factor

Once initiation is completed, RNA synthesis begins, and the sigma factor is released and reused for other initiations (Figure 3.24c). Core enzyme completes the transcript. Core enzyme untwists DNA helix locally, allowing a small region to denature. Newly synthesized RNA forms an RNA–DNA hybrid, but most of the transcript is displaced as the DNA helix reforms (Figure 3.24d).

Terminator sequences are used to end transcription. In E. coli there are two types of transcript termination:

1) Rho-independent (ρ-independent) or type I terminators (Figure 3.25, upper) contain a palindromic sequence with twofold symmetry that allows a hairpin loop to form. The palindrome is followed by 4–8 U residues in the transcript; when these sequences are transcribed, the palindrome forms a stem-loop structure in the RNA, causing chain termination.

2) Rho-dependent (ρ-dependent) or type II terminators (Figure 3.25, lower) require the protein ρ (Rho) for termination. Rho binds to a C-rich sequence in the RNA upstream of the termination site and moves along the transcript until it encounters a stalled polymerase. It then acts as a helicase, using the energy of ATP hydrolysis to move along the transcript and destabilize the RNA–DNA hybrid at the termination region, terminating transcription.
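The Rho-independent signal described in item 1, an inverted repeat followed by a run of U residues, can be searched for in an RNA sequence. This is a minimal sketch under simplifying assumptions that are not from the text: the stem length (6), the loop length range (3–8), the minimum U run (4), and the requirement for perfect Watson–Crick pairing in the stem (real terminators tolerate mismatches and G·U pairs) are all hypothetical parameters.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def find_intrinsic_terminator(rna: str, stem_len: int = 6,
                              loop_range=range(3, 9), min_u: int = 4):
    """Return the index where the U run after a stem-loop begins, or None."""
    rna = rna.upper()
    for i in range(len(rna) - stem_len + 1):
        stem5 = rna[i:i + stem_len]  # 5' half of the would-be stem
        for loop in loop_range:
            j = i + stem_len + loop  # start of the 3' half of the stem
            stem3 = rna[j:j + stem_len]
            tail = rna[j + stem_len:j + stem_len + min_u]
            # The 3' half must pair with the 5' half, then U's must follow.
            if stem3 == revcomp(stem5) and tail == "U" * min_u:
                return j + stem_len
    return None


# Made-up transcript end: stem GCCGCC, loop GAAA, stem GGCGGC, then U's.
rna = "AAAC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUU" + "A"
print(find_intrinsic_terminator(rna))  # 20
```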

3.5.4. Transcription in Prokaryotes – polycistronic mRNAs from operons

While we have considered the structure of a prokaryotic gene as having a promoter, a coding region, and a termination region (see Figure 3.23), in most cases multiple protein-coding regions are under the control of a single promoter. This genetic structure is referred to as an operon, and the mRNA transcribed from each operon is in fact an RNA capable of producing multiple peptides. This type of mRNA, typical of prokaryotes and of Eukaryotic mitochondria and chloroplasts, is referred to as a polycistronic mRNA. Thus, in prokaryotes, the proteins that bind to promoter and regulatory regions to regulate gene expression control the production of multiple peptides simultaneously. Typically, these peptides are functionally related, e.g. the proteins required to catabolize lactose as a carbon source [lac operon] (see Figure 3.26), or the proteins required to make the amino acid tryptophan [trp operon] (see Figure 3.27).

Figure 3.25. Simplified schematics of the mechanisms of prokaryotic transcriptional termination. In Rho-independent termination, a terminating hairpin forms on the nascent mRNA, interacting with the NusA protein to stimulate release of the transcript from the RNA polymerase complex (top). In Rho-dependent termination, the Rho protein binds at the upstream rut site, translocates along the mRNA, and interacts with the RNA polymerase complex to stimulate release of the transcript.

Both of these operons are negatively regulated by repressor proteins, but they respond to their effectors in opposite ways. The lac operon is an example of an inducible operon: in the presence of the effector (lactose), the repressor protein does not bind to the operator, so transcription is not blocked. The trp operon is an example of a repressible operon: the repressor protein binds to the operator only in the presence of the effector molecule (tryptophan). Thus, using similar types of regulatory proteins and genes, and a similar operon structure, almost any type of gene regulation can be obtained.
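The opposite logic of inducible and repressible operons can be captured as two Boolean functions. This is a deliberately simplified on/off sketch with hypothetical function names: it treats “repressor bound” as “no transcription” and ignores basal expression, catabolite repression, and attenuation.

```python
def lac_operon_on(inducer_present: bool) -> bool:
    """Inducible: the lac repressor leaves the operator when inducer is present."""
    repressor_on_operator = not inducer_present
    return not repressor_on_operator


def trp_operon_on(tryptophan_present: bool) -> bool:
    """Repressible: the trp repressor binds the operator only with tryptophan bound."""
    repressor_on_operator = tryptophan_present
    return not repressor_on_operator


for effector in (False, True):
    print(f"effector present={effector}: "
          f"lac on={lac_operon_on(effector)}, trp on={trp_operon_on(effector)}")
```

Both functions share the same final step (“transcribe unless the repressor is on the operator”); only the effect of the effector on repressor binding differs, which mirrors the point that similar regulatory parts can produce opposite responses.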

Additionally, because they are translated from a single polycistronic mRNA, the proteins for related critical cellular functions are coordinately regulated.

Figure 3.26. The lac operon in E. coli. Three lactose metabolism genes (lacZ, lacY, and lacA) are organized together in a cluster called the lac operon. The coordinated transcription and translation of the lac operon structural genes is controlled by a shared promoter, operator, and terminator. A lac regulator gene (lacI) with its separate promoter is found just outside the lac operon. The lacI gene produces a regulatory protein, the lac repressor, which binds the “inducer”, lactose (or its derivative, allolactose), when it is present in a cell. The lacI protein can also bind to a region of the operon between the lac promoter and the structural genes, referred to as the lac operator (lacO). In the absence of lactose (allolactose), the lacI protein binds tightly to the operator and prevents RNA polymerase from transcribing the polycistronic mRNA. When lactose binds to the lacI protein, the lacI protein cannot bind to lacO, and RNA polymerase proceeds to produce the polycistronic mRNA corresponding to the lacZ, lacY, and lacA genes. © 2013 Nature Education Adapted from Pierce, Benjamin. Genetics: A Conceptual Approach, 2nd ed. All rights reserved.

3.5.5. Beyond Operons - Modification of expression of prokaryotic genes

Operons are often subject to additional regulation that further fine-tunes transcription, and the form this takes varies from operon to operon in Prokaryotic genomes. A common type of additional regulation has been shown for the lac operon and many other catabolic operons. Glucose is the preferred carbon source in E. coli, and in the presence of glucose, lactose will not be utilized. This means that if both glucose and lactose are abundant, the lac operon will not be induced until the glucose is used up. This phenomenon is referred to as catabolite repression, and its critical components in the lac operon are shown in Figure 3.28.

When the concentration of intracellular glucose is low (Figure 3.28, upper panel), the levels of the signal molecule cAMP are high, and cAMP binds to CAP protein. The association between RNA polymerase and promoter DNA is enhanced when the CAP-cAMP complex is present. Enhanced RNA polymerase binding leads to a high rate of transcription (provided that the operator is free) and translation of the lac operon polycistronic mRNA. The resulting mRNA transcripts are translated into the proteins beta-galactosidase, permease, and transacetylase, which allow the cell to take up lactose and break it down into glucose and galactose. The galactose can subsequently be converted into glucose.

Figure 3.27. The tryptophan operon of E. coli consists of five structural genes (trpE, trpD, trpC, trpB, and trpA) with a common promoter, operator, and terminator. The trpR gene, which encodes the regulatory protein (trp repressor), has its own separate promoter. Transcription of the trp operon produces a polycistronic mRNA that contains a leader sequence (encoding a short leader peptide) and coding sequences for the 5 structural genes that produce the 5 enzymes required to make tryptophan. Since tryptophan is an amino acid required for cell growth, the trp operon is “repressed” when cells have access to an abundant supply of tryptophan (panel A), and becomes “derepressed” when cells are starving for tryptophan (panel B). A) Tryptophan present, repressor bound to operator, operon repressed. When complexed with tryptophan, the repressor protein binds tightly to the trp operator, thereby preventing RNA polymerase from transcribing the operon structural genes. B) Tryptophan absent, repressor not bound to operator, operon derepressed. In the absence of tryptophan, the free trp repressor cannot bind to the operator site. RNA polymerase can therefore move past the operator and transcribe the trp operon structural genes, giving the cell the capability to synthesize tryptophan.

When the glucose concentration in the cell is high (Figure 3.28, lower panel), low concentrations of cAMP result in decreased binding of cAMP to CAP. Therefore, the cAMP-CAP complex is not bound to the bacterial DNA, and as a result, neither is RNA polymerase. This lowers the rate of transcription and polycistronic mRNA production is decreased for the lacZ, lacY, and lacA genes. The absence of these proteins reduces glucose production from lactose, leading to the use of the available glucose prior to the use of any lactose.

The interaction of CAP with DNA and with cAMP directly regulates the production of mRNA. Interactions between regulatory proteins and regulatory regions of the DNA mediate catabolite repression in operons associated with carbon source utilization in prokaryotes.
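Combining the repressor with CAP-cAMP gives a small truth table for the lac operon. The sketch below is qualitative only: the three output levels (“off”, “low”, “high”) and the function name are hypothetical simplifications, and “off” stands for the very low basal level seen when the repressor occupies the operator.

```python
def lac_transcription(lactose: bool, glucose: bool) -> str:
    """Qualitative lac operon output: 'off', 'low', or 'high'."""
    # Negative control: the repressor leaves the operator only when inducer is present.
    operator_free = lactose
    # Positive control: low glucose -> high cAMP -> CAP-cAMP helps RNA polymerase bind.
    cap_camp_bound = not glucose
    if not operator_free:
        return "off"
    return "high" if cap_camp_bound else "low"


for lactose in (False, True):
    for glucose in (False, True):
        print("lactose:", lactose, "glucose:", glucose, "->",
              lac_transcription(lactose, glucose))
```

Only the combination “lactose present, glucose absent” gives high transcription, which is the catabolite repression behavior: glucose is used first, and lactose is used once glucose runs out.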

In anabolic operons (typical of amino acid synthesis), an additional regulatory mechanism referred to as attenuation has been documented. The most commonly considered example involves the trp operon discussed above.

Figure 3.28. Diagram showing the major effects of low glucose (upper panel) and high glucose (lower panel) on the expression of lac operon genes. © 2013 Nature Education Adapted from Pierce, Benjamin. Genetics: A Conceptual Approach, 2nd ed. (New York: W. H. Freeman and Company), 446. All rights reserved.

The leader sequence in the polycistronic mRNA of the trp operon contains several trp codons and can form three different stem-loop structures. Depending on the amount of available tryptophan, one of two overall structures is produced (Figure 3.29). One structure, formed when trp is abundant, terminates transcription within the leader sequence, while the other does not terminate transcription, and the full polycistronic mRNA is produced.
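How the tryptophan level selects between the two structures is not spelled out in this section. The sketch below follows the standard textbook model (stated here as background, not taken from this section): a ribosome translating the leader peptide stalls at its Trp codons when charged tRNA-Trp is scarce, which lets region 2 pair with region 3 and prevents the 3:4 terminator from forming. The function names and the short leader sequence are hypothetical.

```python
def count_trp_codons(leader_orf: str) -> int:
    """Count in-frame UGG (Trp) codons in an RNA reading frame starting at AUG."""
    leader_orf = leader_orf.upper()
    return sum(leader_orf[i:i + 3] == "UGG" for i in range(0, len(leader_orf) - 2, 3))


def trp_attenuation(tryptophan_abundant: bool) -> dict[str, str]:
    """Qualitative outcome of trp leader folding under the ribosome-stalling model."""
    if tryptophan_abundant:
        # The ribosome passes the Trp codons without stalling, so region 3
        # is free to pair with region 4 and form the terminator hairpin.
        return {"pairing": "3:4 (terminator)",
                "outcome": "transcription terminates in the leader"}
    # The ribosome stalls at the Trp codons; region 2 pairs with region 3,
    # so the 3:4 terminator cannot form.
    return {"pairing": "2:3 (antiterminator)",
            "outcome": "full polycistronic mRNA is produced"}


leader = "AUGAAAUGGUGGCGCUAA"  # hypothetical leader with two adjacent Trp codons
print(count_trp_codons(leader))        # 2
print(trp_attenuation(False)["outcome"])
```

Placing more than one Trp codon in the leader makes the ribosome's progress sensitive to tryptophan availability, which is what couples the mRNA's folding to the cell's need for tryptophan.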

Several amino acid synthetic operons (e.g. phenylalanine, histidine, leucine, threonine, and isoleucine-valine) show this same type of attenuation, so this mechanism is relatively widespread as a means of modulating and fine-tuning pathways for amino acid biosynthesis.

3.5.6. Transcription in Eukaryotes

Although transcription in Eukaryotes follows the general principles outlined above for Prokaryotes, many specific details are different. Recall that there are as many as five Eukaryotic RNA polymerases. While each of these transcribes different types of RNA, they are all multisubunit RNA polymerases that function in related ways. The mechanism of RNA polymerase II, which produces mRNAs, is described here, but all five have similar mechanisms for initiation, elongation, and termination of transcription.

Eukaryotic mRNAs are nearly always monocistronic mRNAs with a general structure as shown in Figure 3.30. The key transcribed features are a 5’-UTR (untranslated region), a coding region, and 3’-UTR.

Figure 3.29. Attenuation of the trp operon. The diagram at the center shows the general folding of the leader sequence of the trp polycistronic mRNA and labeling of strands. The mRNA is folded in four parallel strands connected at the bottom by two small hairpin loops between strands 1 and 2 and strands 3 and 4 and by one large hairpin loop at the top between strands 2 and 3. In the structure on the left, strands 1 and 2 and strands 3 and 4 are stabilized by base pairing. This structure terminates transcription of the trp operon in the presence of high tryptophan. In contrast, strands 2 and 3 are stabilized by base pairing in the structure on the right, which allows transcription of the trp operon to continue in the presence of low tryptophan. © 1981 Nature Publishing Group Yanofsky, C. Attenuation in the control of expression of bacterial operons. Nature 289, 753 (1981). All rights reserved.


CONCEPTS OF GENOMIC BIOLOGY Page 33

Other features typical of mRNAs in Eukaryotic cells, which are not encoded in the gene but added during processing, include a 5’-Cap structure and a poly-A tail; both are described in more detail below.

Promoters in many Eukaryotes have been analyzed either with directed mutations within promoter sequences or by comparative analysis of multiple genes from different organisms. These studies have revealed two types of elements in Eukaryotic promoters: core promoter elements and promoter-proximal elements.

Core promoter elements are located near the transcription start site and specify where transcription begins. Examples include:

1) The initiator element (Inr), a pyrimidine-rich sequence that spans the transcription start site;

2) The TATA box (also known as a TATA element or Goldberg–Hogness box), located at about -30 (consensus sequence TATAAAA). This element aids in local DNA denaturation and sets the start point for transcription.
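A core promoter element such as the TATA box can be located computationally by scanning the DNA upstream of the start site. The sketch below is a deliberately simple search. It looks for an exact match to the TATAAAA consensus and keeps only hits in a window around -30. The window of -40 to -20 is an assumption, and exact matching is a simplification, since real TATA boxes often deviate from the consensus. Positions follow the usual convention that the base just upstream of the start site is -1.

```python
def find_tata_box(upstream: str, consensus: str = "TATAAAA",
                  window: tuple[int, int] = (-40, -20)) -> list[int]:
    """Return positions (relative to the start site) where the consensus begins.

    `upstream` is the DNA immediately 5' of the transcription start site,
    so its last base is position -1 (there is no position 0).
    """
    n = len(upstream)
    k = len(consensus)
    hits = []
    for i in range(n - k + 1):
        if upstream[i:i + k] == consensus:
            pos = i - n  # index n-1 maps to -1, index 0 maps to -n
            if window[0] <= pos <= window[1]:
                hits.append(pos)
    return hits


# Hypothetical 40-nt upstream region with TATAAAA starting at -31.
print(find_tata_box("G" * 9 + "TATAAAA" + "G" * 24))
```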

Promoter-proximal elements are required for high levels of transcription. They are further upstream from the start site, at positions between -50 and -200. These elements generally function in either orientation. Examples include:

1) The CAAT box, located at about -75.

2) The GC box, consensus sequence GGGCGG, located at about -90.
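Because promoter-proximal elements generally function in either orientation, a search for them should check both DNA strands. The sketch below scans the -200 to -50 region for the GC box (GGGCGG) and its reverse complement. For the CAAT box it uses CCAAT, a commonly cited core motif. That choice is an assumption, since the text gives only the element's name and position. `find_proximal_elements` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# GGGCGG is from the text; CCAAT as the CAAT-box core is an assumption.
PROXIMAL_MOTIFS = {"GC box": "GGGCGG", "CAAT box": "CCAAT"}


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_proximal_elements(upstream: str, lo: int = -200, hi: int = -50):
    """Scan both strands for proximal elements between lo and hi.

    `upstream` ends at position -1 (the base just 5' of the start site).
    Returns (name, position, strand) tuples sorted by position, where the
    position is the leftmost base of the match in top-strand coordinates.
    """
    n = len(upstream)
    hits = []
    for name, motif in PROXIMAL_MOTIFS.items():
        # Check the motif as written ("+") and its reverse complement ("-").
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            k = len(pattern)
            for i in range(n - k + 1):
                pos = i - n
                if lo <= pos <= hi and upstream[i:i + k] == pattern:
                    hits.append((name, pos, strand))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 200-nt region: GC box at -90 and CAAT box at -75.
region = "T" * 110 + "GGGCGG" + "T" * 9 + "CCAAT" + "T" * 70
print(find_proximal_elements(region))
```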

Various combinations of core and proximal elements are found near different genes. Promoter-proximal elements are key to understanding the rate at which transcription initiation occurs and thus the level of gene expression.

[Figure 3.30 diagram: DNA (upstream enhancers, promoter with TATA box, 5’ UTR, exons 1 to 3 separated by introns 1 and 2, 3’ UTR) → gene transcription by RNA polymerase II → primary transcript → nuclear processing: 5’ capping (G7-Me) and poly-A tail addition → pre-mRNA → nuclear processing: intron removal and transport to the cytoplasm → final mRNA (5’ cap, 5’ UTR, protein-coding region, 3’ UTR, 3’ poly-A tail)]

Figure 3.30. Diagram showing the elements and structure of a typical eukaryotic mRNA-producing gene. Note that a primary transcript is produced and subsequently modified by the addition of a 7-methylguanosine cap and a poly-A tail. Introns are then spliced from the transcript to make a finished mRNA ready to exit the nucleus.


Eukaryotic transcription initiation requires assembly of RNA polymerase II and general transcription factors (GTFs) on the core promoter at the TATA box (see Figure 3.31), forming a preinitiation complex (PIC). Note that the PIC is sometimes referred to simply as the transcription initiation complex. GTFs are needed for initiation by all RNA polymerases; they are numbered to match their corresponding RNA polymerase and lettered in the order of discovery (e.g., TFIID was the fourth GTF discovered that works with RNA polymerase II). The general transcription factors, along with other proteins that form specific PICs at a particular promoter, poise RNA polymerase to begin transcribing the gene downstream of the promoter.
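The GTF naming convention described above can be read mechanically: the Roman numeral identifies the RNA polymerase, and the letter gives its rank in the alphabet. The toy parser below, `parse_gtf_name` (a hypothetical helper), applies that rule literally and only to names of the form TF plus a numeral plus a single letter.

```python
import re

ROMAN = {"I": 1, "II": 2, "III": 3}


def parse_gtf_name(name: str) -> tuple[int, int]:
    """Split a GTF name such as 'TFIID' into (RNA polymerase number, letter rank).

    The letter rank is the letter's position in the alphabet (A=1, B=2, ...),
    which, per the naming rule in the text, reflects the order of discovery.
    """
    match = re.fullmatch(r"TF(I{1,3})([A-Z])", name)
    if match is None:
        raise ValueError(f"not a GTF name: {name!r}")
    numeral, letter = match.groups()
    return ROMAN[numeral], ord(letter) - ord("A") + 1


print(parse_gtf_name("TFIID"))  # polymerase II, fourth letter
```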

Once the PIC forms, RNA polymerase will initiate transcription. However, the rate at which transcription initiation occurs at a particular gene depends on two factors. The first factor is the number and types of enhancer/silencer sequence elements associated with the gene. These sequence elements can range from 50 nt to over 1,000 nt in length. Enhancer/silencer elements must be located in cis (meaning on the same DNA molecule) relative to the promoter/coding sequence in order to affect the expression of a gene. Some enhancer/silencer sequences have been found as much as 1 megabase (1,000,000 nt) away from the transcription start site, but most are within a few thousand bases or less of the start site.

The second factor regulating the rate of transcription initiation is the set of proteins that bind to specific enhancer or silencer sequence elements. Activators are proteins that bind to enhancer sequences. Activator proteins also contain protein-protein interaction domains that allow them to bind to and affect the behavior of other proteins. These other proteins could be RNA polymerase itself, other general transcription factors in the PIC, or adapter proteins that interact with the PIC (see Figure 3.32).

Figure 3.31. Eukaryotic transcription begins with the formation of a transcription preinitiation complex (PIC) on the TATA box in the promoter of the gene. The PIC is a large complex of proteins that is necessary for the transcription of protein-coding genes in eukaryotes. The preinitiation complex helps position RNA polymerase II over gene transcription start sites, denatures the DNA, and positions the DNA in the RNA polymerase II active site for transcription. The minimal PIC includes RNA polymerase II and six general transcription factors: TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. Additional regulatory complexes (co-activators and chromatin-remodeling complexes) could also be components of the PIC.

Repressor proteins can bind to either silencer or enhancer sequence elements in the promoter. In so doing, they counteract activator proteins, either by interfering with the critical protein-protein interactions of activators or by binding tightly to enhancer sequences and keeping activators from binding.

Thus, activator and repressor proteins are important in transcription regulation. They recognize promoter-proximal elements and other enhancer/silencer sequence elements found upstream of the promoter, and each is specific for a group of similarly regulated genes. These proteins mediate the rate of transcription initiation for genes that contain the sequence elements they recognize. The presence or absence of specific activator and repressor proteins in a cell, whether because of cell type or because of environmental factors, can therefore control the initiation of transcription. For example, housekeeping genes (used in all cell types for basic cellular functions) share common promoter-proximal elements that are recognized by activator proteins found in all cells. Examples of genes with housekeeping functions include actin, hexokinase, and glucose-6-phosphate dehydrogenase.

Genes expressed only in some cell types or at particular times have promoter-proximal elements recognized by activator proteins found only in specific cell types or times. Enhancers are another cis-acting element. They are required for maximal transcription of a gene.

Figure 3.32. An activator protein binding to a promoter-proximal enhancer sequence and interacting with an adapter protein and the PIC to enhance transcription initiation.


Enhancers/silencers are usually upstream of the transcription initiation site but may also be downstream. They may modulate from a distance of thousands of base pairs away from the initiation site.

Because coordinately regulated genes share similar enhancer and silencer sequences, while each gene promoter has its own unique spectrum of such sequences, Eukaryotic cells do not need to organize coordinately regulated genes contiguously into operons, as is common in prokaryotes.

Additionally, each tissue produces a set of tissue-specific and general activator and repressor proteins, and the spectrum of these proteins can be influenced by environmental factors such as cellular surroundings, temperature, and chemical environment. This allows each cell to “customize” gene expression according to the protein functions it requires, based on its cell type and cellular environment. This phenomenon is referred to as combinatorial gene regulation and is illustrated in Figure 3.33.
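Combinatorial regulation can be illustrated with a toy scoring rule. In the sketch below, each gene carries a set of enhancer and silencer elements, and each cell type supplies a set of activators and repressors. A gene's score is the number of its enhancers bound by activators present, or zero if a repressor present binds one of its silencers. The gene names, element labels, and the scoring rule itself are hypothetical simplifications. Real transcription rates are not simple counts, and repressors that bind enhancers are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    # Hypothetical labels for the cis-acting elements near this gene.
    enhancers: set[str] = field(default_factory=set)
    silencers: set[str] = field(default_factory=set)


def transcription_level(gene: Gene, activators: set[str],
                        repressors: set[str]) -> int:
    """Toy score for one gene in one cell type.

    Each activator or repressor is identified by the element it binds.
    Returns 0 if any silencer is bound by a repressor present in the cell;
    otherwise returns the number of enhancers bound by activators present.
    """
    if gene.silencers & repressors:
        return 0
    return len(gene.enhancers & activators)


# Hypothetical genes and cell types: (activators present, repressors present).
genes = [
    Gene("housekeeping", enhancers={"E_common"}),
    Gene("tissue_specific", enhancers={"E_common", "E_liver"},
         silencers={"S_neural"}),
]
cells = {
    "liver": ({"E_common", "E_liver"}, set()),
    "neuron": ({"E_common"}, {"S_neural"}),
}
for cell, (acts, reps) in cells.items():
    print(cell, {g.name: transcription_level(g, acts, reps) for g in genes})
```

The housekeeping gene scores the same in both cell types, while the tissue-specific gene depends on which regulators each cell supplies.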

Once transcription initiation has occurred, RNA polymerase moves away from the TATA box as the transcript is elongated. Elongation is fundamentally the work of RNA polymerase; the other proteins of the PIC now leave the complex and are recycled to form new PICs, while RNA polymerase extends the nucleotide chain to complete the primary transcript.

3.5.7. Processing the primary transcript into a mature mRNA

As shown in Figure 3.30, the primary transcript must be processed in three significant ways to become a mature mRNA. All of this processing takes place in the nucleus and prepares the mRNA for transport to the cytosol, where it will subsequently be translated to produce a protein.

Figure 3.33. Combinatorial gene regulation leads to the coordinate regulation of batteries of genes in Eukaryotes. The types of enhancer and silencer sequences in front of each gene determine the level of transcription of each gene based on the activator and repressor proteins present in each cell/tissue type and the environment surrounding each cell.

First, the primary transcript must acquire a cap at its 5’-end. The cap prepares the transcript for transport from the nucleus, protects it from attack by exonucleases in the cytoplasm, and aids in the initiation of translation. Structurally, a cap (Figure 3.34) consists of a 7-methyl guanosine attached by 3 phosphate groups to the 5’-end of the transcript. Note that the cap is reversed relative to the RNA strand; i.e., it is attached 5’ to 5’, not 5’ to 3’ like the other nucleotides in the transcript. The cap can be attached during transcription, before the primary transcript is complete; because it is critical for efficient transport of the mRNA from the nucleus, it must be attached in the nucleus.

The second processing step occurs at the 3’-end of the transcript (Figure 3.35) and is coupled to the termination of transcription by RNA polymerase II. Note that other eukaryotic RNA polymerases may use other mechanisms of transcription termination, since they do not produce polyadenylated transcripts.

The process for addition of the poly-A tail involves a complex of proteins that assembles at a poly-A addition consensus sequence (AAUAAA). The proteins involved in the cleavage step of the termination process include:

1) CPSF (cleavage and polyadenylation specificity factor).

2) CstF (cleavage stimulation factor).

3) Two cleavage factor proteins (CFI and CFII).

Following cleavage, the enzyme poly(A) polymerase (PAP) adds A nucleotides to the 3’ end of the cleaved transcript, using ATP as a substrate. PAP is bound to CPSF during this process. Typically, about 200-250 A’s are added. PABII (poly-A binding protein II) binds the poly-A tail as it is produced. Upon completion of the poly-A tail, transcription is terminated and the pre-mRNA transcript is released from the protein complex.

Figure 3.34. Structure of the 5’-Cap added to Eukaryotic primary RNA transcripts. The cap consists of a 7-methyl guanosine residue attached 5’ to 5’ at the 5’ end of the transcript through 3 phosphate groups (a 5’-5’ triphosphate linkage).
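The two-step 3’-end processing can be sketched on a sequence string. The function below assumes, hypothetically, a fixed cleavage site 20 nt downstream of the AAUAAA signal (cleavage is commonly described as occurring roughly 10-30 nt downstream) and a default tail length of 200 A’s, within the range given above. In the cell, the cleavage site is chosen by the CPSF/CstF complex rather than by a fixed offset.

```python
POLYA_SIGNAL = "AAUAAA"


def cleave_and_polyadenylate(transcript: str, offset: int = 20,
                             tail_length: int = 200) -> str:
    """Toy model of 3'-end processing of a growing primary transcript.

    offset: hypothetical distance (nt) from the end of AAUAAA to the cut.
    tail_length: number of A's added by poly(A) polymerase (PAP).
    """
    signal = transcript.find(POLYA_SIGNAL)
    if signal == -1:
        raise ValueError("no AAUAAA poly-A signal found")
    cut = signal + len(POLYA_SIGNAL) + offset
    if cut > len(transcript):
        raise ValueError("transcript ends before the cleavage site")
    # Step A: cleavage discards everything 3' of the cut.
    # Step B: PAP adds A's to the new 3' end.
    return transcript[:cut] + "A" * tail_length


# Hypothetical transcript; a short tail keeps the output readable.
rna = "AUGGCC" + "AAUAAA" + "G" * 20 + "CUUUU" * 4
print(cleave_and_polyadenylate(rna, tail_length=5))
```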

The third step in producing a mature mRNA from a pre-mRNA is the removal of sequences that are present in the gene and the pre-mRNA but absent from the mature mRNA found in the cytoplasm. These removed sequences are called introns. The parts of the pre-mRNA that remain in the mature mRNA are called exons (see Figure 3.30).

The removal of introns from the primary transcript is called splicing, and it typically involves a large ribonucleoprotein (RNP) complex called a spliceosome.

Spliceosomes are assembled from small nuclear ribonucleoprotein particles (snRNPs) associated with pre-mRNAs. The snRNAs discussed previously are structural parts of spliceosome snRNPs. The principal snRNAs involved are U1, U2, U4, U5, and U6. Each of these snRNAs is associated with several proteins; U4 and U6 are part of the same snRNP, whereas the others are each in their own snRNP. Each snRNP type is abundant (~10⁵ copies per nucleus), consistent with the critical role that these snRNPs play in nuclear processes.

Figure 3.35. The addition of a poly-A tail to the transcript terminates transcription of the pre-mRNA. The process involves 2 steps: A) cleavage of the growing primary transcript by a complex of proteins that recognize a poly-A addition signal in the transcript; B) addition of 200-300 A’s to the 3’ end of the transcript by poly(A) polymerase (PAP).

The steps of RNA splicing are outlined in Figure 3.36:

1) U1 snRNP binds the 5’ splice junction of the intron, as a result of base-pairing of the U1 snRNA to the intron RNA.

2) U2 snRNP binds by base pairing to the branch-point sequence upstream of the 3’ splice junction.

3) U4/U6 and U5 snRNPs interact and then bind the U1 and U2 snRNPs, creating a loop in the intron.

4) U4 snRNP dissociates from the complex, forming the active spliceosome.

5) The spliceosome cleaves the intron at the 5’ splice junction, freeing it from exon 1. The free 5’ end of the intron bonds to a specific nucleotide (usually A) in the branch-point sequence to form an RNA lariat.

6) The spliceosome cleaves the intron at the 3’ junction, liberating the intron lariat. Exons 1 and 2 are ligated, and the snRNPs are released.
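Steps 5 and 6 can be sketched as string operations on a pre-mRNA containing one intron. The function below is a toy model. The caller supplies the intron boundaries and the branch-point A, and the function checks that the intron begins with GU and ends with AG (the widely cited GU-AG rule, an assumption here because some introns use other boundaries). It returns the ligated exons and the excised intron, represented as a linear string rather than a true lariat. `splice_one_intron` and `SplicingResult` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SplicingResult:
    mrna: str          # exon 1 ligated to exon 2
    lariat: str        # excised intron, stored as a linear string
    branch_index: int  # position of the branch-point A within the intron


def splice_one_intron(pre_mrna: str, intron_start: int, intron_end: int,
                      branch_point: int) -> SplicingResult:
    """Remove the intron pre_mrna[intron_start:intron_end] (toy model).

    branch_point is an index into pre_mrna and must be an A inside the intron.
    """
    intron = pre_mrna[intron_start:intron_end]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron does not follow the GU...AG rule")
    if not intron_start < branch_point < intron_end or pre_mrna[branch_point] != "A":
        raise ValueError("branch point must be an A inside the intron")
    # Step 5: cut at the 5' splice junction; the intron's free 5' end is
    # joined to the branch-point A, forming the lariat.
    # Step 6: cut at the 3' splice junction releases the lariat, and
    # exon 1 is ligated to exon 2.
    return SplicingResult(
        mrna=pre_mrna[:intron_start] + pre_mrna[intron_end:],
        lariat=intron,
        branch_index=branch_point - intron_start,
    )


# Hypothetical pre-mRNA: exon 1 + 17-nt intron + exon 2.
pre = "AUGGCC" + "GUAAGUCUAACUUUCAG" + "GCUUAA"
print(splice_one_intron(pre, 6, 23, 15))
```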

One of the most interesting aspects of intron splicing is that different transcripts can be created depending on how introns are spliced. This process, called alternative splicing, can be used to produce different polypeptides from the same gene, as shown in Figures 3.37 and 3.38.
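One form of alternative splicing, exon inclusion or skipping, is easy to enumerate. The sketch below treats selected exons as optional cassette exons and generates every combination of including or skipping them, while always keeping constitutive exons. It is a simplification: other patterns shown in Figure 3.37 (alternative splice-site selection, mutually exclusive exons, intron retention) are not modeled, and not every combination is necessarily produced in a cell. `splice_isoforms` is a hypothetical helper.

```python
from itertools import product


def splice_isoforms(exons: list[str], optional: set[int]) -> list[str]:
    """Enumerate mRNAs produced by including or skipping optional exons.

    exons: exon sequences in 5' to 3' order.
    optional: indexes of cassette exons that may be included or skipped;
    every other exon is constitutive and always included.
    """
    opt = sorted(optional)
    isoforms = []
    # Each tuple of True/False values is one include/skip decision per cassette exon.
    for choices in product((True, False), repeat=len(opt)):
        keep = dict(zip(opt, choices))
        isoforms.append("".join(
            exon for i, exon in enumerate(exons) if keep.get(i, True)))
    return isoforms


# Hypothetical four-exon gene with two cassette exons (indexes 1 and 2).
print(splice_isoforms(["E1", "E2", "E3", "E4"], {1, 2}))
# ['E1E2E3E4', 'E1E2E4', 'E1E3E4', 'E1E4']
```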

Figure 3.36. The process of intron splicing conducted by U2-dependent spliceosomes. Note that there are other types of spliceosomes, and that a few introns are spliced independently of spliceosomes. The binding of at least 5 RNP complexes containing snRNAs and proteins ultimately produces a structure that holds the cleaved ends of the transcript together while the intron is spliced out as a “lariat” structure. The exon ends of the transcript are then ligated together, producing a mature mRNA with the intron removed from the sequence.


From the above discussion it is clear that processing of a mature mRNA from the primary RNA transcript, and transport of the mature mRNA from the nucleus to the cytoplasm, are steps that can influence the amount of translatable mRNA for a particular protein in a cell. The details of the steps we have discussed emerged from a series of original molecular genetic studies and have been greatly extended more recently by functional genomic studies that we will investigate further in subsequent chapters.

Figure 3.37. A schematic representation of alternative splicing. The figure illustrates different types of alternative splicing: exon inclusion or skipping, alternative splice-site selection, mutually exclusive exons, and intron retention. For an individual pre-mRNA, different alternative exons often show different types of alternative-splicing patterns. © 2002 Nature Publishing Group Cartegni, L., Chew, S. L., & Krainer, A. R. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nature Reviews Genetics 3, 285–298 (2002). All rights reserved.

Figure 3.38. Alternative splicing of one primary transcript to produce three different proteins.