7/30/2019 Essentials of Molecular Genetics.pdf
1/47
ESSENTIALS OF MOLECULAR GENETICS
Prepared by Faculty of the Albert Einstein College of Medicine
(September, 1993; revised September, 2002)
CONTENTS
What Is Molecular Genetics?
Classical Genetics and the Definition of the Gene
Classical Genetics Defines the Gene by the Study of Mutations
Mutations Can Be Dominant Or Recessive
The Complementation Test Identifies the Gene as a Unit of Activity
A Complementation Test Sometimes Gives the "Wrong" Answer
Transmission Genetics
Classical Genetics Defined the Rules Governing Genetic Transmission
Cytologists Discovered the Cellular Structures That Contained the Genes
Genetic Recombination between Genes in Single Linkage Groups Results from Exchange of Material between Homologous Chromosomes
The Frequency of Genetic Recombination Can Be Used to Map Genes on Chromosomes
Construction of a Genetic Map Is an Important Step in the Definition of Genes
Organisms Being Studied Today
Genetic Mapping Techniques in Various Organisms
The Physical Characteristics of Genomes
Genomes Consist of DNA Molecules, and Vary Widely in Size
Bacterial Genomes Contain Some 4300 Genes, Higher Organisms May Have As Many As 30,000 or More
Genome Projects
Current Methods Make the Sequencing of Whole Genomes Possible
Construction of Physical Maps: Overlapping Clones
The Physical Map Is Correlated with the Genetic Map
Eukaryotic Genomes Contain a Large Amount of Repetitive DNA
There Are Several Kinds of Repeated Sequences
Maintenance and Transmission of the Genetic Material
Special Sequences Control the Replication and Transmission of the Genetic Material
Enzymatic Mechanisms Repair DNA Damage and Recombine the DNA Strands
Recombinant DNA and the Construction of Transgenic Organisms
Genes May Be Amplified in Pure Form by "Cloning" Them in Microorganisms
The Polymerase Chain Reaction (PCR) Is a Way to "Clone" DNA Directly In Vitro
Genes Are Cloned by Isolating Them from Clone Libraries or Clone Banks
A Variety of Vectors Provide a Range of Options for the Generation of a Clone Library
Clone Libraries May Be Screened in a Number of Ways
Constructing Transgenic Organisms
Basic Elements of Bacterial Genetics
The Genetics of Bacteria Has Several Unique Features
Bacterial Cells Exchange Genetic Material in a Process Known as Conjugation
The Bacterial Genetic Map Is Defined by the Time of Transfer During Conjugation
The Bacterial Genetic Map and the Bacterial Chromosome Are Circular
The F Plasmid Encodes Genetic Functions Required for Transfer of DNA
Integration of the F Plasmid into the Bacterial Chromosome Can Result in Mobilization of the Chromosome for Transfer
Plasmids Can Be Used to Construct Partially-Diploid Bacterial Strains
Plasmids Play an Important Role in the Transmission of Drug Resistance
In Transformation, Bacterial Cells Take Up DNA Directly
Bacterial Viruses Play a Role in Genetic Exchange Between Bacteria
Study of Bacteriophages Has Played a Central Role in the Development of Molecular Biology
Bacterial Viruses May Kill the Host Cell or Coexist with It
Inferring Wild Type Gene Function from Mutant Phenotype
To Infer Wild Type Gene Function, It Is First Necessary to Determine How the Mutation Affects Gene Activity
Types of Mutations Are Defined by Structure and by Effects on Gene Activity
Rare Spontaneous Mutations Are of All Types
Chemical Mutagens Tend to Induce Point Mutations, Radiation Tends to Produce Rearrangements
Null Mutations Are Important in the Determination of the Biological Process in which a Gene Participates
In Some Organisms the Null Phenotype Is Best Determined by Gene Knockout
Null Mutations Can Be Identified As Mutations That Behave Genetically Like a Deficiency of the Gene
Null Mutations Have Several Characteristics That Distinguish Them from Non-Null Mutations
New Null Alleles May Be Isolated by a Non-Complementation Screen
Hypomorphic Mutations Lower But Do Not Eliminate Gene Activity
Gene Activity Is Raised by Hypermorphic Mutations
Antimorphic Mutations Produce a Poison Gene Product
Neomorphic Mutations Result in a Novel Gene Activity
A Gain-of-function Mutant Phenotype May Be Eliminated by Introducing a Loss-of-function Mutation at the Same Locus
Determining the Time and Place of Gene Action
The Time and Location of Gene Expression Can Be Determined by a Number of Biochemical Means
Reporter Genes Provide a Sensitive and Versatile Assay of Gene Expression
Gene Knockout Frequently Reveals That a Gene's Activity Is Not Required Everywhere It Is Expressed
The Tissue Where Gene Activity Is Required May Be Determined by Mosaic Analysis
Gene Product Synthesis and Gene Product Action Need Not Take Place in the Same Generation
Parental Effects May Be Identified by Genetic Tests
Temperature-Sensitive Mutations Can Be Used to Determine the Time of Gene Action
Analyzing Complex Processes by Genetics
Genetic Analysis Allows the Probing of Complex Biological Processes Involving Multiple Genes
Some Genes Involved in a Biological Process May Be Identified As Genetic Modifiers
Information About the Order of Gene Action in a Pathway Can Be Obtained by Epistasis Analysis
What Is Molecular Genetics?
Molecular genetics is an approach to understanding the functions of genes. It combines classical genetic analysis with molecular biology to probe the nature of both gene action and gene transmission. The essential characteristic of molecular genetics is that gene products are studied through the genes that encode them. This contrasts with a biochemical approach, in which the gene products themselves are purified and their activities studied in vitro.
All aspects of cell and organismal structure and function are potentially amenable to a molecular genetic approach. Because genes are similar in all organisms, this approach has many essential aspects in common whether the organism being studied is a bacterium, a fungus, or a mammal. The purpose of this booklet is to define and describe these common aspects, and to point out how they are applied in practice in the diverse organisms that are being studied today.
Gene cloning, that is, the isolation of a gene so that its nucleotide sequence may be determined, is central to molecular genetics. Genes identified through a classical genetic analysis of mutations may be cloned to ascertain the structure of the gene product and to permit biochemical studies of gene activity. Alternatively, genes may be defined first by the biochemical identification of their gene product. In this case gene cloning allows the isolation and study of mutant forms. In either approach, whether starting with a mutation or with a cloned gene, the techniques of classical genetic analysis are used to draw conclusions about gene function from the phenotype of mutations.
In addition to gene function, molecular genetics is also concerned with the transmission of the genetic material. Genes are carried by chromosomes, whose function is to maintain the integrity of each cell's complement of genetic information through cell division, and from one generation to the next. Chromosomes contain specialized sequences whose function is to control chromosome replication, recombination, and distribution to daughter cells. The understanding of such sequences can also be approached by cloning, sequencing, and the identification of mutations.
A long-term goal of molecular genetics is to understand gene function in the context of the life, development, and reproduction of the individual, as well as the evolution of the species.
Classical Genetics and the Definition of the Gene
Classical Genetics Defines the Gene by the Study of Mutations
Long before it was known that genes consisted of strings of nucleotides that determined the structure of proteins, it was possible to infer their existence and many of their properties. Different forms of genes, called alleles or mutations, were recognized by their effects on the phenotype of the organism, that is, the organism's form and function. The complete set of allelic forms of an organism's genes is termed its genotype.
Classical genetic studies involving crosses between organisms with differing genotypes and phenotypes, beginning with Mendel, revealed that higher plants and animals are diploid, that is, they have two copies of each gene, one derived from each parent. Gametes, on the other hand, as well as the genomes of some higher organisms and most prokaryotes, have only one copy of each gene and are said to be haploid. With respect to a particular gene, a diploid organism is said to be homozygous if both
copies of the gene are the same, and heterozygous if two different allelic forms of the gene are present. A heterozygote is also known as a hybrid of the two parental forms. An otherwise diploid organism is said to be hemizygous for any gene present in only one copy, for example, genes on the X chromosome of Drosophila males.
Mutations Can Be Dominant Or Recessive
Since there are usually two copies of each gene per cell, it is possible to ask what will be the result if the two copies are different. Through the analysis of such heterozygotes, it has been possible to infer a great deal about the properties of genes and gene products. Consider two alleles of a single gene, a and b. Suppose the homozygote a/a has the phenotype A, and the homozygote b/b has the phenotype B. If the a/b heterozygote has the phenotype A, then a is said to be dominant with respect to b, and b is said to be recessive with respect to a. If A is the most common phenotype found in nature, then A is called the wild type, and a is the wild type allele. In this case, b would be considered a recessive mutant allele of the gene, whose mutant phenotype is observed only in the homozygote. However, the wild type need not be the dominant form, and it is possible to have mutant forms that are dominant over wild type. Another alternative is that the phenotype of a/b is a mixture of A and B characteristics, or is intermediate between A and B; for example, if A is "large" and B is "small", the phenotype of the a/b organism might be "medium sized", or if A is red and B is white, the phenotype of the a/b organism might be pink. In this case, each of the allelic forms is said to be incompletely dominant or semidominant with respect to the other. If the phenotypes characteristic of each allele are both expressed in the hybrid, then the two alleles are said to be codominant. This is the case, for example, with different allelic forms of blood group antigens.
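The definitions above can be restated as a small decision procedure. The sketch below is a hypothetical model rather than anything from this booklet: each phenotype is represented as a set of observable characteristics, a hybrid showing exactly the union of both parental sets is treated as codominant, and any other hybrid phenotype matching neither parent is treated as incompletely dominant.

```python
def classify_dominance(aa, bb, ab):
    """Describe the relationship of allele a to allele b.

    aa, bb, ab: phenotypes of the a/a, b/b and a/b organisms, each given as a
    set of observable characteristics (a simplifying, hypothetical model).
    """
    if aa == bb:
        return "homozygotes indistinguishable"
    if ab == aa:
        return "a dominant, b recessive"
    if ab == bb:
        return "b dominant, a recessive"
    if ab == aa | bb:
        # Characteristics of both alleles appear side by side in the hybrid.
        return "codominant"
    # Anything else, such as a blend or intermediate ("pink", "medium sized").
    return "incompletely dominant (semidominant)"


print(classify_dominance({"red"}, {"white"}, {"red"}))
print(classify_dominance({"red"}, {"white"}, {"pink"}))
print(classify_dominance({"A antigen"}, {"B antigen"}, {"A antigen", "B antigen"}))
```

The set representation is what lets codominance (both sets of characteristics present) be told apart from a blended, intermediate phenotype.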
The Complementation Test Identifies the Gene as a Unit of Activity
In addition to making it possible to determine whether one allelic form of a gene is dominant or recessive with respect to another, diploidy makes possible a fundamental genetic test to determine whether two mutations with the same or similar phenotypes are in the same gene: the complementation test. A determination of the number of genes involved is essential to begin unraveling the role of genes in a particular process. Suppose, for example, the genetic basis of fruit fly eye color is being studied. If wild type fruit fly eyes are red, and two mutant strains of flies have white eyes, it will be important to know whether the two mutations are in the same gene, or define two separate genes, both of which are necessary to make red eyes. It is by means of the complementation test that the gene as a unit of function is defined.
In a complementation test, an organism that is heterozygous in trans for two mutations with similar phenotypes is constructed by genetic crosses, and its phenotype is observed. Heterozygous in trans means that one mutant allele has been obtained from one parent, and the other mutant allele has been obtained from the other parent. It is necessary that both mutations be recessive, so that the phenotype of a heterozygote for each mutant allele singly is wild type. If the trans-heterozygote is also found to be wild type, then the two mutations are said to "complement" one another. If the trans-heterozygote is found to be mutant in phenotype, then the two mutations are said to "fail to complement" one another.
GENETIC NOMENCLATURE IN VARIOUS ORGANISMS
(^ marks a superscript)

              E. coli          yeast          C. elegans      Drosophila    mouse
Phenotype     Gal-, Lac+       Ade-, Cdc^ts   Dpy, Unc        white         agouti
Gene          galK, lacZ       ade2, cdc28    dpy-5           white         A
Allele
  Recessive   galK13, lacZ23   ade2-1         dpy-5(bx27)     w, w^a        a, a^b
  Dominant    same             ADE2-27        dpy-5(bx27d)    Ubx           A, A^y, A^vy
  Ts          same                            dpy-5(bx27ts)   w^ts
  Wild type   not written      ADE2           dpy-5(+)        w^+, Ubx^+    +, A^+
The table gives one or two examples of a gene or mutation name. Notice that among the differences in usage between the organisms, there are some consistencies: phenotypes are written non-italicized, usually as three letters with only the first letter capitalized. Gene names, alleles, and genotypes generally, on the other hand, are italicized. In several systems, capitals denote dominance and lowercase letters recessiveness.
How are these two different results to be interpreted? If the trans-heterozygote has a wild type phenotype, that is, if the mutations complement one another, this implies that the trans-heterozygote has all the genetic functions needed for expression of the wild type phenotype. In other words, the chromosomes from each mutant parent make up for the deficiency present on the chromosomes of the other. If one parent is mutant in, say, gene a, the second parent must carry a wild type copy of gene a. Since the mutation in a is recessive, this gives wild type gene a function. If the second parent has a wild type copy of gene a, its own mutation must be in a different gene from a. Evidently, the mutations carried by the two parents are in different genes.
The same kind of reasoning applies to non-complementation. In this case, neither parent makes up for the deficiency of the other; evidently they must be deficient in the same gene. Thus, the general interpretation of the complementation test is as follows: if two mutations complement, then they are likely to lie in different genes; if two mutations fail to complement, then they are likely to lie in the same gene.
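The reasoning above can be captured in a toy model, assuming that every mutation is a recessive loss-of-function allele and ignoring the exceptions discussed in the next section. The gene names `eye-1` and `eye-2` and the function names are hypothetical, chosen only for this sketch.

```python
# Toy model: each parent's chromosomes are described by the set of genes that
# carry a recessive loss-of-function mutation. Gene names are hypothetical.
REQUIRED_FOR_RED_EYES = {"eye-1", "eye-2"}


def trans_heterozygote_is_wild_type(mutant_genes_1, mutant_genes_2,
                                    required=REQUIRED_FOR_RED_EYES):
    """A recessive mutation is covered by one wild type copy, so the
    trans-heterozygote is mutant only if some required gene is mutant on
    the chromosomes from both parents."""
    return not (mutant_genes_1 & mutant_genes_2 & required)


def interpret_complementation(wild_type):
    if wild_type:
        return "complement: likely in different genes"
    return "fail to complement: likely in the same gene"


# Two white-eyed strains, first mutant in different genes, then in the same gene.
print(interpret_complementation(trans_heterozygote_is_wild_type({"eye-1"}, {"eye-2"})))
print(interpret_complementation(trans_heterozygote_is_wild_type({"eye-1"}, {"eye-1"})))
```

Because the model assumes one wild type copy always suffices and mutant products never interact, it cannot reproduce intergenic non-complementation or intragenic complementation.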
Note that a complementation test cannot be carried out with a dominant mutation. In order to determine the gene in which a dominant mutation lies, it is usually first necessary to isolate a recessive allele at the same locus. This is discussed further in a later section.
In diploid organisms the trans-heterozygote required by the complementation test is easily constructed by mating together two single mutant strains. However, there are other ways of determining the result of having multiple allelic forms in the same cell, including methods applicable to haploid organisms. For example, in bacteria a so-called merodiploid can be constructed by putting one copy of the gene being tested on a plasmid. Upon introduction of the plasmid, the organism becomes diploid over just the short segment of the chromosome carried by the plasmid. This technique is used in yeast as well. In both bacteria and yeast, complementation is useful in determining whether a cloned DNA segment carries the wild type copy of a mutated gene. If it does, the cloned DNA segment will complement the mutation when the DNA segment is introduced into the cell; this is often termed "complementation rescue". Complementation rescue is also used to identify wild type genes in C. elegans, into which DNA may be introduced by microinjection.
A Complementation Test Sometimes Gives the "Wrong" Answer
Although the reasoning used above to interpret the complementation test is valid for the majority of cases, it is not universally applicable. In some instances, the trans-heterozygote may have a mutant phenotype even though the two mutations being tested are in different genes. This is called second-site non-complementation, or intergenic non-complementation (the terms are equivalent). It can occur because of a cumulative effect, in the trans-heterozygote, of having only one wild type copy of each of two genes, or of having two mutant alleles, even though mutations in each of the two genes are recessive when heterozygous singly.
Likewise, in some instances the trans-heterozygote may have a wild type phenotype even though the two mutations are in the same gene. This is known as intragenic complementation. It comes about if each of the two mutant genes produces a mutant gene product (as opposed to no gene product), and the two mutant gene products, when present in the same cell, can each supply the deficiency or remedy the defect of the other. Acting together, the two mutant gene products provide wild type gene function. Because of the possibility of intergenic non-complementation and intragenic complementation, the complementation test is always combined with genetic mapping to provide a less ambiguous determination of whether two mutations define one or two genes.
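One way to picture the cumulative effect behind second-site non-complementation is a simple dosage model. Everything in this sketch is an assumption made for illustration, not a mechanism given in the text: each gene's contribution is proportional to its number of wild type copies, the two contributions multiply because both products act in the same process, and `THRESHOLD` is an invented cutoff for a wild type phenotype.

```python
# Hypothetical dosage model of second-site (intergenic) non-complementation.
THRESHOLD = 0.4  # assumed minimum pathway output for a wild type phenotype


def pathway_output(wt_copies_gene1, wt_copies_gene2):
    # Each gene contributes in proportion to its wild type copies (0, 1 or 2
    # of 2); contributions multiply because both products are assumed to be
    # needed in the same process.
    return (wt_copies_gene1 / 2) * (wt_copies_gene2 / 2)


def phenotype(wt_copies_gene1, wt_copies_gene2):
    output = pathway_output(wt_copies_gene1, wt_copies_gene2)
    return "wild type" if output >= THRESHOLD else "mutant"


print(phenotype(1, 2))  # heterozygous for a gene-1 mutation only
print(phenotype(2, 1))  # heterozygous for a gene-2 mutation only
print(phenotype(1, 1))  # trans-heterozygote for both mutations
```

Each single heterozygote stays above the cutoff (output 0.5), so both mutations look recessive, yet the trans-heterozygote falls to 0.25 and appears mutant even though the mutations are in different genes.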
Transmission Genetics
Classical Genetics Defined the Rules Governing Genetic Transmission
When Mendel, and later Morgan and other geneticists, discovered that there were genetic entities termed genes that could mutate to different forms, they also discovered how those genes were transmitted from generation to generation.
Mendel realized that pea plants carried two copies of each gene. To maintain this number, each gamete had to contain one copy. The diploid condition was restored when two gametes joined at fertilization. Evidently, during formation of the gametes in the gonad, one of the two copies of each gene had to be selected to be incorporated into each sperm cell or egg cell. The separation of the two alleles during formation of the gametes is termed segregation.
Mendel wondered how this process occurred. By studying plants carrying mutations in more than one gene, he determined that the allelic forms of the two genes underwent independent assortment when they were segregated to the gametes. That is, the particular allelic form of one gene that went into a gamete did not affect which allelic form of the other gene went into that gamete. The result was that in the next generation of plants new combinations of the allelic forms could be found in predictable ratios.
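Independent assortment can be made concrete by enumerating gametes. This sketch assumes two unlinked genes, with capital letters standing for dominant alleles; crossing two AaBb hybrids then gives the familiar 9:3:3:1 ratio of phenotypes among the 16 equally likely combinations of gametes.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """genotype: one (allele, allele) pair per unlinked gene, e.g.
    [("A", "a"), ("B", "b")]. Under independent assortment every choice of
    one allele per gene is an equally likely gamete."""
    return ["".join(choice) for choice in product(*genotype)]


def f2_phenotypes(genotype):
    """Count offspring phenotypes from crossing two identical hybrids.
    Capital letters are assumed to be dominant alleles."""
    counts = Counter()
    gs = gametes(genotype)
    for egg, sperm in product(gs, gs):
        # At each gene, a capital (dominant) allele masks a lowercase one.
        phenotype = "".join(e if e.isupper() else s for e, s in zip(egg, sperm))
        counts[phenotype] += 1
    return counts


dihybrid = [("A", "a"), ("B", "b")]
print(gametes(dihybrid))
print(f2_phenotypes(dihybrid))
```

Linked genes, described next, break this model because their alleles do not choose partners independently.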
When additional mutations in other organisms were studied, examples that appeared to violate this rule were soon found. In those examples, particular allelic forms of two different genes tended to stay together when gametes were formed. Such genes were said to be linked. After many examples were studied, it was shown that genes could be placed into linkage groups. Genes in one linkage group tended to stay together in the gametes, and to assort independently of genes in other linkage groups. The genes that Mendel first studied all happened to fall into different linkage groups.
Cytologists Discovered the Cellular Structures That Contained the Genes
The foundation of genetics was consolidated when it was discovered that chromosomes behaved in the same way that Mendel's hypothetical genes did. At the same time that geneticists were defining the properties of the abstract entities they called genes (at the end of the 19th and beginning of the 20th centuries), cytologists were discovering the components of cells visible with a microscope. In examining the nucleus, they found it contained multiple chromosomes ("colored bodies", so called because they took up certain stains) present as morphological pairs. Copies of each pair were faithfully allocated to daughter cells at cell division in a process termed mitosis. During development of gametes, there was a reduction division at which only one member of each pair entered each gamete, in a process similar to the segregation of Mendel's alleles. This unique form of cell division was termed meiosis. It was further shown that all of the different pairs of chromosomes were necessary for normal development of the organism.
Thus chromosomes were essential and behaved like genes. However, it was found that there were many fewer chromosomes than there were genetically definable genes. Thus each chromosome would have to be associated with many genes. Eventually it became apparent that the correct correlation was not between genes and chromosomes, but between linkage groups and chromosomes. Organisms had the same number of chromosome pairs as genetic linkage groups. Linked genes went together into gametes because they were present on a single chromosome, whereas unlinked genes were on different chromosomes, which assorted independently. The two cellular copies of each chromosome are known as homologs and together constitute a homologous pair. Each member of a pair generally carries the same genes, although the allelic forms of these genes may differ. Thus the presence in the cell of two homologous chromosomes corresponds to the diploid genetic condition found by Mendel.
Genetic Recombination between Genes in Single Linkage Groups Results from Exchange of Material between Homologous Chromosomes
When two marked (mutated) genes are present in a genetic cross, there is a possibility of both parental and non-parental combinations of alleles among the gametes. Suppose the two genes a and b are marked in a cross, such that one parent has the alleles A and B (genotype AB/AB) and the other parent has the alleles a and b (genotype ab/ab). The genotypes of all the F1 hybrid progeny are AB/ab. (In a cross such as this, following Mendel's nomenclature, the parental generation is known as the Po generation, and the progeny of the cross constitute the F1 generation [for first filial generation]. The next generation is the F2 generation, and so forth.) Let the F1 hybrid be backcrossed to the ab/ab parent. In this backcross, also known as a test cross, the ab/ab parent supplies only one type of gamete, ab. The F1 hybrid parent, however, can supply several types. The possible genotypes of the progeny are AB/ab, ab/ab, Ab/ab, or aB/ab, where the alleles written before the slash are from the F1 hybrid parent, and the alleles written after the slash are from the ab/ab parent. Regarding the alleles from the F1 hybrid parent, progeny with genotypes AB/ab and ab/ab are derived from F1 gametes with the parental (Po) configurations of alleles (AB and ab), whereas progeny with the genotypes Ab/ab and aB/ab are derived from gametes with non-parental configurations (Ab and aB). During meiosis in the F1 hybrid parent, the genes a and b are said to have recombined to give these non-parental combinations.
By definition, unlinked genes recombine at a frequency of 50%. They are assorted randomly to the gametes, half of
which get the parental combination and half of which get the non-parental (recombinant) combination. Linked genes are genes for which the frequency of recombination is less than 50%.
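Applied to the test cross described above, these definitions reduce to counting. In this sketch the progeny counts are invented for illustration, and each progeny class is keyed by the haplotype contributed by the F1 hybrid (the alleles written before the slash); the function names are hypothetical.

```python
PARENTAL = {"AB", "ab"}  # allele configurations in the Po parents

# Hypothetical test-cross counts, keyed by the haplotype from the F1 hybrid.
progeny = {"AB": 420, "ab": 430, "Ab": 70, "aB": 80}


def recombination_frequency(counts, parental=PARENTAL):
    """Fraction of progeny that received a non-parental (recombinant)
    combination of alleles from the F1 hybrid."""
    recombinant = sum(n for haplotype, n in counts.items() if haplotype not in parental)
    return recombinant / sum(counts.values())


def is_linked(rf):
    # Unlinked genes recombine at 50%; linked genes at less than 50%.
    # With real data, sampling error matters for values near 50%.
    return rf < 0.5


rf = recombination_frequency(progeny)
print(f"recombination frequency = {rf:.2f}, linked: {is_linked(rf)}")
```

Here 150 of 1000 progeny are recombinant, a frequency well below 50%, so a and b would be scored as linked.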
Genes on different chromosomes undergo random assortment and hence recombine at a frequency of 50%. Genes on the same chromosome also recombine. This is because, during meiosis, homologous chromosomes pair and undergo a physical exchange of material. In this way, non-parental combinations of alleles can be made even for linked genes. The frequency of the physical exchange event varies greatly from organism to organism and from chromosome to chromosome. It may be so high that two genes on the same chromosome become genetically unlinked, assorting randomly. (Even if the frequency of exchange is very high, the frequency of genetic recombination rises to a maximum of only 50%. This is because double, quadruple, and other even numbers of exchange events restore the parental configuration.) At the other extreme, it may be so low that two genes virtually never recombine and are said to be tightly linked.
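The 50% ceiling follows from a parity argument: a gamete is recombinant for two markers only if an odd number of exchanges occurred between them. The sketch below assumes that exchanges occur independently, so that their number on the chromatid reaching the gamete follows a Poisson distribution with no interference. Under that assumption (the one behind Haldane's mapping function), summing the odd terms gives (1 - e^(-2m))/2 for a mean of m exchanges, a value that approaches but never exceeds 50%.

```python
import math


def prob_recombinant(mean_exchanges, max_terms=200):
    """Probability of a recombinant gamete, assuming the number of exchanges
    between two markers is Poisson-distributed (no interference)."""
    p = 0.0
    term = math.exp(-mean_exchanges)  # Poisson probability of k = 0 exchanges
    for k in range(max_terms):
        if k % 2 == 1:
            p += term  # an odd number of exchanges gives a recombinant
        term *= mean_exchanges / (k + 1)  # P(k + 1) computed from P(k)
    return p


for m in (0.1, 0.5, 1.0, 2.0, 5.0):
    closed_form = (1 - math.exp(-2 * m)) / 2
    print(f"mean exchanges {m:>4}: recombination frequency "
          f"{prob_recombinant(m):.4f} (closed form {closed_form:.4f})")
```

Real chromosomes often show interference between nearby exchanges, so this is an idealization; the parity argument itself does not depend on the distribution.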
The Frequency of Genetic Recombination Can Be Used to Map Genes on Chromosomes
The frequency of physical exchange, and hence of genetic recombination, between genes on a single chromosome depends not only on the organism and chromosome, but also on the physical distance between the genes on the chromosome. The probability of an exchange is higher if the genes are farther apart, and lower if they are closer together. This provides the basis for constructing a genetic map. By determining the frequency of the non-parental, that is, recombinant, combination of alleles among the progeny of a cross, a recombination frequency is calculated. Genes are then arrayed along a linear map according to their recombinational "distances" from each other.
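For three genes, the ordering step is simple: the pair separated by the largest recombination frequency must lie at the ends of the map, with the third gene between them. The gene names and frequencies below are hypothetical. The two short intervals (8% + 12%) sum to more than the long one (19%), as expected when double exchanges go undetected in the longer interval.

```python
def order_three_genes(rf):
    """rf maps frozenset({gene1, gene2}) to a recombination frequency (%).
    The most widely separated pair lies at the ends of the map."""
    genes = set().union(*rf)
    outer = max(rf, key=rf.get)
    (middle,) = genes - outer
    left, right = sorted(outer)
    return [left, middle, right]


# Hypothetical pairwise recombination frequencies (percent).
rf = {
    frozenset({"a", "b"}): 8.0,
    frozenset({"b", "c"}): 12.0,
    frozenset({"a", "c"}): 19.0,
}
print(order_three_genes(rf))
```

Which end is written on the left is arbitrary; only the order of the middle gene relative to the outer two is determined.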
A genetic map gives the linear order of genes on a chromosome as determined by genetic studies. Because of the general correlation between the amount of DNA between two genes and the probability of an exchange event occurring between them, the genetic map resembles the physical array of the genes along the chromosome. However, the resemblance is far from perfect. While the order of the genes should be correct, the relative distances between them may not reflect their actual relative physical distances. The probability of exchange per nucleotide is not constant, and in fact can vary a great deal from region to region. Some regions are hotspots of recombination, where exchange occurs frequently, and likewise there are regions where exchange is suppressed. Genes on opposite sides of a hotspot, though physically close together, will appear far apart on the genetic map. Genes in regions of little recombination, though physically far apart, will appear close together on the genetic map. A physical map displays where genes are physically located along a chromosome or molecule of DNA, as determined by molecular as opposed to genetic studies. Correlation of genetic maps and physical maps is an important component of genome projects, as discussed further below.
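The distortion introduced by hotspots can be illustrated by treating map distance as the sum, along the DNA, of a local exchange rate. Distances are in centimorgans (cM), the map unit corresponding to 1% recombination over short intervals. The region lengths and rates below are invented numbers chosen to make the contrast obvious; real rates vary by organism and region.

```python
# Hypothetical chromosome segment: (length in kb, exchange rate in cM per kb).
REGIONS = [
    (100, 0.001),  # region where exchange is suppressed
    (5, 2.0),      # a recombination hotspot
    (100, 0.01),   # region with an intermediate rate
]


def genetic_position(physical_kb, regions=REGIONS):
    """Convert a physical coordinate (kb from the left end) into a genetic
    map coordinate (cM) by accumulating the local exchange rate."""
    cm = 0.0
    start = 0.0
    for length, rate in regions:
        # Portion of this region lying to the left of the coordinate.
        covered = min(max(physical_kb - start, 0.0), length)
        cm += covered * rate
        start += length
    return cm


# Two genes 100 kb apart in the suppressed region, and two genes 5 kb apart
# on either side of the hotspot.
far_apart = genetic_position(100) - genetic_position(0)
hotspot_pair = genetic_position(105) - genetic_position(100)
print(f"100 kb apart, cold region: {far_apart:.1f} cM")
print(f"5 kb apart, across hotspot: {hotspot_pair:.1f} cM")
```

The gene order along the map is preserved, since positions only accumulate, but the physically closer pair ends up about 100 times farther apart genetically.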
Construction of a Genetic Map Is an Important Step in the Definition of Genes
As discussed earlier, mutations can be assigned to the same or different genes by a complementation test. This test rests on the gene as a unit of biochemical activity. However, the possibilities of intergenic non-complementation and intragenic complementation make this test not absolutely reliable. Additional information as to whether two mutations lie in the same or different genes can readily be obtained if they are genetically mapped relative to one another. Mutations that map to different linkage groups, or that lie far apart within a single linkage group, must be in separate genes. Likewise, mutations that are tightly linked and have similar phenotypes could well lie in a single gene even if they complement each other.
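The two kinds of evidence can be combined in a rough decision rule. This is a sketch, not a procedure from the text: the function name is hypothetical, and the cutoffs `far_cutoff` and `tight_cutoff` (in percent recombination) are illustrative assumptions whose sensible values depend on the organism and on how large genes are relative to the map.

```python
from typing import Optional


def assign_mutations(complements: bool, same_linkage_group: bool,
                     rf_percent: Optional[float] = None,
                     far_cutoff: float = 5.0, tight_cutoff: float = 0.5) -> str:
    """Combine a complementation result with mapping data.

    rf_percent is the recombination frequency between the two mutations
    (only meaningful in the same linkage group). Both cutoffs are
    illustrative assumptions, not standard values.
    """
    if not same_linkage_group or (rf_percent is not None and rf_percent > far_cutoff):
        return "separate genes"
    if complements:
        if rf_percent is not None and rf_percent <= tight_cutoff:
            # Tightly linked yet complementing: possible intragenic complementation.
            return "possibly one gene; test further"
        return "probably separate genes"
    # Failure to complement usually means one gene, although second-site
    # non-complementation remains possible.
    return "probably one gene"


print(assign_mutations(complements=False, same_linkage_group=True, rf_percent=20.0))
print(assign_mutations(complements=True, same_linkage_group=True, rf_percent=0.1))
```

Mapping data override the complementation result only in the direction the text allows: mutations far apart must be in separate genes, while tight linkage merely raises the possibility of a single gene.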
Organisms Being Studied Today
Many organisms are currently being studied using molecular genetic techniques. A few of the more commonly studied include the bacterium Escherichia coli, the yeast Saccharomyces cerevisiae, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, the flowering plant Arabidopsis thaliana, the mouse Mus musculus, and the human Homo sapiens. These organisms each have special features that permit study of important aspects of biology. Other organisms are used as well, often to study some particular problem. For example, the molecular genetics of the small tropical aquarium fish, the zebrafish, is being developed. It is hoped that this organism will serve as a vertebrate amenable to the same kind of in-depth analysis as is focused on Drosophila, C. elegans, and Arabidopsis. Embryogenesis of the frog Xenopus is studied because of its large, rapidly developing eggs, while the slime mold Dictyostelium serves as a model to study cell motility, cell-cell signaling, and pattern formation. Ciliated protozoans have proven to be excellent for the analysis of telomeres, because their macronuclei contain a large number of small chromosomes.
For many organisms, classical ge-
netic analysis is not possible, because the
sexual cycle is either too long (e.g.Xeno-
pus), non-existent (e.g.Dictyostelium), or
uncontrollable (H. sapiens). This limitation
is becoming less and less of a drawback asan ever-expanding arsenal of molecular ge-
netic techniques is developed for isolating
genes, modifying them in vitro, and placing
them back into the genome.
Some of the special features of the important organisms follow. E. coli and the related Salmonella typhimurium were the first organisms to be studied in molecular detail and remain the best understood on a molecular level (although this is changing). Their special advantages are extremely fast growth (cells can divide every 20 minutes) and very small genome size, about 1/1000 that of humans, with about 1/10 the number of genes. Mutations in about 1,500 genes out of a predicted total of 4,300 are already known. E. coli lacks a true sexual cycle, but the technology for moving genes between different E. coli strains is very well developed and technically simple. E. coli is good for studying the detailed molecular function of proteins. However, prokaryotes like E. coli perform many functions on a molecular level quite differently from eukaryotes. The eukaryote S. cerevisiae serves as a useful microorganism that has many of the advantages of E. coli, but with much greater similarity to higher organisms. S. cerevisiae also has a sexual cycle and Mendelian genetics. Gene replacement is simple in yeast and permits rapid reverse genetic as well as genetic studies. Though yeast is good for studying cellular processes, obviously it does not permit studies of how multicellular organisms develop and function. Two organisms used to study animal development are C. elegans and Drosophila (known affectionately as worms and flies). Both organisms are small, develop quickly, and boast a large catalog of developmental mutations and sophisticated classical and molecular genetics. The small plant Arabidopsis provides an organism for studying higher plant development. There is great interest in human biology, and the mouse serves as a convenient and similar (!) mammal. The development of gene replacement technology for the mouse means that the role of genes in mammals can be tested directly. It has become much easier now to create a "knock-out" mouse, or a conditional knock-out mouse, that lacks any gene of interest.
Human genetics offers special opportunities and difficulties. Unlike the other organisms, humans cannot ethically be manipulated experimentally. On the other hand, the earth has some 6 x 10^9 humans, who notice even subtle developmental problems and often report them to those aware of genetic diseases (doctors). Molecular pedigree analysis permits the study of human genetics.
Genetic Mapping Techniques in Various Organisms
While the underlying principles are the same, the approach taken to mapping mutations and constructing genetic maps varies from organism to organism. Obviously, the techniques available to the experimenter for mapping mutations in yeast, growing as a colony on a plate, will differ from those available for mapping human genes. The steps employed for various popular experimental eukaryotes are summarized briefly below. Techniques employed with bacteria are presented in the next section.
Yeast
Mapping of genes in the yeast Saccharomyces cerevisiae is generally done by cloning the relevant gene, determining the DNA sequence of only a short segment, and comparing that sequence with the yeast genomic database for identical sequences with known chromosomal locations. When the cloned gene is not available, the genetic technique known as tetrad analysis is typically used to determine the map position.
Genetic Mapping in Yeast
Tetrad analysis involves crossing a haploid mutant strain to a series of tester strains of the opposite mating type containing marked chromosomes. Following meiosis, the four haploid spores, which are the meiotic products of the cross, are contained as a tetrad within a single ascus, enabling accurate analysis of a single meiotic event. The segregation of the mutant phenotype from markers specific to a given chromosome can then be followed. When the mutant gene (x) and a given marker (m) lie on different chromosomes or far apart on the same chromosome, the two genes (X, M) segregate essentially at random within a tetrad: tetratypes, with XM, Xm, xm and xM progeny, are frequent, and parental and nonparental ditypes occur at similar frequencies. (Even though yeast chromosomes are small, the frequency of recombination is comparatively high.) If the mutant gene (x) is linked to the marker (m), then parental ditype tetrads predominate, i.e. Xm, Xm, xM, xM progeny within a single tetrad. This analysis is repeated with strains containing markers at intervals scattered along all 16 yeast chromosomes until linkage is observed.
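Tetrad classes can be tallied mechanically. The sketch below classifies each tetrad as a parental ditype (PD), nonparental ditype (NPD), or tetratype (TT) and, as an addition not described in the text, converts the counts to a map distance with the standard Perkins formula, distance (cM) = 100 x (TT + 6 NPD) / (2 x total). The spore genotypes and counts are invented for illustration; PD far in excess of NPD is the signature of linkage.

```python
from collections import Counter

# Spore genotypes are (gene, marker) pairs. In the cross sketched in the text the
# mutant strain carries x with marker allele M, and the tester carries X with m.
PARENTAL = {("x", "M"), ("X", "m")}
RECOMBINANT = {("x", "m"), ("X", "M")}

def classify_tetrad(spores):
    """Return 'PD', 'NPD' or 'TT' for a four-spore tetrad."""
    kinds = set(spores)
    if len(kinds) == 4:
        return "TT"   # tetratype: all four genotype combinations present
    if kinds == PARENTAL:
        return "PD"   # parental ditype
    if kinds == RECOMBINANT:
        return "NPD"  # nonparental ditype
    raise ValueError("tetrad does not show 2:2 segregation for both genes")

def perkins_distance_cm(tetrads):
    """Map distance in cM from tetrad classes: 100 * (TT + 6*NPD) / (2 * total)."""
    counts = Counter(classify_tetrad(t) for t in tetrads)
    total = sum(counts.values())
    return 100 * (counts["TT"] + 6 * counts["NPD"]) / (2 * total), counts

pd_tetrad = [("x", "M"), ("x", "M"), ("X", "m"), ("X", "m")]
tt_tetrad = [("x", "M"), ("x", "m"), ("X", "M"), ("X", "m")]
distance, counts = perkins_distance_cm([pd_tetrad] * 8 + [tt_tetrad] * 2)
print(counts, distance)  # Counter({'PD': 8, 'TT': 2}) 10.0
```

The formula counts each TT tetrad as carrying one recombinant pair of chromatids and each NPD as the product of a four-strand double exchange, which is why NPD tetrads are weighted by six.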
For recessive mutations, the mapping process can be simplified by using strains carrying marked, unstable chromosomes. Loss of a specifically marked chromosome is induced in the diploid produced by the cross to the mutant strain. A recessive mutation can then exhibit its phenotype upon loss of the homologous chromosome, thereby permitting its chromosomal assignment. The location of the mutant gene along this chromosome is then determined from the frequency of its recombination with known markers along the chromosome.
Mitotic cross-over mapping, which results from reciprocal exchange of the genes located distal to the cross-over point, is a rapid method to determine the chromosome arm on which the gene resides and can be performed in sectored colonies. The frequency of cosegregation of genes that are far apart on the same arm of the chromosome indicates the localization of the mutated gene to a defined region of the chromosome. Fine mapping can be achieved by meiotic mapping (tetrad analysis) with markers known to reside in the vicinity of this chromosomal region.
Caenorhabditis elegans
The nematode C. elegans has six linkage groups, all of about the same size. There are two sexes: hermaphrodites and males. Hermaphrodites are morphologically similar to females, but make sperm as well as oocytes. They can fertilize their own eggs internally, or they can be fertilized by males. Hermaphrodites are XX, males are XO, and there are five pairs of autosomes. Genetic analysis in C. elegans is greatly aided by the possibility of storing frozen mutant stocks indefinitely in liquid nitrogen refrigerators.
Genetics in C. elegans is somewhat unusual in that the self progeny of a single hermaphrodite can be examined. This simplifies certain operations. In general, genetic mapping in C. elegans consists of constructing a hermaphrodite heterozygous for the mutations of interest and then observing the self progeny of that hermaphrodite for recombination between the mutations.
To map a new mutation, the linkage group containing it is determined first, by testing its linkage to known marker mutations. A hermaphrodite is constructed that is heterozygous for the mutation of interest and for a morphological or behavioral mutation of known linkage. For example, a male carrying the new mutation may be mated to a marked hermaphrodite. The heterozygous hermaphrodite cross progeny are then allowed to self. The frequency with which the double homozygote is present among the self progeny reveals whether the two mutations are linked. If they are unlinked (probably because they lie on different chromosomes), the frequency of the double homozygote is 1/16 (1/4 of the animals homozygous for the marker mutation will also be homozygous for the unknown mutation). If the two mutations are linked, the frequency of the double homozygote is much lower. This test is carried out with markers for each of the six linkage groups until linkage is found.
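The 1/16 figure, and why linkage lowers it, follows from gamete frequencies. In the cross described above the new mutation comes from the male and the marker from the hermaphrodite, so the cross progeny carry x and m in trans (x + / + m). A minimal sketch, assuming the recombination frequency r is the same in spermatogenesis and oogenesis and that all genotypes are equally viable; the function name is illustrative.

```python
from fractions import Fraction

def double_homozygote_fraction(r):
    """Expected fraction of x m / x m animals among the self progeny of an x + / + m hermaphrodite.

    A gamete carrying both x and m arises only by an exchange between them
    (probability r / 2), and a double homozygote needs such a gamete from both
    the sperm and the oocyte.
    """
    return (r / 2) ** 2

print(double_homozygote_fraction(Fraction(1, 2)))   # unlinked: 1/16
print(double_homozygote_fraction(Fraction(1, 10)))  # r = 0.1: 1/400
```

Exact fractions are used so that the unlinked case reproduces the 1/16 quoted in the text without rounding.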
Once the linkage group of the new mutation is known, its position on the linkage group is determined. In a three-factor cross, segregation from a hermaphrodite carrying two known mutations on one chromosome and the unknown mutation on the homologous chromosome is analyzed. Animals carrying a chromosome recombinant for the known mutations are isolated, and the presence or absence of the unknown mutation on the recombinant chromosome is established. In this way, the unknown mutation is determined to lie to the left of, inside, or to the right of the interval defined by the known mutations. If it lies inside the interval, its position within the interval can be determined from the ratios of genotypes among the recombinants.
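The logic for placing the unknown mutation can be written down for one class of recombinants: those that kept known mutation a but lost b (call them a-non-b), scored for whether they also picked up the unknown mutation x. A sketch under the simplifying assumption of a single exchange in this region and no double crossovers; the function name, map positions, and counts are hypothetical.

```python
def place_by_three_factor(a_pos, b_pos, a_non_b_with_x, a_non_b_without_x):
    """Place unknown mutation x relative to known mutations a and b (a to the left of b).

    The hermaphrodite is a b / x. Among a-non-b recombinant chromosomes, the fraction
    carrying x is 0 if x lies left of a, 1 if x lies right of b, and d(a, x) / d(a, b)
    if x lies between them (assumes a single exchange, no double crossovers).
    """
    total = a_non_b_with_x + a_non_b_without_x
    if total == 0:
        raise ValueError("no a-non-b recombinants scored")
    frac = a_non_b_with_x / total
    if frac == 0:
        return "left of a", None
    if frac == 1:
        return "right of b", None
    return "between a and b", a_pos + frac * (b_pos - a_pos)

# Hypothetical counts: 20 a-non-b recombinants, 5 of which carry x.
print(place_by_three_factor(0.0, 3.0, 5, 15))  # ('between a and b', 0.75)
```

With real counts, a fraction of exactly 0 or 1 from a small sample is only suggestive, since x could lie inside the interval very close to a or b; the b-non-a class, which should show the complementary fraction, provides a check.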
In a two-factor cross, the recombination distance between the new mutation and a known mutation is determined. This is done by analyzing the frequency of recombinants among the progeny of a hermaphrodite that is heterozygous for a cis-double mutant chromosome, that is, a chromosome bearing both the unknown mutation and a known mutation. The cis-double is conveniently obtained as a segregant from a three-factor cross.
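For the two-factor cross, the recombination frequency p can be recovered from the proportion of self progeny showing only one of the two mutant phenotypes. The inversion below is derived from gamete frequencies for an a b / + + hermaphrodite; it assumes equal recombination in sperm and oocytes, full viability, and fully penetrant recessive phenotypes. The function name and counts are illustrative.

```python
import math

def two_factor_rf(a_non_b, b_non_a, total_progeny):
    """Estimate recombination frequency p from the self progeny of an a b / + + hermaphrodite.

    a_non_b, b_non_a: progeny showing only one of the two recessive phenotypes.
    Their combined expected fraction is R = p * (2 - p) / 2, so p = 1 - sqrt(1 - 2R).
    """
    R = (a_non_b + b_non_a) / total_progeny
    return 1 - math.sqrt(1 - 2 * R)

print(round(two_factor_rf(48, 47, 1000), 3))  # 0.1
```

The expression for R comes from the a-non-b class: aa animals that are not bb are a+/a+ (frequency p^2/4) or a+/a b (frequency p(1 - p)/2), which sum to p(2 - p)/4, and the b-non-a class contributes the same amount.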
It is possible to determine the genetic map position of a cloned gene or other DNA segment by taking advantage of the C. elegans physical map. A set of overlapping cosmid and YAC clones covering the entire C. elegans genome is available. YAC grids are also available, consisting of a single nitrocellulose filter onto which the DNA of a representative set of YAC clones has been spotted, in order, representing the six C. elegans chromosomes. The DNA fragment to be mapped is labelled and hybridized to this filter, and the subset of overlapping YACs to which it hybridizes reveals its genetic location. The physical position of the DNA may be further refined by locating it on available cosmids. Its genetic function may be determined in a transgenic animal constructed by microinjection of the DNA.
Drosophila melanogaster
D. melanogaster has only 4 pairs of chromosomes: the 1st (or X), 2nd, 3rd, and 4th. Determining where on a linkage group a gene maps is not usually difficult. Crosses with known markers are employed, and linkage or independent assortment is observed among the progeny. An unusual feature is the lack of meiotic recombination in males. In practice this simplifies genetic mapping, because one can pass a mutation through the male parent only and be certain that no recombination has occurred, or through the female parent and be certain that all the recombination occurred in one generation.
Successful freezing and thawing of Drosophila is only just being developed, and most mutations are maintained in continuous culture. Special chromosomes called balancer chromosomes have been developed, which suppress recombination and carry dominant visible markers and recessive lethal mutations. A mutation maintained in trans to a balancer is therefore propagated intact, and in a balanced-lethal stock the surviving progeny are genetically identical to their parents.
One very important feature is the giant polytene chromosomes of the larval salivary gland cells. These are thousands of times larger than normal chromosomes and make it routine to see chromosome rearrangements under the microscope. Labelled DNA probes can easily be hybridized to the polytene chromosomes, which allows the position of a cloned sequence in the genome to be determined within a day.
Mouse
Gene mapping in the mouse may be carried out using three different kinds of test population: (1) conventional crosses, i.e. backcross (F1 x parent) or F2 (F1 x F1) populations, (2) recombinant-inbred (RI) strains, or (3) interspecific backcrosses (ISB).
If the gene has not been cloned and one must rely on a phenotype demonstrable only in protein gels, cells, or individual mice, mapping can be extremely tedious. If the phenotypic differences occur among mice of different inbred strains, particularly those involved in RI strains or ISBs (see below), then all three types of test population may be usable for mapping purposes. If, however, the mutation is a newly detected one present only in progeny of the original mutant mouse, and if there are no hints of map location from existing experimental data, all options for mapping can be extremely costly in time and research funds.
If highly specific DNA probes are available for the gene to be mapped, the first step is to seek a restriction enzyme that reveals an RFLP (restriction fragment length polymorphism) in tests with genomic DNA from mice of various inbred strains. This RFLP should permit the use of one or more of the three approaches.
If no RFLP can be identified, it is possible to analyze a set of clones of interspecific hybrid cells. Progeny of the fusion of a hamster cell and a mouse cell begin with complete chromosomal complements from both parents, but they gradually lose most of the mouse chromosomes. There exist sets of clones derived from such hamster/mouse hybrids in which each clone retains only one or two mouse chromosomes. The probe will hybridize only with DNA from clones containing the mouse chromosome bearing the gene to be mapped, and the gene is said to be syntenic with that chromosome. Note that the homologous hamster gene may also hybridize with the probe, but it will usually produce a restriction fragment of a different size from that of the mouse fragment. While synteny can usually be established in this way, it is only rarely possible to place the gene on a particular portion of the chromosome by this method, e.g. when one of the clones contains a chromosome with a translocation.
A more virtuosic method is physical mapping by hybridization of a radioactive probe to spreads of banded chromosomes. This procedure allows identification of the chromosome carrying the gene and gives a rough indication of its position on that chromosome. The procedure is much more difficult than with Drosophila, and less accurate, because mouse chromosomes are not polytene and have fewer bands.
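The hamster/mouse hybrid panel analysis described above amounts to a concordance test: the gene must lie on a mouse chromosome that is present in every clone whose DNA shows the mouse-specific fragment and absent from every clone that does not. A minimal sketch with an invented panel; the clone names, retained chromosomes, and function name are hypothetical.

```python
# Hypothetical panel: mouse chromosomes retained by each hamster/mouse hybrid clone.
panel = {
    "clone1": {"7"},
    "clone2": {"7", "11"},
    "clone3": {"11"},
    "clone4": {"2", "X"},
}
# Clones whose DNA showed the mouse-specific restriction fragment with the probe.
positive = {"clone1", "clone2"}

def syntenic_chromosomes(panel, positive):
    """Mouse chromosomes present in every positive clone and absent from every negative one."""
    candidates = set().union(*panel.values())
    for clone, chroms in panel.items():
        if clone in positive:
            candidates &= chroms   # must be retained wherever the signal is seen
        else:
            candidates -= chroms   # must be missing wherever the signal is absent
    return candidates

print(syntenic_chromosomes(panel, positive))  # {'7'}
```

Real panels give occasional discordant clones (partial chromosome losses, rearrangements), so practical analyses tolerate a small number of mismatches rather than requiring perfect concordance as this sketch does.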
Conventional crosses.
Backcross: P1 (AB/AB) x P2 (ab/ab) → F1 (AB/ab); F1 x P2 results in 4 phenotypic combinations, AB, Ab, aB and ab, in frequencies ranging from 1:1:1:1 (no linkage) to 2:0:0:2 (tight linkage); the recombination frequency is the percentage of mice with recombinant phenotypes (Ab and aB) in the total backcross population.
F2: P1 (Ab/Ab) x P2 (aB/aB) → F1 (Ab/aB); F1 x F1 results in the same four phenotypic combinations in frequencies ranging from 9:3:3:1 (no linkage) to 8:4:4:0 (complete linkage); the recombination frequency is still a function of these ratios, but they must be converted into a recombination frequency using mathematical formulae.
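Both kinds of conventional cross can be sketched from gamete frequencies, assuming A and B are fully dominant. The backcross estimate is a direct count of recombinant phenotypes; for the repulsion-phase F2, the sketch computes the expected phenotype fractions and, as one simple conversion, estimates r from the double-recessive class alone (expected fraction r^2/4). That estimator is an illustration, not the maximum-likelihood method that would use all four classes; function names and counts are hypothetical.

```python
import math

def backcross_rf(counts):
    """Recombination frequency from AB/ab x ab/ab backcross phenotype counts."""
    total = sum(counts.values())
    return (counts["Ab"] + counts["aB"]) / total  # recombinant classes

def f2_repulsion_expected(r):
    """Expected AB, Ab, aB, ab phenotype fractions from an Ab/aB x Ab/aB intercross.

    A and B are assumed fully dominant; aabb needs two recombinant ab gametes.
    """
    ab = (r / 2) ** 2
    single = 0.25 - ab  # A_bb, and by symmetry aaB_ (bb and aa are each 1/4)
    return {"AB": 0.5 + ab, "Ab": single, "aB": single, "ab": ab}

def f2_repulsion_rf(n_ab, total):
    """Simple estimate of r from the double-recessive class alone (f_ab = r**2 / 4)."""
    return 2 * math.sqrt(n_ab / total)

print(backcross_rf({"AB": 45, "Ab": 5, "aB": 5, "ab": 45}))       # 0.1
print({k: v * 16 for k, v in f2_repulsion_expected(0.5).items()})  # 9:3:3:1
print({k: v * 16 for k, v in f2_repulsion_expected(0.0).items()})  # 8:4:4:0
```

Setting r = 0.5 reproduces the 9:3:3:1 of independent assortment, and r = 0 gives the 8:4:4:0 limit, since no aabb animals can arise without recombination.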
RI strains. Several sets of RI strains are available. Existing RI sets have been typed for an enormous number of allelic differences. Careful comparison of the strain distribution pattern (SDP) of the gene to be mapped with other known SDPs often produces a quite precise ordering of the gene with respect to nearby genes.
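Comparing strain distribution patterns can be sketched as counting discordant strains. To turn the discordant fraction R into a recombination fraction r, the sketch uses the Haldane-Waddington relation R = 4r / (1 + 6r) for RI strains derived by sib mating; that relation, the progenitor labels B and D, the marker names, and the SDPs below are assumptions brought in for illustration, not data from the text.

```python
# Hypothetical SDPs over ten RI strains derived from progenitors B and D:
# each letter is the progenitor allele carried by one RI strain at the locus.
new_gene = "BBDDBDBDDB"
known_sdps = {
    "markerA": "BBDDBDBDDD",
    "markerB": "DBBDDDBDBB",
}

def discordance(sdp1, sdp2):
    """Fraction of RI strains whose alleles differ between the two loci."""
    return sum(x != y for x, y in zip(sdp1, sdp2)) / len(sdp1)

def sib_mated_ri_rf(R):
    """Invert R = 4r / (1 + 6r); None when R gives no evidence of linkage."""
    if R >= 0.5:
        return None
    return R / (4 - 6 * R)

closest = min(known_sdps, key=lambda m: discordance(new_gene, known_sdps[m]))
for name, sdp in known_sdps.items():
    R = discordance(new_gene, sdp)
    print(name, R, sib_mated_ri_rf(R))
print("closest:", closest)
```

The marker with the fewest discordant strains is the best candidate neighbor; because each RI strain has accumulated recombination over many generations of inbreeding, R overstates r, which is why the conversion is needed.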
ISB. In any given chromosomal segment, RFLPs and other types of DNA polymorphism are more likely to occur between individuals of different species than between individuals of the same species. Although interspecies hybrids are often sterile, hybrid females from crosses of the laboratory mouse (Mus musculus) and a related species (M. spretus) are fertile, and backcrosses of the hybrid to M. musculus males can readily be obtained in large numbers. There exist sets of genomic DNAs from each of more than 100 individual mice of such an ISB that have already been typed for many DNA polymorphisms. If it has not been possible to identify a suitable polymorphism among mice of different inbred strains, chances are good that one can be found between the two mouse species. Testing these DNAs with the new RFLP makes it possible to compare its segregation pattern among the ISB DNAs with those of other markers in essentially the same manner as is used for RI strains.
Humans
A major effort, the Human Genome Project, was undertaken to obtain detailed physical and genetic maps and the complete nucleotide sequence of the human genome. Analysis and annotation of this sequence will eventually identify all of the estimated 50,000 human genes. Such an accomplishment will enhance investigators' ability to isolate distinct genes, particularly those in which mutations are responsible for human diseases. Many of the techniques described above for physical and genetic mapping in lower organisms are applicable to humans, with the obvious exception of experimental crosses. Traditionally, human genes have been cloned by isolating the encoded protein and using this information to screen libraries with antibodies or oligonucleotide probes. When cloned nucleic acid probes are available, standard approaches to physical mapping may be carried out, including somatic cell hybrid analyses and, more recently, in situ hybridization techniques.
Genetic mapping, as in other organisms, relies upon the frequency of recombination between various genetic loci, i.e. genetic linkage analysis. The human genome comprises approximately 3000 centiMorgans (cM), where 1 cM is defined as the genetic length over which one observes recombination 1% of the time. Assuming a haploid genome of ~3 x 10^9 bp, 1 cM corresponds to approximately 1 million base pairs. A genetic linkage map allows one to clone genes by virtue of a distinct phenotype or trait resulting from a mutation, even if nothing at all is known about the protein encoded by the gene. As opposed to physical mapping, this approach requires only that the phenotype be linked to some polymorphic marker; it is known as positional cloning.
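The arithmetic behind "1 cM corresponds to approximately 1 million base pairs" is genome size divided by total genetic length. The sketch uses the figures given in the text; the result is a genome-wide average, and hotspots and regions of suppressed recombination make local values depart from it. The constant and function names are illustrative.

```python
GENOME_BP = 3e9           # haploid human genome, ~3 x 10^9 bp (figure from the text)
GENETIC_LENGTH_CM = 3000  # approximate total genetic length (figure from the text)

BP_PER_CM = GENOME_BP / GENETIC_LENGTH_CM  # genome-wide average

def approx_physical_distance_bp(cm):
    """Convert a genetic distance to an average physical distance (local rates vary)."""
    return cm * BP_PER_CM

print(f"{BP_PER_CM:,.0f} bp per cM")                      # 1,000,000 bp per cM
print(f"5 cM ~ {approx_physical_distance_bp(5):,.0f} bp")  # 5 cM ~ 5,000,000 bp
```

By the same average, a marker map with spacing under 5 cM places every gene within a few megabases of some marker.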
As in lower organisms, the creation of a useful genetic linkage map depends upon the existence of polymorphic loci distributed throughout the genome. Historically, the first polymorphisms that provided a suitable approach for large-scale genetic mapping in humans were restriction fragment length polymorphisms (RFLPs). However, RFLPs are not found with sufficient frequency to saturate the human genome. More recently, other types of polymorphism have become popular, including mini-satellite DNAs, or variable number of tandem repeats (VNTRs), and micro-satellites, particularly "CA" repeats. Regions of DNA containing (CA)n, where the number of repeats (n) is highly polymorphic, are dispersed throughout the genome. These show a high degree of heterozygosity and are inherited in typical Mendelian fashion. By identifying the sequences that flank various "CA" repeats, polymerase chain reaction (PCR) primers can be designed that amplify fragments of differing sizes, depending upon n. The number of PCR primer pairs that uniquely amplify distinct "CA" repeats is constantly growing. Using these and other polymorphic loci distributed throughout the genome, a highly detailed genetic linkage map of the human genome is being compiled. There are now several thousand such highly polymorphic loci distributed throughout the human genome, with markers spaced at less than 5 cM. Thus, finding tight linkage between a phenotypic trait and some "CA" repeat or other polymorphic locus becomes increasingly probable.
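How a (CA)n marker is read out can be sketched in a few lines: the PCR product runs from the forward primer to the site complementary to the reverse primer, so alleles that differ in n give products that differ by 2 bp per repeat. The flanking sequences, primers, and function name below are made up for the example, and the search is exact string matching, not a model of primer annealing.

```python
def amplicon_length(template, fwd, rev):
    """Length of the PCR product between primer fwd and primer rev on a top-strand template.

    fwd matches the top strand as written; rev is written 5'->3' on the bottom strand,
    so on the top strand we look for its reverse complement.
    """
    complement = str.maketrans("ACGT", "TGCA")
    rev_site = rev.translate(complement)[::-1]
    start = template.find(fwd)
    if start < 0:
        return None
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return end + len(rev_site) - start

# Hypothetical flanks and primers around a (CA)n repeat.
LEFT, RIGHT = "GGATCCTTAGCATGACTTGA", "TTGCAGGTACCAGTCAGGTT"
FWD = "GGATCCTTAGCATG"   # first 14 nt of LEFT
REV = "AACCTGACTGGTAC"   # reverse complement of the last 14 nt of RIGHT

for n in (15, 18):
    allele = LEFT + "CA" * n + RIGHT
    print(n, amplicon_length(allele, FWD, REV))  # 15 -> 70, 18 -> 76
```

A person heterozygous for these two alleles would show two PCR products 6 bp apart, which is what makes such loci informative in linkage studies.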
In addition to identifying polymorphic loci, PCR primers that amplify distinct segments of genomic DNA also provide an approach to physical mapping and, eventually, to isolation of the gene of interest. Once a PCR primer pair is found that identifies a polymorphism tightly linked to the phenotype of interest, genomic DNA libraries can be screened using the same PCR primer pair. Clones identified in this way by definition contain genomic DNA that is also tightly linked to the gene of interest. There are now several methods that permit isolation of genes or parts of genes from genomic DNA. Most prominent among these are conventional screening of cDNA libraries and newer methods such as cDNA selection by affinity hybridization and exon amplification. Each of the genes identified in this fashion is considered a candidate for the disease locus being studied. Based upon the properties of the various candidate genes, such as their patterns of expression, the nature of the encoded proteins, or, where the phenotype is a disease state, identifiable mutations in affected individuals, the gene of interest can be identified unambiguously.
As the density of genetic and physical markers increases, maps that incorporate all types of markers (integrated maps) are emerging. These maps are facilitating isolation of genes for diseases that are inherited in a simple Mendelian fashion, and they are expected to help in identifying genes in complex human genetic diseases.
The Physical Characteristics of Genomes
Genomes Consist of DNA Molecules, and Vary Widely in Size
The holy grail of classical geneticists was to understand the physical structure of a gene and how this structure allowed it to carry out its two functions: to determine the characteristics of the organism and to transmit those characteristics to the next generation. By the time the molecular structure of the genetic molecule, DNA, was determined by Watson and Crick in 1953, so much was known about the properties of genes from classical genetic studies that it was immediately apparent from the DNA structure how, in a general way, these two functions were carried out: the information for the organism was present in the form of a code, and the information was replicated by base-pair complementarity. Since that time many biologists have concentrated their efforts on determining the precise code for particular organisms, and on understanding how the code is read out, implemented, and transmitted.
The total information for an organism is contained in its genome, comprising the nuclear and organellar (mitochondrial and chloroplast) chromosomes. Each nuclear chromosome consists of a single DNA molecule held within a protein scaffolding. There may be from one to over a hundred chromosomes in the nucleus, depending on the organism. The genomes of organisms that live independently range in size from approximately 10^6 base pairs in bacteria to over 10^11 base pairs in some amphibians. The genomes of "quasi" organisms, such as viruses, that utilize the cellular machinery of another organism to replicate can be much smaller. Small viral genomes, such as those of retroviruses, certain tumor viruses such as SV40, or bacteriophages such as φX174, are as small as 5,000 base pairs. Some biologists even view transposable elements as a kind of organism, termed "selfish DNA", that perpetuates its own existence within host genomes. Transposable elements may be as short as 1,000 bases and encode just a single gene.
Bacterial Genomes Contain Some 4300 Genes, Higher Organisms May Have As Many As 30,000 or More
Because genes may be tightly packed, even overlapping, on the DNA molecule, or widely separated by "junk" sequences, the number of genes in the genome does not necessarily correlate with the amount of DNA. In general, genes are more densely packed in the genomes of prokaryotes than in those of eukaryotes. The bacteriophage λ genome, 50,000 base pairs, contains some 50 genes, or about one gene per 1000 base pairs (kilobase pairs, or kb). The determination of the complete nucleotide sequence of the λ chromosome, in 1982, was a landmark achievement in the analysis of genome structure. The frequency of genes in the E. coli genome, which is estimated to contain 4,300 genes in 4.7 Mb of DNA, is somewhat lower.
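The gene densities quoted above can be checked directly from the genome sizes and gene numbers given in the text; the helper name is illustrative.

```python
def genes_per_kb(genome_bp, n_genes):
    """Average gene density: genes per kilobase pair of DNA."""
    return n_genes / (genome_bp / 1000)

# (genome size in bp, estimated gene number), as quoted in the text
genomes = {
    "phage lambda": (50_000, 50),
    "E. coli": (4_700_000, 4_300),
}

for name, (bp, genes) in genomes.items():
    density = genes_per_kb(bp, genes)
    print(f"{name}: {density:.2f} genes/kb, about one gene per {1 / density:.2f} kb")
```

Both figures are averages over the whole genome; they say nothing about how unevenly genes are spaced along it.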
The number of genes in the genomes of higher organisms has been the subject of much debate and speculation. Prior to the direct sequencing of large genomes, two approaches were taken to estimate the gene number. A genetic approach was to estimate the total number of genes from the frequency of lethal mutations obtained upon mutagenesis. Estimates obtained by this approach were too low for at least two reasons: many genes may encode non-essential products, and many essential products may be redundantly encoded.
A biochemical approach to determining the number of genes measured the rate of annealing of mRNA to DNA. By this method it is possible to estimate the complexity of the mixture. Complexity is defined as the total length of non-repeating sequence within a mixture of nucleic acids. By means of such kinetic measurements, it was estimated that mammalian genomes contained some tens of thousands of expressed genes.
Genome Projects
Current Methods Make the Sequencing of Whole Genomes Possible
Until the mid-1990s, the genetic and physical study of the genetic make-up of organisms proceeded in a piecemeal fashion. Genes and genetic loci were studied one at a time, as they became relevant to a particular research topic or project. However, with the advent of cloning vectors that can carry much larger inserts of intact chromosomal DNA, and of improved sequencing technologies, the sequencing of entire genomes has become feasible. In this approach, the complete DNA sequence of an organism is determined, and all of the genes are potentially identified by computer analysis of the sequence. By carrying out all of the cloning and sequencing at once in a unified project, a complete sequence is obtained much more rapidly, allowing investigators to concentrate on analyzing the integrated function of the genes in the life of the organism.
Construction of Physical Maps: Overlapping Clones
The complete DNA sequence of a genome is obtained by the automated determination and analysis of vast quantities of DNA sequence. However, before this can be done, the genome must first be obtained in fragments of sequenceable length (a few hundred to a few thousand base pairs) whose relationship to one another is known. For this purpose, a complete physical map is constructed. This consists of overlapping cloned fragments of the genome, usually in cosmid, P1, BAC, or YAC vectors. As such a map is being constructed, overlapping groups of contiguous fragments, termed contigs, are built up. Contigs are progressively joined to each other as more and more fragments are mapped, until, when the physical map is completed, the number of contigs equals the number of chromosomes.
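Building contigs from overlapping clones is, at its core, grouping clones connected by overlaps. A minimal union-find sketch, assuming the pairwise overlaps have already been detected experimentally; the clone names, overlaps, and function name are hypothetical.

```python
def build_contigs(clones, overlaps):
    """Group clones into contigs, given pairs of clones found to overlap (union-find)."""
    parent = {c: c for c in clones}

    def find(c):
        # Follow parent links to the group's representative, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in overlaps:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two contigs

    contigs = {}
    for c in clones:
        contigs.setdefault(find(c), []).append(c)
    return list(contigs.values())

clones = ["YAC1", "YAC2", "YAC3", "YAC4", "YAC5"]
overlaps = [("YAC1", "YAC2"), ("YAC2", "YAC3"), ("YAC4", "YAC5")]
print(build_contigs(clones, overlaps))  # [['YAC1', 'YAC2', 'YAC3'], ['YAC4', 'YAC5']]
```

One more overlap, between YAC3 and YAC4, would join the two contigs into one, which is what completion looks like for a genome with a single chromosome. The sketch records only which clones belong together, not their order within a contig.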
The Physical Map Is Correlated with the Genetic Map
The availability of complete genomic sequences allows correlation of the physical sequence with genetic markers and the use of mutants to understand the function of the sequence. Physical and genetic maps are correlated in several ways. Physical sequences that differ between strains (or in lineages, e.g. of humans), resulting in restriction fragment length polymorphisms (RFLPs), are genetically mapped in the same way that any other genetic difference is mapped. (An RFLP results whenever a particular restriction fragment, detectable for example by hybridization to a probe, differs in size between two organisms. This can come about because one or both of the restriction sites that define the fragment are mutated, because a new restriction site arises between them, because DNA has been deleted or inserted between the two sites, or because some other rearrangement has separated them.) Cloned DNA fragments may also be mapped to chromosomes and parts of chromosomes by in situ hybridization techniques. This approach is particularly powerful in Drosophila, where the genetic map is already correlated in detail with the polytene chromosome banding pattern. A third approach is the identification of functional genes on cloned DNA fragments by the complementation of known mutations (complementation rescue). This approach is particularly powerful in organisms that are easily transformed, such as yeast and C. elegans.
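An RFLP caused by loss of a restriction site can be illustrated by digesting two versions of a sequence in silico. The sketch assumes EcoRI (recognition site GAATTC, cut after the G) and two invented alleles; mutating the second site merges two fragments, so a probe from the region between the sites detects a 26 bp fragment in one allele and a 41 bp fragment in the other.

```python
def fragment_lengths(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths after cutting seq at every occurrence of a restriction site.

    The defaults model EcoRI, which recognizes GAATTC and cuts after the G (G^AATTC);
    only the top strand is considered.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Two hypothetical alleles: in allele2 the second EcoRI site is mutated (GAATTC -> GAGTTC).
allele1 = "A" * 10 + "GAATTC" + "T" * 20 + "GAATTC" + "A" * 10
allele2 = "A" * 10 + "GAATTC" + "T" * 20 + "GAGTTC" + "A" * 10
print(fragment_lengths(allele1))  # [11, 26, 15]
print(fragment_lengths(allele2))  # [11, 41]
```

The same function shows the other RFLP mechanisms listed above: inserting or deleting bases between the two sites changes the middle fragment's length without touching either site.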
Eukaryotic Genomes Contain a Large Amount of Repetitive DNA
Although eukaryotic genomes may have more genes than originally thought, it remains true that these genomes contain a great deal of non-coding sequence. Some of this "extra DNA" appears simply to be non-functional unique sequence. Unique sequence is DNA sequence that occurs only once in the haploid genome. Some extra DNA is accounted for by introns. Other sequences make up distinct classes of repetitive DNA. Repetitive DNA is DNA sequence present more than once per haploid genome. Repetitive DNA can make up anywhere from a small fraction to a majority of the genomic DNA of eukaryotic organisms. Typically it represents some 20% to 50%.
The first evidence that genomes contained DNA apart from unique sequences came from analysis of reannealing kinetics. When genomic DNA was denatured (e.g. by heating) to cause the strands to separate, and then allowed to reanneal to the double-stranded form (at a lower temperature), the rate of reannealing was not consistent with a single kinetic component.
When double-stranded DNA is denatured and renatured, the rate of reannealing, like the rate of other bimolecular reactions, is dependent on concentration. In the case of double-stranded DNA, the relevant concentration is the concentration of similar or identical DNA sequences, since only these can interact to anneal. The concentration of a pair of similar sequences in a mixture of nucleic acids depends on the complexity of the mixture, that is, the amount of different sequence in the mixture. When the kinetics of renaturation of eukaryotic genomic DNA were measured, it was found that much of the DNA reannealed at a rate higher than expected for unique sequences. This indicated that these sequences were repeated within the genome. In fact, there were several kinetic components, indicating sequences present from 10 times to millions of times in the genome. This kind of kinetic analysis is called Cot analysis, because the kinetic data were typically presented in a plot of percent DNA annealed versus Cot, the product of the initial DNA concentration (C0) and the time of annealing (t).
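The shape of a Cot curve can be sketched by treating each kinetic component as an independent, ideal second-order reaction, in which the fraction still single-stranded is 1 / (1 + k Cot). A sequence present R times per genome is R times more concentrated than a unique sequence at the same total DNA concentration, so it reaches half-reannealing at a Cot R times lower. The component fractions, repetition frequencies, and the Cot1/2 of unique sequence below are hypothetical, and the concentration-independent snap-back component is not modeled.

```python
def fraction_single_stranded(cot, components, cot_half_unique):
    """Fraction of genomic DNA still single-stranded at a given Cot.

    components: (fraction_of_genome, copies_per_haploid_genome) pairs.
    Each component follows ideal second-order kinetics, C/C0 = 1 / (1 + k * Cot),
    with k = copies / cot_half_unique, so its Cot1/2 is cot_half_unique / copies.
    """
    return sum(f / (1 + copies * cot / cot_half_unique) for f, copies in components)

# Hypothetical genome: 60% unique, 30% repeated ~1,000 times, 10% repeated ~10^6 times.
genome = [(0.6, 1), (0.3, 1_000), (0.1, 1_000_000)]
COT_HALF_UNIQUE = 1_000.0  # assumed Cot1/2 of the unique component, in M*s

for cot in (1e-3, 1e-1, 1e1, 1e3, 1e5):
    ss = fraction_single_stranded(cot, genome, COT_HALF_UNIQUE)
    print(f"Cot {cot:g}: {100 * (1 - ss):5.1f}% annealed")
```

Plotted against log Cot, each component contributes its own step, which is how the several kinetic components described above were recognized.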
There Are Several Kinds of Repeated Se-
quences
The fastest kinetic component in a Cot analysis of eukaryotic DNA typically reanneals essentially instantaneously and independently of concentration. This component consists of inverted repeats: similar sequences lying close together in inverted orientation, so that a single strand can fold back and pair with itself in a so-called "snap-back" or "foldback" reaction. Inverted repeats are often members of other repetitive sequence families that are present elsewhere as isolated repeats.
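A snap-back sequence is one in which a stretch is followed, after a short spacer, by its own reverse complement, so the strand can pair with itself. The sketch below scans a sequence for such pairs. The arm length and maximum loop length are arbitrary illustrative parameters. Exact matching is a simplification, since real inverted repeats are often only similar, not identical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, arm: int, max_loop: int) -> list[tuple[int, int]]:
    """Return (i, j) pairs where seq[j:j+arm] is the reverse complement of seq[i:i+arm]
    and the spacer (loop) between the two arms is at most `max_loop` bases."""
    seq = seq.upper()
    hits = []
    n = len(seq)
    for i in range(n - arm + 1):
        target = reverse_complement(seq[i:i + arm])
        # j ranges over start positions of the second arm, spacer length 0..max_loop.
        for j in range(i + arm, min(n - arm, i + arm + max_loop) + 1):
            if seq[j:j + arm] == target:
                hits.append((i, j))
    return hits


# A hypothetical hairpin: 6-base arm, 5-base loop, then the arm's reverse complement.
hairpin = "ACGGTA" + "TTTTT" + reverse_complement("ACGGTA")
print(hairpin, find_inverted_repeats(hairpin, arm=6, max_loop=10))
```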
The second-fastest kinetic component consists of sequences present millions of times in the genome. These are simple sequences consisting of long stretches of a short repeat, such as ...ATATATATAT... (from crab) or ...AAGAGAAGAG... (from Drosophila). Such sequences are also known as satellite sequences, a name that stems from their behavior during density analysis of DNA. When eukaryotic DNA is analyzed by buoyant density centrifugation in a CsCl density gradient, it is found to have several components of different density. The gradient profile consists of main-band DNA, containing the unique sequences (including most of the genes), and satellite bands, so called because they lie alongside the main band in the profile. The anomalous, repetitive structure of simple-sequence DNA accounts for its distinctive buoyant density. In some eukaryotic genomes there is little or no satellite or simple-sequence DNA, whereas in others such sequences may make up over 50% of the total. The function of satellite DNA is not known; speculation focuses on a possible role during pairing of homologous chromosomes.
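A simple-sequence stretch is, by definition, a short unit repeated in tandem. The sketch below is a naive check that assumes perfect repeats; real satellites accumulate mismatches. It recovers the repeat unit of the crab and Drosophila examples. The `max_period` cutoff is an arbitrary choice.

```python
from typing import Optional


def repeat_unit(seq: str, max_period: int = 10) -> Optional[str]:
    """Shortest unit (up to `max_period` bp, at least two copies) whose tandem repetition
    reproduces `seq`; returns None if no such unit exists."""
    seq = seq.upper()
    for period in range(1, min(max_period, len(seq) // 2) + 1):
        unit = seq[:period]
        if all(seq[i] == unit[i % period] for i in range(len(seq))):
            return unit
    return None


for example in ("ATATATATAT", "AAGAGAAGAG", "ACGTTGCA"):
    unit = repeat_unit(example)
    copies = len(example) / len(unit) if unit else 0
    print(example, unit, copies)
```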
The next-slowest kinetic component, lying between the satellite sequences and the unique sequences in rate of annealing, consists of the so-called middle repetitive sequences. There is a great variety of such sequences. Some are genes present in multiple copies in the genome; genes for common cellular components such as ribosomal RNA or histone proteins are often present in multiple copies. The multiple copies may be dispersed in the genome, or may be present in tandem arrays at a single locus.
Non-functional, corrupted (mutated) copies of genes, called pseudogenes, make up another component of the middle repetitive DNA. These sequences may have arisen in a duplication event, or by reverse transcription of an RNA copy of the gene followed by insertion of the DNA copy into the genome. DNA copies of mRNAs, known as processed pseudogenes, are characterized by polyA tails and the absence of introns, features that reveal their origin from reverse-transcribed cellular mRNA. Between 5% and 10% of the human genome is made up of a large pseudogene family known as the Alu family (named after the restriction enzyme AluI, which was first used to identify it). These 300 bp repeats, present hundreds of thousands of times in the genome, probably originated as DNA copies of the short cellular RNA known as 7SL RNA. 7SL RNA normally functions as a component of the signal recognition particle, the cellular machinery that targets newly synthesized proteins for translocation across the membrane of the rough endoplasmic reticulum. Short repeats such as the Alu repeats have been dubbed SINEs, for "short interspersed nuclear elements".
Other families of middle repetitive sequences consist of transposable elements. These are present in all genomes and have a great variety of structures and modes of transposition. In mammalian genomes they make up the LINEs, or "long interspersed nuclear elements", and are in some cases related to the genomes of retroviruses. Endogenous retroviral genomes themselves are another component of the middle repetitive DNA of mammals. Transposable elements make up some 20% of the Drosophila genome.
Finally, there is a class of middle repetitive sequences that so far have eluded explanation. These sequences typically consist of a few hundred base pairs, interspersed among other sequences around the genome, and have been given the general name interspersed repeats. They form families of anywhere from a few to hundreds of thousands of members, and there are typically hundreds to thousands of such families in eukaryotic genomes. They usually account for a large proportion of the middle repetitive DNA. In spite of their prevalence and ubiquity, the origin and function of these interspersed repeats remain a mystery. While they certainly have an origin, the suspicion is that they have no function: they are the ultimate junk DNA.
Maintenance and transmission of the genetic material
Special Sequences Control the Replication and Transmission of the Genetic Material
Most organisms use DNA as their genetic material; the exceptions are some viruses that use RNA. Because the two strands of DNA are complementary, each can serve as a template, allowing polymerases to create two exact copies of the genetic material. One mechanism of replication involves initiation of synthesis at a single point, the origin of replication, and replication to completion. Many bacteria, plasmids and viruses replicate in this fashion. Another mechanism involves initiation of DNA synthesis at many points on the genome and synthesis until the replication forks meet; the same origins may or may not be used in every round of replication. Eukaryotes use multiple origins on a single DNA molecule. Eukaryotes also have linear genomes, which require ends with special structures, called telomeres, both to protect the DNA and to allow the ends to be replicated correctly. Telomeres have a unique physical structure that includes multiple short DNA repeats with nicks and a capping hairpin structure.

Once a genome has been replicated, each copy must be accurately partitioned into the two daughter cells. For the circular bacterial genome and for some plasmids, this is accomplished by a partition sequence in the DNA near the origin of replication. These sequences attach to regions of the cell wall that grow apart during cell division, dragging the two newly replicated genomes apart. For some plasmids, and for the plasmid-like DNA of mitochondria and chloroplasts, the genome is maintained in multiple copies, and the cell depends at least partly on statistics to ensure that each daughter cell or organelle gets at least one copy of the genome. Other mechanisms then amplify the genome to restore the copy number.
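The reliance on statistics can be quantified under a simple assumption: each copy segregates independently and with equal probability to either daughter. Real partitioning is not this random, so this is only a sketch. Under that assumption, the chance that one of the two daughters receives no copy is 2 × (1/2)^n for n copies. A seeded Monte Carlo run checks the formula.

```python
import random


def prob_empty_daughter(copies: int) -> float:
    """Probability that random segregation leaves one daughter with no copy: 2 * (1/2)**n."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return 2 * 0.5 ** copies


def simulate_empty_daughter(copies: int, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check: send each copy to the left or right daughter with probability 1/2."""
    rng = random.Random(seed)
    empty = 0
    for _ in range(trials):
        left = sum(rng.random() < 0.5 for _ in range(copies))
        if left == 0 or left == copies:  # one daughter received nothing
            empty += 1
    return empty / trials


for n in (1, 2, 5, 10, 20):
    print(n, prob_empty_daughter(n))
```

Even a modest copy number makes loss rare under this model. With 10 copies, about 2 divisions in 1000 would produce a copy-free daughter.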
Eukaryotes generally have their genomes distributed on several chromosomes and thus face special problems in ensuring that each daughter cell gets exactly the right set of chromosomes after replication. A special structure, the centromere, and the attached cytoskeletal machinery, the mitotic apparatus (mitotic spindle), ensure accurate segregation of chromosomes. During meiosis, in which a diploid cell undergoes two successive divisions to yield haploid cells, synapsis (pairing of homologous chromosomes) and a specialized meiotic apparatus are required to ensure that haploid gametes get exactly one of each chromosome.
Enzymatic Mechanisms Repair DNA Damage and Recombine the DNA Strands
As the genetic material, DNA is precious and must be protected from damage. Ultraviolet light, ionizing radiation, and DNA-modifying chemicals can damage DNA, and many mechanisms exist to repair the damage that occurs. Excision repair pathways exploit the fact that two copies of the genetic information are stored in the two strands of DNA: damaged bases can be removed from one strand and then recopied from the other. Recombinational repair mechanisms work by shuffling damaged and undamaged segments that are present in more than one copy in the cell to try to put together a 'good' genome.
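The principle behind excision repair, that the undamaged strand carries the information needed to rebuild the damaged one, can be sketched with strings. In this sketch a lesion is marked with a hypothetical placeholder character "X". The template strand is written base by base beneath the damaged strand. A patch of `flank` bases on each side of a lesion is removed and resynthesized. The patch size is an illustrative assumption; real pathways differ in patch size and enzymology.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def excision_repair(damaged: str, template: str, flank: int = 2, lesion: str = "X") -> str:
    """Excise a patch around each lesion and refill it by copying the complementary strand.

    `template` is the partner strand aligned position by position under `damaged`.
    """
    if len(damaged) != len(template):
        raise ValueError("strands must be aligned and of equal length")
    excised = set()
    for i, base in enumerate(damaged):
        if base == lesion:
            excised.update(range(max(0, i - flank), min(len(damaged), i + flank + 1)))
    # Resynthesize excised positions from the template; keep all other bases unchanged.
    return "".join(
        PAIR[template[i]] if i in excised else base for i, base in enumerate(damaged)
    )


top = "GATXCAGT"      # damaged strand with one lesion
bottom = "CTAAGTCA"   # complementary strand, aligned base by base
print(excision_repair(top, bottom))  # GATTCAGT
```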
Even in the absence of detectable DNA damage, DNA sequences may 'recombine'. Homologous recombination is at the heart of both classical genetics and modern "gene targeting". The mechanism of such recombination or crossover events is controversial and probably varies according to the organism, but it involves breaks in DNA, unwinding of strands, hybridization to homologous segments of DNA, new DNA synthesis, and endonucleolytic strand cleavage. The net result is equivalent to a physical cleavage of DNA and rejoining to a different partner. Some transposons and viruses catalyze recombination events between specific DNA sequences that need not be homologous (or are homologous over only a few bases).
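Whatever the molecular mechanism, the net outcome described above, cleavage and rejoining to a different partner, can be modeled as swapping the segments of two aligned homologs beyond an exchange point. The sketch represents each chromosome as a hypothetical list of alleles at ordered loci and deliberately ignores the strand-level intermediates.

```python
def crossover(homolog_a: list[str], homolog_b: list[str], point: int) -> tuple[list[str], list[str]]:
    """Single reciprocal exchange: break both aligned homologs between loci `point - 1`
    and `point`, then rejoin each left end to the other homolog's right end."""
    if len(homolog_a) != len(homolog_b):
        raise ValueError("homologs must be aligned locus by locus")
    if not 0 < point < len(homolog_a):
        raise ValueError("the exchange must fall between two loci")
    recombinant_1 = homolog_a[:point] + homolog_b[point:]
    recombinant_2 = homolog_b[:point] + homolog_a[point:]
    return recombinant_1, recombinant_2


maternal = ["A", "B", "C", "D"]
paternal = ["a", "b", "c", "d"]
print(crossover(maternal, paternal, 2))  # (['A', 'B', 'c', 'd'], ['a', 'b', 'C', 'D'])
```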
Recombinant DNA and the Construction of Transgenic Organisms
Genes May Be Amplified in Pure Form by "Cloning" Them in Microorganisms.
Early genetics depended on naturally occurring mechanisms for the study of genetic function. In the 1970s, techniques were developed to manipulate DNA in vitro and move it across species boundaries. These cloning techniques rely on enzymes that act on DNA. Restriction endonucleases (commonly called restriction enzymes) cut DNA at specific sequences, often palindromic ones. (A "palindrome" is a word or sentence that reads the same forwards or backwards, like "A man, a plan, a canal, Panama." or "Madam, I'm Adam." In DNA, a palindromic site reads the same 5' to 3' on both strands.) For example, the restriction enzyme BamHI cuts at GGATCC. BamHI is called a 6-cutter because its recognition sequence is six bases long. Because each position in random sequence with equal base frequencies matches a given base with probability 1/4, a specific six-base sequence like GGATCC is expected to occur on average once every 4^6 = 4,096 bp (about 4 kb) of DNA, although individual fragments may be much larger or smaller. Furthermore, the average size depends on the GC/AT content of the DNA being cut and on the relative numbers of G or C versus A or T nucleotides in the restriction site. The sizes of DNA fragments can be determined on agarose gels. About 150 restriction enzymes with different recognition sequences are available commercially. The positions of restriction sites in a piece of DNA can be determined, giving a restriction map useful for subsequent manipulations. Fragments of DNA can be joined to one another by another enzyme, DNA ligase. Together, the ability to cut DNA, separate fragments by size, and rejoin them in new combinations in vitro forms the basis for the powerful cloning technologies.
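The 4 kb estimate for a six-base site generalizes to DNA of any G+C content. The sketch below computes the expected spacing of a site and performs a toy digest of a linear sequence. Some simplifications apply:

- The site is assumed to be palindromic, so scanning one strand finds every site.
- BamHI's cut is placed after the first G (G^GATCC).
- Fragment lengths are counted on the top strand only, ignoring the 4-base overhang.

```python
def site_probability(site: str, gc_content: float = 0.5) -> float:
    """Chance that a random position starts `site`, given the DNA's G+C fraction."""
    p = 1.0
    for base in site.upper():
        p *= gc_content / 2 if base in "GC" else (1 - gc_content) / 2
    return p


def expected_spacing(site: str, gc_content: float = 0.5) -> float:
    """Average distance (bp) between occurrences of `site` in random sequence."""
    return 1.0 / site_probability(site, gc_content)


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Cut a linear molecule at every occurrence of a palindromic `site`;
    return top-strand fragment lengths in order."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


print(expected_spacing("GGATCC"))          # 4096.0, i.e. 4**6 at 50% G+C
print(expected_spacing("GGATCC", 0.40))    # a GC-rich site is rarer in AT-rich DNA
print(digest("AAGGATCCTTTGGATCCA", "GGATCC", cut_offset=1))  # [3, 9, 6]
```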
Though new DNA molecules can be made in vitro, the yield is usually low. However, by cloning DNA into a vector capable of replication, the recombinant DNA can be amplified in vivo. Furthermore, by placing the recombinant DNA into a microorganism, a single defined segment of a large genome can be separated from the remainder of the genome simply by selecting a clone of organisms, such as a bacterial colony or a phage plaque. This is the origin of the term "cloning". Depending on the vector being used, a variety of methods are then available for separating the vector plus insert from the host microorganism's own DNA.
Common sources of vector DNA are viruses and plasmids that are capable of replication in E. coli. E. coli is a useful host for amplifying DNA since it is easy to grow to high density (about 2 × 10^9 cells/ml) and has relatively little DNA of its own. Some virus vectors (e.g. the filamentous phage M13) infect only certain strains of E. coli (those carrying the F factor, whose pilus M13 uses to enter the cell). Vectors such as yeast artificial chromosomes (YACs) and mammalian retroviral expression vectors are shuttle vectors that can replicate in both E. coli and eukaryotic cells. Often bulk recombinant DNA is made in E. coli and the experiment is then done in another organism.
An example of vector cloning follows. Bacteriophage λ has a genome of about 50 kb. A region containing about one third of the genome is not essential for growth of the phage (the so-called "financial district") and can be replaced by other DNA, including E. coli or foreign DNA. These recombinant phages can replicate just like the original bacterial virus, but now whenever they replicate they also duplicate the inserted DNA.
For the most part DNA is DNA and can be moved from organism to organism without problems. However, there are three common problems in transferring DNA that we will discuss briefly: 1) DNA can be modified in ways that affect function; 2) DNA can contain sequences that get rearranged in some organisms; 3) a cloned sequence may encode a protein that is toxic to some cells.
DNA modifications are common and include sequence-specific methylation of bases. These methylations can affect gene function and can block digestion by specific restriction enzymes. Strains of E. coli have been constructed that lack the offending methylases (e.g. dam, dcm) or the systems that degrade methylated foreign DNA (e.g. mcrA). Conversely, some strains of E. coli make restriction enzymes that destroy unmodified DNA. Take care! Often a specific cloning project requires a specific host strain that modifies or does not modify DNA (see the New England Biolabs catalog or Molecular Cloning for details).
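Whether a given site is affected by methylation depends on the enzyme, so the catalog must be consulted. A useful first step is to check whether a site overlaps a methylation motif at all. The sketch below assumes the E. coli Dam (GATC) and Dcm (CCWGG, W = A or T) motifs and only reports overlaps. It does not decide whether digestion is actually blocked. Every BamHI site (GGATCC) contains a GATC, for instance, so an overlap is only a flag to check.

```python
import re

# Recognition motifs of the E. coli Dam and Dcm methylases (W = A or T).
METHYLATION_MOTIFS = {"dam": "GATC", "dcm": "CC[AT]GG"}


def find_all(pattern: str, seq: str) -> list[tuple[int, int]]:
    """(start, end) of every match of a regex, including overlapping ones (via lookahead)."""
    return [(m.start(), m.start() + len(m.group(1)))
            for m in re.finditer(f"(?=({pattern}))", seq)]


def methylation_overlaps(seq: str, site: str) -> set[tuple[int, str]]:
    """Report (site start, methylase) for every restriction site that overlaps a methylation motif."""
    seq, site = seq.upper(), site.upper()
    hits = set()
    for s_start, s_end in find_all(re.escape(site), seq):
        for name, motif in METHYLATION_MOTIFS.items():
            for m_start, m_end in find_all(motif, seq):
                if m_start < s_end and m_end > s_start:
                    hits.add((s_start, name))
    return hits


print(methylation_overlaps("GGTCTAGATCAA", "TCTAGA"))  # {(2, 'dam')}: XbaI site overlapping a GATC
```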
E. coli does not tolerate DNA containing short direct repeats or inverted repeats well; both tend to be deleted from cloned fragments by the very active E. coli recombination pathway. This can be a problem when cloning eukaryotic DNA, in which such structures are common. E. coli host strains that are defective in recombination (like recA mutants) are helpful but do not completely solve the problem. Standard E. coli plasmid and phage vectors do not tolerate more than about 20 kb of insert DNA. Yeast artificial chromosomes (YACs) are useful for cloning up to 400 kb of DNA. As the name implies, YACs are grown in yeast. Eukaryotic cells such as yeast are tolerant of repeated DNA, and hence repetitive sequences that cannot be cloned in E. coli can often be cloned in a YAC.

Cloned DNA may express proteins that kill a specific host. For example, even though eukaryotic promoters and introns do not function in E. coli, a polypeptide encoded by a single exon is often expressed in E. coli. E. coli is especially sensitive to hydrophobic proteins that interfere with secretion (a secA strain may tolerate such clones) and to DNA-binding proteins.
The Polymerase Chain Reaction (PCR) Is a Way to "Clone" DNA Directly In Vitro
Instead of amplifying a defined DNA segment by ligating it to a vector and introducing it into a microorganism, it is possible to amplify it enzymatically by the polymerase chain reaction (PCR). In PCR, the DNA segment between two short (15 to 30 nucleotides long) single-stranded oligonucleotide primers is copied by a primer-dependent DNA polymerase. The polymerase is a heat-stable enzyme from a thermophilic bacterium (for example, Taq polymerase from Thermus aquaticus). This makes it possible to carry out many cycles of synthesis automatically. The reaction mixture is alternately heated to melt all DNA strands (the polymerase is not inactivated by the high temperature this requires) and then cooled to allow the primers to anneal and the polymerase to extend them. Because the product of each round of synthesis serves as template in the next, the amount of the DNA segment between the two primers increases exponentially, ideally doubling with each cycle of melting and replication.
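Because each cycle at best doubles the target segment, the yield after n cycles is N0(1 + E)^n, where E is the per-cycle efficiency (E = 1 for perfect doubling). The sketch below is idealized. It ignores the plateau reached in late cycles, when primers, nucleotides or enzyme become limiting. The efficiency values are illustrative assumptions.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential amplification: N = N0 * (1 + E)**n, E = 1 meaning perfect doubling."""
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of idealized cycles that reaches `target_copies`."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    n, copies = 0, float(initial_copies)
    while copies < target_copies:
        copies *= 1.0 + efficiency
        n += 1
    return n


print(pcr_copies(1, 30))          # perfect doubling: 2**30, about 1.07e9 copies
print(pcr_copies(1, 30, 0.9))     # 90% efficiency per cycle gives noticeably fewer
print(cycles_needed(100, 1e12))   # 34
```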
PCR can be extremely specific and sensitive. Specificity is achieved when each primer anneals only to its single, intended target sequence. In 30 cycles of polymerization, biochemically detectable and useful quantities of a sequence 50 to 5000 bases long can be amplified from tiny amounts of complex mixtures, such as vertebrate genomic DNA. The synthetic product can subsequently be sequenced, used as a labeled probe, or cloned for further in vitro modification.
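A rough estimate shows why a primer of 15 to 30 nucleotides can be specific even in a vertebrate genome. Assume random sequence with equal base frequencies. A given L-mer is then expected to occur about 2G/4^L times by chance in a genome of G bp, counting both strands. Real genomes are far from random, especially given the repeated sequences described earlier, so this is only a sketch. The 3 × 10^9 bp genome size is an illustrative round number.

```python
def expected_chance_matches(primer_length: int, genome_size: float, both_strands: bool = True) -> float:
    """Expected exact occurrences of one specific L-mer in random sequence of `genome_size` bp."""
    strands = 2 if both_strands else 1
    positions = max(genome_size - primer_length + 1, 0)
    return strands * positions / 4 ** primer_length


GENOME_BP = 3e9  # illustrative vertebrate-sized genome
for length in (10, 15, 20, 25):
    print(length, f"{expected_chance_matches(length, GENOME_BP):.3g}")
```

Under these assumptions a 15-mer is still expected to occur a few times by chance, while a 20-mer is expected well under once. Specificity therefore also relies on requiring both primers to anneal close together, in opposite orientation.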
Genes Are Cloned by Isolating Them from Clone Libraries or Clone Banks
The fundamental importance of gene cloning is that it allows the purification of a single gene out of the thousands or tens of thousands present in the genomes of complex organisms. To accomplish this feat, it is first necessary to introduce all the genes of the organism under study into a culture of microorganisms. The task is then to identify a clone of the microorganism that contains the single gene of interest. The mixed culture of microorganisms is termed a clone library or clo