ESSENTIALS OF MOLECULAR GENETICS

Prepared by Faculty of the Albert Einstein College of Medicine

(September, 1993; revised September, 2002)



CONTENTS

What Is Molecular Genetics?
Classical Genetics and the Definition of the Gene
Classical Genetics Defines the Gene by the Study of Mutations
Mutations Can Be Dominant Or Recessive
The Complementation Test Identifies the Gene as a Unit of Activity
A Complementation Test Sometimes Gives the "Wrong" Answer
Transmission Genetics
Classical Genetics Defined the Rules Governing Genetic Transmission
Cytologists Discovered the Cellular Structures That Contained the Genes
Genetic Recombination between Genes in Single Linkage Groups Results from Exchange of Material between Homologous Chromosomes
The Frequency of Genetic Recombination Can Be Used to Map Genes on Chromosomes
Construction of a Genetic Map Is an Important Step in the Definition of Genes
Organisms Being Studied Today
Genetic Mapping Techniques in Various Organisms
The Physical Characteristics of Genomes
Genomes Consist of DNA Molecules, and Vary Widely in Size
Bacterial Genomes Contain Some 4300 Genes, Higher Organisms May Have As Many As 30,000 or More
Genome Projects
Current Methods Make the Sequencing of Whole Genomes Possible
Construction of Physical Maps: Overlapping Clones
The Physical Map Is Correlated with the Genetic Map
Eukaryotic Genomes Contain a Large Amount of Repetitive DNA
There Are Several Kinds of Repeated Sequences
Maintenance and Transmission of the Genetic Material
Special Sequences Control the Replication and Transmission of the Genetic Material
Enzymatic Mechanisms Repair DNA Damage and Recombine the DNA Strands
Recombinant DNA and the Construction of Transgenic Organisms
Genes May Be Amplified in Pure Form by "Cloning" Them in Microorganisms
The Polymerase Chain Reaction (PCR) Is a Way to "Clone" DNA Directly In Vitro
Genes Are Cloned by Isolating Them from Clone Libraries or Clone Banks
A Variety of Vectors Provide a Range of Options for the Generation of a Clone Library
Clone Libraries May Be Screened in a Number of Ways
Constructing Transgenic Organisms
Basic Elements of Bacterial Genetics
The Genetics of Bacteria Has Several Unique Features
Bacterial Cells Exchange Genetic Material in a Process Known as Conjugation
The Bacterial Genetic Map Is Defined by the Time of Transfer During Conjugation
The Bacterial Genetic Map and the Bacterial Chromosome Are Circular
The F Plasmid Encodes Genetic Functions Required for Transfer of DNA
Integration of the F Plasmid into the Bacterial Chromosome Can Result in Mobilization of the Chromosome for Transfer
Plasmids Can Be Used to Construct Partially-Diploid Bacterial Strains
Plasmids Play an Important Role in the Transmission of Drug Resistance
In Transformation, Bacterial Cells Take Up DNA Directly
Bacterial Viruses Play a Role in Genetic Exchange Between Bacteria
Study of Bacteriophages Has Played a Central Role in the Development of Molecular Biology
Bacterial Viruses May Kill the Host Cell or Coexist with It
Inferring Wild Type Gene Function from Mutant Phenotype
To Infer Wild Type Gene Function, It Is First Necessary to Determine How the Mutation Affects Gene Activity
Types of Mutations Are Defined by Structure and by Effects on Gene Activity
Rare Spontaneous Mutations Are of All Types
Chemical Mutagens Tend to Induce Point Mutations, Radiation Tends to Produce Rearrangements
Null Mutations Are Important in the Determination of the Biological Process in which a Gene Participates
In Some Organisms the Null Phenotype Is Best Determined by Gene Knockout
Null Mutations Can Be Identified As Mutations That Behave Genetically Like a Deficiency of the Gene
Null Mutations Have Several Characteristics That Distinguish Them from Non-Null Mutations
New Null Alleles May Be Isolated by a Non-Complementation Screen
Hypomorphic Mutations Lower But Do Not Eliminate Gene Activity
Gene Activity Is Raised by Hypermorphic Mutations
Antimorphic Mutations Produce a Poison Gene Product
Neomorphic Mutations Result in a Novel Gene Activity
A Gain-of-function Mutant Phenotype May Be Eliminated by Introducing a Loss-of-function Mutation at the Same Locus
Determining the Time and Place of Gene Action
The Time and Location of Gene Expression Can Be Determined by a Number of Biochemical Means
Reporter Genes Provide a Sensitive and Versatile Assay of Gene Expression
Gene Knockout Frequently Reveals That a Gene's Activity Is Not Required Everywhere It Is Expressed
The Tissue Where Gene Activity Is Required May Be Determined by Mosaic Analysis
Gene Product Synthesis and Gene Product Action Need Not Take Place in the Same Generation
Parental Effects May Be Identified by Genetic Tests
Temperature-Sensitive Mutations Can Be Used to Determine the Time of Gene Action
Analyzing Complex Processes by Genetics
Genetic Analysis Allows the Probing of Complex Biological Processes Involving Multiple Genes
Some Genes Involved in a Biological Process May Be Identified As Genetic Modifiers
Information About the Order of Gene Action in a Pathway Can Be Obtained by Epistasis Analysis



What Is Molecular Genetics?

Molecular genetics is an approach to understanding the functions of genes. It combines classical genetic analysis with molecular biology to probe the nature of both gene action and gene transmission. The essential characteristic of molecular genetics is that gene products are studied through the genes that encode them. This contrasts with a biochemical approach, in which the gene products themselves are purified and their activities studied in vitro.

All aspects of cell and organismal structure and function are potentially amenable to a molecular genetic approach. Because genes are similar in all organisms, this approach has many essential aspects in common whether the organism being studied is a bacterium, a fungus, or a mammal. The purpose of this booklet is to define and describe these common aspects, and to point out how they are applied in practice in the diverse organisms that are being studied today.

Gene cloning, that is, the isolation of a gene so that its nucleotide sequence may be determined, is central to molecular genetics. Genes identified through a classical genetic analysis of mutations may be cloned to ascertain the structure of the gene product and to permit biochemical studies of gene activity. Alternatively, genes may be defined first by the biochemical identification of their gene product. In this case gene cloning allows the isolation and study of mutant forms. In either approach, starting with a mutation or starting with a cloned gene, the techniques of classical genetic analysis are used to draw conclusions about gene function from the phenotype of mutations.

In addition to gene function, molecular genetics is also concerned with the transmission of the genetic material. Genes are carried by chromosomes, whose function is to maintain the integrity of each cell's complement of genetic information through cell division, and from one generation to the next. Chromosomes contain specialized sequences whose function is to control chromosome replication, recombination, and distribution to daughter cells. The understanding of such sequences can also be approached by cloning, sequencing, and the identification of mutations.

A long-term goal of molecular genetics is to understand gene function in the context of the life, development, and reproduction of the individual, as well as the evolution of the species.

Classical Genetics and the Definition of the Gene

Classical Genetics Defines the Gene by the Study of Mutations

Long before it was known that genes consisted of strings of nucleotides that determined the structure of proteins, it was possible to infer their existence and many of their properties. Different forms of genes, called alleles or mutations, were recognized by their effects on the phenotype of the organism, that is, the organism's form and function. The complete set of allelic forms of an organism's genes is termed its genotype.

Classical genetic studies involving crosses between organisms with differing genotypes and phenotypes, beginning with Mendel, revealed that higher plants and animals are diploid, that is, they have two copies of each gene, one derived from each parent. Gametes, on the other hand, as well as the genomes of some higher organisms and most prokaryotes, have only one copy of each gene and are said to be haploid. With respect to a particular gene, a diploid organism is said to be homozygous if both copies of the gene are the same, and heterozygous if two different allelic forms of the gene are present. A heterozygote is also known as a hybrid of the two parental forms. An otherwise diploid organism is said to be hemizygous for any gene present in only one copy, for example, genes on the X chromosome of Drosophila males.

Mutations Can Be Dominant Or Recessive

Since there are usually two copies of each gene per cell, it is possible to ask what will be the result if the two copies are different. Through the analysis of such heterozygotes, it has been possible to infer a great deal about the properties of genes and gene products. Consider two alleles of a single gene, a and b. Suppose the homozygote a/a has the phenotype A, and the homozygote b/b has the phenotype B. If the a/b heterozygote has the phenotype A, then a is said to be dominant with respect to b, and b is said to be recessive with respect to a. If A is the most common phenotype found in nature, then A is called the wild type, and a is the wild type allele. In this case, b would be considered a recessive mutant allele of the gene, whose mutant phenotype is observed only in homozygous form. However, the wild type need not be the dominant form, and it is possible to have mutant forms that are dominant over wild type. Another alternative is that the phenotype of a/b is a mixture of A and B characteristics, or is intermediate between A and B; for example, if A is "large" and B is "small", the phenotype of the a/b organism might be "medium sized", or if A is red and B is white, the phenotype of the a/b organism might be pink. In this case, each of the allelic forms is said to be incompletely dominant or semidominant with respect to the other. If the phenotypes respectively characteristic of each allele are both expressed in the hybrid, then the two alleles are said to be codominant. This is the case, for example, with different allelic forms of blood group antigens.
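
The definitions above amount to a small decision rule, which can be sketched in Python. This is only an illustration under stated assumptions: each phenotype is encoded as a set of observed characteristics, and the function name and the encoding are hypothetical, not part of the booklet. A heterozygote that shows exactly the characteristics of both homozygotes is treated as codominant; any other phenotype that differs from both homozygotes is treated as incompletely dominant.

```python
def classify_alleles(pheno_aa, pheno_bb, pheno_ab):
    """Classify alleles a and b from the phenotypes of a/a, b/b and a/b.

    Each phenotype is a set of observed characteristics. This encoding is
    an assumption made for the sketch, not a standard representation.
    """
    aa, bb, ab = frozenset(pheno_aa), frozenset(pheno_bb), frozenset(pheno_ab)
    if aa == bb:
        return "alleles not distinguishable by phenotype"
    if ab == aa:
        return "a dominant, b recessive"
    if ab == bb:
        return "b dominant, a recessive"
    if ab == aa | bb:
        # Both homozygous phenotypes are fully expressed in the hybrid.
        return "codominant"
    # Intermediate or blended phenotype, such as pink from red and white.
    return "incompletely dominant"


print(classify_alleles({"red"}, {"white"}, {"red"}))
print(classify_alleles({"red"}, {"white"}, {"pink"}))
print(classify_alleles({"antigen A"}, {"antigen B"}, {"antigen A", "antigen B"}))
```

The three calls correspond to the dominant/recessive, flower color, and blood group examples in the text.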

The Complementation Test Identifies the Gene as a Unit of Activity

In addition to making it possible to determine whether one allelic form of a gene is dominant or recessive with respect to another, diploidy makes possible a fundamental genetic test to determine whether two mutations with the same or similar phenotypes are in the same gene: the complementation test. A determination of the number of genes involved is essential to begin unraveling the role of genes in a particular process. Suppose, for example, the genetic basis of fruit fly eye color is being studied. If wild type fruit fly eyes are red, and two mutant strains of flies have white eyes, it will be important to know whether the two mutations are in the same gene, or define two separate genes, both of which are necessary to make red eyes. It is by means of the complementation test that the gene as a unit of function is defined.

In a complementation test, an organism that is heterozygous in trans for two mutations with similar phenotypes is constructed by genetic crosses, and its phenotype is observed. Heterozygous in trans means that one mutant allele has been obtained from one parent, and the other mutant allele has been obtained from the other parent. It is necessary that both mutations be recessive, so that the phenotype of a heterozygote for each mutant allele singly is wild type. If the trans-heterozygote is also found to be wild type, then the two mutations are said to "complement" one another. If the trans-heterozygote is found to be mutant in phenotype, then the two mutations are said to "fail to complement" one another.



GENETIC NOMENCLATURE IN VARIOUS ORGANISMS

             E. coli          yeast          C. elegans      Drosophila   mouse
Phenotype    Gal-, Lac+       Ade-, Cdc^ts   Dpy, Unc        white        agouti
Gene         galK, lacZ       ade2, cdc28    dpy-5           white        A
Allele
  Recessive  galK13, lacZ23   ade2-1         dpy-5(bx27)     w, w^a       a, a^b
  Dominant   same             ADE2-27        dpy-5(bx27d)    Ubx          A, A^y, A^vy
  Ts-        same                            dpy-5(bx27ts)   w^ts
  Wild type  not written      ADE2           dpy-5(+)        w+, Ubx+     +, A+

The table gives one or two examples of a gene or mutation name. Notice that among the differences in usage between the organisms, there are some consistencies: Phenotypes are written non-italicized, usually as three letters with only the first letter capitalized. Gene names, alleles, and genotypes generally, on the other hand, are italicized. In several systems, capitals denote dominance and small letters recessiveness.

How are these two different results to be interpreted? If the trans-heterozygote has wild type phenotype, that is, if the mutations complement one another, this implies that the trans-heterozygote has all the genetic functions needed for expression of the wild type phenotype. In other words, the chromosomes from each mutant parent make up for the deficiency present on the chromosomes of the other. If one parent is mutant in, say, gene a, the second parent must carry a wild type copy of gene a. Since the mutation in a is recessive, this gives wild type gene a function. If the second parent has a wild type copy of gene a, its own mutation must be in a different gene from a. Evidently, the mutations carried by the two parents are in different genes.

The same kind of reasoning applies to non-complementation. In this case, neither parent makes up for the deficiency of the other; evidently they must be deficient in the same gene. Thus, the general interpretation of the complementation test is as follows: if two mutations complement, then they are likely to lie in different genes; if two mutations fail to complement, then they are likely to lie in the same gene.
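
When more than two recessive mutations are tested pairwise, this interpretation sorts them into complementation groups, each provisionally taken to define one gene. The following Python sketch illustrates the bookkeeping. The function name, the input format, and the mutation names are hypothetical, and the sketch assumes the simple interpretation holds for every pair (it ignores the exceptions discussed later in the text).

```python
from itertools import combinations


def complementation_groups(mutations, complements):
    """Group recessive mutations by pairwise complementation results.

    complements maps frozenset({m1, m2}) to True when the trans-heterozygote
    is wild type (the mutations complement) and False when it is mutant.
    Mutations that fail to complement are placed in the same group.
    """
    parent = {m: m for m in mutations}

    def find(m):
        # Follow parent links to the group representative.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for m1, m2 in combinations(mutations, 2):
        if not complements[frozenset((m1, m2))]:
            parent[find(m1)] = find(m2)

    groups = {}
    for m in mutations:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical white-eyed mutants: m1 and m2 fail to complement each other,
# while m3 complements both.
results = {
    frozenset(("m1", "m2")): False,
    frozenset(("m1", "m3")): True,
    frozenset(("m2", "m3")): True,
}
print(complementation_groups(["m1", "m2", "m3"], results))
```

In this example the output lists two groups, suggesting that at least two genes are needed for the wild type eye color.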

Note that a complementation test cannot be carried out with a dominant mutation. In order to determine the gene in which a dominant mutation lies, it is usually first necessary to isolate a recessive allele at the same locus. This is discussed further in a later section.

In diploid organisms the trans-heterozygote required by the complementation test is easily constructed by mating together two single mutant strains. However, there are other ways of determining the result of having multiple allelic forms in the same cell, including methods applicable to haploid organisms. For example, in bacteria a so-called merodiploid can be constructed by putting one copy of the gene being tested on a plasmid. Upon introduction of the plasmid, the organism becomes diploid over just that short segment of the chromosome carried by the plasmid. This technique is used in yeast as well. In both bacteria and yeast, complementation is useful in determining whether a cloned DNA segment carries the wild type copy of a mutated gene. If it does, the cloned DNA segment will complement the mutation when the DNA segment is introduced into the cell; this is often termed "complementation rescue". Complementation rescue is also used to identify wild type genes in C. elegans, into which DNA may be introduced by microinjection.

A Complementation Test Sometimes Gives the "Wrong" Answer



Although the reasoning used above to interpret the complementation test is valid for the majority of cases, it is not universally applicable. In some instances, the trans-heterozygote may have a mutant phenotype even though the two mutations being tested are in different genes. This is called second-site non-complementation, or intergenic non-complementation (these terms are equivalent). This can occur due to a cumulative effect on the trans-heterozygote of having only one wild type copy of each of two genes, or of having two mutant alleles, even though mutations in each of the two genes are recessive when heterozygous singly.

Likewise, in some instances the trans-heterozygote may have a wild type phenotype even though the two mutations are in the same gene. This is known as intragenic complementation. This comes about if each of the two mutant genes produces a mutant gene product (as opposed to no gene product), and the two mutant gene products, when present in the same cell, can each supply the deficiency or remedy the defect of the other. Acting together, the two mutant gene products provide wild type gene function. Because of the possibility of intergenic non-complementation and intragenic complementation, the complementation test is always combined with genetic mapping to provide a less ambiguous determination whether two mutations define one or two genes.

Transmission Genetics

Classical Genetics Defined the Rules Governing Genetic Transmission

When Mendel, and later Morgan and other geneticists, discovered that there were genetic entities termed genes that could mutate to different forms, they also discovered how those genes were transmitted from generation to generation.

Mendel realized that pea plants carried two copies of each gene. To maintain this number, each gamete had to contain one copy. The diploid condition was restored when two gametes joined at fertilization. Evidently, during formation of the gametes in the gonad, one of the two copies of each gene had to be selected to be incorporated into each sperm cell or egg cell. The separation of the two alleles during formation of the gametes is termed segregation.

Mendel wondered how this process occurred. By studying plants carrying mutations in more than one gene, he determined that the allelic forms of the two genes underwent independent assortment when they were segregated to the gametes. That is, the particular allelic form of one gene that went into a gamete did not affect which allelic form of the other gene went into that gamete. The result was that in the next generation of plants new combinations of the allelic forms could be found in predictable ratios.
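
The "predictable ratios" can be made concrete with a short enumeration. The Python sketch below is an illustration only: it assumes two unlinked genes, each with a fully dominant allele written in upper case and a recessive allele written in lower case, and it crosses two double heterozygotes. The function names and the genotype encoding are hypothetical.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """List the equally likely gametes of a parent under independent assortment.

    genotype is a tuple of allele pairs, one pair per gene,
    for example (("A", "a"), ("B", "b")).
    """
    return list(product(*genotype))


def phenotype(gamete1, gamete2):
    """Phenotype of a zygote, assuming upper-case alleles are fully dominant."""
    return "".join(a1 if a1.isupper() else a2 for a1, a2 in zip(gamete1, gamete2))


def progeny_phenotypes(parent1, parent2):
    """Count progeny phenotypes over every pairing of the parents' gametes."""
    return Counter(
        phenotype(g1, g2) for g1 in gametes(parent1) for g2 in gametes(parent2)
    )


hybrid = (("A", "a"), ("B", "b"))
print(gametes(hybrid))
print(progeny_phenotypes(hybrid, hybrid))
```

Each hybrid parent produces the four gamete types AB, Ab, aB, and ab in equal numbers, and the sixteen pairings fall into four phenotypic classes in the ratio 9:3:3:1.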

When additional mutations in other organisms were studied, examples that appeared to violate this rule were soon found. In those examples, particular allelic forms of two different genes tended to stay together when gametes were formed. Such genes were said to be linked. After many examples were studied, it was shown that genes could be placed into linkage groups. Genes in one linkage group tended to stay together in the gametes, and to assort independently of genes in other linkage groups. The first genes that Mendel had studied happened all to fall into different linkage groups.

Cytologists Discovered the Cellular Structures That Contained the Genes

The foundation of genetics was consolidated when it was discovered that chromosomes behaved in the same way that Mendel's hypothetical genes did. At the same time that geneticists were defining the properties of the abstract entities they called genes (at the end of the 19th and beginning of the 20th centuries), cytologists were discovering the components of cells visible with a microscope. In examining the nucleus, they found it contained multiple chromosomes ("colored bodies", seen because they accepted certain stains) present as morphological pairs. Copies of each pair were faithfully allocated to daughter cells at cell division in a process termed mitosis. During development of gametes, there was a reduction division at which only one member of each pair entered each gamete, in a process similar to the segregation of Mendel's alleles. This unique form of cell division was termed meiosis. It was further shown that all of the different pairs of chromosomes were necessary for normal development of the organism.

Thus chromosomes were essential and behaved like genes. However, it was found that there were many fewer chromosomes than there were genetically-definable genes. Thus each chromosome would have to be associated with many genes. Eventually it became apparent that the correct correlation was not between genes and chromosomes, but between linkage groups and chromosomes. Organisms had the same number of chromosome pairs as genetic linkage groups. Linked genes went together into gametes because they were present on a single chromosome, whereas unlinked genes were on different chromosomes which assorted independently. The two cellular copies of each chromosome are known as homologs and together constitute a homologous pair. Each member of a pair generally carries the same genes, although the allelic forms of these genes may differ. Thus the presence in the cell of two homologous chromosomes corresponds to the diploid genetic condition found by Mendel.

Genetic Recombination between Genes in Single Linkage Groups Results from Exchange of Material between Homologous Chromosomes

When two marked (mutated) genes are present in a genetic cross, there is a possibility of both parental and non-parental combinations of alleles among the gametes. Suppose the two genes a and b are marked in a cross, such that one parent has the alleles A and B (genotype AB/AB) and the other parent has the alleles a and b (genotype ab/ab). The genotypes of all the F1 hybrid progeny are AB/ab. (In a cross such as this, following Mendel's nomenclature, the parental generation is known as the P0 generation, and the progeny of the cross constitute the F1 generation [for first filial generation]. The next generation is the F2 generation, and so forth.) Let the F1 hybrid be backcrossed to the ab/ab parent. In this backcross, also known as a test cross, the ab/ab parent supplies only one type of gamete, ab. But for the F1 hybrid parent, there are several possibilities. The possibilities for the genotypes of the progeny are AB/ab, ab/ab, Ab/ab, or aB/ab, where the alleles written before the slash are from the F1 hybrid parent, and the alleles written after the slash are from the ab/ab parent. Regarding the alleles from the F1 hybrid parent, progeny with genotypes AB/ab and ab/ab are derived from F1 gametes with the parental (P0) configurations of alleles (AB and ab), whereas progeny with the genotypes Ab/ab and aB/ab are derived from gametes with non-parental configurations (Ab and aB). During meiosis in the F1 hybrid parent, the genes a and b are said to have recombined to give these non-parental combinations.

By definition, unlinked genes recombine at a frequency of 50%. They are assorted randomly to the gametes, half of which get the parental combination and half of which get the non-parental (recombinant) combination. Linked genes are genes for which the frequency of recombination is less than 50%.
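
As a worked example of these definitions, the Python sketch below computes a recombination frequency from the progeny of a test cross. The progeny counts are invented for illustration, and the function name is hypothetical. Each progeny class is labeled by the gamete contributed by the F1 hybrid parent.

```python
def recombination_frequency(progeny_counts, parental=("AB", "ab")):
    """Fraction of test-cross progeny derived from non-parental gametes.

    progeny_counts maps the gamete contributed by the F1 hybrid parent
    (for example "AB" or "Ab") to the number of progeny observed.
    """
    total = sum(progeny_counts.values())
    recombinant = sum(
        count for gamete, count in progeny_counts.items() if gamete not in parental
    )
    return recombinant / total


# Hypothetical counts from the cross AB/ab x ab/ab.
counts = {"AB": 430, "ab": 420, "Ab": 80, "aB": 70}
rf = recombination_frequency(counts)
print(f"recombination frequency = {rf:.1%}")
```

With these counts, 150 of 1000 progeny are recombinant, a frequency of 15%, so the two genes would be scored as linked. With real data, a frequency close to 50% would call for a statistical test before concluding that the genes are unlinked.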

Genes on different chromosomes undergo random assortment and hence recombine at a frequency of 50%. Genes on the same chromosome also recombine. This is because, during meiosis, homologous chromosomes pair and undergo a physical exchange of material. In this way, non-parental combinations of alleles can be made even for linked genes. The frequency of the physical exchange event varies greatly from organism to organism and from chromosome to chromosome. It may be so high that two genes on the same chromosome become genetically unlinked, assorting randomly. (Even if the frequency of exchange is very high, the frequency of genetic recombination rises to a maximum of only 50%. This is because an even number of exchanges between two genes, that is, a double exchange, a quadruple exchange, and so on, restores the parental configuration.) At the other extreme, it may be so low that two genes virtually never recombine and are said to be tightly linked.
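
The 50% ceiling can be checked numerically. The Python sketch below assumes, purely for illustration, that the number of exchanges between two genes in a given gamete follows a Poisson distribution with a chosen mean and that exchanges occur independently of one another. These assumptions are not made in the text; they are the ones behind the closed form (1 - exp(-2m)) / 2, commonly called Haldane's mapping function. Only an odd number of exchanges yields a recombinant gamete.

```python
import math


def recombinant_fraction(mean_exchanges, max_k=200):
    """Probability of an odd number of exchanges between two genes.

    Assumes a Poisson-distributed number of exchanges (an assumption of
    this sketch). Terms beyond max_k are negligible for small means.
    """
    term = math.exp(-mean_exchanges)  # P(0 exchanges)
    odd_total = 0.0
    for k in range(1, max_k + 1):
        term *= mean_exchanges / k  # P(k exchanges) from P(k - 1 exchanges)
        if k % 2 == 1:
            odd_total += term
    return odd_total


def haldane(mean_exchanges):
    """Closed form of the same quantity."""
    return (1.0 - math.exp(-2.0 * mean_exchanges)) / 2.0


for m in (0.01, 0.1, 0.5, 2.0, 10.0):
    print(m, round(recombinant_fraction(m), 4), round(haldane(m), 4))
```

For a small mean the recombinant fraction is nearly equal to the mean itself, and as the mean grows the fraction approaches, but never exceeds, 50%.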

The Frequency of Genetic Recombination Can Be Used to Map Genes on Chromosomes

The frequency of physical exchange, and hence of genetic recombination, between genes on single chromosomes depends not only on the organism and chromosome, but also on the physical distance between the genes on the chromosome. The probability of an exchange is higher if the genes are further apart, and lower if they are closer together. This provides the basis for constructing a genetic map. By determining the frequency of the non-parental, that is recombinant, combination of alleles among the progeny of a cross, a recombination frequency is calculated. Genes are then arrayed along a linear map depending on their recombinational "distances" from each other.
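
A minimal sketch of how pairwise recombination frequencies place genes on a linear map is given below in Python. The three gene names and the frequencies are invented, the function name is hypothetical, and the approach (take the pair with the largest frequency as the outer genes and put the third gene between them) is one simple way to order three linked genes, not a method prescribed by the booklet.

```python
def map_three_genes(rf_percent):
    """Order three linked genes and assign map positions.

    rf_percent maps frozenset({g1, g2}) to the recombination frequency
    between the two genes, in percent. The pair with the largest frequency
    is taken as the two outer genes; positions are sums of adjacent
    intervals, measured from the left-hand outer gene.
    """
    outer = max(rf_percent, key=rf_percent.get)
    genes = set().union(*rf_percent)
    (middle,) = genes - outer
    left, right = sorted(outer)
    d_left = rf_percent[frozenset((left, middle))]
    d_right = rf_percent[frozenset((middle, right))]
    return [(left, 0.0), (middle, d_left), (right, d_left + d_right)]


# Hypothetical pairwise recombination frequencies, in percent.
frequencies = {
    frozenset(("a", "b")): 5.0,
    frozenset(("b", "c")): 10.0,
    frozenset(("a", "c")): 14.0,
}
print(map_three_genes(frequencies))
```

In this example the map order is a, b, c. The summed distance between the outer genes (15) is larger than the frequency measured directly between them (14), because double exchanges between a and c go undetected when only those two genes are scored.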

A genetic map gives the linear order of genes on a chromosome determined by genetic studies. Because of the general correlation between the amount of DNA between two genes and the probability of the occurrence of an exchange event, the genetic map resembles the physical array of the genes along the chromosome. However, the resemblance is far from perfect. While the order of the genes should be correct, the relative distances between them may not reflect the actual relative physical distances between them. The probability of exchange per nucleotide is not constant, and in fact can vary a great deal from region to region. Some regions are hotspots of recombination where exchange occurs frequently, and likewise there are regions where exchange is suppressed. Genes on opposite sides of a hotspot, though physically close together, will appear far apart on the genetic map. Genes in regions of little recombination, though physically far apart, will appear close together on the genetic map. A physical map displays where genes are physically located along a chromosome or molecule of DNA, as determined by molecular as opposed to genetic studies. Correlation of genetic maps and physical maps is an important component of genome projects, as discussed further below.

Construction of a Genetic Map Is an Important Step in the Definition of Genes

As discussed earlier, mutations can be assigned to the same or different genes by a complementation test. This test rests on the gene as a unit of biochemical activity. However, the possibilities of intergenic non-complementation and intragenic complementation mean that this test is not absolutely reliable. Additional information as to whether two mutations lie in the same or different genes can be readily obtained if they are genetically mapped relative to one another. Mutations that map to different linkage groups, or that lie far apart within a single linkage group, must be in separate genes. Likewise, mutations that are tightly linked and have similar phenotypes could well lie in a single gene even if they complement each other.

Organisms Being Studied Today

Many organisms are currently being studied

using molecular genetic techniques. A few of the more commonly studied include the bacterium Escherichia coli, the yeast Saccharomyces cerevisiae, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, the flowering plant Arabidopsis thaliana, the mouse Mus musculus, and the human, Homo sapiens. These

organisms each have special features that

permit study of important aspects of biol-

ogy. Other organisms are used as well,

often to study some particular problem. For

example, the molecular genetics of the small

tropical aquarium fish, the zebrafish, is be-

ing developed. It is hoped that this organ-

ism will serve as a vertebrate amenable to

the same kind of in-depth analysis as is fo-

cused on Drosophila, C. elegans, and

Arabidopsis. Embryogenesis of the frog

Xenopus is studied because of its large, rapidly developing eggs, while the slime mold Dictyostelium serves as a model to study cell motility, cell-cell signaling, and pattern

formation. Ciliated protozoans have proven

to be excellent for the analysis of telomeres,

because their macronuclei contain a large

number of small chromosomes.

For many organisms, classical ge-

netic analysis is not possible, because the

sexual cycle is either too long (e.g. Xenopus), non-existent (e.g. Dictyostelium), or uncontrollable (H. sapiens). This limitation is becoming less and less of a drawback as an ever-expanding arsenal of molecular genetic techniques is developed for isolating

genes, modifying them in vitro, and placing

them back into the genome.

Some of the special features of the

important organisms follow. E. coli and the

related Salmonella typhimurium were the

first organisms to be studied in molecular

detail and remain the best understood on a

molecular level (although this is changing).

Special advantages are extremely fast growth (cells can divide every 20 minutes) and a very small genome, about 1/1000 the size of the human genome, with about 1/10 the number of genes. Mutations in about 1,500 genes out of

a predicted total of 4,300 are already known.

E. coli lacks a true sexual cycle but the

technology for moving genes between dif-

ferent E. coli strains is very well developed

and is technically simple. E. coli is good for

studying detailed molecular function of proteins.

Prokaryotes like E. coli perform

many functions on a molecular level quite

differently from eukaryotes. The eukaryote

S. cerevisiae serves as a useful microorganism that has many of the advantages of E. coli, but with much greater similarity to

higher organisms. S. cerevisiae also has a

sexual cycle and Mendelian genetics. Gene

replacement is simple in yeast and permits

rapid reverse genetic as well as genetic studies.

Though yeast is good for studying

cellular processes, obviously it does not

permit studies of how multicellular organ-

isms develop and function. Two organisms

used to study animal development are C.

elegans and Drosophila (known affection-

ately as worms and flies). Both organisms



are small, develop quickly and boast a large

catalog of developmental mutations and so-

phisticated classical and molecular genetics.

The small plant Arabidopsis provides an organism for studying higher plant development.

There is great interest in human biology, and the mouse serves as a convenient

and similar (!) mammal. The development

of gene replacement technology for the

mouse means that the role of genes in

mammals can be tested directly. It has be-

come much easier now to create a "knock-

out" mouse or a conditional knock-out

mouse that lacks any gene of interest.

Human genetics offers special

opportunities and difficulties. Unlike the other organisms, it is not ethical to experimentally manipulate humans. On the other hand, the earth has about 10^10 humans

who notice even subtle developmental

problems and often report them to those

aware of genetic diseases (doctors).

Molecular pedigree analysis permits the

study of human genetics.

Genetic Mapping Techniques in Various Organisms

While the underlying principles are the

same, the approach taken to mapping muta-

tions and constructing genetic maps varies

from organism to organism. Obviously, the

techniques available to the experimenter for

mapping mutations in yeast, growing as a

colony on a plate, will differ from those

available for mapping human genes. Below

are summarized briefly the steps employed for various popular experimental eukaryotes.

Techniques employed with bacteria are pre-

sented in the next section.

Yeast Mapping of genes in the yeast Sac-

charomyces cerevisiae generally occurs by

cloning the relevant gene, determining the

DNA sequence of only a short segment, and

comparing that sequence to the yeast geno-

mic database for identical sequences with

known chromosomal locations. When the

cloned gene is not available, the genetic

technique known as tetrad analysis is typically used to determine the map position.

Genetic Mapping in Yeast Tetrad analysis

involves crossing a haploid mutant strain to

a series of tester strains of the opposite mat-

ing type containing marked chromosomes.

Following meiosis, four haploid spores, the

meiotic products of the cross, are contained

as a tetrad within a single ascus, enabling

accurate analysis of a single meiotic event.

The segregation of the mutant phenotype

from markers specific to a given chromosome can be followed. Distribution of the

mutant gene (x) and a given marker (m) to

different chromosomes or to distant loca-

tions on the same chromosome yields pre-

dominantly random segregation of the two

genes (X, M) within a tetrad, i.e. a tetratype

with XM, Xm, xm and xM progeny. (Even

though yeast chromosomes are small, the

frequency of recombination is compara-

tively high.) If the mutant gene (x) is linked

to the marker (m), then tetrads of the parental ditype are predominant, i.e. Xm, Xm,

xM, xM progeny within a single tetrad.

This analysis is then repeated with strains

containing markers at intervals scattered

along all the 16 yeast chromosomes until

linkage is observed.
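The tetrad classes described above can be tallied to estimate linkage. The following is a minimal sketch, not a procedure given in this text: it classifies each tetrad as parental ditype (PD), nonparental ditype (NPD), or tetratype (T), and then applies the standard Perkins formula for map distance. The function names, the genotype labels, and the example counts are all hypothetical, and regular 2:2 segregation of each gene is assumed.

```python
def classify_tetrad(spores, parental=("Xm", "xM")):
    """Return 'PD', 'NPD', or 'T' for one tetrad of four spore genotypes.

    Assumes each gene segregates 2:2 within the tetrad.
    """
    if len(spores) != 4:
        raise ValueError("a tetrad has exactly four spores")
    kinds = set(spores)
    if kinds == set(parental):
        return "PD"   # only the two parental genotypes are present
    if len(kinds) == 2:
        return "NPD"  # only the two recombinant genotypes are present
    return "T"        # all four genotypes are present


def map_distance_cM(pd, npd, t):
    """Perkins estimate of map distance (in cM) from tetrad counts."""
    total = pd + npd + t
    if total == 0:
        raise ValueError("no tetrads scored")
    return 100.0 * (t + 6 * npd) / (2 * total)


# Hypothetical data set: 70 PD, 2 NPD, and 28 T tetrads.
tetrads = (
    [["Xm", "Xm", "xM", "xM"]] * 70
    + [["XM", "XM", "xm", "xm"]] * 2
    + [["XM", "Xm", "xm", "xM"]] * 28
)
counts = {"PD": 0, "NPD": 0, "T": 0}
for tetrad in tetrads:
    counts[classify_tetrad(tetrad)] += 1

distance = map_distance_cM(counts["PD"], counts["NPD"], counts["T"])
print(counts, distance)
```

A large excess of PD over NPD tetrads, as in this made-up example, is the signature of linkage; roughly equal PD and NPD counts would indicate that the two genes are unlinked.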

For recessive mutations, the mapping

process can be simplified by using strains

carrying marked, unstable chromosomes.

Loss of a specifically marked chromosome

is induced by the cross to the mutant strain. A recessive mutant can exhibit its phenotype

upon loss of the homologous chromosome,

thereby permitting its chromosomal assign-

ment. The location of the mutant gene along

this chromosome can then be determined by

the frequency of its recombination with

known markers along the chromosome.



Mitotic cross-over mapping, result-

ing from reciprocal exchange of genes lo-

cated distally to the cross-over point, is a

rapid method to determine the arm of the

chromosome on which the gene resides and

can be performed in sectored colonies. The frequency of cosegregation of genes that are

far apart on the same arm of the chromo-

some is indicative of the localization of the

mutated gene to a defined region of the

chromosome. Fine mapping can be

achieved by meiotic mapping (tetrad analy-

sis) with markers known to reside in the vi-

cinity of this chromosomal region.

Caenorhabditis elegans The nematode C.

elegans has six linkage groups, all of about the same size. There are two sexes: hermaphrodites and males. Hermaphrodites are

morphologically similar to females, but

make sperm as well as oocytes. They can

fertilize their own eggs internally, or they

can be fertilized by males. Hermaphrodites

are XX, males are XO, and there are five

pairs of autosomes. Genetic analysis in C.

elegans is greatly aided by the possibility of

storing frozen mutant stocks indefinitely in

liquid nitrogen refrigerators.

Genetics in C. elegans is somewhat

unusual in having the possibility of examin-

ing the self progeny of a single hermaphro-

dite. This simplifies certain operations. In

general, genetic mapping in C. elegans con-

sists of constructing a hermaphrodite het-

erozygous for mutations of interest, and then

observing the self progeny of that hermaph-

rodite for recombination between the muta-

tions.

To map a new mutation, first the linkage group containing the mutation is determined. This is done by determining its

linkage to known marker mutations. First, a

hermaphrodite is constructed that is het-

erozygous for the mutation of interest and a

morphological or behavioral mutation of

known linkage. For example, a male carry-

ing the new mutation may be mated to a

marked hermaphrodite. The heterozygous

hermaphrodite cross progeny are then al-

lowed to self. The frequency with which the

double homozygote is present among the

self progeny reveals whether the two mutations are linked. If they are unlinked (meaning probably on different chromosomes), the

frequency of the double homozygote is 1/16

(1/4 of the animals homozygous for the

marker mutation will also be homozygous

for the unknown mutation). If the two muta-

tions are linked, the frequency of the double

homozygote is much lower. This test is car-

ried out with markers for each of the six

linkage groups until linkage is found.
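The 1/16 expectation, and the way it falls with linkage, can be illustrated with a short calculation. This is only a sketch under stated assumptions: the heterozygous hermaphrodite carries the two mutations in trans (as produced by the cross described above), and the recombination fraction r is taken to be the same in its sperm and its oocytes. The function name is hypothetical.

```python
def double_homozygote_frequency(r):
    """Expected fraction of self progeny homozygous for both mutations.

    The parent is m +/+ x (mutations in trans), so a gamete carrying both
    mutations must be a recombinant; such gametes arise at frequency r / 2.
    Assumes the same recombination fraction r in both germ lines.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    double_mutant_gamete = r / 2.0
    return double_mutant_gamete ** 2


for r in (0.5, 0.2, 0.05):
    print(f"r = {r}: double homozygotes = {double_homozygote_frequency(r):.5f}")
```

With r = 0.5 (unlinked) the function returns 1/16; for tightly linked mutations the double homozygote becomes very rare, which is the signal used to assign the linkage group.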

Once the linkage group of the new mutation is known, its position on the linkage group is determined. In a three factor

cross, segregation from a hermaphrodite

carrying two known mutations on one chro-

mosome and the unknown mutation on the

homologous chromosome is analyzed.

Animals carrying a chromosome recombi-

nant for the known mutations are isolated,

and the presence or absence of the unknown

mutation on the recombinant chromosome is

established. In this way, the location of the unknown mutation is determined to be to the

left of, inside of, or to the right of the inter-

val defined by the known mutations. If it

lies inside the interval, its position within

the interval can be determined from the ra-

tios of genotypes among the recombinants.

In a two factor cross, the recombi-

nation distance between the new mutation

and a known mutation is determined. This

is done by analyzing the frequency of re-

combinants among the progeny of a hermaphrodite that is heterozygous for a cis-double mutant chromosome, that is, a

chromosome bearing both the unknown

mutation and a known mutation. The cis-

double is conveniently obtained as a

segregant from a three-factor cross.



It is possible to determine the genetic

map position of a cloned gene or other DNA

segment by taking advantage of the C. ele-

gans physical map. A set of overlapping

cosmid and YAC clones is available covering the entire C. elegans genome. YAC grids are available consisting of a single

nitrocellulose filter onto which DNA of a

representative set of YAC clones has been

spotted, in order, representing the six C. ele-

gans chromosomes. The DNA fragment to

be mapped is labelled and hybridized to this

filter, and the subset of overlapping YACs

to which it hybridizes reveals its genetic lo-

cation. The physical position of the DNA

may be further refined by locating its position on available cosmids. Its genetic function may be determined in a transgenic animal constructed by microinjection of the

DNA.

Drosophila melanogaster D. melanogaster has only 4 pairs of chromosomes: 1st (or

X), 2nd, 3rd, and 4th. Determining where

on a linkage group a gene maps is not usu-

ally difficult. Crosses with known markers

are employed and linkage or independent

assortment observed among progeny. An unusual feature is the lack of meiotic recombination in males. In practice this simplifies genetic mapping, because one can

breed a mutation only from the male parent

and be certain no recombination has oc-

curred, or from the female parent and be cer-

tain all the recombination occurred in one

generation.

Successful freezing and thawing of

Drosophila is only just being developed, and

most mutations are maintained in continuous culture. Special chromosomes called balancer chromosomes have been developed,

which suppress recombination and chromo-

some segregation such that the progeny are

always genetically identical to their parents.

One very important feature is the

giant polytene chromosomes of the larval

salivary gland cells. These are thousands of

times larger than normal chromosomes and

make it routine to see chromosome rear-

rangements under the microscope. Labelled

DNA probes can easily be hybridized to the

polytene chromosomes, and this allows determination of the position of a cloned sequence in the genome within a day.

Mouse Gene mapping in the mouse may

be carried out by the use of three different

test populations. These are (1) conventional

crosses, i.e., backcross (F1 x parent) or F2

(F1 x F1) populations, (2) recombinant-

inbred (RI) strains, or (3) interspecific

backcrosses (ISB).

If the gene has not been cloned and one must rely on a phenotype demonstrable

only in protein gels, cells, or individual

mice, mapping can be extremely tedious. If

the phenotypic differences occur among

mice of different inbred strains, particularly

those involved in RI strains or ISB's (see

below), then all three of the types of test

populations may be usable for mapping pur-

poses. If, however, the mutation is a newly

detected one present only in progeny of the

original mutant mouse, and if there are no hints of map location from existing experimental data, all options for mapping can be

extremely costly in time and research funds.

If highly specific DNA probes are

available for the gene to be mapped, the first

step is to seek a restriction enzyme that re-

veals a RFLP (restriction fragment length

polymorphism) in tests with genomic DNA

from mice of various inbred strains. This

RFLP should permit the use of one or more

of the three approaches.

If no RFLP can be identified, it is

possible to analyze a set of clones of inter-

specific hybrid cells. Progeny of the fusion

of a hamster and a mouse cell begin with

complete chromosomal complements from

both parents, but they gradually lose most of

the mouse chromosomes. There exist sets of



clones derived from such hamster/mouse

hybrids in which each clone retains only one

or two mouse chromosomes. The probe will

hybridize only with DNA from clones with

the mouse chromosome bearing the gene to

be mapped, and the gene is said to be syntenic with that chromosome. Note that the

homologous hamster gene may also hybrid-

ize with the probe, but it will usually pro-

duce a restriction fragment of a different

size from that of the mouse fragment. While

synteny can usually be established in this

way, it is only rarely possible to place the

gene on a particular portion of the chromo-

some by this method, e.g., when one of the

clones contains a chromosome with a translocation.

A more virtuosic method is physical

mapping by hybridization of a radioactive

probe to spreads of banded chromosomes.

This procedure allows identification of the

chromosome carrying the gene and gives a

rough indication of its position on that chro-

mosome. The procedure is much more

difficult than with Drosophila, and less ac-

curate, because mouse chromosomes are not

polytene and have fewer bands.

Conventional crosses.

Backcross: P1 (AB/AB) x P2 (ab/ab) → F1 (AB/ab); F1 x P2 results in 4 phenotypic combinations, AB, Ab, aB and ab, in frequencies ranging from 1:1:1:1 (no linkage) to 2:0:0:2 (tight linkage); the recombination frequency is the percentage of mice with recombinant phenotypes (Ab and aB) in the total backcross population.

F2: P1 (Ab/Ab) x P2 (aB/aB) → F1 (Ab/aB); F1 x F1 results in the same four phenotypic combinations in frequencies ranging from 9:3:3:1 (no linkage) to 8:4:4:0 (tight linkage, with A and B dominant); the recombination frequency is still a function of these ratios, but they must be converted into the recombination frequency using mathematical formulae.
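For the backcross, the calculation is a direct count. A minimal sketch, assuming the four phenotypic classes are scored as counts in a dictionary (the function name and the example numbers are hypothetical):

```python
def backcross_recombination_frequency(counts, recombinant=("Ab", "aB")):
    """Percentage of backcross progeny showing a recombinant phenotype.

    counts maps each phenotypic class (AB, Ab, aB, ab) to the number of
    mice scored; the F1 is assumed to be AB/ab, so Ab and aB are recombinant.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no progeny scored")
    n_recombinant = sum(counts.get(cls, 0) for cls in recombinant)
    return 100.0 * n_recombinant / total


# Hypothetical backcross of 100 mice.
scored = {"AB": 45, "Ab": 5, "aB": 5, "ab": 45}
print(backcross_recombination_frequency(scored))
```

An unlinked pair (1:1:1:1) gives 50%, and a completely linked pair (2:0:0:2) gives 0%, matching the two extremes quoted above.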

RI strains. Several sets of RI

strains are available. Existing RI sets have

been typed for an enormous number of al-

lelic differences. Careful comparison of

the strain distribution patterns (SDP)

for the gene to be mapped with other known SDP's often produces a quite precise ordering of the gene with respect to

nearby genes.
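Comparing two SDPs amounts to counting the RI strains in which the two loci carry alleles from different progenitor strains. The sketch below is illustrative only. It assumes each SDP is written as a string with one letter per RI strain ("B" or "D" for the progenitor allele, a hypothetical encoding), and it converts the discordant fraction R into a per-meiosis recombination fraction with the Haldane-Waddington relation for strains derived by sib mating, r = R / (4 - 6R), a standard formula that is not stated in this text.

```python
def discordant_fraction(sdp_a, sdp_b):
    """Fraction of RI strains in which two loci carry different progenitor alleles."""
    if len(sdp_a) != len(sdp_b) or not sdp_a:
        raise ValueError("SDPs must cover the same non-empty set of strains")
    discordant = sum(1 for a, b in zip(sdp_a, sdp_b) if a != b)
    return discordant / len(sdp_a)


def ri_recombination_fraction(R):
    """Per-meiosis recombination fraction from the discordant fraction R.

    Uses r = R / (4 - 6R), which assumes RI strains made by sib mating.
    """
    if not 0.0 <= R < 0.5:
        raise ValueError("R must lie in [0, 0.5) for linked loci")
    return R / (4.0 - 6.0 * R)


# Hypothetical SDPs across 20 RI strains; the last two strains are discordant.
new_gene = "BDBDBDBDBDBDBDBDBDBD"
marker = "BDBDBDBDBDBDBDBDBDDB"
R = discordant_fraction(new_gene, marker)
r = ri_recombination_fraction(R)
print(R, round(r, 4))
```

Because RI strains accumulate recombination over many generations of inbreeding, the discordant fraction is several times larger than the single-generation recombination fraction, which is why closely linked genes can be ordered with relatively few strains.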

ISB. In any given chromosomal

segment, RFLP's and other types of DNA

polymorphisms are more likely to occur be-

tween individuals of different species than

between individuals of the same species.

Although interspecies hybrids are often sterile, hybrid females from crosses of the laboratory mouse (Mus musculus) and a related species (M. spretus) are fertile, and backcrosses of the hybrid to M. musculus males

can readily be obtained in large numbers.

There exist sets of genomic DNAs from

each of >100 individual mice of such an ISB

that have already been typed for many DNA

polymorphisms. If it has not been possible

to identify a suitable polymorphism among

mice of different inbred strains, chances are

good that one can be found between the two

mouse species. Testing these DNAs with the new RFLP makes it possible to compare

its segregation pattern among the ISB DNAs

with those of other markers in essentially the

same manner used for RI strains.

Humans A major effort, the Human Ge-

nome Project, was undertaken to obtain de-

tailed physical and genetic maps and the

complete nucleotide sequence of the human

genome. Analysis and annotation of this

sequence will eventually identify all of the estimated 50,000 human genes. Such an

accomplishment will enhance investigators'

ability to isolate distinct genes, particularly

those in which mutations are responsible for

human diseases. Many of the techniques

described above for physical and genetic

mapping in lower organisms are applicable



to humans, with the obvious exception of

experimental crosses. Traditionally, human

genes have been cloned by isolating the en-

coded protein and using this information to

screen libraries with antibodies or oligonucleotide probes. When cloned nucleic acid probes are then available, standard approaches toward physical mapping may be

carried out, including somatic cell hybrid

analyses and more recently, in situ hybridi-

zation techniques.

Genetic mapping, as in other organ-

isms, relies upon the frequency of recombi-

nation between various genetic loci, i.e., ge-

netic linkage analysis. The human genome

comprises approximately 3000 centiMorgans (cM), where 1 cM is defined as the genetic length over which one observes recombination 1% of the time. Assuming a haploid genome of ~3 x 10^9 bp, 1 cM corresponds to

approximately 1 million base pairs. A ge-

netic linkage map allows one to clone genes

by virtue of a distinct phenotype or trait re-

sulting from a mutation, even if nothing at

all is known about the protein encoded by

the gene. As opposed to physical mapping,

this latter approach requires only that the

phenotype be linked to some polymorphic marker, a technique known as positional

cloning.
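The conversion between genetic and physical distance quoted above is simple arithmetic, sketched here with the two figures given in the text. The constant and function names are hypothetical, and the result is a genome-wide average only: as discussed earlier, recombination hotspots and suppressed regions make the local ratio vary considerably.

```python
GENOME_SIZE_BP = 3e9        # approximate haploid human genome, from the text
GENETIC_LENGTH_CM = 3000    # approximate total genetic length, from the text

BP_PER_CM = GENOME_SIZE_BP / GENETIC_LENGTH_CM  # average physical length of 1 cM


def average_physical_distance_bp(genetic_distance_cM):
    """Rough physical distance implied by a genetic distance (genome-wide average)."""
    return genetic_distance_cM * BP_PER_CM


print(BP_PER_CM)                          # average base pairs per cM
print(average_physical_distance_bp(5))    # a marker 5 cM from the trait locus
```

A marker 5 cM from a disease locus therefore still leaves, on average, several million base pairs to search, which is why dense marker maps matter for positional cloning.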

As in lower organisms, the creation

of a useful genetic linkage map depends

upon the existence of polymorphic loci dis-

tributed throughout the genome. Histori-

cally, the first polymorphisms which pro-

vided a suitable approach for large scale ge-

netic mapping in humans were based upon

restriction fragment length polymorphisms

(RFLPs). However, RFLPs are not found with sufficient frequency to saturate the human genome. More recently, other types of

polymorphisms have become popular, in-

cluding mini-satellite DNAs or variable

number of tandem repeats (VNTRs), and

micro-satellites, particularly "CA" repeats.

Regions of DNA containing (CA)n, where

the number of repeats (n) is highly polymor-

phic, are dispersed throughout the genome.

These show a high degree of heterozygosity

and are inherited in typical Mendelian fash-

ion. By identifying the sequences which

flank various "CA" repeats, polymerase chain reaction (PCR) primers can be designed which amplify fragments of differing

sizes, depending upon "n". The number of

PCR primer pairs which uniquely amplify

distinct "CA" repeats is constantly growing.

Using these and other polymorphic loci dis-

tributed throughout the genome, a highly

detailed genetic linkage map of the human

genome is being compiled. There are now

several thousand such highly polymorphic

loci which are distributed throughout the human genome, with markers spaced at less

than 5 cM. Thus, finding tight linkage be-

tween a phenotypic trait and some "CA" re-

peat or other polymorphic locus becomes

increasingly more probable.

In addition to identifying polymor-

phic loci, PCR primers that amplify distinct

segments of genomic DNA also provide an

approach to physical mapping and eventu-

ally isolation of the gene of interest. Once a

PCR primer pair is found which identifies a polymorphism that is tightly linked to the

phenotype of interest, genomic DNA librar-

ies can be screened using the same PCR

primer pair. Clones which are identified by

definition contain genomic DNA which is

also tightly linked to the gene of interest.

There are now several methods which per-

mit isolation of genes or parts of genes from

genomic DNA. Most prominent among

these methods are conventional methods of

screening cDNA libraries and newer methods such as cDNA selection by affinity hybridization and exon amplification. Each of

the genes identified in this fashion would be

considered a candidate for the disease locus

being studied. Based upon the properties of

various candidate genes, such as their patterns of expression, the nature of the encoded proteins, or identifiable mutations,

where the phenotype is some disease state,

the gene of interest can be unambiguously

identified.

As the density of genetic and physical markers increases, maps which incorporate all types of markers (integrated maps)

are emerging. These maps are facilitating

isolation of genes for diseases which are in-

herited in a simple Mendelian fashion.

These maps are expected to help in identify-

ing genes in complex human genetic dis-

eases.

The Physical Characteristics of Genomes

Genomes Consist of DNA Molecules, and Vary Widely in Size

The holy grail of classical geneticists was to

understand the physical structure of a gene

and how this structure allowed it to carry out

its two functions: to determine the character-

istics of the organism and to transmit those

characteristics to the next generation. By

the time the molecular structure of the ge-

netic molecule, DNA, was determined by

Watson and Crick in 1953, so much was known about the properties of genes from

classical genetic studies that it was immedi-

ately apparent from the DNA structure how

in a general way these two functions were

carried out: the information for the organism

was present in the form of a code, and the

information was replicated by base pair

complementarity. Since that time many bi-

ologists have concentrated their efforts on

determining the precise code for particular

organisms, and on understanding how the code is read out, implemented, and transmitted.

The total information for an organ-

ism is contained in its genome, comprising

the nuclear and organellar (mitochondrial, chloroplast) chromosomes. Each nuclear chromosome consists of a single DNA molecule held within a protein scaffolding. There

may be from one to over a hundred chromo-

somes in the nucleus, depending on the or-

ganism. The genomes of organisms that live

independently range in size from approximately 10^6 base pairs in bacteria to over 10^11 base pairs in some amphibians. The genomes of "quasi" organisms, such as viruses,

that utilize the cellular machinery of another

organism to replicate, can be much smaller.

Small viral genomes, such as those of retro-

viruses, certain tumor viruses such as SV40,

or bacteriophages such as φX174, are as

small as 5,000 base pairs. Some biologists

even view transposable elements as a kind

of organism, termed "selfish DNA", that

perpetuate their own existence within host genomes. Transposable elements may be as

short as 1,000 bases and encode just a single

gene.

Bacterial Genomes Contain Some 4300 Genes, Higher Organisms May Have As Many As 30,000 or More

Because genes may be tightly

packed, even overlapping, on the DNA

molecule, or widely separated by "junk" sequences, the number of genes in the genome

does not necessarily correlate with the

amount of DNA. In general, genes are more

densely packed in the genomes of prokaryotes than in those of eukaryotes. The bacteriophage λ genome, 50,000 base pairs, contains some 50 genes, or about one gene per 1000 base pairs (kilobase pairs, or kb). The determination of the complete nucleotide sequence of the λ chromosome, in 1982, was a landmark achievement in the analysis of genome structure. The frequency of genes in the E. coli genome, which is estimated to contain 4,300 genes in 4.7 Mb of DNA, is somewhat lower.
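The gene densities being compared can be worked out directly from the figures quoted in this document. The sketch below is only a worked calculation: the phage and E. coli numbers come from the paragraph above, while the human entry combines the 30,000-gene figure from the section heading with the ~3 x 10^9 bp genome size given earlier, and is included purely for contrast. The names are hypothetical.

```python
# (gene count, genome size in base pairs), using figures quoted in the text
genomes = {
    "bacteriophage": (50, 50_000),
    "E. coli": (4_300, 4_700_000),
    "human": (30_000, 3_000_000_000),
}


def genes_per_kb(gene_count, genome_bp):
    """Average number of genes per kilobase pair of genome."""
    return gene_count / (genome_bp / 1000.0)


density = {name: genes_per_kb(*figures) for name, figures in genomes.items()}
for name, value in density.items():
    print(f"{name}: {value:.3f} genes per kb")
```

On these figures the phage and E. coli genomes both carry close to one gene per kb, while the human genome averages about one gene per 100 kb, which illustrates why gene number does not track the amount of DNA.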

The number of genes in the genomes

of higher organisms has been the subject of

much debate and speculation. Prior to the



direct sequencing of large genomes, two ap-

proaches were taken to estimate the gene

number. A genetic approach was to esti-

mate the total number of genes from the fre-

quency of lethal mutations obtained upon

mutagenesis. Estimates obtained by this approach were too low for at least two reasons:

many genes may encode non-essential prod-

ucts, and many essential products may be

redundantly encoded.

A biochemical approach to deter-

mine the number of genes measured the rate

of annealing of mRNA to DNA. By this

method it is possible to make an estimate of

the complexity of the mixture. Complexity

is defined as the total number of non-repeating sequences within a mixture of nucleic acids. By means of such kinetic measurements, it was estimated that mammalian

genomes contained some tens of thousands

of expressed genes.

Genome Projects

Current Methods Make the Sequencing of

Whole Genomes Possible

Until the mid-1990s, the genetic and physical study of the genetic make-up of organisms proceeded in a piecemeal fashion.

Genes and genetic loci were studied one at a

time, as they became relevant to a particular

research topic or project. However, with the

advent of cloning vectors that can contain

much larger inserts of intact chromosomes

and improved sequencing technologies, the

sequencing of entire genomes has become

feasible. In this approach, the complete

DNA sequence of an organism is determined, and all of the genes are potentially identified by computer analysis of the sequence. By carrying out all of the cloning and sequencing at once in a unified project, a complete sequence is obtained much more rapidly, allowing investigators to concentrate on analyzing the integrated function of the genes in the life of the organism.

Construction of Physical Maps: Overlapping Clones.

The complete DNA sequence of genomes is

obtained by the automated determination

and analysis of vast quantities of DNA se-

quence. However, before this can be done,

the genome must first be obtained in frag-

ments of sequencable length (a few hundred

to a few thousand base pairs) whose rela-

tionship to one another is known. For this

purpose, a complete physical map is con-

structed. This consists of overlapping

cloned fragments of the genome, usually in cosmid, P1, BAC, or YAC vectors. As such

a map is being constructed, overlapping

groups of contiguous fragments, termed

contigs, are built up. Contigs are progres-

sively joined to each other as more and more

fragments are mapped, until, when the

physical map is completed, the number of

contigs equals the number of chromosomes.
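The way contigs grow and merge as clones are added can be sketched as an interval-merging problem. This is a deliberately simplified illustration, not the procedure used by any genome project: it assumes each clone's start and end coordinates on the chromosome are already known, whereas in practice overlaps are inferred from shared restriction fragments, hybridization, or sequence. The clone names and coordinates are hypothetical.

```python
def build_contigs(clones):
    """Group overlapping clones into contigs.

    clones is a list of (name, start, end) tuples on a single chromosome.
    Returns a list of contigs, each a dict with its span and member clones.
    """
    contigs = []
    for name, start, end in sorted(clones, key=lambda clone: clone[1]):
        if contigs and start <= contigs[-1]["end"]:
            # The clone overlaps the current contig: extend it.
            contigs[-1]["end"] = max(contigs[-1]["end"], end)
            contigs[-1]["clones"].append(name)
        else:
            # A gap precedes this clone: start a new contig.
            contigs.append({"start": start, "end": end, "clones": [name]})
    return contigs


# Three clones leave a gap, so two contigs result.
clones = [("c1", 0, 40), ("c2", 30, 80), ("c3", 100, 150)]
before = build_contigs(clones)

# Mapping one more clone that bridges the gap joins them into a single contig.
after = build_contigs(clones + [("c4", 70, 110)])
print(len(before), len(after))
```

Each newly mapped clone either extends an existing contig, starts a new one, or bridges two, so the contig count falls toward the number of chromosomes as coverage becomes complete.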

The Physical Map Is Correlated with the Genetic Map

The availability of complete genomic se-

quences allows correlation of the physical

sequence with genetic markers and the use

of mutants to understand the function of the

sequence. Physical and genetic maps are

correlated in several ways. Physical se-

quences that differ between strains (or in

lineages, e.g. of humans), resulting in "restriction fragment length polymorphisms" (RFLPs), are genetically mapped in the same way that any other genetic difference is mapped. (An RFLP results whenever a particular, detectable (e.g. by hybridization to a probe) restriction

fragment differs in size between two organ-

isms. This can come about because one or



both of the restriction sites that define the

fragment are mutated, because a new restric-

tion site arises between them, because DNA

has been deleted or inserted between the two

sites, or because some other rearrangement

has separated them.) Cloned DNA fragments may also be mapped to chromosomes

and parts of chromosomes by in situ hy-

bridization techniques. This approach is

particularly powerful in Drosophila, where

the genetic map is already correlated in de-

tail with the polytene chromosome banding

pattern. A third approach is the identifica-

tion of functional genes on cloned DNA

fragments by the complementation of known

mutations (complementation rescue). This

approach is particularly powerful in organisms that are easily transformed, such as

yeast and C. elegans.

Eukaryotic Genomes Contain a Large Amount of Repetitive DNA.

In spite of the fact that eukaryotic genomes

may have more genes than originally

thought, it remains true that these genomes

contain a great deal of non-coding sequence.

Some of this "extra DNA" appears simply to

be non-functional unique sequence.

Unique sequence is DNA sequence that oc-

curs only once in the haploid genome.

Some extra DNA is accounted for by in-

trons. Other sequences make up distinct

classes of repetitive DNA. Repetitive DNA

is DNA sequence present more than once

per haploid genome. Repetitive DNA can

make up anywhere from a small fraction to a

majority of the genomic DNA of eukaryotic

organisms. Typically it represents some

20% to 50%.

The first evidence that genomes con-

tained DNA apart from unique sequences

came from analysis of reannealing kinetics.

When genomic DNA was denatured (e.g. by

heating) to cause the strands to separate, and

then allowed to reanneal to the double-

stranded form (at a lower temperature), the

rate of reannealing was not consistent with a

single kinetic component.

When double-stranded DNA is denatured and renatured, the rate of reannealing, like the rate of other bimolecular reactions,

is dependent on concentration. In the case

of double-stranded DNA, the relevant concentration is the concentration of similar or

identical DNA sequences, since only these

can interact to anneal. The concentration of

a pair of similar sequences in a mixture of

nucleic acids depends on the complexity of

the mixture, that is, the number of different

sequences in the mixture. When the kinetics

of renaturation of eukaryotic genomic DNA was measured, it was found that much of the

DNA reannealed at a rate higher than ex-

pected for unique sequences. This indicated

that these sequences were repeated within

the genome. In fact, there were several ki-

netic components, indicating sequences pre-

sent from 10 times to millions of times in

the genome. This kind of kinetic analysis is

called Cot analysis, because the kinetic data were typically presented in a plot of percent DNA annealed versus the product of the DNA concentration (Co) and time of annealing (t).
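The shape of such a curve follows from ordinary second-order kinetics. The sketch below uses the standard reassociation equation, fraction single-stranded = 1 / (1 + k x Cot), which is not written out in this text; the rate constants are hypothetical values chosen only to contrast a repeated and a unique component.

```python
def fraction_single_stranded(cot, k):
    """Fraction of DNA still single-stranded at a given Cot value.

    Ideal second-order reassociation: C / C0 = 1 / (1 + k * C0 * t),
    where cot is the product C0 * t and k is the rate constant.
    """
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """Cot value at which half of the component has reannealed."""
    return 1.0 / k


# Hypothetical rate constants: a repeated component reanneals faster
# (larger k) than a unique-sequence component.
k_repeated = 100.0
k_unique = 0.01

for cot in (0.01, 1.0, 100.0):
    repeated = 1.0 - fraction_single_stranded(cot, k_repeated)
    unique = 1.0 - fraction_single_stranded(cot, k_unique)
    print(f"Cot = {cot}: repeated {repeated:.2f} annealed, unique {unique:.2f} annealed")
```

Because Cot at half-reassociation is the reciprocal of k, a sequence family that is more highly repeated, and therefore present at higher effective concentration, reaches half-annealing at a proportionally lower Cot. Separate kinetic components in a genomic Cot curve are read in this way.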

There Are Several Kinds of Repeated Sequences

The fastest kinetic component in a

Cot analysis of eukaryotic DNA typically

annealed essentially instantaneously, and in

a concentration-independent manner. This

component consists of inverted repeats.

These are similar sequences joined close

together and in inverted orientation, so that

they reanneal in a so-called "snap-back" or

"foldback" reaction. Inverted repeats are

often members of other repetitive sequence



families elsewhere present as isolated re-

peats.

The second fastest kinetic compo-

nent consists of sequences present millions

of times in the genome. These are simple

sequences consisting of long stretches of a

short repeat, such as ...ATATATATAT...

(from crab) or ...AAGAGAAGAG... (from

Drosophila). Such sequences are also known

as satellite sequences. This stems from

their behavior during density analysis of

DNA. When the density of eukaryotic DNA

is analyzed by buoyant density centrifuga-

tion in a CsCl density gradient, it is found to

have several components of different density.

The gradient profile consists of main

band DNA, containing the unique sequences,

including most of the genes, and

satellite bands, so-called because they lie

alongside the main band on the profile. The

anomalous, repetitive structure of the simple

sequence DNA accounts for its variant

buoyant density. In some eukaryotic ge-

nomes there is little or no satellite or simple-

sequence DNA, whereas in others such se-

quences may make up over 50% of the total.

The function of satellite DNA is not known.

Speculation focuses on a possible role during

pairing of homologous chromosomes.

The next slowest kinetic component,

lying between the satellite sequences and the

unique sequences in rate of annealing, con-

sists of the so-called middle repetitive se-

quences. There are a great variety of such

sequences. Some are genes present in mul-

tiple copies in the genome. Genes for com-

mon cellular components such as ribosomal

RNA or histone proteins are often present in

multiple copies. The multiple copies may

be dispersed in the genome, or may be present

in tandem arrays at a single locus.

Non-functional, corrupted (mutated)

copies of genes, called pseudogenes, make

up another component of the middle repeti-

tive DNA. These sequences may have

arisen in a duplication event, or by reverse

transcription of an RNA copy of the gene,

followed by insertion of the DNA copy into

the genome. DNA copies of mRNAs,

known as processed pseudogenes, are char-

acterized by the presence of polyA tails and

absence of introns. This indicates their origin

from reverse transcription of cellular

mRNA, followed by insertion of the DNA

copy into the genome. Between 5% and

10% of the human genome is made up of a

large pseudogene family known as the Alu

family (named after the restriction enzyme,

AluI, that was first used to identify it).

These 300 bp repeats, present hundreds of

thousands of times in the genome, probably

originated as DNA copies of the short cellular

RNA known as 7SL RNA. 7SL RNA

functions normally as a component of the

cellular mechanism that translocates newly

synthesized proteins across membranes of

the rough endoplasmic reticulum. Short re-

peats such as the Alu repeats have been

dubbed SINEs, for "short interspersed

sequences".

Other families of middle repetitive

sequences consist of transposable elements.

These are present in all genomes and have a

great variety of structures and modes of

transposition. They make up the LINEs, or

"long interspersed sequences", in mammal-

ian genomes, and are in some cases related

to the genomes of retroviruses. Endogenous

retroviral genomes themselves are another

component of the middle repetitive DNA of

mammals. Transposable elements make up

some 20% of the Drosophila genome.

Finally, there is a class of middle

repetitive sequences that so far have eluded

explanation. These sequences typically consist

of a few hundred base pairs, interspersed

among other sequences around the genome.

They have been given the general name in-

terspersed repeats. They make up families

of anywhere from a few to hundreds of

thousands of members, and there are typi-

cally hundreds to thousands of families in



eukaryotic genomes. They usually account

for a large proportion of the middle repeti-

tive DNA. In spite of their prevalence and

ubiquity, the origin and function of these

interspersed repeats remain a mystery.

While they certainly have an origin, the suspicion

is that they have no function. They

are the ultimate junk DNA.

Maintenance and transmission of the genetic material

Special Sequences Control the Replication

and Transmission of the Genetic Material

Most organisms use DNA as their genetic

material. The exceptions are some viruses

that use RNA. The symmetry of DNA permits

replication by polymerases to create

two exact copies of the genetic material.

One mechanism of replication involves ini-

tiation of synthesis at a single point, the ori-

gin of replication, and replication to com-

pletion. Many bacteria, plasmids and vi-

ruses replicate in this fashion. Another

mechanism involves initiation of DNA

synthesis at many points on the genome and

synthesis until the replication forks meet.

There may or may not be origins of replication

that are used during every round of replication.

Eukaryotes use multiple origins on

a single DNA molecule. Also, eukaryotes

have linear genomes which require ends

with special structures, called telomeres,

both for protection of the DNA, and to per-

mit the end to be correctly replicated.

Telomeres have a unique physical structure

that includes multiple short DNA repeats

with nicks and a capping hairpin structure.

Once a genome has been replicated,

each copy must be accurately partitioned

into the two daughter cells. For the bacterial

circular genome and for some plasmids this

is accomplished by having a partition se-

quence in the DNA near the origin of repli-

cation. These sequences attach to regions of

the cell wall that grow apart during cell divi-

sion, dragging the two newly replicated ge-

nomes apart. For some plasmids and for the

plasmid-like DNA of mitochondria and

chloroplasts, the genome is maintained in

multiple copies and the cell depends at least

partly on statistics to ensure that each

daughter cell or organelle gets at least one

copy of the genome. Other mechanisms

then ensure amplification of the genome.
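The reliance on statistics can be made concrete with a small calculation. The Python sketch below assumes the simplest possible model, which is not taken from the text: each of n genome copies goes to one of the two daughters independently and with equal probability. Under that assumption the chance that one of the daughters receives no copy at all is 2 x (1/2)^n. The function name is hypothetical.

```python
def prob_a_daughter_gets_none(copies: int) -> float:
    """Probability that one of two daughters receives no genome copy.

    Assumes each copy segregates independently, with probability 1/2
    of going to either daughter (a deliberately simple model).
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    # All copies go to daughter A, or all copies go to daughter B.
    return 2 * 0.5 ** copies


for n in (1, 2, 10, 20, 50):
    print(f"{n:2d} copies: P(loss) = {prob_a_daughter_gets_none(n):.3g}")
```

With one or two copies loss is very likely, but with a few dozen copies it becomes negligible. This is why random partitioning is workable only for genomes maintained at high copy number.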

Eukaryotes generally have their genomes

distributed on several chromosomes

and thus have special problems in assuring

that each daughter cell gets exactly the right

set of chromosomes after replication. A

special structure, the centromere, and attached

cytoskeletal machinery, the mitotic

apparatus (mitotic spindle), ensure accu-

rate segregation of chromosomes. During

meiosis, in which a diploid cell undergoes

reductive divisions to yield haploid cells,

synapsis, or pairing of homologous chromo-

somes, and a unique meiotic apparatus are

required to ensure that haploid gametes get

exactly one of each chromosome.

Enzymatic Mechanisms Repair DNA Dam-

age and Recombine the DNA Strands

As the genetic material, DNA is pre-

cious and must be protected from damage.

Ultraviolet light, ionizing radiation, and

DNA modifying chemicals can damage

DNA. Many mechanisms exist to repair

damage that occurs. Excision repair path-

ways exploit the fact that two copies of ge-

netic information are stored in the two

strands of DNA. Damaged bases can be re-

moved on one strand and then recopied from

the other. Recombinational repair mecha-

nisms work by shuffling damaged and un-

damaged segments that are present in more

than one copy in the cell to try to put to-

gether a 'good' genome.
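The logic of excision repair, in which the undamaged strand supplies the missing information, can be shown with a toy example. The Python sketch below is purely illustrative and does not model the enzymes involved. It assumes that a damaged base is marked with the hypothetical symbol "X", and that the two strands are written in register so that position i of one strand pairs with position i of the other.

```python
# Watson-Crick base-pairing rules.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def excision_repair(damaged: str, template: str) -> str:
    """Restore damaged bases (marked 'X') using the complementary strand.

    damaged:  one strand, with 'X' at each damaged position.
    template: the other strand, written so that template[i] pairs
              with damaged[i].
    """
    if len(damaged) != len(template):
        raise ValueError("strands must be the same length")
    repaired = []
    for base, partner in zip(damaged, template):
        # Remove the damaged base and recopy it from its partner.
        repaired.append(PAIR[partner] if base == "X" else base)
    return "".join(repaired)


print(excision_repair("GGAXCC", "CCTAGG"))
```

If the same position were damaged on both strands, no template would remain. That is the situation in which recombinational repair, drawing on a second copy of the sequence, becomes necessary.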



Even in the absence of detectable

DNA damage, DNA sequences may 'recombine'.

Homologous recombination is at the

heart of both classical genetics and modern

"gene-targeting". The mechanism of such

recombination or cross-over events is controversial

and probably varies according to

the organism, but involves breaks in DNA,

unwinding of strands, hybridization to homologous

segments of DNA, new DNA

synthesis, and endonuclease strand cleavage.

The net result is equivalent to a physical

cleavage of DNA and rejoining to a different

partner. Some transposons and viruses catalyze

recombination events that involve specific

DNA sequences that may not be homologous

(or are homologous over only a few bases).

Recombinant DNA and the Construction of Transgenic Organisms

Genes May Be Amplified in Pure Form by

"Cloning" Them in Microorganisms.

Early genetics was dependent on naturally

occurring mechanisms for the study of genetic

function. In the 1970s, techniques

were developed to manipulate DNA in vitro

and move it across species boundaries.

These cloning techniques rely on enzymes

that work on DNA. Restriction endonucle-

ases (commonly called restriction en-

zymes) cut DNA at specific sequences, of-

ten palindromic sequences. (A "palin-

drome" is a word or sentence that reads the

same forwards or backwards, like "A man, a

plan, a canal, Panama" or "Madam, I'm

Adam". In DNA, a palindromic site reads the

same 5' to 3' on each strand.) For example, the

restriction enzyme BamHI cuts at GGATCC. BamHI is

called a 6-cutter because its recognition sequence

is six bases long. On average one

expects a specific six-base sequence like

GGATCC to occur once every 4 kb of DNA (4^6 = 4096 bp),

but of course some fragments are much big-

ger or smaller. Furthermore, the average

size depends on the GC/AT content of the

DNA being cut, and the relative numbers of

G or C vs A or T nucleotides in the restric-

tion site. The size of DNA fragments can be

determined on agarose gels. About 150 re-

striction enzymes with different recognition

sequences are available commercially. The

position of restriction sites in a piece of

DNA can be determined, giving a restric-

tion map useful for subsequent manipula-

tions. Fragments of DNA can be joined to

one another by another enzyme, DNA li-

gase. Together the ability to cut DNA,

separate fragments by size, and then rejoin

them in a new combination in vitro, forms

the basis for the powerful cloning technolo-

gies.
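The arithmetic behind restriction digests can be sketched in a few lines of Python. The example below is a simplified illustration and not a description of any laboratory software. It checks that a recognition site is palindromic, estimates the average spacing between sites for a given GC content, and lists the fragment lengths produced by cutting a linear molecule. The function names are hypothetical, and the spacing estimate assumes that bases occur independently and at random, which real genomes only approximate. BamHI cuts after the first G of its site, so a cut offset of 1 is used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5' to 3' on both strands."""
    return site == reverse_complement(site)


def expected_spacing(site: str, gc_content: float = 0.5) -> float:
    """Average distance in bp between sites in random-sequence DNA."""
    p = {"G": gc_content / 2, "C": gc_content / 2,
         "A": (1 - gc_content) / 2, "T": (1 - gc_content) / 2}
    probability = 1.0
    for base in site:
        probability *= p[base]
    return 1.0 / probability


def digest(seq: str, site: str, cut_offset: int) -> list:
    """Fragment lengths after cutting a linear molecule at every site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


print(is_palindromic("GGATCC"))
print(expected_spacing("GGATCC"))
print(round(expected_spacing("GGATCC", gc_content=0.4)))
print(digest("AAAGGATCCTTTT", "GGATCC", cut_offset=1))
```

At 50% GC content the estimate reproduces the figure of about 4 kb for a 6-cutter. Lowering the GC content to 40% makes the GC-rich BamHI site rarer, so the expected fragments are longer.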

Though new DNA molecules can be

made in vitro, the yield is usually low.

However, by cloning DNA into a vector

capable of replication, the recombinant

DNA can be amplified in vivo. Further-

more, by placing the recombinant DNA into

a microorganism, a single defined segment

of a large genome can be separated from the

remainder of the genome simply by select-

ing a clone of organisms, like a bacterial

colony or a phage plaque. This is the origin

of the term "cloning". Depending on the

vector being used, a variety of methods are

then available for separating the vector plus

insert from the host microorganism's own

DNA.

Common sources of vector DNA are

viruses and plasmids that are capable of replication

in E. coli. E. coli is a useful host for

amplifying DNA since it is easy to grow to

high density (2 x 10^9 cells/ml) and has relatively

little DNA of its own. Some virus

vectors (e.g. the filamentous phage M13)

only infect certain strains of E. coli. Vectors

such as yeast YACs and mammalian retrovi-

ral expression vectors are shuttle vectors

that can replicate in both E. coli and eukaryotic

cells. Often bulk recombinant DNA is

made in E. coli and an experiment is done in

another organism.



An example of vector cloning follows.

The bacteriophage lambda has a genome of

about 50kb. A region containing about 1/3

of the genome serves no function (the so-

called financial district) and can be replaced

by other DNA including E. coli or foreign

DNA. These clones can replicate just like

the original bacterial virus, but now when-

ever they duplicate they also duplicate the

inserted DNA.

For the most part DNA is DNA and

can be moved from organism to organism

without problems. However, there are three

common problems in transferring DNA that

we will discuss briefly. 1) DNA can be

modified in ways that affect function. 2)

DNA can contain sequences that get rearranged

in some organisms. 3) A cloned sequence

may make a protein toxic to some

cells.

DNA modifications are common and

include sequence-specific methylation of

bases. These methylations can affect gene

function and resistance to digestion with

specific restriction enzymes. Strains of

E. coli that lack many of the offending DNA

methylases (e.g. dam, dcm) or methylation-dependent

restriction systems (e.g. mcrA) have been constructed.

Also, some strains of E. coli make

restriction enzymes that destroy unmodified

DNA. Take care! Often a specific cloning

project requires a specific host strain that

modifies or does not modify DNA (see the

New England Biolabs catalog or Molecular

Cloning for details).

E. coli does not like DNA containing

short direct repeats or with inverted repeats,

both of which tend to get deleted from

cloned fragments by the very active E. coli

recombination pathway. This can be a problem

when cloning eukaryotic DNA in which

such structures are common. E. coli host

strains that are defective in recombinases

(like recA) are helpful but do not completely

solve the problem. E. coli vectors do not

tolerate more than about 20kb of DNA.

Yeast artificial chromosomes (YACs) are

useful for cloning up to 400kb of DNA. As

the name implies YACs are grown in yeast.

Eukaryotic cells such as yeast are tolerant of

repeated DNA, and hence repetitive sequences

that cannot be cloned in E. coli can

often be cloned in a YAC.

Cloned DNA may express proteins

that kill a specific host. For example, even

though eukaryotic promoters and introns do

not function in E. coli, often a polypeptide

derived from one exon will be expressed in

E. coli. E. coli is especially sensitive to hy-

drophobic proteins that interfere with secre-

tion (a secA strain may tolerate such clones)

and to DNA binding proteins.

The Polymerase Chain Reaction (PCR) Is a Way to "Clone" DNA Directly In Vitro

Instead of amplifying a defined DNA seg-

ment by ligating it to a vector and introduc-

ing it into a microorganism, it is possible to

amplify it enzymatically by the polymerase

chain reaction (PCR). In PCR, the DNA

segment between two short (15 to 30

nucleotides long) single-stranded

oligonucleotide primers is copied by a

primer-dependent DNA polymerase. The

polymerase used is from a thermophilic

bacterium. This makes it possible to carry

out many cycles of synthesis automatically

by alternately heating the reaction mixture

to melt all DNA strands (the polymerase is

not inactivated by the high temperature re-

quired for this), and then cooling it to allow

the primers to anneal and the polymerase to

function by extending them. In each succes-

sive cycle of melting and replication, the

amount of the DNA segment between the

two primers increases exponentially, as the

product of each synthetic round serves as

template in the next.
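The exponential increase can be illustrated with a short calculation. The Python sketch below assumes an idealized reaction in which a fixed fraction of templates is copied in every cycle. The function name and the efficiency parameter are assumptions introduced for illustration. In a real reaction the efficiency falls in later cycles as primers and nucleotides are consumed, so these figures are upper bounds.

```python
def pcr_copies(initial_copies: float, cycles: int,
               efficiency: float = 1.0) -> float:
    """Idealized number of template copies after a number of PCR cycles.

    efficiency is the fraction of templates copied in each cycle:
    1.0 means perfect doubling, 0.0 means no amplification.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


for cycles in (10, 20, 30):
    ideal = pcr_copies(1, cycles)
    reduced = pcr_copies(1, cycles, efficiency=0.8)
    print(f"{cycles} cycles: {ideal:.3g} copies at 100% efficiency, "
          f"{reduced:.3g} at 80%")
```

With perfect doubling, a single starting molecule gives 2^30, or roughly a billion, copies after 30 cycles.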

PCR can be extremely specific and

sensitive. Specificity is provided if each of

the primers anneals to only the single, intended

sequence. In 30 cycles of polymerization,

biochemically detectable and useful

quantities of a sequence from 50 to 5000

bases in length can be amplified from tiny

amounts of complex mixtures, such as the

genomic DNA of vertebrates. The synthetic

product can subsequently be sequenced,

used as a labeled probe, or cloned for further

in vitro modification.

Genes Are Cloned by Isolating Them from

Clone Libraries or Clone Banks

The fundamental importance of gene clon-

ing is that it allows the purification of a sin-

gle gene out of the thousands or tens of

thousands present in the genomes of complex

organisms. To accomplish this feat, it

is first necessary to introduce all the genes

of the organism under study into a culture of

microorganisms. The task is then to identify

a clone of the microorganism that contains

the single gene of interest. The mixed cul-

ture of microorganisms is termed a clone

library or clone bank.
