essentials of molecular genetics.pdf

Upload: n1123581321

Post on 14-Apr-2018

  • 7/30/2019 Essentials of Molecular Genetics.pdf

    1/47

    ESSENTIALS OF MOLECULAR GENETICS

    Prepared by Faculty of the Albert Einstein College of Medicine

    (September, 1993; revised September, 2002)

    CONTENTS

    What Is Molecular Genetics?
    Classical Genetics and the Definition of the Gene
    Classical Genetics Defines the Gene by the Study of Mutations
    Mutations Can Be Dominant Or Recessive
    The Complementation Test Identifies the Gene as a Unit of Activity
    A Complementation Test Sometimes Gives the "Wrong" Answer
    Transmission Genetics
    Classical Genetics Defined the Rules Governing Genetic Transmission
    Cytologists Discovered the Cellular Structures That Contained the Genes
    Genetic Recombination between Genes in Single Linkage Groups Results from Exchange of Material between Homologous Chromosomes
    The Frequency of Genetic Recombination Can Be Used to Map Genes on Chromosomes
    Construction of a Genetic Map Is an Important Step in the Definition of Genes
    Organisms Being Studied Today
    Genetic Mapping Techniques in Various Organisms
    The Physical Characteristics of Genomes
    Genomes Consist of DNA Molecules, and Vary Widely in Size
    Bacterial Genomes Contain Some 4300 Genes, Higher Organisms May Have As Many As 30,000 or More
    Genome Projects
    Current Methods Make the Sequencing of Whole Genomes Possible
    Construction of Physical Maps: Overlapping Clones.
    The Physical Map Is Correlated with the Genetic Map
    Eukaryotic Genomes Contain a Large Amount of Repetitive DNA.
    There Are Several Kinds of Repeated Sequences
    Maintenance and transmission of the genetic material
    Special Sequences Control the Replication and Transmission of the Genetic Material
    Enzymatic Mechanisms Repair DNA Damage and Recombine the DNA Strands
    Recombinant DNA and the Construction of Transgenic Organisms
    Genes May Be Amplified in Pure Form by "Cloning" Them in Microorganisms.
    The Polymerase Chain Reaction (PCR) Is a Way to "Clone" DNA Directly In Vitro
    Genes Are Cloned by Isolating Them from Clone Libraries or Clone Banks
    A Variety of Vectors Provide a Range of Options for the Generation of a Clone Library
    Clone Libraries May Be Screened in a Number of Ways
    Constructing Transgenic Organisms
    Basic Elements of Bacterial Genetics
    The Genetics of Bacteria Has Several Unique Features
    Bacterial Cells Exchange Genetic Material in a Process Known as Conjugation
    The Bacterial Genetic Map Is Defined by the Time of Transfer During Conjugation
    The Bacterial Genetic Map and the Bacterial Chromosome Are Circular
    The F Plasmid Encodes Genetic Functions Required for Transfer of DNA
    Integration of the F Plasmid into the Bacterial Chromosome Can Result in Mobilization of the Chromosome for Transfer
    Plasmids Can Be Used to Construct Partially-Diploid Bacterial Strains
    Plasmids Play an Important Role in the Transmission of Drug Resistance
    In Transformation, Bacterial Cells Take Up DNA Directly
    Bacterial Viruses Play a Role in Genetic Exchange Between Bacteria
    Study of Bacteriophages Has Played a Central Role in the Development of Molecular Biology.
    Bacterial Viruses May Kill the Host Cell or Coexist with It
    Inferring Wild Type Gene Function from Mutant Phenotype
    To Infer Wild Type Gene Function, It Is First Necessary to Determine How the Mutation Affects Gene Activity
    Types of Mutations Are Defined by Structure and by Effects on Gene Activity
    Rare Spontaneous Mutations Are of All Types
    Chemical Mutagens Tend to Induce Point Mutations, Radiation Tends to Produce Rearrangements
    Null Mutations Are Important in the Determination of the Biological Process in which a Gene Participates
    In Some Organisms the Null Phenotype Is Best Determined by Gene Knockout
    Null Mutations Can Be Identified As Mutations That Behave Genetically Like a Deficiency of the Gene
    Null Mutations Have Several Characteristics That Distinguish Them from Non-Null Mutations
    New Null Alleles May Be Isolated by a Non-Complementation Screen
    Hypomorphic Mutations Lower But Do Not Eliminate Gene Activity
    Gene Activity Is Raised by Hypermorphic Mutations
    Antimorphic Mutations Produce a Poison Gene Product
    Neomorphic Mutations Result in a Novel Gene Activity
    A Gain-of-function Mutant Phenotype May Be Eliminated by Introducing a Loss-of-function Mutation at the Same Locus
    Determining the Time and Place of Gene Action
    The Time and Location of Gene Expression Can Be Determined by a Number of Biochemical Means
    Reporter Genes Provide a Sensitive and Versatile Assay of Gene Expression
    Gene Knockout Frequently Reveals That a Gene's Activity Is Not Required Everywhere It Is Expressed
    The Tissue Where Gene Activity Is Required May Be Determined by Mosaic Analysis
    Gene Product Synthesis and Gene Product Action Need Not Take Place in the Same Generation
    Parental Effects May Be Identified by Genetic Tests
    Temperature-Sensitive Mutations Can Be Used to Determine the Time of Gene Action
    Analyzing Complex Processes by Genetics
    Genetic Analysis Allows the Probing of Complex Biological Processes Involving Multiple Genes
    Some Genes Involved in a Biological Process May Be Identified As Genetic Modifiers
    Information About the Order of Gene Action in a Pathway Can Be Obtained by Epistasis Analysis

    What Is Molecular Genetics?

    Molecular genetics is an approach to understanding the functions of genes. It combines classical genetic analysis with molecular biology to probe the nature of both gene action and gene transmission. The essential characteristic of molecular genetics is that gene products are studied through the genes that encode them. This contrasts with a biochemical approach, in which the gene products themselves are purified and their activities studied in vitro.

    All aspects of cell and organismal structure and function are potentially amenable to a molecular genetic approach. Because genes are similar in all organisms, this approach has many essential aspects in common whether the organism being studied is a bacterium, a fungus, or a mammal. The purpose of this booklet is to define and describe these common aspects, and to point out how they are applied in practice in the diverse organisms that are being studied today.

    Gene cloning, that is, the isolation of a gene so that its nucleotide sequence may be determined, is central to molecular genetics. Genes identified through a classical genetic analysis of mutations may be cloned to ascertain the structure of the gene product and to permit biochemical studies of gene activity. Alternatively, genes may be defined first by the biochemical identification of their gene product. In this case gene cloning allows the isolation and study of mutant forms. In either approach, starting with a mutation or starting with a cloned gene, the techniques of classical genetic analysis are used to draw conclusions about gene function from the phenotype of mutations.

    In addition to gene function, molecular genetics is also concerned with the transmission of the genetic material. Genes are carried by chromosomes, whose function is to maintain the integrity of each cell's complement of genetic information through cell division, and from one generation to the next. Chromosomes contain specialized sequences whose function is to control chromosome replication, recombination, and distribution to daughter cells. The understanding of such sequences can also be approached by cloning, sequencing, and the identification of mutations.

    A long-term goal of molecular genetics is to understand gene function in the context of the life, development, and reproduction of the individual, as well as the evolution of the species.

    Classical Genetics and the Definition of the Gene

    Classical Genetics Defines the Gene by the Study of Mutations

    Long before it was known that genes consisted of strings of nucleotides that determined the structure of proteins, it was possible to infer their existence and many of their properties. Different forms of genes, called alleles or mutations, were recognized by their effects on the phenotype of the organism, that is, the organism's form and function. The complete set of allelic forms of an organism's genes is termed its genotype.

    Classical genetic studies involving crosses between organisms with differing genotypes and phenotypes, beginning with Mendel, revealed that higher plants and animals are diploid, that is, they have two copies of each gene, one derived from each parent. Gametes, on the other hand, as well as the genomes of some higher organisms and most prokaryotes, have only one copy of each gene and are said to be haploid. With respect to a particular gene, a diploid organism is said to be homozygous if both copies of the gene are the same, and heterozygous if two different allelic forms of the gene are present. A heterozygote is also known as a hybrid of the two parental forms. An otherwise diploid organism is said to be hemizygous for any gene present in only one copy, for example, genes on the X chromosome of Drosophila males.

    Mutations Can Be Dominant Or Recessive

    Since there are usually two copies of each gene per cell, it is possible to ask what will be the result if the two copies are different. Through the analysis of such heterozygotes, it has been possible to infer a great deal about the properties of genes and gene products. Consider two alleles of a single gene, a and b. Suppose the homozygote a/a has the phenotype A, and the homozygote b/b has the phenotype B. If the a/b heterozygote has the phenotype A, then a is said to be dominant with respect to b, and b is said to be recessive with respect to a. If A is the most common phenotype found in nature, then A is called the wild type, and a is the wild type allele. In this case, b would be considered a recessive mutant allele of the gene, where the mutant phenotype is only observed when the allele is in homozygous form. However, the wild type need not be the dominant form, and it is possible to have mutant forms that are dominant over wild type. Another alternative is that the phenotype of a/b is a mixture of A and B characteristics, or is intermediate between A and B; for example, if A is "large" and B is "small", the phenotype of the a/b organism might be "medium sized", or if A is red and B is white, the phenotype of the a/b organism might be pink. In this case, each of the allelic forms is said to be incompletely dominant or semidominant with respect to the other. If the phenotypes respectively characteristic of each allele are both expressed in the hybrid, then the two alleles are said to be codominant. This is the case, for example, with different allelic forms of blood group antigens.
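
    The dominance relationships described above can be sketched as a small decision procedure. This is only an illustration: the function name and the convention of passing a set when both parental phenotypes are expressed together are assumptions made for the example, not part of the booklet.

```python
def classify_dominance(pheno_aa, pheno_bb, pheno_ab):
    """Classify the relationship between alleles a and b.

    pheno_aa, pheno_bb: phenotypes of the a/a and b/b homozygotes.
    pheno_ab: phenotype of the a/b heterozygote. By the convention
    assumed here, a set containing both homozygote phenotypes means
    that both are expressed side by side in the hybrid.
    """
    if pheno_aa == pheno_bb:
        return "alleles cannot be distinguished by phenotype"
    if pheno_ab == pheno_aa:
        return "a is dominant to b (b is recessive)"
    if pheno_ab == pheno_bb:
        return "b is dominant to a (a is recessive)"
    if pheno_ab == {pheno_aa, pheno_bb}:
        return "a and b are codominant"
    # Any other heterozygote phenotype is treated as a mixture or intermediate.
    return "a and b are incompletely dominant (semidominant)"


print(classify_dominance("red", "white", "red"))
print(classify_dominance("red", "white", "pink"))
print(classify_dominance("antigen A", "antigen B", {"antigen A", "antigen B"}))
```

    In practice the distinction between "intermediate" and "both expressed" depends on how the phenotype is scored, so the last two categories reflect the observer's assay as much as the alleles themselves.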

    The Complementation Test Identifies the Gene as a Unit of Activity

    In addition to making it possible to determine whether one allelic form of a gene is dominant or recessive with respect to another, diploidy makes possible a fundamental genetic test to determine whether two mutations with the same or similar phenotypes are in the same gene: the complementation test. A determination of the number of genes involved is essential to begin unraveling the role of genes in a particular process. Suppose, for example, the genetic basis of fruit fly eye color is being studied. If wild type fruit fly eyes are red, and two mutant strains of flies have white eyes, it will be important to know whether the two mutations are in the same gene, or define two separate genes, both of which are necessary to make red eyes. It is by means of the complementation test that the gene as a unit of function is defined.

    In a complementation test, an organism that is heterozygous in trans for two mutations with similar phenotypes is constructed by genetic crosses, and its phenotype is observed. Heterozygous in trans means that one mutant allele has been obtained from one parent, and the other mutant allele has been obtained from the other parent. It is necessary that both mutations be recessive, so that the phenotype of a heterozygote for each mutant allele singly is wild type. If the trans-heterozygote is also found to be wild type, then the two mutations are said to "complement" one another. If the trans-heterozygote is found to be mutant in phenotype, then the two mutations are said to "fail to complement" one another.

    GENETIC NOMENCLATURE IN VARIOUS ORGANISMS

                 E. coli          yeast          C. elegans      Drosophila    mouse
    Phenotype    Gal-, Lac+       Ade-, Cdcts    Dpy, Unc        white         agouti
    Gene         galK, lacZ       ade2, cdc28    dpy-5           white         A
    Allele
      Recessive  galK13, lacZ23   ade2-1         dpy-5(bx27)     w, wa         a, ab
      Dominant   same             ADE2-27        dpy-5(bx27d)    Ubx           A, Ay, Avy
      Ts-        same                            dpy-5(bx27ts)   wts
      Wild type  not written      ADE2           dpy-5(+)        w+, Ubx+      +, A+

    The table gives one or two examples of a gene or mutation name. Notice that among the differences in usage between the organisms, there are some consistencies: phenotypes are written non-italicized, usually as three letters with only the first letter capitalized. Gene names, alleles, and genotypes generally, on the other hand, are italicized. In several systems, capitals denote dominance and small letters recessiveness.

    How are these two different results to be interpreted? If the trans-heterozygote has wild type phenotype, that is, if the mutations complement one another, this implies that the trans-heterozygote has all the genetic functions needed for expression of the wild type phenotype. In other words, the chromosomes from each mutant parent make up for the deficiency present on the chromosomes of the other. If one parent is mutant in, say, gene a, the second parent must carry a wild type copy of gene a. Since the mutation in a is recessive, this gives wild type gene a function. If the second parent has a wild type copy of gene a, its own mutation must be in a different gene from a. Evidently, the mutations carried by the two parents are in different genes.

    The same kind of reasoning applies to non-complementation. In this case, neither parent makes up for the deficiency of the other; evidently they must be deficient in the same gene. Thus, the general interpretation of the complementation test is as follows: if two mutations complement, then they are likely to lie in different genes; if two mutations fail to complement, then they are likely to lie in the same gene.
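
    The general interpretation above amounts to sorting mutations into groups, one group per putative gene. A minimal sketch follows, assuming all mutations are recessive and that failure to complement can be treated as transitive; the mutation names (w1 to w4) and the function name are hypothetical.

```python
def complementation_groups(mutations, fails_to_complement):
    """Group recessive mutations into putative genes.

    mutations: iterable of mutation names.
    fails_to_complement: pairs (m1, m2) whose trans-heterozygote
    showed the mutant phenotype.
    Returns a sorted list of sorted groups.
    """
    parent = {m: m for m in mutations}

    def find(m):
        # Follow links to the representative of m's group.
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # shorten the chain as we go
            m = parent[m]
        return m

    # Mutations that fail to complement are merged into one group.
    for m1, m2 in fails_to_complement:
        parent[find(m1)] = find(m2)

    groups = {}
    for m in parent:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical white-eyed strains: w1, w2 and w4 fail to complement,
# while w3 complements all of them.
print(complementation_groups(
    ["w1", "w2", "w3", "w4"],
    [("w1", "w2"), ("w2", "w4")],
))  # [['w1', 'w2', 'w4'], ['w3']]
```

    The grouping is only as reliable as the individual tests; as the text says, the conclusions are "likely" rather than certain, which is why the result is best read as a set of putative genes.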

    Note that a complementation test cannot be carried out with a dominant mutation. In order to determine the gene in which a dominant mutation lies, it is usually first necessary to isolate a recessive allele at the same locus. This is discussed further in a later section.

    In diploid organisms the trans-heterozygote required by the complementation test is easily constructed by mating together two single mutant strains. However, there are other ways of determining the result of having multiple allelic forms in the same cell, including methods applicable to haploid organisms. For example, in bacteria a so-called merodiploid can be constructed by putting one copy of the gene being tested on a plasmid. Upon introduction of the plasmid, the organism becomes diploid over just that short segment of the chromosome carried by the plasmid. This technique is used in yeast as well. In both bacteria and yeast, complementation is useful in determining whether a cloned DNA segment carries the wild type copy of a mutated gene. If it does, the cloned DNA segment will complement the mutation when the DNA segment is introduced into the cell; this is often termed "complementation rescue". Complementation rescue is also used to identify wild type genes in C. elegans, into which DNA may be introduced by microinjection.

    A Complementation Test Sometimes Gives the "Wrong" Answer

    Although the reasoning used above to interpret the complementation test is valid for the majority of cases, it is not universally applicable. In some instances, the trans-heterozygote may have a mutant phenotype even though the two mutations being tested are in different genes. This is called second-site non-complementation, or intergenic non-complementation (these terms are equivalent). This can occur due to a cumulative effect on the trans-heterozygote of having only one wild type copy of each of two genes, or of having two mutant alleles, even though mutations in the two genes are recessive when heterozygous singly.

    Likewise, in some instances the trans-heterozygote may have a wild type phenotype even though the two mutations are in the same gene. This is known as intragenic complementation. This comes about if each of the two mutant genes produces a mutant gene product (as opposed to no gene product), and the two mutant gene products, when present in the same cell, can each supply the deficiency or remedy the defect of the other. Acting together, the two mutant gene products provide wild type gene function. Because of the possibility of intergenic non-complementation and intragenic complementation, the complementation test is always combined with genetic mapping to provide a less ambiguous determination of whether two mutations define one or two genes.

    Transmission Genetics

    Classical Genetics Defined the Rules Governing Genetic Transmission

    When Mendel, and later Morgan and other geneticists, discovered that there were genetic entities termed genes that could mutate to different forms, they also discovered how those genes were transmitted from generation to generation.

    Mendel realized that pea plants carried two copies of each gene. To maintain this number, each gamete had to contain one copy. The diploid condition was restored when two gametes joined at fertilization. Evidently, during formation of the gametes in the gonad, one of the two copies of each gene had to be selected to be incorporated into each sperm cell or egg cell. The separation of the two alleles during formation of the gametes is termed segregation.

    Mendel wondered how this process occurred. By studying plants carrying mutations in more than one gene, he determined that the allelic forms of the two genes underwent independent assortment when they were segregated to the gametes. That is, the particular allelic form of one gene that went into a gamete did not affect which allelic form of the other gene went into that gamete. The result was that in the next generation of plants new combinations of the allelic forms could be found in predictable ratios.
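
    The "predictable ratios" that follow from independent assortment can be reproduced by enumerating gametes. The sketch below assumes two unlinked genes, single-letter allele names, and complete dominance of the uppercase allele; these conventions and the function names are chosen for illustration only.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Return the equally likely gametes of an individual.

    genotype: one (allele, allele) pair per gene,
    e.g. [("A", "a"), ("B", "b")]. Each gamete receives one allele
    of every gene, chosen independently of the other genes.
    """
    return ["".join(combo) for combo in product(*genotype)]


def offspring_phenotypes(parent1, parent2):
    """Count phenotype classes among the offspring of a cross.

    Assumes uppercase alleles are fully dominant over lowercase ones.
    """
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        # For each gene, the dominant allele shows if either gamete carries it.
        phenotype = "".join(x if x.isupper() else y for x, y in zip(g1, g2))
        counts[phenotype] += 1
    return counts


double_heterozygote = [("A", "a"), ("B", "b")]
print(gametes(double_heterozygote))  # ['AB', 'Ab', 'aB', 'ab']
print(offspring_phenotypes(double_heterozygote, double_heterozygote))
```

    Crossing two double heterozygotes in this model gives the four phenotype classes in a 9:3:3:1 ratio, and crossing a double heterozygote to a double recessive gives 1:1:1:1.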

    When additional mutations in other organisms were studied, examples that appeared to violate this rule were soon found. In those examples, particular allelic forms of two different genes tended to stay together when gametes were formed. Such genes were said to be linked. After many examples were studied, it was shown that genes could be placed into linkage groups. Genes in one linkage group tended to stay together in the gametes, and to assort independently of genes in other linkage groups. The first genes that Mendel had studied happened all to fall into different linkage groups.

    Cytologists Discovered the Cellular Structures That Contained the Genes

    The foundation of genetics was consolidated when it was discovered that chromosomes behaved in the same way that Mendel's hypothetical genes did. At the same time that geneticists were defining the properties of the abstract entities they called genes (at the end of the 19th and beginning of the 20th centuries), cytologists were discovering the components of cells visible with a microscope. In examining the nucleus, they found it contained multiple chromosomes ("colored bodies" seen because they accepted certain stains) present as morphological pairs. Copies of each pair were faithfully allocated to daughter cells at cell division in a process termed mitosis. During development of gametes, there was a reduction division at which only one member of each pair entered each gamete, in a process similar to the segregation of Mendel's alleles. This unique form of cell division was termed meiosis. It was further shown that all of the different pairs of chromosomes were necessary for normal development of the organism.

    Thus chromosomes were essential and behaved like genes. However, it was found that there were many fewer chromosomes than there were genetically definable genes. Thus each chromosome would have to be associated with many genes. Eventually it became apparent that the correct correlation was not between genes and chromosomes, but between linkage groups and chromosomes. Organisms had the same number of chromosome pairs as genetic linkage groups. Linked genes went together into gametes because they were present on a single chromosome, whereas unlinked genes were on different chromosomes, which assorted independently. The two cellular copies of each chromosome are known as homologs and together constitute a homologous pair. Each member of a pair generally carries the same genes, although the allelic forms of these genes may differ. Thus the presence in the cell of two homologous chromosomes corresponds to the diploid genetic condition found by Mendel.

    Genetic Recombination between Genes in Single Linkage Groups Results from Exchange of Material between Homologous Chromosomes

    When two marked (mutated) genes are present in a genetic cross, there is a possibility of both parental and non-parental combinations of alleles among the gametes. Suppose the two genes a and b are marked in a cross, such that one parent has the alleles A and B (genotype AB/AB) and the other parent has the alleles a and b (genotype ab/ab). The genotypes of all the F1 hybrid progeny are AB/ab. (In a cross such as this, following Mendel's nomenclature, the parental generation is known as the P0 generation, and the progeny of the cross constitute the F1 generation [for first filial generation]. The next generation is the F2 generation, and so forth.) Let the F1 hybrid be back crossed to the ab/ab parent. In this back cross, also known as a test cross, the ab/ab parent supplies only one type of gamete, ab. But for the F1 hybrid parent, there are several possibilities. The possibilities for the genotypes of the progeny are AB/ab, ab/ab, Ab/ab, or aB/ab, where the alleles written before the slash are from the F1 hybrid parent, and the alleles written after the slash are from the ab/ab parent. Regarding the alleles from the F1 hybrid parent, progeny with genotypes AB/ab and ab/ab are derived from F1 gametes with the parental (P0) configurations of alleles (AB and ab), whereas progeny with the genotypes Ab/ab and aB/ab are derived from gametes with non-parental configurations (Ab and aB). During meiosis in the F1 hybrid parent, the genes a and b are said to have recombined to give these non-parental combinations.

    By definition, unlinked genes recombine at a frequency of 50%. They are assorted randomly to the gametes, half of which get the parental combination and half of which get the non-parental (recombinant) combination. Linked genes are genes for which the frequency of recombination is less than 50%.
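
    The recombination frequency in the test cross described above is simply the fraction of progeny that received a non-parental gamete from the F1 parent. A minimal sketch, with made-up progeny counts and hypothetical function names:

```python
def recombination_frequency(progeny_counts, parental_gametes):
    """Fraction of test-cross progeny derived from recombinant F1 gametes.

    progeny_counts: dict mapping the gamete contributed by the F1
    parent ("AB", "ab", "Ab" or "aB") to the number of progeny scored.
    parental_gametes: the two parental configurations, e.g. ("AB", "ab").
    """
    total = sum(progeny_counts.values())
    if total == 0:
        raise ValueError("no progeny were scored")
    recombinant = sum(
        count
        for gamete, count in progeny_counts.items()
        if gamete not in parental_gametes
    )
    return recombinant / total


def appears_linked(frequency):
    """Linked genes recombine at a frequency below 50%."""
    return frequency < 0.5


# Hypothetical counts from an AB/ab x ab/ab test cross.
counts = {"AB": 410, "ab": 390, "Ab": 105, "aB": 95}
rf = recombination_frequency(counts, ("AB", "ab"))
print(rf, appears_linked(rf))  # 0.2 True
```

    With real data a frequency slightly below 50% may be due to sampling error alone, so a statistical test on the counts would normally be applied before concluding that two genes are linked.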

    Genes on different chromosomes undergo random assortment and hence recombine at a frequency of 50%. Genes on the same chromosome also recombine. This is because, during meiosis, homologous chromosomes pair and undergo a physical exchange of material. In this way, non-parental combinations of alleles can be made even for linked genes. The frequency of the physical exchange event varies greatly from organism to organism and from chromosome to chromosome. It may be so high that two genes on the same chromosome become genetically unlinked, assorting randomly. (If the frequency of exchange is very high, the frequency of genetic recombination rises to a maximum of only 50%. This is because double, quadruple, etc., exchange events restore the parental configuration.) At the other extreme, it may be so low that two genes virtually never recombine and are said to be tightly linked.
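
    The 50% ceiling can be illustrated numerically. The sketch below rests on an assumption that the booklet does not make: that the number of exchanges between the two genes on a given strand follows a Poisson distribution (no interference, and ignoring the four-strand structure of meiosis). Under that assumption a gamete is recombinant only when the number of exchanges is odd, which gives the closed form usually attributed to Haldane's mapping function.

```python
import math


def recombination_fraction(mean_exchanges):
    """Probability of an odd number of exchanges between two genes.

    Assumes the number of exchanges is Poisson distributed with the
    given mean; even numbers (0, 2, 4, ...) restore the parental
    configuration and are not counted as recombinants.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * mean_exchanges))


def odd_probability_by_sum(mean_exchanges, terms=60):
    """Same quantity, obtained by summing P(1) + P(3) + P(5) + ..."""
    term = math.exp(-mean_exchanges) * mean_exchanges  # P(exactly 1 exchange)
    total = 0.0
    k = 1
    for _ in range(terms):
        total += term
        # Step from P(k) to P(k + 2) for a Poisson distribution.
        term *= mean_exchanges ** 2 / ((k + 1) * (k + 2))
        k += 2
    return total


for mean in (0.01, 0.1, 0.5, 1.0, 3.0, 10.0):
    print(mean, round(recombination_fraction(mean), 4))
```

    For small means the recombination fraction is close to the mean number of exchanges, while for large means it levels off just below 0.5, matching the behavior described in the text.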

    The Frequency of Genetic Recombination Can Be Used to Map Genes on Chromosomes

    The frequency of physical exchange, and

    hence of genetic recombination, between

    genes on single chromosomes depends not

    only on the organism and chromosome, but

    also on the physical distance between the

    genes on the chromosome. The probability

    of an exchange is higher if the genes are fur-

    ther apart, and lower if they are closer to-

    gether. This provides the basis for con-

    structing a genetic map. By determining

    the frequency of the non-parental, that is

    recombinant, combination of alleles among

    the progeny of a cross, a recombination

    frequency is calculated. Genes are then ar-

    rayed along a linear map depending on their

    recombinational "distances" from each

    other.
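
    As a minimal sketch of the procedure just described, the code below computes
    recombination frequencies from progeny counts and orders three genes by
    placing the most distant pair at the ends of the map. The gene names and
    counts are hypothetical, and the helper names are not from this text.

```python
def recombination_frequency(parental: int, recombinant: int) -> float:
    """Fraction of progeny that carry a non-parental combination of alleles."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny were scored")
    return recombinant / total


def order_three_genes(pairwise: dict) -> tuple:
    """Order three genes: the pair with the largest frequency lies at the ends."""
    outer = max(pairwise, key=pairwise.get)   # most distant pair of genes
    genes = set().union(*pairwise)            # all three gene names
    (middle,) = genes - outer                 # the one gene left over
    first, last = sorted(outer)
    return (first, middle, last)


# Hypothetical (parental, recombinant) progeny counts for each pair of genes.
counts = {("a", "b"): (910, 90), ("b", "c"): (950, 50), ("a", "c"): (870, 130)}
pairwise = {frozenset(pair): recombination_frequency(*n) for pair, n in counts.items()}
gene_order = order_three_genes(pairwise)
print(gene_order)
```

    In this example the a-c frequency (13%) is slightly less than the sum of
    a-b and b-c (14%), as expected when double exchanges go undetected.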

    A genetic map gives the linear order

    of genes on a chromosome as determined by genetic studies. Because of the general

    correlation between the amount of DNA between

    two genes and the probability of the

    occurrence of an exchange event, the genetic

    map resembles the physical array of the

    genes along the chromosome. However, the

    resemblance is far from perfect. While the

    order of the genes should be correct, the

    relative distances between them may not

    reflect the actual relative physical distances

    between them. The probability of exchange per nucleotide is not constant, and in fact

    can vary a great deal from region to region.

    Some regions are hotspots of recombination

    where exchange occurs frequently, and

    likewise there are regions where exchange is

    suppressed. Genes on opposite sides of a

    hotspot, though physically close together,

    will appear far apart on the genetic map.

    Genes in regions of little recombination,

    though physically far apart, will appear

    close together on the genetic map. A physical map displays where genes are physically

    located along a chromosome or molecule of

    DNA, as determined by molecular as op-

    posed to genetic studies. Correlation of ge-

    netic maps and physical maps is an impor-

    tant component of genome projects, as dis-

    cussed further below.

    Construction of a Genetic Map Is an Impor-

    tant Step in the Definition of Genes

    As discussed earlier, mutations can be as-

    signed to the same or different genes by a

    complementation test. This test rests on the

    gene as a unit of biochemical activity. How-

    ever, the possibilities of intergenic non-

    complementation, and intragenic comple-

    mentation make this test not absolutely reli-

    able. Additional information can be readily

    obtained as to whether two mutations lie in

    the same or different genes if they are ge-

    netically mapped relative to one another.

    Mutations that map to different linkage groups, or that lie far apart within a single

    linkage group, must be in separate genes.

    Likewise, mutations that are tightly linked

    and have similar phenotypes could well lie

    in a single gene even if they complement

    each other.

    Organisms Being Studied Today

    Many organisms are currently being studied

    using molecular genetic techniques. A few

    of the more commonly studied include the

    bacterium Escherichia coli, the yeast

    Saccharomyces cerevisiae, the nematode

    Caenorhabditis elegans, the fruit fly

    Drosophila melanogaster, the flowering plant

    Arabidopsis thaliana, the mouse Mus

    musculus, and the human Homo sapiens. These

    organisms each have special features that

    permit study of important aspects of biol-

    ogy. Other organisms are used as well,

    often to study some particular problem. For

    example, the molecular genetics of the small

    tropical aquarium fish, the zebrafish, is be-

    ing developed. It is hoped that this organ-

    ism will serve as a vertebrate amenable to

    the same kind of in-depth analysis as is fo-

    cused on Drosophila, C. elegans, and

    Arabidopsis. Embryogenesis of the frog

    Xenopus is studied because of its large,

    rapidly developing eggs, while the slime mold Dictyostelium serves as a model to study

    cell mobility, cell-cell signaling, and pattern

    formation. Ciliated protozoans have proven

    to be excellent for the analysis of telomeres,

    because their macronuclei contain a large

    number of small chromosomes.

    For many organisms, classical ge-

    netic analysis is not possible, because the

    sexual cycle is either too long (e.g. Xenopus),

    non-existent (e.g. Dictyostelium), or

    uncontrollable (H. sapiens). This limitation

    is becoming less and less of a drawback as an ever-expanding arsenal of molecular

    genetic techniques is developed for isolating

    genes, modifying them in vitro, and placing

    them back into the genome.

    Some of the special features of the

    important organisms follow. E. coli and the

    related Salmonella typhimurium were the

    first organisms to be studied in molecular

    detail and remain the best understood on a

    molecular level (although this is changing).

    Special advantages are extremely fast growth (cells can divide every 20 minutes)

    and a very small genome, about 1/1000 the size of that

    of humans with about 1/10 the number of

    genes. Mutations in about 1,500 genes out of

    a predicted total of 4,300 are already known.

    E. coli lacks a true sexual cycle but the

    technology for moving genes between dif-

    ferent E. coli strains is very well developed

    and is technically simple. E. coli is good for

    studying detailed molecular function of proteins.

    Prokaryotes like E. coli perform

    many functions on a molecular level quite

    differently from eukaryotes. The eukaryote

    S. cerevisiae serves as a useful microorganism

    that has many of the advantages of E.

    coli, but with much greater similarity to

    higher organisms. S. cerevisiae also has a

    sexual cycle and Mendelian genetics. Gene

    replacement is simple in yeast and permits

    rapid reverse genetic as well as genetic studies.

    Though yeast is good for studying

    cellular processes, obviously it does not

    permit studies of how multicellular organ-

    isms develop and function. Two organisms

    used to study animal development are C.

    elegans and Drosophila (known affection-

    ately as worms and flies). Both organisms

    are small, develop quickly and boast a large

    catalog of developmental mutations and so-

    phisticated classical and molecular genetics.

    The small plant Arabidopsis provides an

    organism for studying higher plant development.

    There is great interest in human

    biology, and the mouse serves as a convenient

    and similar (!) mammal. The development

    of gene replacement technology for the

    mouse means that the role of genes in

    mammals can be tested directly. It has be-

    come much easier now to create a "knock-

    out" mouse or a conditional knock-out

    mouse that lacks any gene of interest.

    Human genetics offers special

    opportunities and difficulties. Unlike the other organisms, it is not ethical to

    experimentally manipulate humans. On the

    other hand, the earth has about 10^10 humans

    who notice even subtle developmental

    problems and often report them to those

    aware of genetic diseases (doctors).

    Molecular pedigree analysis permits the

    study of human genetics.

    Genetic Mapping Techniques in Various Organisms

    While the underlying principles are the

    same, the approach taken to mapping muta-

    tions and constructing genetic maps varies

    from organism to organism. Obviously, the

    techniques available to the experimenter for

    mapping mutations in yeast, growing as a

    colony on a plate, will differ from those

    available for mapping human genes. Below

    are summarized briefly the steps employed for various popular experimental eukaryotes.

    Techniques employed with bacteria are pre-

    sented in the next section.

    Yeast Mapping of genes in the yeast Sac-

    charomyces cerevisiae generally occurs by

    cloning the relevant gene, determining the

    DNA sequence of only a short segment, and

    comparing that sequence to the yeast geno-

    mic database for identical sequences with

    known chromosomal locations. When the

    cloned gene is not available, the genetic

    technique known as tetrad analysis is typically used to determine the map position.

    Genetic Mapping in Yeast Tetrad analysis

    involves crossing a haploid mutant strain to

    a series of tester strains of the opposite mat-

    ing type containing marked chromosomes.

    Following meiosis, four haploid spores, the

    meiotic products of the cross, are contained

    as a tetrad within a single ascus, enabling

    accurate analysis of a single meiotic event.

    The segregation of the mutant phenotype

    from markers specific to a given chromosome can be followed. Distribution of the

    mutant gene (x) and a given marker (m) to

    different chromosomes or to distant loca-

    tions on the same chromosome yields pre-

    dominantly random segregation of the two

    genes (X, M) within a tetrad, i.e. a tetratype

    with XM, Xm, xm and xM progeny. (Even

    though yeast chromosomes are small, the

    frequency of recombination is compara-

    tively high.) If the mutant gene (x) is linked

    to the marker (m), then tetrads of the parental ditype are predominant, i.e. Xm, Xm,

    xM, xM progeny within a single tetrad.

    This analysis is then repeated with strains

    containing markers at intervals scattered

    along all the 16 yeast chromosomes until

    linkage is observed.
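
    The tetrad classes described above can be scored mechanically. The sketch
    below classifies each ascus as parental ditype (PD), non-parental ditype
    (NPD), or tetratype (T), and then estimates a map distance with Perkins'
    formula. That formula, the helper names, and the tetrad counts are
    assumptions brought in for illustration; they do not come from this text.

```python
from collections import Counter

PARENTAL = {"Xm", "xM"}       # parental spore genotypes, as in the linked case above
RECOMBINANT = {"XM", "xm"}    # non-parental spore genotypes


def classify_tetrad(spores):
    """Return 'PD', 'NPD' or 'T' for the four spores of a single ascus."""
    kinds = set(spores)
    if len(spores) != 4:
        raise ValueError("a tetrad has exactly four spores")
    if kinds == PARENTAL:
        return "PD"
    if kinds == RECOMBINANT:
        return "NPD"
    if kinds == PARENTAL | RECOMBINANT:
        return "T"
    raise ValueError(f"unexpected spore genotypes: {sorted(kinds)}")


def map_distance_cM(tetrads):
    """Perkins' formula (assumed here): 50 * (T + 6 * NPD) / total tetrads."""
    classes = Counter(classify_tetrad(t) for t in tetrads)
    total = sum(classes.values())
    return 50.0 * (classes["T"] + 6 * classes["NPD"]) / total


# Hypothetical data set: 80 PD, 18 T and 2 NPD tetrads.
tetrads = (
    [["Xm", "Xm", "xM", "xM"]] * 80
    + [["XM", "Xm", "xm", "xM"]] * 18
    + [["XM", "XM", "xm", "xm"]] * 2
)
print(map_distance_cM(tetrads))
```

    A large excess of PD over NPD tetrads, as in this example, is the
    signature of linkage; for unlinked genes the two ditype classes are
    expected in roughly equal numbers.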

    For recessive mutations, the mapping

    process can be simplified by using strains

    carrying marked, unstable chromosomes.

    Loss of a specifically marked chromosome

    is induced by the cross to the mutant strain. A recessive mutant can exhibit its phenotype

    upon loss of the homologous chromosome,

    thereby permitting its chromosomal assign-

    ment. The location of the mutant gene along

    this chromosome can then be determined by

    the frequency of its recombination with

    known markers along the chromosome.

    Mitotic cross-over mapping, result-

    ing from reciprocal exchange of genes lo-

    cated distally to the cross-over point, is a

    rapid method to determine the arm of the

    chromosome on which the gene resides and

    can be performed in sectored colonies. The frequency of cosegregation of genes that are

    far apart on the same arm of the chromo-

    some is indicative of the localization of the

    mutated gene to a defined region of the

    chromosome. Fine mapping can be

    achieved by meiotic mapping (tetrad analy-

    sis) with markers known to reside in the vi-

    cinity of this chromosomal region.

    Caenorhabditis elegans The nematode C.

    elegans has six linkage groups, all of about the same size. There are two sexes:

    hermaphrodites and males. Hermaphrodites are

    morphologically similar to females, but

    make sperm as well as oocytes. They can

    fertilize their own eggs internally, or they

    can be fertilized by males. Hermaphrodites

    are XX, males are XO, and there are five

    pairs of autosomes. Genetic analysis in C.

    elegans is greatly aided by the possibility of

    storing frozen mutant stocks indefinitely in

    liquid nitrogen refrigerators.

    Genetics in C. elegans is somewhat

    unusual in having the possibility of examin-

    ing the self progeny of a single hermaphro-

    dite. This simplifies certain operations. In

    general, genetic mapping in C. elegans con-

    sists of constructing a hermaphrodite het-

    erozygous for mutations of interest, and then

    observing the self progeny of that hermaph-

    rodite for recombination between the muta-

    tions.

    To map a new mutation, first the linkage group containing the mutation is

    determined. This is done by determining its

    linkage to known marker mutations. First, a

    hermaphrodite is constructed that is het-

    erozygous for the mutation of interest and a

    morphological or behavioral mutation of

    known linkage. For example, a male carry-

    ing the new mutation may be mated to a

    marked hermaphrodite. The heterozygous

    hermaphrodite cross progeny are then al-

    lowed to self. The frequency with which the

    double homozygote is present among the

    self progeny reveals whether the two mutations are linked. If they are unlinked

    (meaning probably on different chromosomes), the

    frequency of the double homozygote is 1/16

    (1/4 of the animals homozygous for the

    marker mutation will also be homozygous

    for the unknown mutation). If the two muta-

    tions are linked, the frequency of the double

    homozygote is much lower. This test is car-

    ried out with markers for each of the six

    linkage groups until linkage is found.
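
    The 1/16 figure above follows from simple arithmetic, sketched below. The
    sketch assumes that the new mutation and the marker enter the heterozygote
    on different homologs (in trans) and that recombination is equally frequent
    in the sperm and oocyte lineages; both assumptions and the function names
    are illustrative, not taken from this text.

```python
def double_homozygote_fraction(p: float) -> float:
    """Fraction of self progeny homozygous for both mutations.

    p is the recombination frequency between the two mutations (0 to 0.5).
    A gamete carrying both mutations is one of the two recombinant types,
    so its frequency is p / 2; both sperm and oocyte must be of that type.
    """
    if not 0.0 <= p <= 0.5:
        raise ValueError("recombination frequency must lie between 0 and 0.5")
    return (p / 2.0) ** 2


def fraction_of_marker_homozygotes(p: float) -> float:
    """Share of marker homozygotes (1/4 of all progeny) that are also
    homozygous for the new mutation."""
    return double_homozygote_fraction(p) / 0.25


if __name__ == "__main__":
    for p in (0.5, 0.2, 0.05):
        print(p, double_homozygote_fraction(p), fraction_of_marker_homozygotes(p))
```

    With p = 0.5 the two functions return 1/16 and 1/4, matching the unlinked
    case in the text; with tighter linkage both values fall sharply.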

    Once the linkage group of the new mutation is known, its position on the

    linkage group is determined. In a three factor

    cross, segregation from a hermaphrodite

    carrying two known mutations on one chro-

    mosome and the unknown mutation on the

    homologous chromosome is analyzed.

    Animals carrying a chromosome recombi-

    nant for the known mutations are isolated,

    and the presence or absence of the unknown

    mutation on the recombinant chromosome is

    established. In this way, the location of the unknown mutation is determined to be to the

    left of, inside of, or to the right of the inter-

    val defined by the known mutations. If it

    lies inside the interval, its position within

    the interval can be determined from the ra-

    tios of genotypes among the recombinants.

    In a two factor cross, the recombi-

    nation distance between the new mutation

    and a known mutation is determined. This

    is done by analyzing the frequency of

    recombinants among the progeny of a hermaphrodite that is heterozygous for a

    cis-double mutant chromosome, that is, a

    chromosome bearing both the unknown

    mutation and a known mutation. The cis-

    double is conveniently obtained as a

    segregant from a three-factor cross.

    It is possible to determine the genetic

    map position of a cloned gene or other DNA

    segment by taking advantage of the C. ele-

    gans physical map. A set of overlapping

    cosmid and YAC clones is available covering

    the entire C. elegans genome. YAC grids are available consisting of a single

    nitrocellulose filter onto which DNA of a

    representative set of YAC clones has been

    spotted, in order, representing the six C. ele-

    gans chromosomes. The DNA fragment to

    be mapped is labelled and hybridized to this

    filter, and the subset of overlapping YACs

    to which it hybridizes reveals its genetic lo-

    cation. The physical position of the DNA

    may be further refined by locating its position

    on available cosmids. Its genetic function may be determined in a transgenic

    animal constructed by microinjection of the

    DNA.

    Drosophila melanogaster D. melanogaster

    has only 4 pairs of chromosomes: 1st (or

    X), 2nd, 3rd, and 4th. Determining where

    on a linkage group a gene maps is not usu-

    ally difficult. Crosses with known markers

    are employed and linkage or independent

    assortment observed among progeny. An unusual feature is the lack of meiotic

    recombination in males. In practice this simplifies

    genetic mapping, because one can

    breed a mutation only from the male parent

    and be certain no recombination has oc-

    curred, or from the female parent and be cer-

    tain all the recombination occurred in one

    generation.

    Successful freezing and thawing of

    Drosophila is only just being developed, and

    most mutations are maintained in continuous culture. Special chromosomes called

    Balancer chromosomes have been developed,

    which suppress recombination and chromo-

    some segregation such that the progeny are

    always genetically identical to their parents.

    One very important feature is the

    giant polytene chromosomes of the larval

    salivary gland cells. These are thousands of

    times larger than normal chromosomes and

    make it routine to see chromosome rear-

    rangements under the microscope. Labelled

    DNA probes can easily be hybridized to the

    polytene chromosomes and this allows determination of the position of a cloned

    sequence in the genome within a day.

    Mouse Gene mapping in the mouse may

    be carried out by the use of three different

    test populations. These are (1) conventional

    crosses, i.e., backcross (F1 x parent) or F2

    (F1 x F1) populations, (2) recombinant-

    inbred (RI) strains, or (3) interspecific

    backcrosses (ISB).

    If the gene has not been cloned and one must rely on a phenotype demonstrable

    only in protein gels, cells, or individual

    mice, mapping can be extremely tedious. If

    the phenotypic differences occur among

    mice of different inbred strains, particularly

    those involved in RI strains or ISB's (see

    below), then all three of the types of test

    populations may be usable for mapping pur-

    poses. If, however, the mutation is a newly

    detected one present only in progeny of the

    original mutant mouse, and if there are no hints of map location from existing

    experimental data, all options for mapping can be

    extremely costly in time and research funds.

    If highly specific DNA probes are

    available for the gene to be mapped, the first

    step is to seek a restriction enzyme that re-

    veals a RFLP (restriction fragment length

    polymorphism) in tests with genomic DNA

    from mice of various inbred strains. This

    RFLP should permit the use of one or more

    of the three approaches.

    If no RFLP can be identified, it is

    possible to analyze a set of clones of inter-

    specific hybrid cells. Progeny of the fusion

    of a hamster and a mouse cell begin with

    complete chromosomal complements from

    both parents, but they gradually lose most of

    the mouse chromosomes. There exist sets of

    clones derived from such hamster/mouse

    hybrids in which each clone retains only one

    or two mouse chromosomes. The probe will

    hybridize only with DNA from clones with

    the mouse chromosome bearing the gene to

    be mapped, and the gene is said to be syntenic with that chromosome. Note that the

    homologous hamster gene may also hybrid-

    ize with the probe, but it will usually pro-

    duce a restriction fragment of a different

    size from that of the mouse fragment. While

    synteny can usually be established in this

    way, it is only rarely possible to place the

    gene on a particular portion of the chromo-

    some by this method, e.g., when one of the

    clones contains a chromosome with a translocation.

    A more virtuosic method is physical

    mapping by hybridization of a radioactive

    probe to spreads of banded chromosomes.

    This procedure allows identification of the

    chromosome carrying the gene and gives a

    rough indication of its position on that chro-

    mosome. The procedure is much more

    difficult than with Drosophila, and less ac-

    curate, because mouse chromosomes are not

    polytene and have fewer bands.

    Conventional crosses.

    Backcross: P1 (AB/AB) x P2 (ab/ab) →

    F1 (AB/ab); F1 x P2 results in 4 phenotypic

    combinations, AB, Ab, aB and ab, in

    frequencies ranging from 1:1:1:1 (no linkage)

    to 2:0:0:2 (tight linkage); the recombination

    frequency is the percentage of

    mice with recombinant phenotypes (Ab

    and aB) in the total backcross population.

    F2: P1 (Ab/Ab) x P2 (aB/aB) → F1

    (Ab/aB); F1 x F1 results in the same

    four phenotypic combinations in frequencies ranging from 9:3:3:1 (no linkage) to

    8:4:4:0 (tight linkage); the recombination

    frequency is still a function of these ratios,

    but they must be converted into the re-

    combination frequency using mathemati-

    cal formulae.
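
    For the backcross case described above, the relation between the
    recombination frequency and the four phenotypic classes can be sketched as
    follows. The function names and the mouse counts are hypothetical; the
    calculation assumes the F1 (AB/ab) x P2 (ab/ab) cross given in the text.

```python
def backcross_expected(r: float) -> dict:
    """Expected phenotype frequencies among F1 (AB/ab) x P2 (ab/ab) progeny.

    r is the recombination frequency between the two genes (0 to 0.5).
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination frequency must lie between 0 and 0.5")
    parental = (1.0 - r) / 2.0      # each of AB and ab
    recombinant = r / 2.0           # each of Ab and aB
    return {"AB": parental, "Ab": recombinant, "aB": recombinant, "ab": parental}


def recombination_percent(counts: dict) -> float:
    """Percentage of backcross mice with a recombinant phenotype (Ab or aB)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mice were scored")
    return 100.0 * (counts["Ab"] + counts["aB"]) / total


# Hypothetical backcross population of 100 mice.
observed = {"AB": 45, "Ab": 5, "aB": 5, "ab": 45}
print(recombination_percent(observed))
```

    Setting r to 0.5 gives the 1:1:1:1 ratio of unlinked genes, and setting it
    to 0 gives the 2:0:0:2 ratio of tightly linked genes.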

    RI strains. Several sets of RI

    strains are available. Existing RI sets have

    been typed for an enormous number of al-

    lelic differences. Careful comparison of

    the strain distribution patterns (SDP)

    for the gene to be mapped with other known SDP's often produces a quite

    precise ordering of the gene with respect to

    nearby genes.

    ISB. In any given chromosomal

    segment, RFLP's and other types of DNA

    polymorphisms are more likely to occur be-

    tween individuals of different species than

    between individuals of the same species.

    Although interspecies hybrids are often sterile,

    hybrid females from crosses of the laboratory

    mouse (Mus musculus) and a related species (M. spretus) are fertile, and backcrosses

    of the hybrid to M. musculus males

    can readily be obtained in large numbers.

    There exist sets of genomic DNAs from

    each of >100 individual mice of such an ISB

    that have already been typed for many DNA

    polymorphisms. If it has not been possible

    to identify a suitable polymorphism among

    mice of different inbred strains, chances are

    good that one can be found between the two

    mouse species. Testing these DNAs with the new RFLP makes it possible to compare

    its segregation pattern among the ISB DNAs

    with those of other markers in essentially the

    same manner used for RI strains.

    Humans A major effort, the Human Ge-

    nome Project, was undertaken to obtain de-

    tailed physical and genetic maps and the

    complete nucleotide sequence of the human

    genome. Analysis and annotation of this

    sequence will eventually identify all of the estimated 50,000 human genes. Such an

    accomplishment will enhance investigators'

    ability to isolate distinct genes, particularly

    those in which mutations are responsible for

    human diseases. Many of the techniques

    described above for physical and genetic

    mapping in lower organisms are applicable

    to humans, with the obvious exception of

    experimental crosses. Traditionally, human

    genes have been cloned by isolating the en-

    coded protein and using this information to

    screen libraries with antibodies or oligonucleotide

    probes. When cloned nucleic acid probes are then available, standard

    approaches toward physical mapping may be

    carried out, including somatic cell hybrid

    analyses and more recently, in situ hybridi-

    zation techniques.

    Genetic mapping, as in other organ-

    isms, relies upon the frequency of recombi-

    nation between various genetic loci, i.e., ge-

    netic linkage analysis. The human genome

    comprises approximately 3000 centiMorgans (cM),

    where 1 cM is defined as the genetic length over which one observes

    recombination 1% of the time. Assuming a haploid

    genome of ~3 x 10^9 bp, 1 cM corresponds to

    approximately 1 million base pairs. A ge-

    netic linkage map allows one to clone genes

    by virtue of a distinct phenotype or trait re-

    sulting from a mutation, even if nothing at

    all is known about the protein encoded by

    the gene. As opposed to physical mapping,

    this latter approach requires only that the

    phenotype be linked to some polymorphic marker, a technique known as positional

    cloning.
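
    The correspondence between genetic and physical distance quoted above is a
    genome-wide average, and the arithmetic can be sketched as follows. The
    constant and function names are illustrative; the genome size and map
    length are the approximate figures given in the text.

```python
HUMAN_GENOME_BP = 3e9       # approximate haploid genome size from the text
HUMAN_MAP_LENGTH_CM = 3000  # approximate genetic map length from the text


def bp_per_centimorgan(genome_bp: float, map_length_cM: float) -> float:
    """Average number of base pairs per centiMorgan across a whole genome."""
    if map_length_cM <= 0:
        raise ValueError("map length must be positive")
    return genome_bp / map_length_cM


def average_physical_span_bp(interval_cM: float) -> float:
    """Average physical size of a genetic interval, ignoring local variation."""
    return interval_cM * bp_per_centimorgan(HUMAN_GENOME_BP, HUMAN_MAP_LENGTH_CM)


if __name__ == "__main__":
    print(bp_per_centimorgan(HUMAN_GENOME_BP, HUMAN_MAP_LENGTH_CM))
    print(average_physical_span_bp(5))  # an illustrative 5 cM interval
```

    Because the rate of exchange per nucleotide varies from region to region,
    the true physical size of any particular 1 cM interval can differ
    considerably from this average.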

    As in lower organisms, the creation

    of a useful genetic linkage map depends

    upon the existence of polymorphic loci dis-

    tributed throughout the genome. Histori-

    cally, the first polymorphisms which pro-

    vided a suitable approach for large scale ge-

    netic mapping in humans were based upon

    restriction fragment length polymorphisms

    (RFLPs). However, RFLPs are not found with sufficient frequency to saturate the

    human genome. More recently, other types of

    polymorphisms have become popular, in-

    cluding mini-satellite DNAs or variable

    number of tandem repeats (VNTRs), and

    micro-satellites, particularly "CA" repeats.

    Regions of DNA containing (CA)n, where

    the number of repeats (n) is highly polymor-

    phic, are dispersed throughout the genome.

    These show a high degree of heterozygosity

    and are inherited in typical Mendelian fash-

    ion. By identifying the sequences which

    flank various "CA" repeats, polymerase chain reaction (PCR) primers can be

    designed which amplify fragments of differing

    sizes, depending upon "n". The number of

    PCR primer pairs which uniquely amplify

    distinct "CA" repeats is constantly growing.

    Using these and other polymorphic loci dis-

    tributed throughout the genome, a highly

    detailed genetic linkage map of the human

    genome is being compiled. There are now

    several thousand such highly polymorphic

    loci which are distributed throughout the human genome, with markers spaced at less

    than 5 cM. Thus, finding tight linkage be-

    tween a phenotypic trait and some "CA" re-

    peat or other polymorphic locus becomes

    increasingly more probable.
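
    The dependence of PCR fragment size on the repeat number n, described
    above, is simple arithmetic and is sketched below. The flank lengths and
    repeat numbers are hypothetical values chosen for illustration; a real
    marker would use the flanking sequence actually spanned by its primer pair.

```python
def amplicon_length(left_flank_bp: int, right_flank_bp: int, n_repeats: int,
                    unit: str = "CA") -> int:
    """Length of the PCR product spanning a (CA)n repeat.

    The flank lengths include the primers and any unique sequence between
    each primer and the repeat tract.
    """
    if min(left_flank_bp, right_flank_bp, n_repeats) < 0:
        raise ValueError("lengths and repeat numbers cannot be negative")
    return left_flank_bp + len(unit) * n_repeats + right_flank_bp


def genotype_fragment_sizes(left_flank_bp, right_flank_bp, allele_repeats):
    """Sorted fragment sizes expected for an individual's two alleles."""
    return sorted(amplicon_length(left_flank_bp, right_flank_bp, n)
                  for n in allele_repeats)


# Hypothetical heterozygote carrying (CA)15 and (CA)21 alleles at one locus.
sizes = genotype_fragment_sizes(60, 45, (15, 21))
print(sizes)
```

    Alleles that differ by a single repeat unit differ by only 2 bp in
    fragment length, so fragments must be resolved at close to single-base
    resolution to distinguish them.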

    In addition to identifying polymor-

    phic loci, PCR primers that amplify distinct

    segments of genomic DNA also provide an

    approach to physical mapping and eventu-

    ally isolation of the gene of interest. Once a

    PCR primer pair is found which identifies a polymorphism that is tightly linked to the

    phenotype of interest, genomic DNA librar-

    ies can be screened using the same PCR

    primer pair. Clones which are identified by

    definition contain genomic DNA which is

    also tightly linked to the gene of interest.

    There are now several methods which per-

    mit isolation of genes or parts of genes from

    genomic DNA. Most prominent among

    these methods are conventional methods of

    screening cDNA libraries and newer methods such as cDNA selection by affinity

    hybridization and exon amplification. Each of

    the genes identified in this fashion would be

    considered a candidate for the disease locus

    being studied. Based upon the properties of

    various candidate genes, such as their pat-

    terns of expression, the nature of the en-

    coded proteins, or identifiable mutations,

    where the phenotype is some disease state,

    the gene of interest can be unambiguously

    identified.

    As the density of genetic and physical

    markers increases, maps which incorporate all types of markers (integrated maps)

    are emerging. These maps are facilitating

    isolation of genes for diseases which are in-

    herited in a simple Mendelian fashion.

    These maps are expected to help in identify-

    ing genes in complex human genetic dis-

    eases.

    The Physical Characteristics of Genomes

    Genomes Consist of DNA Molecules, and Vary Widely in Size

    The holy grail of classical geneticists was to

    understand the physical structure of a gene

    and how this structure allowed it to carry out

    its two functions: to determine the character-

    istics of the organism and to transmit those

    characteristics to the next generation. By

    the time the molecular structure of the ge-

    netic molecule, DNA, was determined by

    Watson and Crick in 1953, so much was known about the properties of genes from

    classical genetic studies that it was immedi-

    ately apparent from the DNA structure how

    in a general way these two functions were

    carried out: the information for the organism

    was present in the form of a code, and the

    information was replicated by base pair

    complementarity. Since that time many bi-

    ologists have concentrated their efforts on

    determining the precise code for particular

    organisms, and on understanding how the code is read out, implemented, and

    transmitted.

    The total information for an organ-

    ism is contained in its genome, comprising

    the nuclear and organellar (mitochondrial,

    chloroplast) chromosomes. Each nuclear

    chromosome consists of a single DNA

    molecule held within a protein scaffolding. There

    may be from one to over a hundred chromo-

    somes in the nucleus, depending on the or-

    ganism. The genomes of organisms that live

    independently range in size from approximately

    10^6 base pairs in bacteria to over 10^11

    base pairs in some amphibians. The

    genomes of "quasi" organisms, such as viruses,

    that utilize the cellular machinery of another

    organism to replicate, can be much smaller.

    Small viral genomes, such as those of retro-

    viruses, certain tumor viruses such as SV40,

    or bacteriophages such as φX174 are as

    small as 5,000 base pairs. Some biologists

    even view transposable elements as a kind

    of organism, termed "selfish DNA", that

    perpetuate their own existence within host genomes. Transposable elements may be as

    short as 1,000 bases and encode just a single

    gene.

    Bacterial Genomes Contain Some 4300

    Genes, Higher Organisms May Have As

    Many As 30,000 or More

    Because genes may be tightly

    packed, even overlapping, on the DNA

    molecule, or widely separated by "junk" sequences, the number of genes in the genome

    does not necessarily correlate with the

    amount of DNA. In general, genes are more

    densely packed in the genomes of prokaryotes

    than in those of eukaryotes. The bacteriophage

    λ genome, 50,000 base pairs,

    contains some 50 genes, or about one gene per

    1000 base pairs (kilobase pairs, or kb).

    The determination of the complete nucleotide

    sequence of the λ chromosome, in 1982,

    was a landmark achievement in the analysis of genome structure. The frequency of

    genes in the E. coli genome, which is estimated

    to contain 4,300 genes in 4.7 Mb of

    DNA, is somewhat lower.
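
    The gene densities compared above can be checked with a short calculation.
    The sketch below uses only the approximate genome sizes and gene numbers
    quoted in the text; the function name is illustrative.

```python
def kb_per_gene(genome_bp: int, gene_count: int) -> float:
    """Average spacing of genes, in kilobase pairs per gene."""
    if gene_count <= 0:
        raise ValueError("gene count must be positive")
    return genome_bp / gene_count / 1000.0


# Approximate figures quoted in the text.
phage_spacing = kb_per_gene(50_000, 50)          # bacteriophage genome
e_coli_spacing = kb_per_gene(4_700_000, 4_300)   # E. coli genome

if __name__ == "__main__":
    print(f"phage:   {phage_spacing:.2f} kb per gene")
    print(f"E. coli: {e_coli_spacing:.2f} kb per gene")
```

    On these figures the phage genome carries one gene per 1.00 kb and E. coli
    one gene per about 1.09 kb, which is the sense in which the E. coli gene
    frequency is "somewhat lower".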

    The number of genes in the genomes

    of higher organisms has been the subject of

    much debate and speculation. Prior to the

    direct sequencing of large genomes, two ap-

    proaches were taken to estimate the gene

    number. A genetic approach was to esti-

    mate the total number of genes from the fre-

    quency of lethal mutations obtained upon

    mutagenesis. Estimates obtained by this approach were too low for at least two reasons:

    many genes may encode non-essential prod-

    ucts, and many essential products may be

    redundantly encoded.

    A biochemical approach to deter-

    mine the number of genes measured the rate

    of annealing of mRNA to DNA. By this

    method it is possible to make an estimate of

    the complexity of the mixture. Complexity

    is defined as the total number of non-repeating

    sequences within a mixture of nucleic acids. By means of such kinetic

    measurements, it was estimated that mammalian

    genomes contained some tens of thousands

    of expressed genes.

    Genome Projects

    Current Methods Make the Sequencing of

    Whole Genomes Possible

    Until the mid-1990s, the genetic and physical

    study of the genetic make-up of organisms

    proceeded in a piecemeal fashion.

    Genes and genetic loci were studied one at a

    time, as they became relevant to a particular

    research topic or project. However, with the

    advent of cloning vectors that can contain

    much larger inserts of intact chromosomes

    and improved sequencing technologies, the

    sequencing of entire genomes has become

    feasible. In this approach, the complete

    DNA sequence of an organism is determined,

    and all of the genes are potentially identified

    by computer analysis of the sequence.

    By carrying out all of the cloning and sequencing

    at once in a unified project, a

    complete sequence is obtained

    much more rapidly, allowing investigators

    to concentrate on analyzing the integrated

    function of the genes in the life of the organism.

    Construction of Physical Maps: Overlap-

    ping Clones.

    The complete DNA sequence of genomes is

    obtained by the automated determination

    and analysis of vast quantities of DNA se-

    quence. However, before this can be done,

    the genome must first be obtained in frag-

    ments of sequencable length (a few hundred

    to a few thousand base pairs) whose rela-

    tionship to one another is known. For this

    purpose, a complete physical map is con-

    structed. This consists of overlapping

    cloned fragments of the genome, usually in cosmid, P1, BAC, or YAC vectors. As such

    a map is being constructed, overlapping

    groups of contiguous fragments, termed

    contigs, are built up. Contigs are progres-

    sively joined to each other as more and more

    fragments are mapped, until, when the

    physical map is completed, the number of

    contigs equals the number of chromosomes.
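
    The way contigs grow and merge can be illustrated with a small sketch.
    It assumes, purely for illustration, that each clone is already known
    as a (start, end) interval on one chromosome; in a real mapping project
    the overlaps are inferred experimentally and the coordinates are not
    known in advance. The function name and the clone coordinates are
    hypothetical.

```python
def build_contigs(clones):
    """Merge overlapping (start, end) clone intervals into contigs."""
    contigs = []
    for start, end in sorted(clones):
        if contigs and start <= contigs[-1][1]:
            # This clone overlaps the current contig, so extend it.
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            # No overlap with anything mapped so far: start a new contig.
            contigs.append([start, end])
    return [tuple(c) for c in contigs]

# Hypothetical clone coordinates (in kb) on a single chromosome.
early_map = [(0, 40), (35, 80), (120, 160), (150, 200)]
later_map = early_map + [(70, 130)]  # one more clone bridges the gap

print(build_contigs(early_map))
print(build_contigs(later_map))
```

    With the first four clones there are two contigs; the bridging clone
    joins them into one, which is the end state described above for a
    single chromosome.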

    The Physical Map Is Correlated with the Genetic Map

    The availability of complete genomic se-

    quences allows correlation of the physical

    sequence with genetic markers and the use

    of mutants to understand the function of the

    sequence. Physical and genetic maps are

    correlated in several ways. Physical se-

    quences that differ between strains (or in

    lineages, e.g. of humans), resulting in

    "restriction fragment length polymorphisms"

    (RFLPs), are genetically

    mapped in the same way that any other genetic

    difference is mapped. (An RFLP results

    whenever a particular, detectable (e.g.

    by hybridization to a probe) restriction

    fragment differs in size between two organ-

    isms. This can come about because one or

    both of the restriction sites that define the

    fragment are mutated, because a new restric-

    tion site arises between them, because DNA

    has been deleted or inserted between the two

    sites, or because some other rearrangement

    has separated them.) Cloned DNA fragments may also be mapped to chromosomes

    and parts of chromosomes by in situ hy-

    bridization techniques. This approach is

    particularly powerful in Drosophila, where

    the genetic map is already correlated in de-

    tail with the polytene chromosome banding

    pattern. A third approach is the identifica-

    tion of functional genes on cloned DNA

    fragments by the complementation of known

    mutations (complementation rescue). This

    approach is particularly powerful in organisms that are easily transformed, such as

    yeast and C. elegans.
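
    The origin of an RFLP, as described earlier in this section, can be
    sketched in a few lines. The example assumes the enzyme EcoRI (site
    GAATTC, cut after the first G) and two made-up alleles that differ by a
    single base inside one restriction site; the sequences and the function
    name are hypothetical and chosen only to keep the arithmetic easy.

```python
def fragment_lengths(seq, site="GAATTC", cut_offset=1):
    """Return the fragment lengths produced by cutting seq at every site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

# Two hypothetical alleles; allele_b has a point mutation in the second site.
allele_a = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 15
allele_b = "A" * 10 + "GAATTC" + "C" * 20 + "GAGTTC" + "T" * 15

print(fragment_lengths(allele_a))
print(fragment_lengths(allele_b))
```

    Loss of one site fuses two fragments into a single longer one, and it
    is this size difference that is followed as a genetic marker.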

    Eukaryotic Genomes Contain a Large

    Amount of Repetitive DNA.

    In spite of the fact that eukaryotic genomes

    may have more genes than originally

    thought, it remains true that these genomes

    contain a great deal of non-coding sequence.

    Some of this "extra DNA" appears simply to

    be non-functional unique sequence.

    Unique sequence is DNA sequence that oc-

    curs only once in the haploid genome.

    Some extra DNA is accounted for by in-

    trons. Other sequences make up distinct

    classes of repetitive DNA. Repetitive DNA

    is DNA sequence present more than once

    per haploid genome. Repetitive DNA can

    make up anywhere from a small fraction to a

    majority of the genomic DNA of eukaryotic

    organisms. Typically it represents some

    20% to 50%.

    The first evidence that genomes con-

    tained DNA apart from unique sequences

    came from analysis of reannealing kinetics.

    When genomic DNA was denatured (e.g. by

    heating) to cause the strands to separate, and

    then allowed to reanneal to the double-

    stranded form (at a lower temperature), the

    rate of reannealing was not consistent with a

    single kinetic component.

    When double-stranded DNA is denatured

    and renatured, the rate of reannealing,

    like the rate of other bimolecular reactions,

    is dependent on concentration. In the case

    of double-stranded DNA, the relevant concentration

    is the concentration of similar or

    identical DNA sequences, since only these

    can interact to anneal. The concentration of

    a pair of similar sequences in a mixture of

    nucleic acids depends on the complexity of

    the mixture, that is, the number of different

    sequences in the mixture. When the kinetics

    of renaturation of eukaryotic genomic DNA was measured, it was found that much of the

    DNA reannealed at a rate higher than ex-

    pected for unique sequences. This indicated

    that these sequences were repeated within

    the genome. In fact, there were several ki-

    netic components, indicating sequences pre-

    sent from 10 times to millions of times in

    the genome. This kind of kinetic analysis is

    called Cot analysis, because the kinetic data

    were typically presented in a plot of percent

    DNA annealed versus the product of the

    DNA concentration (Co) and time of annealing

    (t).
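
    A rough numerical sketch of a Cot curve follows. It assumes ideal
    second-order kinetics, in which the fraction of a component that has
    reannealed is Cot / (Cot + Cot-half), where Cot-half is the Cot value
    at which that component is half reannealed. The three components and
    their Cot-half values are hypothetical, chosen only to show how a more
    highly repeated component reaches its half-point at a lower Cot.

```python
def fraction_reannealed(cot, cot_half):
    """Fraction of one kinetic component that is double-stranded at a given Cot."""
    return cot / (cot + cot_half)

def genome_curve(cot, components):
    """Sum the contribution of every (genome_fraction, cot_half) component."""
    return sum(frac * fraction_reannealed(cot, half) for frac, half in components)

# Hypothetical genome: (fraction of the DNA, Cot-half of that component).
components = [
    (0.10, 1e-3),  # highly repeated sequences, anneal first
    (0.30, 1e0),   # middle repetitive sequences
    (0.60, 1e3),   # unique sequences, anneal last
]

for cot in (1e-4, 1e-2, 1e0, 1e2, 1e4):
    print(f"Cot = {cot:g}: {100 * genome_curve(cot, components):.1f}% reannealed")
```

    Plotted against the logarithm of Cot, such a mixture gives a stepped
    curve rather than the single sigmoid expected for unique sequence
    alone.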

    There Are Several Kinds of Repeated Se-

    quences

    The fastest kinetic component in a

    Cot analysis of eukaryotic DNA typically

    annealed essentially instantaneously, and in

    a concentration-independent manner. This

    component consists of inverted repeats.

    These are similar sequences joined close

    together and in inverted orientation, so that

    they reanneal in a so-called "snap-back" or

    "foldback" reaction. Inverted repeats are

    often members of other repetitive sequence

    families elsewhere present as isolated re-

    peats.

    The second fastest kinetic compo-

    nent consists of sequences present millions

    of times in the genome. These are simple

    sequences consisting of long stretches of a short repeat, such as ...ATATATATAT...

    (from crab) or ...AAGAGAAGAG... (from

    Drosophila). Such sequences are also known

    as satellite sequences. This stems from

    their behavior during density analysis of

    DNA. When the density of eukaryotic DNA

    is analyzed by buoyant density centrifuga-

    tion in a CsCl density gradient, it is found to

    have several components of different den-

    sity. The gradient profile consists of main

    band DNA, containing the unique sequences, including most of the genes, and

    satellite bands, so-called because they lie

    alongside the main band on the profile. The

    anomalous, repetitive structure of the simple

    sequence DNA accounts for its variant

    buoyant density. In some eukaryotic ge-

    nomes there is little or no satellite or simple-

    sequence DNA, whereas in others such se-

    quences may make up over 50% of the total.

    The function of satellite DNA is not known.

    Speculation focuses on a possible role during pairing of homologous chromosomes.

    The next slowest kinetic component,

    lying between the satellite sequences and the

    unique sequences in rate of annealing, con-

    sists of the so-called middle repetitive se-

    quences. There are a great variety of such

    sequences. Some are genes present in mul-

    tiple copies in the genome. Genes for com-

    mon cellular components such as ribosomal

    RNA or histone proteins are often present in

    multiple copies. The multiple copies may

    be dispersed in the genome, or may be present

    in tandem arrays at a single locus.

    Non-functional, corrupted (mutated)

    copies of genes, called pseudogenes, make

    up another component of the middle repeti-

    tive DNA. These sequences may have

    arisen in a duplication event, or by reverse

    transcription of an RNA copy of the gene,

    followed by insertion of the DNA copy into

    the genome. DNA copies of mRNAs,

    known as processed pseudogenes, are char-

    acterized by the presence of polyA tails and

    absence of introns. This indicates their origin from reverse transcription of cellular

    mRNA, followed by insertion of the DNA

    copy into the genome. Between 5% and

    10% of the human genome is made up of a

    large pseudogene family known as the Alu

    family (named after the restriction enzyme,

    AluI, that was first used to identify it).

    These 300 bp repeats, present hundreds of

    thousands of times in the genome, probably

    originated as DNA copies of the short cellular

    RNA known as 7SL RNA. 7SL RNA

    functions normally as a component of the

    cellular mechanism that translocates newly

    synthesized proteins across membranes of

    the rough endoplasmic reticulum. Short re-

    peats such as the Alu repeats have been

    dubbed SINES, for "short, interspersed se-

    quences".

    Other families of middle repetitive

    sequences consist of transposable elements.

    These are present in all genomes and have a

    great variety of structures and modes of transposition. They make up the LINES, or

    "long interspersed sequences", in mammal-

    ian genomes, and are in some cases related

    to the genomes of retroviruses. Endogenous

    retroviral genomes themselves are another

    component of the middle repetitive DNA of

    mammals. Transposable elements make up

    some 20% of the Drosophila genome.

    Finally, there is a class of middle

    repetitive sequences that so far have eluded

    explanation. These sequences typically consist of a few hundred base pairs, interspersed

    among other sequences around the genome.

    They have been given the general name in-

    terspersed repeats. They make up families

    of anywhere from a few to hundreds of

    thousands of members, and there are typi-

    cally hundreds to thousands of families in

    eukaryotic genomes. They usually account

    for a large proportion of the middle repeti-

    tive DNA. In spite of their prevalence and

    ubiquity, the origin and function of these

    interspersed repeats remains a mystery.

    While they certainly have an origin, the suspicion is that they have no function. They

    are the ultimate junk DNA.

    Maintenance and transmission of the ge-

    netic material

    Special Sequences Control the Replication

    and Transmission of the Genetic Material

    Most organisms use DNA as their genetic

    material. The exceptions are some viruses

    that use RNA. The symmetry of DNA permits

    replication by polymerases to create

    two exact copies of the genetic material.

    One mechanism of replication involves ini-

    tiation of synthesis at a single point, the ori-

    gin of replication, and replication to com-

    pletion. Many bacteria, plasmids and vi-

    ruses replicate in this fashion. Another

    mechanism involves initiation of DNA

    synthesis at many points on the genome and

    synthesis until the replication forks meet.

    There may or may not be origins of replication

    that are used during every round of replication.

    Eukaryotes use multiple origins on

    a single DNA molecule. Also eukaryotes

    have linear genomes which require ends

    with special structures, called telomeres,

    both for protection of the DNA, and to per-

    mit the end to be correctly replicated.

    Telomeres have a unique physical structure

    that includes multiple short DNA repeats

    with nicks and a capping hairpin structure.

    Once a genome has been replicated,

    each copy must be accurately partitioned

    into the two daughter cells. For the bacterial

    circular genome and for some plasmids this

    is accomplished by having a partition se-

    quence in the DNA near the origin of repli-

    cation. These sequences attach to regions of

    the cell wall that grow apart during cell divi-

    sion, dragging the two newly replicated ge-

    nomes apart. For some plasmids and for the

    plasmid-like DNA of mitochondria and

    chloroplasts the genome is maintained in multiple copies and the cell depends at least

    partly on statistics to ensure that each

    daughter cell or organelle gets at least one

    copy of the genome. Other mechanisms

    then ensure amplification of the genome.
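
    The reliance on "statistics" can be made concrete. The sketch below
    assumes the simplest possible model, which the text does not spell out:
    each copy goes independently to one daughter or the other with equal
    probability. Under that assumption the chance that one of the two
    daughters receives no copy at all falls off rapidly with copy number.
    The function name is our own.

```python
def prob_some_daughter_empty(copies):
    """Probability that at least one of two daughters receives no copy,
    assuming each copy segregates independently with probability 1/2."""
    if copies == 0:
        return 1.0
    # Either daughter A gets nothing or daughter B gets nothing; for
    # copies >= 1 these two outcomes cannot both happen.
    return 2 * 0.5 ** copies

for n in (1, 2, 5, 10, 20, 50):
    print(f"{n:2d} copies: P(loss) = {prob_some_daughter_empty(n):.3g}")
```

    With a single copy, loss from one daughter is certain; with a few
    dozen copies it becomes very rare, which is why a high copy number can
    partly substitute for an active partition mechanism.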

    Eukaryotes generally have their genomes

    distributed on several chromosomes

    and thus have special problems in assuring

    that each daughter cell gets exactly the right

    set of chromosomes after replication. A

    special structure, the centromere, and attached cytoskeletal machinery, the mitotic

    apparatus (mitotic spindle), ensure accu-

    rate segregation of chromosomes. During

    meiosis, in which a diploid cell undergoes

    reductive divisions to yield haploid cells,

    synapsis, or pairing of homologous chromo-

    somes, and a unique meiotic apparatus are

    required to ensure that haploid gametes get

    exactly one of each chromosome.

    Enzymatic Mechanisms Repair DNA Dam-

    age and Recombine the DNA Strands

    As the genetic material, DNA is pre-

    cious and must be protected from damage.

    Ultraviolet light, ionizing radiation, and

    DNA modifying chemicals can damage

    DNA. Many mechanisms exist to repair

    damage that occurs. Excision repair path-

    ways exploit the fact that two copies of ge-

    netic information are stored in the two

    strands of DNA. Damaged bases can be re-

    moved on one strand and then recopied from

    the other. Recombinational repair mecha-

    nisms work by shuffling damaged and un-

    damaged segments that are present in more

    than one copy in the cell to try to put to-

    gether a 'good' genome.

    Even in the absence of detectable

    DNA damage DNA sequences may 'recom-

    bine'. Homologous recombination is at the

    heart of both classical genetics and modern

    "gene-targeting". The mechanism of such

    recombination or cross-over events is controversial and probably varies according to

    the organism, but involves breaks in DNA,

    unwinding of strands, hybridization to ho-

    mologous segments of DNA and new DNA

    synthesis and endonuclease strand cleavage.

    The net result is equivalent to a physical

    cleavage of DNA and rejoining to a different

    partner. Some transposons and viruses cata-

    lyze recombination events that involve spe-

    cific DNA sequences that may not be ho-

    mologous (or only for a few bases).

    Recombinant DNA and the Construction

    of Transgenic Organisms

    Genes May Be Amplified in Pure Form by

    "Cloning" Them in Microorganisms.

    Early genetics was dependent on naturally

    occurring mechanisms for the study of ge-

    netic function. In the 1970's techniques

    were developed to manipulate DNA in vitro and move it across species boundaries.

    These cloning techniques rely on enzymes

    that work on DNA. Restriction endonucle-

    ases (commonly called restriction en-

    zymes) cut DNA at specific sequences, of-

    ten palindromic sequences. (A "palin-

    drome" is a word or sentence that reads the

    same forwards or backwards, like "A man, a

    plan, a canal, Panama.", or "Madam, I'm

    Adam.".) For example, the restriction enzyme

    BamHI cuts at GGATCC. BamHI is

    called a 6-cutter because its recognition sequence

    is six bases long. On average one

    expects a specific six-base sequence like

    GGATCC to occur once every 4 kb of DNA,

    but of course some fragments are much big-

    ger or smaller. Furthermore, the average

    size depends on the GC/AT content of the

    DNA being cut, and the relative numbers of

    G or C vs A or T nucleotides in the restric-

    tion site. The size of DNA fragments can be

    determined on agarose gels. About 150 re-

    striction enzymes with different recognition

    sequences are available commercially. The position of restriction sites in a piece of

    DNA can be determined, giving a restric-

    tion map useful for subsequent manipula-

    tions. Fragments of DNA can be joined to

    one another by another enzyme, DNA li-

    gase. Together the ability to cut DNA,

    separate fragments by size, and then rejoin

    them in a new combination in vitro, forms

    the basis for the powerful cloning technolo-

    gies.
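
    Two points made above about restriction sites lend themselves to a
    quick calculation. For DNA, "palindromic" means that the site reads
    the same on both strands, that is, it equals its own reverse
    complement. The expected spacing of a site follows from the
    probability of finding each of its bases in turn. The sketch assumes
    that bases occur independently and that G and C (and likewise A and T)
    are equally frequent; the function names are our own.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic_site(site):
    """True if the site reads the same on both strands."""
    return site == reverse_complement(site)

def expected_spacing(site, gc_content=0.5):
    """Average distance in bp between occurrences of site in random DNA."""
    p_gc = gc_content / 2        # probability of a G, or of a C
    p_at = (1 - gc_content) / 2  # probability of an A, or of a T
    prob = 1.0
    for base in site:
        prob *= p_gc if base in "GC" else p_at
    return 1 / prob

print(is_palindromic_site("GGATCC"))
print(expected_spacing("GGATCC"))
print(round(expected_spacing("GGATCC", gc_content=0.4)))
```

    At 50% GC the spacing is 4 to the sixth power, or 4096 bp, the "once
    every 4 kb" quoted above. Because GGATCC contains four G or C
    positions, it is expected to occur less often in AT-rich DNA.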

    Though new DNA molecules can be made in vitro, the yield is usually low.

    However, by cloning DNA into a vector

    capable of replication, the recombinant

    DNA can be amplified in vivo. Further-

    more, by placing the recombinant DNA into

    a microorganism, a single defined segment

    of a large genome can be separated from the

    remainder of the genome simply by select-

    ing a clone of organisms, like a bacterial

    colony or a phage plaque. This is the origin

    of the term "cloning". Depending on the vector being used, a variety of methods are

    then available for separating the vector plus

    insert from the host microorganism's own

    DNA.

    Common sources of vector DNA are

    viruses and plasmids that are capable of replication

    in E. coli. E. coli is a useful host for

    amplifying DNA since it is easy to grow to

    high density (2 x 10^9 cells/ml) and has relatively

    little DNA of its own. Some virus

    vectors (e.g. the filamentous phage M13)

    only infect certain strains of E. coli. Vectors

    such as yeast YACs and mammalian retrovi-

    ral expression vectors are shuttle vectors

    that can replicate in both E. coli and eukaryotic

    cells. Often bulk recombinant DNA is

    made in E. coli and an experiment is done in

    another organism.

    An example of vector cloning follows.

    The bacteriophage lambda has a genome of

    about 50kb. A region containing about 1/3

    of the genome serves no function (the so-

    called financial district) and can be replaced

    by other DNA including E. coli or foreign DNA. These clones can replicate just like

    the original bacterial virus, but now when-

    ever they duplicate they also duplicate the

    inserted DNA.

    For the most part DNA is DNA and

    can be moved from organism to organism

    without problems. However there are three

    common problems in transferring DNA that

    we will discuss briefly. 1) DNA can be

    modified in ways that affect function. 2)

    DNA can contain sequences that get rearranged in some organisms. 3) A cloned

    sequence may make a protein toxic to some

    cells.

    DNA modifications are common and

    include sequence specific methylation of

    bases. These methylations can affect gene

    function and resistance to digestion with

    specific restriction enzymes. Strains of E.

    coli that lack many of the offending DNA

    methylases (e.g. mcrA) have been constructed.

    Also, some strains of E. coli make

    restriction enzymes that destroy unmodified

    DNA. Take care! Often a specific cloning

    project requires a specific host strain that

    modifies or does not modify DNA (see the

    New England Biolabs catalog or Molecular

    Cloning for details).

    E. coli does not like DNA containing

    short direct repeats or with inverted repeats,

    both of which tend to get deleted from

    cloned fragments by the very active E. coli

    recombination pathway. This can be a problem when cloning eukaryotic DNA in which

    such structures are common. E. coli host

    strains that are defective in recombinases

    (like recA) are helpful but do not completely

    solve the problem. E. coli vectors do not

    tolerate more than about 20kb of DNA.

    Yeast artificial chromosomes (YACs) are

    useful for cloning up to 400kb of DNA. As

    the name implies YACs are grown in yeast.

    Eukaryotic cells such as yeast are tolerant of

    repeated DNA, and hence repetitive sequences

    that cannot be cloned in E. coli can

    often be cloned in a YAC.

    Cloned DNA may express proteins

    that kill a specific host. For example, even

    though eukaryotic promoters and introns do

    not function in E. coli, often a polypeptide

    derived from one exon will be expressed in

    E. coli. E. coli is especially sensitive to hy-

    drophobic proteins that interfere with secre-

    tion (a secA strain may tolerate such clones)

    and to DNA binding proteins.

    The Polymerase Chain Reaction (PCR) Is a Way to "Clone" DNA Directly In Vitro

    Instead of amplifying a defined DNA seg-

    ment by ligating it to a vector and introduc-

    ing it into a microorganism, it is possible to

    amplify it enzymatically by the polymerase

    chain reaction (PCR). In PCR, the DNA

    segment between two short (15 to 30

    nucleotides long) single-stranded

    oligonucleotide primers is copied by a

    primer-dependent DNA polymerase. The

    polymerase used is from a thermophilic

    bacterium. This makes it possible to carry

    out many cycles of synthesis automatically

    by alternately heating the reaction mixture

    to melt all DNA strands (the polymerase is

    not inactivated by the high temperature re-

    quired for this), and then cooling it to allow

    the primers to anneal and the polymerase to

    function by extending them. In each succes-

    sive cycle of melting and replication, the

    amount of the DNA segment between the

    two primers increases exponentially, as the

    product of each synthetic round serves as

    template in the next.
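
    The exponential increase can be put into numbers. In the ideal case
    every template is copied once per cycle, so the amount of product
    doubles each round. The sketch below also allows a per-cycle
    efficiency below 1, which is an assumption added here for
    illustration; the function name and the efficiency value of 0.8 are
    hypothetical.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Copies of the target after a number of PCR cycles.

    efficiency is the fraction of templates copied in each cycle:
    1.0 is perfect doubling, lower values model a less ideal reaction.
    """
    return start_copies * (1 + efficiency) ** cycles

for n in (10, 20, 30):
    ideal = copies_after(n)
    lossy = copies_after(n, efficiency=0.8)
    print(f"{n} cycles: ideal {ideal:.3g} copies, 80% efficiency {lossy:.3g} copies")
```

    Thirty ideal cycles turn one template molecule into about a billion
    copies, which is why very small amounts of starting DNA are enough.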

    PCR can be extremely specific and

    sensitive. Specificity is provided if each of

    the primers anneals to only the single, intended

    sequence. In 30 cycles of polymerization,

    biochemically detectable and useful

    quantities of a sequence from 50 to 5000

    bases in length can be amplified from tiny

    amounts of complex mixtures, such as the

    genomic DNA of vertebrates. The synthetic product can subsequently be sequenced,

    used as a labeled probe, or cloned for further

    in vitro modification.

    Genes Are Cloned by Isolating Them from

    Clone Libraries or Clone Banks

    The fundamental importance of gene clon-

    ing is that it allows the purification of a sin-

    gle gene out of the thousands or tens of

    thousands present in the genomes of complex organisms. To accomplish this feat, it

    is first necessary to introduce all the genes

    of the organism under study into a culture of

    microorganisms. The task is then to identify

    a clone of the microorganism that contains

    the single gene of interest. The mixed cul-

    ture of microorganisms is termed a clone

    library orclo