1 genomics notes

Upload: parisha-singh

Post on 03-Jun-2018


  • 8/12/2019 1 Genomics Notes

    Genomes

    A genome is the completely (or almost completely) determined DNA sequence of the genetic material (chromosomes as well as any plasmids, mitochondrial DNA, etc.) of an organism.

    The word is somewhat of a misnomer: a genome is not the same as 'all genes'; it is rather the 'sequence of all DNA', in which all the genes can be found.

    The first genome of a free-living organism (viruses aside) was that of Haemophilus influenzae, published in 1995 (Fleischmann et al., Science (1995) vol 269, pp 496-512).

    Why are complete genomes interesting?

    The most basic answer to that question is that we want to know the complete set of genes that an organism has.

    The genome of an organism is, in a certain sense, the blueprint for that organism.

    Many observations and experiments in biology involve mutants and mutations, and knowing the complete set of genes for an organism can help with the analysis.

    For example, we may want to be sure that a knocked-out gene does not have a backup copy somewhere in the genome.

    Knowing the complete genome of an organism is only the first step in the complete mapping of the constituents and processes of the organism.

    The complete genome is a necessary (but not sufficient) requirement for understanding an organism.

    And yet another answer to the question is emerging: the availability of more and more complete genomes allows entirely new kinds of comparisons to be made between organisms.

    New types of analysis can be applied to old questions in biology, involving problems in evolutionary history and the interactions between species.

    Moreover, the cost and effort required to sequence a genome, especially a bacterial genome, are rapidly diminishing as new and improved technologies and tools are developed.

    We will soon find ourselves in a situation where the availability of a genome is considered a basic requirement for working on a specific organism.

    How many genomes have been completed?

    TIGR Microbial Database: published microbial genomes and projects in progress. The lists are maintained by TIGR, The Institute for Genomic Research. This institute was founded by Craig Venter, and was the first to sequence the complete genome of a bacterium, Haemophilus influenzae, in 1995.

    EBI Completed Genomes web site: links and resources. Contains sequence data for organelles (mitochondria), phages and viruses.

    Genome Monitoring Table, by Stephan Beck and Peter Sterk at EBI.

    NCBI Genomic Biology web site, with links and search resources.

    GOLD (Genomes OnLine Database), maintained by the company Integrated Genomics, which sells annotation services to companies that have in-house genome projects (primarily bacterial genomes).

    Organism | Type | Genome size (Mb) | Number of genes | Year | Links | Comment
    Haemophilus influenzae | Bacterial | 1.83 | 1850 | 1995 | Haemophilus influenzae page at TIGR | The first genome of a free-living organism.
    Escherichia coli | Bacterial | 4.64 | 4289 | 1997 | E. coli Genome Project, University of Wisconsin-Madison | The most studied bacterium.
    Rickettsia prowazekii | Bacterial | 1.11 | 834 | 1998 | | The first genome to be sequenced in Sweden (Siv Andersson, Uppsala).
    Methanococcus jannaschii | Archaeal | 1.66 | 1750 | 1996 | Methanococcus jannaschii page at TIGR | The first sequenced archaeon.
    Saccharomyces cerevisiae | Eukaryote | 12.1 | 6294 | 1997 | SGD, MIPS yeast DB | The first sequenced eukaryote.
    Caenorhabditis elegans | Eukaryote, nematode | 97 | 18,424 | 1998 | WormBase, C. elegans Genome Project | The first sequenced multicellular organism.
    Drosophila melanogaster | Eukaryote, insect | 137 | 13,601 | 2000 | BDGP, FlyBase | Celera Corp; publicly available.
    Arabidopsis thaliana | Eukaryote, plant | 125 | 25,498 | 2000 | The Arabidopsis Information Resource | The first plant.
    Homo sapiens | Eukaryote, primate | 3,000 | 50,000 (?) | | HGP at Sanger, HGP at Oak Ridge, Ensembl | Rough draft exists. Not yet finished, except for chromosomes 21 and 22.
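    The genome sizes and gene counts in the table can be combined into a quick worked calculation of gene density (genes per megabase). The sketch below only re-uses the table's own figures; the human gene count is the table's uncertain estimate, so that row is indicative only.

```python
# Genome size (Mb) and number of genes, copied from the table above.
GENOMES = {
    "Haemophilus influenzae": (1.83, 1850),
    "Escherichia coli": (4.64, 4289),
    "Rickettsia prowazekii": (1.11, 834),
    "Methanococcus jannaschii": (1.66, 1750),
    "Saccharomyces cerevisiae": (12.1, 6294),
    "Caenorhabditis elegans": (97.0, 18424),
    "Drosophila melanogaster": (137.0, 13601),
    "Arabidopsis thaliana": (125.0, 25498),
    "Homo sapiens": (3000.0, 50000),  # gene count marked uncertain in the table
}


def gene_density(size_mb: float, n_genes: int) -> float:
    """Return the number of genes per megabase of genome sequence."""
    return n_genes / size_mb


def main() -> None:
    # Print organisms from the most to the least gene-dense genome.
    ranked = sorted(GENOMES.items(), key=lambda kv: gene_density(*kv[1]), reverse=True)
    for name, (size_mb, n_genes) in ranked:
        print(f"{name:26s} {gene_density(size_mb, n_genes):7.1f} genes/Mb")


if __name__ == "__main__":
    main()
```

    With these figures the bacteria and the archaeon come out at roughly 750 to 1,050 genes per Mb, yeast at about 520, and the human estimate at about 17. The high gene density of prokaryotic genomes is one reason why simple ORF-based gene prediction works comparatively well for them.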

    The analysis of a genome

    Define the location of genes (coding sequences, regulatory regions): gene prediction (identification).

    o Ab initio gene prediction, using software based on rules and patterns: find open reading frames (ORFs), with additional criteria for a good start sequence for a gene. This is considered reasonably easy for bacteria, but is very difficult for eukaryotes, whose genes are split by introns and separated by long non-coding regions.

    o Gene identification through alignment with known proteins and EST sequences (Expressed Sequence Tags: short sequence reads from cDNA copies of mRNA).

    o Gene prediction through similarity with proteins or ESTs in other organisms.

    o Gene prediction through comparison with other genomes: regions conserved between related species are probably coding or regulatory regions. The comparison is most informative where gene order is conserved between the genomes (synteny), and this approach is very promising for the analysis of higher eukaryote genomes.
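    The ORF-finding step of ab initio prediction can be sketched as a scan of all six reading frames (three per strand) for an ATG followed in frame by a stop codon. This is a minimal illustration, assuming a plain A/C/G/T sequence, the standard-code stop codons, ATG as the only start codon (bacteria also use GTG and TTG), and a hypothetical minimum length of 100 codons; real gene finders add statistical models of coding sequence and start-site signals on top of this.

```python
from collections.abc import Iterator

# Standard genetic code; ATG-only starts is a simplifying assumption.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string (other characters are left as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def _orfs_one_strand(seq: str, min_codons: int) -> Iterator[tuple[int, int]]:
    """Yield (start, end) for ATG...stop ORFs in the three frames of one strand.

    `end` is exclusive and includes the stop codon. The first in-frame ATG is
    taken as the start, so each ORF is reported at its longest extent.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i
            elif codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, strand) ORFs on both strands, in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    orfs = [(s, e, "+") for s, e in _orfs_one_strand(seq, min_codons)]
    for s, e in _orfs_one_strand(reverse_complement(seq), min_codons):
        # Half-open interval [s, e) on the reverse strand is [n - e, n - s) on the forward strand.
        orfs.append((n - e, n - s, "-"))
    return sorted(orfs)
```

    For example, `find_orfs("ATGAAATAG", min_codons=3)` reports a single three-codon ORF on the forward strand. The minimum length matters: short random ORFs are common in any sequence, so a length cut-off is one crude way to keep false positives down.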

    Annotation of the genes: compare with genes/proteins of known function in other organisms. This essentially amounts to labelling the gene with a (putative) function.

    Functional classification: broad groups of functional characterization, such as 'ribosomal proteins', 'nucleotide metabolism' and 'signal transduction'.
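    Annotation by similarity and the subsequent functional classification can be sketched as "transfer the label of the most similar known protein, then count genes per category". In this sketch the reference set is a hypothetical toy, and `difflib.SequenceMatcher` is used only as a crude stand-in for a proper alignment score (such as one from BLAST or a Smith-Waterman alignment); the 0.6 cut-off is likewise an arbitrary assumption.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical toy reference set: protein sequence -> (function, functional category).
# A real pipeline would search a curated database with an alignment tool instead.
REFERENCE = {
    "MKKLLAAGG": ("toy ribosomal protein", "ribosomal proteins"),
    "WWYYFFHHP": ("toy nucleotide kinase", "nucleotide metabolism"),
}
UNKNOWN = ("hypothetical protein", "unclassified")


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def annotate(query: str, min_similarity: float = 0.6) -> tuple[str, str]:
    """Transfer the label of the most similar reference, or return UNKNOWN if none is close enough."""
    best = max(REFERENCE, key=lambda ref: similarity(query, ref))
    if similarity(query, best) < min_similarity:
        return UNKNOWN
    return REFERENCE[best]


def classify(proteins: dict[str, str]) -> Counter[str]:
    """Count predicted proteins (gene id -> sequence) per functional category."""
    return Counter(annotate(seq)[1] for seq in proteins.values())
```

    Proteins with no sufficiently similar reference end up as 'hypothetical protein', which mirrors the fraction of genes in a newly sequenced genome that cannot yet be assigned a function.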

    Metabolic pathways.

    o Are any common pathways missing?

    o Are there 'gaps' (missing enzymes) in some pathways?

    o Compare the identified pathways with the lifestyle of the organism.
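    The pathway questions above reduce to set comparisons between the enzymes each pathway requires and the enzymes annotated in the genome. A minimal sketch, with hypothetical pathway and enzyme names standing in for a real pathway database:

```python
# Hypothetical pathway definitions: pathway -> enzyme activities it requires.
PATHWAYS = {
    "pathway_A": {"E1", "E2", "E3"},
    "pathway_B": {"E4", "E5"},
    "pathway_C": {"E6"},
}


def pathway_report(
    pathways: dict[str, set[str]], genome_enzymes: set[str]
) -> dict[str, tuple[str, list[str]]]:
    """Classify each pathway as complete, gapped or absent, and list its missing enzymes."""
    report = {}
    for name, required in pathways.items():
        missing = required - genome_enzymes
        if not missing:
            status = "complete"
        elif missing == required:
            status = "absent"  # none of the required enzymes were found
        else:
            status = "gapped"  # some, but not all, enzymes were found
        report[name] = (status, sorted(missing))
    return report
```

    A 'gapped' pathway does not necessarily mean the activity is missing: the enzyme may be present but unrecognized (for example, an unrelated protein performing the same reaction) or misannotated, so gaps are best treated as leads for further analysis.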

    Evolutionary history.

    o Internal genome duplications can sometimes be detected.

    o Gene decay can sometimes be characterized: genes that are on their 'way out', either after duplication or because the lifestyle of the organism has changed.

    o Horizontal gene transfer: genes that have been acquired from another organism.
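    The notes do not say how horizontally transferred genes are found. One common heuristic, used here as an illustration, is compositional: genes whose GC content deviates strongly from the rest of the genome are candidates for recent acquisition. The sketch assumes genes are given as A/C/G/T strings and uses an arbitrary cut-off of two standard deviations.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def flag_gc_outliers(genes: dict[str, str], z_cutoff: float = 2.0) -> list[str]:
    """Return names of genes whose GC content lies more than z_cutoff SDs from the genome mean."""
    gcs = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(gcs.values())
    sd = pstdev(gcs.values())
    if sd == 0:
        return []  # all genes have identical composition
    return sorted(name for name, gc in gcs.items() if abs(gc - mu) / sd > z_cutoff)
```

    This only catches fairly recent transfers from donors with a different composition: transferred genes gradually drift towards the host's composition, and genes can be atypical for other reasons, so flagged genes are candidates rather than conclusions.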

    Comparative Genomics

    It is now possible to investigate which sets of genes are common to many different organisms, or groups of organisms. Is there a common core of genes necessary for all life? Is that core sufficient for life?
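    Once genes have been grouped into families of orthologues across genomes, the 'common core' question becomes a set intersection, and the full repertoire a union. A minimal sketch with hypothetical organisms and family identifiers:

```python
# Hypothetical presence data: organism -> set of gene-family IDs
# (assumes orthologous families have already been identified across the genomes).
FAMILIES = {
    "org1": {"famA", "famB", "famC", "famD"},
    "org2": {"famA", "famB", "famC"},
    "org3": {"famA", "famB", "famE"},
}


def core_families(families: dict[str, set[str]]) -> set[str]:
    """Gene families present in every organism."""
    if not families:
        return set()
    return set.intersection(*families.values())


def pan_families(families: dict[str, set[str]]) -> set[str]:
    """Gene families present in at least one organism."""
    return set().union(*families.values())


def accessory_families(families: dict[str, set[str]]) -> set[str]:
    """Gene families present in some, but not all, organisms."""
    return pan_families(families) - core_families(families)
```

    The core shrinks as more (and more distant) genomes are added, and its size depends directly on how well orthologues have been identified; whether the core is also sufficient for life cannot be answered by this computation alone.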

    If one looks at a specific yet fundamental component, such as the ribosome and the protein synthesis machinery, can one say whether this system has changed fundamentally through evolution, or whether it has stayed basically the same throughout?

    Have there been inventions during evolution in such a fundamental system?

    Which genes are necessary for multicellular life forms? Which genes are found only in multicellular organisms and not in unicellular ones?
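    The second question can be phrased as a presence/absence filter: families present in every member of one group of organisms and absent from every member of another. A minimal sketch, with hypothetical organism names and family identifiers:

```python
def lineage_specific(
    families: dict[str, set[str]], in_group: list[str], out_group: list[str]
) -> set[str]:
    """Gene families found in every in_group organism and in no out_group organism."""
    if not in_group:
        raise ValueError("in_group must name at least one organism")
    shared = set.intersection(*(families[o] for o in in_group))
    seen_outside = set().union(*(families[o] for o in out_group))
    return shared - seen_outside


# Hypothetical example: two multicellular and two unicellular organisms.
FAMILIES = {
    "multi1": {"A", "B", "X"},
    "multi2": {"A", "B", "X", "Y"},
    "uni1": {"A"},
    "uni2": {"A", "B"},
}
```

    Here `lineage_specific(FAMILIES, ["multi1", "multi2"], ["uni1", "uni2"])` returns `{"X"}`. Requiring presence in all in-group members is strict; a missed gene in a single genome (for example an annotation gap) removes the family, so looser rules are a reasonable design alternative.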

    The rate of horizontal gene transfer (genes that have jumped the species barrier) among bacteria can now be investigated. How often, and under what circumstances, do bacteria exchange genes? Has anything similar happened with higher organisms?

    Where and how have new genes emerged in evolutionary history? Can precursors of some gene families be found in distant relatives of a species?

    The problem of identifying and characterizing orthologous versus paralogous genes becomes easier to address (but not necessarily to solve).

    o Orthologues are genes that have diverged from a common ancestor because of a speciation event.

    o Paralogues are genes that have diverged as the result of a gene duplication event.
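    A widely used heuristic for proposing orthologue pairs between two genomes is the reciprocal best hit: gene a in genome A and gene b in genome B are paired if b is a's best match in B and a is b's best match in A. The sketch assumes pairwise similarity scores (for example alignment scores) have already been computed; the gene names and scores are hypothetical.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """For each query gene, return the subject gene with the highest score."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: dict[tuple[str, str], float], b_vs_a: dict[tuple[str, str], float]
) -> list[tuple[str, str]]:
    """Pairs (a, b) where each gene is the other's best hit: candidate orthologues."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Hypothetical scores: a2 and a3 are similar genes in genome A (paralogues of each other).
A_VS_B = {("a1", "b1"): 90, ("a1", "b2"): 10, ("a2", "b2"): 80, ("a3", "b2"): 60, ("a3", "b1"): 5}
B_VS_A = {("b1", "a1"): 90, ("b1", "a3"): 5, ("b2", "a2"): 80, ("b2", "a3"): 60, ("b2", "a1"): 10}
```

    Here the result is `[("a1", "b1"), ("a2", "b2")]`: a3 also hits b2, but b2 matches a2 better, so a3 is left out as a likely paralogue. The heuristic reports only one-to-one pairs, so when a gene has been duplicated after speciation, the extra copies (which are orthologous to the gene in the other genome as a group) are missed.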