Post on 19-Dec-2015
Genomics
Biology 122
Genes and Development
First genome: Haemophilus influenzae, 1995; by Craig Venter and TIGR
Genomics milestones
Human genome, draft sequences, 2001: Two groups (Francis Collins of the public consortium; Craig Venter and CELERA)
Now: Thousands of bacteria have been sequenced. Hundreds of human genomes have been sequenced!
NCBI, Nov. 2010
From Genome.gov Human genome conference 6/7/2010
Restriction analysis
FISH
Fig. 18.2
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
[Figure labels, panels a and b: abl, bcr, fused gene, der 9, Ph1]
Reciprocal translocation between one chromosome 9 and one chromosome 22 forms an extra-long chromosome 9 ("der 9") and the Philadelphia chromosome (Ph1) containing the fused bcr-abl gene. This is a schematic view representing metaphase chromosomes.
Normal interphase nucleus: bcr (on normal 22), abl (on normal 9)
Interphase nucleus of leukemic cell containing the Philadelphia chromosome (Ph1): fused gene
b: Reprinted by permission from Macmillan Publishers Ltd: Bone Marrow Transplantation 33, 247-249, “Secondary Philadelphia chromosome after non-myeloablative peripheral blood stem cell transplantation for a myelodysplastic syndrome in transformation,” T Prebet, A-S Michallet, C Charrin, S Hayette, J-P Magaud, A Thiébaut, M Michallet, F E Nicolini © 2004
Sequence-tagged sites (STS)
Comparison of genetic and physical maps
Manual sequencing
Automated DNA sequencing
Estimated genes in sequenced genomes
Transposable elements
Alternative splicing
[Figure, panels a to c: SNPs along chromosomes 1 to 4; haplotypes 1 to 4; diagnostic SNPs (A/G, T/C, C/G)]
Genome variation
Comparison of plant genomes (Comparative genomics)
Fig. 18.9
[Figure panels: Rice Genome; Sugarcane Chromosome Segments; Corn Chromosome Segments; Wheat Chromosome Segments; Genomic Alignment (Segment Rearrangement) of rice, sugarcane, corn, and wheat]
SCIENTIFIC THINKING
Hypothesis: Flowers and leaves will express some of the same genes.
Prediction: When mRNAs isolated from Arabidopsis flowers and from leaves are used as probes on an Arabidopsis genome microarray, the two different probe sets will hybridize to both common and unique sequences.
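The prediction can be read as a comparison of two sets of expressed genes. The sketch below is only an illustration: the gene identifiers are hypothetical placeholders, not real microarray results, and `compare_expression` is a made-up helper name.

```python
# Hypothetical gene identifiers standing in for array spots that hybridized
# to each probe set; these are NOT real Arabidopsis data.
flower_expressed = {"geneA", "geneB", "geneC", "geneD"}
leaf_expressed = {"geneC", "geneD", "geneE"}


def compare_expression(flower, leaf):
    """Split expressed genes into shared and tissue-specific groups."""
    return {
        "both": flower & leaf,         # hybridized with both probe sets
        "flower_only": flower - leaf,  # unique to the flower probes
        "leaf_only": leaf - flower,    # unique to the leaf probes
    }


result = compare_expression(flower_expressed, leaf_expressed)
for group, genes in result.items():
    print(group, sorted(genes))
```

If the hypothesis holds, the "both" group is non-empty, and the two tissue-specific groups correspond to the unique sequences in the prediction.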
Genome sequencing projects statistics

Organism               Complete   Draft assembly   In progress   Total
Prokaryotes                 850              585           534    1969
  Archaea                    78                5            32     115
  Bacteria                  773              580           502    1855
Eukaryotes                   39              249           320     608
  Animals                     6              110           159     275
    Mammals                   3               37            81     121
    Birds                     -                3            12      15
    Fishes                    -               13            12      25
    Insects                   2               26            20      48
    Flatworms                 -                2             3       5
    Roundworms                1               13            12      26
    Amphibians                ?                ?             ?       1
    Reptiles                  ?                ?             ?       1
    Other animals             -               16            22      38
  Plants                      7               23            78     108
    Land plants               4               19            73      96
    Green Algae               3                4             4      11
  Fungi                      16               83            39     138
    Ascomycetes              13               63            28     104
    Basidiomycetes            1               12             8      21
    Other fungi               2                8             3      13
  Protists                   10               31            40      81
    Apicomplexans             5               10             4      19
    Kinetoplasts              4                1             3       8
    Other protists            1               19            33      53
Total                       889              834           854    2577

(- = no entry. Amphibians and Reptiles each list a single project; the source does not show which status column it belongs to.)
Revised: Nov 18, 2010
Genomes deposited at NCBI
GOLD (Genomes Online Database)
              Complete   Incomplete   Targeted
Bacterial         2666         5493        424
Archaeal           149          182          1
Eukaryotic         166         2037         13

Metagenome studies 340
Metagenome samples 1930
[Metagenomes are environmental samples]
Finished 1960
Permanent draft 1021
Complete, not published 26
Draft 1529
In progress 3426
DNA received 266
Awaiting DNA 510
Targeted (funded, not started) 438
Date 11/23/2011
NCBI, Genomes
Species Reference sequences In progress
Viroids 41 41
Viruses 2721 3933
Bacterial 1681 5140
Archaeal 121 90
Eukaryotes 1815
Organelles 2974
Date 11/23/2011
Human Disease genes
From Genome.gov, 11-2010
Animals: Amphipod Crustacean; Aphid, Pea; Beetle, Red Flour; Bug (Chagas' Vector); Centipede, Geophilomorph; Chelicerate (Horseshoe Crab); Drug Resistant Parasitic Nematode; Freshwater Polyp; Fruit Fly; Honey Bee; Louse, Body; Mosquito; Placozoan; Planarian; Roundworm; Sand Fly; Sea Slug; Sea Squirt; Sea Star; Sea Urchin; Snail, Freshwater; Strongylid Nematode; Tardigrade; Wasp, Parasitoid; Worm, Acorn; Worm, Priapulid
Vertebrates: Chicken; Coelacanth; Gar, Spotted; Hagfish; Lamprey, Sea; Lizard, Anole; Pufferfish; Shark, Elephant; Skate; Spotted African Lungfish; Stickleback, Threespine; Turtle, Painted; Zebra finch
Genome.gov, 11/22/2011
Animal genomes in progress, November 2011 (genome.gov)
Mammals
Aardvark; Alpaca; Armadillo, Nine-banded; Baboon; Bat, Little Brown (Microbat); Bat, Big brown; Bonobo; Bushbaby; Bushbaby/Galago; California leaf-nosed bat; Cape golden mole; Cat; Chimpanzee; Chinchilla; Chinese hamster; Cow; Crested porcupine; Degu; Dog; Dolphin; Eastern grey kangaroo; Elephant, African Savannah; Ferret; Flying Fox (Megabat); Giant anteater; Gibbon; Golden-mantled howling monkey; Greater horseshoe bat;
Guinea Pig; Hedgehog, European; Hippopotamus; Honey Possum (Noolbenger); Horse; Human; Hyrax; Koala; Lemur, Flying; Lemur, Mouse; Lesser Egyptian jerboa; Lizard, Anole; Llama; Long-haired (Rufous) elephant shrew; Macaque, Cynomolgus; Macaque, Pigtail; Macaque, Rhesus; Macaque, Rhesus (Chinese population); Malayan tapir; Mangabey, Sooty; Marmoset; Mexican free-tailed bat; Mole; Monkey, Squirrel; Mouse; Mouse, Deer; Mouse, White-Footed; Naked mole rat; North American porcupine;
Opossum, Gray Short-Tailed; Opossum, Laboratory; Orangutan; Pangolin; Pika; Platypus, Duck-Billed; Rabbit; Rat; Rat, Kangaroo; Rhesus Macaque; Ring-tailed lemur; Shrew, Elephant; Shrew, European Common; Shrew, Tree; Sloth; Springhare; Squirrel; Star nosed mole; Stickleback, Threespine; Syrian/Golden Hamster; Tarsier; Tenrec (Lesser Hedgehog); Vervet; Vole, Prairie; Wallaby, Tammar; Water Chevrotain; Weddell Seal; West Indian manatee; White rhinoceros
Mammal genomes in progress, November 2011 (genome.gov)
Neanderthals
Science Nov 17, 2006
Neanderthals
• 99.5% identical to humans when comparing the same sequences
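A figure such as "99.5% identical when comparing the same sequences" is a percent identity over aligned positions. The sketch below shows one simple way to compute it, assuming the two sequences are already aligned to equal length and that gap positions are skipped; the toy sequences are invented for illustration and are not real human or Neanderthal data.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap positions where the two bases match.

    Assumes seq_a and seq_b are pre-aligned (equal length); positions with
    a gap character '-' in either sequence are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Invented toy sequences: 9 of 10 positions match.
toy_a = "ACGTACGTAC"
toy_b = "ACGTACGTAT"
print(percent_identity(toy_a, toy_b))
```

At 99.5% identity, about 1 aligned position in 200 differs.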
Neanderthals
Draft sequence published May 7, 2010.
• Neanderthals from four sites (see map)
• 21 bones from Vindija analyzed for this study
• 3 bones were selected for detailed sequencing (from three individuals)
• Bones from three other sites were also sequenced (see map)
Compared Neanderthal to five human genomes
Conclusion: Non-African humans carry some Neanderthal-derived sequences (1 to 4%). Gene flow is estimated to have gone from Neanderthals to humans and to have occurred more than 45,000 years ago.
Notes: Humans and Neanderthals lived in the same area for > 10,000 years. Neanderthals perished 30,000 years ago.
Neanderthals
Four models of how the gene transfer could have occurred (option 2 is least likely, option 3 most likely)
Transfer most likely occurred in the Middle East/Western Asia
PNG = Papua New Guinea
Denisovans
• Third type of human genome sequenced
• Finger bone found in the Denisova cave in Altai Krai, Russia, in 2008
• The Denisova bone had a genome distinct from modern humans or Neanderthals
• The bone was dated to 41,000 years ago
• Since only bone fragments are known, it is not known how they looked
• It is thought that they were distributed throughout Asia and Melanesia
Analysis of the genome, and comparison with humans and Neanderthals, suggests that 4% of non-African DNA is related to Neanderthals and 4 to 6% of Melanesian genomes is related to Denisovans. This suggests some interbreeding between the first modern humans, Neanderthals, and Denisovans.
Analysis of HLA types (immune proteins) suggests that over half of Eurasian HLA types came from Neanderthals or Denisovans, suggesting that they were selected for in Eurasians.
Watson’s genome
• Sequenced using shotgun sequencing
• About 3.5 percent of Watson’s genome could not be matched to the reference genome, probably due to differences in the cloning step
Venter’s genome compared to the reference genome
• 32 million reads resulted in 2.8 billion base pairs of assembled sequence (7.5-fold coverage)
• 4.1 million differences from the already published genome (12.3 million bases different)
• 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions.
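The coverage figure quoted above (32 million reads, 2.8 billion assembled base pairs, 7.5-fold coverage) can be checked with simple arithmetic. The sketch below assumes the usual definition of coverage as total sequenced bases divided by genome size; the mean read length is not given on the slide, so it is derived here rather than quoted, and the function names are illustrative.

```python
def fold_coverage(n_reads, mean_read_length, genome_size):
    """Average number of times each base is sequenced (assumed definition)."""
    return n_reads * mean_read_length / genome_size


def implied_read_length(coverage, genome_size, n_reads):
    """Mean read length implied by a stated coverage, genome size and read count."""
    return coverage * genome_size / n_reads


N_READS = 32_000_000         # from the slide
ASSEMBLED = 2_800_000_000    # assembled base pairs, from the slide
COVERAGE = 7.5               # from the slide

read_len = implied_read_length(COVERAGE, ASSEMBLED, N_READS)
print(f"implied mean read length: {read_len:.2f} bp")
print(f"coverage check: {fold_coverage(N_READS, read_len, ASSEMBLED):.1f}x")
```

Under these assumptions the numbers imply reads of several hundred base pairs each.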
How different are individuals?
• 44% of genes were heterozygous for one or more variants (both copies could be determined)
• A conservative estimate is that at least 0.5% variation exists between two haploid genomes (all heterozygous bases).
How different are individuals?
• The genome of a Yoruba individual from Ibadan, Nigeria, was sequenced.
• About 4 million SNPs were found; 74% had already been found by others.
• About 24% more polymorphism (heterozygosity) than Caucasian genomes.
• There were 5,704 indels ranging from 50 to over 35,000 bp long. Many were SINEs and LINEs.
Bentley et al., Nature, November 6, 2008
How different are individuals?
• The genome of a Han Chinese individual was sequenced.
• About 3 million SNPs were found; 86% had already been found by others.
• About 24% more polymorphism (heterozygosity) than Caucasian genomes.
• There were 2,682 structural variations, including insertions, deletions, and inversions. Many variations in SINEs and LINEs were found.
Wang et al., Nature, November 6, 2008
How different are cancer cells?
• DNA from skin cells and from acute myeloid leukemia cells of the same Caucasian woman was sequenced.
• About 2.9 million SNPs were found in the skin cells, and 3.8 million in the leukemia cells.
• Almost all of the SNP differences were either common in other sequenced genomes or located outside genes.
• Ten genes were found to have acquired mutations in the leukemia cells. Of these, two were known to be involved in tumour progression. The functions of the other eight mutant genes are unknown.
Ley et al., Nature, November 6, 2008
Metabolomics
• A study of 284 males compared 383 metabolic indicators with SNPs (genetic variants).
• Up to 12% of the variation in the levels of the metabolic molecules could be explained by particular versions of a gene (SNPs).
• Four of the genes were already known to act in metabolic pathways related to the metabolic molecule that was high or low.
Gieger et al., PLoS Genetics, November 2008
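One common way to express how much of a metabolite's level is "explained" by a SNP is the fraction of variance explained (R squared) in a simple linear model of level against genotype. The sketch below assumes that reading and codes each genotype as the number of copies of one allele (0, 1 or 2). The data are invented for illustration and the function name is hypothetical; this is not the analysis used in the study.

```python
def variance_explained(genotypes, levels):
    """R squared for a simple linear fit of metabolite level on genotype.

    genotypes: allele counts (0, 1 or 2) per person (assumed coding)
    levels:    measured metabolite level per person
    """
    if len(genotypes) != len(levels) or len(genotypes) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_l = sum(levels) / n
    cov = sum((g - mean_g) * (l - mean_l) for g, l in zip(genotypes, levels))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    var_l = sum((l - mean_l) ** 2 for l in levels)
    if var_g == 0 or var_l == 0:
        raise ValueError("genotypes and levels must both vary")
    # For one predictor, R squared equals the squared Pearson correlation.
    return cov * cov / (var_g * var_l)


# Invented example: eight people, genotype coded as allele count.
toy_genotypes = [0, 0, 1, 1, 1, 2, 2, 2]
toy_levels = [4.1, 5.0, 4.6, 5.4, 4.9, 5.2, 5.9, 5.1]
print(f"fraction of variance explained: {variance_explained(toy_genotypes, toy_levels):.2f}")
```

A value of 0.12 from such a calculation would correspond to the "up to 12%" quoted on the slide.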
Woolly mammoth
• Over 4 billion bp in genome
• Mammoths and African elephants differ by about 1 amino acid per protein
• Mammoths and African elephants are estimated to have separated 1.5 to 2.0 million years ago
Nature, November 20, 2008
Woolly mammoth
Recent genome news
Nov 19, 2011
Malaysian Genomics Resource Centre Berhad (MGRC) today announced that it has successfully completed its 100th human genome from a diverse mix of Malaysian, European and Australian individuals.
The data generated from these genomes have helped in efforts to identify and compare highly represented patterns of common and clinically relevant genetic variations within Malaysian and other populations, and to establish robust bioinformatics protocols for the reference-based analysis of genomic information.
Recent genome news
Nov 23, 2011
A study of 11,000 children and adults found that very short people (the lowest 2.5% of the population) are missing more genes or parts of genes than taller people.
Recent genome news
November, 2011
The mythical "$1,000 genome" is almost upon us (in 2012), said Jonathan Rothberg, CEO of sequencing technology company Ion Torrent, at MIT's Emerging Technology conference.
November 2, 2011
Duke University said last week that it will sequence 4,000 individuals as part of a collaborative, $25 million effort to identify as many genes as possible implicated in epilepsy.
Maize (corn) genome
Maize has 10 chromosomes, 2.3 billion base pairs
The sequencing was done using the clone-by-clone method, with 16,848 BACs sequenced, assembled, and analyzed.
There are estimated to be 32,500 protein-encoding genes and 150 microRNA (miRNA) genes.
Approximately 75% of the genome is repeated DNA. It has over 400 families of LTR retrotransposons with over 31,000 different sequences.
P. S. Schnable et al., Science 326, 1112-1115 (2009)
Fig. 1 The maize B73 reference genome (B73 RefGen_v1): Concentric circles show aspects of the genome
1000 Genomes project
The 1000 Genomes Project is an international collaboration to produce an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts.
This resource will support genome-wide association studies and other medical research studies.
The genomes of about 2500 unidentified people from about 27 populations around the world will be sequenced using next-generation sequencing technologies.
Highlights
• Over 4.9 trillion nucleotides sequenced
• Over 800 individuals (179 people had their whole genomes sequenced and 697 people just the protein-coding regions)
• Each child had around 60 mutations in its genome that did not exist in either parent
• Over 15 million SNPs discovered
• Each individual is carrying a significant number of deleterious mutations, maybe 250 or 300 genes that have defective copies
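The highlight about roughly 60 mutations per child that exist in neither parent comes from comparing a child's genome with both parents' genomes. The sketch below shows the basic idea under simplifying assumptions: genotypes are given as a set of alleles per position, positions lacking a call in either parent are skipped, and sequencing errors are ignored. The positions, alleles, and function name are hypothetical and do not represent the project's actual pipeline.

```python
def de_novo_candidates(child, mother, father):
    """Find positions where the child carries an allele seen in neither parent.

    Each argument maps a genome position to the set of alleles observed there.
    Positions without a call in both parents are skipped.
    """
    candidates = {}
    for position, child_alleles in child.items():
        if position not in mother or position not in father:
            continue  # cannot judge without both parental genotypes
        parental_alleles = mother[position] | father[position]
        novel = child_alleles - parental_alleles
        if novel:
            candidates[position] = novel
    return candidates


# Hypothetical toy genotypes (position -> alleles); not real data.
toy_child = {101: {"A", "G"}, 202: {"C"}, 303: {"T", "C"}, 404: {"G"}}
toy_mother = {101: {"A"}, 202: {"C"}, 303: {"T"}}
toy_father = {101: {"A"}, 202: {"C", "T"}, 303: {"C"}, 404: {"A"}}

print(de_novo_candidates(toy_child, toy_mother, toy_father))
```

In real data most such candidates are sequencing or alignment errors, so a count like 60 per child would only emerge after careful filtering and validation.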
3 billion: Number of DNA letters in the human genome (200 volumes the size of a Manhattan telephone book, which has around 1,000 pages)
20,000-25,000: Number of genes in the genome (though not all scientists agree)
2000: Year the first draft of the human genome was announced to much fanfare at the Clinton White House
2003: Year the final draft was completed, to 99.99% accuracy
2500: Number of people whose genomes the 1000 Genomes Project hopes to sequence, from 25 populations
15 million: Number of single-letter changes identified in the pilot phase
1 million: Number of small insertions and deletions identified in the pilot phase
4.9 trillion: Number of letters of data sequenced by the 1000 Genomes Project so far
1094: Number of individuals whose genomes were completed as of 6/23/11
1000 Genomes project http://www.1000genomes.org/home
Human microbiome
Adults harbor ten times more microbial cells than they have human cells.
Examining how these microbes affect human health through their association with the body, for example by influencing metabolism, disease susceptibility, and drug response, is key to improving human health.
In September 2005, through the Comparative Genome Evolution (CGE) program, NHGRI approved a limited project, Sequencing of Cultivable Microbes from Human Gut, to obtain reference genome sequence data from up to 300 cultured bacteria and archaea sampled from the human digestive tract and urogenital tract.
The objective is threefold: to start to generate reference data for future large-scale metagenomics studies; to understand the diversity of bacterial pangenomes; and to start to address the technical and bioinformatic challenges that human metagenomics research will encounter.
From Genome.gov, 11-2010
Scientists involved in the Genome 10K Project are assembling specimens of thousands of animals spanning a broad range of evolutionary diversity.
Photos courtesy of San Diego Zoo.
From http://news.ucsc.edu/2009/11/3333.html
Scientists propose a "genome zoo" of 10,000 vertebrate species
November 03, 2009
By Branwyn Wagman, Guest Writer (831) 459-3077
10,000 vertebrate genomes
In the most comprehensive study of animal evolution ever attempted, an international consortium of scientists plans to assemble a genomic zoo--a collection of DNA sequences for 10,000 vertebrate species, approximately one for every vertebrate genus.
Known as the Genome 10K Project, it involves gathering specimens of thousands of animals from zoos, museums, and university collections throughout the world, and then sequencing the genome of each species to reveal its complete genetic heritage.
Launched in April 2009 at a three-day meeting at the University of California, Santa Cruz, the project now involves more than 68 scientists. Calling themselves the Genome 10K Community of Scientists (G10KCOS), the group outlined its proposal to create a collection of tissue and DNA specimens for the project in a paper to be published online November 5 in the Journal of Heredity.
From http://news.ucsc.edu/2009/11/3333.html