Post on 19-Dec-2015


Genomics

Biology 122

Genes and Development

First genome of a free-living organism: Haemophilus influenzae, 1995; by Craig Venter and TIGR

Genomics milestones

Human genome, draft sequences, 2001: Two groups (Francis Collins of the public consortium; Craig Venter and Celera)

Now: Thousands of bacteria have been sequenced. Hundreds of human genomes have been sequenced!

NCBI, Nov. 2010

From Genome.gov Human genome conference 6/7/2010

Restriction analysis

FISH

Fig. 18.2

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

a. Reciprocal translocation between one chromosome 9 and one chromosome 22 forms an extra-long chromosome 9 (“der 9”) and the Philadelphia chromosome (Ph1) containing the fused bcr-abl gene. This is a schematic view representing metaphase chromosomes. Labels: abl (on normal 9), bcr (on normal 22), fused gene, der 9, Ph1.

b. Normal interphase nucleus, and interphase nucleus of a leukemic cell containing the Philadelphia chromosome (Ph1).

b: Reprinted by permission from Macmillan Publishers Ltd: Bone Marrow Transplantation 33, 247-249, “Secondary Philadelphia chromosome after non-myeloablative peripheral blood stem cell transplantation for a myelodysplastic syndrome in transformation,” T Prebet, A-S Michallet, C Charrin, S Hayette, J-P Magaud, A Thiébaut, M Michallet, F E Nicolini © 2004

Sequence-tagged sites (STS)

Comparison of genetic and physical maps

Manual sequencing

Automated DNA sequencing

Estimated genes in sequenced genomes

Transposable elements

Alternative splicing

Figure (panels a-c): SNPs and haplotypes. Aligned sequences from chromosomes 1 to 4 differ at single-nucleotide polymorphisms (SNPs; for example A/G, T/C, C/G). The combination of alleles at the diagnostic SNPs defines haplotypes 1 to 4.

Genome variation
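As a minimal sketch of how SNPs and haplotypes relate, the snippet below assumes a set of already-aligned sequences of equal length. It finds the positions where the sequences differ (candidate SNPs) and then labels each chromosome by its alleles at those positions. The sequences and function names are hypothetical and are not taken from the figure.

```python
from collections import Counter

def variable_sites(seqs):
    """Return 0-based positions where the aligned sequences differ (candidate SNPs)."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]

def haplotypes(seqs):
    """Reduce each sequence to its alleles at the variable sites and count each pattern."""
    sites = variable_sites(seqs)
    patterns = Counter("".join(s[i] for i in sites) for s in seqs)
    return sites, patterns

# Hypothetical aligned sequences from four chromosomes (illustrative only)
chromosomes = [
    "AACGTTAGC",
    "AACGTTAGC",
    "AGCGTCAGG",
    "AGCGTTAGG",
]

sites, patterns = haplotypes(chromosomes)
print("SNP positions:", sites)
for hap, count in patterns.items():
    print(hap, count)
```

Only the variable positions are needed to tell the haplotypes apart, which is why a small set of diagnostic SNPs can stand in for a much longer stretch of sequence.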

Comparison of plant genomes (Comparative genomics)

Fig. 18.9: Genomic alignment (segment rearrangement). The rice genome (chromosomes 1 to 12) is compared with corn, sugarcane, and wheat chromosome segments.

SCIENTIFIC THINKING

Hypothesis: Flowers and leaves will express some of the same genes.

Prediction: When mRNAs isolated from Arabidopsis flowers and from leaves are used as probes on an Arabidopsis genome microarray, the two different probe sets will hybridize to both common and unique sequences.
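The prediction can be pictured as a comparison of two sets of hybridizing genes. The sketch below is only an illustration: the gene names are hypothetical placeholders, and a real microarray analysis would first have to decide which spots count as "expressed" from signal intensities.

```python
def compare_expression(flower, leaf):
    """Split two sets of expressed genes into shared and tissue-specific groups."""
    return {
        "common": flower & leaf,        # hybridize with both probe sets
        "flower_only": flower - leaf,   # hybridize with the flower probe only
        "leaf_only": leaf - flower,     # hybridize with the leaf probe only
    }

# Hypothetical gene labels, not real Arabidopsis identifiers
flower_genes = {"gene_A", "gene_B", "gene_C", "gene_D"}
leaf_genes = {"gene_C", "gene_D", "gene_E"}

result = compare_expression(flower_genes, leaf_genes)
for group, genes in result.items():
    print(group, sorted(genes))
```

The hypothesis is supported if the "common" group is non-empty, while the two tissue-specific groups correspond to the "unique sequences" in the prediction.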

Genome sequencing projects statistics

Organism          Complete   Draft assembly   In progress   Total
Prokaryotes            850              585           534    1969
Archaea                 78                5            32     115
Bacteria               773              580           502    1855
Eukaryotes              39              249           320     608
Animals                  6              110           159     275
Mammals                  3               37            81     121
Birds                                     3            12      15
Fishes                                   13            12      25
Insects                  2               26            20      48
Flatworms                                 2             3       5
Roundworms               1               13            12      26
Amphibians                                1                     1
Reptiles                                  1                     1
Other animals                            16            22      38
Plants                   7               23            78     108
Land plants              4               19            73      96
Green Algae              3                4             4      11
Fungi                   16               83            39     138
Ascomycetes             13               63            28     104
Basidiomycetes           1               12             8      21
Other fungi              2                8             3      13
Protists                10               31            40      81
Apicomplexans            5               10             4      19
Kinetoplasts             4                1             3       8
Other protists           1               19            33      53
Total                  889              834           854    2577

Revised: Nov 18, 2010

Genomes deposited at NCBI

GOLD (Genomes Online Database)

              Complete   Incomplete   Targeted
Bacterial         2666         5493        424
Archaeal           149          182          1
Eukaryotic         166         2037         13

Metagenome studies: 340

Metagenome samples: 1930

[Metagenomes are environmental samples]

Finished 1960

Permanent draft 1021

Complete, not published 26

Draft 1529

In progress 3426

DNA received 266

Awaiting DNA 510

Targeted (funded, not started) 438

Date 11/23/2011

NCBI, Genomes

Species Reference sequences In progress

Viroids 41 41

Viruses 2721 3933

Bacterial 1681 5140

Archaeal 121 90

Eukaryotes 1815

Organelles 2974

Date 11/23/2011

Human Disease genes

From Genome.gov, 11-2010

Animals                             Vertebrates
Amphipod Crustacean                 Chicken
Aphid, Pea                          Coelacanth
Beetle, Red Flour                   Gar, Spotted
Bug (Chagas' Vector)                Hagfish
Centipede, Geophilomorph            Lamprey, Sea
Chelicerate (Horseshoe Crab)        Lizard, Anole
Drug Resistant Parasitic Nematode   Pufferfish
Freshwater Polyp                    Shark, Elephant
Fruit Fly                           Skate
Honey Bee                           Spotted African Lungfish
Louse, Body                         Stickleback, Threespine
Mosquito                            Turtle, Painted
Placozoan                           Zebra finch
Planarian
Roundworm
Sand Fly
Sea Slug
Sea Squirt
Sea Star
Sea Urchin
Snail, Freshwater
Strongylid Nematode
Tardigrade
Wasp, Parasitoid
Worm, Acorn
Worm, Priapulid

Genome.gov, 11/22/2011

Animal genomes in progress, November 2011 (genome.gov)

Mammals

Aardvark                        Guinea Pig                             Opossum, Gray Short-Tailed
Alpaca                          Hedgehog, European                     Opossum, Laboratory
Armadillo, Nine-banded          Hippopotamus                           Orangutan
Baboon                          Honey Possum (Noolbenger)              Pangolin
Bat, Little Brown (Microbat)    Horse                                  Pika
Bat, Big brown                  Human                                  Platypus, Duck-Billed
Bonobo                          Hyrax                                  Rabbit
Bushbaby                        Koala                                  Rat
Bushbaby/Galago                 Lemur, Flying                          Rat, Kangaroo
California leaf-nosed bat       Lemur, Mouse                           Rhesus Macaque
Cape golden mole                Lesser Egyptian jerboa                 Ring-tailed lemur
Cat                             Lizard, Anole                          Shrew, Elephant
Chimpanzee                      Llama                                  Shrew, European Common
Chinchilla                      Long-haired (Rufous) elephant shrew    Shrew, Tree
Chinese hamster                 Macaque, Cynomolgus                    Sloth
Cow                             Macaque, Pigtail                       Springhare
Crested porcupine               Macaque, Rhesus                        Squirrel
Degu                            Macaque, Rhesus (Chinese population)   Star nosed mole
Dog                             Malayan tapir                          Stickleback, Threespine
Dolphin                         Mangabey, Sooty                        Syrian/Golden Hamster
Eastern grey kangaroo           Marmoset                               Tarsier
Elephant, African Savannah      Mexican free-tailed bat                Tenrec (Lesser Hedgehog)
Ferret                          Mole                                   Vervet
Flying Fox (Megabat)            Monkey, Squirrel                       Vole, Prairie
Giant anteater                  Mouse                                  Wallaby, Tammar
Gibbon                          Mouse, Deer                            Water Chevrotain
Golden-mantled howling monkey   Mouse, White-Footed                    Weddell Seal
Greater horseshoe bat           Naked mole rat                         West Indian manatee
                                North American porcupine               White rhinoceros

Mammal genomes in progress, November 2011 (genome.gov)

Neanderthals

Science Nov 17, 2006

Neanderthals

• 99.5% identical to humans when comparing the same sequences

Neanderthals

Draft sequence published May 7, 2010.

Neanderthals from four sites (see map)

21 bones from Vindija were analyzed for this study

3 bones were selected for detailed sequencing (from three individuals)

Bones from three other sites were also sequenced (see map)

Compared Neanderthal to five human genomes

Conclusion: Non-African humans contain some Neanderthal derived sequences (1 to 4%) (gene flow estimated to be Neanderthal to Human, and occurred > 45,000 years ago)

Notes: Humans and Neanderthals lived in the same area for > 10,000 years. Neanderthals perished 30,000 years ago.

Neanderthals

Four models of how the gene transfer could have occurred (option 2 is least likely, option 3 most likely)

Transfer most likely occurred in the Middle East/Western Asia

PNG = Papua New Guinea

Denisovans

Third type of human genome sequenced

Finger bone found in the Denisova cave in Altai Krai, Russia in 2008

The Denisova bone had a genome distinct from modern humans or Neanderthals

The bone was dated to 41,000 years ago

Since only bone fragments are known, it is not known how they looked

It is thought that they were distributed throughout Asia and Melanesia

Analysis of the genome, and comparison with humans and Neanderthals, suggests that 4% of non-African DNA is related to Neanderthals and 4 to 6% of Melanesian genomes is related to Denisovans. This suggests some interbreeding between the first modern humans, Neanderthals, and Denisovans.

Analysis of HLA types (immune proteins) suggests that over half of Eurasian HLA types came from Neanderthals or Denisovans, suggesting that they were selected for in Eurasians.

Watson’s genome

• Sequenced using shotgun sequencing

• About 3.5 percent of Watson’s genome could not be matched to the reference genome, probably due to differences in the cloning step

• 32 million reads resulted in 2.8 billion base pairs of assembled sequence (7.5 fold coverage)

• 4.1 million differences to the already published genome (12.3 million bases different)

• 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions.
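The read, assembly, and coverage figures in the list above can be tied together with the usual coverage relation C = N × L / G (number of reads times mean read length, divided by genome length). The sketch below is a rough back-of-the-envelope check under two simplifying assumptions that are not stated on the slide: that the assembled length stands in for G, and that every read contributes to the coverage. The function name is hypothetical.

```python
def implied_mean_read_length(fold_coverage, genome_bp, n_reads):
    """Rearrange C = N * L / G to estimate the mean read length L."""
    return fold_coverage * genome_bp / n_reads

n_reads = 32_000_000          # 32 million reads
assembled_bp = 2_800_000_000  # 2.8 billion bp of assembled sequence
fold_coverage = 7.5           # 7.5-fold coverage

mean_read_length = implied_mean_read_length(fold_coverage, assembled_bp, n_reads)
print(f"Implied mean read length: {mean_read_length:.0f} bp")

# Fraction of assembled bases that differ from the published genome
bases_different = 12_300_000
fraction_different = bases_different / assembled_bp
print(f"Bases different: {fraction_different:.2%}")
```

Under these assumptions the numbers imply reads of several hundred base pairs, and well under 1% of the assembled bases differ from the previously published sequence.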

Venter’s genome compared to the reference genome

How different are individuals?

• 44% of genes were heterozygous for one or more variants (they could determine both copies)

• A conservative estimate is that a minimum of 0.5% variation exists between two haploid genomes (all heterozygous bases).

How different are individuals?

• The genome sequence of a Yoruba individual from Ibadan, Nigeria was completed.

• About 4 million SNPs were found; 74% had already been found by others.

• About 24% more polymorphism (heterozygosity) than Caucasian genomes.

• There were 5,704 indels ranging from 50 to over 35,000 bp long. Many were SINEs and LINEs.

Bentley et al., Nature, November 6, 2008

How different are individuals?

• The genome sequence of a Han Chinese individual was completed.

• About 3 million SNPs were found; 86% had already been found by others.

• About 24% more polymorphism (heterozygosity) than Caucasian genomes.

• There were 2,682 structural variations, including insertions, deletions, and inversions. Many variations in SINEs and LINEs were found.

Wang et al., Nature, November 6, 2008

How different are cancer cells?

• DNA from skin cells and from acute myeloid leukemia cells of the same Caucasian woman was sequenced.

• About 2.9 million SNPs were found in the skin cells, and 3.8 million in the leukemia cells.

• Almost all of the differences in SNPs were found to be common in other sequenced genomes or not in genes.

• Ten genes were found to have acquired mutations in the leukemia cells. Of these, two were known to be involved in tumour progression. The functions of the other eight mutant genes are unknown.

Ley et al., Nature, November 6, 2008

Metabolomics

• A study of 284 males compared 383 metabolic indicators and SNPs (genetic variants).

• Up to 12% of the levels of the metabolic molecules could be explained by particular versions of the gene (SNP).

• Four genes were known to be in metabolic pathways related to the metabolic molecule that was high or low.

Geiger et al., PLOS Genetics. November, 2008

Woolly mammoth

• Over 4 billion bp in genome

• Mammoths and African elephants differ in about 1 amino acid per protein

• Mammoths and African elephants are estimated to have separated 1.5 to 2.0 million years ago

Nature, November 20, 2008

Woolly mammoth

Recent genome news

Nov 19, 2011

Malaysian Genomics Resource Centre Berhad (MGRC) today announced that it has successfully completed its 100th human genome from a diverse mix of Malaysian, European and Australian individuals.

The data generated from these genomes have helped in efforts to identify and compare highly represented patterns of common and clinically relevant genetic variations within Malaysian and other populations, and to establish robust bioinformatics protocols for the reference-based analysis of genomic information.

Recent genome news

Nov 23, 2011

A study of 11,000 children and adults found that very short people (the lowest 2.5% of the population) are missing more genes or parts of genes than taller people.

Recent genome news

November, 2011

The mythical "$1,000 genome" is almost upon us (in 2012), said Jonathan Rothberg, CEO of sequencing technology company Ion Torrent, at MIT's Emerging Technology conference.

November 2, 2011

Duke University said last week that it will sequence 4,000 individuals as part of a collaborative, $25 million effort to identify as many genes as possible implicated in epilepsy.

Maize (corn) genome

Maize has 10 chromosomes, 2.3 billion base pairs

The sequencing was done using clone-by-clone method, with 16,848 BACs sequenced, assembled, and analyzed.

There are estimated to be 32,500 protein encoding genes, and 150 microRNA genes (miRNA).

Approximately 75% of the genome is repeated DNA. It has over 400 families of LTR retrotransposons with over 31,000 different sequences.

P. S. Schnable et al., Science 326, 1112-1115 (2009)

Fig. 1 The maize B73 reference genome (B73 RefGen_v1): Concentric circles show aspects of the genome
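To put the maize numbers quoted above in proportion, the short calculation below derives the amount of repeated DNA and the average spacing between protein-coding genes. It is a rough illustration only: it uses the rounded figures from the slide and assumes genes are spread evenly, which real genomes are not.

```python
genome_bp = 2_300_000_000   # 2.3 billion base pairs
protein_genes = 32_500      # estimated protein-encoding genes
repeat_fraction = 0.75      # approximately 75% repeated DNA

repeat_bp = genome_bp * repeat_fraction
non_repeat_bp = genome_bp - repeat_bp
bp_per_gene = genome_bp / protein_genes

print(f"Repeated DNA:      {repeat_bp / 1e9:.3f} billion bp")
print(f"Non-repeated DNA:  {non_repeat_bp / 1e9:.3f} billion bp")
print(f"Average of one protein-coding gene per {bp_per_gene:,.0f} bp")
```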

1000 Genomes project

The 1000 Genomes Project is an international collaboration to produce an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts.

This resource will support genome-wide association studies and other medical research studies.

The genomes of about 2500 unidentified people from about 27 populations around the world will be sequenced using next-generation sequencing technologies.

Highlights

Over 4.9 trillion nucleotides sequenced

Over 800 individuals (179 people had their whole genomes sequenced and 697 people just the protein-coding regions)

Each child had around 60 mutations in its genome that did not exist in either parent

Over 15 million SNPs discovered

Each individual is carrying a significant number of deleterious mutations, maybe 250 or 300 genes that have defective copies

3 billion Number of DNA letters in the human genome (200 volumes the size of a Manhattan telephone book, which has around 1,000 pages)

20,000-25,000 Number of genes in the genome (though not all scientists agree)

2000 Year the first draft of the human genome was announced to much fanfare at the Clinton White House

2003 Final draft completed to 99.99% accuracy

2500 Number of people whose genomes the 1,000 Genomes Project hopes to sequence, from 25 populations

15 million Number of single-letter changes identified in the pilot phase

1 million Number of small insertions and deletions identified in the pilot phase

4.9 trillion Number of letters of data sequenced by the 1,000 Genomes Project so far

1094 Genomes completed for 1094 individuals, 6/23/11
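Two of the figures above lend themselves to a quick sanity check: how many letters each telephone-book page would hold, and how many human-genome equivalents 4.9 trillion sequenced letters represent. This is simple arithmetic on the rounded numbers quoted in the list, not a statement about how the project itself measures coverage.

```python
genome_letters = 3_000_000_000          # DNA letters in the human genome
volumes = 200                           # telephone-book-sized volumes
pages_per_volume = 1_000                # pages per volume
sequenced_letters = 4_900_000_000_000   # letters sequenced so far

letters_per_page = genome_letters / (volumes * pages_per_volume)
genome_equivalents = sequenced_letters / genome_letters

print(f"Letters per page: {letters_per_page:,.0f}")
print(f"Genome equivalents sequenced: {genome_equivalents:,.0f}")
```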

1000 Genomes project http://www.1000genomes.org/home

Human microbiome

Adults harbor ten times more microbial cells than they have human cells.

Examination of how these microbes impact human health through their association with the body, for example by influencing metabolism, disease susceptibility and drug response is key for improving human health.

In September 2005, through the Comparative Genome Evolution (CGE) program, NHGRI approved a limited project, Sequencing of Cultivable Microbes from Human Gut, to obtain reference genome sequence data from up to 300 cultured bacteria and archaea sampled from the human digestive tract and urogenital tract.

The objective is threefold: to start to generate reference data for future large-scale metagenomics studies; to understand the diversity of bacterial pangenomes; and to start to address the technical and bioinformatic challenges that human metagenomics research will encounter.

From Genome.gov, 11-2010

Scientists involved in the Genome 10K Project are assembling specimens of thousands of animals spanning a broad range of evolutionary diversity.

Photos courtesy of San Diego Zoo.

From http://news.ucsc.edu/2009/11/3333.html

Scientists propose a "genome zoo" of 10,000 vertebrate species

November 03, 2009

By Branwyn Wagman, Guest Writer

10,000 vertebrate genomes

In the most comprehensive study of animal evolution ever attempted, an international consortium of scientists plans to assemble a genomic zoo--a collection of DNA sequences for 10,000 vertebrate species, approximately one for every vertebrate genus.

Known as the Genome 10K Project, it involves gathering specimens of thousands of animals from zoos, museums, and university collections throughout the world, and then sequencing the genome of each species to reveal its complete genetic heritage.

Launched in April 2009 at a three-day meeting at the University of California, Santa Cruz, the project now involves more than 68 scientists. Calling themselves the Genome 10K Community of Scientists (G10KCOS), the group outlined its proposal to create a collection of tissue and DNA specimens for the project in a paper to be published online November 5 in the Journal of Heredity.
