the human genome project main reference: nature (2001) 409, 860-921 gen155/lectures/hgpcore.ppt

The Human Genome Project

• Main reference: Nature (2001) 409, 860-921

• http://www.abdn.ac.uk/~gen155/lectures/hgpcore.ppt

• http://www.nature.com/ng/web_specials/

• Whole issue also available from Nature Genome Gateway www.nature.com/genomics/human/

• Describes the publicly funded project; Celera’s private HGP published in Science

Main points

• Basic genome statistics

• Genome browsers e.g. UCSC, Ensembl

• Genomic “landscape”

• Repeated DNA as a “fossil record”

• Number of genes

• Polymorphism

• Applications

The Strategy

• The genome sequence was a multinational collaboration involving hundreds of scientists, millions of dollars and many countries

• The strategy was “top-down” using methods developed on small genomes (e.g. yeast)

• Figure 2 in the Nature paper

Genome statistics

• Total size = 3290 Mb

• 212 Mb of heterochromatin

• Chromosomes range from 279 Mb (#1) to 45 Mb (#21) (fig 9, table 8 in paper)

• Total “raw” sequence 23,000 Mb

• Number of genes = about 31,000

• About 30% of the genome is transcribed

• About 1.5% of the genome is protein coding
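The genome statistics on this slide imply a few derived quantities that are easy to check. The sketch below does the arithmetic in Python. The variable names are illustrative, and the "coverage" figure is simply raw sequence divided by genome size, which is a rough approximation and not necessarily how the paper computes it.

```python
# Figures quoted on the slide (Mb = megabases).
GENOME_MB = 3290
HETEROCHROMATIN_MB = 212
RAW_SEQUENCE_MB = 23_000
N_GENES = 31_000
FRACTION_TRANSCRIBED = 0.30
FRACTION_CODING = 0.015

# Derived quantities (simple arithmetic, not values taken from the paper).
euchromatin_mb = GENOME_MB - HETEROCHROMATIN_MB      # genome minus heterochromatin
coverage = RAW_SEQUENCE_MB / GENOME_MB               # approximate fold coverage
transcribed_mb = GENOME_MB * FRACTION_TRANSCRIBED    # Mb that is transcribed
coding_mb = GENOME_MB * FRACTION_CODING              # Mb that is protein coding
kb_per_gene = GENOME_MB * 1000 / N_GENES             # average spacing of genes

print(f"Euchromatin:       {euchromatin_mb} Mb")
print(f"Approx. coverage:  {coverage:.1f}x")
print(f"Transcribed:       {transcribed_mb:.0f} Mb")
print(f"Protein coding:    {coding_mb:.1f} Mb")
print(f"One gene per:      {kb_per_gene:.0f} kb on average")
```

The point of the exercise is the scale: only about 49 Mb of a 3290 Mb genome codes for protein, even though roughly 987 Mb is transcribed.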

Repeat DNA “fossils”

• Genomes are full of repeated DNA sequences of various kinds (table 11/12)

• Each type of repeat has a single origin and has replicated many times within the genome, transposing to new sites and accumulating mutations

• By comparing copies of a repeat to see how much they have diverged, we can estimate how old the repeat is (fig 18)
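The dating idea can be sketched in a few lines of Python. This is a minimal illustration, not the paper's procedure: it assumes an ungapped alignment of one repeat copy against the family consensus, applies the standard Jukes-Cantor correction for multiple substitutions, and divides by a neutral substitution rate. The rate of 2.2e-9 substitutions per site per year and the two short sequences are assumptions chosen only for the example.

```python
import math

def p_distance(copy, consensus):
    """Fraction of aligned positions at which the copy differs from the consensus."""
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(copy, consensus))
    return mismatches / len(consensus)

def jukes_cantor(p):
    """Jukes-Cantor corrected divergence (substitutions per site); valid for p < 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)

def estimated_age_years(copy, consensus, rate_per_site_per_year=2.2e-9):
    """Age of a repeat copy, assuming a constant neutral rate (an assumption)."""
    return jukes_cantor(p_distance(copy, consensus)) / rate_per_site_per_year

# Hypothetical 20-base consensus and a copy with two substitutions.
consensus = "ACGTACGTACGTACGTACGT"
copy      = "ACGTACCTACGTACGAACGT"

age = estimated_age_years(copy, consensus)
print(f"p-distance: {p_distance(copy, consensus):.2f}")
print(f"Estimated age: {age / 1e6:.0f} million years")
```

Because the consensus approximates the ancestral sequence, the divergence is measured along a single lineage, so the age is the corrected distance divided by the rate with no factor of two.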

Humans versus worms and flies

• Humans have only about twice as many genes as worms or flies (table 23)

• But human genes are subject to more alternative splicing (60% vs 22%; average 3 different transcripts per gene)

• So humans probably have about 5 times as many proteins as worms or flies

• Complexity is not proportional to numbers of genes or proteins, but to the number of interactions they can have
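One way to see why interactions matter more than raw counts: the number of possible pairs grows roughly with the square of the number of components. The sketch below counts all possible protein pairs, which is an upper bound used purely for illustration and not a measure of real interactions. The worm protein count is a hypothetical round number; only the factor of 5 comes from the slide.

```python
from math import comb

def pairwise_interactions(n_proteins):
    """Number of distinct unordered pairs among n proteins (an upper bound)."""
    return comb(n_proteins, 2)

worm_proteins = 20_000              # hypothetical round number for illustration
human_proteins = 5 * worm_proteins  # "about 5 times as many proteins"

ratio = pairwise_interactions(human_proteins) / pairwise_interactions(worm_proteins)
print(f"Protein ratio:       {human_proteins / worm_proteins:.0f}x")
print(f"Possible-pair ratio: {ratio:.1f}x")
```

A fivefold increase in proteins gives about a 25-fold increase in possible pairs, so a modest difference in gene number can still leave room for a large difference in complexity.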

Index of human genes and proteins

• 3 basic methods to predict genes from the genomic DNA: (1) comparison with ESTs and mRNAs; (2) homology with other known genes/proteins; (3) purely computational methods based on Hidden Markov Models (HMMs)

• Started with predictions by Ensembl, combined with other information…..
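To give a feel for the third method, here is a toy two-state HMM decoded with the Viterbi algorithm. It is a sketch only: real gene finders model exons, introns, splice sites and codon structure, whereas this example has just a "coding" state (c) that favours G and C and a "noncoding" state (n) that favours A and T. All probabilities are invented for illustration.

```python
import math

# Toy model: every probability below is made up for illustration.
STATES = ("n", "c")  # n = noncoding, c = coding
START = {"n": 0.5, "c": 0.5}
TRANS = {"n": {"n": 0.9, "c": 0.1},
         "c": {"n": 0.1, "c": 0.9}}
EMIT = {"n": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
        "c": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}}

def viterbi(seq):
    """Return the most probable state path for seq, one state letter per base."""
    if not seq:
        return ""
    # Log-probability of the best path ending in each state at the first base.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best previous state for state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

dna = "AT" * 4 + "GC" * 6 + "AT" * 4
print(dna)
print(viterbi(dna))
```

The decoder labels the GC-rich stretch as "coding" only because the run is long enough to outweigh the cost of switching states twice, which is the same trade-off a real HMM gene finder makes with far richer models.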

The Human Proteome

• Key database is InterPro, which combines information on all known protein domains

• Only 94 of the 1262 InterPro types (7%) are vertebrate-specific - so most domains are older than the common ancestor of all animals - new ones are not “invented” very often

• Many of the vertebrate-specific domains are concerned with defence/immunity and the nervous system

• Most novelty is generated by new protein “architectures”, combining old domains in new ways (fig 42/45)
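A quick count shows why recombining existing domains is such a rich source of novelty. The sketch below treats an architecture as an ordered list of domains and counts how many are possible in principle. This is a deliberately naive upper bound, since most combinations never occur in real proteins. The three domain names are used purely as labels for the toy enumeration.

```python
from itertools import product

N_DOMAINS = 1262  # InterPro types quoted on the slide

def ordered_architectures(n_domains, length):
    """Number of ordered arrangements of `length` domains, repeats allowed."""
    return n_domains ** length

# Toy enumeration with three labels, to show what is being counted.
toy_domains = ["kinase", "SH2", "SH3"]
two_domain = list(product(toy_domains, repeat=2))

print(f"Toy two-domain architectures: {len(two_domain)}")
print(f"Two-domain architectures from {N_DOMAINS} types: "
      f"{ordered_architectures(N_DOMAINS, 2):,}")
```

Even restricted to two domains, the number of possible arrangements runs to over a million, so evolution has plenty of room to innovate without new domains.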

Genome History

• Mouse and human diverged about 100 Mya, so there is 200 My of evolution between them (100 My along each lineage)

• Chromosome translocations are involved in the formation of new species

• By comparing the genomic locations of homologous genes, we can define regions of synteny (fig 46)

• Breakage seems to occur randomly, but tends to be in gene-poor regions

• No convincing evidence for whole-genome duplications
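The synteny comparison described above can be sketched as follows. The orthologue table is entirely hypothetical (gene names, positions and chromosome assignments are invented), and the rule used here, a run of neighbouring human genes whose mouse counterparts share a chromosome, is a simplified stand-in for the analysis behind fig 46.

```python
from itertools import groupby

# Hypothetical orthologue table: (gene, human_chr, human_pos_mb, mouse_chr).
ORTHOLOGUES = [
    ("g4", "1", 40.0, "3"),
    ("g1", "1", 10.0, "4"),
    ("g2", "1", 12.5, "4"),
    ("g6", "2", 5.0, "6"),
    ("g3", "1", 15.0, "4"),
    ("g5", "1", 42.0, "3"),
]

def syntenic_blocks(orthologues, min_genes=2):
    """Group neighbouring human genes whose mouse orthologues share a chromosome."""
    # Walk along each human chromosome in positional order.
    ordered = sorted(orthologues, key=lambda row: (row[1], row[2]))
    blocks = []
    for (human_chr, mouse_chr), run in groupby(ordered, key=lambda row: (row[1], row[3])):
        genes = [row[0] for row in run]
        if len(genes) >= min_genes:  # ignore isolated single genes
            blocks.append((human_chr, mouse_chr, genes))
    return blocks

for human_chr, mouse_chr, genes in syntenic_blocks(ORTHOLOGUES):
    print(f"human chr{human_chr} ~ mouse chr{mouse_chr}: {', '.join(genes)}")
```

The boundary between the two blocks on human chromosome 1 marks a place where a chromosome break must have occurred in one of the two lineages.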

Polymorphism

• More than a million SNPs (single nucleotide polymorphisms) were found

• Average 1 SNP per 1.9 kb or 15 SNPs per gene

• Combinations of closely linked SNP alleles form haplotypes

• Not all possible haplotypes are found in the population - e.g. about 4-5 per gene (theoretically could have 2^15 = about 32,000)

• HapMap: the haplotype mapping project

• A paper (Trends in Genetics) on the subject of haplotype blocks
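The haplotype arithmetic is worth making explicit. With 15 biallelic SNPs in a gene, each SNP can take one of two alleles, so there are 2^15 possible combinations. The sketch below computes that number and then counts the distinct haplotypes in a small sample. The sample is hypothetical, with each haplotype written as a string of 0s and 1s over four SNPs.

```python
from collections import Counter

N_SNPS = 15
possible = 2 ** N_SNPS  # two alleles at each of 15 SNPs
print(f"Possible haplotypes for {N_SNPS} SNPs: {possible:,}")

# Hypothetical sample of eight chromosomes typed at four SNPs (0/1 = the two alleles).
sample = ["0000", "0000", "1100", "1100", "0011", "0000", "1111", "1100"]
counts = Counter(sample)
observed = len(counts)
possible_for_sample = 2 ** len(sample[0])

print(f"Observed {observed} of {possible_for_sample} possible haplotypes")
for haplotype, n in counts.most_common():
    print(f"  {haplotype}: {n}")
```

Seeing only a handful of haplotypes where thousands are possible is what makes haplotype mapping practical: a few well-chosen SNPs can tag a whole block.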

Applications in medicine

• Having the genome sequence, and databases of genes, makes it much easier to find disease genes by positional cloning (e.g. BRCA2 for breast cancer)

• Sequence reveals new drug targets: e.g. a new type of serotonin receptor, predicted from sequence, shown to be a candidate for treating mood disorders and schizophrenia

Latest - the Y chromosome

• Nature paper