the human genome project main reference: nature (2001) 409, 860-921 gen155/lectures/hgpcore.ppt

Post on 28-Mar-2015


The Human Genome Project

• Main reference: Nature (2001) 409, 860-921

• http://www.abdn.ac.uk/~gen155/lectures/hgpcore.ppt

• http://www.nature.com/ng/web_specials/

• The whole issue is also available from the Nature Genome Gateway www.nature.com/genomics/human/

• Describes the publicly funded project; Celera’s privately funded genome sequence was published in Science

Main points

• Basic genome statistics

• Genome browsers e.g. UCSC, Ensembl

• Genomic “landscape”

• Repeated DNA as a “fossil record”

• Number of genes

• Polymorphism

• Applications

The Strategy

• Sequencing the genome was a multinational collaboration involving hundreds of scientists, millions of dollars and many countries

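• The strategy was “top-down” (hierarchical shotgun): large cloned fragments were mapped first and each was then shotgun-sequenced, using methods developed on small genomes (e.g. yeast)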

• Figure 2 in the Nature paper

Genome statistics

• Total size = 3290 Mb

• 212 Mb of heterochromatin

• Chromosomes range from 279 Mb (#1) to 45 Mb (#21) (fig 9, table 8 in paper)

• Total “raw” sequence 23,000 Mb

• Number of genes = about 31,000

• About 30% of the genome is transcribed

• About 1.5% of the genome is protein coding
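The slide's figures can be cross-checked with a little arithmetic. The Python sketch below uses only the numbers quoted above; reading "raw sequence ÷ genome size" as the average depth of coverage is our interpretation rather than a figure quoted from the paper, and the function name `derived_stats` is hypothetical.

```python
# Figures copied from the "Genome statistics" slide (Mb = megabases).
TOTAL_MB = 3290
HETEROCHROMATIN_MB = 212
RAW_SEQUENCE_MB = 23_000
GENES = 31_000
TRANSCRIBED_FRACTION = 0.30
CODING_FRACTION = 0.015


def derived_stats() -> dict[str, float]:
    """Quantities implied by the slide's numbers, rounded for display."""
    coding_mb = TOTAL_MB * CODING_FRACTION
    return {
        # Raw sequence / genome size = average number of times each base was sequenced.
        "coverage_fold": round(RAW_SEQUENCE_MB / TOTAL_MB, 1),
        "heterochromatin_pct": round(100 * HETEROCHROMATIN_MB / TOTAL_MB, 1),
        "transcribed_mb": round(TOTAL_MB * TRANSCRIBED_FRACTION),
        "coding_mb": round(coding_mb),
        # Average protein-coding sequence per gene, in base pairs.
        "coding_bp_per_gene": round(coding_mb * 1_000_000 / GENES),
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value}")
```

Run as a script, it reports roughly 7-fold coverage, about 6.4% heterochromatin, about 987 Mb transcribed, about 49 Mb of protein-coding sequence, and therefore only about 1.6 kb of coding sequence per gene on average.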

Repeat DNA “fossils”

• Genomes are full of repeated DNA sequences of various kinds (table 11/12)

• Each type of repeat has a single origin and has replicated many times within the genome, transposing to new sites and accumulating mutations

• By comparing copies of a repeat to see how much they have diverged, we can estimate how old the repeat is (fig 18)
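One standard way to turn "how much have the copies diverged" into an age is sketched below: measure a copy's divergence from the family consensus (treated as the ancestral sequence), correct it for multiple substitutions at the same site, and divide by a neutral substitution rate. The Jukes-Cantor correction is used here as a simple, well-known choice; the paper's own method may differ, and the rate constant and sequences are illustrative assumptions, not values from the slide.

```python
import math

# ASSUMPTION: illustrative neutral substitution rate (substitutions per site per year).
RATE_PER_SITE_PER_YEAR = 2.2e-9


def p_distance(copy: str, consensus: str) -> float:
    """Fraction of aligned, ungapped sites at which a repeat copy differs from the consensus."""
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Correct observed divergence p for repeated hits at the same site."""
    if p >= 0.75:
        raise ValueError("divergence is saturated; cannot correct")
    return -0.75 * math.log(1 - 4 * p / 3)


def estimated_age_years(copy: str, consensus: str,
                        rate: float = RATE_PER_SITE_PER_YEAR) -> float:
    """Time since insertion. Divergence from the consensus accumulates along one
    lineage only, so age = corrected distance / rate."""
    return jukes_cantor(p_distance(copy, consensus)) / rate


if __name__ == "__main__":
    consensus = "ACGTACGTAC"
    copy = "ACGTTCGTAC"  # one difference in ten sites (hypothetical)
    print(f"{estimated_age_years(copy, consensus) / 1e6:.0f} My")
```

With one difference in ten sites and the assumed rate, the copy comes out at roughly 49 My old. Older repeat families show larger divergence, which is what lets a plot of divergence act as a "fossil record".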

Humans versus worms and flies

• Humans have only about twice as many genes as worms or flies (table 23)

• But human genes are subject to more alternative splicing (60% vs 22%; average 3 different transcripts per gene)

• So humans probably have about 5 times as many proteins as worms or flies

• Complexity is not proportional to the number of genes or proteins, but to the number of interactions they can have
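Two pieces of arithmetic sit behind these bullets. If proteins ≈ genes × distinct products per gene, then "twice the genes, five times the proteins" implies that each human gene yields about 2.5 times as many products as a worm or fly gene. And the number of possible pairwise interactions among n components is n(n-1)/2, which grows with the square of n. The sketch below only illustrates that reasoning; the function names are hypothetical and the component counts in the demo are arbitrary.

```python
def implied_products_per_gene_ratio(protein_ratio: float, gene_ratio: float) -> float:
    """If proteins = genes x products per gene, the per-gene ratio is protein_ratio / gene_ratio."""
    return protein_ratio / gene_ratio


def possible_pairs(n: int) -> int:
    """Number of distinct pairwise interactions possible among n components (n choose 2)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Slide figures: about 2x the genes and about 5x the proteins of worms or flies.
    print(implied_products_per_gene_ratio(5, 2))  # 2.5
    # Arbitrary sizes: a set 5x larger has roughly 25x as many possible pairs.
    print(possible_pairs(500) / possible_pairs(100))
```

Pairwise interactions are only the simplest case; the point is that a modest increase in parts gives a disproportionate increase in possible combinations.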

Index of human genes and proteins

• 3 basic methods to predict genes from the genomic DNA:
  - Comparison with ESTs, mRNAs
  - Homology with other known genes/proteins
  - Purely computational methods based on Hidden Markov Models (HMMs)

• The gene index started with predictions by Ensembl, combined with other information…
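The purely computational method can be illustrated with a deliberately tiny hidden Markov model. The sketch below assumes just two hidden states, coding ("C") and non-coding ("N"), with made-up start, transition and emission probabilities in which "coding" DNA is GC-rich; the Viterbi algorithm then finds the single most probable labelling of every base. Real gene-finding HMMs model exons, introns, splice sites and codon position with many more states, so treat this as an illustration of the idea, not a gene predictor.

```python
import math

# HYPOTHETICAL two-state model: "N" = non-coding, "C" = coding.
# Probabilities are invented for illustration only.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.95, "C": 0.05},
    "C": {"N": 0.05, "C": 0.95},
}
EMIT = {
    "N": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},  # assumed AT-rich background
    "C": {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3},  # assumed GC-rich "coding"
}


def viterbi(seq: str) -> str:
    """Most probable state label for each base (A/C/G/T only), computed in log space."""
    seq = seq.upper()
    if not seq:
        return ""
    # score[s]: best log-probability of any state path ending in s at the current base.
    score = {s: math.log(START[s] * EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # Best previous state to have come from, given we are now in s.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s] * EMIT[s][base])
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("A" * 10 + "G" * 20 + "A" * 10))
```

For the demo sequence (10 A, 20 G, 10 A) it labels exactly the G run as coding: the per-base emission advantage of the coding state over 20 bases outweighs the cost of the two state switches, while extending the coding label into the A runs would only lower the score.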

The Human Proteome

• The key database is InterPro, which combines information on all known protein domains

• Only 94 of the 1262 InterPro types (7%) are vertebrate-specific, so most domains are older than the common ancestor of all animals: new ones are not “invented” very often

• Many of the vertebrate-specific domains are concerned with defence/immunity and the nervous system

• Most novelty is generated by new protein “architectures”, combining old domains in new ways (fig 42/45)
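The idea that novelty comes mainly from new combinations of old domains can be made concrete by treating an architecture as an ordered tuple of domain names. The sketch below sorts a set of "derived" architectures into those already present in an ancestral set, those that only recombine pre-existing domains, and those containing a domain never seen before. The function name and the domain labels (D1, D2, ...) are hypothetical placeholders, not InterPro identifiers.

```python
# An architecture is the ordered list of domains along a protein, e.g. ("D1", "D4").
Architecture = tuple[str, ...]


def classify_architectures(
    ancestral: set[Architecture], derived: set[Architecture]
) -> dict[str, set[Architecture]]:
    """Sort derived architectures by how they relate to an ancestral repertoire."""
    old_domains = {domain for arch in ancestral for domain in arch}
    result: dict[str, set[Architecture]] = {
        "unchanged": set(),        # architecture already present in the ancestor
        "new_combination": set(),  # every domain is old, but the arrangement is new
        "new_domain": set(),       # contains at least one domain the ancestor lacked
    }
    for arch in derived:
        if arch in ancestral:
            result["unchanged"].add(arch)
        elif all(domain in old_domains for domain in arch):
            result["new_combination"].add(arch)
        else:
            result["new_domain"].add(arch)
    return result


if __name__ == "__main__":
    # Hypothetical repertoires; D1..D5 are placeholder domain names.
    ancestral = {("D1",), ("D2", "D3"), ("D4",)}
    derived = {("D1",), ("D1", "D4"), ("D3", "D2", "D1"), ("D5", "D2")}
    for category, archs in classify_architectures(ancestral, derived).items():
        print(category, sorted(archs))
```

In this toy example two of the three novel architectures are new arrangements of old domains, and only one needed a genuinely new domain, mirroring the pattern the slide describes.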

Genome History

• Mouse and human diverged about 100 Mya, so there are about 200 My of evolution between them (about 100 My along each lineage)

• Chromosome translocations are involved in the formation of new species

• By comparing the genomic locations of homologous genes, we can define regions of synteny (fig 46)

• Chromosome breakage seems to occur randomly, but breakpoints tend to lie in gene-poor regions

• No convincing evidence for whole-genome duplications
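Defining regions of synteny can be sketched as follows: order homologous human-mouse gene pairs along each human chromosome, then cut the list into runs whose mouse partners all lie on one mouse chromosome. This is a simplification assumed for illustration; it ignores gene order and orientation within the mouse chromosome and any minimum block size, which real analyses take into account. The `Ortholog` record, function name, genes and coordinates are all hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Ortholog:
    """A human-mouse homologous gene pair and its position in each genome."""
    gene: str
    human_chr: str
    human_pos: int
    mouse_chr: str
    mouse_pos: int  # kept for reference; this simple sketch does not use it


def synteny_blocks(pairs: list[Ortholog]) -> list[list[Ortholog]]:
    """Group orthologs into runs that are consecutive along a human chromosome
    and all lie on the same mouse chromosome."""
    ordered = sorted(pairs, key=lambda o: (o.human_chr, o.human_pos))
    # groupby only merges *adjacent* items with equal keys, so a change of
    # human or mouse chromosome starts a new block.
    return [list(run) for _, run in groupby(ordered, key=lambda o: (o.human_chr, o.mouse_chr))]


if __name__ == "__main__":
    # Hypothetical genes and coordinates.
    pairs = [
        Ortholog("g1", "1", 100, "4", 50),
        Ortholog("g2", "1", 200, "4", 60),
        Ortholog("g3", "1", 300, "3", 10),
        Ortholog("g4", "1", 400, "3", 20),
        Ortholog("g5", "2", 100, "4", 70),
    ]
    for block in synteny_blocks(pairs):
        print([o.gene for o in block])
```

Here the switch from mouse chromosome 4 to 3 partway along human chromosome 1 marks a breakpoint, giving three blocks: g1-g2, g3-g4 and g5.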

Polymorphism

• More than a million SNPs (single nucleotide polymorphisms) were found

• Average of 1 SNP per 1.9 kb, or about 15 SNPs per gene

• Combinations of closely linked SNP alleles form haplotypes

• Not all possible haplotypes are found in the population: e.g. about 4-5 per gene (theoretically there could be 2^15, about 32,000)

• HapMap – the haplotype mapping project

• A paper (Trends in Genetics) on the subject of haplotype blocks
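The gap between possible and observed haplotypes is easy to compute. With n biallelic SNPs there are 2^n possible haplotypes (2^15 = 32,768 for 15 SNPs), yet counting the distinct allele strings in a real sample gives only a handful. The sketch assumes haplotypes are already phased and encoded as strings of 0/1 alleles; the function names and the sample data are hypothetical.

```python
from collections import Counter


def possible_haplotypes(n_snps: int) -> int:
    """Each biallelic SNP doubles the number of possible allele combinations."""
    return 2 ** n_snps


def observed_haplotypes(chromosomes: list[str]) -> Counter:
    """Count distinct haplotypes among phased chromosomes, one allele string each."""
    if len({len(h) for h in chromosomes}) > 1:
        raise ValueError("all haplotypes must cover the same SNPs")
    return Counter(chromosomes)


if __name__ == "__main__":
    print(possible_haplotypes(15))  # 32768
    # Hypothetical phased sample: 5 SNPs, alleles coded 0/1.
    sample = ["01101", "01101", "10010", "01101", "10010", "11111"]
    counts = observed_haplotypes(sample)
    print(f"{len(counts)} of {possible_haplotypes(5)} possible haplotypes observed")
```

The demo prints 32768 and then reports 3 of 32 possible haplotypes observed. Because closely linked alleles are inherited together, most combinations never arise, which is what makes haplotype blocks useful.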

Applications in medicine

• Having the genome sequence and databases of genes makes it much easier to find disease genes by positional cloning (e.g. BRCA2 for breast cancer)

• Sequence reveals new drug targets: e.g. a new type of serotonin receptor, predicted from sequence, shown to be a candidate for treating mood disorders and schizophrenia

Latest - the Y chromosome

• Nature paper
