the human genome project main reference: nature (2001) 409, 860-921 gen155/lectures/hgpcore.ppt
Post on 28-Mar-2015
The Human Genome Project
• Main reference: Nature (2001) 409, 860-921
• http://www.abdn.ac.uk/~gen155/lectures/hgpcore.ppt
• http://www.nature.com/ng/web_specials/
• Whole issue also available from Nature Genome Gateway www.nature.com/genomics/human/
• Describes the publicly funded project; Celera’s private HGP published in Science
Main points
• Basic genome statistics
• Genome browsers e.g. UCSC, Ensembl
• Genomic “landscape”
• Repeated DNA as a “fossil record”
• Number of genes
• Polymorphism
• Applications
The Strategy
• Sequencing the genome was a multinational collaboration involving hundreds of scientists, millions of dollars and many countries
• The strategy was “top-down” using methods developed on small genomes (e.g. yeast)
• Figure 2 in the Nature paper
Genome statistics
• Total size = 3290 Mb
• 212 Mb of heterochromatin
• Chromosomes range from 279 Mb (#1) to 45 Mb (#21) (fig 9, table 8 in paper)
• Total “raw” sequence 23,000 Mb
• Number of genes = about 31,000
• About 30% of the genome is transcribed
• About 1.5% of the genome is protein coding
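A few derived figures follow from the numbers on this slide by simple arithmetic. The sketch below is only an illustration: the constant and function names are invented here, and the "fold coverage" is taken relative to the total genome size, which is an assumption about how the raw-sequence figure should be read.

```python
# Figures copied from the slide; everything computed below is plain arithmetic.
GENOME_MB = 3290             # total genome size (Mb)
HETEROCHROMATIN_MB = 212     # heterochromatin (Mb)
RAW_SEQUENCE_MB = 23000      # total "raw" sequence (Mb)
GENES = 31000                # approximate number of genes
TRANSCRIBED_FRACTION = 0.30  # about 30% of the genome is transcribed
CODING_FRACTION = 0.015      # about 1.5% of the genome is protein coding


def genome_summary():
    """Return quantities derived from the slide's headline statistics."""
    coding_mb = GENOME_MB * CODING_FRACTION
    return {
        # Genome left after subtracting heterochromatin.
        "euchromatin_mb": GENOME_MB - HETEROCHROMATIN_MB,
        # Assumption: raw sequence divided by total genome size.
        "raw_coverage_fold": RAW_SEQUENCE_MB / GENOME_MB,
        "transcribed_mb": GENOME_MB * TRANSCRIBED_FRACTION,
        "coding_mb": coding_mb,
        # Average coding sequence per gene, in kb.
        "mean_coding_kb_per_gene": coding_mb * 1000 / GENES,
    }


if __name__ == "__main__":
    for name, value in genome_summary().items():
        print(f"{name}: {value:.2f}")
```

On these numbers the raw sequence amounts to roughly seven times the genome size, and the protein-coding portion to about 50 Mb.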
Repeat DNA “fossils”
• Genomes are full of repeated DNA sequences of various kinds (table 11/12)
• Each type of repeat has a single origin and has replicated many times within the genome, transposing to new sites and accumulating mutations
• By comparing copies of a repeat to see how much they have diverged, one can estimate how old the repeat is (fig 18)
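The dating idea can be sketched in code. This is a minimal illustration, not the method used in the paper: it assumes the copy is already aligned to a consensus sequence that stands in for the ancestral repeat, applies the standard Jukes-Cantor correction for multiple substitutions, and needs a substitution rate that the slides do not give. The function names and the rate in the usage example are hypothetical.

```python
import math


def p_distance(copy, consensus):
    """Fraction of aligned sites that differ; columns with a gap ('-') are skipped."""
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (substitutions per site) from observed p."""
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def estimate_age_years(copy, consensus, rate_per_site_per_year):
    """Age estimate = corrected divergence from the consensus / substitution rate.

    The consensus approximates the ancestral sequence, so only one lineage
    (ancestor -> present-day copy) is counted.
    """
    return jukes_cantor(p_distance(copy, consensus)) / rate_per_site_per_year


if __name__ == "__main__":
    consensus = "ACGTACGTAC"
    copy = "ACGTACGTAA"  # one difference in ten sites
    # The rate below is a placeholder chosen for illustration only.
    print(estimate_age_years(copy, consensus, rate_per_site_per_year=2e-9))
```

Older repeat families show higher corrected divergence, which is why the divergence distribution can be read as a rough timeline.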
Humans versus worms and flies
• Humans have only about twice as many genes as worms or flies (table 23)
• But human genes are subject to more alternative splicing (about 60% of human genes vs 22% in the worm; average 3 different transcripts per human gene)
• So humans probably have about 5 times as many proteins as worms or flies
• Complexity is not proportional to numbers of genes or proteins, but to the number of interactions they can have
Index of human genes and proteins
• 3 basic methods to predict genes from the genomic DNA: (1) comparison with ESTs and mRNAs; (2) homology with other known genes/proteins; (3) purely computational methods based on Hidden Markov Models (HMMs)
• Started with predictions by Ensembl, combined with other information…..
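To give a feel for the third method, here is a toy two-state HMM decoded with the Viterbi algorithm. It is a teaching sketch only: real gene finders model exons, introns, splice sites and codon structure, and every probability below is a made-up value, not a parameter from Ensembl or from the paper.

```python
import math

# Hypothetical toy model: "coding" DNA is assumed to be slightly GC-rich.
STATES = ("noncoding", "coding")
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def viterbi(seq):
    """Return the most probable state path for a DNA string (A/C/G/T only)."""
    seq = seq.upper()
    if not seq:
        return []
    # Log-probability of the best path ending in each state at position 0.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    backpointers = []
    for base in seq[1:]:
        prev = scores[-1]
        column = {}
        pointers = {}
        for s in STATES:
            # Best previous state from which to enter state s.
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            column[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][base])
            pointers[s] = best
        scores.append(column)
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    sequence = "ATATATATGCGCGCGCGC"
    for base, state in zip(sequence, viterbi(sequence)):
        print(base, state)
```

The high self-transition probabilities make the model prefer long runs of one state, so it changes label only when the base composition shifts for several positions in a row.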
The Human Proteome
• Key database is InterPro, which combines information on all known protein domains
• Only 94 of the 1262 InterPro types (7%) are vertebrate-specific, so most domains are older than the common ancestor of all animals; new ones are not “invented” very often
• Many of these are concerned with defence/immunity and the nervous system
• Most novelty is generated by new protein “architectures”, combining old domains in new ways (fig 42/45)
Genome History
• Mouse and human diverged about 100Mya, so there is 200My of evolution between them
• Chromosome translocations are involved in the formation of new species
• By comparing the genomic locations of homologous genes, one can define regions of synteny (fig 46)
• Breakage seems to occur randomly, but tends to be in gene-poor regions
• No convincing evidence for whole-genome duplications
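The synteny comparison can be sketched as follows. This is a simplified illustration under stated assumptions: the input format, the function name and the example data are all hypothetical, and a "block" here just means a run of neighbouring human genes whose mouse homologues lie on the same mouse chromosome, ignoring gene order and orientation on the mouse side.

```python
from itertools import groupby


def syntenic_blocks(ortholog_pairs, min_genes=2):
    """Group neighbouring human genes by the mouse chromosome of their homologue.

    ortholog_pairs: iterable of (human_chrom, human_pos, mouse_chrom) tuples.
    Returns a list of (human_chrom, mouse_chrom, start_pos, end_pos, n_genes).
    """
    # Walk along each human chromosome in positional order.
    ordered = sorted(ortholog_pairs, key=lambda g: (g[0], g[1]))
    blocks = []
    # groupby collects consecutive genes sharing the same chromosome pair,
    # so a change of mouse chromosome starts a new block.
    for (h_chrom, m_chrom), run in groupby(ordered, key=lambda g: (g[0], g[2])):
        run = list(run)
        if len(run) >= min_genes:
            blocks.append((h_chrom, m_chrom, run[0][1], run[-1][1], len(run)))
    return blocks


if __name__ == "__main__":
    # Invented positions and chromosome assignments, for illustration only.
    pairs = [
        ("1", 100, "4"), ("1", 200, "4"),
        ("1", 300, "3"), ("1", 400, "3"), ("1", 500, "3"),
        ("2", 50, "4"),
    ]
    for block in syntenic_blocks(pairs):
        print(block)
```

The boundaries between consecutive blocks mark candidate chromosome breakpoints, which is where the observation about gene-poor regions comes in.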
Polymorphism
• More than a million SNPs (single nucleotide polymorphisms) were found
• Average 1 SNP per 1.9 kb or 15 SNPs per gene
• Combinations of closely linked SNP alleles form haplotypes
• Not all possible haplotypes are found in the population, e.g. about 4-5 per gene (theoretically there could be 2^15 = 32,768)
• HapMap – the haplotype mapping project
• A paper (Trends in Genetics) on the subject of haplotype blocks
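The haplotype arithmetic is easy to check: with two alleles at each of 15 SNPs there are 2^15 possible combinations, far more than the handful actually seen per gene. The sketch below counts both; the function names and the sample of chromosomes are invented for illustration and are not real data.

```python
from collections import Counter


def possible_haplotypes(n_snps):
    """Number of allele combinations for n biallelic SNPs."""
    return 2 ** n_snps


def observed_haplotypes(chromosomes):
    """Count how often each distinct haplotype string occurs in a sample."""
    return Counter(chromosomes)


if __name__ == "__main__":
    print(possible_haplotypes(15))  # 15 SNPs per gene, as on the slide

    # Hypothetical sample: each string is the alleles at 3 linked SNPs
    # on one chromosome.
    sample = ["AAT", "AAT", "GCT", "GCA", "AAT", "GCT"]
    counts = observed_haplotypes(sample)
    print(len(counts), "distinct haplotypes out of", possible_haplotypes(3), "possible")
    print(counts.most_common())
```

Because linked alleles are inherited together, typing a few SNPs is often enough to identify the whole haplotype, which is the practical motivation for mapping haplotype blocks.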
Applications in medicine
• Having the genome sequence, and databases of genes, makes it much easier to find disease genes by positional cloning (e.g. BRCA2 for breast cancer)
• Sequence reveals new drug targets: e.g. a new type of serotonin receptor, predicted from sequence, shown to be a candidate for treating mood disorders and schizophrenia
Latest - the Y chromosome
• Nature paper