HUMAN GENOME PROJECT 101
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
Human Genome Project
Begun in 1990, the U.S. Human Genome Project is a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. The project originally was planned to last 15 years, but effective resource and technological advances have accelerated the expected completion date to 2003.
HGP goals are to:
■ identify all the approximately 35,000* genes in human DNA,
■ determine the sequences of the 3 billion chemical base pairs that make up human DNA,
■ store this information in databases,
■ improve tools for data analysis,
■ transfer related technologies to the private sector, and
■ address the ethical, legal, and social issues (ELSI) that may arise from the project.
Human Genome Data
• Derived from the Human Genome Project
• Sequence freeze date in anticipation of data release: 22 July 2000
• Release of First Draft Sequence of Human Genome:
Nature 409 (6822), 15 February 2001
Science 291 (5507), 16 February 2001
• Release of “Complete” Draft Sequence of Human Genome: April 2003
[Figure labels: GENE, GENE, Intragenic region, exons, introns, interspersed repeats, tandem repeats]
Fine Structure of Human Genomic DNA
ACGTTGTGTCGCTGATTAGCTAGACCAAGATAGTTCGCTATAGGCTATAGCGATATAACCCAGGGGGGATATATTAGGAGGAGAGATATAGGATAGATTACATGTGATATATAGGAGAGAGAATATATAAGAGAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTAGGATTGCGCTAGATG
3.2 billion nucleotides
The Human Genome
How many genes?
>100,000 < 40,000
But think of all our traits, Jimbo!
Ours?! Are you of my species?
Get lost, punk!
Ouch!
The Human Genome
ACGTTGTGTCGCTGATTAGCTAGACCAAGATAGTTCGCTATAGGCTATAGCGATATAACCCAGGGGGGATACGCWHENISAGENEAGENETATTAGGAGGAGAGATATAGGATAGATTACATGTGATATATAGGAGAGAGAATATATAAGAGAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTAGGATTGCGC
Comparative Genomics (Alignment)
Gene Prediction
Experimental Discovery (Genetics)
Alignment
CTCGCTGACTCAATCGGATTATGCTAGTCG
GCCCCCCCCCCCCTGAGTCAGGGGGGCTCGCTGCTGTGCTG
TGACTCAATCGGATTATGCTAGTCG
ATAGCCTAATAGCTGACTCAATCGGATTATGCTAGTCG
ATTTTTTTGACTCAATCGGATTA
CGGGGTGACTCAATCGGA
AAAAATATATTGACTCAATCGGATTATGCTAGTCG
GTCGTAGCTTGACTCAATCGGATTATGCTAGTCG
TCATATGACTCAATCGGATTATGCTAGTCG
CTCGCTGACTCAATCGGATTATGCTAGTCG
GCCCCCCCCCCCCTGAGTCAGGGGGGCTCGCTGCTGTGCTG
TGACTCAATCGGATTATGCTAGTCG
ATAGCCTAATAGCTGACTCAATCGGATTATGCTAGTCG
ATTTTTTTGACTCAATCGGATTA
CGGGGTGACTCAATCGGA
AAAAATATATTGACTCAATCGGATTATGCTAGTCG
GTCGTAGCTTGACGGAATCGGATTATGCTAGTCG
TCATATGACTCAATCGGATTATGCTAGTCG
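The stacked reads above share a long run of identical bases. As a minimal sketch of how an alignment can be anchored on such a run, the Python below finds the longest common substring of two reads and pads them so that the shared run lines up. The function names `longest_common_substring` and `show_aligned` are illustrative assumptions, not part of the original slides, and real aligners also handle gaps and mismatches, which this sketch does not.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest run of bases shared by a and b (first one found on ties)."""
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def show_aligned(a: str, b: str) -> list[str]:
    """Pad both reads with leading spaces so the shared run starts in the same column."""
    core = longest_common_substring(a, b)
    pos_a, pos_b = a.find(core), b.find(core)
    shift = max(pos_a, pos_b)
    return [" " * (shift - pos_a) + a, " " * (shift - pos_b) + b]


# Two reads copied from the alignment slide.
read_1 = "CTCGCTGACTCAATCGGATTATGCTAGTCG"
read_2 = "ATTTTTTTGACTCAATCGGATTA"

for line in show_aligned(read_1, read_2):
    print(line)
```

An ungapped anchor like this is only a starting point; it would miss the single-base difference that appears in one read of the later frame.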
Gene Prediction
TTCGCTATAGGCTATAGCGATATAACCCAGGGGGGATACGCTATTAGGAGGAGAGAATATAAAGGATAGATTACATGTGATATATGGAGAGAGAATATATAAGAGAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTATGGATTGCGCTTCGCTATAGGCTATAGCGATATAACCCAGGGGGGATACGCTATTAGGAGGAGAGATATAGGATAGATTACATGTGATATATAGGAGAGAGAATATATAAGAGAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTAGGATTGCGCTTCGCTATAGGCTATGCGATATAACCCAGGGGGGATACGCTATTAGGAGGAGAGATATAGGATAGATTACATGTGATATATAGGAGAGAGAATATATAAGAGAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTAGGATTGCGCTTCGCTATAGGCTATAGCGATATGACCCAGGGGGGATACGCTATTAGGAGGAGAGATATAGGATAGATTACATGTGATATATAGGAGAGAGAATATATAAGAGAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTAGGATTGCGCTTCGCTATAGGCTATAGCGATATAACCCAGGGGGGATATGATATTAGGAGGAGAGATATAGGATAGATTACATGTGATATATAGGAGAGAGAAATAATATAAGAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTAGGATTGCGC
[Figure labels: GENE, GENE, Intragenic region, exons, introns, interspersed repeats, tandem repeats]
Gene Prediction Algorithms
based on consensus nucleotide sequences of
• TATA boxes and start codons
• stop codons
• splice junctions
• CpG islands
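Three of the signals listed above can be scanned for with a few lines of Python. This is a hedged sketch, not the algorithm used by any particular gene finder: the function names are hypothetical, the TATA pattern `TATA[AT]A[AT]` is a common textbook consensus assumed here, only the forward strand is scanned, and splice junctions are left out.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumed textbook consensus for a TATA box; real promoters vary.
TATA_BOX = re.compile(r"TATA[AT]A[AT]")


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each ATG...stop run in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this frame opens the ORF
            elif start is not None and codon in STOP_CODONS:
                # min_codons counts the start and stop codons as well
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return orfs


def find_tata_boxes(seq: str) -> list[int]:
    """Return the start index of every non-overlapping TATA-box match."""
    return [m.start() for m in TATA_BOX.finditer(seq.upper())]


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


print(find_orfs("CCATGAAATTTTAGGG"))
print(find_tata_boxes("GGTATAAAAGG"))
print(cpg_obs_exp("CGCGCGCG"))
```

A commonly cited rule of thumb treats a stretch of at least 200 bases with GC content above 50% and an observed/expected ratio above 0.6 as a candidate CpG island.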
Comparative Gross Results from Model Genome Projects
Humans have about 35,000 genes!
You were right.
So what’s new!
Human Genes
Surprising Findings = !!
• !! Only 35,000 genes
• most genes in euchromatin
• GC/AT patchiness
• !! Gene density higher & intron size smaller in GC-rich patches
• !! 1.4% translated, 28% transcribed
• !! Origins of genes
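GC/AT patchiness can be made concrete by measuring GC content in windows along a sequence. The sketch below is illustrative only: `gc_fraction` and `gc_windows` are hypothetical helper names, and the window and step sizes are arbitrary choices rather than values from the slides.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (window start, GC fraction) for each full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# A toy sequence with a GC-rich patch followed by an AT-rich patch.
for start, gc in gc_windows("GGCCGCGCAATTATAT", window=8, step=8):
    print(start, gc)
```

On genome-scale data the same calculation, run with windows of many kilobases, shows the alternating GC-rich and AT-rich patches the slide refers to.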
Some Origins of Human Genes
• Most from distant evolutionary past (basic metabolism, transcription, translation, replication fixed since appearance of bacteria and yeast)
• Only 94/1278 families vertebrate-specific
• 740 are nonprotein-encoding RNA genes
• many derive from partial genomes of viruses and virus-like elements (genomic fossils)
• some acquired directly from bacteria (rather than by evolution from bacteria)
Genomic Fossils (also known as Molecular Fossils)
• interspersed repeats
• generated by integration of transposable elements or retrotransposable RNAs
• active contemporary modifier of some vertebrate genomes (mouse)
• formerly active modifier of human genome
• some as prevalent as 1.5 million copies
Alu Elements
Type of Short Interspersed Nuclear Element (SINE)
• transcribed by RNA polymerase III
• 3’ oligo dA-rich tail
• found only in primates
• 1,500,000 copies
• derived from 7SL RNA gene
• dimer-like structure
• most retroposition occurred 40 mya
[Figure: Alu element structure, drawn 5’ to 3’. Labels: direct repeats, RNA polymerase III promoter (A, B), A-rich region, A/T-rich region and 3’-UTR, AAAn, 31 bp, 50-300 bp]
Reverse Transcription
Essential for Retroposition and Proliferation of Retroelements
• Converts primed RNAs into cDNAs
• catalyzed by RNA-dependent DNA pol (reverse transcriptase)
• pol encoded by retroviruses and active LINEs
Retroviral genomic RNA
Alu RNA
LINE RNA
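At the sequence level, converting an RNA into its first-strand cDNA amounts to taking the reverse complement and writing it in DNA bases. The short sketch below illustrates only that bookkeeping. The name `first_strand_cdna` is a hypothetical helper, and the sketch ignores priming, template switching and second-strand synthesis.

```python
# Base pairing used when an RNA template is copied into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA (written 5' to 3') for an RNA written 5' to 3'."""
    rna = rna.upper()
    # Read the template from its 3' end so the product comes out 5' to 3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


# An RNA ending in an A-rich tail gives a cDNA that begins with a run of T.
print(first_strand_cdna("GGCAAAA"))
```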
Alu Subfamily Structure (millions of years)
Oldest [J]: Jo, Jb (65)
Intermediate [S]: S (50), Sq, Sp, Sx, Sc, Sg
Youngest [Y]: Y (25), Yb8, Ya5, Ya8
450,000 copies
50,000 copies
Alu Elements as Genomic Fossils
Alu Subfamily Structure
• PS [J]: Primate-specific. Abundant in all primates. 65-70 mya: early prosimian (strepsirhini)
• AS [S]: Anthropoid-specific (haplorhini). 50-60 mya. One mutation different from PS.
• CS [S]: Catarrhine-specific. Nine mutations arising 30-40 mya: Platyrrhines (FN) (marmoset), Catarrhine (DFN) (macaque)
• HS [Y]: Human-specific. Five or more additional mutations. 20-25 mya: almost exclusively hominids
Alu Subfamily Structure
Master Gene Model of Retroposition
P. Deininger, M. Batzer, Trends in Genetics 8:307, 1992
[Figure: copy number versus time (m.y.). Steps: 1. Amplification, 2. Master mutation]
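The two steps named in the figure, amplification and master mutation, can be sketched as a toy simulation. The assumption is that one master element keeps producing copies, and each copy inherits whatever diagnostic mutations the master carries at that moment. The function name and all parameter values below are hypothetical and chosen for illustration, not taken from the cited paper.

```python
from collections import Counter


def simulate_master_gene(steps: int, copies_per_step: int, mutate_every: int) -> Counter:
    """Count copies per subfamily, keyed by the number of diagnostic mutations inherited."""
    master_mutations = 0
    subfamilies = Counter()
    for t in range(steps):
        # Step 2 in the figure: the master occasionally gains a diagnostic mutation.
        if t > 0 and t % mutate_every == 0:
            master_mutations += 1
        # Step 1 in the figure: amplification. New copies carry the master's current mutations.
        subfamilies[master_mutations] += copies_per_step
    return subfamilies


result = simulate_master_gene(steps=6, copies_per_step=10, mutate_every=2)
for mutations, copies in sorted(result.items()):
    print(f"{mutations} diagnostic mutations: {copies} copies")
```

In this picture, older subfamilies carry fewer diagnostic mutations and younger ones carry more, which matches the ordering of the J, S and Y subfamilies by age.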
Alus as Genomic Fossils
ALU INSERTIONS AND DISEASE
LOCUS | DISTRIBUTION | SUBFAMILY | DISEASE | REFERENCE
BRCA2 | de novo | Y | Breast cancer | Miki et al, 1996
Mlvi-2 | de novo (somatic?) | Ya5 | Associated with leukemia | Economou-Pachnis and Tsichlis, 1985
NF1 | de novo | Ya5 | Neurofibromatosis | Wallace et al, 1991
APC | Familial | Yb8 | Hereditary desmoid disease | Halling et al, 1997
PROGINS | about 50% | Ya5 | Linked with ovarian carcinoma | Rowe et al, 1995
Btk | Familial | Y | X-linked agammaglobulinaemia | Lester et al, 1997
IL2RG | Familial | Ya5 | XSCID | Lester et al, 1997
Cholinesterase | one Japanese family | Yb8 | Cholinesterase deficiency | Muratani et al, 1991
CaR | familial | Ya4 | Hypocalciuric hypercalcemia and neonatal severe hyperparathyroidism | Janicic et al, 1995
C1 inhibitor | de novo | Y | Complement deficiency | Stoppa Lyonnet et al, 1990
ACE | about 50% | Ya5 | Linked with protection from heart disease | Cambien et al, 1992
Factor IX | a grandparent | Ya5 | Hemophilia | Vidaud et al, 1993
2 x FGFR2 | De novo | Ya5 | Apert’s Syndrome | Oldridge et al, 1997
GK | ? | Sx | Glycerol kinase deficiency | McCabe et al, (personal comm.)
What’s New About Old Fossils?
In the Human Genome
• Comprise nearly 50% of genome
• 50% more Alu elements than were predicted by molecular biology
• scarce in highly-regulated regions (detrimental?)
• enriched in GC regions (beneficial?)
• little activity, but little scouring
• occur frequently within exons
• contribute to formation of genes encoding novel proteins
3.2 billion bases
28% transcribed
<1.4% encodes protein
50% repeats
Only ~35,000 genes!
FEATURES
The Human Genome
not many modern protein families
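The percentages on this slide are easier to weigh as absolute numbers of bases. The arithmetic below is a small worked example using only the figures quoted on the slide (3.2 billion bases, 28%, 1.4%, 50%, about 35,000 genes). The variable names and the helper `share` are illustrative, and integer arithmetic is used to avoid rounding noise.

```python
GENOME_BASES = 3_200_000_000  # 3.2 billion bases, from the slide
GENE_COUNT = 35_000           # approximate gene count, from the slide


def share(total: int, per_mille: int) -> int:
    """Return per_mille thousandths of total, using integer arithmetic."""
    return total * per_mille // 1000


transcribed = share(GENOME_BASES, 280)   # 28% transcribed
coding_upper = share(GENOME_BASES, 14)   # upper bound: less than 1.4% encodes protein
repeats = share(GENOME_BASES, 500)       # 50% repeats
bases_per_gene = GENOME_BASES // GENE_COUNT

print(f"transcribed:          {transcribed:,} bases")
print(f"protein-coding (max): {coding_upper:,} bases")
print(f"repeats:              {repeats:,} bases")
print(f"average spacing:      one gene per {bases_per_gene:,} bases")
```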
Humans have about 35,000 genes!
Well, then… How can you explain human complexity?