Human Genome Structure and Organization
Bert Gold, Ph.D., F.A.C.M.G.
Genetic Variation
Phenotype: Expression of the genotype (modified by the environment).
The structural or functional nature of an individual. Includes:
• appearance, physical features, organ structure
• biochemical, physiologic nature
Genotype: Genetic status, the alleles an individual carries.
Learning Objectives
Recap and Update Public and Private Human Genome Project Status
Provide Reminders of Necessary Background for Genetic Disease Association and Linkage Studies
Definitions
• Penetrance - The probability that an individual who is ‘at-risk’ for the disorder (i.e., carries the gene) develops (expresses) the condition. May be age dependent.
• Expression - The characteristics of a trait or disease that are outwardly expressed. E.g., myotonic dystrophy: myotonia, cataracts, narcolepsy, frontal balding, infertility.
• Ascertainment – The method used in gathering genetic data. Study conclusions differ depending on how affected individuals entered the study.
• Phenocopy – An individual whose phenotype, produced by non-genetic (environmental) agents, resembles the phenotype normally caused by a specific genotype.
• Pleiotropy - The ability of an allele to produce more than one effect, i.e., to manifest its expression in the structure and/or function of more than one organ system or tissue.
• Recurrence Risk – Likelihood that a relative of a proband for a rare disease will have the same disease.
Penetrance and Expressivity
• Penetrance: Proportion of individuals carrying the genotype who express the trait
– Complete: P = 1.0 or 100%
– Incomplete (“reduced”): P < 1.0 or < 100%
• Expressivity: Severity of the phenotype
– Expressivity may vary
• Between families (interfamilial) or
• Within families (intrafamilial)
• TRY NOT TO CONFUSE “VARIABLE EXPRESSIVITY” WITH “INCOMPLETE PENETRANCE”
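Penetrance, as defined above, is a simple proportion: carriers who express the trait divided by all carriers of the genotype. A minimal Python sketch, assuming carrier status and affection status are already known for every individual counted (the function names are hypothetical):

```python
def penetrance(affected_carriers: int, total_carriers: int) -> float:
    """Fraction of genotype carriers who express the phenotype."""
    if total_carriers <= 0:
        raise ValueError("need at least one carrier")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected carriers must be between 0 and total carriers")
    return affected_carriers / total_carriers


def penetrance_class(p: float) -> str:
    # P = 1.0 is complete penetrance; anything lower is incomplete ("reduced").
    return "complete" if p == 1.0 else "incomplete"


p = penetrance(18, 20)  # hypothetical counts: 18 of 20 carriers affected
print(p, penetrance_class(p))
```

Note that this number says nothing about expressivity: a carrier counted as affected may have a mild or a severe phenotype.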
Chromosomes, Genes and Proteins
Genes are on Chromosomes
Genes may encode proteins or RNA
Non-coding RNA ‘genes’
• tRNAs (497 counted; 821 when counting genes and pseudogenes)
– tRNAs found are consistent with wobble pairing
– Codon bias only roughly correlated with tRNA gene distribution
• rRNAs
• small nucleolar RNAs (snoRNAs)
• snRNAs (spliceosome constituents)
• 7SL RNA
• telomerase RNA
• Xist transcript
• Vault RNA
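Wobble is why fewer tRNA species than the 61 sense codons can still read the whole code. A minimal sketch of Crick's classic wobble rules, assuming anticodons are written 5'→3' in the RNA alphabet (with "I" for inosine); modified nucleosides in real tRNAs can widen or restrict these pairings, so treat this as illustrative only:

```python
# Watson-Crick pairing used at anticodon positions 35 and 36 (RNA alphabet).
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Classic wobble rules: anticodon 5' base (position 34) -> codon 3' bases it can pair with.
# "I" is inosine, a modified base found at the wobble position of some tRNAs.
WOBBLE = {
    "G": ["C", "U"],
    "C": ["G"],
    "A": ["U"],
    "U": ["A", "G"],
    "I": ["U", "C", "A"],
}


def decoded_codons(anticodon: str) -> list[str]:
    """Codons (5'->3') readable by an anticodon given 5'->3'."""
    if len(anticodon) != 3:
        raise ValueError("anticodon must be three bases")
    a34, a35, a36 = anticodon.upper()
    # Antiparallel pairing: anticodon position 36 pairs with codon position 1,
    # position 35 with codon position 2, and position 34 wobbles with position 3.
    first = WATSON_CRICK[a36]
    second = WATSON_CRICK[a35]
    return [first + second + third for third in WOBBLE[a34]]


print(decoded_codons("GAA"))  # a Phe anticodon
print(decoded_codons("IGC"))  # an inosine-containing Ala anticodon
```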
tRNAs
Some chromosomes are richer in genes than others
[Figure: Number of Nucleotides in Exons (y-axis, 0 to 3,500,000) by chromosome (x-axis, 1 through X)]
HOXA, HOXB, HOXC and HOXD are in regions with a particularly low density of interspersed repeats. This is believed to result from the presence of cis-acting regulatory elements in these regions, presumably because repeat insertions there would be disruptive.
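One way to see a repeat-poor region such as a HOX cluster is to scan a repeat-masked sequence in windows. A minimal sketch, assuming the UCSC-style soft-masking convention in which repeat-annotated bases are written in lowercase; the function name and window size are hypothetical:

```python
def repeat_fraction_by_window(seq: str, window: int) -> list[float]:
    """Fraction of soft-masked (lowercase) bases in consecutive, non-overlapping windows.

    Assumes repeat-annotated bases are lowercase. 'N' gap bases are excluded from
    both numerator and denominator; a window that is entirely 'N' yields NaN.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        called = [b for b in chunk if b not in "Nn"]
        if not called:
            fractions.append(float("nan"))
            continue
        masked = sum(1 for b in called if b.islower())
        fractions.append(masked / len(called))
    return fractions


print(repeat_fraction_by_window("ACGTacgtACGTACGT", 8))
```

Windows whose fraction stays well below the genome-wide level would stand out as repeat-poor; real scans would use windows of many kilobases.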
Proteins demonstrate patterns and similarity of function
Functionally and Structurally similar proteins are organized into families
e.g., E.C. (Enzyme Commission) numbers, SWISS-PROT, TrEMBL
In silico approaches to characterize genes include:
• Pfam, searchable via HMMER
• Other in silico collections include:
– PRINTS
– PROSITE
– SMART
– BLOCKS
• Creation of an Integrated Protein Index (IPI)
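PROSITE describes many protein families with short sequence patterns. As a sketch of how such a pattern can be searched, the hypothetical helper below translates the basic pattern syntax (x, [..], {..}, repeat counts, terminal anchors) into a Python regular expression; it does not handle every PROSITE feature, such as '>' inside brackets. The example pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern (PS00001):

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression.

    Handles: x (any residue), [..] (allowed), {..} (forbidden),
    e(n) and e(n,m) repeats, '<' / '>' terminal anchors, and a trailing '.'.
    """
    pattern = pattern.strip().rstrip(".")
    regex = ""
    if pattern.startswith("<"):
        regex += "^"
        pattern = pattern[1:]
    anchor_end = pattern.endswith(">")
    if anchor_end:
        pattern = pattern[:-1]
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            regex += "."
        elif residue.startswith("{"):
            regex += "[^" + residue[1:-1] + "]"  # forbidden residues
        else:
            regex += residue  # a single letter or [..] class is already valid regex
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
    if anchor_end:
        regex += "$"
    return regex


glyco = prosite_to_regex("N-{P}-[ST]-{P}.")
print(glyco, re.search(glyco, "AANGSA"))
```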
How many genes are there?
Estimates from the Public Program
– RefSeq
– Exons
– Introns
– Average Sizes
– Coding Sequences (CDS)
– Alternative splice products (about 3%)
– Creation of an Integrated Gene Index (IGI)
– Genscan to Ensembl to Pfam via GeneWise (31,778)
– Could be as low as 24,500 using overprediction corrections.
Estimates from Celera
• 25,086 in Assembly 3
Pre-existing estimates
• W. Gilbert’s back of the envelope calculation
• Reassociation Kinetics
• Estimates from Double Twist using Promoter Inspector plus
• Unpublished estimates from Human Genome Sciences
Size of Genes:
• Largest: Dystrophin, 2.4 Mb
• Titin
– 80,780 bp coding sequence
– 178 exons
– Largest single exon: 17,106 bp
GENE HOMOLOGS, ORTHOLOGS, PARALOGS
• Vacuolar sorting machinery in yeast
• ABC gene superfamily
• Ig gene superfamily
• FGF superfamily
• Intermediate filament superfamily
• PROTEIN FAMILY EXPANSION APPEARS TO BE A PRIMARY EVOLUTIONARY MECHANISM
The proteome
• Functional categories
• PRINTS
• Prosite
• Pfam
• InterPro (http://www.ebi.ac.uk/interpro/)
GENE ONTOLOGY
• Standard Vocabulary
• Hierarchy of terms (Directed ACYCLIC Graph)
• Ashburner Nature Genetics 25:25-29 (2000)
• ‘Bushy’ model
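Because GO is a directed acyclic graph rather than a tree, a term can have several parents, and an annotation to a term implies annotation to all of its ancestors (the "true path rule"). A minimal sketch over a hypothetical toy ontology (the term names are placeholders, not real GO terms):

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its parent terms.
# Unlike a tree, a term may have several parents (here "D" has "B" and "C").
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["A"],
    "C": ["root"],
    "D": ["B", "C"],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms reachable by following parent links upward from `term`."""
    seen: set[str] = set()
    queue = deque(parents[term])
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(parents[t])
    return seen


def is_acyclic(parents: dict[str, list[str]]) -> bool:
    """Kahn's algorithm: the graph is a DAG iff every term can be removed in order."""
    indegree = {t: 0 for t in parents}
    for ps in parents.values():
        for p in ps:
            indegree[p] += 1  # edges point child -> parent
    queue = deque(t for t, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        t = queue.popleft()
        removed += 1
        for p in parents[t]:
            indegree[p] -= 1
            if indegree[p] == 0:
                queue.append(p)
    return removed == len(parents)


print(sorted(ancestors("D", PARENTS)), is_acyclic(PARENTS))
```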
Horizontal Transfer controversy
• One of the major conclusions of the public genome effort, published in the Feb. 15, 2001 issue of Nature, was: “Hundreds of human genes appear likely to have resulted from horizontal transfer from bacteria at some point in the vertebrate lineage. Dozens of genes appear to have been derived from transposable elements.”
• This has now been widely disputed; the apparent transfers are believed to result from:
– Microbial contaminants in the sequence
– Bacterial gene integration into pre-vertebrates
– And:
• “The more probable explanation for the existence of genes shared by humans and prokaryotes, but missing in nonvertebrates, is a combination of evolutionary rate variation, the small sample of nonvertebrate genomes, and gene loss in the nonvertebrate lineages.”
- Salzberg et al., Science
Splice Pattern: 98% of introns are GT-AG
Chromatin Structure
• Euchromatin
• Heterochromatin
• Nucleosomes
Chromosome Facts
• Chromosomes replicate during S phase
• Chromosomes recombine during Pachytene
• Recombination is an obligate activity
• Sex chromosomes recombine with each other (X and Y, within the pseudoautosomal regions)
Cytogenetics is done by Karyotyping
• Chromosomes are chemically frozen in metaphase
• Must be carried out on dividing cells
• Microfilament inhibitors
• Microtubule inhibitors
• Membrane lysis
• Pronase, trypsin digest
• Giemsa stain
• G-bands correspond to regions of relatively low GC content
http://genome.ucsc.edu/goldenPath/mapPlots/
http://genome.ucsc.edu/goldenPath/hgTracks.html
Cell Division: Meiosis
– Segregation
• Defined: Alleles are paired; each gamete receives one allele of each pair.
• Exceptions: trisomy and uniparental disomy
– Independent Assortment
• Gene pairs segregate independently
• Exception: linkage
Meiosis Creates Gametes
And provides a basis for genetic recombination!
Genetic Recombination
• Crossing Over
• Resolution
• Recombinant Chromosomes
– OBLIGATE ACTIVITY
– FEMALE RECOMB. RATES HIGHER THAN MALE
– INCREASED RATES AT TELOMERES
– PARADOX: SHORT ARMS SHOW MORE RECOMBINATION PER Mb THAN LONG ARMS
– 1 cM corresponds to about 1 Mb on long arms, but short arms average about 2 cM per Mb, and the Yp-Xp pseudoautosomal region is about 20 cM per Mb.
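The rates above convert physical distance into genetic map distance, which in turn predicts how often two loci will recombine. A minimal sketch, assuming a uniform rate within a segment and using the Haldane map function (which assumes no crossover interference; this model choice is ours, not the slide's). The function names are hypothetical:

```python
import math


def genetic_distance_cm(physical_mb: float, cm_per_mb: float) -> float:
    """Genetic map distance (cM) for a segment with a uniform recombination rate."""
    return physical_mb * cm_per_mb


def haldane_recombination_fraction(d_cm: float) -> float:
    """Haldane map function: r = (1 - exp(-2d)) / 2, with d in Morgans.

    Assumes no crossover interference, so r approaches but never exceeds 0.5.
    """
    d_morgans = d_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_map_distance_cm(r: float) -> float:
    """Inverse Haldane map function: d = -ln(1 - 2r) / 2 Morgans, returned in cM."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# The same 2 Mb segment on a long arm (~1 cM/Mb) versus the pseudoautosomal region (~20 cM/Mb).
for rate in (1.0, 20.0):
    d = genetic_distance_cm(2.0, rate)
    print(rate, d, round(haldane_recombination_fraction(d), 4))
```

With these rates a 2 Mb interval gives a recombination fraction of roughly 0.02 on a long arm but roughly 0.28 in the pseudoautosomal region.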
Genes
• Units of heredity
• Encode proteins (and some RNAs)
• Human genetics is the study of gene variation in humans
• ‘Gene’ as a term is used ambiguously to refer both to the ‘locus’ and the ‘allele’; i.e., there is only one locus but two alleles in a given individual.
• Sequencing in both genome projects was done on DNA carrying multiple alleles; this has led to some assembly confusion.
• Ultimately want a haploid genome map.
The Human Genome Project
• International public effort commencing in 1990 to sequence the entire human genome by 2005.
• STS approach chosen in 1991
• Private effort launched in 1998 by Celera using ‘shotgun’ cloning
BAC clones, sequenced into BAC end reads, and assembled into ‘contigs’
Markerless ‘contigs’ in the Celera assembly are called ‘Scaffolds’
Markers are BAC ends in the ‘shotgun’
Mate pair reads provided the core of Celera sequence
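Why whole-genome shotgun sequencing leaves gaps, even at high redundancy, can be estimated with the idealized Lander-Waterman model. A minimal sketch, assuming reads start uniformly at random and any overlap between reads is detectable (so the minimum-overlap term is ignored); the function name and the example numbers are hypothetical, not Celera's actual parameters:

```python
import math


def shotgun_stats(num_reads: int, read_length: int, genome_size: int) -> dict[str, float]:
    """Idealized Lander-Waterman expectations for random shotgun sequencing.

    coverage c = N * L / G; expected fraction of the genome not covered = e^-c;
    expected number of contigs (islands) = N * e^-c.
    """
    coverage = num_reads * read_length / genome_size
    return {
        "coverage": coverage,
        "fraction_uncovered": math.exp(-coverage),
        "expected_contigs": num_reads * math.exp(-coverage),
    }


# Hypothetical run: 30 million 500 bp reads over a 3 Gb genome (5x coverage).
print(shotgun_stats(30_000_000, 500, 3_000_000_000))
```

Even at 5x coverage, tens of thousands of gaps are expected in this model, which is one reason paired reads spanning gaps were valuable for ordering contigs.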
Draft human genome sequences complete by February 2001.
• Published simultaneously in Feb. 2001
– Public Sequence in NATURE (409: 745-964)
– Celera Sequence in SCIENCE (291: 1145-1434)
Greater than 50% of sequence is repetitive
45% of the human genome is derived from transposable elements
• Long Interspersed Elements: LINEs (21% of genome)
– LINE1: some still active, autonomous, consist of two ORFs (one is a pol).
– LINE2
– LINE3
• Short Interspersed Elements: SINEs (13% of genome)
– Alu: some still active, use L1 enzymes to replicate
– MIR
– Ther2/MIR3
• LTR Retroposons
– Consist of gag and pol
– Protease, reverse transcriptase (RT), RNase H, and integrase all encoded
– Reverse transcription occurs in the cytoplasm, using a tRNA to prime replication
• DNA Transposons
98.5% of sequence is non-coding.
Approximately 1/3 of the human genome is transcribed (public guess).
Allelism
• Alternate forms of a gene
• Recessive disease, e.g., sickle cell, CFTR (cystic fibrosis)
• Dominant disease, e.g., achondroplasia, tuberous sclerosis
Heterozygote or Homozygote
• Written as 1,2 (heterozygote) or 1,1 (homozygote)
• Homozygosity: identity of the two alleles at a locus
Genetic Markers
• RFLPs
• VNTRs (minisatellites)
• Microsatellites (STRs)
• STSs
• SNPs
• “Tools” used to find disease genes
• “Flags” with locations throughout the genome
Polymorphism Information Content versus Heterozygosity (PIC vs. het)
• Determining heterozygosity from SNP rare allele frequency
• Information Content in SNPs versus STRs
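Heterozygosity and PIC both follow from allele frequencies. Under Hardy-Weinberg assumptions, expected heterozygosity is H = 1 - Σ pᵢ², and the standard PIC formula subtracts the matings that are uninformative for linkage: PIC = H - Σ_{i<j} 2 pᵢ² pⱼ². A minimal sketch (function names are hypothetical):

```python
from itertools import combinations


def _check(freqs: list[float]) -> None:
    if abs(sum(freqs) - 1.0) > 1e-9 or any(p < 0 for p in freqs):
        raise ValueError("allele frequencies must be non-negative and sum to 1")


def heterozygosity(freqs: list[float]) -> float:
    """Expected heterozygosity under Hardy-Weinberg: H = 1 - sum(p_i^2)."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: list[float]) -> float:
    """Polymorphism information content: H minus sum over pairs i<j of 2 p_i^2 p_j^2."""
    _check(freqs)
    return heterozygosity(freqs) - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))


def snp_heterozygosity(minor_allele_freq: float) -> float:
    """For a biallelic SNP with minor allele frequency q: H = 2q(1 - q)."""
    return heterozygosity([minor_allele_freq, 1.0 - minor_allele_freq])


print(snp_heterozygosity(0.5), pic([0.5, 0.5]))  # best case for a SNP
print(heterozygosity([0.125] * 8), pic([0.125] * 8))  # hypothetical 8-allele STR
```

A SNP can never exceed H = 0.5 (PIC = 0.375), whereas an STR with eight equally frequent alleles reaches H = 0.875, which is why a single STR carries more information than a single SNP.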
Typology of SNPs
• Type I - Coding, non-synonymous, non-conservative
• Type II - Coding, non-synonymous, conservative
• Type III - Coding, synonymous
• Type IV - Non-coding, 5’-UTR
• Type V - Non-coding, 3’-UTR
• Type VI - Other non-coding
• Type I and Type II SNPs have lower heterozygosity than other SNPs, presumably as a result of selective pressure.
– About 25% of type I and type II SNPs have minor allele frequencies > 15%
– About 60% have minor allele frequencies < 5%
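Assigning a coding SNP to Type I, II or III requires translating the reference and variant codons. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) and a hypothetical side-chain grouping to decide what counts as "conservative"; real studies use other schemes, such as substitution matrices. The region labels are also hypothetical:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Hypothetical side-chain groups: a substitution within one group is called conservative.
GROUPS = ["AVLIM", "FWY", "STNQ", "KRH", "DE", "C", "G", "P", "*"]


def translate(codon: str) -> str:
    """Amino acid (one-letter code) for a DNA or RNA codon."""
    c = codon.upper().replace("U", "T")
    i, j, k = (BASES.index(b) for b in c)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def group_of(aa: str) -> str:
    return next(g for g in GROUPS if aa in g)


def snp_type(region: str, ref_codon: str = "", alt_codon: str = "") -> str:
    """Type I-VI class; region is "coding", "5utr", "3utr" or "other" (hypothetical labels)."""
    if region == "5utr":
        return "IV"
    if region == "3utr":
        return "V"
    if region == "other":
        return "VI"
    if region != "coding":
        raise ValueError(f"unknown region: {region!r}")
    ref_aa, alt_aa = translate(ref_codon), translate(alt_codon)
    if ref_aa == alt_aa:
        return "III"  # synonymous
    if group_of(ref_aa) == group_of(alt_aa):
        return "II"  # non-synonymous, conservative
    return "I"  # non-synonymous, non-conservative


print(snp_type("coding", "GAA", "GAG"), snp_type("coding", "GAA", "GAT"), snp_type("coding", "GAA", "AAA"))
```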
Mutation
• Occurs more often during male meiosis
• Occurs more often in ‘long genes’
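• More easily detected in dominant and X-linked disorders, where a single new mutation is expressed
– Achondroplasia (autosomal dominant)
– Duchenne Muscular Dystrophy (X-linked; new mutations are expressed in hemizygous males)
• May often involve CpG mutating to TpG (deamination of methylated cytosine)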
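Methylated cytosine in a CpG can deaminate to thymine, turning CpG into TpG; read from the opposite strand the same event appears as CpG to CpA. A minimal sketch for flagging such changes in a reference sequence, assuming 0-based positions and single-base substitutions (function names are hypothetical):

```python
def cpg_sites(seq: str) -> list[int]:
    """0-based positions of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def is_cpg_transition(ref_seq: str, pos: int, alt: str) -> bool:
    """True if a single-base change at `pos` is C>T or G>A within a CpG.

    C>T at the C of a CpG gives TpG on this strand; G>A at the G of a CpG is
    the same event seen on the opposite strand (CpG -> CpA).
    """
    seq = ref_seq.upper()
    ref = seq[pos]
    alt = alt.upper()
    if ref == "C" and alt == "T":
        return pos + 1 < len(seq) and seq[pos + 1] == "G"
    if ref == "G" and alt == "A":
        return pos > 0 and seq[pos - 1] == "C"
    return False


print(cpg_sites("ATCGTA"), is_cpg_transition("ATCGTA", 2, "T"))
```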
Autosomal Recessive Inheritance
• Two mutant copies of a gene required to be affected
• Carriers have one copy of the mutation and are unaffected
• Each child of two carriers has a 25% risk of being affected
• Males and females affected in equal number
• E.g., sickle cell, beta-thalassemia, CF
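The 25% figure follows directly from segregation: each parent passes on one of its two alleles with probability 1/2. A minimal Punnett-square sketch, using hypothetical allele labels 'A' (normal) and 'a' (mutant):

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Punnett-square genotype probabilities for one autosomal locus.

    Genotypes are two-letter strings such as "Aa"; each parent transmits
    either of its two alleles with equal probability.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


carrier_cross = offspring_genotypes("Aa", "Aa")
print(carrier_cross)  # "aa" is the affected class
```

For two carriers this gives 1/4 AA, 1/2 Aa (carriers) and 1/4 aa (affected).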
X Linked Recessive (Sex Linked)
• Females rarely affected
• No male to male transmission
• Affected males transmit gene to all daughters
• E.g., Duchenne Muscular Dystrophy, Hemophilia A
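The X-linked recessive rules follow from the fact that a father passes his X to every daughter and his Y to every son. A minimal sketch, assuming equal numbers of sons and daughters and hypothetical allele labels 'X' (normal) and 'x' (mutant):

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def x_linked_offspring(mother: tuple[str, str], father: str) -> dict[str, Fraction]:
    """Offspring outcome probabilities for an X-linked recessive trait.

    mother: her two X alleles, e.g. ("X", "x") for a carrier.
    father: his single X allele; daughters always get it, sons get his Y instead.
    """
    outcomes: Counter = Counter()
    for maternal_x, child_sex in product(mother, ("daughter", "son")):
        if child_sex == "son":
            status = "affected son" if maternal_x == "x" else "unaffected son"
        else:
            alleles = (maternal_x, father)
            if alleles.count("x") == 2:
                status = "affected daughter"
            elif "x" in alleles:
                status = "carrier daughter"
            else:
                status = "unaffected daughter"
        outcomes[status] += 1
    total = sum(outcomes.values())
    return {s: Fraction(n, total) for s, n in outcomes.items()}


print(x_linked_offspring(("X", "x"), "X"))  # carrier mother, unaffected father
print(x_linked_offspring(("X", "X"), "x"))  # non-carrier mother, affected father
```

The second cross shows both rules on the slide: no affected sons (no male-to-male transmission) and every daughter a carrier.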
Autosomal Dominant Inheritance
• Each child of an affected (heterozygous) parent is at 50% risk
• Does not skip generations (when penetrance is complete)
• Often lethal in double dose (homozygous state)
• Large genetic load
X-linked Dominant Pedigree
• Example is Hypophosphatemic, Vitamin D Resistant Rickets
• Distinguished from Autosomal Dominant by:
– No male-to-male transmission
– All daughters of affected fathers are affected
IMPORTANT NOTE:
Dominant and Recessive refer to the phenotypic expression of alleles, NOT to intrinsic characteristics of gene loci.
Inheritance Pattern Complexities
• Pseudodominant Transmission of a Recessive
• Pseudorecessive Transmission of a Dominant
– Misassigned paternity, causal heterogeneity, incomplete penetrance, germline mosaicism
• Mosaicism
• Mitochondrial Inheritance
• Penetrance and Expressivity
– Semi-dominant, gender-influenced, age-related, transmission-related, imprinting
• Uniparental Disomy (UPD)
• Environmental effects, phenocopies
Preview of linkage analysis
• Characterizing Human Genetics:
– Long generation time
– Inability to control matings
– Inability to control study population
– Inability to control exposures to environmental conditions
– It is possible to define phenotypes well!
– Can study genetic structures through family history
– Link phenotypes and genetic structures through statistical methods
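The basic statistic of linkage analysis is the LOD score: the log10 ratio of the likelihood of the data at recombination fraction θ to the likelihood under free recombination (θ = 0.5), with LOD ≥ 3 conventionally taken as evidence of linkage. A minimal two-point sketch, assuming phase-known, fully informative meioses with complete penetrance; real analyses must also handle unknown phase, reduced penetrance and missing genotypes. Function names are hypothetical:

```python
import math


def lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score: log10[theta^R (1 - theta)^(N - R) / 0.5^N]."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be in [0, 0.5]")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        if recombinants > 0:
            return float("-inf")  # a single recombinant rules out theta = 0
        return meioses * math.log10(2.0)
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1.0 - theta)
            + meioses * math.log10(2.0))


def max_lod(recombinants: int, meioses: int, step: float = 0.01) -> tuple[float, float]:
    """Grid search over theta in [0, 0.5] for the maximum LOD score."""
    grid = [round(i * step, 10) for i in range(round(0.5 / step) + 1)]
    best = max(grid, key=lambda t: lod(recombinants, meioses, t))
    return best, lod(recombinants, meioses, best)


print(max_lod(1, 10))  # 1 recombinant in 10 informative meioses
print(max_lod(0, 10))  # no recombinants: LOD just above 3 at theta = 0
```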