
Genome Biology and Biotechnology

9. The localizome

Prof. M. Zabeau
Department of Plant Systems Biology
Flanders Interuniversity Institute for Biotechnology (VIB)
University of Gent

International course 2005

Summary

¤ DNA localizome or DNA interactome
– Genome-wide mapping of DNA binding proteins
• Transcription factor binding sites
• Localization of replication origins

¤ Protein localizome
– High-throughput localization of proteins in cellular compartments

Functional Maps or “-omes”

[Figure: functional maps (ORFeome, transcriptome, proteome, interactome, DNA interactome, phenome, localizome) of genes or proteins across “conditions” 1, 2, 3 … n, based on expression profiles, protein interactions, protein–DNA interactions, mutational phenotypes, and cellular or tissue location. After: Vidal M., Cell, 104, 333 (2001)]

Genome-wide Analysis of Regulatory Sequences

¤ Gene expression is regulated by transcription factors that bind selectively to regulatory regions
– Protein–DNA interactions involve sequence-specific recognition
– Other factors, such as chromatin structure, may also be involved

¤ Sequence-specific DNA-binding proteins from eukaryotes generally recognize degenerate motifs of 5–10 base pairs
– Consequently, potential recognition sequences for transcription factors occur frequently throughout the genome

¤ Genome-wide surveys of DNA-binding proteins in vivo
– Provide a platform for determining which of these potential sites are actually bound in the cell
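To see why degenerate motifs turn up so often, one can estimate how many chance matches a motif would have in random sequence. The sketch below is a back-of-the-envelope model under an assumption that real genomes violate (independent, equiprobable bases); the motif `TCAGRWCN` and the ~12 Mb genome size in the usage line are hypothetical illustrations, not values from the lecture.

```python
# Expected number of chance matches of a degenerate DNA motif in random sequence.
# Assumption: bases are independent and equiprobable (25% each); real genomes deviate
# from this (yeast is AT-rich), so treat results as order-of-magnitude estimates.

# IUPAC nucleotide codes -> bases each code allows
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_probability(motif: str) -> float:
    """Probability that one position of random sequence matches the motif (one strand)."""
    p = 1.0
    for code in motif.upper():
        p *= len(IUPAC[code]) / 4
    return p


def expected_matches(motif: str, genome_length: int, both_strands: bool = True) -> float:
    """Expected chance occurrences in random sequence of the given length."""
    positions = max(genome_length - len(motif) + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions * match_probability(motif)


if __name__ == "__main__":
    # Hypothetical non-palindromic 8-bp motif with three degenerate positions,
    # in a random genome of roughly yeast size (~12 Mb)
    print(round(expected_matches("TCAGRWCN", 12_000_000)))  # prints 5859
```

Even this 8-bp motif is expected thousands of times by chance, which is why in vivo binding data are needed to tell which sites are actually occupied.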

Genome-wide Analysis of Regulatory Sequences

¤ Methods combine
– Large-scale analysis of in vivo protein–DNA crosslinking
– Microarray technology

¤ ChIP-on-chip
– Chromatin Immunoprecipitation on DNA chips

Reprinted from: Biggin M., Nature Genet. 28, 303 (2001)

Genome-Wide Location and Function of DNA Binding Proteins

¤ Paper presents
– Proof of principle for microarray-based approaches to determine the genome-wide location of DNA-bound proteins
• Study of the binding sites of two well-known gene-specific transcriptional activators in yeast: Gal4 and Ste12
– Combines data from in vivo DNA binding analysis with expression analysis
• To identify genes whose expression is directly controlled by these transcription factors

Ren et al., Science, 290, 2306 (2000)

Chromatin Immunoprecipitation (ChIP) Procedure

– Cells are fixed with formaldehyde, harvested, and sonicated
– DNA fragments cross-linked to a protein of interest are enriched by immunoprecipitation with a specific antibody
– Immunoprecipitated DNA is amplified and labeled with the fluorescent dye Cy5
– Control DNA not enriched by immunoprecipitation is amplified and labeled with a different fluorophore, Cy3
– The two DNAs are mixed and hybridized to a microarray of intergenic sequences
– The relative binding of the protein of interest to each sequence is calculated from the ratio of IP-enriched to unenriched fluorescence, using data from three experiments

Reprinted from: Ren et al., Science, 290, 2306 (2000)

Modified Chromatin Immunoprecipitation (ChIP) Procedure

Close-up of a scanned image of a microarray containing 6,361 intergenic-region DNA fragments of the yeast genome

ChIP-enriched DNA fragment

Reprinted from: Ren et al., Science, 290, 2306 (2000)
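The ratio step of the ChIP procedure can be illustrated with a short sketch. It assumes background-corrected, normalized Cy5 (IP) and Cy3 (control) intensities, combines the three experiments by averaging log2 ratios, and uses a hypothetical twofold cutoff; the statistics used in the original study may differ, and the region names and intensities are made up.

```python
# Minimal sketch: combine Cy5 (IP) / Cy3 (control) ratios from replicate arrays.
# Assumptions (not from the source): intensities are background-corrected and normalized,
# replicates are combined by averaging log2 ratios, and min_fold=2.0 is hypothetical.
import math
from statistics import mean


def log2_ratio(ip: float, control: float) -> float:
    """log2 of the IP-enriched / unenriched fluorescence ratio for one spot."""
    return math.log2(ip / control)


def combined_enrichment(replicates: list[tuple[float, float]]) -> float:
    """Average log2 ratio over replicate (Cy5, Cy3) pairs, returned as a fold change."""
    return 2 ** mean(log2_ratio(ip, ctrl) for ip, ctrl in replicates)


def bound_regions(spots: dict[str, list[tuple[float, float]]],
                  min_fold: float = 2.0) -> list[str]:
    """Names of intergenic regions whose combined enrichment reaches min_fold."""
    return sorted(name for name, reps in spots.items()
                  if combined_enrichment(reps) >= min_fold)


if __name__ == "__main__":
    # Hypothetical intensities, three experiments per region
    spots = {
        "regionA": [(4000, 500), (3600, 450), (5000, 700)],
        "regionB": [(800, 760), (900, 880), (700, 720)],
    }
    print(bound_regions(spots))  # ['regionA']
```

Averaging in log space treats a twofold enrichment and a twofold depletion symmetrically, which a plain average of ratios would not.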

Proof of concept: Gal4 transcription factor

¤ Identification of sites bound by the transcriptional activator Gal4 in the yeast genome and of genes induced by galactose
– Gal4 activates genes necessary for galactose metabolism
• It is the best-characterized transcription factor in yeast
– 10 genes were bound by Gal4 and induced in galactose
• 7 genes in the GAL pathway, previously reported to be regulated by Gal4
• 3 novel genes: MTH1, PCL10, and FUR4

Reprinted from: Ren et al., Science, 290, 2306 (2000)

Genome-wide location of Gal4 protein

Genes whose promoter regions are bound by Gal4 and whose expression levels were induced at least twofold by galactose

Reprinted from: Ren et al., Science, 290, 2306 (2000)

Role of Gal4 in Galactose-dependent Cellular Regulation

The identification of MTH1, PCL10, and FUR4 as Gal4-regulated genes explains how the regulation of several different metabolic pathways can be coordinated

[Figure: MTH1 reduces levels of glucose transporter; FUR4 increases intracellular pools of uracil; PCL10 is also shown]

Reprinted from: Ren et al., Science, 290, 2306 (2000)

Conclusions

¤ The genes whose expression is controlled directly by transcriptional activators in vivo
– Are identified by a combination of genome-wide location and expression analysis

¤ Genome-wide location analysis provides information
– On the sites at which proteins are bound in the genome under in vivo conditions
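Combining location and expression data amounts to intersecting two gene sets. A minimal sketch, assuming a set of ChIP-bound genes has already been called and applying the twofold induction criterion from the Gal4 analysis; the gene names in the test are hypothetical.

```python
# Minimal sketch: intersect location data with expression data to nominate direct targets.
# Assumptions: `bound` holds genes whose promoters passed a ChIP enrichment cutoff and
# `fold_change` holds induction ratios (treated vs. reference condition) per gene.


def direct_targets(bound: set[str], fold_change: dict[str, float],
                   min_induction: float = 2.0) -> set[str]:
    """Genes that are both bound by the factor and induced at least min_induction-fold."""
    induced = {gene for gene, fc in fold_change.items() if fc >= min_induction}
    return bound & induced


if __name__ == "__main__":
    bound = {"geneA", "geneB", "geneC"}                       # hypothetical ChIP hits
    fold_change = {"geneA": 12.0, "geneB": 1.1, "geneD": 5.0}  # hypothetical induction
    print(direct_targets(bound, fold_change))  # {'geneA'}
```

Genes that are bound but not induced, or induced but not bound (for instance through an indirect effect), drop out of the intersection.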

Genomic Binding Sites of the Yeast Cell-cycle Transcription Factors SBF and MBF

¤ Paper presents
– The use of ChIP and DNA microarrays to define the genomic binding sites of the SBF and MBF transcription factors in vivo
– The SBF and MBF transcription factors act at the start of the cell division cycle (G1/S transition) in yeast
• A few target genes of SBF and MBF are known, but the precise roles of these two transcription factors are unknown
• Both transcription factors are heterodimers that share the Swi6 subunit and differ in their DNA-binding subunit
– MBF is a heterodimer of Mbp1 and Swi6
– SBF is a heterodimer of Swi4 and Swi6

Iyer et al., Nature 409: 533 (2001)

Genomic targets of SBF and MBF

Reprinted from: Iyer et al., Nature 409: 533 (2001)

In Vivo Targets of SBF and MBF

¤ The ChIP experiments identified
– 163 possible targets of SBF
– 87 possible targets of MBF
– 43 possible targets of both factors

¤ Support for the proposed in vivo targets
– Expression of most of the genes downstream of the putative binding sites peaks at G1/S
– Target genes are highly enriched for functions related to DNA replication, budding and the cell cycle
– In vivo binding sites are highly enriched for sequences matching the defined consensus binding sites

Reprinted from: Iyer et al., Nature 409: 533 (2001)
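The overlap between the two target sets can be worked out by inclusion–exclusion. This assumes, which the slide does not state explicitly, that the 163 SBF and 87 MBF targets each include the 43 genes bound by both factors; S and M denote the SBF and MBF target sets.

```latex
\begin{aligned}
|S \cup M| &= |S| + |M| - |S \cap M| = 163 + 87 - 43 = 207,\\
|S \setminus M| &= 163 - 43 = 120, \qquad |M \setminus S| = 87 - 43 = 44.
\end{aligned}
```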

Expression Profiles of SBF and MBF Targets

Reprinted from: Iyer et al., Nature 409: 533 (2001)

Transcriptome data for synchronized cell cultures

Expression Profiles of SBF and MBF Targets

¤ Why are two different transcription factors used to mediate identical transcriptional programmes during the cell-division cycle in yeast?
– A possible answer is suggested by differences in the functions of the genes that they regulate
• Many of the targets of SBF have roles in cell-wall biogenesis and budding
• 25% of the MBF target genes have known roles in DNA replication, recombination and repair
– The results support a model in which
• SBF is the principal controller of membrane and cell-wall formation
• MBF primarily controls DNA replication

¤ The requirements for DNA replication and for membrane and cell-wall biogenesis may differ between the mitotic and meiotic cell cycles

Reprinted from: Iyer et al., Nature 409: 533 (2001)
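Statements that target genes are "highly enriched" for a function, such as the 25% of MBF targets with roles in DNA replication, are commonly backed by a one-sided hypergeometric test against the genome-wide background. A minimal standard-library sketch; the genome size and annotation counts in the usage line are hypothetical, and this is not necessarily the test used by Iyer et al.

```python
# Minimal sketch: hypergeometric test for functional enrichment among target genes.
# All counts in the example are hypothetical placeholders, not values from the paper.
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(k: int, N: int, K: int, n: int) -> tuple[float, float]:
    """Fold enrichment over the genome-wide background and its one-sided p-value."""
    fold = (k / n) / (K / N)
    return fold, hypergeom_sf(k, N, K, n)


if __name__ == "__main__":
    # Hypothetical: 6,000 genes, 150 annotated for DNA replication,
    # 87 targets of which 22 (~25%) carry the annotation
    fold, p = enrichment(22, 6000, 150, 87)
    print(f"fold = {fold:.1f}, p = {p:.2e}")
```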

A high-resolution map of active promoters in the human genome

¤ Paper presents
– A genome-wide map of active promoters in human fibroblast cells
• Determined by experimentally locating the sites of RNA polymerase II preinitiation complex (PIC) binding
• The map defines 10,567 active promoters corresponding to
– 6,763 known genes
– at least 1,196 unannotated transcriptional units
– A global view of the functional relationships in human cells between
• the transcriptional machinery
• chromatin structure
• gene expression

Kim et al., Nature 436: 876-880 (2005)

Identification of active promoters in the human genome

¤ Microarrays cover
– All non-repeat DNA at 100-bp resolution

¤ Pol II preinitiation complex (PIC)
– RNA polymerase II
– Transcription factor IID (TFIID)
– General transcription factors

¤ ChIP of PIC-bound DNA
– Monoclonal antibody against TAF1 (TBP-associated factor 1), a subunit of TFIID

Reprinted from: Kim et al., Nature 436: 876-880 (2005)
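With probes tiled every ~100 bp, binding sites appear as short runs of enriched neighbouring probes. The sketch below calls such runs; it is a simplified stand-in for a real peak caller, with a hypothetical log2-ratio cutoff and minimum run length, and it treats consecutive list entries as adjacent even across gaps such as masked repeats.

```python
# Minimal sketch: call binding sites from a tiling array with probes every ~100 bp.
# Assumptions (hypothetical, not the published algorithm): a site is a run of at least
# `min_probes` consecutive probes whose log2 IP/control ratio exceeds `cutoff`, reported
# at the probe with the highest ratio in the run.


def call_peaks(positions: list[int], log_ratios: list[float],
               cutoff: float = 1.0, min_probes: int = 2) -> list[int]:
    """Return the summit coordinate of each run of enriched probes (input sorted by position)."""
    if len(positions) != len(log_ratios):
        raise ValueError("positions and log_ratios must have the same length")
    peaks: list[int] = []
    run: list[int] = []  # indices of the current run of enriched probes
    for i, value in enumerate(log_ratios):
        if value > cutoff:
            run.append(i)
            continue
        if len(run) >= min_probes:
            peaks.append(positions[max(run, key=lambda j: log_ratios[j])])
        run = []
    if len(run) >= min_probes:  # a run that reaches the end of the array
        peaks.append(positions[max(run, key=lambda j: log_ratios[j])])
    return peaks
```

With the defaults, an isolated single enriched probe is ignored, which is one simple way to suppress noise from individual spots.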

Results from TFIID ChIP-on-chip analysis

Reprinted from: Kim et al., Nature 436: 876-880 (2005)

Characterization of active promoters

¤ Matched the 12,150 TFIID-binding sites to the 5' ends of known transcripts in transcript databases
– 87% of the PIC-binding sites were within 2.5 kb of annotated 5' ends of known messenger RNAs

¤ 8,960 promoters were mapped within the annotated boundaries of 6,763 known genes in Ensembl

Reprinted from: Kim et al., Nature 436: 876-880 (2005)

The chromatin-modification features of the active promoters

¤ Validation of active promoters
– ChIP-on-chip using an anti-RNAP (RNA polymerase II) antibody
– ChIP-on-chip analysis using
• anti-acetylated histone H3 (AcH3) antibodies
• anti-dimethylated lysine 4 on histone H3 (MeH3K4) antibodies
– Both modifications are known epigenetic markers of active genes

Reprinted from: Kim et al., Nature 436: 876-880 (2005)

TFIID, RNAP, AcH3 and MeH3K4 profiles on the promoter of the RPS24 gene

Reprinted from: Kim et al., Nature 436: 876-880 (2005)

Additional findings

¤ Promoters of non-coding transcripts
– Are very similar to promoters of protein-coding genes

¤ Promoters of novel genes
– An estimated 13% of human genes remain to be annotated in the genome

¤ Clustering of active promoters
– Co-regulated genes tend to be organized into coordinately regulated domains

¤ Genes using multiple promoters

Reprinted from: Kim et al., Nature 436: 876-880 (2005)
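One simple way to look for clusters of active promoters is to group coordinates that lie within a maximum distance of each other (single linkage along the chromosome). The `max_gap` value is a hypothetical parameter, and this sketch only finds positional clusters; calling them coordinately regulated domains would additionally require expression data.

```python
# Minimal sketch: group active-promoter coordinates into clusters (single linkage by gap).
# Assumptions: positions are on one chromosome; `max_gap` is a hypothetical parameter.


def cluster_positions(positions: list[int], max_gap: int = 50_000) -> list[list[int]]:
    """Split sorted positions wherever consecutive promoters are more than max_gap bp apart."""
    clusters: list[list[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # close enough to the previous promoter: extend cluster
        else:
            clusters.append([pos])    # start a new cluster
    return clusters
```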

Multiple promoters in human genes

¤ WEE1 gene locus
– Two different transcripts with alternative 5' ends
• Encoding different proteins
– Two different TFIID-binding sites, i.e. two promoters
– Differential transcription during the cell cycle

Reprinted from: Kim et al., Nature 436: 876-880 (2005)

The transcriptome of a cell line

¤ Functional relationship between the transcription machinery and gene expression
– Genome-wide expression profiles were correlated with PIC occupancy at promoters

¤ Four general classes of promoters
I. Actively transcribed genes
II. Weakly expressed genes
III. Weakly PIC-bound genes
IV. Inactive genes

Reprinted from: Kim et al., Nature 436: 876-880 (2005)
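Correlating expression with PIC occupancy across genes can be summarised with a Pearson coefficient. A minimal sketch with hypothetical inputs (one occupancy score and one expression value per gene); a rank-based coefficient would be less sensitive to outliers.

```python
# Minimal sketch: correlate promoter PIC occupancy with expression across genes.
# Assumptions: one occupancy score and one expression value per gene (hypothetical
# inputs); expression is often log-transformed before this step.
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```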

Genome-Wide Distribution of ORC and MCM Proteins in Yeast: High-Resolution Mapping of Replication Origins

¤ Paper presents
– Genome-wide location analysis to map the DNA replication origins on the 16 yeast chromosomes by determining the binding sites of prereplicative complex proteins

Wyrick et al., Science, 294, 2357 (2001)

Chromosome Replication In Eukaryotic Cells

¤ Chromosome replication
– Initiates from origins of replication distributed along the chromosomes
– In budding yeast, origins of replication comprise autonomously replicating sequences (ARSs)
• ARSs contain an 11-bp ARS consensus sequence (ACS)
– Essential for replication initiation
– Recognized by the Origin Recognition Complex (ORC)
• The majority of sequence matches to the ACS in the genome do not have ARS activity

¤ Prereplicative complexes at replication origins comprise
– Origin Recognition Complex (ORC) proteins
– Minichromosome Maintenance (MCM) proteins

Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
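The observation that most ACS matches lack ARS activity starts from a simple sequence scan, which finds many consensus matches. The sketch below assumes the commonly cited IUPAC form of the ACS, WTTTAYRTTTW, and reports matches on both strands in forward-strand coordinates; it identifies candidate sequences only, not functional origins.

```python
# Minimal sketch: scan both strands of a sequence for matches to the 11-bp ACS.
# Assumption: the ACS is written as the IUPAC consensus WTTTAYRTTTW; a consensus match
# says nothing about ARS activity, which must be shown experimentally.
import re

IUPAC_CLASS = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "Y": "[CT]",
               "R": "[AG]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ACS = "WTTTAYRTTTW"


def motif_regex(motif: str) -> re.Pattern:
    # The lookahead lets overlapping matches be reported.
    return re.compile("(?=(" + "".join(IUPAC_CLASS[c] for c in motif) + "))")


def find_motif(seq: str, motif: str = ACS) -> list[tuple[int, str]]:
    """0-based forward-strand start positions of matches, tagged '+' or '-'."""
    seq = seq.upper()
    pattern = motif_regex(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    revcomp = seq.translate(COMPLEMENT)[::-1]
    n, k = len(seq), len(motif)
    # a match at p on the reverse complement covers forward positions n-p-k .. n-p-1
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(revcomp)]
    return sorted(hits)
```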

Prereplicative Complexes At Origins Of Replication

Reprinted from: Stillman, Science, 294, 2301 (2001)

ORC- and MCM-binding sites compared with known ARSs

¤ High degree of correlation between MCM and ORC binding sites and known ARSs
– 88% of known ARSs were correctly identified

¤ The method can accurately identify the position of ARSs to a resolution of 1 kb or less

Reprinted from: Wyrick et al., Science, 294, 2357 (2001)

Genome-wide Location Of Potential Replication Origins

Identification of 429 potential origins in the entire genome

Reprinted from: Wyrick et al., Science, 294, 2357 (2001)

Conclusions

¤ The ChIP-based method identified the majority of origins found in the genome-wide analysis of replication timing in yeast
– And provides direct, high-resolution mapping of potential origins

¤ Similar approaches have identified origins in other organisms
– For example: Coordination of replication and transcription along a Drosophila chromosome
• MacAlpine et al., Genes & Dev. 18: 3094-3105 (2004)

Reprinted from: Wyrick et al., Science, 294, 2357 (2001)


Global analysis of protein localization in budding yeast

¤ Paper presents
– An approach to define the organization of proteins in the context of cellular compartments, involving
– The construction and analysis of a collection of yeast strains expressing full-length, chromosomally tagged green fluorescent protein (GFP) fusion proteins

Huh et al., Nature 425, 686-691 (2003)

Experimental Strategy

¤ Systematic tagging of yeast ORFs with green fluorescent protein (GFP)
– GFP is fused to the carboxy terminus of each ORF
– Full-length fusion proteins are expressed from their native promoters and chromosomal locations

¤ The collection of yeast strains expressing GFP fusions was analyzed by
– Fluorescence microscopy to determine the primary subcellular localization of the fusion proteins
• Defines 12 categories
– Co-localization with red fluorescent protein (RFP) markers to refine the subcellular localization
• Defines 11 additional categories

Reprinted from: Huh et al., Nature 425, 686-691 (2003)

Construction of GFP fusion proteins

¤ For each ORF, a pair of PCR primers was designed
– Homologous to the chromosomal insertion site
– Matching a GFP–selectable marker construct

¤ Yeast was transformed with the PCR products to generate
– Strains expressing chromosomally tagged ORFs

Reprinted from: Huh et al., Nature 425, 686-691 (2003)
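The primer logic for C-terminal tagging by homologous recombination can be sketched as follows. The 40-nt homology arms and the `F_ANNEAL`/`R_ANNEAL` placeholders for the cassette-priming sequences are assumptions for illustration, not the primers used by Huh et al.; the forward primer ends just before the stop codon so GFP is fused in frame, and the reverse primer is homologous to the sequence just after it.

```python
# Minimal sketch of C-terminal tagging primer design by homologous recombination.
# Assumptions (hypothetical, not the published primers): 40-nt homology arms, and the
# placeholders F_ANNEAL / R_ANNEAL stand in for sequences that anneal to the
# GFP-selectable marker cassette.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
F_ANNEAL = "NNNNNNNNNNNNNNNNNNNN"  # hypothetical placeholder: cassette forward-priming site
R_ANNEAL = "NNNNNNNNNNNNNNNNNNNN"  # hypothetical placeholder: cassette reverse-priming site


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tagging_primers(orf: str, downstream: str, arm: int = 40) -> tuple[str, str]:
    """Forward/reverse primers that insert the cassette in frame, replacing the stop codon.

    orf: coding sequence including its stop codon; downstream: genomic sequence after it.
    """
    if len(orf) % 3 != 0 or orf[-3:].upper() not in {"TAA", "TAG", "TGA"}:
        raise ValueError("orf must be a whole number of codons ending in a stop codon")
    coding = orf[:-3]  # drop the stop codon so GFP is fused in frame
    forward = coding[-arm:] + F_ANNEAL             # homology to the last codons of the ORF
    reverse = revcomp(downstream[:arm]) + R_ANNEAL  # homology to sequence after the stop
    return forward, reverse
```

After transformation, recombination between the arms and the chromosome replaces the stop codon with the cassette, so the fusion stays under the control of the native promoter.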

Representative GFP Images

Reprinted from: Huh et al., Nature 425, 686-691 (2003)

Panels: nucleus, nuclear periphery, ER, bud neck, mitochondrion, lipid particle

GFP and RFP Co-localization Images

Reprinted from: Huh et al., Nature 425, 686-691 (2003)

Panel label: nucleolar marker

Global results

¤ Constructed ~6,000 ORF-GFP fusions
– 4,156 had localizable GFP signals (~75% of the yeast proteome)
– Good concordance with data from earlier studies
• Indicates that the GFP tag generally does not alter localization
• Localized 70% of the proteins whose location was previously unknown
– Major compartments: cytoplasm (30%) and nucleus (25%)
– 20 other compartments: 44% of the proteins

¤ Most of the proteins can be assigned to discrete cellular compartments

Reprinted from: Huh et al., Nature 425, 686-691 (2003)

[Figure: distribution of proteins over 22 localization categories]

The proteome of the nucleolus

¤ Detected 164 proteins in the nucleolus
– Plus 45 identified in other studies

¤ Data are consistent with mass spectrometry (MS) analysis of human nucleolar proteins
– Allows identification of yeast–human orthologs

Reprinted from: Huh et al., Nature 425, 686-691 (2003)

Transcriptional co-regulation and subcellular localization are correlated

Reprinted from: Huh et al., Nature 425, 686-691 (2003)

[Figure: subcellular localization of co-regulated genes in 33 transcription modules]
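A first look at the link between co-regulation and localization is to ask, for each transcription module, which compartment dominates among its members. The module and gene names in the test are hypothetical, and a real analysis would also compare these fractions with the genome-wide background, for example with a hypergeometric enrichment test.

```python
# Minimal sketch: how strongly does each co-regulated module concentrate in one compartment?
# Assumptions: `modules` maps a module name to its member genes and `location` maps genes
# to a single localization category; genes without a category are skipped.
from collections import Counter


def dominant_location(modules: dict[str, list[str]],
                      location: dict[str, str]) -> dict[str, tuple[str, float]]:
    """For each module, the most common localization and the fraction of localized members in it."""
    result: dict[str, tuple[str, float]] = {}
    for name, genes in modules.items():
        counts = Counter(location[g] for g in genes if g in location)
        if not counts:
            continue  # no localized members: nothing to report
        top, n = counts.most_common(1)[0]
        result[name] = (top, n / sum(counts.values()))
    return result
```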

Conclusion

¤ The high-resolution, high-coverage localization data set
– Represents 75% of the yeast proteome
• Classified into 22 distinct subcellular localization categories

¤ Analysis of these proteins in the context of transcriptional, genetic, and protein–protein interaction data
• Provides a comprehensive view of interactions within and between organelles in eukaryotic cells
• Helps reveal the logic of transcriptional co-regulation

Reprinted from: Huh et al., Nature 425, 686-691 (2003)

Recommended reading

¤ DNA interactome
– Genome-Wide Location of DNA Binding Proteins
• Ren et al., Science, 290, 2306 (2000)
– Map of active promoters in the human genome
• Kim et al., Nature 436: 876-880 (2005)

¤ Global analysis of protein localization in yeast
• Huh et al., Nature 425, 686-691 (2003)

Further reading

¤ Genome-Wide Location of DNA Binding Proteins
– Genomic Binding Sites of the Yeast Cell-cycle Transcription Factors SBF and MBF
• Iyer et al., Nature 409: 533 (2001)
– High-Resolution Mapping of Replication Origins
• Wyrick et al., Science, 294, 2357 (2001)