
Transcriptional Regulation: Mechanisms and Methods of Analysis

Jean Imbert

For a European Research Initiative: http://fer.apinc.org/

15 April 2009

IBiSA - Inserm Platform

From Julia Zeitlinger - Whitehead Institute, UC Davis, USA

I. Basic principles of transcriptional control

I.1. Eukaryotic Promoter Classes

• Pol I < 1%
• Pol II with TATA-box > 70%
• Pol II without TATA-box ~ 20%
• Pol III internal ~ 5%
• Pol III upstream with TATA-box < 1%
• Pol III upstream without TATA-box < 1%

Products: Pol I: 3/4 rRNAs (28S, 18S, 5.8S); Pol II: mRNAs; Pol III: small RNAs, tRNAs, 5S RNAs

I.2. How to define a gene and its boundaries?

MODEL OF TYPICAL GENE PROMOTER AND REGULATORY REGIONS

[Figure: enhancers and regulatory elements, regulatory promoter, and core promoter (TATA box, Inr, +1); scale marks: -4000, -500, -40, +1, +50, +2000]

From Felsenfeld and Groudine, Nature, 421:448-454, 2003

From Jones and Kadonaga, Genes Dev. 14:1992-1996, 2000.

CTCF

The molecular mechanisms of transcriptional regulation involve an intricate hierarchy of factors acting sequentially at three levels:

1. Binding of sequence-specific DNA-binding transcription factors to specific regulatory elements linked to their target genes.

2. Recruitment of non-DNA-binding proteins capable of modifying the general repressive context of chromatin and acting as signal-regulated scaffolds to bridge interactions between the sequence-specific DNA-binding proteins and the basal transcription machinery.

3. Recruitment of RNA polymerase II and its basal transcription factors.

DNA compaction in a human nucleus

level | description | compact size | DNA length (bp) | DNA length (extended) | compaction
nucleus (human) | 2 x 23 = 46 chromosomes, 92 DNA molecules | 10 μm ball | 12,000 Mbp | 4 m | 400,000 x
mitotic chromosome | 2 chromatids, 1 μm thick, 2 DNA molecules | 10 μm long X | 2 x 130 Mbp | 2 x 43 mm | 10,000 x
DNA domain | anchored DNA loop, 1 replicon ? | 60 nm x 0.5 μm | 60 kbp | 20 μm | 35 x
chromatin fiber | approx. 6 nucleosomes per 'turn' of 11 nm | 30 nm diameter | 1200 bp | 400 nm | 35 x
nucleosome | disk, 1 ¾ turn of DNA (146 bp) + linker DNA | 6 x 11 nm | 200 bp | 66 nm | 6 - 11 x
base pair | | 0.33 x 1.1 nm | 1 bp | 0.33 nm | 1 x

[Figure: compaction of DNA by histones and by the chromosome scaffold / nuclear matrix; scale marks: 1 bp (0.3 nm), 11 nm, 30 nm, 10,000 nm]

From Jakob H. Waterborg - School of Biological Sciences - University of Missouri-Kansas City
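
The compaction figures in the table above can be re-derived from two of its own numbers: the 0.33 nm rise per base pair and the size of each compacted structure. The following is a minimal Python sketch of that arithmetic, not part of the original slides; the function names are illustrative.

```python
# Rise per base pair, taken from the "base pair" row of the table (0.33 nm).
NM_PER_BP = 0.33


def dna_length_nm(base_pairs: float) -> float:
    """Extended (contour) length in nanometres of a DNA of the given size."""
    return base_pairs * NM_PER_BP


def compaction_ratio(base_pairs: float, compact_size_nm: float) -> float:
    """Extended DNA length divided by the size of the compacted structure."""
    return dna_length_nm(base_pairs) / compact_size_nm


# Nucleus row: 12,000 Mbp packed into a ball about 10 um (10,000 nm) across.
nucleus_bp = 12_000e6
nucleus_length_m = dna_length_nm(nucleus_bp) * 1e-9
nucleus_ratio = compaction_ratio(nucleus_bp, 10_000)

# Nucleosome row: 200 bp in a disk of about 6 x 11 nm, so the ratio
# depends on which dimension of the disk is used.
nucleosome_ratio_min = compaction_ratio(200, 11)
nucleosome_ratio_max = compaction_ratio(200, 6)

print(f"nucleus: {nucleus_length_m:.2f} m of DNA, ratio {nucleus_ratio:,.0f}")
print(f"nucleosome: {nucleosome_ratio_min:.0f} to {nucleosome_ratio_max:.0f}")
```

The nucleus row comes out at about 3.96 m and a ratio near 396,000, which the table rounds to 4 m and 400,000 x. The nucleosome row gives the 6 to 11 x range directly.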

I.3. Major Nucleosomal Histone Modification Mapping

From G. Felsenfeld and M. Groudine. Controlling the double helix. Nature 421 (6921):448-453, 2003.

• Acetylation at lysine residues is highly associated with transcriptional activation (H2AK5, H2BK12, etc.)

• Methylation at lysine or arginine residues is associated with either transcriptional activation (H3K4, H4R3, etc.) or repression (H3K9, H3K27, etc.)

• Phosphorylation at serine or threonine residues is associated with either transcriptional activation (H3S28, etc.) or repression (H2AS1, etc.)

Functional Effects of Histone Modifications on Transcription: A Few Examples in Mammals

Ubiquitylation and sumoylation have been associated with mitosis, meiosis, etc.

From C. L. Peterson and M. A. Laniel. Histones and histone modifications. Curr.Biol. 14 (14):R546-R551, 2004.

II. Transcription factors specific for regulatory elements

Sequence-specific transcription factors are modular proteins

[Figure: schematic of a modular transcription factor showing DBD, TD, DD, NLS, NES and phosphorylation sites (P)]

DD: Dimerization Domain
DBD: DNA Binding Domain
TD: Transactivation Domain
NLS: Nuclear Localization Signal
NES: Nuclear Export Signal
P: Phosphorylation Site

Histone acetyltransferase CBP/p300 interacts with many partners

Various modes of transcription factor activation

NFAT

From Bing Ren, Laboratory of Gene Regulation, Ludwig Institute for Cancer Research, UCSD

From Tupler R, Perini G, Green MR. Nature 409: 832-833, 2001.

Genome-wide comparison of transcriptional activator families in eukaryotes

C2H2 zinc fingers are found in 2% of all human genes, and they are by far the most abundant class of DNA-binding domains found in human transcription factors.

From David Gifford, Young’s lab

STRATEGIES FOR THE ANALYSIS OF REGULATORY SEQUENCES AND TRANSCRIPTION FACTORS

1. Identification and characterization of regulatory sequences
- Promoter, enhancer and repressor regions
- Reporter gene assays: transient or stable (CAT, luciferase, SEAP, β-gal, …)
- Sequencing, searching consensus motif databases (web sites: TESS, Euk. Pr. Database, TRANSFAC, JASPAR, MATINSPECTOR, TFSEARCH, etc.)

2. Identification and characterization of the protein factors specific for the regulatory sequences
- Gel shift and derivatives: UV cross-linking, biotinylated oligonucleotides
- Footprinting:
* in vitro: nucleases (DNase I hypersensitivity, S1 nuclease, MNase, chemical methods)
* in vivo: genomic footprinting, ChIP (Chromatin ImmunoPrecipitation), enhancer knock-in, minichromosomes, transgenesis

3. Characterization of physical and functional interactions
- Transfection and biochemistry
- ChIP, ChIP-on-chip, ChIP-seq

4. Integration into the chromatin context (Insulator, MAR/SAR, LCR, etc.)
- Artificial minichromosome
- 3C, 4C, 5C

III. Example of the functional organization of regulatory regions:

Control of IL2RA Gene Transcription

Major Roles of the IL2/IL2R System

Major T cell growth factor controlling antigen-mediated cell growth

Antigen-induced cell death (AICD)

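Essential for generation in the thymus and peripheral maintenance of CD4+CD25+ Treg

[Figure: the IL-2/IL-2R system. Primary activation of a CD4+ T cell (CD3/TcR, CD28) by an APC (Ag/MHC, B7); autocrine effects of IL-2 on the CD4+ T cell and paracrine effects on B, CD8+ T and NK cells; receptor forms shown: α/β/γc and β/γc]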

IL-2Rα CHAIN EXPRESSION IS CONTROLLED AT:

- TRANSCRIPTIONAL LEVEL

- POST-TRANSCRIPTIONAL LEVEL (mRNA half-life)

CD25/IL-2Rα GENE TRANSCRIPTION DURING T CELL ACTIVATION

[Figure: log of the IL2RA mRNA level plotted across G0, G1 and S. Primary activation (CD3/TCR, CD28) is followed by IL2/IL2R (α βc γc) signaling, cell cycle progression, proliferation and differentiation; phases I, II and III are marked.]

IL2RA GENE REGULATORY REGIONS

[Figure: map of the IL2RA regulatory regions (scale marks from -9000 to +1, TATA box, exon 1) in a CD4+ T cell contacting an APC, with the signals acting on them: Ag/TCR/CD3 (Signal 1), B7/CD28 (Signal 2), IL-2/IL-2R (Signal 3), TGF-β/TGF-βR (other signal). Regions labeled: PRRI [-276,-244], PRRII [-137,-64], PRRIII [-3780,-3703], PRRIV [+3389,+3596], PRRV [-7664,-7566], PRRVI [-8689,-8483], SBS [+7822,+8199]. Elements labeled: IL-2rE, CD28rE. Factors labeled: NF-κB, SRF, Elf-1, HMG-I(Y), Stat5a,b, GATA-1-like, Ets-1/2, CREB/ATF, AP-1, NFAT, Smad3, SATB1?. Primary activation acts on PRRI and PRRII (NF-κB, SRF, Elf-1?).]

Phylogenetic footprinting: two-species comparison

PipMaker

http://pipmaker.bx.psu.edu/pipmaker/

Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. (2000). PipMaker: A Web Server for Aligning Two Genomic DNA Sequences. Genome Res. 10, 577-586.

Homo sapiens/Mus musculus IL-2Rα locus dotplot comparison

[Figure: PipMaker dot plot of the Homo sapiens IL-2Rα locus (x-axis, in kb) against the Mus musculus IL-2Rα locus (y-axis), annotated with exons 1 to 7, intron A, a gap, and regulatory regions I+II, III, IV, V and VI. Update JI 17/10/05.]

PRRIII / IL-2rE: human/mouse alignment

Homo sapiens (-3772 to -3718):
TTTCTTCTAGGAAGTACCAAACATTTCTGATAATAGAATTGAGCAATTTCCTGAT
IIIIIIII--IIIIIIIII-IIIIIIIIIIIII-III-IIIIIIII-IIIIIIII
TTTCTTCTGAGAAGTACCAGACATTTCTGATAAGAGAGTTGAGCAACTTCCTGAT
Mus musculus (-1369 to -1315)

Sites marked: Site I, Site II, Site III

Motifs marked: GASd/EBSd, GASp/GATA, EBSp
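
The match line of the alignment above ("I" for identical positions, "-" for mismatches) can be recomputed from the two sequences. This is a minimal Python sketch that assumes an ungapped alignment, which holds here because both sequences are 55 nt long; the function names are illustrative and not from the slides.

```python
# Human and mouse PRRIII sequences as printed on the slide (ungapped, 55 nt).
HUMAN = "TTTCTTCTAGGAAGTACCAAACATTTCTGATAATAGAATTGAGCAATTTCCTGAT"
MOUSE = "TTTCTTCTGAGAAGTACCAGACATTTCTGATAAGAGAGTTGAGCAACTTCCTGAT"


def match_line(a: str, b: str) -> str:
    """Return 'I' where the two sequences agree and '-' where they differ."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles ungapped, equal-length input")
    return "".join("I" if x == y else "-" for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions that are identical."""
    line = match_line(a, b)
    return 100.0 * line.count("I") / len(line)


print(HUMAN)
print(match_line(HUMAN, MOUSE))
print(MOUSE)
print(f"identity: {percent_identity(HUMAN, MOUSE):.1f}%")
```

The two sequences differ at 6 of 55 positions, which is about 89% identity over the element.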

Gene reporter assay

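[Figure: CAT reporter constructs pTK4.CAT and pTK4 / 250XbaI (XbaI fragment around -3800 to -3700 containing PRRIII, cloned upstream of pTK CAT), shown against the map of the promoter-proximal region (PRRI, PRRII, TATA, +1; primary activation: NF-κB, SRF, Elf-1). Cells non-stimulated (ns) or stimulated with IL-2; x-axis: CAT (pg/ml), 0 to 80. Induction fold values shown: 4.2 and 1.33.]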

In vivo Genomic Footprinting

In Vivo Footprinting

• 1) Methylation of guanines (major groove) and, to a lesser extent, of adenines (minor groove) by DMS in living cells. The level of methylation is affected by protein binding to DNA.

• 2) Genomic DNA extraction

• 3) Cleavage of methylated residues by piperidine

• 4) LM-PCR (ligation-mediated PCR) amplification of the region to be analyzed. The last amplification cycles are performed with a 32P-labeled primer.

• 5) Analysis of the PCR products on a sequencing gel

[Figure: in vivo footprint of PRRIII (-3819 to -3700) in primary T lymphocytes, lanes 1 to 4: in vitro control, non-stimulated (ns), and CD2+CD28 for 24 h and 48 h. Motifs marked: GASp/GATA, EBSp, GASd, EBSd (-3769 to -3757).]

The GASd/EBSd site is the only motif occupied in vivo in response to CD2+CD28 in purified human primary T lymphocytes

PRRIII (-3772 to -3718), both strands:
TTTCTTCTAGGAAGTACCAAACATTTCTGATAATAGAATTGAGCAATTTCCTGAT
AAAGAAGATCCTTCATGGTTTGTAAAGACTATTATCTTAACTCGTTAAAGGACTA

Motifs marked: GASd/EBSd, GASp/GATA, EBSp

The GASd/EBSd motif is the only putative regulatory element within PRRIII modified in vivo in response to an IL-2-dependent induction in human T lymphocytes

INDUCIBLE

CONSTITUTIVE

Lecine, P., Algarte, M., Rameil, P., Beadling, C., Bucher, P., Nabholz, M. and Imbert, J. Elf-1 and Stat5 bind to critical element in a new enhancer of the human interleukin-2 receptor alpha gene. Mol.Cell.Biol. 16: 6829-6840; 1996.

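[Figure: IL-2 signaling to STAT5. IL-2 binds the α β γc receptor at the membrane; JAK1 and JAK3 and STAT5 are shown in the cytoplasm; STAT5 is shown in the nucleus on GAS elements of target genes (CIS, OSM, IL-2Rα).]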

Electrophoretic Mobility Shift Assay (EMSA)

[Figure: EMSA with three probes (GASd/EBSd, Site I, FcγRI-GAS), each without (-) and with (+) IL-2, lanes 1 to 6; complexes C1, C2 and C3 are marked.]

Inducible complexes C2 and C3 are GAS-specific. Constitutive complex C1 is EBS-specific.

CD25/IL-2Rα GASd/EBSd EMSA probe (GASd and EBSd marked):
TTTCTTCTAGGAAGTACC
AAAGAAGATCCTTCATGG

Mouse site I:
TTTCTTCTGAGAAGTACC
AAAGAAGACTCTTCATGG

[Figure: competition EMSA with the GASd/EBSd probe, -IL-2 and +IL-2, lanes 1 to 6; 100X competitors: none, WT, mGASd, mEBSd; complexes C1, C2 and C3 are marked.]

The constitutive C1 complex is EBS-specific and the inducible C2 and C3 complexes are GAS-specific

[Figure: supershift EMSA with the GASd/EBSd probe, - IL-2 and + IL-2, lanes 1 to 6; antibodies added one per lane in lanes 3 to 6: anti-Stat5b, anti-Ets-1/2, anti-Ets-1, anti-Ets-2; complexes C1, C2 and C3 are marked.]

In Kit-225 cells the inducible C3 complex contains Stat5b, Ets-1 and Ets-2

Lécine et al., MCB, 1996

GAS: TTCTAGGAA

EBS: AGGAA
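
The overlap between the GAS and EBS motifs in the GASd/EBSd probe can be located with a simple pattern scan. The sketch below is illustrative only: it assumes the commonly used GAS consensus TTCNNNGAA and the Ets core GGAA as search patterns (the slides give only the instances TTCTAGGAA and AGGAA), and it scans the top strand only.

```python
import re

# Top strand of the human GASd/EBSd EMSA probe and of mouse site I (from the slides).
HUMAN_PROBE = "TTTCTTCTAGGAAGTACC"
MOUSE_SITE_I = "TTTCTTCTGAGAAGTACC"

# Assumed search patterns (not stated on the slides): GAS = TTCNNNGAA, Ets core = GGAA.
# Lookahead groups are used so that overlapping matches are all reported.
GAS = re.compile(r"(?=(TTC[ACGT]{3}GAA))")
ETS_CORE = re.compile(r"(?=(GGAA))")


def find_sites(pattern: re.Pattern, seq: str) -> list:
    """Return (1-based start, matched sequence) for every match on the top strand."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]


human_gas = find_sites(GAS, HUMAN_PROBE)
human_ets = find_sites(ETS_CORE, HUMAN_PROBE)
mouse_gas = find_sites(GAS, MOUSE_SITE_I)

print("human GAS:", human_gas)
print("human Ets core:", human_ets)
print("mouse GAS:", mouse_gas)
```

In the human probe the GAS match spans positions 5 to 13 and the GGAA core starts at position 10, so the two motifs share the same nucleotides. A fuller scan would also search the reverse strand and use position weight matrices rather than exact patterns.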

DNA affinity purification using a GASd/EBSd biotinylated probe

Co-Immunoprecipitation Assay

[Figure: co-immunoprecipitation, lanes 1 to 9, without (-) or with (+) IL-2. IP: Ets-1/2, Ets-1, Ets-2, c-Rel; TL also shown. WB: Stat5b. A band marked 83 is labeled Stat5b.]

Stat5b, Ets-1 and Ets-2 interact in vivo in response to IL-2

[Figure: CAT reporter assay with the GASd/EBSd reporter in pTK4.CAT, co-transfected with combinations of the effectors Stat5b, Ets-1 and Ets-2, in cells treated with IL-2 + PMA + ionomycin; x-axis: CAT (pg/ml), 0 to 80. Induction values shown: 27.3 +/- 7.7; 11.8 +/- 4.5; 8.3 +/- 3.3; 9.1 +/- 4.1; 8.8 +/- 3.4; 4.6 +/- 2.1; 1.0 +/- 0.02.]

Functional cooperation between Stat5b, Ets-1 and Ets-2 in response to IL-2 + PMA + ionomycin

ChIP (Chromatin Immunoprecipitation) Assay

• 1) Formaldehyde cross-linking

• 2) Sonication to shear chromatin

• 3) Immunoprecipitation with specific antibodies (bound fraction and unbound fraction)

• 4) Reversing cross-linking

• 5) Semi-quantitative or quantitative PCR assays

Chromatin immunoprecipitation (ChIP)

[Figure: ChIP PCR gel, marker (M) and lanes 1 to 9, IL-2 for 16 h and 3 h; samples: input, no Ab, Stat5b, p-Stat5b, Ets-1/2.]

Stat5b and Ets-1/2 bind to IL2RA gene within human IL-2rE in vivo

PRRIII ChIP primer design

  1 TTCTGCCCTTAGCTTCTACCCCTCTCTACTTCTGGTTAACTATGGACCACACTCTGCTTC
 61 CTCAGGAACCACCTACCAAGGCCGTATCCATCCTTCAAGGACAATACGTGGGCCTTTCCT
121 GATCACATCAGCTCAACAACTTTTCCCTCCTACATTTCAATTGCTCTTCTTACCATAATC
181 ATTAGTATTCACCCCACTGTACGTCTAGAAAGAAAGTGGTCTTAAACCTAAGGGAAGGCA
241 GTCTAGGTCAGAAATTTGTTGTCCGCTGTTCTGAGCAGTTTCTTCTAGGAAGTACCAAAC
301 ATTTCTGATAATAGAATTGAGCAATTTCCTGATGAAGTGAGACTCAGCTTGCACTGTTGA
361 CCGGCTGTCCTGGATGAACCTAGTTACTTTTAACCAAATGTTCCTTTCTTGAACTTGTTC
421 CTTTCTTGAACTTAATCTATC

Forward primers BZ3-1 and BZ3-2 and reverse primer BZ3-3 are marked on the sequence at the positions given below.

OLIGO | start | len | tm | gc% | 3' seq
BZ3-1 | 62 | 20 | 59.96 | 55.00 | TCAGGAACCACCTACCAAGG
BZ3-3 | 362 | 20 | 59.62 | 55.00 | GGTCAACAGTGCAAGCTGAG
BZ3-2 | 104 | 20 | 60.71 | 50.00 | ATACGTGGGCCTTTCCTGAT
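
The start and gc% columns of the primer table can be checked against the PRRIII sequence with a few lines of Python. This is a hedged sketch, not the tool used for the slide: the helper names are illustrative, and the melting temperatures are not recomputed because they depend on the thermodynamic model and salt settings of the original design software.

```python
# PRRIII region as printed on the slide (441 nt).
TEMPLATE = (
    "TTCTGCCCTTAGCTTCTACCCCTCTCTACTTCTGGTTAACTATGGACCACACTCTGCTTC"
    "CTCAGGAACCACCTACCAAGGCCGTATCCATCCTTCAAGGACAATACGTGGGCCTTTCCT"
    "GATCACATCAGCTCAACAACTTTTCCCTCCTACATTTCAATTGCTCTTCTTACCATAATC"
    "ATTAGTATTCACCCCACTGTACGTCTAGAAAGAAAGTGGTCTTAAACCTAAGGGAAGGCA"
    "GTCTAGGTCAGAAATTTGTTGTCCGCTGTTCTGAGCAGTTTCTTCTAGGAAGTACCAAAC"
    "ATTTCTGATAATAGAATTGAGCAATTTCCTGATGAAGTGAGACTCAGCTTGCACTGTTGA"
    "CCGGCTGTCCTGGATGAACCTAGTTACTTTTAACCAAATGTTCCTTTCTTGAACTTGTTC"
    "CTTTCTTGAACTTAATCTATC"
)

PRIMERS = {
    "BZ3-1": "TCAGGAACCACCTACCAAGG",
    "BZ3-2": "ATACGTGGGCCTTTCCTGAT",
    "BZ3-3": "GGTCAACAGTGCAAGCTGAG",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def locate(primer: str, template: str):
    """Return (strand, 1-based position of the primer 5' end), or None if absent."""
    i = template.find(primer)
    if i >= 0:
        return "+", i + 1
    j = template.find(reverse_complement(primer))
    if j >= 0:
        return "-", j + len(primer)
    return None


for name, seq in PRIMERS.items():
    print(name, locate(seq, TEMPLATE), f"GC {gc_percent(seq):.2f}%")

# Amplicon length for the BZ3-1 / BZ3-3 pair, from the two 5' end positions.
amplicon_bz3_1_3 = locate(PRIMERS["BZ3-3"], TEMPLATE)[1] - locate(PRIMERS["BZ3-1"], TEMPLATE)[1] + 1
print("BZ3-1/BZ3-3 amplicon:", amplicon_bz3_1_3, "bp")
```

BZ3-3 is found on the reverse strand, and its reported start (362) is the position of its 5' end on the top-strand numbering.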

[Figure: model of the GAS element upstream of +1 in a T lymphocyte, resting (OFF) versus IL-2 stimulation (ON); components shown: Stat5, Ets-1/2 and a partner labeled X.]

Rameil et al., Oncogene, 2000

[Figure: CD25/IL-2Rα regulatory regions and the signals acting on them. TCR/CD3 (Signal 1) and CD28 (Signal 2, marked "?"): PRRI (-276~-244) and PRRII (-137~-64), with Elf-1, NF-kB, SRF, HMG-I(Y). IL-2R (Signal 3): PRRIII (-3780~-3703) and IL-2rE (+3389~+3596), with Stat5a,b, GATA-1-like, HMG-I(Y), Ets-1/2.]

Phylogenetic footprinting and DNase I hypersensitive site mapping

DNase I Hypersensitive assay

DNase I Hypersensitive (DH) sites are created by the structure of chromatin

Chromatin structure:

The structure of DH sites

- not protected by histone octamers

DH sites

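[Figure, panel A: Homo sapiens/Mus musculus CD25/IL-2Rα gene comparison]

[Figure, panel B: DNase I hypersensitive sites 1 to 4 in non-stimulated (ns), CD3, CD28 and CD3+CD28 cells, with increasing DNase I; size markers: 2.0, 2.3, 4.4, 6.6, 9.4 kb]

[Figure, panel C: restriction map of the IL-2Rα locus (scale bar 1 kb) with DH sites 1 to 4, PRRI+II, PRRIII, the probe, restriction sites (E, Bc, S, B, H, P, Bg) and fragments ES, BcS, EB, BB, BH, 481 and 8942]

[Figure: gene reporter assays, CAT induction fold (axis 1 to 26) for pTK3, pTK3/ES, pTK3/BcS, pTK3/EB, pTK3/BB, pTK3/BH, 481.IIR, 481.IIR/BcS, 8942.IIR]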

[Figure: luciferase activity (axis 0 to 15) of pGL3 and pGL3/PRRIV, without (-) or with (+) stimulation, in cells expressing CD28 wt (labels 161, 172, 181, 192) or CD28 Δ30 (labels 161, 172)]

CD28 can specifically induce PRRIV transcriptional activity

[Figure: luciferase induction fold (axis 1 to 13) of pGL3/PRRIV and pGL3/PRRIV(mCRE/TRE), PRRIV cloned with the pSV40 promoter upstream of luciferase, after CD3, CD28 or CD3+CD28 stimulation. CRE/TRE and mCRE/TRE sequences shown: ACTCCTCTAGAATTAT and ACTCCTGACGAATTAT.]

The CRE/TRE within PRRIV is essential for the response of PRRIV to TCR-CD3 and CD28 signals

Control regions of the IL2RA gene

[Figure: summary map of the IL2RA control regions relative to +1 and exon 1. Ag/TCR/CD3 (Signal 1): PRRI (-276~-244) and PRRII (-137~-64). B7/CD28 (Signal 2): PRRIV/CD28rE (-8689~-8483). IL-2/IL-2R (Signal 3): PRRIII/IL-2rE (-3780~-3703) and PRRV/IL-2rE (+3389~+3596). Factors labeled: Elf-1, HMG-I(Y), NF-κB, SRF, CREB/ATF, AP-1, NFAT, Stat5a,b, GATA-1-like, Ets-1/2, SATB1 (SBS).]

IV. The New Tools

Large-scale analysis of chromatin modifications and of interactions between genomes and transcription factors

From Julia Zeitlinger - Whitehead Institute, UC Davis, USA

Gene expression analysis

IV.1. Computational Approaches
- Phylogenetic Footprinting
- Regulatory Network Modeling

IV.2. Experimental Approaches
- ChIP-on-chip
- ChIP-SAGE
- ChIP-seq
- Chromosome Conformation Capture
- Circular Chromosome Conformation Capture
- Chromosome Conformation Capture Carbon-Copy

IV.1. Computational Approaches
- Phylogenetic Footprinting
- Regulatory Network Modeling

From Bing Ren, Laboratory of Gene Regulation, Ludwig Institute for Cancer Research, UCSD


Phylogenetic footprinting: multi-species comparison

Methodology used in an analysis

Steps:
- Human chromosomal coordinates of interest
- Retrieval of the nucleotide sequences
- Sequence alignment
- Visualization of the alignment
- Expert review of the alignment
- Phylogenetic reconstruction
- Visualization of TFBS conservation

Tools: Galaxy (MultiZ), ClustalX, Seaview, Phylip (NJ), PhyML


Step 1: ECR Browser

Chromosomal position in human

Choice of species

Step 2: Mulan

Step 3: MultiTF

All TF families in MultiTF

Final graphical view


Overview of the steps carried out for the preliminary study

CD25/IL2RA Gene: ECR Genome Browser on Human Mar. 2006 Assembly

position/search: chr10:6,080,616-6,159,203

[Figure: ECR Browser view of the locus with PRRI+II, PRRIII and PRRIV marked, plus candidate conserved regions labeled A?, B? and C?]

PRRIII/IL-2rE phylogeny

[Figure: phylogenetic tree (scale bar 0.02) of the PRRIII/IL-2rE region in 11 mammals: elephant, cow, horse, cat, dog, guinea pig, mouse, rat, macaque, chimpanzee, human. Node support values shown: 63, 55, 93, 96, 54, 70, 40, 85. The Stat5, Stat5 and Elf-1 sites are marked for each of the 11 species.]

[Figure: map of IL2RA modules relative to the TSS. In order: PRRVI [-8689, -8483], PRRV [-7664, -7566], A [-7160, -6997], PRRIII [-3780, -3703], PRRI and PRRII [-276, -64], PRRIV [+3389, +3596], followed by B1, B2 and C with coordinates [+9468, +9507], [+10956, +11139] and [+23969, +24023].]

PRRVI/CD28RE phylogeny

[Figure: phylogenetic tree (scale bar 0.05) of the PRRVI/CD28RE region in 13 mammals: armadillo, elephant, tenrec, cow, horse, dog, rabbit, guinea pig, mouse, rat, macaque, chimpanzee, human. Node support values shown: 45, 79, 63, 82, 49, 75, 44, 56, 68, 76. AP1 and/or CREB sites are marked for a subset of the species.]


Summary table of the conservation of the functional and potential modules of the IL2RA gene in 15 mammalian species

[Figure: conservation table laid out along the map of IL2RA modules relative to the TSS: PRRVI [-8689, -8483], PRRV [-7664, -7566], A [-7160, -6997], PRRIII [-3780, -3703], PRRI and PRRII [-276, -64], PRRIV [+3389, +3596], followed by B1, B2 and C with coordinates [+9468, +9507], [+10956, +11139] and [+23969, +24023].]

IV.2. Experimental Approaches
- ChIP-on-chip
- ChIP-SAGE
- ChIP-seq
- Chromosome Conformation Capture
- Circular Chromosome Conformation Capture
- Chromosome Conformation Capture Carbon-Copy

Genomic Target Identification of Specific Transcription Factors Using Chromatin Immunoprecipitation

[Figure: ChIP read-outs ranging from gene-specific approaches (low throughput) to genome-wide approaches (high throughput): gene-specific PCR; ChIP cloning & sequencing; SAGE-like (GMAT/SACO); ChIP-seq; microarray hybridization (ChIP-on-chip)]

From Julia Zeitlinger - Whitehead Institute, UC Davis, USA


Application to Human Genome

From Rémi Houlgatte – Inserm UMR915, Nantes, France

From Julia Zeitlinger - Whitehead Institute, UC Davis, USA

From T. Y. Roh, S. Cuddapah, and K. Zhao. Active chromatin domains are defined by acetylation islands revealed by genome-wide mapping. Genes Dev. 19 (5):542-552, 2005.

Histone H3 K9/K14 acetylation islands colocalize with all identified PRRs within the IL2RA locus in resting human primary T cells and suggest the existence of other cis-acting elements not yet identified

V VI

The genome-wide distributions of K9/K14 di-acetylated histone H3 (H3Ac2), K4 trimethylated histone H3 (H3K4me3), and K27 trimethylated histone H3 (H3K27me3) in resting human T lymphocytes were mapped by Genome-wide MApping Technique (GMAT), a combination of chromatin immunoprecipitation and SAGE technique.

The level of the histone modification at a genetic locus is positively correlated with the detection frequency of a 21-bp sequence tag identified by the GMAT analysis. The detection frequency (y-axis) is plotted against the chromosome coordinate (x-axis).
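
The "detection frequency against chromosome coordinate" profile described above amounts to counting sequence tags per genomic window. The following Python sketch illustrates the idea under stated assumptions: the tag coordinates are hypothetical values invented for the example, and the 1 kb window size is an arbitrary choice, not a parameter taken from the GMAT study.

```python
from collections import Counter


def tag_frequency(tag_positions, window=1000):
    """Count tags per fixed-size window; keys are window start coordinates."""
    counts = Counter((pos // window) * window for pos in tag_positions)
    return dict(sorted(counts.items()))


# Hypothetical mapped tag coordinates on one chromosome (illustration only).
tags = [6_080_700, 6_080_950, 6_081_100, 6_084_020, 6_084_500, 6_084_999]

profile = tag_frequency(tags)
for start, count in profile.items():
    # One text bar per window: x-axis = coordinate, bar length = detection frequency.
    print(f"{start:>10,d}  {'#' * count}")
```

Windows with many tags correspond to the "islands" of a modification; real analyses additionally normalize for mappability and background.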

Activating marks on histone H3:
- Di-acetylation on lysines 9 and 14 (H3K9acK14ac)
- Tri-methylation on lysines 4 (H3K4me3), 36, and 79

Repressive marks on histone H3:
- Tri-methylation on lysines 9 and 27 (H3K27me3)

Functional Architecture of the Nucleus

3C: Chromosome Conformation Capture

J. Dekker, K. Rippe, M. Dekker, and N. Kleckner. Capturing chromosome conformation. Science 295 (5558):1306-1311, 2002.

4C: Circular Chromosome Conformation Capture

Z. Zhao, G. Tavoosidana, M. Sjolinder, A. Gondor, P. Mariano, S. Wang, C. Kanduri, M. Lezcano, K. S. Sandhu, U. Singh, V. Pant, V. Tiwari, S. Kurukuti, and R. Ohlsson. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat Genet 38 (11):1341-1347, 2006.

5C: Chromosome Conformation Capture Carbon Copy

J. Dostie, T. A. Richmond, R. A. Arnaout, R. R. Selzer, W. L. Lee, T. A. Honan, E. D. Rubio, A. Krumm, J. Lamb, C. Nusbaum, R. D. Green, and J. Dekker. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16 (10):1299-1309, 2006.

3C. Requires a prior knowledge of both interacting regions

4C. Requires a prior knowledge of only one interacting region

5C: Chromosome Conformation Capture Carbon Copy

Summary of activation-dependent looping events and a model of transcriptionally active chromatin.

Cai S, Lee CC, Kohwi-Shigematsu T. SATB1 packages densely looped, transcriptionally active chromatin for coordinated expression of cytokine genes. Nat Genet. 38:1278-88, 2006.

A few published examples of physical and functional evidence for nonallelic interactions between chromosomes:

• Interchromosomal interactions and olfactory receptor choice (Lomvardas et al., Cell 126:403-413, 2006).

• An LCR in the IFN-γ locus associated with the IL-4 locus on a different chromosome in committed naive T cells (Spilianakis et al., Nature, 435:637-645, 2005).

• The imprinting control region of the Igf2/H19 locus and the Wsb1/Nf1 gene (Ling et al., Science, 312:269-272, 2006).

A review: F. Savarese and R. Grosschedl. Blurring cis and trans in gene regulation. Cell 126 (2):248-250, 2006.

These nonallelic interchromosomal interactions appear relatively infrequent and transient, and their biological role is still somewhat unclear...

V. Next (Second)-Generation Sequencing

Overview

• The Basic Advantages/Disadvantages

• The Technologies at a glance

• The Application of massively parallel sequencing

From Elaine Mardis, Ph.D., Genome Sequencing Center, St Louis, MO, USA

Advantages of 2nd-Gen Platforms

• No sub-cloning, no use of E. coli as host.
- cloning bias abolished
- making libraries is more straightforward

• Each sequence is from a unique DNA molecule.
- quantitation is possible through “counting”
- enhanced dynamic range
- detection of rare variants

• Provides exquisite resolution for many types of (input) experiments**.

• Revolutionary (disruptive) improvements in cost and speed of data generation.

• Requires (much) less automation at front end.

From Elaine Mardis, Ph.D., Genome Sequencing Center, St Louis, MO, USA

Disadvantages of Next-Gen Platforms

• Shorter read length sequences are produced.
- relative to capillary sequencers
- re-parameterization of base calling accuracy
- challenges bioinformatics-based analyses

• File sizes traumatize IT infrastructures.
- (up to) several Tb of raw data are produced per run
- read processing pipelines require off-instrument CPU
- decision of what to save vs re-run

• Instrument amortization paradigm shift.

• Require (much) less automation at front end.

From Elaine Mardis, Ph.D., Genome Sequencing Center, St Louis, MO, USA

NGS: Next Generation Sequencing
Target 2013: 1X coverage of the human genome at $1,000

Gold standard until 2006:
- Sanger sequencing with ABI 3730XL x1 Genetic Analyzer (1 Kb read length, 2.1 Mbp per day, 6x coverage 1 human genome, 18 Gb = ~18 years)

NGS since 2006:
- Solexa 2G: ~20-30 Gb/run, ~3-9 days run length, ~500 Mb/day, ~50-75 bp read length, 99.99% accuracy
- ABI SOLiD3: ~20-30 Gb/run, ~4-10 days run length, ~500 Mb/day, ~50-75 bp read length, 99.94% accuracy
- 454/Roche GS FLX: ~100 MB/run, ~7.5 hours run length, ~400 Mb/week, ~400 bp read length, 99.5% accuracy
- Heliscope: ~1 Gb/hour, ~25-45 bp read length, 99% accuracy

In development:
- Oxford Nanopore Technologies: single molecule sequencing, ~nGB/Run?, ~2-3 Kb read length, 99.8% accuracy

From E. Mardis. Trends Genet., 24:133-141, 2008
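
The relationship between coverage, genome size and daily output behind these figures is simple arithmetic. The sketch below is illustrative only: it assumes a 3 Gb haploid human genome (consistent with the slide's "6x coverage = 18 Gb") and uses the ~500 Mb/day figure quoted for the second-generation instruments; the function names are not from the slides.

```python
def bases_required(genome_size_bp: float, coverage: float) -> float:
    """Total number of bases to sequence for a given fold coverage."""
    return genome_size_bp * coverage


def days_to_sequence(total_bases: float, bases_per_day: float) -> float:
    """Instrument days needed at a constant daily output."""
    return total_bases / bases_per_day


HUMAN_GENOME_BP = 3e9          # assumption: about 3 Gb per haploid genome
total_6x = bases_required(HUMAN_GENOME_BP, 6)

# One instrument producing about 500 Mb per day (figure quoted on the slide).
days_ngs = days_to_sequence(total_6x, 500e6)

print(f"6x coverage: {total_6x / 1e9:.0f} Gb")
print(f"at 500 Mb/day: {days_ngs:.0f} days on one instrument")
```

The same two functions can be applied to any other platform by substituting its daily output.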

Pyrosequencing

Illumina Genome Analyzer

ABI Next Gen Sequencing: SOLiD

Sequencing by Oligonucleotide Ligation and Detection

The SOLiD™ System is a genetic analyzer capable of massively parallel sequencing.

The SOLiD™ 3 System

Applied Biosystems SOLiD
• custom adapter library
• emPCR on magnetic beads
• sequencing by ligation using fluorescent probes from a common primer
• sequential rounds of ligation from a series of primers
• fixed/known nucleotides for each probe set identify two bases per sequence read, for “two base encoding”

Platform Statistics

Current | 454 Frags | 454 Pairs | Solexa Frags | Solexa Pairs | SOLiD Frags | SOLiD Pairs
Read Length | 250 bases | 250 bases | 36 bp | 36 bp | 25-35 bp | 2 x 25 bp
Days Per Run | 0.3125 | 0.3125 | 3 | 6 | 7 | 10
Number of Reads | 400,000 | 400,000 | 60 Million | 60 Million | 80 Million | 80 Million
Gb Per Run (Filtered) | 100 MB | 100 MB | >1 Gbp | >2 Gbp | 3 Gb | 6 Gb
Average Insert Size | | 2-3 kb | | 200 bp | | 3 kb

Improvements | 454 Frags | 454 Pairs | Solexa Frags | Solexa Pairs | SOLiD Frags | SOLiD Pairs
Read Length | 400 bases | 400 bases | 36 - 50 bp | 36 - 50 bp | 25 - 35 bp | 2 x 25 bp
Days Per Run | 0.42 | 0.42 | 2.5 | 5 | 5 | 10
Number of Reads | 400,000 | 400,000 | >90 Million | >90 Million | 80 Million | 80 Million
Gb Per Run (Filtered) | 100 MB | 100 MB | >1.5 Gbp | >3 Gbp | 4 Gb | 8 Gb
Average Insert Size | | 2-3 kb | | 200 bp | | 3 kb

Main NGS applications

• Human gene mapping

• Qualitative (SNP) and quantitative (amplification) genetic variations

• De novo sequencing of model organisms and pathogens

• Sequencing complex mixtures of microbial populations (gastrointestinal tract or water monitoring)

• Mapping of epigenetic marks and identification of sequences regulating gene expression (ChIP-seq)

• Identification and analysis of non-coding RNAs (miRNA, etc.)

• Monitoring gene expression, covering all the alternative transcripts of a given locus, in a variety of contexts

Chromatin ImmunoPrecipitation (ChIP)-seq

• genome-wide identification of protein binding sites

• transcription factor binding sites can indicate genes activated for transcription

• repressor binding sites can indicate genes repressed from transcription

• histone binding also can identify sequences available/not for transcription

• co-investigation of transcribed genes can provide correlative data

[Figure: paired-end ditag (PET) applications, illustrated at chr21:42,653,000-42,673,000 (TFF1, TMPRSS3) and chr1:2,466,948-2,497,767 (TNFRSF14; MCF7 genome inversion versus reference genome hg18). GIS-PET: transcriptome, gene discovery, fusion transcripts. ChIP-PET: TF binding sites, epigenetic sites. SOLID-PET: genome SVs, genome assembly. ChIA-PET: LR chromatin interactions. Schematic labels: chromosome, euchromatin, heterochromatin, nucleosome, histone & modification, chromatin loop, chromatin interaction, RNAPII complex, tss, alt. tss, mRNA, SNP, inversion, insertion, deletion, translocation.]

Yijun RUAN

N. D. Heintzman, R. K. Stuart, G. Hon, Y. Fu, C. W. Ching, R. D. Hawkins, L. O. Barrera, S. Van Calcar, C. Qu, K. A. Ching, W. Wang, Z. Weng, R. D. Green, G. E. Crawford, and B. Ren. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 39 (3):311-318, 2007.

Example of ChIP-seq (1)

Example of ChIP-seq (2)

A. Barski, S. Cuddapah, K. Cui, T. Y. Roh, D. E. Schones, Z. Wang, G. Wei, I. Chepelev, and K. Zhao. High-resolution profiling of histone methylations in the human genome. Cell 129 (4):823-837, 2007.

Z. Wang, C. Zang, J. A. Rosenfeld, D. E. Schones, A. Barski, S. Cuddapah, K. Cui, T. Y. Roh, W. Peng, M. Q. Zhang, and K. Zhao. Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet. 40 (7):897-903, 2008.

Histone Methylation near TSS

From Barski et al. Cell 129 (4):823-837, 2007.

Epigenetic Modifications at Insulators and Enhancers

Active gene promoters are characterized by having active modification marks both surrounding and downstream of the TSSs

From Wang,Z., Schones,D.E., and Zhao,K. (2009). Characterization of human epigenomes. Curr. Opin. Genet. Dev.

The Paired End diTag (PET) strategy for sequencing

- Dimerized PET sequencing: 454 GS FLX
- or single PET sequencing: Solexa GAII
- or single PET sequencing: ABI SOLiD II
- or concatenated PET sequencing: ABI 3730
- Mapping PETs to reference genome

Third-Generation Sequencing

Engineered α-hemolysin protein (shown in blue) is introduced into a planar lipid bilayer, which acts as an artificial biological membrane.

The lipid bilayer has a high electrical resistance and so when an electrical potential is applied across this membrane, a current flows only through the nanopore, carried by the ions in salt solutions that bathe both sides of the bilayer.

The lipid bilayer and nanopore are placed in a well that contains two electrodes on either side of the bilayer.

DNA sample is introduced into the top layer. As the exonuclease (shown here in green) directs individual DNA bases, in sequence, through the nanopore, each base transiently binds at the binding site (cyclodextrin, shown here in red).

During the binding event, the current through the nanopore is disturbed, creating a characteristic signal for each type of base. The signal for each base can be easily distinguished.

The electrical current trace provides a record of the sequence of bases passing through the nanopore.
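
The idea that each base leaves a characteristic current signal can be illustrated with a toy base caller that assigns every binding event to the base whose reference level is closest. This sketch is purely illustrative: the current levels and the event values are hypothetical numbers chosen for the example, not measurements from Oxford Nanopore Technologies, and real base calling has to handle noise, dwell times and missed events.

```python
# Hypothetical mean residual current (pA) for each base: illustrative values only.
REFERENCE_LEVELS = {"A": 40.0, "C": 45.0, "G": 35.0, "T": 50.0}


def call_base(current: float) -> str:
    """Assign one binding event to the base with the nearest reference level."""
    return min(REFERENCE_LEVELS, key=lambda base: abs(REFERENCE_LEVELS[base] - current))


def call_sequence(events) -> str:
    """Translate an ordered series of event currents into a base sequence."""
    return "".join(call_base(current) for current in events)


# Hypothetical trace: one mean current value per binding event, in order of passage.
events = [39.6, 44.8, 35.3, 50.9, 49.5]
sequence = call_sequence(events)
print(sequence)
```

Because the bases pass through the pore in order, the order of events in the trace is the order of bases in the read.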

TGS: Third-Generation Sequencing (1)

From Oxford Nanopore Technologies 2009 (http://www.nanoporetech.com/)

TGS: Third-Generation Sequencing (2)

To achieve high-throughput sequencing, this system will be run in parallel in an array chip.

Multiple microwells are arrayed on silicon, with individual nanopore sequencing units in each well.

Fragmented DNA is sequenced in parallel in multiple wells. Long read lengths are possible.

Data gathered from each well is combined for data reassembly.

From Oxford Nanopore Technologies 2009 (http://www.nanoporetech.com/)

TGS: Third-Generation Sequencing (3)

The instrumentation required to operate the array chip and record the resulting electrical signals does not require optics.

Direct electrical detection and potential long read lengths promise simpler bioinformatics.

From Oxford Nanopore Technologies 2009 (http://www.nanoporetech.com/)

TGS: Third-Generation Sequencing (4)

α-hemolysin nanopore (ribbon diagram) with covalently attached cyclodextrin (teal blue) transiently binds a base (red) traversing the pore.

From Clarke et al. Nat. Nanotechnol. 4:265-270, 2009

TGS: Third-Generation Sequencing (5)

Structures of haemolysin mutants

From Clarke et al. Nat. Nanotechnol. 4:265-270, 2009

TGS: Third-Generation Sequencing (6)

Nucleotide event distributions

Informatics

Challenges of short read re-sequencing

• Many short reads cannot be uniquely aligned because they map to multiple regions in the genome.
• RepeatMasker does not identify many of these 30-50 bp “micro-repeats”.
• The size and complexity of the human genome require extra caution to ensure that variant-containing reads are accurately placed and that multiply-placed reads are not considered further.
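The multiple-placement problem can be sketched with a toy exact-match aligner that labels each read as unique, multiple or unmapped. The function names and the tiny example genome are assumptions made for illustration; real aligners tolerate mismatches, search both strands and use indexed genomes.

```python
def find_all(genome, read):
    """Return every start position (overlaps allowed) where read matches genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


def classify_reads(reads, genome):
    """Label each read 'unique', 'multiple' or 'unmapped' by its number of exact hits."""
    labels = {}
    for read in reads:
        hits = len(find_all(genome, read))
        if hits == 0:
            labels[read] = "unmapped"
        elif hits == 1:
            labels[read] = "unique"
        else:
            labels[read] = "multiple"
    return labels


if __name__ == "__main__":
    genome = "ACGTACGTTTGACCA"  # toy sequence containing a short repeat (ACGT)
    print(classify_reads(["ACGT", "TTGACC", "GGGG"], genome))
```

Only the reads labelled `unique` would be carried forward for variant discovery under the caution described above.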

• Laboratory Information Management System
  - Track samples
  - Track laboratory processes in the database
  - Generate reports

• All Information Management System
  - Track analysis
  - Track disk space
  - Schedule batch processes

• The sheer number and variety of programs, command line options, work flows, versions, platforms, runs, etc. make it infeasible, even undesirable, to settle on a single solution at present.

• As such, we are tracking everything in a detailed way.

From LIMS to AIMS

Conclusions

• Second-generation sequencing can address human genomes either by whole-genome sequencing or by targeted approaches. The cost gap between these two approaches is narrowing. It will most probably be closed shortly by nano-sequencing, the third-generation sequencing.

• Many challenges remain for accurate bioinformatics-based analysis pipelines that map reads, discover mutations and indels, and correlate data across samples.

Planned NGS Applications at Hotel Express Transcriptome platform
http://tagc.univ-mrs.fr

From Myers and Wold, Nat Methods, 5:19-21, 2008

IV. Glossary

Matrix attachment region (MAR)/scaffold attachment region (SAR): DNA sequence that binds the nuclear scaffold and can affect transcription. These elements form higher-order looped structures within chromosomes and influence gene expression by separating chromosomes into regulatory domains.

Silencer: Control element that suppresses gene expression independent of orientation or distance.

Insulator (also boundary element): Insulator elements affect gene expression by preventing the spread of heterochromatin and by restricting transcriptional enhancers from activating unrelated promoters. In vertebrates, insulator function requires association with the CCCTC-binding factor (CTCF), a protein that recognizes long and diverse nucleotide sequences.

Locus control region (LCR): Confers tissue-specific temporally regulated expression of linked genes. LCRs function independently of position, but they are copy number dependent and open the nucleosome structure so that other factors can bind. LCRs affect replication timing and origin usage.

Enhancer: Control element that elevates the levels of transcription from a promoter, independent of orientation or distance.

Promoter: Sequence of DNA near the 5' end of a gene that acts as a binding site for RNA polymerase and from which transcription is initiated.

cis-acting regulatory elements

Components of Eukaryotic Promoters and Regulatory Regions

• Site selector elements: TATA-box, Initiator
• Common upstream elements: CCAAT-box, GC-box
• Regulatory elements: HSE, SRE, GRE, etc.
• Enhancers / Silencers
• Locus control regions (LCRs)
• Scaffold / Matrix attachment sites (SARs / MARs)
• Insulator (CTCF)
• CpG islands

Promoter Regulatory Elements: Features and Facts

• Degenerate sequence motifs
• Length: 6 to 20 bp
• Low complexity (8-12 bits)
• Binding sites of transcription factors
• Excess of binding sites over binding proteins in the nucleus
• Most in vitro binding sites are not functional in vivo
• Some in vivo binding sites are also not functional
• Regulatory potential depends on cooperative effects between multiple elements
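The "bits" figure for a degenerate motif is usually read as its information content. As a minimal sketch, assuming a uniform background (25% per base) and no small-sample correction, the information content of a count matrix can be computed as below. The function names and the example matrix are hypothetical.

```python
import math


def column_information(counts):
    """Information content (bits) of one motif position from A, C, G, T counts.

    Assumes a uniform background, so the maximum is 2 bits per position.
    """
    total = sum(counts)
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy


def motif_information(matrix):
    """Sum the per-position information content over all motif positions."""
    return sum(column_information(column) for column in matrix)


if __name__ == "__main__":
    # Hypothetical 3-position motif: invariant, two-fold degenerate, uninformative.
    example = [[10, 0, 0, 0], [5, 5, 0, 0], [4, 4, 4, 4]]
    print(motif_information(example))
```

Under these assumptions a fully conserved 6 bp site would carry 12 bits, which is why motifs of 6 to 20 degenerate positions typically fall in the 8-12 bit range quoted on the slide.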

V. Additional information

Copyright ©2006 American Association for Cancer Research

Wu, J. et al. Cancer Res 2006;66:6899-6902

Strategies for the Design of Microarrays for the Human Genome

Single gene or selected regions

Horak et al. (2002) PNAS 99:2924-2929: overlapping PCR products

Cawley et al. (2004) Cell 116:499-509: tiling oligonucleotides

Gene collection (e.g. RefSeq)

Blais et al. (2005) Genes & Dev 19:1-17

Boyer et al. (2005) Cell 122:947-956

1 kb PCR products

60-mer covering -8 kb, +2 kb for 17,917 annotated human genes

• CpG dinucleotides are present at 20% of the predicted frequency
• CpG islands: >200 bp long, >50% G+C, observed CpG >0.6 of predicted
• CpG islands account for 1% of the genome
• 29,000 CpG islands are predicted in the human genome
• ~60% of known genes have a CpG island near the 5’ end
• CpG island microarrays are promoter- and regulatory region-enriched arrays
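The three CpG island criteria listed above (length, G+C content, observed/predicted CpG) can be checked on a sequence with a few lines of code. This is a sketch under the thresholds stated on the slide; the function names are hypothetical, and genome-wide predictors typically scan with sliding windows rather than testing a whole sequence at once.

```python
def cpg_stats(seq):
    """Return (length, G+C fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    observed_cpg = seq.count("CG")
    gc_fraction = (c_count + g_count) / length if length else 0.0
    # Expected CpG count if C and G were distributed independently.
    expected_cpg = c_count * g_count / length if length else 0.0
    ratio = observed_cpg / expected_cpg if expected_cpg else 0.0
    return length, gc_fraction, ratio


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the slide's criteria: >200 bp, >50% G+C, observed/expected CpG >0.6."""
    length, gc_fraction, ratio = cpg_stats(seq)
    return length > min_len and gc_fraction > min_gc and ratio > min_ratio


if __name__ == "__main__":
    print(looks_like_cpg_island("CG" * 150))  # True
    print(looks_like_cpg_island("AT" * 150))  # False
```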

CpG islands

Weinmann et al. (2002) Genes & Dev. 16:235-244

Oberley et al. (2004) Methods Enzymol. 376:315-334

From Rémi Houlgatte – Inserm UMR915, Nantes, France

From Rémi Houlgatte – INSERM ERM206, Marseille, France

CpG Island Localization

From Julia Zeitlinger - Whitehead Institute, UC Davis, USA

From Boyer, L. A., T. I. Lee, M. F. Cole, S. E. Johnstone, S. S. Levine, J. P. Zucker, M. G. Guenther, R. M. Kumar, H. L. Murray, R. G. Jenner, D. K. Gifford, D. A. Melton, R. Jaenisch, and R. A. Young. 2005. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122:947-956.

From T. Y. Roh, W. C. Ngau, K. Cui, D. Landsman, and K. Zhao. High-resolution genome-wide mapping of histone modifications. Nat Biotechnol. 22 (8):1013-1016, 2004.

Genome-Wide Mapping Technique (GMAT) or Serial Analysis of Chromatin Occupancy (SACO)

Alternative Approaches…

From Bas van Steensel, Netherlands Cancer Institute, Amsterdam

From E. Mardis. Trends Genet., 24:133-141, 2008

Polymerase-based sequencing-by-synthesis

RNA Sequencing

RNA isolate
• Size selection for ncRNA classes
• Fragment, RT w/ randoms
• polyA priming, RT, ds DNA, SAGE tags

Adapter-ligated fragments for 2nd-gen sequencing

Alignment to reference database & discovery

Proof of concept in yeast:

Ren, B., F. Robert, J. J. Wyrick, O. Aparicio, E. G. Jennings, I. Simon, J. Zeitlinger, J. Schreiber, N. Hannett, E. Kanin, T. L. Volkert, C. J. Wilson, S. P. Bell, and R. A. Young. 2000. Genome-wide location and function of DNA binding proteins. Science 290:2306-2309.

Wells, J., K. E. Boyd, C. J. Fry, S. M. Bartley, and P. J. Farnham. 2000. Target gene specificity of E2F and pocket protein family members in living cells. Mol. Cell. Biol. 20:5797-5807.

Studies in mammals (including Homo sapiens):

Horak, C. E., M. C. Mahajan, N. M. Luscombe, M. Gerstein, S. M. Weissman, and M. Snyder. 2002. GATA-1 binding sites mapped in the beta-globin locus by using mammalian chIp-chip analysis. Proc. Natl. Acad. Sci. U. S. A. 99:2924-2929.

Weinmann, A. S., P. S. Yan, M. J. Oberley, T. H. Huang, and P. J. Farnham. 2002. Isolating human transcription factor targets by coupling chromatin immunoprecipitation and CpG island microarray analysis. Genes Dev. 16:235-244.

Kirmizis, A., S. M. Bartley, A. Kuzmichev, R. Margueron, D. Reinberg, R. Green, and P. J. Farnham. 2004. Silencing of human polycomb target genes is associated with methylation of histone H3 Lys 27. Genes Dev. 18:1592-1605.

Heisler, L. E., D. Torti, P. C. Boutros, J. Watson, C. Chan, N. Winegarden, M. Takahashi, P. Yau, T. H. Huang, P. J. Farnham, I. Jurisica, J. R. Woodgett, R. Bremner, L. Z. Penn, and S. D. Der. 2005. CpG Island microarray probe sequences derived from a physical library are representative of CpG Islands annotated on the human genome. Nucleic Acids Res. 33:2952-2961.

Kim, T. H., L. O. Barrera, M. Zheng, C. Qu, M. A. Singer, T. A. Richmond, Y. Wu, R. D. Green, and B. Ren. 2005. A high-resolution map of active promoters in the human genome. Nature 436:876-880.

Transcriptional regulatory networks:

Yeast: Lee, T. I., N. J. Rinaldi, F. Robert, D. T. Odom, Z. Bar-Joseph, G. K. Gerber, N. M. Hannett, C. T. Harbison, C. M. Thompson, I. Simon, J. Zeitlinger, E. G. Jennings, H. L. Murray, D. B. Gordon, B. Ren, J. J. Wyrick, J. B. Tagne, T. L. Volkert, E. Fraenkel, D. K. Gifford, and R. A. Young. 2002. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298:799-804.

Homo sapiens: Boyer, L. A., T. I. Lee, M. F. Cole, S. E. Johnstone, S. S. Levine, J. P. Zucker, M. G. Guenther, R. M. Kumar, H. L. Murray, R. G. Jenner, D. K. Gifford, D. A. Melton, R. Jaenisch, and R. A. Young. 2005. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122:947-956.

Reviews and methods:

Oberley, M. J., J. Tsao, P. Yau, and P. J. Farnham. 2004. High-throughput screening of chromatin immunoprecipitates using CpG-island microarrays. Methods Enzymol. 376:315-334.

Ren, B. and B. D. Dynlacht. 2004. Use of chromatin immunoprecipitation assays in genome-wide location analysis of mammalian transcription factors. Methods Enzymol. 376:304-315.

Bioinformatics of promoters and regulatory regions:

Liu, Y., L. Wei, S. Batzoglou, D. L. Brutlag, J. S. Liu, and X. S. Liu. 2004. A suite of web-based programs to search for transcriptional regulatory motifs. Nucleic Acids Res. 32:W204-W207.

Some references...

Some references

Kiriakidou, M., P. T. Nelson, A. Kouranov, P. Fitziev, C. Bouyioukos, Z. Mourelatos, and A. Hatzigeorgiou. 2004. A combined computational-experimental approach predicts human microRNA targets. Genes Dev. 18:1165-1178.

Barski, A., S. Cuddapah, K. Cui, T. Y. Roh, D. E. Schones, Z. Wang, G. Wei, I. Chepelev, and K. Zhao. 2007. High-resolution profiling of histone methylations in the human genome. Cell 129:823-837.

Dahl, F., J. Stenberg, S. Fredriksson, K. Welch, M. Zhang, M. Nilsson, D. Bicknell, W. F. Bodmer, R. W. Davis, and H. Ji. 2007. Multigene amplification and massively parallel sequencing for cancer mutation discovery. Proc. Natl. Acad. Sci. U. S. A. 104:9387-9392.

Euskirchen, G. M., J. S. Rozowsky, C. L. Wei, W. H. Lee, Z. D. Zhang, S. Hartman, O. Emanuelsson, V. Stolc, S. Weissman, M. B. Gerstein, Y. Ruan, and M. Snyder. 2007. Mapping of transcription factor binding regions in mammalian cells by ChIP: comparison of array- and sequencing-based technologies. Genome Res. 17:898-909.

Johnson, D. S., A. Mortazavi, R. M. Myers, and B. Wold. 2007. Genome-wide mapping of in vivo protein-DNA interactions. Science 316:1497-1502.

Lin, C. Y., V. B. Vega, J. S. Thomsen, T. Zhang, S. L. Kong, M. Xie, K. P. Chiu, L. Lipovich, D. H. Barnett, F. Stossi, A. Yeo, J. George, V. A. Kuznetsov, Y. K. Lee, T. H. Charn, N. Palanisamy, L. D. Miller, E. Cheung, B. S. Katzenellenbogen, Y. Ruan, G. Bourque, C. L. Wei, and E. T. Liu. 2007. Whole-genome cartography of estrogen receptor alpha binding sites. PLoS. Genet. 3:e87.

Lu, C., B. C. Meyers, and P. J. Green. 2007. Construction of small RNA cDNA libraries for deep sequencing. Methods 43:110-117.

Mardis, E. R. 2007. ChIP-seq: welcome to the new frontier. Nat. Methods 4:613-614.

Porreca, G. J., K. Zhang, J. B. Li, B. Xie, D. Austin, S. L. Vassallo, E. M. LeProust, B. J. Peck, C. J. Emig, F. Dahl, Y. Gao, G. M. Church, and J. Shendure. 2007. Multiplex amplification of large sets of human exons. Nat. Methods 4:931-936.

Schmid, C. D. and P. Bucher. 2007. ChIP-Seq data reveal nucleosome architecture of human promoters. Cell 131:831-832.

Zhao, X. D., X. Han, J. L. Chew, J. Liu, K. P. Chiu, A. Choo, Y. L. Orlov, W. K. Sung, A. Shahab, V. A. Kuznetsov, G. Bourque, S. Oh, Y. Ruan, H. H. Ng, and C. L. Wei. 2007. Whole-genome mapping of histone h3 lys4 and 27 trimethylations reveals distinct genomic compartments in human embryonic stem cells. Cell Stem Cell 1:286-298.

Chi, K. R. 2008. The year of sequencing. Nat. Methods 5:11-14.

Mardis, E. R. 2008. The impact of next-generation sequencing technology on genetics. Trends Genet. 24:133-141.

Pop, M. and S. L. Salzberg. 2008. Bioinformatics challenges of new sequencing technology. Trends Genet. 24:142-149.

Schones, D. E., K. Cui, S. Cuddapah, T. Y. Roh, A. Barski, Z. Wang, G. Wei, and K. Zhao. 2008. Dynamic regulation of nucleosome positioning in the human genome. Cell 132:887-898.

Schuster, S. C. 2008. Next-generation sequencing transforms today's biology. Nat. Methods 5:16-18.

Shendure, J. A., G. J. Porreca, and G. M. Church. 2008. Overview of DNA sequencing strategies. Curr. Protoc. Mol. Biol. Chapter 7:Unit.

von Bubnoff, A. 2008. Next-generation sequencing: the race is on. Cell 132:721-723.

Wold, B. and R. M. Myers. 2008. Sequence census methods for functional genomics. Nat. Methods 5:19-21.

Regulatory region databases

PromoSer: The mammalian promoter service
http://biowulf.bu.edu/zlab/PromoSer

ORegAnno: Open Regulatory Annotation
http://www.oreganno.org

PAZAR: A public database of transcription factor and regulatory sequence annotation
http://www.pazar.info

Regulatory region analysis

PIPMaker:
http://pipmaker.bx.psu.edu/pipmaker

VISTA Tools: mVISTA and rVISTA
http://genome.lbl.gov/vista/

DCODE.org Comparative Genomics Center: Comparing genomes to decipher the code of gene regulation
http://www.dcode.org

Genomatix software GmbH:
http://www.genomatix.de

General databases and tools

UCSC Genome Browser:
http://genome.ucsc.edu

Galaxy website:
http://www.bx.psu.edu/cgi-bin/trac.cgi

IdConvert: http://idconverter.bioinfo.cnio.es

DNA fractionation

[Gel figure: size markers at 20, 10, 5, 2, 1 and 0.5 kb; fractions of 1 kb, 5 kb, 10 kb and 20 kb]

The SOLiD-PET Approach

Cancer genomic DNA

circularization

EcoP15I cut

Purification & PCR

SOLiD sequencing

Mapping PETs to reference genome

PET Mapping

Reference genome sequence

Paired end tag (PET)

Yijun RUAN
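Mapping PETs to a reference amounts to placing both tags of a pair and keeping only placements where they lie in the expected order and within the expected span. The sketch below assumes exact matching on one strand; the function names, the toy reference and the `max_span` parameter are hypothetical and are not taken from the PET pipeline itself.

```python
def find_all(genome, tag):
    """Return every start position where tag matches genome exactly."""
    positions = []
    start = genome.find(tag)
    while start != -1:
        positions.append(start)
        start = genome.find(tag, start + 1)
    return positions


def map_pet(genome, tag5, tag3, max_span):
    """Return (start, end) placements where tag5 precedes tag3 within max_span bases."""
    placements = []
    for start5 in find_all(genome, tag5):
        for start3 in find_all(genome, tag3):
            end = start3 + len(tag3)
            in_order = start3 >= start5 + len(tag5)
            if in_order and end - start5 <= max_span:
                placements.append((start5, end))
    return placements


if __name__ == "__main__":
    # Toy reference in which the 5' tag occurs twice and the 3' tag once.
    reference = "GATTACA" + "TTTTT" + "CCGGA" + "A" * 10 + "GATTACA"
    print(map_pet(reference, "GATTACA", "CCGGA", max_span=20))
```

In this toy case the 5' tag alone is ambiguous, but the pairing constraint leaves a single placement, which is the main advantage of paired tags over single tags.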

Characteristics of the SOLiD™ System: Scalability, Throughput & Flexibility

Scalability
1. # of samples per slide
2. Sample multiplexing (20 barcodes)

Throughput
• # of beads per slide (increasing bead density: from 55,000 to 120,000 beads / panel)

Flexibility
• # of slides (2 independent flow cells)

SOLiD™ Sequencing Workflow: Open, Flexible & Standardized

Sample Collection → Library Construction → ePCR & Deposition → Sequencing Reaction → Data Analysis

• Sample collection: your DNA or RNA sample collection and purification method
• Library construction: experiment-specific prep, fragmentation
• ePCR & deposition: emulsion PCR, deposition
• Sequencing reaction: imaging
• Data analysis: image analysis, color calling, alignment, results

• ChIP Seq
• CGH Seq (CNV)
• Methylation Studies
• Meta-Genomics
• Whole Transcriptome
• Small RNA Profiling
• Single Cell Transcriptome
• 5’-SAGE / 3’-SAGE / CAGE
• Whole Genome Re-sequencing
• Targeted Re-sequencing
• Deep Sequencing
• De Novo Sequencing

SOLiD, Applied Biosystems

Examples for Coverage, Throughput & Multiplexing

Estimation of # of samples based on current throughput (10 Gb/slide) and # of tags

1 compartment (10 Gb / 200 M beads)
• Human whole genome (5 kb mate-pair library, 2x50 bp, ~3x coverage)
OR
• Whole transcriptome (50 bp fragment library, 1–4 samples)

4 compartments (8 Gb / 160 M beads; 40 M mapped tags / compartment)
• Bacterial whole GEX profiling (gene expression): sample 1, normal
• Small RNA profiling (gene expression): sample 2, tumor
• Deep re-sequencing (e.g. 100 kb target, 1,000X coverage/sample): samples 3-13 (pool)
• Whole microbial genome sequencing (5 Mb, 100X coverage): samples 14-18
• Tag counts shown: 10 M tags; 1–2 M tags per sample; e.g. ~10 Mb

8 compartments (6.6 Gb / 132 M beads; 16.5 M mapped tags / compartment)
• ChIP 1-8
• ChIP 9-16
• 3’-SAGE profiling 1-8
• CAGE profiling 9-16
• SAGE profiling 11-20
• Bacterial re-sequencing (4 Mb, 100x)
• SNP discovery (10 Mb, 28x)
• Global methylation, sample A
• Global methylation, sample B
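The coverage figures in these examples follow from simple arithmetic: coverage is the number of sequenced bases divided by the target size. The sketch below reproduces a few of them. The haploid human genome size of 3.1 Gb is an assumption not stated on the slide, and the function names are hypothetical.

```python
def coverage(total_bases, target_size):
    """Average fold coverage obtained from total_bases over a target of target_size bases."""
    return total_bases / target_size


def bases_needed(target_size, target_coverage):
    """Number of sequenced bases required to reach target_coverage on the target."""
    return target_size * target_coverage


def tags_per_compartment(total_tags, compartments):
    """Split the mapped tags of one slide evenly across its compartments."""
    return total_tags // compartments


if __name__ == "__main__":
    human_genome = 3.1e9  # assumed haploid genome size in bases
    print(round(coverage(10e9, human_genome), 1))      # one slide at 10 Gb
    print(bases_needed(4_000_000, 100))                # 4 Mb bacterial genome at 100x
    print(tags_per_compartment(132_000_000, 8))        # 8-compartment slide
    print(tags_per_compartment(160_000_000, 4))        # 4-compartment slide
```

With these inputs one slide gives roughly 3x human coverage, a 4 Mb bacterial genome at 100x needs 400 Mb, and the per-compartment tag counts match the 16.5 M and 40 M figures quoted for the 8- and 4-compartment layouts.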

Developing Solutions that Take You from Sample to Results

Application: Sample Prep / Library Prep / Sequencing / Analysis

ChIP
• Sample prep: NA
• Library prep: Small Sample Protocol
• Sequencing: SOLiD™ Analyzer; SOLiD™ Fragment Library Sequencing Kit
• Analysis: Third Party Tools: Softgenetics (NextGENe)

Resequencing
• Sample prep: BloodPrep® DNA Chemistry; Agilent Array Enrichment; LR PCR
• Library prep: SOLiD™ Mate-Paired Library Oligos Kit; Sample Multiplex Analysis
• Sequencing: SOLiD™ Analyzer; SOLiD™ Mate-Paired Library Sequencing Kit
• Analysis: AB tools: Corona, Map. GFF, SRF

de Novo
• Sample prep: BloodPrep® DNA Chemistry
• Library prep: SOLiD™ Mate-Paired Library Oligos Kit
• Sequencing: SOLiD™ Analyzer; SOLiD™ Mate-Paired Library Sequencing Kit
• Analysis: Academic Tools: Shrimp, Velvet; Third Party Tools: Softgenetics (NextGENe)

Whole Transcriptome
• Sample prep: MagMax™ Total RNA Isolation Kits
• Library prep: SOLiD™ Whole Transcriptome Analysis kit (coming soon)
• Sequencing: SOLiD™ Analyzer; SOLiD™ Mate-Paired Library Sequencing Kit
• Analysis: AB Tools: WT Analysis tools (coming soon)

Small RNA
• Sample prep: mirVana™ miRNA Isolation Kit
• Library prep: SOLiD™ Small RNA Expression kit; Sample Multiplex Analysis
• Sequencing: SOLiD™ Analyzer; SOLiD™ Fragment Library Sequencing Kit
• Analysis: AB tools: RNA2 Map. GFF, SRF; Third Party tools: InterRNA

SOLiD™ 3 System Specification Summary

Service and Support
• 800 Dedicated Service and Support
• Large SOLiD™ Support Team incl. Bioinformatics Spec.

Scalable / Flexible
• Two independent flow cells process two slides per run
• Open slide format with 1–8 samples / slide
• 20 barcodes: 160 samples / slide, 320 samples / run

Read Length
• Fragment: up to 50 bases (R&D demonstrated: 75 bp)
• Mate-paired: up to 2x50 bases (insert sizes: 0.6–10 kb)

Applications (Protocols, kits)
• Whole Genome and Targeted Resequencing
• Gene Expression, Transcriptome Analysis, ChIP Seq, Methylation Analysis, Structural Variation
• de novo sequencing

Run Time
• 35 bp fragment: 3.5 days; 50 bp fragment: 5-6 days
• 2 x 50 bp mate-pair run: 12–14 days

Throughput / Run
• Fragment: 10-15 Gb / 200-300 M tags (mappable)
• Mate-paired: 20–30 Gb / 400-600 M tags (mappable)

Accuracy
• Overall sequence 99.94%
• Consensus 99.999% @15X
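The throughput and multiplexing figures in this summary are mutually consistent, which a short calculation can confirm. This is only an arithmetic check under the assumption that each mappable tag is 50 bases long; the function names are hypothetical.

```python
def run_yield(tags, read_length):
    """Total mappable bases for a run: number of tags times bases per tag."""
    return tags * read_length


def samples_per_run(barcodes, samples_per_slide, slides_per_run):
    """Maximum number of multiplexed samples in one run."""
    return barcodes * samples_per_slide * slides_per_run


if __name__ == "__main__":
    # Fragment run: 200-300 M tags of 50 bases each.
    print(run_yield(200_000_000, 50), run_yield(300_000_000, 50))
    # Mate-paired run: 400-600 M tags of 50 bases each.
    print(run_yield(400_000_000, 50), run_yield(600_000_000, 50))
    # 20 barcodes, up to 8 samples per slide, 2 slides per run.
    print(samples_per_run(20, 8, 1), samples_per_run(20, 8, 2))
```

The results (10-15 Gb, 20-30 Gb, 160 samples per slide and 320 per run) match the values listed above.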

SOLiD™ System List Prices per Slide & Sample

Selected Genome-Wide Studies (NGS, etc.) 2006-2009

Hawkins,R.D. and Ren,B. (2006). Genome-wide location analysis: insights on transcriptional regulation. Hum. Mol. Genet 15 Spec No 1, R1-R7.

Kim,T.H. and Ren,B. (2006). Genome-Wide Analysis of Protein-DNA Interactions. Annu. Rev Genomics Hum. Genet 7, 81-102.

Kim,T.H. and Ren,B. (2006). An all-round view of eukaryotic transcription. Genome Biol. 7, 323.

Loh,Y.H., Wu,Q., Chew,J.L., Vega,V.B., Zhang,W., Chen,X., Bourque,G., George,J., Leong,B., Liu,J., Wong,K.Y., Sung,K.W., Lee,C.W., Zhao,X.D., Chiu,K.P., Lipovich,L., Kuznetsov,V.A., Robson,P., Stanton,L.W., Wei,C.L., Ruan,Y., Lim,B., and Ng,H.H. (2006). The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nat Genet 38, 431-440.

Roh,T.Y., Cuddapah,S., Cui,K., and Zhao,K. (2006). The genomic landscape of histone modifications in human T cells. Proceedings of the National Academy of Sciences 103, 15782-15787.

Zeller,K.I., Zhao,X., Lee,C.W., Chiu,K.P., Yao,F., Yustein,J.T., Ooi,H.S., Orlov,Y.L., Shahab,A., Yong,H.C., Fu,Y., Weng,Z., Kuznetsov,V.A., Sung,W.K., Ruan,Y., Dang,C.V., and Wei,C.L. (2006). Global mapping of c-Myc binding sites and target gene networks in human B cells. Proc. Natl. Acad. Sci. U. S. A 103, 17834-17839.

Barski,A., Cuddapah,S., Cui,K., Roh,T.Y., Schones,D.E., Wang,Z., Wei,G., Chepelev,I., and Zhao,K. (2007). High-resolution profiling of histone methylations in the human genome. Cell 129, 823-837.

Birney,E., Stamatoyannopoulos,J.A., Dutta,A., Guigo,R., Gingeras,T.R., Margulies,E.H., Weng,Z., Snyder,M., Dermitzakis,E.T., Thurman,R.E., Kuehn,M.S., Taylor,C.M., Neph,S., Koch,C.M., Asthana,S., Malhotra,A., Adzhubei,I., Greenbaum,J.A., Andrews,R.M., Flicek,P., Boyle,P.J., Cao,H., Carter,N.P., Clelland,G.K., Davis,S., Day,N., Dhami,P., Dillon,S.C., Dorschner,M.O., Fiegler,H., Giresi,P.G., Goldy,J., Hawrylycz,M., Haydock,A., Humbert,R., James,K.D., Johnson,B.E., Johnson,E.M., Frum,T.T., Rosenzweig,E.R., Karnani,N., Lee,K., Lefebvre,G.C., Navas,P.A., Neri,F., Parker,S.C., Sabo,P.J., Sandstrom,R., Shafer,A., Vetrie,D., Weaver,M., Wilcox,S., Yu,M., Collins,F.S., Dekker,J., Lieb,J.D., Tullius,T.D., Crawford,G.E., Sunyaev,S., Noble,W.S., Dunham,I., Denoeud,F., Reymond,A., Kapranov,P., Rozowsky,J., Zheng,D., Castelo,R., Frankish,A., Harrow,J., Ghosh,S., Sandelin,A., Hofacker,I.L., Baertsch,R., Keefe,D., Dike,S., Cheng,J., Hirsch,H.A., Sekinger,E.A., Lagarde,J., Abril,J.F., Shahab,A., Flamm,C., Fried,C., Hackermuller,J., Hertel,J., Lindemeyer,M., Missal,K., Tanzer,A., Washietl,S., Korbel,J., Emanuelsson,O., Pedersen,J.S., Holroyd,N., Taylor,R., Swarbreck,D., Matthews,N., Dickson,M.C., Thomas,D.J., Weirauch,M.T., Gilbert,J., Drenkow,J., Bell,I., Zhao,X., Srinivasan,K.G., Sung,W.K., Ooi,H.S., Chiu,K.P., Foissac,S., Alioto,T., Brent,M., Pachter,L., Tress,M.L., Valencia,A., Choo,S.W., Choo,C.Y., Ucla,C., Manzano,C., Wyss,C., Cheung,E., Clark,T.G., Brown,J.B., Ganesh,M., Patel,S., Tammana,H., Chrast,J., Henrichsen,C.N., Kai,C., Kawai,J., Nagalakshmi,U., Wu,J., Lian,Z., Lian,J., Newburger,P., Zhang,X., Bickel,P., Mattick,J.S., Carninci,P., Hayashizaki,Y., Weissman,S., Hubbard,T., Myers,R.M., Rogers,J., Stadler,P.F., Lowe,T.M., Wei,C.L., Ruan,Y., Struhl,K., Gerstein,M., Antonarakis,S.E., Fu,Y., Green,E.D., Karaoz,U., Siepel,A., Taylor,J., Liefer,L.A., Wetterstrand,K.A., Good,P.J., Feingold,E.A., Guyer,M.S., Cooper,G.M., Asimenos,G., Dewey,C.N., Hou,M., Nikolaev,S., 
Montoya-Burgos,J.I., Loytynoja,A., Whelan,S., Pardi,F., Massingham,T., Huang,H., Zhang,N.R., Holmes,I., Mullikin,J.C., Ureta-Vidal,A., Paten,B., Seringhaus,M., Church,D., Rosenbloom,K., Kent,W.J., Stone,E.A., Batzoglou,S., Goldman,N., Hardison,R.C., Haussler,D., Miller,W.,

Sidow,A., Trinklein,N.D., Zhang,Z.D., Barrera,L., Stuart,R., King,D.C., Ameur,A., Enroth,S., Bieda,M.C., Kim,J., Bhinge,A.A., Jiang,N., Liu,J., Yao,F., Vega,V.B., Lee,C.W., Ng,P., Shahab,A., Yang,A., Moqtaderi,Z., Zhu,Z., Xu,X., Squazzo,S., Oberley,M.J., Inman,D., Singer,M.A., Richmond,T.A., Munn,K.J., Rada-Iglesias,A., Wallerman,O., Komorowski,J., Fowler,J.C., Couttet,P., Bruce,A.W., Dovey,O.M., Ellis,P.D., Langford,C.F., Nix,D.A., Euskirchen,G., Hartman,S., Urban,A.E., Kraus,P., Van,C.S., Heintzman,N., Kim,T.H., Wang,K., Qu,C., Hon,G., Luna,R., Glass,C.K., Rosenfeld,M.G., Aldred,S.F., Cooper,S.J., Halees,A., Lin,J.M., Shulha,H.P., Zhang,X., Xu,M., Haidar,J.N., Yu,Y., Ruan,Y., Iyer,V.R., Green,R.D., Wadelius,C., Farnham,P.J., Ren,B., Harte,R.A., Hinrichs,A.S., Trumbower,H., and Clawson,H. (2007). Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799-816.

Chiu,K.P., Ariyaratne,P., Xu,H., Tan,A., Ng,P., Liu,E.T., Ruan,Y., Wei,C.L., and Sung,W.K. (2007). Pathway aberrations of murine melanoma cells observed in Paired-End diTag transcriptomes. BMC. Cancer 7, 109.

Collins,P.J., Kobayashi,Y., Nguyen,L., Trinklein,N.D., and Myers,R.M. (2007). The ets-Related Transcription Factor GABP Directs Bidirectional Transcription. PLoS. Genet. 3, e208.

Cooper,S.J., Trinklein,N.D., Nguyen,L., and Myers,R.M. (2007). Serum response factor binding sites differ in three human cell types. Genome Res. 17, 136-144.

Denoeud,F., Kapranov,P., Ucla,C., Frankish,A., Castelo,R., Drenkow,J., Lagarde,J., Alioto,T., Manzano,C., Chrast,J., Dike,S., Wyss,C., Henrichsen,C.N., Holroyd,N., Dickson,M.C., Taylor,R., Hance,Z., Foissac,S., Myers,R.M., Rogers,J., Hubbard,T., Harrow,J., Guigo,R., Gingeras,T.R., Antonarakis,S.E., and Reymond,A. (2007). Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 17, 746-759.

Euskirchen,G.M., Rozowsky,J.S., Wei,C.L., Lee,W.H., Zhang,Z.D., Hartman,S., Emanuelsson,O., Stolc,V., Weissman,S., Gerstein,M.B., Ruan,Y., and Snyder,M. (2007). Mapping of transcription factor binding regions in mammalian cells by ChIP: comparison of array- and sequencing-based technologies. Genome Res. 17, 898-909.

Heintzman,N.D., Stuart,R.K., Hon,G., Fu,Y., Ching,C.W., Hawkins,R.D., Barrera,L.O., Van,C.S., Qu,C., Ching,K.A., Wang,W., Weng,Z., Green,R.D., Crawford,G.E., and Ren,B. (2007). Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 39, 311-318.

Jeck,W.R., Reinhardt,J.A., Baltrus,D.A., Hickenbotham,M.T., Magrini,V., Mardis,E.R., Dangl,J.L., and Jones,C.D. (2007). Extending assembly of short DNA sequences to handle error. Bioinformatics. 23, 2942-2944.

Johnson,D.S., Mortazavi,A., Myers,R.M., and Wold,B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497-1502.

Kim,T.H., Abdullaev,Z.K., Smith,A.D., Ching,K.A., Loukinov,D.I., Green,R.D., Zhang,M.Q., Lobanenkov,V.V., and Ren,B. (2007). Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 128, 1231-1245.

Kuznetsov,V.A., Orlov,Y.L., Wei,C.L., and Ruan,Y. (2007). Computational analysis and modeling of genome-scale avidity distribution of transcription factor binding sites in chip-pet experiments. Genome Inform. 19, 83-94.

Lim,C.A., Yao,F., Wong,J.J., George,J., Xu,H., Chiu,K.P., Sung,W.K., Lipovich,L., Vega,V.B., Chen,J., Shahab,A., Zhao,X.D., Hibberd,M., Wei,C.L., Lim,B., Ng,H.H., Ruan,Y., and Chin,K.C. (2007). Genome-wide mapping of RELA(p65) binding identifies E2F1 as a transcriptional activator recruited by NF-kappaB upon TLR4 activation. Mol. Cell 27, 622-635.

Lin,C.Y., Vega,V.B., Thomsen,J.S., Zhang,T., Kong,S.L., Xie,M., Chiu,K.P., Lipovich,L., Barnett,D.H., Stossi,F., Yeo,A., George,J., Kuznetsov,V.A., Lee,Y.K., Charn,T.H., Palanisamy,N., Miller,L.D., Cheung,E., Katzenellenbogen,B.S., Ruan,Y., Bourque,G., Wei,C.L., and Liu,E.T. (2007). Whole-genome cartography of estrogen receptor alpha binding sites. PLoS. Genet. 3, e87.

Lin,J.M., Collins,P.J., Trinklein,N.D., Fu,Y., Xi,H., Myers,R.M., and Weng,Z. (2007). Transcription factor binding and modified histones in human bidirectional promoters. Genome Res. 17, 818-827.

Mardis,E.R. (2007). ChIP-seq: welcome to the new frontier. Nat. Methods 4, 613-614.

Ruan,Y., Ooi,H.S., Choo,S.W., Chiu,K.P., Zhao,X.D., Srinivasan,K.G., Yao,F., Choo,C.Y., Liu,J., Ariyaratne,P., Bin,W.G., Kuznetsov,V.A., Shahab,A., Sung,W.K., Bourque,G., Palanisamy,N., and Wei,C.L. (2007). Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res. 17, 828-838.

Xi,H., Shulha,H.P., Lin,J.M., Vales,T.R., Fu,Y., Bodine,D.M., McKay,R.D., Chenoweth,J.G., Tesar,P.J., Furey,T.S., Ren,B., Weng,Z., and Crawford,G.E. (2007). Identification and characterization of cell type-specific and ubiquitous chromatin regulatory structures in the human genome. PLoS Genet. 3, e136.

Zhao,X.D., Han,X., Chew,J.L., Liu,J., Chiu,K.P., Choo,A., Orlov,Y.L., Sung,W.K., Shahab,A., Kuznetsov,V.A., Bourque,G., Oh,S., Ruan,Y., Ng,H.H., and Wei,C.L. (2007). Whole-genome mapping of histone h3 lys4 and 27 trimethylations reveals distinct genomic compartments in human embryonic stem cells. Cell Stem Cell 1, 286-298.

Zheng,D., Frankish,A., Baertsch,R., Kapranov,P., Reymond,A., Choo,S.W., Lu,Y., Denoeud,F., Antonarakis,S.E., Snyder,M., Ruan,Y., Wei,C.L., Gingeras,T.R., Guigo,R., Harrow,J., and Gerstein,M.B. (2007). Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res. 17, 839-851.

Bourque,G., Leong,B., Vega,V.B., Chen,X., Lee,Y.L., Srinivasan,K.G., Chew,J.L., Ruan,Y., Wei,C.L., Ng,H.H., and Liu,E.T. (2008). Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res. 18, 1752-1762.

Chen,X., Xu,H., Yuan,P., Fang,F., Huss,M., Vega,V.B., Wong,E., Orlov,Y.L., Zhang,W., Jiang,J., Loh,Y.H., Yeo,H.C., Yeo,Z.X., Narang,V., Govindarajan,K.R., Leong,B., Shahab,A., Ruan,Y., Bourque,G., Sung,W.K., Clarke,N.D., Wei,C.L., and Ng,H.H. (2008). Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106-1117.

Fullwood,M.J., Tan,J.J., Ng,P.W., Chiu,K.P., Liu,J., Wei,C.L., and Ruan,Y. (2008). The use of multiple displacement amplification to amplify complex DNA libraries. Nucleic Acids Res. 36, e32.

Hillier,L.W., Marth,G.T., Quinlan,A.R., Dooling,D., Fewell,G., Barnett,D., Fox,P., Glasscock,J.I., Hickenbotham,M., Huang,W., Magrini,V.J., Richt,R.J., Sander,S.N., Stewart,D.A., Stromberg,M., Tsung,E.F., Wylie,T., Schedl,T., Wilson,R.K., and Mardis,E.R. (2008). Whole-genome sequencing and variant discovery in C. elegans. Nat. Methods 5, 183-188.

Johnson,D.S., Li,W., Gordon,D.B., Bhattacharjee,A., Curry,B., Ghosh,J., Brizuela,L., Carroll,J.S., Brown,M., Flicek,P., Koch,C.M., Dunham,I., Bieda,M., Xu,X., Farnham,P.J., Kapranov,P., Nix,D.A., Gingeras,T.R., Zhang,X., Holster,H., Jiang,N., Green,R.D., Song,J.S., McCuine,S.A., Anton,E., Nguyen,L., Trinklein,N.D., Ye,Z., Ching,K., Hawkins,D., Ren,B., Scacheri,P.C., Rozowsky,J., Karpikov,A., Euskirchen,G., Weissman,S., Gerstein,M., Snyder,M., Yang,A., Moqtaderi,Z., Hirsch,H., Shulha,H.P., Fu,Y., Weng,Z., Struhl,K., Myers,R.M., Lieb,J.D., and Liu,X.S. (2008). Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets. Genome Res. 18, 393-403.

Liu,X., Wang,L., Zhao,K., Thompson,P.R., Hwang,Y., Marmorstein,R., and Cole,P.A. (2008). The structural basis of protein acetylation by the p300/CBP transcriptional coactivator. Nature 451, 846-850.

Mardis,E.R. (2008). The impact of next-generation sequencing technology on genetics. Trends Genet. 24, 133-141.

Oleksyk,T.K., Zhao,K., De,L., V, Gilbert,D.A., O'Brien,S.J., and Smith,M.W. (2008). Identifying selected regions from heterozygosity and divergence using a light-coverage genomic dataset from two human populations. PLoS. ONE. 3, e1712.

Roh,T.Y. and Zhao,K. (2008). High-resolution, genome-wide mapping of chromatin modifications by GMAT. Methods Mol. Biol. 387, 95-108.

Schones,D.E., Cui,K., Cuddapah,S., Roh,T.Y., Barski,A., Wang,Z., Wei,G., and Zhao,K. (2008). Dynamic regulation of nucleosome positioning in the human genome. Cell 132, 887-898.

Schones,D.E. and Zhao,K. (2008). Genome-wide approaches to studying chromatin modifications. Nat Rev. Genet. 9, 179-191.

Wang,Z., Zang,C., Rosenfeld,J.A., Schones,D.E., Barski,A., Cuddapah,S., Cui,K., Roh,T.Y., Peng,W., Zhang,M.Q., and Zhao,K. (2008). Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet. 40, 897-903.

Wold,B. and Myers,R.M. (2008). Sequence census methods for functional genomics. Nat. Methods 5, 19-21.

Zhao,X., Ruan,Y., and Wei,C.L. (2008). Tackling the epigenome in the pluripotent stem cells. J. Genet. Genomics 35, 403-412.

Barski,A. and Zhao,K. (2009). Genomic location analysis by ChIP-Seq. J. Cell Biochem.

Cui,K., Zang,C., Roh,T.Y., Schones,D.E., Childs,R.W., Peng,W., and Zhao,K. (2009). Chromatin signatures in multipotent human hematopoietic stem cells indicate the fate of bivalent genes during differentiation. Cell Stem Cell 4, 80-93.

Ho,L., Jothi,R., Ronan,J.L., Cui,K., Zhao,K., and Crabtree,G.R. (2009). An embryonic stem cell chromatin remodeling complex, esBAF, is an essential component of the core pluripotency transcriptional network. Proc. Natl. Acad. Sci U. S. A 106, 5187-5191.

Milne,T.A., Zhao,K., and Hess,J.L. (2009). Chromatin Immunoprecipitation (ChIP) for Analysis of Histone Modifications and Chromatin-Associated Proteins. Methods Mol. Biol. 538, 1-15.

Rosenfeld,J.A., Wang,Z., Schones,D.E., Zhao,K., DeSalle,R., and Zhang,M.Q. (2009). Determination of enriched histone modifications in non-genic portions of the human genome. BMC. Genomics 10, 143.

Wang,Z., Schones,D.E., and Zhao,K. (2009). Characterization of human epigenomes. Curr. Opin. Genet. Dev.

Wei,G., Wei,L., Zhu,J., Zang,C., Hu-Li,J., Yao,Z., Cui,K., Kanno,Y., Roh,T.Y., Watford,W.T., Schones,D.E., Peng,W., Sun,H.W., Paul,W.E., O'Shea,J.J., and Zhao,K. (2009). Global mapping of H3K4me3 and H3K27me3 reveals specificity and plasticity in lineage fate determination of differentiating CD4+ T cells. Immunity. 30, 155-167.