deficiency of microrna mir-34a expands cell fate potential ... · viral matrix (ma), capsid (ca)...

76
Deficiency of microRNA miR-34a expands cell fate potential in pluripotent stem cells by YONG JIN CHOI A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Comparative Biochemistry in the Graduate Division of the University of California, Berkeley Committee in charge: Lin He, Chair Fenyong Liu Sangwei Lu Dirk Hockemeyer Summer 2016

Upload: others

Post on 13-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

Deficiency of microRNA miR-34a expands cell fate potential in pluripotent stem cells by

YONG JIN CHOI

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

Comparative Biochemistry

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Lin He, Chair

Fenyong Liu

Sangwei Lu

Dirk Hockemeyer

Summer 2016

Page 2: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

1

Abstract

Deficiency of microRNA miR-34a expands cell fate potential in pluripotent stem cells by

YONG JIN CHOI

Doctor of Philosophy in Comparative Biochemistry

University of California, Berkeley

Professor Lin He, Chair

Embryonic stem cells and induced pluripotent stem cells have pluripotent developmental

potential, efficiently giving rise to all embryonic cell types, but rarely extra-embryonic lineages.

Here, we identify a microRNA miR-34a, whose deficiency in mouse pluripotent stem cells

expands their developmental potential to generate both embryonic and extra-embryonic lineages

in vitro and in vivo. miR-34a-/- pluripotent stem cells with this bidirectional cell fate potential

resemble totipotent 2-cell (2C) blastomeres not only in their cell fate potential, but also in the

key molecular signature, namely a strong induction of the MuERV-L (MERVL) family of

murine endogenous retroviruses (ERVs). miR-34a represses MERVL expression through

transcriptional regulation, at least in part, by repressing the transcription factor GATA-binding

protein 2 (Gata2). Consistently, the miR-34a/Gata2 pathway restricts the acquisition of

bidirectional cell fate potential in pluripotent stem cells. Altogether, our findings provide vital

insights into the complex molecular network that defines and restrict the developmental potential

of pluripotent stem cells.

Page 3: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

i i

Table of Contents Table of Contents ........................................................................................................................... i List of Figures and Tables ............................................................................................................ ii Acknowledgements ...................................................................................................................... iii Chapter 1: Introduction ................................................................................................................1

Pluripotent stem cell ............................................................................................................2 Transposable element ...........................................................................................................3 microRNA ............................................................................................................................4

Chapter 2: Materials and Methods ..............................................................................................6

Chapter 3: Results ........................................................................................................................14

miR-34a-/- pluripotent stem cells exhibit expanded cell fate potential ...............................15 miR-34a-/- pluripotent stem cells exhibit an induction of MERVL ERVs .......................16 MERVL induction in miR-34a-/- pluripotent stem cells is regulated transcriptionally ......17 Gata2 mediates elevated MERVL expression in miR-34a-/- pluripotent stem cells .........18 miR-34a restricts cell fate potential of pluripotent stem cells by directly repressing Gata2 ............................................................................................................................................19

Chapter 4: Conclusions ...............................................................................................................21 References .....................................................................................................................................23

Page 4: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

ii ii

List of Figures and Tables Figure 1: miR-34a-/- pluripotent stem cells exhibit expanded cell fate potential. Figure 2: miR-34a-/- pluripotent stem cells exhibit specific induction of the MERVL ERVs. Figure 3: Gata2 is essential for the MERVL induction in miR-34a-/- pluripotent stem cells. Figure 4: miR-34a restricts cell fate potential of pluripotent stem cells by targeting gata2. Figure S1: miR-34a-/- pluripotent stem cells exhibit an expanded cell fate potential in vitro and in vivo Figure S2: MERVL ERVs are specifically induced in mir-34a-/- pluripotent stem cells. Figure S3: The MERVL induction in miR-34a-/- pluripotent stem cells alters the expression and structure of a subset of MERVL proximal genes. Figure S4: The effect of the length and depth of RNA-seq data on the transcriptional profile characterization of miR-34a-/- iPSCs. Figure S5: Gata2 mediates the MERVL induction in miR-34a-/- pluripotent stem cells. Figure S6: Gata2 is a key target of miR-34a in pluripotent stem cells. Table S1: miR-34a-/- ESCs contribute to both embryonic and extra-embryonic cell lineages in chimeric analyses in vivo Table S2: Expression quantification of all retrotransposon families in wild-type and miR-34a-/-

iPSCs using RNA-seq data Table S3: Expression quantitation of individual MERVL loci and MERVL-related ERV loci in wild-type and miR-34a-/- iPSCs using RNA-seq data Table S4: A summary of genes differentially expressed between wild-type and miR-34a-/- iPSCs using RNA-seq data Table S5: Quantitation of chimeric junction reads between MERVL or MERVL-related loci and proximal protein-coding genes in wild-type and miR-34a-/- iPSCs using RNA-seq data Table S6: The quantitative PCR primers used in this study

Page 5: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

iii iii

Acknowledgements

First and foremost, I would like to express my sincere gratitude to my advisor Professor Lin He for the continuous support of my Ph.D study and related research, for her patience, motivation and immense knowledge. Her aupport and inspiring suggestions have been precious for the development of this thesis.

Besides my advisors, I would also like to thank my committee members: Professor Fenyong Liu, Professor Sangwei Liu and Professor Dirk Hockemeyer for their insightful comments and encouragement, but also for the hard question which incented me to widen research from various perspectives. I thank my fellow lab members for the stimulating discussions. I would never forget all the chats and beautiful moments I shared with them. They were fundamental in supporting me during these stressful and difficult moments. A special thanks goes to Paul (Chao-Po Lin) because without him I would have never completed this wonderful experience in Berkeley. My deepest gratitude goes to my family for all their love and encouragement. For my parents who raised me with a love of science and supported me in all my pursuits. Currently my father fighting cancer after he was diagnosed with cancer of ampulla of vater and my mother helping his fight. What I want him to know is that he isn’t fighting cancer. We are all fighting cancer. I have no doubt that he will fight cancer with the same ferocious determination that he have with other challenges in his life. I know that the great strength he have will prevail. I also would like to thank my sister and brother in law for their unflagging love and unconditional support throughout my studies. And most of all for my loving, supportive, encouraging, and patient my wife and my children whose faithful support during the final stages of this Ph.D is so appreciated.

Page 6: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

1 1

Chapter 1 Introduction

Page 7: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

2 2

Introduction Mouse embryonic stem cells (ESCs) derived from the inner cell mass (ICM) of peri-implantation blastocyst embryos, as well as induced pluripotent stem cells (iPSCs) generated by somatic reprogramming, are classically defined as pluripotent stem cells 1–3. As a population, ESCs and iPSCs efficiently contribute to all embryonic cell types in vitro and in vivo, yet rarely to extra-embryonic cell lineages in placenta and yolk sac 4,5. This restricted developmental potential of mouse pluripotent stem cells contrasts with that of totipotent zygotes and 2C blastomeres, which give rise to both embryonic and extra-embryonic cell lineages during normal development and ultimately generate the entire organism6,7. Differentiated somatic cells can be induced to generate pluripotent stem cells that functionally resemble ESCs, which means iPSCs can differentiate into tissues of all three germ layers, give rise to chimaeric mice, and transmit to the germline8,9. This reprogramming process, rooted in the remarkable cellular plasticity retained during differentiation, can be triggered by exogenous expression of a set of ES-cell specific genes3. Among the best-characterized reprogramming factors are a defined set of transcriptional regulators, Oct4 and Sox2, Klf4 and c-Myc 3,10–14, which constitute the core gene regulatory circuits that coordinately control pluripotency and self-renewal. With the enforced expression of these reprogramming transcription factors, iPSC generation occurs with low efficiency and slow kinetics, reflecting the existence of cellular and molecular barriers as impediments for this process 15. Several studies have revealed considerable mechanistic overlap between somatic cell reprogramming and malignant transformation 16. Many cellular mechanisms that enhance reprogramming, such as increased cell proliferation and cell survival, evasion of DNA damage response and cell immortalization, have long been demonstrated to promote tumorigenesis 17–20; and several potent oncogenes and tumor suppressors are essential regulators for somatic reprogramming 3,15,16. In particular, the inactivation of p53, one of the most important tumor suppressors, significantly enhances iPSC generation 17–19,21–23. As a tumor suppressor, p53’s transcriptional regulation converges onto multiple target genes that collectively mediate its downstream effects 24. Similarly, p53’s role in repressing somatic reprogramming is likely to be mediated through multiple targets as well. To date, the cell-cycle regulator p21 is the only p53 target with a demonstrated role in repressing reprogramming, yet its effect only partially phenocopies that of p53 17,21,25. Thus, additional p53 targets may exist to mediate the effects of p53 on iPSC generation through a yet unidentified mechanism(s). Transposable elements (TE) can be colonized the genomes of all eukaryotes and it can be divided into two main classes accordingly to their mechanism of transposition, retrotransposons and DNA transposons26–29. Retrotransposons, also known as class I transposable elements, replicates through reverse transposition and amplify via a “copy-and-paste” process 30–32. This process involves the transcription of an RNA intermediate by the enzyme machinery of the host cell, the subsequent reversetranscription to cDNA, and the integration into the host genome by the enzymes encoded by the retrotransposon. The class I transposable element is composed of two sub-types, the long terminal repeat (LTR) and the non-LTR retrotransposons30,33. Non-LTR retrotransposons, including the long interspersed nuclear elements (LINE) and short interspersed nuclear elements (SINE), are the most abundant elements in mammalian genomes.

Page 8: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

3 3

Approximately 33% of human genome can be recognized as a non-LTR retrotranspons. LTR retrotransposons, also known as endogenous retroviruses (ERVs), constitute approximately 8 and 10% of the human and mouse genome, respectively and are divided into classe I, class II and class III elements 34,35. Retrotransposition of a subset of ERVs is responsible for up to 10% of all spontaneous mutations in mice and therefore can have detrimental effects on the host fitness36. ERVs are endogenous viral elements that have infected by exogenous retroviruses and then integrated into the host cell. Retroviruses usually infect somatic cells and retroviral genes integrated into genomic DNA are not passed on to host progeny37. However, some types of retrovirus can occasionally infect germ cells, which means these viral elements are maintained as heritable genetic elements in the host species38. When the exogenous retrovirus infect host cell, viral reverse transcriptase make DNA copy of viral genome from RNA viral genome and then the retroviral sequences were integrated into host genome by integrase. The integrated retroviral element is called as a provirus and is recognized as a part of the host genome. The provirus consists of four retroviral genes, gag, pro, pol and env, and two LTRs flanked the 5’ and 3’ region. The gag gene encodes for the core structural proteins of a retrovirus, which include the viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag polyprotein precursor. The pol gene encodes the reverse transcriptase and integrase proteins, which are required for amplification and integration. The env gene is a viral protein that serves to form the viral envelope. However, most of ERVs are defective of retroviral genes due to the self-defense mechanism of hosts. In mice, ERVs are divided into three classes based on the sequence of their pol genes and range greatly in copy number from several to tens of thousands of copies per genome39. They also vary depends on different mouse strains40. First of all, class I ERVs/gammaretroviruses are classified as type C based on virion morphology and constitute up to 0.7% of the mouse genome41. This ERV class members are grouped together based on their similarity to the Moloney Murine Leukemia Virus (MoMLV or MLV), which was identified in leukemic cells of the AKR mutant mouse strain42. Depending on the mouse strain, there are 25 to 70 copies of MLV elements in the genome43. A few members of this class, such as GLN, have the capacity to produce functional virions. Heidmann et al demonstrated that GLN family of highly reiterated ERVs, one copy that encodes retroviral particles prone to infection of mouse cells44. Class II ERVs/betaretroviruses are classified as Type B and D based on their viral particle morphology and comprise of around 3% of the mouse genome45. This class consists of many more members capable of producing functional retroviral particles as compared to class I46,47. Class II ERVs show some sequence similarity to the Mouse Mammary Tumor Virus (MMTV) and Susan et al firstly identified as a factor capable of causing mammary cancers in mice, showing in vertical transmission from mother to offspring via viral particles released into the milk48. Another member of this class includes the well-studied non-infectious family of Intracisternal A Particles (IAPs), which is present in approximately 2000 copies in the mouse genome49,50. Other class II ERVs includes MusD and the closely related Early Transposon (ETn) elements. The LTR sequences of these two families are virtually identical and are present at a combined ~400 copies in the mouse genome40. Class III ERVs, also known as spumavirus-like elements, are the most numerous of all three ERV classes, comprising of about 5.4% of the mouse genome51. These elements consist largely of Murine ERV-L (MERV-L)52. Class III ERVs are amongst the most transcriptionally active and some elements have even been co-opted by the host for use as promoters during specific

Page 9: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

4 4

stages of early embryogenesis53. As retrotransposition is deleterious, numerous pathways have evolved to repress these retroelements54. For example, DNA methylation is required for transcriptional silencing of ERVs in differentiated cells. However, this epigenetic mark is dispensable for silencing of ERV during early stages of mouse embryogenesis and in mouse embryonic stem cells (mESCs). Hutnick et al reported that in demethylated ESCs cultures carrying mutations of DNA methyltransferase I (Dnmt1) show no increased expression of IAPs, which is murine endogenous retroviral repetitive elements, relative to the wild type ESCs and a dramatic increase of IAP mRNA and protein expression was observed upon induction of differentiation through the withdrawal of leukemia-inhibitory factor for 6 or more days suggesting mESCs, in contrast to somatic cells, are a unique stem cell type possessing a DNA methylation-independent IAP repression mechanism55. Interestingly, recent studies reported that histone modification and DNA methylation to function together to repress retrotransposons. Tamaru et al suggested that the H3K9 lysine methyltransferase (KMTase) DIM5 is required for CpG methylation in the filamentous fungus Neurospora crassa56. Lehnertz et al also demonstrated that the related H3K9 KMTase SUV39H1 and SUV39H2 are required for DNA methylation of pericentromeric repeats, a heterochromatic region in mice57. By contrast, DNA methylation is not required for H3K9 methylation of the same region. Thus, H3K9 methylation generally acts upstream of DNA methylation, but the role of this histone modification in silencing of ERVs in mESCs had not been addressed until recently. Interestingly, unlike most differentiated cell types, particular families of ERVs are expressed in cells of the early embryo and placenta58,59. As such elements have a greater chance to amplify and integrate into the germ line, they have the highest copy numbers. Among the most active ERVs are the IAP and MusD/ETn elements60. IAP expression has been observed in several mouse tumors and cell lines and ETn expression in undifferentiated embryonic carcinoma (EC) and embryonic stem cells (ESCs)61. Both families are highly expressed during early embryogenesis, but are silenced as development progresses. Pfaff etl al demonstrated that at the two-cell (2C) embryo stage, murine endogenous retrovirus (MuERVL, also known as ERV4) elements are transiently derepressed and produced 3% of the transcribed messenger RNAs and after 2C stage, MuERVL retroelement expression is silenced, suggesting this foreign sequence has helped to drive cell-fate regulation in early embryogenesis62. Totipotency is a unique feature of mammalian zygotes and early blastomeres, which becomes gradually restricted during preimplantation development 63. Totipotency can be induced by somatic nuclear transfer, suggesting that this transient developmental state can be re-established experimentally 64. Rare populations with expanded cell fate potential have been identified in cultured mouse pluripotent stem cells as a result of genetic alterations, specific culture and derivation conditions, or enrichment with specific molecular markers 65–68. Such mouse pluripotent stem cells resemble 2C-blastomeres, not only differentiating into embryonic and extra-embryonic lineages in functional assays, but also carrying a key molecular signature, namely the strong induction of the MuERV-L (MERVL) family of murine endogenous retroviruses (ERVs). Thus, cultured ESCs/iPSCs retain the cell fate plasticity to acquire features of early blastomeres. microRNAs (miRNAs) are a class of small, regulatory non-coding RNAs that regulate gene expression post-transcriptionally through a combined mechanism of mRNA degradation and

Page 10: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

5 5

translational repression 69–75. These small non-coding RNAs are increasingly recognized as key regulators of cell fate specification in normal development 76 and in pluripotent stem cells 77–80. Here, we have identified a miRNA, miR-34a, whose deficiency in ESCs and iPSCs expands their cell fate potential, giving rise to both embryonic and extra-embryonic lineages in vitro and in vivo. miR-34a-/- ESCs and iPSCs also exhibit a strong and specific induction of MERVL ERVs, a unique molecular hallmark shared by totipotent 2C-blastomeres and reported 2C-like ESCs 66,67,81. Our mechanistic studies demonstrate that miR-34a restricts cell fate potential of pluripotent stem cells in the embryonic linages, and silences MERVL expression in ESCs/iPSCs, at least in part, by directly repressing a transcription factor, Gata2. Taken together, we reveal the functional importance of the miR-34a/Gata2 pathway in regulating cell fate plasticity in pluripotent stem cells.

Page 11: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

6 6

Chapter 2

Materials and Methods

Page 12: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

7 7

Materials and Methods Mouse breeding and genotyping The generation of miR-34a-/- mice was described previously (16). Both wild-type and miR-34a-/-

mice were maintained on an isogenic C57BL/6J background, and housed in a non-barrier animal facility at UC-Berkeley. The following primers were used for genotyping, with parenthetical values indicating the size of the diagnostic PCR product: mir-34a-Common-R, ACTGCTGTACCCTGCTGCTT, with mir-34a-WT-F, GTACCCCGACATGCAAACTT (wild-type band, 400 bp), or mir-34a-KO-F, GCAGGACCACTGGATCATTT (knockout band, 263 bp) (16). NCr-nu/nu female athymic mice used for teratoma generation were purchased from Taconic (Taconic, Cat. # NCRNU). All the mouse work was done with approval of University of California, Berkeley’s Animal Care and Use Committee. University of California, Berkeley’s assurance number is A3084-01, and is on file at the National Institutes of Health Office of Laboratory Animal Welfare. Derivation of embryonic stem cells (ESCs) Mouse ESCs were isolated based on published protocols with slight modifications. Uteri containing E3.5 wild-type or miR-34a-/- embryos were isolated from timed pregnant females, and put in Knockout DMEM (Life Technologies, Cat. # 10829-018) supplemented with 10mM HEPES (Life Technologies, Cat. # 15630-080). E3.5 blastocysts were flushed with 1ml syringes with 18G needles and individually transferred to a 12-well plate with irradiated MEF (mouse embryonic fibroblasts) feeders in 1 ml N2B27 medium containing 100 U/ml LIF (EMD Millipore, Cat. # ESG1107), 1 µM PD0325901 (Sigma, Cat. # PZ0162) and 3 µM CHIR99021 (EMD Millipore, Cat. # 361559). After 5 days of incubation, embryo outgrowth was separated from the trophectoderm (TE) and picked up by a 10 µl pipette and transferred to 20 µl Accutase (Life Technologies, Cat. # A11105-01) and incubated at 37oC for 20 min to dissociate cells. Dissociated cells were then cultured on irradiated MEF feeder cells with N2B27 medium containing LIF and two inhibitors for one passage. Subsequently, ESCs were passaged with 0.25% Trypsin-EDTA and maintained in regular mouse ES medium. ESCs were also derived in regular ES medium (see below) to test for variation among derivation protocols. Generation of induced pluripotent stem cells (iPSCs) Wild-type and miR-34a-/- iPSCs were generated from primary mouse embryonic fibroblasts (MEFs) by somatic reprogramming (3). Primary MEFs were isolated from littermate-controlled E13.5 wild-type and miR-34a-/- embryos, infected with pMX retroviral vectors that encode mouse Oct4, Sox2 and Klf4 (Addgene, Cat. # 13366, 13367 and 13370), and cultured on irradiated MEF feeder in ES medium containing Knockout DMEM (Life Technologies, Cat. # 10829-018), 15% ES-grade FBS (Omega scientific, Cat. # FB-01), 2mM L-glutamine (Life Technologies, Cat. # 25030-164), 1x10-4M MEM non-essential amino acids (Life Technologies, Cat. # 11140-076), 1x10-4 M 2-mercaptoethanol (Sigma, Cat. # M3148) and 1X Penicillin-Streptomycin (Life Technologies, Cat. # 15140-122). Subsequently, single iPSC-like colonies were individually picked and expanded on irradiated MEF feeders to establish a stable line. At least three independent iPSC lines were generated for each genotype.

Page 13: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

8 8

RNA-seq data analysis RNA-seq reads were mapped to the GRCm38 (mm10) reference genome with TopHat to quantify gene and retrotransposon expression levels (36). MERVL-gene junctions were defined as those junctions, identified by TopHat, which overlap on one side with an annotated Ensembl gene (including protein coding genes, long ncRNAs, pseudogenes and antisense transcripts) and on the other side with an annotated element of MERVL (including both complete, truncated and solo LTR copies) (Table S4). We used EdgeR to test for differential expression between miR-34a-/- and wild-type iPSCs (37). We defined a gene as differentially expressed (DE) if it had an absolute log2-fold-change greater or equal than 2 and a False Discover Rate of 0.05 (Supplementary Text; Table S2-S5).

We performed the analysis using three datasets: HiSeq2000 100bp paired-end data (100PE); NextSeq500 150bp paired-end data (150PE); and a combined dataset obtained by pooling the reads of the two (combined). This allowed us to quantify the effect of read length and sequencing depth on our analyses: the results are highly reproducible across the three datasets (Fig. S4; Table S2-S5).

Real-time PCR analysis for gene expression RNA was isolated by Trizol extraction following manufacturer’s instruction (Life Technologies, Cat. # 15596). cDNA was reverse-transcribed using iScript Advanced Reverse-Transcriptase (Bio-Rad, Cat. # 1725037). For single colony analysis, cDNA was prepared using a Single Cell-to-Ct qRT-PCR kit (Life Technologies, Cat. # 4458236). All real-time qPCR analyses were performed using SYBR FAST qPCR Master Mix (Kapa Biosystems, Cat. # KK4604). All primers used are listed in Table S6. To detect MERVL expression, four pairs of primers were designed to amplify specific regions of MERVL (Fig. 3C) and yielded similar results (data now shown). One pair of primers detecting the MERVL pol region was used for all other MERVL real-time PCR analyses.

Embryoid body (EB) differentiation

For EB differentiation, ESCs or iPSCs were plated in 10cm petri dish (150,000 cells/ml) in ES cell medium without LIF and gently cultured on a rotator after removal of feeder cells. Samples were collected at day 0, 3, 6 and 9 post differentiation for real-time PCR analyses and for immunofluorescence staining.

Generation of chimeric blastocyst embryos and chimeric mice from ESCs/iPSCs To generate chimeric blastocysts by morula aggregation, we followed the method described by Nagy et. al. with minor modifications. One-cell stage, C57B6/J wild-type zygotes were collected at 0.5 day postcoitum (dpc), cultured in EmbryoMax KSOM Medium (Millipore, Cat. # MR-121-D) for 48h and only well-developed morula embryos were selected for aggregation. The ESCs or iPSCs were combined with morula embryos by sandwich method after removal of zona pellucida by acid Tyrode’s solution (Sigma, Cat. # T1788) and then cultured overnight. To generate chimeric blastocysts by microinjection, four ESCs or one ESC of the desired genotype were injected into E2.5 C57Bl/6N wild-type recipient morula embryos (16-32 cell

Page 14: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

9 9

stage) and then cultured in vitro overnight to obtain the chimeric blastocysts. To generate chimeric mice by microinjection, 10-15 ESCs of the desired genotype were injected into E3.5 recipient blastocyst embryos before implanted into pseudopregnant females. Chimeric embryos were collected at E9.5 and E12.5 and subjected to IF staining. In both experiments, the recipient embryos for microinjection were generated from 4 week old, super-ovulated C57Bl/6N female mice. These female mice were first treated with intraperitoneal injection with 5 IU of PMSG (Sigma, pregnant mare serum gonadotropin, Cat. # G4877-1000U) and 5 IU of HCG (Sigma, human chorionic gonadotropin, Cat. # CG5-1VL) and then mated with male mice from the same strain. Subsequently, one-cell embryos were collected from those carrying vaginal plugs 24 hour after intraperitoneal injection of HCG. Collected embryos were cultured for 2 or 3 days in vitro in EmbryoMax KSOM Medium (Millipore, Cat. # MR-121-D); and properly developed morulae or blastocysts were selected for microinjection. Preimplantation embryo expression analysis Superovulated wild-type and miR-34a-/- females were mated with males of matching genotype to generate 2C, 8C and blastocyst embryos at E1.5, E2.5, and E3.5, respectively. Oocytes were collected at E0.5 from unmated, superovulated females. 2C and 8C embryos were recovered by oviduct flushing with DMEM (Thermo Fisher, Cat. #11995-040), while blastocysts were recovered by uterine flushing. Embryos were washed in PBS and subject to real-time PCR analyses using a Single Cell-to-Ct qRT-PCR kit (Life Technologies, Cat. # 4458236). All primers used are listed in Table S6.

Generation of teratomas from pluripotent stem cells and histological analyses 1x106 of WT or miR-34a-/- iPSCs or ESCs were injected into the dorsal flanks of 6-7 week old immune-deficient NCr-nu/nu female mice (Taconic, Cat# NCRNU). After 4-5 weeks, resulting teratomas were collected by surgical removal, fixed overnight in 10% buffered formalin (Fisher Scientific, Cat. # SF100-4), dehydrated in a graded series of ethanol solutions, embedded in Paraplast X-TRA paraffin (Fisher Scientific, Cat. # 23-021-401), sectioned at 6 µm thickness, mounted on glass slides, and stained with hematoxylin and eosin (H&E) using standard procedures (16). Additionally, the paraffin sections were subjected to IF and immunohistochemistry.

Immunofluorescence (IF) and Immunohistochemistry (IHC) For IF staining of differentiated EBs or ESCs/iPSCs, samples were fixed with 4% paraformaldehyde for 10 min at room temperature and incubated with blocking solution (0.1%Triton X-100 and 5% normal goat serum in PBS) for 1 hour at room temperature. To detect the expression of pluripotent or lineage markers, EBs/cells were incubated overnight at 4°C with antibodies against MERVL-Gag (1:2000, a gift from T. Heidmann laboratory), Oct4 (1:100, Santa Cruz Biotechnology, Cat. # sc-5279), Gata4 (1:100, Santa Cruz Biotechnology, Cat. # sc-9053) or Cdx2 (1:100, Abcam, Cat. # ab76541 or #157524), followed by staining with goat anti-rabbit IgG (H+L) secondary antibody, Alexa Fluor 594-conjugated secondary antibody (1:500, Life Technologies, Cat. # A11037) for 1 hour at room temperature. EBs/cells were then stained with DAPI (300 nM, Sigma, Cat. # D9564) and subjected to imaging analyses using Laser scanning confocal microscopy (Zeiss LSM710) and Zeiss Observer.A1 microscope.

Page 15: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

10 10

For IF staining of chimeric blastocysts, samples were fixed in 4% paraformaldehyde (PFA) for 20 min at room temperature and then transferred to phosphate-buffered saline (PBS) containing 0.1% bovine serum albumin (BSA). Embryos were permeabilized using 0.1% Triton X-100 in PBS containing 0.1% BSA for 5 min, and then blocked for 1 hour at room temperature in blocking solution (10% goat serum diluted in PBS/0.1% BSA). Blastocysts were then incubated with anti-Cdx2 primary antibody (1:100, Abcam, Cat. # ab76541) in blocking solution at 4°C overnight and stained with goat anti-rabbit, Alexa Fluor 594-conjugated secondary antibody (1:300, Life Technologies, Cat. # A11037) in blocking solution for 1 hour at room temperature. Blastocysts were then placed individually into chamber slides (Lab-Tek, Cat. # 155411) in 400 ml PBS/0.1%BSA solution. Images were taken using an Olympus Revolution XD spinning disk confocal microscope. For IF staining of chimeric mouse embryos, including placentas and yolk sacs were fixed with 4% PFA for 2 hours, incubated in 30% sucrose for overnight in 4°C, embedded in Tissue-Tek O.C.T. compound (VWR, Cat. #25608-930), and cryo-sectioned at 8 mm. These sections were either directly visualized for GFP expression or subjected to IF using mouse anti-GFP (1:100, Abcam, Cat. # ab38689), rabbit anti-Tpbpa (1:200, Abcam, Cat. # ab104401), or rabbit anti-MTP1 (1:150, Alpha Diagnostic, Cat. # MTP11-A) primary antibodies. Trophoblast giant cells were identified based on their location in the placenta and the morphology of enlarged nuclei. Spongiotrophoblasts were identified based on the staining of the molecular marker, Tpbpa. The bilaminar structure of the yolk sac is identified by DAPI staining, and the visceral endoderm part is identified by its columnar, epithelial morphology. The GFP signals in three embryonic germ layers of chimeric mouse embryos (E12.5) and yolk sacs were directly visualized without staining. For immunohistochemistry (IHC) analyses on teratomas or placentas of chimeric embryos , 5 mm paraffin sections were deparaffinized, dehydrated, and subjected to heat-induced antigen retrieval in a pressure cooker using Target Retrieval solution (DAKO, Cat. # S1699). Slides were incubated for 10 minutes with 3% H2O2, blocked for 3 hours with PBS containing 5% BSA and 0.3% Triton X-100, and incubated with primary antibodies against PL-1 (1:75, Santa Cruz Biotechnology, Cat. # sc-34713) or GFP (1: 100, Abcam, Cat. # ab38689) overnight in PBS buffer containing 1% BSA and 0.3% Triton X-100. Slides were then incubated with horseradish peroxidase (HRP)-conjugated secondary antibodies for 2 hours at room temperature, and then subjected to 3,3’-Diaminobenzidine (DAB) staining (Life Technologies, Cat. # 00-2014) followed by a counterstain with Mayer's hematoxylin (Electron Microscopy Sciences, Cat. # 26503-04). The sinusoidal trophoblast giant cells (s-TGCs) were identified by their enlarged nuclei and adjacent location in the maternal blood sinusoid space. Chromatin immunoprecipitation (ChIP) For each ChIP experiment, 106 ESCs or iPSCs were fixed with 1% formaldehyde (VWR, Cat. # 5016-02) to extract chromatin for immunoprecipitation. Nuclei were isolated by Farnham lysis buffer (5 mM PIPES pH 8.0, 85 mM KCl, 0.5% NP-40 with Protease Inhibitor Cocktail Tablet (Roche, Cat. # 11836153001) and lysed in nuclear lysis buffer (50 mM Tris pH 8.0, 10 mM EDTA, and 1% SDS with the protease inhibitor cocktail). Chromatin was fragmented by a Covaris S220 Focused ultrasonicator (peak power 175, duty factor 20, cycles/burst 200, duration

Page 16: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

11 11

30s, with 35 treatments) and diluted in RIPA buffer (10 mM Tris pH 7.6, 1 mM EDTA, 0.1% sodium deoxycholate, and 1% Triton X-100 with protease inhibitor cocktail) in a 1:9 ratio. The pull-down was performed at 4°C overnight using 40 ml Dynabeads protein A (Life Technologies, Cat. # 10001D) and 2 mg antibodies against Gata2 (Santa Cruz, Cat. # sc-9008), H3K4Me (Abcam, Cat. # ab8895), H3K4Me3 (EMD Millipore, Cat. # 17-614), H3K9Me2 (Abcam, Cat. # ab1220), and H3 (Abcam, Cat. #1791). Washes were performed twice with the low-salt wash buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH 8.0, 150 mM NaCl), three times with the high-salt wash buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH 8.0, 500 mM NaCl), and four times with the LiCl wash buffer (0.25 M LiCl, 1% IGEPAL CA630, 1% sodium deoxycholate 1 mM EDTA, 10 mM Tris pH 8.0). Chromatin-immunoprecipitated DNA was analyzed by qPCR using SYBR FAST qPCR Master mix (KAPA biosystems, Cat. # KK4602) and a 7900HT Fast Real-Time PCR machine (Applied Biosystems). Percent input of immunoprecipitated samples was calculated according to the real-time PCR results of serially-diluted lysates. The enrichment of each individual locus is calculated as (the percentage of input of modified histone H3)/(the percentage of input of histone H3). The following primers were used for qPCR analysis: mervl forward, 5’- CTTCCATTCACAGCTGCGAC-3’; mervl reverse 5’-CTAGAACCACTCCTGGTACC-3’; iap forward; 5’-GCTCCTGAAGATGTAAGCAATAAAG-3’; iap reverse, 5’-CTTCCTTGCGCCAGTCCCGAG-3’; mmervk10c forward, 5’-TTCGCCTCTGCAATCAAGCTCTC; mmervk10c reverse, 5’-TCGCTCRTGCCTGAAGATGTTTC-3’; tcstv1 forward, 5’-ATCTACTTGGGGTGCCTGGT-3’; tcstv1 reverse, 5’-GAAGACCAGCTGAACCATCC-3’. Luciferase Assays For MERVL-luciferase reporter assays, we used pGL3 luciferase reporter vectors (Promega, Cat. # E1751) harboring MERVL1-1000, MERVL1-493, and MERVL500-1000 fragments, as described by Macfarlan et. al. (25). The MERVL125-375-Luc reporter was constructed by truncating the MERVL1-493 fragment using a QuikChange Site-Directed Mutagenesis Kit (Strategene, Cat. # 200518). The two fully conserved Gata2 binding sites (BS1 and BS3) were ablated in the MERVL125-375-Luc reporter construct, either individually or in combination, using a QuikChange Site-Directed Mutagenesis Kit. The following primers were designed for mutagenesis: 1-375 Forward, TGGACTTCCATTCACCTCGAGATCTGCGATCTAAGTAAGC; 1-375 Reverse, ATCGCAGATCTCGAGGTGAATGGAAGTCCAAGGATCTAGC; 125-375 Forward, CTTACGCGTGCTAGCGATCTTGAGCCATAGTGGCTATGGA; 125-375 Reverse, CTATGGCTCAAGATCGCTAGCACGCGTAAGAGCTCGGTAC; Gata2ΔBS1 Forward, TCTCCGAGTTTAAGGAACACACCTTTGGGCTACGCCTTTC; Gata2ΔBS1 Reverse, AATCCCAGATGAAAGGCGTAGCCCAAAGGTGTGTTCCTTA; Gata2ΔBS3 Forward, TTAAAGGTGTGGTGGAACACACCTTTGGGCTACACCTTCT; and Gata2ΔBS3 Reverse, TGTCTCCAGCAGAAGGTGTAGCCCAAAGGTGTGTTCCACC. MERVL-Luc reporters and control Renilla luciferase reporter pRL-TK (Promega, Cat. # E2241) were co-transfected (600 ng and 150 ng per well of a 12-well plate, respectively) using Lipofectamine 2000 reagent (Life Technologies, Cat. # 11668027) into ESCs. Transfection complexes containing the reporter constructs were prepared in Opti-MEM Reduced-Serum Medium (Life Technologies, Cat. # 31985062) according to manufacturer’s instructions. After trypsinization with 0.25% Trypsin + EDTA (Life Technologies, Cat. # 25200-056), 100,000 cells were resuspended in ES media lacking Pen Strep, incubated with transfection complexes for 10 minutes at 37° C, and then

Page 17: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

12 12

transferred to one well of a 12-well plate containing feeders. After 48 hours, transfected ESCs were trypsinized, plated onto gelatin-coated plates for 1 hour to remove feeders, and then assayed for luciferase activity by Dual-Luciferase® Reporter Assay System (Promega, Cat. # E1910) using a Glomax 20/20 Luminometer (Promega). For gata2 3’UTR luciferase assays, a fragment that includes the 3’ portion of the ORF and the entire Gata2 3’UTR was amplified by PCR. The fragment was then cloned into a psiCheck-2 vector (Promega, Cat. # C8021) to generate the gata2 3’UTR-Luc reporter.miR-34a binding site mutants were generated using a QuikChange Site-Directed Mutagenesis Kit. The following primers were used: Gata2 3’UTR F XhoI, CTCGAGAGTCTCTCTTTTGGCCACCC; Gata2 3’UTR R NotI, GCGGCCGCCAAGGCCACCTGACAGCTTA; Gata2 3’UTRΔ34aBS F, CCGTCCAGCATGGTGATGGGCTAGGCAAGCCTCCCACTGG; Gata2 3’UTRΔ34aBS R, GCTTGCCTAGCCCATCACCATGCTGGACGGGTGGGGGTGG; Gata2 3’UTRΔ34aBS2 F AGAGACCCACTTCCTGCCTAGCCTGGCCGAAGCCACCTCT; Gata2 3’UTRΔ34aBS2 R, TCGGCCAGGCTAGGCAGGAAGTGGGTCTCTTGGGATGGGC; Gata2 3’UTRΔ34aBS3 F, CTTCTTTGGGACCTCCCAGTCAGGGCTCTCGGGGGCAGAC; Gata2 3’UTRΔ34aBS3 R, GAGAGCCCTGACTGGGAGGTCCCAAAGAAGGACCCCAAGA. The gata2 3’UTR-Luc reporters (2 ng per well of 12-well plate) were co-transfected with 100 nM siGFP or mature miR-34a RNA mimics using Lipofectamine 2000 reagent (Life Technologies, Cat. #11668027) into a feeder-free mouse ESC line (39). After 48 hours, cells were lysed and assayed for luciferase activity by Dual-Luciferase® Reporter Assay System (Promega, Cat. # E1910) using a Glomax 20/20 Luminometer (Promega). Transfection and retrovirus/lentivirus transduction To overexpress miR-34a in wild-type and miR-34a-/- iPSCs, cells were infected with MSCV (murine stem cell virus) retrovirus that encoded a LTR-miR-34a and a PGK-puromycin-IRES-GFP cassette (40). MSCV and MSCV-miR-34a transduced iPSCs were selected with 3 mg/ml puromycin for two days before collected for real-time PCR analyses and western blotting. The ESCs and iPSCs used for microinjection were labeled by green fluorescence protein (GFP) using the PiggyBac-GFP plasmid. The PiggyBac vector contains an EF1a-driven GFP expression cassette and an ubiquitin-puromycin selection marker. The PiggyBac-GFP plasmid was mixed with the PiggyBac transposase plasmid in a 1:1 ratio (41), and subsequently transfected into ESCs or iPSCs using Lipofectamine 2000 (Life Technologies, Cat. # 12566014). Cells were selected with 3 mg/ml puromycin for two days and cultured in puromycin-free ES medium for following analyses. To knock down gata2 by RNAi, two Gata2 shRNAs were cloned into pLKO.1 lentiviral vector (Addgene, #10878) using the following oligos (shgata2#1 sense: 5’-CCGGGAGGTGGATGTCTTCTTCAACCACTCGAGTGGTTGAAGAAGACATCCACCTCTTTTTG-3’; shGata2#1 antisense: 5’-AATTCAAAAAGAGGTGGATGTCTTCTTCAACCACTCGAGTGGTTGAAGAAGACATCCACCTC-3’; shGata2#2 sense: 5’-CCGGGGACGAGGTGGATGTCTTCTTCAACTCGAGTTGAAGAAGACATCCACCTCGTCCTTTTTG-3’; shGata2#2 antisense: 5’-AATTCAAAAAGGACGAGGTGGATGTCTTCTTCAACTCGAGTTGAAGAAGACATCCA

Page 18: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

13 13

CCTCGTCC-3’) (42); and the corresponding lentiviruses were produced by co-transfecting pLKO.1 shRNA vectors with pMD2.G and psPAX2 to HEK293T cells. After infection, iPSCs were selected in 3 mg/ml puromycin for two days and expanded for in vitro and in vivo analyses. Western blotting For ESC or iPSC collection, trypsinized cells were plated on a gelatin-coated plate for 1 hour to remove feeders. Cells separated from the feeders were then lysed in Laemmli sample buffer (60 mM Tris-Cl pH 6.8, 2% SDS, 100 mM DTT, 10% glycerol, 0.02% bromophenol blue) and subjected to western analyses. Antibodies against mouse Gata2 (Santa Cruz Biotechnology, Cat # CG2-96) was used at 1:500 dilution, and α-tubulin (Sigma, clone B-5-1-2) was used at a 1:4,000 dilution as a loading control. The quantitation of all western analyses was carried out with ImageJ (NIH). Bisulfite sequencing Genomic DNAs were purified by the standard phenol/chloroform method. 2 mg DNA was subjected to bisulfite conversion and subsequent purification using the EZ DNA Methylation-Gold Kit (Zymo Research, Cat. # D5005). Bisulfite-treated DNAs were amplified using Jumpstart REDTaq Readymix (Sigma, Cat. # P1107) with the following primers: MERVL forward, ATATGAATAAAGTGGTTATGGTGGT; MERVL reverse, AATTCCTAAACCCATAAATCCTAAC; IAP forward, TTGATAGTTGTGTTTTAAGTGG; IAP reverse, AAAACACCACAAACCAAAATC. The amplified DNA fragments were cloned to pGEM-T Easy vector (Promega, Cat. # A1360) for sequencing. The methylation patterns were analyzed by QUMA (Quantification tool for methylation analysis, http://quma.cdb.riken.jp).

Page 19: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

14 14

Chapter 3

Results

Page 20: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

15 15

miR-34a-/- pluripotent stem cells exhibit expanded cell fate potential microRNAs (miRNAs) are a class of small, regulatory non-coding RNAs that regulate gene expression post-transcriptionally through a combined mechanism of mRNA degradation and translational repression 69,71,75. These small non-coding RNAs are increasingly recognized as key regulators of cell fate specification in normal development and in pluripotent stem cells 77,78.

Initially identified as bona fide p53 transcriptional targets in tumor suppression, the miR-34 miRNAs (miR-34a, miR-34b and miR-34c), particularly miR-34a, have been previously characterized as a key barrier for somatic reprogramming 82. miR-34a deficiency significantly enhances the efficiency of iPSC generation 82, producing iPSCs with normal self-renewal and pluripotency (Fig. S1A, S1B and S1C). Surprisingly however, teratomas generated from miR-34a-/- iPSCs, but not wild-type iPSCs, contained cellular features reminiscent of trophoblast giant cells in the placenta, characterized by PL-1 (placental lactogen 1) expression, large cell volume, enlarged nuclei, and close proximity to internal hemorrhages (Fig. 1A). In ESCs, miR-34a constitutes the majority of expressed miR-34 miRNAs (Fig. S1D). Similarly, miR-34a-/- ESC derived teratomas, but not the wild-type controls, also contained areas reminiscent of extra-embryonic placental cell lineages (Fig.1A) and exhibited an induction of trophectoderm (TE) markers (Fig. S1E), including cdx2, elf5, psx1, fgfr2, egfr and mdfi 83,84. While we did not identify any areas morphologically resembling the visceral endoderm of the yolk sac, we detected a strong induction of primitive endoderm (PE) markers (gata4, gata6 and sox17) in miR-34a-/- teratomas, but not in wild-type controls (Fig. S1E). These findings suggest that miR-34a-/- pluripotent stem cells likely differentiate towards both embryonic and extra-embryonic cell lineages during teratoma formation.

The expanded potential of miR-34a-/- ESCs/iPSCs is also evident upon embryoid body (EB) differentiation (Fig. 1B and 1C). While markers from all three germ layers were similarly induced in wild-type and miR-34a-/- EBs, significant upregulation of TE markers (cdx2, elf5, esx1, tfap2c and gata3) 83,85,86 was observed only in miR-34a-/- EBs (Fig. 1B, 1C and S1F). Immunofluorescence (IF) staining confirmed that a significant percentage of miR-34a-/- EBs was Cdx2 positive (Fig. 1B), and these Cdx2-positive cells preferably localized to the periphery (Fig. 1B, S1G and S1H). Additionally, the extra-embryonic endoderm marker gata4 and pdgfra, as well as the trophoblast lineage marker mash2 (ascl2) and pl1 (prl3d1), were also induced in miR-34a-/- EBs (Fig. S1F). Thus, upon EB differentiation, miR-34a-/- ESCs exhibited expanded cell fate potential, generating cells with molecular features characteristic of both embryonic and extra-embryonic lineages.

To define the cell fate potential of miR-34a-/- pluripotent stem cells in normal development, we traced their lineage in chimeric blastocysts following microinjection or aggregation with recipient morulae. Initially, four GFP-labeled wild-type or miR-34a-/- ESCs were injected into each C57BL/6N recipient morula to generate chimeric blastocysts (Fig. 1D). While wild-type ESCs exclusively gave rise to cells localized to the ICM (Fig. 1D; Table S1), miR-34a-/- ESC

Page 21: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

16 16

progenies localized to both ICM and TE in ~60% of chimeric blastocysts (Fig. 1D; Table S1). This expanded cell fate potential is unlikely due to extra-embryonic contamination during miR-34a-/- ESC derivation, as miR-34a-/- iPSCs derived from mouse embryonic fibroblasts (MEFs) phenocopied miR-34a-/- ESCs in their developmental potential. When aggregated with recipient C57BL/6J morulae, miR-34a-/- ESCs and miR-34a-/- iPSCs colonize both ICM and TE of chimeric blastocysts, while passage- and littermate-controlled wild-type ESCs and iPSCs exclusively colonized the ICM (Fig. S1I).

The expanded cell fate potential of miR-34a-/- ESCs in chimeric blastocysts could be due to the presence of cells with bidirectional potential; alternatively, miR-34a-/- ESCs could contain a heterogeneous population of cells that preferentially differentiate into embryonic or extra-embryonic cell lineages. To distinguish between these two possibilities, we injected single, GFP-labeled miR-34a-/- ESCs into each recipient morula to generate chimeric blastocysts (Fig. 1E). In two independent miR-34a-/- ESC lines tested, single miR-34a-/- ESCs colonized both ICM and TE in 33% and 38% of chimeric blastocysts (n=13/40 and 8/21) (Fig. 1E; Table S1) respectively, suggesting that a significant portion of miR-34a-/- ESCs exhibit a bidirectional developmental potential at the single-cell level.

We then generated chimeric embryos by microinjecting 10-15 GFP-labeled wild-type or miR-34a-/- ESCs into C57BL/6N recipient blastocysts. While wild-type ESCs contributed exclusively to lineages of the three embryonic germ layers, miR-34a-/- ESCs contributed to both embryonic and extra-embryonic cell lineages in E9.5, E12.5 and E14.5 chimeric embryos (Fig. 1F, 1G and S1J; Table S1). In particular, we observed clusters of GFP-positive, miR-34a-/- ESC progenies in the visceral endoderm of the yolk sac, as well as in multiple extra-embryonic trophoblast lineages of the placenta (trophoblast giant cells, spongiotrophoblasts, syncytiotrophoblasts (STBs) and sinusoidal trophoblast giant cells (s-TGCs), Fig. 1F, 1G; Table S1). In these chimeric embryos, the number of GFP-positive cells in extra-embryonic cell lineages greatly surpasses the number of miR-34a-/- ESCs injected (Fig. 1F and 1G), suggesting that injected miR-34a-/- ESCs had undergone substantial proliferation before committing to multiple terminally differentiated extra-embryonic lineages.

miR-34a-/- pluripotent stem cells exhibit an induction of MERVL ERVs. To investigate the molecular basis for the bidirectional potential of miR-34a-/- pluripotent stem cells, we compared the transcriptomes of wild-type and miR-34a-/- iPSCs using RNA-sequencing (RNA-seq). We compared the abundance of all annotated transcripts between wild-type and miR-34a-/- iPSCs, including protein-coding genes, long non-coding RNAs (ncRNAs), pseudogenes, antisense transcripts, and retrotransposons using 100 bp paired end RNA-seq data (Fig. 2A). Given the repetitive nature of retrotransposons, we quantified retrotransposon expression at the family level using both uniquely and non-uniquely mapped reads (Supplemental Information S1). Surprisingly, the most highly expressed and differentially regulated transcript in miR-34a-/- iPSCs was transcribed from the MERVL family of ERVs (Fig. 2A and S2A), which were also highly induced in totipotent 2C blastomeres and reported ESCs with expanded potential 58,66,67,87. ERV induction in miR-34a-/- ESCs/iPSCs was largely specific

Page 22: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

17 17

to the MERVL family (Fig. 2A, 2B and S2A; Table S2). The majority of differentially expressed retrotransposons in miR-34a-/- iPSCs belonged to the canonical MERVL family of ERVs (a class-III ERV) (Fig. S2A; Table S2); a small fraction of differentially expressed loci belonged to the MT2A, MT2B, MT2B1, and MT2B2 ERV families that are highly related to the canonical MERVL solo LTR, MT2_Mm (Fig. S2A; Table S3).

Consistent with our RNA-seq results, we invariably detected a significant increase of MERVL expression in miR-34a-/- iPSCs and ESCs, using real-time PCR primer pairs designed from multiple highly conserved MERVL regions (Fig. 2B and 2C; data not shown). Interestingly, while MERVL induction in miR-34a-/- iPSCs persisted for more than 27 passages (Fig. S2B), MERVL was only induced in early passages of miR-34a-/- ESCs and became completely silenced around passage 12 (Fig. S2B). It is conceivable that MERVL expression in miR-34a-/- ESCs triggers additional mechanisms to re-establish their silencing. The expanded cell fate potential of miR-34a-/- ESCs was highly correlated with the strong MERVL induction, as the late passage (passage 17) miR-34a-/- ESCs lost both MERVL induction and the bidirectional potential (Fig. S2B and S2C).

The MERVL ERVs have been retained throughout mammalian evolution, with independent expansion in the murine and primate genomes 52. There are 2502 loci in the C57B6/J mouse genome, ~26% of which encode elements with an intact retroviral structure, comprising 5’- and 3’-LTRs flanking the coding sequences for gag, pol, and dUTPase, but lacking env-like open reading frames (ORFs) (Fig. 2C) 52. Another 32% of MERVL loci exhibit truncated retroviral structure, missing one or both LTRs (Fig. 2C). The remaining 41% of MERVL loci have undergone homologous recombination, yielding solo LTRs (MT2_Mm) with varying degrees of sequence degeneration (Fig. 2C). We obtained bioinformatic estimates of locus-specific MERVL expression in wild-type and miR-34a-/- iPSCs using our RNA-seq data (Table S3; Supplemental Information). Notably, definitive evidence for MERVL reactivation in miR-34a-/- iPSCs was observed predominantly for loci harboring MERVLs with a complete retroviral structure, but not for those with truncated structure (Fig. 2D and S2D; Table S3). A fraction of MT2_Mm solo LTRs, along with a few elements from the highly related MERVL solo LTRs (MT2B, MT2B1, MT2B2 and MT2A), also exhibited a similar induction (Fig. S2A; Table S3).

Approximately 300 MERVL loci still encode intact Gag viral protein 67. We observed a significant increase in MERVL-Gag expression and in the percentage of MERVL-Gag-positive cells in miR-34a-/- pluripotent stem cells (Fig. 2E and 2F). Interestingly, miR-34a-/- ESCs and iPSCs were heterogeneous populations, containing ~12% and ~20% MERVL-Gag-positive cells, respectively, in otherwise Oct4-positive colonies (Fig. 2E, 2F, S2E and S2G). Consistent with this observation, a fraction of individual miR-34a-/- iPSC colonies exhibited a significantly greater MERVL induction than the bulk population (Fig. S2F), suggesting that the extent of MERVL induction in individual cells was largely underestimated using the bulk population. In miR-34a-/- ESCs and iPSCs, the expression of MERVL-Gag and Oct4 were mutually exclusive (Fig. 2E, S2E and S2G), suggesting that the MERVL-positive, Oct4 negative miR-34a-/- cells possess a unique state of developmental potency, distinct from that of classic pluripotent stem cells characterized by Oct4 expression 66,67,88.

Page 23: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

18 18

The global protein-coding gene expression profiles of miR-34a-/- iPSCs resemble those of reported bi-potential ESCs in hierarchical clustering (Fig. S3A). Intriguingly, among the most differentially expressed protein-coding genes in miR-34a-/- iPSCs were those proximal to MERVL loci (Fig. 2G). Indeed, differential expression analysis between reported ESCs with bidirectional cell fate potential (lsd1-/-, 2C+, p60 knockdown and p150 knockdown ESCs) and their pluripotent controls revealed the induction of MERVL proximal genes as a key feature (Fig. S3B) 66,67,81. The MERVL derepression in miR-34a-/- iPSCs and ESCs also correlates with the induction of many protein-coding genes that harbor either a proximal upstream MERVL or an intronic MERVL on the same strand (Fig. 2G; Fig. S3C). In many cases, MERVL or related loci, particularly solo LTRs, act as alternative promoters, generating chimeric transcripts of proximal genes that differ in 5’-UTRs (Fig. 2H, S3D and S3E; Table S4 and S5). These MERVL-gene chimeric transcripts can be unambiguously identified by the corresponding splice junctions from the RNA-seq data (Table S5). Using real-time PCR analyses, we validated the induction of multiple MERVL proximal genes in miR-34a-/- ESCs and iPSCs, including tcstv1, tcstv3, zfp352, cml2 and p4ha2 (harboring a proximal upstream MERVL element), as well as abcb5, tmem132c and chit1 (harboring an intronic MERVL element) (Fig. 2H; Fig. S3D and S3E). The induction level of MERVL-gene isoforms varied among individual miR-34a-/- iPSC colonies, and largely correlated with the extent of MERVL induction (Fig. S3F). Consistently, miR-34a overexpression in miR-34a-/- iPSCs not only decreased MERVL expression, but also reduced the level of the MERVL-driven chimeric transcript (Fig. 2I).

We repeated our analysis using 150 bp pair-end RNA-seq data to gain confidence in the accuracy of our retrotransposon mapping (Fig. S4A). The analysis using longer sequence reads confirmed all our observations (Fig. S4A and S4B).

MERVL induction in miR-34a-/- pluripotent stem cells is regulated transcriptionally Given the correlation between MERVL induction and bi-potential pluripotent stem cells (Fig. S3A and S3B)66,67, the molecular pathway that mediates miR-34a-dependent MERVL repression could also regulate miR-34a-dependent restriction on pluripotent potential. To determine the key sequences required for MERVL induction, we transfected wild-type and miR-34a-/- ESCs with a MERVL1-1000-Luc (luciferase) reporter containing the full-length LTR (MT2_Mm) and a portion of the gag sequence as the promoter 81. This reporter showed an elevated luciferase activity in miR-34a-/- ESCs compared to wild-type ESCs, faithfully recapitulating the endogenous MERVL induction (Fig. 3A). The LTR sequence was both necessary and sufficient for the MERVL1-1000-Luc reporter activity in miR-34a-/- ESCs (Fig. 3A); furthermore, a minimal fragment, MERVL125-

375, containing a direct repeat and a TATA box, was sufficient to drive strong luciferase activity specifically in miR-34a-/- ESCs (Fig. 3A). The MERVL125-375 fragment is highly conserved among all highly induced MERVL loci in miR-34a-/- iPSCs (Fig. S5A), suggesting a sequence-dependent transcriptional mechanism for MERVL induction. Consistently, miR-34a-/- ESCs and iPSCs also exhibited H3K4Me3 enrichment near the LTR of MERVL retrotransposons and the MERVL LTR proximal to tcstv1, but not near other ERVs such as IAP or MMERK10C (Fig.

Page 24: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

19 19

3B). Thus, MERVL loci are specially enriched for active transcription machinery in miR-34a-/- pluripotent stem cells.

Gata2 mediates elevated MERVL expression in miR-34a-/- pluripotent stem cells. The MERVL125-375 fragment likely contains cis-regulatory elements necessary and sufficient to enable MERVL induction in miR-34a-/- ESCs/iPSCs. We failed to detect any significant sequence complementarity between miR-34a (pri-, pre-, or mature miRNA sequences) and the MERVL125-375 sequence, thus precluding a direct, RNA base-pairing-dependent repression mechanism. We predicted 70 candidate transcription factors that bind within MERVL125-375, among which, only GATA-binding protein 2 (Gata2) exhibits an expression pattern similar to that of MERVL during early pre-implantation development (Fig. S5A; ref. 52, 53). Gata2 is also reported to play an important role in cell fate potency of ESCs 89.

We aligned the LTR sequences from 18 MERVL loci that were strongly induced in miR-34a-/- iPSCs, most of which contain three predicted Gata2 binding sites (Fig. S5A). Mutating the two most conserved sites within the MERVL125-375-Luc reporter significantly reduced its activity in miR-34a-/- pluripotent stem cells (Fig. 3C). Similarly, gata2 knockdown in miR-34a-/- pluripotent stem cells effectively abolished the induction of MERVL and MERVL proximal genes (zfp352, tmem132c and chit1) in cell culture (Fig. 3D; Fig. S5C), and significantly decreased MERVL and cdx2 induction during teratoma formation (Fig. S5D). Gata2 knockdown also reduced H3K4Me3 deposition on MERVL elements and on the MERVL proximal gene tcstv1, suggesting a specific decrease in active transcriptional machinery on MERVL loci (Fig. 3E). Finally, using chromatin immunoprecipitation (ChIP), we demonstrated specific binding of Gata2 to the LTR region of MERVL, and not to the MERVL internal region or to other ERVs such as IAP or MMERK10C (Fig. 3F). Taken together, Gata2 plays an essential role in directly promoting the induction of MERVL ERVs and MERVL proximal genes in miR-34a-/- pluripotent stem cells.

Epigenetic modifications constitute another possible mechanism for MERVL induction in miR-34a-/- pluripotent stem cells. We investigated the role of DNA methylation on MERVL induction, but no difference was detected between wild-type and miR-34a-/- iPSCs (Fig. S5E), consistent with intact MERVL silencing in dnmt3a-/-; dnmet3b-/- ESCs/iPSCs (Fig. S5F). MERVL was previously shown to be induced by a global decrease in H3K9Me2 or a global increase in H3K4Me1 in g9a/glp-/- ESCs or lsd1-/- ESCs, respectively 81,90. Using ChIP, we found that H3K27Ac and H3K9Me2 deposition on MERVL was unaltered in miR-34a-/- pluripotent stem cells (Fig. S5G and S5H). Additionally, while miR-34a-/- ESCs and iPSCs exhibited a ~2-3 fold H3K4Me1 enrichment on MERVL loci (Fig. S5I), miR-34a overexpression silenced MERVL expression in wildtype and lsd1-/- ESCs with a similar efficiency (Fig. S5J), suggesting that the direct mechanism through which miR-34a silenced MERVL was likely independent of a global alteration of H3K4Me1. Hence, none of the tested epigenetic mechanisms appeared to be essential mechanisms for miR-34a-mediated MERVL repression in ESCs/iPSCs.

Page 25: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

20 20

miR-34a restricts cell fate potential of pluripotent stem cells by directly repressing Gata2. Gata2 not only plays an essential role in mediating MERVL activation in miR-34a-/- ESCs/iPSCs (Fig. 3D), it also emerges as a strong candidate as a direct miR-34a target. gata2 harbors three predicted miR-34a binding sites 91,92 (Fig. 4A), and exhibited miR-34a-dependent regulation in pluripotent stem cells—gata2 mRNA and protein are increased in miR-34a-/- pluripotent stem cells and reduced upon ectopic miR-34a overexpression (Fig. 4B and 4C). Similarly, a luciferase reporter containing a fragment of the gata2 gene with all three predicted miR-34a binding sites exhibited miR-34a-dependent repression in luciferase assays (Fig. 4D). Mutating all three predicted miR-34a binding sites in this reporter abolished this miR-34a-dependent regulation (Fig. 4D). These findings suggest that gata2 is a direct miR-34a target in ESCs/iPSCs that mediates MERVL regulation. Consistent with Gata2 derepression in miR-34a-/- ESCs/iPSCs, a number of previously characterized Gata2 targets, including dab2, cd34 and gata1 89,93,94, exhibited a Gata2-dependent upregulation in miR-34a-/- iPSCs (Fig. S6A and S6B).

Knockdown of gata2 in miR-34a-/- iPSCs phenocopies miR-34a overexpression, not only downregulating the expression of MERVL and MERVL-proximal genes (Fig. 3D and S5C), but also abolishing their bidirectional developmental potential to differentiate into both embryonic and extra-embryonic lineages (Fig. 4E and 4F). In EB differentiation assay, gata2 knockdown or miR-34a overexpression in miR-34a-/- pluripotent cells impaired the induction of the TE marker cdx2, without affecting the induction of the markers for all three germ layers (Fig. 4E and 4F). More importantly, while the control infected miR-34a-/- iPSCs contributed to both ICM and TE in 53% of the chimeric blastocysts (n=8/15, Fig. 4G; Table S1), miR-34a-/- iPSCs with gata2 knockdown lost this expanded cell fate potential (n=0/13, Fig. 4G; Table S1). To our knowledge, miR-34a is the first non-coding RNA known to restrict pluripotent cell fate potential in ESCs/iPSCs. While multiple miR-34a targets could act collectively to restrict cell fate potential and repress MERVL expression in pluripotent stem cells, the miR-34a/Gata2 axis clearly plays an essential role in this process (Fig. 4H).

Page 26: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

21 21

Chapter 4 Conclusions

Page 27: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

22 22

Mouse zygotes and early blastomeres possess a totipotent cell fate potential, generating both embryonic and extra-embryonic cell types during normal development. This totipotent cell fate potential is gradually restricted during preimplantation development. By the late blastocyst stage, the separation of TE (which develops into the placenta to support embryonic development), epiblast (which forms the embryo proper) and PE (which develops into the yolk sac) signals the completion of the first cell fate specification event, which commits the developmental potential of cells to either embryonic or extra-embryonic lineages. ESCs and iPSCs in culture faithfully recapitulate the developmentally restricted, pluripotent cell fate potential of the epiblast, efficiently contributing to all embryonic cell lineages in vivo, but rarely to extra-embryonic lineages 4. Experimentally, ESCs can be induced into cells with bidirectional developmental potential, using somatic nuclear transfer, genetic modifications, and specific ESC/iPSC enrichment/culture/derivation procedures 64–68,83. The low efficiency of such experimental manipulations reflects the existence of multiple cellular and molecular impediments for the acquisition of a bidirectional cell fate potential.

miR-34a plays an important role in maintaining cell fate identity in multiple contexts, as it restricts pluripotent cell fate potential of ESCs/iPSCs and acts as a barrier for somatic reprogramming 82. To our knowledge, miR-34a is the first non-coding RNA whose deficiency in ESCs or iPSCs expands their developmental potential, generating a significant fraction of cells with bidirectional developmental potential. It is intriguing that miR-34a deficient pluripotent stem cells also induce the expression of MERVL ERVs; and this MERVL induction in ESCs/iPSCs appears correlated with their expanded cell fate potential in a number of experimental systems (Fig. 4H). MERVL induction could simply be an indicator of an unique 2C-like transcriptional and epigenetic state; alternatively, MERVL could functionally contribute to the establishment and/or maintenance of the 2C-like cell fate potential by rewiring gene regulatory networks to induce many 2C-specific, MERVL-driven genes 66,67. An interesting parallel can be drawn to human ESCs, wherein fluctuating levels HERV-H family ERVs marks a dynamic population of naive-state pluripotency 95. Taken together, these findings suggest the possibility that ERVs, while traditionally viewed as evolutionary remnants of invading foreign DNA sequences, could have important yet unrecognized developmental functions.

While miR-34a deficiency expands cell fate potential in pluripotent stem cells, miR-34a-/- mice undergo normal preimplantation development in laboratory conditions 82,96. It is conceivable that the miR-34 family miRNAs act redundantly with other mechanisms to repress MERVL expression in preimplantation development. It is also possible that an impaired totipotency to pluripotency transition in miR-34a-/- embryos is tolerated in mouse preimplantation development due to its considerable cell fate plasticity 7,97. Nevertheless, miR-34a-/- pluripotent stem cells constitute a powerful experimental system to investigate the molecular basis underlying the developmental potential to both embryonic and extra-embryonic linages. Our studies provides vital insights into an intricate network of protein-coding genes, ncRNAs, and retrotransposons that act cooperatively to define cell fate plasticity and cell fate potential in pluripotent stem cells.

Page 28: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

23 23

References

Page 29: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

24 24

1. Evans, M.J & Kaufman,M.H Establishment in culture of pluripotential cells from mouse embryos. Nature. 292, 154-156.pdf.

2. Martin, G. R. Isolation of a pluripotent cell line from early mouse embryos cultured in medium conditioned by teratocarcinoma stem cells Developmental Biology : 78, 7634–7638 (1981).

3. Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–76 (2006).

4. Beddington, R. S. & Robertson, E. J. An assessment of the developmental potential of embryonic stem cells in the midgestation mouse embryo. Development 105, 733–737 (1989).

5. Tam, P. P. L. & Rossant, J. Mouse embryonic chimeras: tools for studying mammalian development. Development 130, 6155–63 (2003).

6. Papaioannou, V. E., Mkandawire, J. & Biggers, J. D. Development and phenotypic variability of genetically identical half mouse embryos. Development 106, 817–27 (1989).

7. Karlson, P. & Luscher, M. 1959 Nature Publishing Group. Nature 183, 55–56 (1959). 8. Wernig, M. et al. In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like

state. Nature 448, 318–24 (2007). 9. Okita, K., Ichisaka, T. & Yamanaka, S. Generation of germline-competent induced

pluripotent stem cells. Nature 448, 313–7 (2007). 10. Takahashi, K. et al. Induction of pluripotent stem cells from adult human fibroblasts by

defined factors. Cell 131, 861–72 (2007). 11. Yu, J. et al. Induced pluripotent stem cell lines derived from human somatic cells. Science

318, 1917–20 (2007). 12. Park, I.-H. et al. Reprogramming of human somatic cells to pluripotency with defined

factors. Nature 451, 141–6 (2008). 13. Meissner, A., Wernig, M. & Jaenisch, R. Direct reprogramming of genetically unmodified

fibroblasts into pluripotent stem cells. Nat. Biotechnol. 25, 1177–81 (2007). 14. Huangfu, D. et al. Induction of pluripotent stem cells from primary human fibroblasts with

only Oct4 and Sox2. Nat. Biotechnol. 26, 1269–75 (2008). 15. Stadtfeld, M. & Hochedlinger, K. Induced pluripotency: history, mechanisms, and

applications. Genes Dev. 24, 2239–2263 (2010). 16. Krizhanovsky, V. & Lowe, S. W. NEWS & VIEWS The promises and perils of p53.

Nature 460, 4–5 (2009). 17. Li, H. et al. The Ink4/Arf locus is a barrier for iPS cell reprogramming. Nature 460, 1136–

9 (2009). 18. Marión, R. M. et al. A p53-mediated DNA damage response limits reprogramming to

ensure iPS cell genomic integrity. Nature 460, 1149–53 (2009). 19. Utikal, J. et al. Immortalization eliminates a roadblock during cellular reprogramming into

iPS cells. Nature 460, 1145–8 (2009). 20. Hanna, J. et al. Direct cell reprogramming is a stochastic process amenable to

acceleration. Nature (2009). doi:10.1038/nature08592 21. Hong, H. et al. Suppression of induced pluripotent stem cell generation by the p53-p21

pathway. Nature 460, 1132–5 (2009). 22. Kawamura, T. et al. Linking the p53 tumour suppressor pathway to somatic cell

reprogramming. Nature 460, 1140–4 (2009). 23. Zhao, Y. et al. Two Supporting Factors Greatly Improve the Efficiency of Human iPSC

Page 30: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

25 25

Generation. Cell Stem Cell 3, 475–479 (2008). 24. Riley, T., Sontag, E., Chen, P. & Levine, A. Transcriptional control of human p53-

regulated genes. Nat. Rev. Mol. Cell Biol. 9, 402–412 (2008). 25. Banito, A. et al. Senescence impairs successful reprogramming to pluripotent stem cells.

Genes Dev. 23, 2134–9 (2009). 26. McClintock, B. The origin and behavior of mutable loci in maize. Proc. Natl. Acad. Sci.

36, 344–355 (1950). 27. Ravindran, S. Barbara McClintock and the discovery of jumping genes. Proc. Natl. Acad.

Sci. U. S. A. 109, 20198–9 (2012). 28. Fedoroff, N. V. McClintock’s challenge in the 21st century. Proc. Natl. Acad. Sci. U. S. A.

109, 20200–3 (2012). 29. Fedoroff, N. V. Transposable Elements , Epigenetics , and Genome Evolution. 338,

(2012). 30. Koito, A. & Ikeda, T. Intrinsic immunity against retrotransposons by APOBEC cytidine

deaminases. Front. Microbiol. 4, 28 (2013). 31. Elbarbary, R. A., Lucas, B. A. & Maquat, L. E. Retrotransposons as regulators of gene

expression. Science 351, aac7247 (2016). 32. Boeke, J. D. & Chapman, K. B. Retrotransposition mechanisms. Curr. Opin. Cell Biol. 3,

502–507 (1991). 33. Blomberg, J., Benachenhou, F., Blikstad, V., Sperber, G. & Mayer, J. Classification and

nomenclature of endogenous retroviral sequences (ERVs): problems and recommendations. Gene 448, 115–23 (2009).

34. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

35. Stocking, C. & Kozak, C. a. Murine endogenous retroviruses. Cell. Mol. Life Sci. 65, 3383–98 (2008).

36. Maksakova, I. a et al. Retroviral elements and their hosts: insertional mutagenesis in the mouse germ line. PLoS Genet. 2, e2 (2006).

37. Alamgir, A. S. M., Owens, N., Lavignon, M., Malik, F. & Evans, L. H. Precise Identification of Endogenous Proviruses of NFS / N Mice Participating in Recombination with Moloney Ecotropic Murine Leukemia Virus ( MuLV ) To Generate Polytropic MuLVs Precise Identification of Endogenous Proviruses of NFS / N Mice Participating. (2005). doi:10.1128/JVI.79.8.4664

38. Belshaw, R. et al. Long-term reinfection of the human genome by endogenous retroviruses. Proc. Natl. Acad. Sci. U. S. A. 101, 4894–9 (2004).

39. Stoye, J. P. Studies of endogenous retroviruses reveal a continuing evolutionary saga. Nat. Rev. Microbiol. 10, 395–406 (2012).

40. Zhang, Y., Maksakova, I. a, Gagnier, L., van de Lagemaat, L. N. & Mager, D. L. Genome-wide assessments reveal extremely high levels of polymorphism of two active families of mouse endogenous retroviral elements. PLoS Genet. 4, e1000007 (2008).

41. D’Souza, V., Dey, A., Habib, D. & Summers, M. F. NMR structure of the 101-nucleotide core encapsidation signal of the Moloney murine leukemia virus. J. Mol. Biol. 337, 427–42 (2004).

42. Steffen, D., Bird, S., Rowe, W. P. & Weinberg, R. a. Identification of DNA fragments carrying ecotropic proviruses of AKR mice. Proc. Natl. Acad. Sci. U. S. A. 76, 4554–8 (1979).

Page 31: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

26 26

43. Stoye, J. P. & Coffin, J. M. proviruses revealed by using virus Polymorphism of Murine Endogenous Proviruses Revealed by Using Virus Class-Specific Oligonucleotide Probes. 62, (1988).

44. Ribet, D., Harper, F., Esnault, C., Pierron, G. & Heidmann, T. The GLN family of murine endogenous retroviruses contains an element competent for infectious viral particle formation. J. Virol. 82, 4413–9 (2008).

45. Retroviruses, I. E. et al. Evolution and Distribution of Class Evolution and Distribution of Class II-Related Endogenous Retroviruses †. (2005). doi:10.1128/JVI.79.10.6478

46. Baillie, G. J., Lagemaat, L. N. Van De, Baust, C. & Mager, D. L. Multiple Groups of Endogenous Betaretroviruses in Mice , Rats , and Other Mammals Multiple Groups of Endogenous Betaretroviruses in Mice , Rats , and Other Mammals. (2004). doi:10.1128/JVI.78.11.5784

47. McCarthy, E. M. & McDonald, J. F. Long terminal repeat retrotransposons of Mus musculus. Genome Biol. 5, R14 (2004).

48. Czarneski, J., Rassa, J. C. & Ross, S. R. Mouse Mammary Tumor Virus. 469–479 (2003). 49. Chase, D. G. & Pikó, L. Expression of A- and C-type particles in early mouse embryos. J.

Natl. Cancer Inst. 51, 1971–5 (1973). 50. Qin, C. et al. Intracisternal A particle genes: Distribution in the mouse genome, active

subtypes, and potential roles as species-specific mediators of susceptibility to cancer. Mol. Carcinog. 49, 54–67 (2010).

51. Waterston, R. H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–62 (2002).

52. Bénit, L., Lallemand, J., Philippe, H. & Heidmann, T. ERV-L Elements : a Family of Endogenous Retrovirus-Like Elements Active throughout the Evolution of Mammals ERV-L Elements : a Family of Endogenous Retrovirus-Like Elements Active throughout the Evolution of Mammals. (1999).

53. Peaston, a E., Knowles, B. B. & Hutchison, K. W. Genome plasticity in the mouse oocyte and early embryo. Biochem. Soc. Trans. 35, 618–22 (2007).

54. Leung, D. C. & Lorincz, M. C. Silencing of endogenous retroviruses: when and why do histone marks predominate? Trends Biochem. Sci. 37, 127–133 (2011).

55. Hutnick, L. K., Huang, X., Loo, T.-C., Ma, Z. & Fan, G. Repression of retrotransposal elements in mouse embryonic stem cells is primarily mediated by a DNA methylation-independent mechanism. J. Biol. Chem. 285, 21082–91 (2010).

56. Tamaru, H. & Selker, E. U. A histone H3 methyltransferase controls DNA methylation in Neurospora crassa. Nature 414, 277–83 (2001).

57. Lehnertz, B. et al. Suv39h-Mediated Histone H3 Lysine 9 Methylation Directs DNA Methylation to Major Satellite Repeats at Pericentric Heterochromatin. 13, 1192–1200 (2003).

58. Peaston, A. E. et al. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev. Cell 7, 597–606 (2004).

59. Biosci, C., Lib, N. R. & Dupressoir, A. Germ line-specific expression of intracisternal A-particle retrotransposons in transgenic mice . These include : Germ Line-Specific Expression of Intracisternal A-Particle Retrotransposons in Transgenic Mice. 16, (1996).

60. Brûlet, P., Condamine, H. & Jacob, F. Spatial distribution of transcripts of the long repeated ETn sequence during early mouse embryogenesis. Proc. Natl. Acad. Sci. U. S. A. 82, 2054–8 (1985).

Page 32: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

27 27

61. Baust, C. et al. Structure and Expression of Mobile ETnII Retroelements and Their Coding-Competent MusD Relatives in the Mouse Structure and Expression of Mobile ETnII Retroelements and Their Coding-Competent MusD Relatives in the Mouse. (2003). doi:10.1128/JVI.77.21.11448

62. Macfarlan, T. S. et al. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487, 57–63 (2012).

63. Rossant, J. & Tam, P. P. L. Blastocyst lineage formation, early embryonic asymmetries and axis patterning in the mouse. Development 136, 701–713 (2009).

64. I. Wilmut et al. Viable offspring derived from fetal and adult mammalian cells Nature 385, 810-813 (1997).

65. Mosteiro, L. et al. Reprogramming in vivo produces teratomas and iPS cells with totipotency features. doi:10.1038/nature12586

66. Ishiuchi, T. et al. Early embryonic-like cells are induced by downregulating replication-dependent chromatin assembly. Nat. Struct. Mol. Biol. 22, 662–671 (2015).

67. Macfarlan, T. S. et al. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487, 57–63 (2012).

68. Morgani, S. M. et al. Totipotent embryonic stem cells arise in ground-state culture conditions. Cell Rep. 3, 1945–57 (2013).

69. Ambros, V. The functions of animal microRNAs. Nature 431, 350–5 (2004). 70. Ameres, S. L. & Zamore, P. D. Diversifying microRNA sequence and function. Nat. Rev.

Mol. Cell Biol. 14, 475–88 (2013). 71. Bartel, D. P. MicroRNAs: Genomics, Biogenesis, Mechanism, and Function. Cell 116,

281–297 (2004). 72. Chekulaeva, M. & Filipowicz, W. Mechanisms of miRNA-mediated post-transcriptional

regulation in animal cells. Curr. Opin. Cell Biol. 21, 452–460 (2009). 73. He, L. & Hannon, G. J. MicroRNAs: small RNAs with a big role in gene regulation. Nat.

Rev. Genet. 5, 522–531 (2004). 74. Kim, V. N., Han, J. & Siomi, M. C. Biogenesis of small RNAs in animals. Nat. Rev. Mol.

Cell Biol. 10, 126–139 (2009). 75. Zamore, P. D. & Haley, B. Ribo-gnome: the big world of small RNAs. Science (80-. ).

309, 1519–24 (2005). 76. Chen, L., Wang, D., Wu, Z., Ma, L. & Daley, G. Q. Molecular basis of the first cell fate

determination in mouse embryogenesis. Cell Res. 20, 982–93 (2010). 77. Judson, R. L., Babiarz, J. E., Venere, M. & Blelloch, R. Embryonic stem cell-specific

microRNAs promote induced pluripotency. Nat. Biotechnol. 27, 459–61 (2009). 78. Kanellopoulou, C. et al. Dicer-deficient mouse embryonic stem cells are defective in

differentiation and centromeric silencing. Genes Dev. 19, 489–501 (2005). 79. Viswanathan, S. R. & Daley, G. Q. Lin28: A MicroRNA Regulator with a Macro Role.

Cell 140, 445–449 (2010). 80. Viswanathan, S. R. et al. MicroRNA expression during trophectoderm specification. PLoS

One 4, 17–19 (2009). 81. Macfarlan, T. S. et al. Endogenous retroviruses and neighboring genes are coordinately

repressed by LSD1/KDM1A. Genes Dev. 25, 594–607 (2011). 82. Choi, Y. J. et al. miR-34 miRNAs provide a barrier for somatic cell reprogramming. Nat

Cell Biol 13, 1353–1360 (2011). 83. Niwa, H. et al. Interaction between Oct3/4 and Cdx2 determines trophectoderm

Page 33: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

28 28

differentiation. Cell 123, 917–929 (2005). 84. Giakoumopoulos, M. & Golos, T. G. Embryonic stem cell-derived trophoblast

differentiation: A comparative review of the biology, function, and signaling mechanisms. J. Endocrinol. 216, (2013).

85. Kubaczka, C. et al. Direct Induction of Trophoblast Stem Cells from Murine Fibroblasts. Cell Stem Cell 17, 557–568 (2015).

86. Benchetrit, H. et al. Extensive Nuclear Reprogramming Underlies Lineage Conversion into Functional Trophoblast Stem-like Cells. Cell Stem Cell 17, 543–556 (2015).

87. Kigami, D. MuERV-L Is One of the Earliest Transcribed Genes in Mouse One-Cell Embryos. Biol. Reprod. 68, 651–654 (2002).

88. Nichols, J. et al. Formation of pluripotent stem cells in the mammalian embryo dependes on the POU transcription factor Oct4. Cell 95, 379–391 (1998).

89. Zhang, C., Ye, X., Zhang, H., Ding, M. & Deng, H. GATA factors induce mouse embryonic stem cell differentiation toward extraembryonic endoderm. Stem Cells Dev. 16, 605–613 (2007).

90. Maksakova, I. A. et al. Distinct roles of KAP1, HP1 and G9a/GLP in silencing of the two-cell-specific retrotransposon MERVL in mouse ES cells. Epigenetics Chromatin 6, 15 (2013).

91. Lewis, B. P., Shih, I., Jones-Rhoades, M. W., Bartel, D. P. & Burge, C. B. 33-Prediction of Mammalian MicroRNA Targets. Cell 115, 787–798 (2003).

92. Miranda, K. C. et al. A Pattern-Based Method for the Identification of MicroRNA Binding Sites and Their Corresponding Heteroduplexes. Cell 126, 1203–1217 (2006).

93. Orkin, H. GATA- Binding Transcript ion Factors in Hema to poie t ic Cells. Cancer 575–581 (1992).

94. Suzuki, M. et al. GATA factor switching from GATA2 to GATA1 contributes to erythroid differentiation. Genes to Cells 18, 921–933 (2013).

95. Wang, J. et al. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. Nature 516, 405–9 (2014).

96. Concepcion, C. P. et al. Intact p53-dependent responses in miR-34-deficient mice. PLoS Genet. 8, (2012).

97. Rossant, J. Postimplantation development of blastomeres isolated from 4- and 8-cell mouse eggs. J. Embryol. Exp. Morphol. 36, 283–90 (1976).

Page 34: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

29 29

Figures and Tables

Page 35: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

30 30

Page 36: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

31 31

Page 37: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

32 32

Figure Legends

Fig 1. miR-34a-/- pluripotent stem cells exhibit expanded cell fate potential. (A). miR-34a-/- teratomas contain extra-embryonic cell lineages and extra-embryonic cell markers. Teratomas generated from miR-34a-/- iPSCs and miR-34a-/- ESCs contain cells with the typical placental trophoblast giant cell morphology (black arrows) and placental lactogen 1 (PL-1) expression. Asterisks: the blood-filled lacunae associated with placenta giant cell-like cells. Scale bars, 50 mm. B, C. miR-34a-/- embryoid bodies (EBs) exhibit an induction of both embryonic and extra-embryonic cell markers in immunofluorescence (IF) staining (B) and real-time PCR analyses (C). B. miR-34a-/- EBs yield a greater percentage of Cdx2-positive EBs in IF staining and exhibit an induction of the TE marker Cdx2 primarily in cells at the periphery. Scale bars, 100 mm. Error bars: s.d., n=5-7 (randomly selected 10x fields). C. miR-34a-/- EBs showed an increase in TE markers cdx2, elf5, esx1, tfap2c and gata3. Three independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- ESC lines were compared. Error bars, s.d., n=3. D-G. miR-34a-/- ESCs contribute to both embryonic and extra-embryonic cell lineages in chimeric assays in vivo. D. Four GFP-labeled wild-type or miR-34a-/- ESCs were microinjected into each C57BL/6N recipient morula, and the contribution of their progenies to the inner cell mass (ICM) and the trophectoderm (TE) were determined by the localization of GFP-positive cells (left). Scale bar, 20 mm. The percentage of chimeric blastocyst embryos with ESC contribution to the ICM, the TE, and ICM+TE were measured for both wild-type and miR-34a-/- ESCs (right). Two independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- ESCs were compared. n, the number of chimeric embryos obtained for each ESC line from three independent injections (Table S1). Two independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- ESC lines were compared. E. Single GFP-labeled miR-34a-/- ESCs are able to contribute to both ICM and TE (white arrows) of chimeric blastocysts. Representative images were shown for two chimeric blastocysts (top). Scale bar, 20 mm. The percentage of chimeric embryos with ESC contribution to the ICM, the TE, and ICM+TE were quantified (bottom). Two independent miR-34a-/- ESC lines were examined. n, the number of chimeric blastocyst embryos for each ESC line from three independent injections for each line. All P-values were calculated on a basis of a two-tailed student’s t-test. * P < 0.05; ** P < 0.01; *** P < 0.001. Two independent miR-34a-/- ESC lines were examined. F, G. miR-34a-/- ESCs contribute to multiple differentiated cell lineages in embryo, yolk sac and placenta in chimeric analyses in vivo. 10-15 GFP-labeled wild-type or miR-34a-/- ESCs were microinjected into C57BL/6N blastocysts or aggregated with CD1 recipient morulae to generate chimeric embryos. GFP-labeled miR-34a-/- ESCs contributed to the visceral endoderm derivatives of the yolk sac (F, scale bar, 50 mm), and multiple trophoblast lineages of the placenta (trophoblast giant cells, spongiotrophoblasts, syncytiotrophoblasts (STBs) and sinusoidal trophoblast giant cells (s-TGCs) (G, scale bar, 50 mm) of the chimeric embryos at E12.5 and E14.5. F. A diagram illustrates the yolk sac tissue architecture and major cell types (top). GFP-positive visceral endoderm cells are identified based on the bilaminar structure of the yolk sac and their characteristic columnar epithelial morphology (bottom). G. A diagram illustrates the placenta tissue architecture and major cell types (top). (Bottom) GFP-positive trophoblast giant cells (white arrows) and s-TGCs (red arrows) are identified based on their specific distribution in placenta and their unique cell morphology; GFP-positive spongiotrophoblasts or STBs (white arrows) are identified based on the IF co-staining with trophoblast specific protein alpha (Tpbpa) or ferroportin (MTP1). fbc, nucleated fetal blood cells; f, fetal blood; m, maternal blood. The

Page 38: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

33 33

statistic summary of multiple passage- and littermate-controlled wild-type and miR-34a-/- ESC lines were summarized in Table S1.

Page 39: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

34 34

Page 40: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

35 35

Page 41: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

36 36

Fig. 2. miR-34a-/- pluripotent stem cells exhibit specific induction of the MERVL ERVs. A. The MERVL ERVs are highly induced and differentially expressed (DE) transcriptional unit in miR-34a-/- iPSCs compared to wild-type iPSCs. An MA-plot compares the transcription profiles of miR-34a-/- and wild-type (WT) iPSCs using RNA-seq data. DE transcriptional units, including protein-coding genes, long non-coding RNAs (lncRNAs), pseudogenes, antisense transcripts, and retrotransposons, are color-coded by class. The 441 DE transcription units between miR-34a-

/- and wild-type iPSCs (False Discovery Rate 5%, absolute log2 fold-change ≥ 2) include 352 protein-coding genes, 54 pseudogenes, 13 lncRNAs, 6 antisense transcripts, 4 retrotransposon families, and 12 others. Among these DE genes, MERVL-int (the MERVL internal sequence that encodes gag, pol and dUTPase) and MT2_Mm (the solo-LTR of the canonical MERVL) are strongly induced and highly expressed in miR-34a-/- iPSCs. RNA-seq data were generated from three pairs of independently derived, passage- and littermate-controlled wild-type and miR-34a-/- iPSCs. CPM: counts per million. B. MERVL ERVs are specifically induced in miR-34a-/- iPSCs and ESCs, as measured by real-time PCR analyses. All other ERVs tested, including intracisternal A-particle (IAP), LINE L1, SINE B1, and MMERGLN, exhibited no significant difference between wild-type and miR-34a-/- pluripotent stem cells. Three independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSC and ESC lines were compared. Error bars: s.d., n=3. C. A schematic diagram illustrates the transcript structure of the MERVL family of ERVs, with the position of four pairs of validated real-time PCR primers indicated (a, b, c and d) (top). (Bottom) A pie chart shows the relative abundance for the different MERVL subclasses, categorized according to their structural features. MERVL loci with a complete structure (red) carry both 5’ and 3’ LTRs, along with an internal sequence (annotated by Repeat Masker as MERVL-int); truncated MERVL elements (green) lack one or both LTRs; and solo-LTRs of canonical MERVL (pink), also designated as MT2_Mm, are generated through homologous recombination during evolution. D. The MERVL induction in miR-34a-/- iPSCs occurs primarily in loci with a complete structure. Histograms are shown for the log2 fold-change of the expressed MERVL loci between miR-34a-/- and wild-type iPSCs, either with a complete structure (top) or with a truncated structure (bottom). E. The MERVL-Gag and Oct4 expression is mutually exclusive in a subset of miR-34a-/- ESCs, while MERVL-Gag staining is absent in wild-type ESCs. Scale bars, 20 mm. F. The percentage of MERVL-Gag positive cells were quantified in early passage of passage- and littermate-controlled wild-type and miR-34a-/- ESCs (left) and iPSCs (right). Error bars: s.d., n=5-6 (randomly selected 10x fields). E-F. Three independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSC and ESC lines were compared. G-I. The MERVL activation in miR-34a-/- pluripotent stem cells induces many MERVL proximal gene isoforms. G. MERVL proximal genes with an upstream or intronic MERVL element on the same strand are preferentially up-regulated in miR-34a-/- iPSCs. Box-plots of log2 fold-change between miR-34a-/- and wild-type iPSCs are shown for genes proximal to annotated, complete MERVL loci (left) or MERVL solo LTR (MT2_Mm) loci (right). The numbers in parentheses indicate the number of protein-coding genes in each category; the box plot of the log2 fold-change of all protein-coding genes is included for reference. Same ori. (same orientation): MERVL and its proximal gene are on the same strand; opposite ori. (opposite orientation): MERVL and its proximal gene are on opposite strands. Genes harboring a complete MERVL copy in their introns on the same strand, as well as genes with MT2_Mm on the same strand either upstream or in their introns, have significantly larger fold-changes than the rest of the genes. **, P < 0.01; *** P < 0.001; Wilcoxon-Mann-Whitney test. H. Examples are shown for induced expression and altered transcript structure of MERVL proximal genes in miR-34a-/-

Page 42: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

37 37

iPSCs. The MT2B1 solo LTR elements upstream of zfp352, tcstv1 or tcstv3 act as promoters to strongly induce their expression in miR-34a-/- iPSCs. Similarly, a complete MERVL element upstream of cml2 acts as an alternative promoter to induce the expression of an MERVL-cml2 isoform in miR-34a-/- iPSCs. Intronicly localized MERVLs, either a solo LTR (as that in intron 13 of abcb5) or a complete ERV element (as that in intron 5 of tmem132c), also act as alternative promoters to drive the expression of truncated gene isoforms that contain only the downstream exons. Error bars: s.d., n=3, *** P < 0.001. I. miR-34a overexpression in miR-34a-/- iPSCs using a MSCV retroviral vector significantly suppresses the level of MERVL and the MERVL-zfp352 chimeric transcript in real-time PCR analysis, but causes no alteration in the level of IAP. One pair of passage- and littermate-controlled wild-type and miR-34a-/- iPSCs were examined. Error bars: s.d., n=3. All P-values were calculated on a basis of a two-tailed Student’s t-test unless stated otherwise. * P < 0.05, ** P < 0.01, *** P < 0.001.

Page 43: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

38 38

Page 44: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

39 39

Page 45: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

40 40

Fig.3. Gata2 is essential for the MERVL induction in miR-34a-/- pluripotent stem cells. A. A full length or a fragment of MERVL LTR can be activated specifically in miR-34a-/-ESCs. (Left) A schematic diagram shows the MERVL LTR and the truncated fragments that were tested for promoter activity using luciferase (Luc) assays. (Right) The Luc reporters driven by MERVL fragments containing the full length LTR (MERVL1-1000-Luc and MERVL1-493-Luc) exhibit a strong activation in miR-34a-/- ESCs, but not in wild-type ESCs. A Luc reporter containing the truncated MERVL125-375 fragment completely recapitulates this differential reporter activity in wild-type and miR-34a-/- ESCs. Error bars: s.d., n=2. PBS: primer binding site. B. Chromatin ImmunoPrecipitation (ChIP) reveals an increased H3K4Me3 modification on the MERVL LTR and the MERVL-tcstv1 chimeric gene in miR-34a-/- ESCs. In comparison, the H3K4Me3 level is unaltered on IAP LTR and MMERVK10C LTR. C. Mutations of two predicted Gata2 binding sites (BS1 and BS3) in the MERVL125-375-Luc reporter synergistically impair the reporter activity in miR-34a-/- ESCs. (Left) A schematic diagram shows the MERVL125-375-Luc reporter and the mutated derivatives. (Right) While the mutation of BS1 or BS3 alone in the MERVL125-375-Luc reporter modestly impairs its activity in miR-34a-/- ESCs, mutations of both BS1 and BS3 significantly reduces the reporter activity. Error bars: s.d., n=2. D. gata2 knockdown significantly decreases the expression of MERVL and MERVL proximal genes. Using two independent shRNAs targeting gata2, we effectively knocked down gata2 in miR-34a-/- iPSCs using RNA interference (RNAi) (also see Fig. S5C), and observed a decreased expression of MERVL and MERVL proximal genes (zfp352 and tmem132c). E. gata2 knockdown in miR-34a-/- iPSCs significantly decreases the H3K4Me3 deposition on MERVL elements and on the specific MERVL element upstream of tcstv1, but has no effects on H3K4Me3 deposition on IAP or MMERVK10C. F. ChIP reveals an increase of Gata2 binding to the MERVL LTR region upon Gata2 overexpression in miR-34a-/- iPSCs. Gata2 binding to MERVL pol, IAP LTR, or MMERVK10C LTR is unaltered upon Gata2 overexpression. C, E-F. Two independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSC lines were compared. Error bars: s.d., n=3. All P-values were calculated on a basis of a two-tailed Student’s t-test. * P < 0.05, ** P < 0.01, *** P < 0.001, n.s., not significant

Page 46: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

41 41

Page 47: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

42 42

Page 48: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

43 43

Fig. 4. miR-34a restricts cell fate potential of pluripotent stem cells by targeting gata2. A. A schematic representation of three predicted miR-34a binding sites in the gata2 mRNA, with one site (1) located at the 3’ end of the open reading frame (ORF) and two sites (2 and 3) located within the 3’UTR. Site 1 (red) is predicted as a strong miR-34a binding site by both duplex folding energy and the 7mer-A1 seed-match rule (28, 29). While site 3 (red) does not have a perfect seed sequence, it contains a compensatory 3’ base-pairing and exhibits a strong folding energy. In contrast, site 2 (orange) represents a weaker prediction (28). B. Gata2 exhibits miR-34a-dependent repression in miR-34a-/- pluripotent stem cells. Gata2 protein (left) and gata2 mRNA (right) are elevated in miR-34a-/- iPSCs as compared with wild-type iPSCs. Two independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSC lines were measured by Western blotting and real-time PCR analyses. Error bars: s.d., n=3, * P < 0.05. C. Overexpression of miR-34a in miR-34a-/- iPSCs represses the Gata2 protein level. The quantification of western blot analyses was performed by ImageJ. D. Mutating all three predicted miR-34a binding sites within the Luc-gata2-3UTR luciferase reporter completely abolishes its miR-34a-dependent repression. Error bars: s.d., n=2. E. gata2 knockdown in miR-34a-/- iPSCs by RNAi abolishes the induction of the TE marker cdx2 during EB differentiation, but has no effects on the induction of ectoderm (pax6), mesoderm (brachyury), or endoderm (foxa2) markers. Error bars: s.d., n=3. F. miR-34a overexpression in miR-34a-/- iPSCs significantly suppresses the induction of TE markers cdx2 and elf5 upon EB differentiation. Error bars: s.d., n=3. E-F. Results shown are representative of two independent experiments using the same miR-34a-/- iPSC line. G. The expanded cell fate potential of miR-34a-/- iPSCs is restricted by gata2 knockdown in chimeric analyses. We injected four GFP-labeled cells into each recipient morula. While 50% of chimera blastocysts generated from control infected miR-34a-/- iPSCs contain iPSC contribution to both ICM and TE (n=7/14), miR-34a-/- iPSCs with gata2 knockdown lost this expanded cell fate potential (n=0/13), and primarily contributed to the ICM (n=11/13). All P-values were calculated on a basis of a two-tailed Student’s t-test. * P < 0.05, ** P < 0.01. H. A diagram shows our proposed model in which the miR-34a/gata2 pathway restricts the pluripotent cell fate potential of pluripotent stem cells.

Page 49: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

44 44

Page 50: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

45 45

Page 51: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

46 46

Fig. S1. miR-34a-/- pluripotent stem cells exhibit an expanded cell fate potential in vitro and in vivo. A. Wild-type and miR-34a-/- iPSCs exhibit ESC-like morphology and express pluripotency markers Oct4 and SSEA1. Scale bars, 50 mm for differential interference contrast (DIC) images and 20 mm for immunofluorescence (IF) images. B. Wild-type and miR-34a-/- iPSCs (top) and ESCs (bottom) generate differentiated teratomas containing tissues derived from three germ layers (ectoderm, mesoderm and endoderm) as shown by hematoxylin and eosin (H&E) staining. Scale bars, 25 mm. Two independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSCs and ESCs are compared. C. miR-34a-/- iPSCs efficiently contribute to adult chimeric mice as determined by coat color pigmentation. Shown here is a representative image of three independent experiments, in which three independent lines of Oct4-Gfp/+; a/a; miR-34a-

/- iPSCs were microinjected into albino-C57BL/6/cBrd/cBrd/cr blastocysts. D. The expression level of miR-34/449 family miRNAs in wild-type ESCs measured by small RNA sequencing (left) and real-time PCR analyses (right). miR-34a is the most highly expressed miR-34/449 miRNA in two independent wild-type ESC lines measured. E. The trophectoderm (TE) marker cdx2, elf5, psx1, fgfr2, egfr and mdfi, as well as the primitive endoderm marker gata4, gata6, and sox17, were highly induced in miR-34a-/- teratomas, as determined by real-time PCR. In contrast, wild-type and miR-34a-/- teratomas similarly induced the expression of pax6 (an ectoderm marker), brachyury (a mesoderm marker) and foxa2 (an endoderm marker). Teratomas were generated from two independent pairs of passage- and littermate- controlled wild-type and miR-34a-/- ESC lines. Error bars: standard deviation (s.d.), n=3. F. Compared to wild-type EBs, miR-34a-/- EBs derived from ESCs yield a greater expression of extra-embryonic endoderm marker gata4 and pdgfra, and trophoblast lineage marker mash2 (ascl2) and pl1 (prl3d1). The real-time PCR analyses were performed using data collected from three independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- ESC lines. Error bars: s.e.m., n=3. G. During EB differentiation, miR-34a-/- iPSCs, but not wild-type iPSCs, exhibited strong induction of the TE marker cdx2. In contrast, wild-type and miR-34a-/- EBs similarly induced the expression of pax6 (an ectoderm marker), brachyury (a mesoderm marker) and foxa2 (an endoderm marker). Two independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSC lines are compared. Error bars: s.d., n=3. H. Compared to wild-type EBs, miR-34a-/- EBs derived from iPSCs yield a greater percentage of Cdx2-positive EBs, as well as an increased percentage of Cdx2+ cells in each EB. Scale bars, 100 mm. Shown here are representative images and quantitation from one pair of passage- and littermate-controlled wild-type and miR-34a-/- iPSC and ESC lines. Error bars, s.d., n=6 random fields measured. I. miR-34a-/- iPSCs contribute to both ICM and TE when aggregated with recipient wild-type morulae. The ICM versus the TE contributions from the GFP-labeled wild-type or miR-34a-/- iPSCs were determined by the localization of GFP positive cells in the chimeric blastocysts. While wild-type iPSCs exclusively contribute to the ICM, a fraction of miR-34a-/- iPSCs contributes to both ICM and TE (white arrows) (top). Scale bar, 20 mm. The percentage of chimeric embryos with iPSC (left) or ESC (right) contribution to the ICM, the TE and ICM+TE in the chimeric blastocysts was quantified for both wild-type and miR-34a-/- iPSCs from six independent aggregation experiments (bottom). Three independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSC and ESC lines were compared. J. miR-34a-/-ESCs contributed to all three lineage germ layers. Scale bars, 50 mm for IHC images and 100 mm for IF images. All P-values were calculated on a basis of a two-tailed Student’s t-test. * P < 0.05, ** P < 0.01.

Page 52: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

47 47

Page 53: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

48 48

Page 54: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

49 49

Fig. S2. MERVL ERVs are specifically induced in mir-34a-/- pluripotent stem cells. A. MA plots illustrate the comparison of the transcription profiles of all retrotransposon loci between wild-type and miR-34a-/- iPSCs using the RNA-seq data. We found 949 differentially expressed (DE) retrotransposon loci (FDR of 5%, absolute log2 fold-change ≥ 2). (Left) The DE loci are color-coded by retrotransposon class, with 69 LINEs (red, 7%), 101 SINEs (blue, 11%), 777 ERVs (green, 82%) and 2 others (purple, <1%). (Middle) Loci from the ERV-III class are preferentially derepressed in miR-34a-/- iPSCs. When only the ERV loci are highlighted for DE retrotransposons in the MA plot, 46 belong to the ERV-I class (red, 6%), 97 belong to the ERV-II class (blue, 12%) and 634 belong to the ERV-III glass (green, 82%). (Right) Loci from the MERVL family of ERVs are preferentially derepressed in miR-34a-/- iPSCs. When only ERV-III loci are highlighted for DE retrotransposons in the MA plot, we identified 552 MERVL (red, 87%), 27 MT2_Mm (blue, 4%), 1 MT2A (green, <1%), 3 MT2B (purple, <1%), 6 MT2B1 (orange, 1%), 11 MT2B2 (yellow, 2%) and 34 other ERV-III loci (brown, 5%). MT2_Mm is designated as the solo LTR of the canonical MERVL ERV; MT2A, MT2B, MT2B1, and MT2B2 refer to solo-LTRs that are related to MT2_Mm. CPM, counts per million. B, C. The extent of MERVL derepression (B) and the expanded cell fate potential (C) is decreased in late passages of miR-34a-/- ESCs. In contrast, MERVL derepression is more stable in miR-34a-/- iPSCs. The level of MERVL expression was measured using real-time PCR analyses for three (ESC) or two (iPSC) independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- cells. Error bars: s.d., n=3. C. When four GFP-labeled ESCs were microinjected into each recipient morula, 63% of early passage (P10) miR-34a-/- ESCs contribute to both ICM and TE (n=12/19), while only 6% of late passage (P17) miR-34a-/- ESCs exhibit the same phenotype (n=1/16). Scale bar, 20 mm. D. An MA-plot compares the transcription profiles of all retrotransposon loci between wild-type and miR-34a-/- iPSCs using the RNA-seq data. Among the 949 DE retrotransposon loci (FDR of 5%, absolute log2 fold-change of 2 or more); 600 belong to the MERVL family (highlighted in color, 63%), revealing a specific derepression of this ERV family in miR-34a-/- iPSCs. Interestingly, 86.5% (519) are MERVL elements with a complete structure (red); 7.5% (45) are solo LTRs (blue); and 6% (36) are truncated copies (green). CPM, counts per million. E. miR-34a-/- ESC cultures exhibit an increase of cells expressing MERVL-Gag but lacking Oct4, as determined by IF staining. Scale bars, 20 mm. Images shown are representative from two pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSCs. F. The MERVL expression level is heterogeneous among miR-34a-/- iPSC colonies. Using single-cell real-time PCR technology, we measured the MERVL expression level in individual colonies of wild-type and miR-34a-/- iPSCs, demonstrating the existence of colonial heterogeneity. Error bars: s.e.m., n=7-8. All P-values were calculated on a basis of a two-tailed Student’s t-test. ** P < 0.01. G. A representative miR-34a-/- iPSC colony contains cells with strong MERVL-Gag expression, yet no Oct4 expression. Scale bars, 50 mm.

Page 55: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

50 50

Page 56: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

51 51

Page 57: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

52 52

Fig. S3. The MERVL induction in miR-34a-/- pluripotent stem cells alters the expression and structure of a subset of MERVL proximal genes. A-B. miR-34a-/- pluripotent stem cells and other reported totipotent-like ESCs share a similar expression profile. We reanalyzed several publicly available datasets (see Supplementary Text), and revealed a similar transcriptional profile between miR-34a-/- iPSC and other published totipotent-like ESCs, such as 2C+ ESCs (7), kdm1a-/- ESCs (25), and chromatin assembly factor-1 (CAF-1)-deficient (si-p60 and si-150) ESCs (9). A. Hierarchical clustering of published datasets (7, 9, 25) effectively clusters the expression data in two main branches: bi-potential vs. pluripotent. B. Differential expression analysis of bi-potential cells versus pluripotent stem cells using published datasets (7, 9, 25). We excluded our miR-34a-/- and wild-type iPSC data from this analysis to avoid the whole dataset driven by the miR-34a-/- iPSC data. The volcano plot shows that a large fraction of MERVL-associated genes (58/224) is differential upregulated in totipotent-like cells. FC: fold-change (bi-potential/pluripotent); CPM: counts per million. C. The differentially upregulated protein-coding genes in miR-34a-/- iPSCs are enriched for genes with an observed MERVL-gene junction or genes proximal to MERVL (n=50/242, Fisher’s exact test, P <10-15). An MA-plot is shown to compare the transcription profiles of protein-coding genes between miR-34a-/- and wild-type iPSCs using the RNA-seq data. The DE protein-coding genes are color-coded by their relation to MERVL. Green: DE genes with an observed junction read with an adjacent MERVL element (see Supplementary Text); orange: DE genes with an upstream or intronic MERVL element; blue: all other DE genes. CPM: counts per million. D, E. Examples are shown for induced expression and altered transcript structure of MERVL proximal genes in miR-34a-/- iPSCs and ESCs. MERVL associated genes are induced in miR-34a-/- iPSCs (D) and miR-34a-/- ESCs (E). D. In miR-34a-/- iPSCs, the p4ha2 transcription is induced by an upstream MT2B element, a solo LTR related to MERVL LTR; the chit1 gene contains an intronic MERVL element in intron 1, which acts as an alternative promoter to drive a chit1 isoform containing all downstream exons. The expression level of chit1 and p4ha2 was measured using three independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSC lines. Error bars: s.d., n=3. E. Real-time PCR analyses validate the induction of MERVL-gene isoforms in miR-34a-/- ESCs, including cml2 and p4ha2 (with an upstream MERVL element) as well as abcb5, tmem132c and chit1 (with an intronic MERVL element). Intronicly localized MERVLs, either a solo LTR (as that in intron 13 of abcb5) or a complete ERV (as that in intron 5 of tmem132c or intron 1 of chit1), act as alternative promoters to drive the expression of truncated gene isoforms that only contain the downstream exons. Three independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- ESC lines were compared. Error bars: s.d., n=3. F. Induction of MERVL proximal genes is heterogeneous among different miR-34a-/- iPSC colonies. Using single cell real-time PCR analyses, we measured the expression of MERVL proximal genes (abcb5, zfp352, tcstv1, tcstv3, and chit1) in individual colonies of wild-type and miR-34a-/- iPSCs, demonstrating the colonial heterogeneity in their expression level. Interestingly, one of a previously reported totipotency marker hex1 (8) is expressed at a similar level between wild-type and miR-34a-/- ESCs, suggesting that the Hex1-positive ESCs are likely a different cell population from the totipotent-like miR-34a-/- ESC population. Error bars: s.d., n=3. All P-values were calculated on a basis of a two-tailed Student’s t-test. * P < 0.05, ** P < 0.01, *** P < 0.001, n.s., not significant.

Page 58: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

53 53

Page 59: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

54 54

Fig. S4. The effect of the length and depth of RNA-seq data on the transcriptional profile characterization of miR-34a-/- iPSCs. We re-sequenced the wild-type and miR-34a-/- iPSC RNA-seq libraries to a greater length (150 bp paired-end (PE)) and sequencing depth (additional 192 million reads), in order to explore to what extent our previous RNA-seq analysis is limited by the technical difficulty to precisely map the repetitive RNA-seq reads. A. (Top) The MERVL ERVs are the most highly induced and differentially expressed (DE) transcriptional unit in miR-34a-/- iPSCs. MA-plots compare the transcription profiles of miR-34a-/- and wild-type (WT) iPSCs. DE transcriptional units, including protein-coding genes, long non-coding RNAs (lncRNAs), pseudogenes, antisense transcripts, and retrotransposons, are color-coded by class, as in Fig. 2A. The results of 100 bp PE RNA-seq (left, same as Fig. 2A), the results of 150 bp PE RNA-seq (middle), and the results of the combined dataset by pooling together the two sequencing datasets (right) are shown as MA plots. (Bottom) The upregulated protein-coding genes in miR-34a-/- iPSCs are enriched for genes with an observed MERVL-gene junction or genes proximal to MERVL. MA-plots compare the transcription profiles of protein-coding genes between miR-34a-/- and WT iPSC using RNA-seq data. The DE protein-coding genes are color-coded by their relation to MERVL, as in Fig. S3C. The results of 100 bp PE RNA-seq (left, same as Fig. S3C), the results of 150 bp PE RNA-seq (middle), and the results of the combined datasets (right) are shown as MA plots. B. Venn diagrams showing the concordance between the 100 bp PE and 150 bp PE RNA-seq datasets. For each of the datasets (100 bp PE, 150 bp PE, and combined), we identified DE retrotransposons families, MERVL loci, DE genes, and DE MERVL-gene junctions (see Supplementary Text). The datasets are largely in agreement, suggesting that 100 bp PE reads are sufficient to detect most DE MERVL-gene junctions and DE MERVL loci. As expected, the combined dataset has more power to detect differential expression, since we are effectively doubling the sequencing depth.

Page 60: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

55 55

Page 61: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

56 56

Page 62: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

57 57

Page 63: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

58 58

Fig. S5. Gata2 mediates the MERVL induction in miR-34a-/- pluripotent stem cells. A. Clustal-W LTR sequence alignment of 18 differentially expressed MERVL loci in miR-34a-/-

iPSCs reveals three conserved predicted Gata1/2/3 binding sites within the minimal region of MERVL LTR, MERVL125-375. Among the three predicted GATA binding sites (designated as BS1, BS2, and BS3 and highlighted in yellow), BS1 and BS3 are fully conserved across all 18 MERVL elements, while BS2 is partially conserved. B. Expression patterns of MERVL, gata2 and miR-34a during mouse preimplantation development. In published datasets (52, 53), the levels of MERVL and gata2 both peak in 2C embryos; and the mature miR-34a is highly expressed from the 8C to the blastocyst stage (top). Using single-embryo real-time PCR analyses, we validated the expression patterns of MERVL, gata2, and pri-miR-34a in mouse preimplantation embryos (bottom). C. The expression level of the MERVL proximal genes is dependent on gata2. Two independent shRNAs against gata2 are able to effectively knock down gata2 and the MERVL proximal gene chit1 in miR-34a-/- iPSCs. Two independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSC lines were compared. Error bars: s.d., n=3. D. gata2 is necessary for MERVL and cdx2 induction during teratoma formation. In teratomas generated from miR-34a-/- iPSCs, knockdown of gata2 in miR-34a-/- iPSCs reduces the MERVL and cdx2 levels during teratoma formation. E, F. DNA methylation is not essential for the MERVL induction in miR-34a-/- pluripotent stem cells. E. Wild-type and miR-34a-/- iPSCs have similar level of modest DNA methylation on MERVL elements, as determined by bisulfite sequencing. In contrast, iPSCs of both genotypes exhibit a high level of DNA methylation on the IAP elements. Black circle, methylated CpG; open circle, unmethylated CpG. F. No MERVL induction is detected in dnmt3a-/-; dnmt3b-/- ESCs and iPSCs that are deficient for de novo DNA methylation. Error bars: s.d., n=3. n.s., not significant. G-I Characterization of epigenetic modifications on MERVL in wild-type and miR-34a-/- pluripotent stem cells. Wild-type and miR-34a-/- pluripotent stem cells have similar deposition of H3K27Ac (G) and H3K9Me2 (H) on the MERVL LTR and the MERVL-tcstv1 chimeric gene, yet the deposition of H3K4Me1 (I) on MERVL is increased in miR-34a-/- pluripotent stem cells. As a control, H3K27Ac (G), H3K9Me2 (H) and H3K4Me1 (I) deposition on IAP LTR or MMERVK10C LTR is similar between wild-type and miR-34a-/- pluripotent stem cells. Error bars: s.d., n=3. Two independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- ESCs and iPSCs are compared. J. miR-34a overexpression in lsd1 deficient (lsd1GT/GT) ESCs using a murine stem cell virus (MSCV) retroviral vector effectively suppresses the level of MERVL and the MERVL proximal gene zfp352, but causes no alteration in IAP. Error bars: s.d., n=3. All P-values were calculated on a basis of a two-tailed Student’s t-test. * P < 0.05, ** P < 0.01, *** P < 0.001, n.s., not significant.

Page 64: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

59 59

Page 65: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

60 60

Fig. S6. Gata2 is a key target of miR-34a in pluripotent stem cells. A. Consistent with Gata2 being a key target for miR-34a, miR-34a-/- iPSCs exhibit an increase of multiple well-characterized gata2 targets (dab2, cd34 and gata1) in our real-time PCR analyses. Three independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSC lines were compared. Error bars, s.d., n=3. B. The expression level of characterized gata2 targets is dependent on gata2 in miR-34a-/- iPSCs. Two independent gata2 shRNAs are able to effectively suppress dab2, cd34 and gata1 in miR-34a-/- iPSCs. Two independent pairs of passage- and littermate-controlled wild-type and miR-34a-/- iPSC lines were compared by real-time PCR analyses. Error bars: s.d., n=3. All P-values were calculated on a basis of a two-tailed Student’s t-test. * P < 0.05, ** P < 0.01, *** P < 0.001.

Page 66: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

61 61

Page 67: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

62 62

Page 68: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

63 63

Page 69: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

64 64

Page 70: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

65 65

Page 71: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

66 66

Page 72: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

67 67

Page 73: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

68 68

Page 74: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

69 69

Page 75: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

70 70

Page 76: Deficiency of microRNA miR-34a expands cell fate potential ... · viral matrix (MA), capsid (CA) and nucleoproteins (NC). The pro gene encodes the protease, which cleaves the gag

71 71