research article open access analysis of nucleosome positioning determined by dna ... ·...

12
RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA helix curvature in the human genome Hongde Liu, Xueye Duan, Shuangxin Yu, Xiao Sun * Abstract Background: Nucleosome positioning has an important role in gene regulation. However, dynamic positioning in vivo casts doubt on the reliability of predictions based on DNA sequence characteristics. What role does sequence-dependent positioning play? In this paper, using a curvature profile model, nucleosomes are predicted in the human genome and patterns of nucleosomes near some key sites are investigated. Results: Curvature profiling revealed that in the vicinity of a transcription start site, there is also a nucleosome-free region. Near transcription factor binding sites, curvature profiling showed a trough, indicating nucleosome depletion. The trough of the curvature profile corresponds well to the high binding scores of transcription factors. Moreover, our analysis suggests that nucleosome positioning has a selective protection role. Target sites of miRNAs are occupied by nucleosomes, while single nucleotide polymorphism sites are depleted of nucleosomes. Conclusions: The results indicate that DNA sequences play an important role in nucleosome positioning, and the positioning is important not only in gene regulation, but also in genetic variation and miRNA functions. Background Nucleosome positioning refers to the position of a DNA helix with respect to the histone core [1]. Positioning has important roles in gene regulation, because packing DNA into nucleosomes can limit the accessibility of the sequences [2-4]. High-resolution genome-wide nucleo- some maps are now available for the genomes of yeast, worms, flies, and humans [2,5-7]. Studies of these nucleosome position datasets have revealed some inter- esting characteristics, especially for promoter sequences. A typical nucleosome-free region (NFR) is near the tran- scription start site (TSS) and is followed by a well- positioned nucleosome [4]. Low nucleosome occupancy is a significant feature of a functional transcription fac- tor binding site (TFBS) [8]. At the same time, computational predictions using DNA sequence information have also advanced. Since the report of the nucleosome positioning code (an ~10 bp repeating pattern of dinucleotides AA-TT-TA/GC) in yeast [9], some models for predicting nucleosomes have been developed using DNA sequence properties, such as dinucleotide periodicity, and structural information of the DNA helix [5,7,10-13]. The successful predictions suggest that DNA sequences partly encode nucleosomes themselves, although some deviations are observed between the predicted and the experimentally deter- mined positions [4,9]. On further investigation, it was realized that dynamic positioning is a general rule in cells. Dynamic remodelling of one or two nucleosomes was revealed in yeast promoters [14]. Nucleosome reor- ganization of a gene might result from a cell-specific change or a condition-dependent change [15,16]. For cells from the same cell line, the first nucleosome down- stream of the TSS exhibits differential positioning in active and silent genes, and such nucleosome reorganiza- tion can be induced in resting T-cells [2]. Relative posi- tioning was also found to be a general characteristic in Caenorhabditis elegans [6]. Such variations of nucleosome positions in vivo cast doubt on the reliability of predictions based on DNA sequence characteristics [4,6]. Moreover, a recent work in yeast showed that there is no genome code in nucleo- some positioning; even intrinsic histone-DNA interac- tions are not the major determinant [17]. Also, the mechanism by which DNA sequences guide nucleo- somes positions is different between S. pombe and S. cerevisiae [18]. Nucleosome organization at the 3* Correspondence: [email protected] State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China Liu et al. BMC Genomics 2011, 12:72 http://www.biomedcentral.com/1471-2164/12/72 © 2011 Liu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Upload: others

Post on 13-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

RESEARCH ARTICLE Open Access

Analysis of nucleosome positioning determinedby DNA helix curvature in the human genomeHongde Liu, Xueye Duan, Shuangxin Yu, Xiao Sun*

Abstract

Background: Nucleosome positioning has an important role in gene regulation. However, dynamic positioningin vivo casts doubt on the reliability of predictions based on DNA sequence characteristics. What role doessequence-dependent positioning play? In this paper, using a curvature profile model, nucleosomes are predicted inthe human genome and patterns of nucleosomes near some key sites are investigated.

Results: Curvature profiling revealed that in the vicinity of a transcription start site, there is also a nucleosome-freeregion. Near transcription factor binding sites, curvature profiling showed a trough, indicating nucleosomedepletion. The trough of the curvature profile corresponds well to the high binding scores of transcription factors.Moreover, our analysis suggests that nucleosome positioning has a selective protection role. Target sites of miRNAsare occupied by nucleosomes, while single nucleotide polymorphism sites are depleted of nucleosomes.

Conclusions: The results indicate that DNA sequences play an important role in nucleosome positioning, and thepositioning is important not only in gene regulation, but also in genetic variation and miRNA functions.

BackgroundNucleosome positioning refers to the position of a DNAhelix with respect to the histone core [1]. Positioninghas important roles in gene regulation, because packingDNA into nucleosomes can limit the accessibility of thesequences [2-4]. High-resolution genome-wide nucleo-some maps are now available for the genomes of yeast,worms, flies, and humans [2,5-7]. Studies of thesenucleosome position datasets have revealed some inter-esting characteristics, especially for promoter sequences.A typical nucleosome-free region (NFR) is near the tran-scription start site (TSS) and is followed by a well-positioned nucleosome [4]. Low nucleosome occupancyis a significant feature of a functional transcription fac-tor binding site (TFBS) [8].At the same time, computational predictions using

DNA sequence information have also advanced. Sincethe report of the nucleosome positioning code (an ~10bp repeating pattern of dinucleotides AA-TT-TA/GC) inyeast [9], some models for predicting nucleosomes havebeen developed using DNA sequence properties, such asdinucleotide periodicity, and structural information of

the DNA helix [5,7,10-13]. The successful predictionssuggest that DNA sequences partly encode nucleosomesthemselves, although some deviations are observedbetween the predicted and the experimentally deter-mined positions [4,9]. On further investigation, it wasrealized that dynamic positioning is a general rule incells. Dynamic remodelling of one or two nucleosomeswas revealed in yeast promoters [14]. Nucleosome reor-ganization of a gene might result from a cell-specificchange or a condition-dependent change [15,16]. Forcells from the same cell line, the first nucleosome down-stream of the TSS exhibits differential positioning inactive and silent genes, and such nucleosome reorganiza-tion can be induced in resting T-cells [2]. Relative posi-tioning was also found to be a general characteristic inCaenorhabditis elegans [6].Such variations of nucleosome positions in vivo cast

doubt on the reliability of predictions based on DNAsequence characteristics [4,6]. Moreover, a recent workin yeast showed that there is no genome code in nucleo-some positioning; even intrinsic histone-DNA interac-tions are not the major determinant [17]. Also, themechanism by which DNA sequences guide nucleo-somes positions is different between S. pombe andS. cerevisiae [18]. Nucleosome organization at the 3’

* Correspondence: [email protected] Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096,China

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

© 2011 Liu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative CommonsAttribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction inany medium, provided the original work is properly cited.

Page 2: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

end of genes conforms to the principles of statisticalpositioning [19].However, some factors should be noted. Firstly, the

~10-bp periodicity of dinucleotides AA-TT-TA/GC,which is identified as a positioning code in yeast, is alsofound in C. elegans, flies, and humans [9,20-24]. Sec-ondly, strikingly similar features, including the NFR nearthe TSS, and the uniform spacing of internucleosomesdownstream of the TSS, are observed both in the pre-dicted data and in the experimentally determined data[5,10,11]. Low nucleosome occupancy is encodedaround functional transcription factor binding sites[8,9]). Thirdly, some sequence-dependent models aresuitable for predicting nucleosome positions in multiplegenomes without additional information [7,11]. In addi-tion, the chromatin remodelling complex can establishspecific local chromatin structures by reading out DNAfeatures and targeting nucleosomes to specific positions[25]. All of the above highlight the importance ofsequence preferences in positioning.In this paper, using the curvature profile, a new model

based on the curvature pattern of nucleosomal DNA,nucleosomes positions were predicted for the humangenome. Patterns of nucleosomes near interesting sites,including TSSs, TFBSs, single nucleotide polymorphism(SNP) sites, and target sites of miRNAs, were thoroughlyinvestigated. The results also demonstrated the impor-tant roles of DNA sequences in determining nucleo-somes. Moreover, we revealed that nucleosomes are notonly functional in gene regulation, but also in geneticvariation.

Results and DiscussionPredictions of nucleosome positions and the estimationof sequence-dependenceNucleosomes positions were predicted by recognizing thecurvature pattern of the core DNA helix (see methods).The reconstructed curvature pattern of a nucleosome isshown in Figure 1. To test how representative the patternderived from the crystal structures was, 634 well-positioned(ratio of signal to noise > 100) nucleosome DNA sequences

were collected from Zhao et al.’s experimental dataset [2].An averaged curvature curve of the 634 sequences resem-bles the pattern in shape (Additional file 1: Figure s1), indi-cating the pattern represents a canonical curvature of anucleosomal DNA helix.Using the curvature profile, nucleosomes positions

were predicted for human chromosome 20. A wavelet-based algorithm MSCWT [26] was employed in detect-ing exact dyad positions (Additional file 1: Figure s2). Inthe curvature profile and Kaplan et al.’s predictions, theaveraged centre-to-centre distance of neighbour nucleo-somes is ~190 bp (Additional file 1: Figure s3), close tovalue of 185 bp in the literatures [2,4]. For Zhao et al.’sexperimental dataset, due to the scarcity of the coverage,this value is 245 bp, which actually indicates nucleoso-mal repeats rather than the average distance.A quantitative comparison between the predicted and

the experimentally determined nucleosomes positionswere carried out by measuring the distance betweenrespective dyad positions (see Methods). More than 53%of the experimentally determined nucleosomes werepredicted by curvature profile, with a 40-bp deviation(Figure 2). Using the experimental data as a standard,the curvature profile shows a slightly higher perfor-mance than Kaplan et al.’s model, especially for a lowdeviation (< 25 bp). Importantly, both the curvatureprofile and Kaplan et al.’s model has a much highermatching ratio for the experimental data than randompositioning does. In comparison with the model nu-Score [13] on a 50k-bp sequence, the curvature profileexhibits a comparable performance (Additional file 1:Table s6). Additionally, nucleosomes have a good matchin activated and resting CD4+ T cells (65%, deviation >35 bp) (Figure 2), suggesting that most nucleosomes donot change in either type of cell; only a small number ofnucleosomes, such as the first nucleosome downstreamof a TSS [2], exhibit different position.Figure 3 and Figure s5 (Additional file 1) demonstrate

the predictions for two arbitrarily selected DNAsequences. The positive accuracy of the curvature profileis more than 55%, with a deviation < 30 bp (Additionalfile 1: Table s7). Moreover, the curvature profile exactlylocates most of the TSSs and TFBSs in NFRs. This indi-cates that the curvature profile has a good capacity forpredicting nucleosome positions, especially for key func-tional sites.The curvature profile is based on the curvature charac-

teristics of nucleosomal DNA, and is therefore sequencedependent. Subsequently, the curvature-dependentdegree of nucleosome positioning was estimated usingthe nucleosome occupancy ratio of hexanucleotides inthe whole of human chromosome 20. The correlationbetween occupancy ratio of the predictions and theexperimental data was 0.6123 (Figure 4). It should be

Figure 1 Curvature pattern of a nucleosomal DNA helix.

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

Page 2 of 12

Page 3: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

pointed out that the curvature profile in fact reflects anindirect measurement of nucleosome position. Thus, thedependence in this study only reflects the influence ofthe curvature of the DNA helix on nucleosome position-ing. We speculate that DNA sequences might encode adefault arrangement of nucleosomes and that reorganiza-tion of nucleosomes in vivo is based on this defaultarrangement.

Nucleosome distribution in protein-coding promoters andindependent miRNA promotersNucleosome positioning at promoters has been extensivelyinvestigated because of its role in occluding binding sites.A typical NFR locates around a TSS, which allows thebinding sites to be exposed to the pre-initiation complex(PIC) [2,5,9,11,16]. The nucleosomes flanking the NFRalso provide a steric match for the complex.

Figure 2 Performance of the curvature profile for nucleosome position prediction on human chromosome 20. Given a deviation, thematching ratio is a ratio of matched nucleosomes to all experimentally determined nucleosomes. Overlapping ratio is [2(73-deviation)+ deviation]/147, indicating the degree of overlap between the predicted and the experimentally determined nucleosomes. “CP” indicates curvature profile.

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

Page 3 of 12

Page 4: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

In this study, the averaged curvature profile (fine blackline in Figure 5A) results in an ~150-bp NFR aroundthe TSS of 3571 protein-coding genes, which is consis-tent with previous reports [2,5,9,11,16]. The NFR is also

revealed by Zhao et al.’s experimental data in activatedCD4+ T cells (grey line in Figure 5A). However, not allTSSs are nucleosome free in vivo. Using Zhao et al.’sexperimental data in the range of -150 bp to 50 bp fromthe TSS, 3571 TSSs were divided into two classes by ak-means clustering method. This resulted in 1080 occu-pied TSSs (class I) and 2491 nucleosome-free TSSs(class II). Both the curvature profile and the experimen-tal data are consistent with nucleosome depletion atclass II TSSs (Figure 5C). Around the class I TSSs, nodistinct positioning signal is observed in the curvatureprofile (Figure 5B), while a positioned nucleosome issuggested by the experiment data. This differencebetween the prediction and the experiment indicatesthat positioning is not completely determined by DNAsequences in vivo. The trough near the TSS in Figure5B is slightly higher than that in Figure 5C, suggestingthat a minority of class I TSSs are occupied by DNAsequence-encoded nucleosomes. As shown above, thenucleosome-free state is the default configuration,partly determined by DNA sequences at a TSS; how-ever, in vivo, due to the function of the remodellingcomplexes, some of TSSs are occupied. This is whythere are differences between the prediction and theexperiment.We computed the dinucleotide distribution near all

the TSSs. Fraction of WW (W = A or T) dinucleotidesdecreased in a broad range near the TSS (Additionalfile 1: Figure s6A), Interestingly, in the range of ~100 bpupstream of the TSS, WW shows a sharp increase(Additional file 1: Figure s6A), also corresponding to anincrease of poly (dA:dT) (Additional file 1: Figure s6B).We inferred that the increased poly (dA:dT) is asso-ciated with the NFR near the TSS, because the poly

Figure 3 Nucleosome predictions for a segment of human DNA sequence (chr13, 90798k-90801k bp). Rows from top to bottom indicateTSS, SNP sites, TFBS, nucleosomes predicted by curvature profile, experimentally determined nucleosomes [2], and nucleosomes by Kaplan et al.’smodel [11], respectively. The filled blocks indicate nucleosomes; the original signals are shown in Additional file 1, Figure s4.

Figure 4 Estimation of curvature-dependent degree ofnucleosome positioning. Black and grey dots represent theoccupancy ratios and the nucleosome-free ratios of 4096 6-mernucleotides, respectively. The estimated correlation coefficient is0.6123; NP, nucleosome positioning; NFR, nucleosome-free region.

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

Page 4 of 12

Page 5: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

(dA:dT) disfavors nucleosome formation [17,18]. Theresults suggest that DNA sequence influences nucleo-some positioning.Taking these results together, DNA sequences partly

encode a default NFR around a TSS. Due to the require-ment of gene expression in vivo, some TSSs are occu-pied through chromatin remodelling, while others arestill in NFRs. The positioned nucleosome at a TSS can

block the binding sites of the pre-initiation complex,implying that a TSS-occupied gene should exhibit alower expression level. This implication was verified byexamining mRNA levels using gene expression data inCD4+ T cells [27]. As shown in Figure 5F, the mRNAlevels of TSS-occupied genes (class I) are lower thanthose of nucleosome-free TSS genes (class II). A similarresult is observed using Zhao et al.’s gene expression

Figure 5 Nucleosome organizations near transcription start sites (TSSs) in human chromosome 20. (A), average nucleosome signals for allhuman chromosome 20 promoters aligned by TSS; y-axis represents an averaged scaled signal; fine line and thick grey line represent thecurvature profile and the experimental data, respectively; the setting is the same for B and C; (B), nucleosome signals near class I TSSs (1080nucleosome-occupied TSSs in vivo); (C), nucleosome signals near class II TSSs (2491 nucleosome-free TSSs in vivo); (D) and (E), the expressionlevels of genes in class I and class II; (F), counts of genes in according to their mRNA levels; the genes in class II have higher expression levelsthan those in Class I.

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

Page 5 of 12

Page 6: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

data in activated CD4+ T cells (see Additional file 1:Figure s7). The results indicate that curvature-dependentnucleosomes partly determine gene expression level.Subsequently, nucleosome positioning at miRNA pro-

moters was investigated. We predicted nucleosomes ofmiRNA promoters with the curvature profile andKaplan et al.’s model [11]. Surprisingly, both modelsgive a unanimous result (Figure 6). The mean of thedeviations is less than 25 bp. Moreover, the TSSs ofmiRNA promoters were exactly located in NFRs,which is similar to protein-coding promoters. Two well-positioned nucleosomes flank the NFR. The results sug-gest that nucleosome positioning is involved in the regu-lation of protein-coding genes and of miRNA genes.It has been reported that there is a low nucleosome

level near a TFBS [8]. This feature of nucleosome orga-nization was also observed in the curvature profile. InFigure 7A, a trough of nucleosome level is observednear TFBSs, suggesting open chromatin is indispensableto the binding of transcription factors. Due to the lowresolution of the curvature profile, the trough in thecurvature profile is broader than that determined by theexperimental data.As TFBSs are most frequently located in the NFRs of

promoters, the NFR will correspond to the region witha high density of TFBSs (high scanning score of TFBS)in promoters [8,11]. This hypothesis was verified forboth protein-coding promoters and independent miRNA

promoters, by examining the relationship between thecurvature profile and the average binding score profileof 64 human transcription factors (TFs) in the promo-ters. Eight protein-coding promoters and thirteenhuman independent miRNA promoters were used (seemethods). As expected, the trough of the curvature pro-file corresponds well to the high binding scores of TFs,and regions with high levels of positioning signal (indi-cating nucleosome occupancy) have correspondingly lowbinding scores (see Figure 7). The mutually antagonisticrelationship is obvious in the region of -600 bp to 400 bpfor protein-coding promoters (Figure 7B), and -500 bp to100 bp for independent miRNA promoters (Figure 7C).Outside of these regions, the relationship broke down,indicating that the regions mentioned above are impor-tant for TF binding. Despite the small number of genesused, the results indicate that a positioned nucleosomelimits TF binding for both protein-coding promoters andmiRNA promoters. Most importantly, it strongly suggeststhat the DNA sequences affect transcription, not only byproviding special binding sites, but also by influencingnucleosome positioning.Recent findings suggest that the mechanisms by which

DNA sequences affect nucleosome positioning is distinctin some species [6,18,19,27]; and the choice of the peri-odical dinucleotides differs considerably from oneorganism to another [28], indicating the difficulty infinding a universal positioning code. However, it was

Figure 6 Average nucleosome organization in the vicinity of an miRNA TSS. (A), curvature profile; (B), Kaplan et al.’s model prediction [11];ellipses indicate nucleosomes; the numbers below them indicate the dyad position relative to transcription start sites.

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

Page 6 of 12

Page 7: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

observed that some important characteristics (such asNFRs) revealed by DNA sequence-based models arecoincident with those observed in vivo [5,7-13]. DNAstructure-related periodicity (~10-bp) is suggested inyeast, fly, worm, and human genomes [9,20-24]. Theseindicate that DNA sequence is one of the contributorsto nucleosome positioning. The detail of how DNAsequence affects nucleosome positioning requires furtherinvestigation. One should be very carefully in using fea-tures derived from one organism to predict nucleosomesin other organisms. Recent studies showed that in RSC-depleted cells, nucleosomes move toward predicted sites[29]. Taken together, we speculate that DNA sequencespartly determine a default pattern of nucleosomespositions, on which nucleosome reorganization is

based. Thus, feature-based models can provide a viewof nucleosome configuration determined by DNAsequences, and assist in finding certain key sites (suchas TSSs) when an experimental dataset is absent.

Genetic variation and nucleosome positioningPatterns of nucleosomes near SNP sites were investigated(see Figure 8). SNP sites are grouped into four classes, sin-gle, insertion/deletion (indels), insertion, and deletion. Nearthe sites of indels, insertions, and deletions, the curvatureprofile shows a large trough (Figure 8B-D), indicating suchmutation events favor linker-DNA. Importantly, a similarresult was also observed using the experimental data.Although the trough near the single SNP sites is not as lowas that near the other types of SNP (Figure 8A),

Figure 7 Patterns of nucleosomes near transcription factor binding sites (TFBSs). (A), patterns of nucleosomes near transcription factorbinding sites (TFBS); (B) and (C), the relationship between nucleosome positioning and the distribution of TFBS, (B) for 13 independent MiRNApromoters [37], (C) for eight protein-coding genes’ promoters (ZBED4, ZNF378, PIK4CA, EWSR1, SMC1L2, NF2, ARFGAP3, KIAA0542) [38]; the boldblack lines represent the distribution of TFBS (the average binding score profile of 64 TFs, see methods), the area with shading are averagedcurvature profile, in which reference lines are cut-off lines. The vertical blocks highlight the correspondences between NFR and the distributionof TFBS.

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

Page 7 of 12

Page 8: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

nucleosome depletion is obvious. In the genome of Oryziaslatipes, genetic variation downstream of a TSS occurs withan ~200-bp periodicity, and the insertion/deletion of morethan 1 bp frequently occurs in the linker-DNA region [30],which is consistent with our findings in humans. Moreover,we found that nucleosomes are also depleted near SNPsites in the dog genome (Additional file 1: Figure s8).The above results suggest a link between nucleosome

positioning and genetic variation. Nucleosome position-ing has a protective role for splice sites [31]. Our find-ings show the opposite effect on sites depleted ofnucleosomes. The sites without the nucleosomes areprone to mutations. In fact, a well-positioned nucleo-some is observed at the target sites of miRNAs, protect-ing the key sites (see below).

Nucleosome positioning at target sites of miRNAFigure 9A shows the positioning pattern of nucleosomesin the vicinity of miRNA target sites. A well-positionednucleosome is observed at the target sites in the curva-ture profile. The curve of the experimental data is verysimilar to the curvature profile in the region far fromthe target sites. However, near the centre of the targetsites, the experimental data gives a great valley while thecurvature profile shows a peak. The similarity of twocurves in the region either side of the target sites sug-gests that we cannot simply attribute the opposing pat-terns (the trough in the experimental data and the peakin curvature profile near the centre of target site) to theinaccuracy of the curvature profile. We suspect that

some unknown factors prevent Zhao et al.’s experimentfrom successfully detecting the nucleosomes positionedat the target sites. Nucleosomes have a protective rolefor some special sites, such as splice sites [31]. miRNAsare known to be involved in post-transcriptional regula-tion, by binding to the 3’-UTR region of an mRNAsequence using antisense base pairing and cleaving thetarget mRNA or repressing its translation into protein[32]. Mutations at miRNA target sites will result inrecognition errors for miRNAs. Thus, it is essential toprotect the target sites of miRNAs in the nucleolus. Thecurvature profile’s results suggest that miRNA targetsites are protected by nucleosomes.Assuming that the target sites of miRNAs are pro-

tected, the accumulation of genetic variation in targetsites should be at a low level. Thus, we examined the dis-tribution of SNPs in miRNA target sites (see Figure 9B).In range of -15 bp to 10 bp from the target sites, theSNP counts obviously decreases, indicating that the targetsites are conserved in evolution. Taking into considera-tion the nucleosome depletion near SNP sites, it isthought that nucleosomes have a role in protecting targetsites of miRNA.As indicated above, nucleosome positioning protects

some key sites, while allowing other sites to remainopen. This selective protection facilitates both genomeconservation and evolution. The patterns were revealedby curvature-dependent computations (the curvatureprofile). Therefore, genome sequences partly encodenucleosomes, and the latter allow mutations to occur at

Figure 8 Patterns of nucleosomes near SNP sites. (A), single nucleotide variation (single); (B), insertions/deletions (in-del); (C), insertion SNP;(D), deletion SNP.

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

Page 8 of 12

Page 9: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

some sites and protect other special sites. Thus, nucleo-some positioning has roles in genetic variation.

ConclusionsIn this study, human nucleosome positions were pre-dicted with a curvature-dependent model; the curvatureprofile. The results indicate that the importance of DNAsequences in determining nucleosome positions, provid-ing a default pattern. DNA sequence partly encodes anNFR near a TSS. In vivo, the TSSs of some genes areoccupied by nucleosomes, and changes to nucleosomepositioning will affect the genes’ expression levels. Inpromoters, nucleosomes are depleted near TFBSs, andthe distribution of TFBSs corresponds well with theNFRs. Moreover, a selective protection role of nucleo-somes was revealed. SNP sites are enriched in NFRs,and miRNA target sites are associated with well-positioned nucleosomes. Our results indicate the vitalrole of the DNA sequence in encoding nucleosomes,and that the functions of nucleosome positioning areprobably involved in further biological processes.

MethodsThe prediction modelIn our previous work [23], we found that WW (W = A orT) dinucleotides of core DNA sequences showed smallerspacing (~10.3 bp) at the two ends (~50 bp) of a nucleo-some, with larger (~11.1 bp) spacing in the middle section(~47 bp). In fact, this is consistent with the cutting peri-odicities of core DNA [1]. Correspondingly, core DNAhelices showed greater bending at the two ends, withsmaller curvatures in the middle section. Using these find-ings, we constructed two nucleosome prediction models,

the periodicity profile and the curvature profile [23]. Theperiodicity profile has good resolution; however, due towavelet transformation, it is time-consuming. The curva-ture profile is highly efficient in computation, but overlap-ping peaks and significant noise decrease its resolutionand hinder the recognition of nucleosomes.To improve the resolution of the curvature profile, the

curvature pattern was reconstructed in this paper. Eigh-teen human nucleosomal DNA sequences (146 bp) wereextracted from the crystal structure dataset of thenucleosomal DNA and histone proteins (Additional file1: Table s1). The DNA curvatures of the nucleosomalsequences were estimated using the curvature vector C(eq.1), which is calculated with a matrix of roll r andtilt τ angles (Additional file 1: Table s2), obtained forthe sixteen dinucleotide steps (eq.1) [33].

C n n iij

j j

j n

n

= −( ) −( ) ⎛⎝⎜

⎞⎠⎟

=∑

02 1

10

2

1

2

exp (1)

where υ0 is the double-helix average periodicity(10.4 bp). The numbers (n2-n1) represent the integrationsteps. The modulus of the vector represents the devia-tion from B-DNA. The results generated a uniformpattern of curvature for nucleosomal sequences (seeFigure 1). The pattern is called the curvature pattern ofnucleosomal DNA.Given a DNA sequence, the curvature value is calcu-

lated at each position using eq.1. The whole curvatureof the sequence is called the curvature curve. Thenucleosome positions can be predicted from the convo-lution of the curvature curve and the curvature pattern

Figure 9 Nucleosome positioning at miRNA target sites. (A), patterns of nucleosomes near target sites of miRNAs; (B), distribution of SNPs inthe vicinity of target sites of miRNAs in human chromosome 20.

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

Page 9 of 12

Page 10: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

signal (Figure 1). If a segment of the curve resemblesthe pattern signal, the convolution will peak at the cor-responding position, indicating a nucleosome. The con-voluted curvature curve is called the curvature profile(Additional file 1: section 1).

Analysis of nucleosome positioning in the humangenomeHuman genomic DNA sequences were retrieved fromNCBI (http://www.ncbi.nlm.nih.gov/) (build 36.1), andnucleosomes were mapped using the curvature profile.To evaluate the predictions, the experimentally deter-mined nucleosomes dataset of activated and resting CD4+

T-cells was downloaded from Zhao’s website (http://dir.nhlbi.nih.gov/papers/lmi/epigenomes/hgtcellnucleosomes.aspx) [2]. The dataset was used as the “standard” datasetfor comparison and is called “the experimental data” inthis paper. Zhao et al.’s dataset contains two columns foreach chromosome, the first column indicates the genomicpositions and the second shows scores of nucleosomes.A method based on wavelet transformation was used todetect the exact nucleosome positions from the series ofscores (see below). Kaplan et al.’s predicted nucleosomedataset was downloaded from website (http://genie.weiz-mann.ac.il/software/nucleo_genomes.html) [11]. Compari-sons between curvature profiles and Kaplan et al.’s modelwere carried out on human chromosome 20.In the experimentally determined datasets, Kaplan

et al.’s predictions, and the curvature profile, the datasetare a series of numbers. Thus, it is important to identifypeaks positions to perform a quantitative comparison.Here, the maximal spectrum of continuous wavelet trans-formation (MSCWT) [26] (Additional file 1: section 2)was used to detect the dyad positions of nucleosomesfrom the raw signals of the curvature profiles and theexperimentally determined nucleosomes dataset. Anucleosomal DNA was defined in a 147-bp DNA byextending 73 bp in both directions (5’ and 3’) from eachof the dyad positions. The dataset of the dyad positions isavailable on our website.The dyad positions determined by the curvature pro-

file were compared with that in Zhao et al.’s experimen-tal data by measuring the distance between respectivedyad positions. Given a deviation, the amount ofmatched nucleosomes was counted, and a matchingratio was estimated by dividing the amount of matchednucleosomes by the total number of experimentallydetermined nucleosomes. The deviation varied from 1bp to 60 bp. The comparison was performed for humanchromosome 20. To explore the dynamic nucleosomepositioning, the experimental datasets from both acti-vated and resting CD4+ T cells were used. Additionally,the performance of Kaplan et al.’s predictions waspresented.

Estimation of the sequence-dependence of nucleosomepositioningThe occupancy ratio for each 6-mer nucleotide wascomputed by dividing the counts of nucleosome-occu-pancy by the counts of the hexanucleotides in humanchromosome 20. The procedure was performed for boththe predicted nucleosomes and the experimentally deter-mined nucleosomes. The sequence-dependence degreewas then estimated by correlating the two sets of ratios.

Pattern of nucleosomes near special sitesMore attention was paid to nucleosomes around specialsites within the genome. Upstream and downstreamsequences of TSSs, TFBSs [34], SNPs [35] and target sitesof miRNAs [36] were extracted from UCSC (http://gen-ome.ucsc.edu/). Information on miRNA promoters wasextracted from the literature [37]. Details of thesesequences are listed in Tables s3 and s4 (Additionalfile 1). The sequences were aligned by their special sites,and the nucleosome patterns were represented by aver-aged curvature profiles. To examine the effect of nucleo-somes on gene expression in vivo, protein-coding TSSswere separated into two classes by a k-means clusteringmethod using Zhao et al.’s experimental data [2] in arange of 150 bp upstream to 50 bp downstream of theTSS in activated CD4+ T cell. One class (class I) con-tained TSSs that are occupied by nucleosomes; the other(class II) contained nucleosome-free TSSs. A dataset ofmRNA levels [27] was used to check the effect of nucleo-somes on gene expression (Additional file 1: section 3).To test whether nucleosomes limit the binding of

transcription factors, the distribution of TFBSs wascomputed for both protein-coding promoters (ZBED4,ZNF378, PIK4CA, EWSR1, SMC1L2, NF2, ARFGAP3,and KIAA0542) [38] and independent miRNA promo-ters (bold and italic in Additional file 1, Table s4) [37].We scanned the binding scores of 64 human TFs (Addi-tional file 1: Table s5) on the promoter sequences. EachTF had a unique position weight matrix (PWM); scan-ning with PWM on a promoter sequence resulted in abinding score profile, which indicated the potentialbinding regions of the TF on the sequence. The distri-bution of TFBSs was represented by the average bindingscore profile of 64 TFs on all promoters. PWMs wereobtained using JASPAR [39].An online-prediction tool for the curvature profile is

provided (http://www.gri.seu.edu.cn/icons).

Additional material

Additional file 1: Section 1 - The prediction model. Section 2 -Detection of peak positions in curvature profiles, Zhao et al.’s dataset andKaplan et al.’s predictions. Section 3 - Examining the expression level ofTSS-occupied genes and TSS nucleosome-free genes. Table s1 - Eighteen

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

Page 10 of 12

Page 11: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

crystal structure datasets of the DNA-histone proteins used to reconstructthe curvature characteristic. Table s2 - Values of roll r and tilt τ angles ofsixteen dinucleotide steps. Table s3 - Details on sequences aroundtranscription start sites (TSSs), single-nucleotide polymorphism (SNP) sites,target sites of miRNAs, start and stop codons, and boundaries of histonemodifications. Table s4 - Three types of miRNAs in humans. Table s5 - Sixty-four human transcription factors used in scanning. Table s6 - Comparison ofperformances of the curvature profile and nu-Score. Table s7 - Predictionperformance of the curvature profiles in Figure 3 and s5. Table s8 - Top 206-mer nucleotides that are favorable for nucleosomes and nucleosome-freeregions. Figure s1 - Curvature pattern derived from 634 well-positionednucleosome DNA sequences in the experimental dataset. Figure s2 -Identification of nucleosome dyad positions in DNA sequences from 8 k bpto 28 k bp of human chromosome 20. Figure s3 - Distribution of centre-to-centre distance of nucleosomes. Figure s4 - Predictions of nucleosomes forthe segment from 90798 k bp to 90801 k bp of human chromosome 13.Figure s5 - Predictions of nucleosomes for a segment of humanchromosome 17 (52269 k-52289 k bp). Figure s6 - (A) Faction distributionsof WW (W = A ot T) dinucleotides and SS (S = G or C) dinucleotides near3571 transcription start sites; (B), fraction of poly (dA) and poly (dT); (C),fraction of poly (dG) and poly (dC). Figure s7-Gene expression levels (mRNAlevels) for the occupied-TSS genes (class I) and the open-TSS (class II) inactivated CD4+ T cells, the gene expression data is from Zhao et al’sexperiment (GEO accession number, GSE10437). Figure s8- Patterns ofnucleosomes near SNP sites in the dog genome.

AcknowledgementsWe thank the anonymous reviewers for their suggestions to improve the article’squality. This work was supported by the Natural Science Foundation of China(30800209,61073141), the Jiangsu Natural Science Foundation (SBK201022749),and the Leading Science Project of Southeast University of China.

Authors’ contributionsHDL: model construction, data analysis, and paper preparation; SXY: webserver construction; XYD: language revision and data analysis; XS: analysis ofresults. All authors have read and approved the final manuscript.

Received: 8 June 2010 Accepted: 27 January 2011Published: 27 January 2011

References1. Lewin B: Gene VIII. Prentice Hall 2004, Chap. 20.2. Schones DE, Cui K, Cuddapah S, Roh TY, Barski A, Wang ZB, Wei G, Zhao KJ:

Dynamic Regulation of Nucleosome Positioning in the Human Genome.Cell 2008, 132:887-898.

3. Henikoff S: Nucleosome destabilization in the epigenetic regulation ofgene expression. Nat Rev Genet 2008, 9(1):15-26.

4. Jiang C, Pugh BF: Nucleosome positioning and gene regulation:advances through genomics. Nat Rev Genet 2009, 10(3):161-72.

5. Yuan GC, Liu JS: Genomic sequence is highly predictive of localnucleosome depletion. PLoS Comput Biol 2008, 4:e13.

6. Valouev A, Ichikawa J, Tonthat T, Stuart J, Ranade S, Peckham H, Zeng K,Malek JA, Costa G, McKernan K, Sidow A, Fire A, Johnson SM: A high-resolution, nucleosome position map of C elegans reveals a lack ofuniversal sequence-dictated positioning. Genome Res 2008,18(7):1051-1063.

7. Miele V, Vaillant C, d’Aubenton-Carafa Y, Thermes C, Grange T: DNAphysical properties determine nucleosome occupancy from yeast to fly.Nucl Acids Res 2008, 36:3746-3756.

8. Daenen F, Roy F, De Bleser J: Low nucleosome occupancy is encodedaround functional human transcription factor binding sites. BMCGenomics 2008, 9:332-330.

9. Segal E, Mittendorf YF, Chen L, Thåström A, Field Y, Moore IK, Wang JZ,Widom J: A genomic code for nucleosome positioning. Nature 2006,442(17):772-778.

10. Ioshikhes I, Albert I, Zanton SJ, Pugh BF: Nucleosome positions predictedthrough comparative genomics. Nat Genet 2006, 38(10):1210-1215.

11. Kaplan N, Moore IK, Mittendorf YF, Gossett AJ, Tillo D, Field Y, LeProust EM,Hughes TR, Lieb JD, Widom J, Segal E: The DNA-encoded nucleosomeorganization of a eukaryotic genome. Nature 2009, 458:362-366.

12. Morozov AV, Fortney K, Gaykalova AD, Studitsky MV, Widom J, Siggia ED:Using DNA mechanics to predict in vitro nucleosome positions andformation energies. Nucl Acids Res 2009, 37(14):4707-4722.

13. Tolstorukov MY, Choudhary V, Olson WK, Zhurkin VB, Park PJ: nuScore: aweb-interface for nucleosome positioning predictions. Bioinformatics2008, 24:1456-1458.

14. Shivaswamy S, Bhinge A, Zhao Y, Jones S, Hirst M, Iyer VR: Dynamicremodeling of individual nucleosomes across a eukaryotic genome inresponse to transcriptional perturbation. PLoS Biol 2008, 6(3):e65.

15. Steven JP, Lis JT: Rapid, Transcription-Independent Loss of Nucleosomesover a Large Chromatin Domain at Hsp70 Loci. Cell 2008, 134(1):74-84.

16. Ozsolak F, Song JS, Liu XS, Fisher DE: High-throughput mapping of thechromatin structure of human promoters. Nat Biotech 2007, 25(2):244-248.

17. Zhang Y, Moqtaderi Z, Rattner PB, Euskirchen G, Snyder M, Kadonaga TJ,Liu XS, Struhl K: Intrinsic histone-DNA interactions are not the majordeterminant of nucleosome positions in vivo. Nat struct mol biol 2009,16(8):847-853.

18. Lantermann AB, Straub T, Strålfors A, Yuan GC, Ekwall K, Korber P:Schizosaccharomyces pombe genome-wide nucleosome mappingreveals positioning mechanisms distinct from those of Saccharomycescerevisiae. Nat Struct Mol Biol 2010, 17:251-257.

19. Mavrich TN, Ioshikhes IP, Venters BJ, Jiang CZ, Tomsho LP, Qi J, Schuster SC,Albert I, Pugh BF: A barrier nucleosome model for statistical positioningof nucleosomes throughout the yeast genome. Genome Res 2008,18(7):1073-83.

20. Herael H, Weiss O, Trifonov EN: 10-11bp periodicities in completegenomes reflect protein structure and DNA folding. Bioinformatics 1999,15(3):187-193.

21. Schieg P, Herzel H: Periodicities of 10-11bp as indicators of thesupercoiled state of genomic DNA. J Mol Biol 2004, 234:891-901.

22. Fukushima A, Ikemura T, Kinouchi M, Oshima T, Kudo Y, Mori H, Kanaya S:Periodicity in prokaryotic and eukaryotic genomes identified by powerspectrum analysis. Gene 2002, 300:203-211.

23. Liu HD, Wu JS, Xie JM, Yang XN, Lu ZH, Sun X: Characteristics ofnucleosome core DNA and their applications in predicting nucleosomepositions. Biophys J 2008, 94(12):4597-4604.

24. Salih F, Salih B, Trifonov EN: Sequence structure of hidden 104-baserepeat in the nucleosomes of C elegans. J Biomol Struct Dyn 2008,26:273-282.

25. Rippe K, Schrader A, Riede P, Strohner R, Lehmann E, Längst G: DNAsequence- and conformation-directed positioning of nucleosomes bychromatin-remodeling complexes. Proc Natl Acad Sci USA 2007,104(40):15635-15640.

26. Lu XQ, Liu HD, Xue ZH, Wang XQ: Maximum Spectrum of ContinuousWavelet Transform and Its Application in Resolving an OverlappedSignal. J Chem Inf Comput Sci 2004, 44:1228-1237.

27. Andrew IS, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J,Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB:A gene atlas of the mouse and human protein-encoding transcriptomes.Proc Natl Acad Sci USA 2004, 101(16):6062-6067.

28. Bettecken T, Trifonov EN: Repertoires of the Nucleosome-PositioningDinucleotides. PLoS ONE 2009, 4(11):e7654.

29. Hartley PD, Madhani HD: Mechanisms that Specify Promoter NucleosomeLocation and Identity. Cell 2009, 137:445-458.

30. Sasaki S, Mello CC, Shimada A, Nakatani Y, Hashimoto S, Ogawa M,Matsushima K, Gu SGP, Kasahara M, Ahsan B, Sasaki A, Saito T, Suzuki Y,Sugano S, Kohara Y, Takeda H, Fire A, Morishita S: Chromatin-AssociatedPeriodicity in Genetic Variation Downstream of Transcriptional StartSites. Science 2009, 323(5912):401-404.

31. Kogan S, Trifonov EN: Gene splice sites correlate with nucleosomepositions. Gene 2005, 6(352):57-62.

32. Pasquinelli AE, Hunter S, Bracht J: MicroRNAs: A developing story. CurrOpin Genet Dev 2005, 15:200-205.

33. Cacchione S, Santis PD, Foti D, Palleschi A, Savino M: Periodicalpolydeoxynucleotides and DNA curvature. Biochemistry 1989,28:8706-8713.

34. Johnson DS, Mortazavi A, Myers RM, Wold B: Genome-wide mapping of invivo protein-DNA interactions. Science 2007, 316(5830):1497-502.

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

Page 11 of 12

Page 12: RESEARCH ARTICLE Open Access Analysis of nucleosome positioning determined by DNA ... · 2017-04-05 · scription start site (TSS) and is followed by a well-positioned nucleosome

35. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K:dbSNP: the NCBI database of genetic variation. Nucl Acids Res 2001,29(1):308-11.

36. Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked byadenosines, indicates that thousands of human genes are microRNAtargets. Cell 2005, 4(1):15-20, 120.

37. Ozsolak F, Poling LL, Wang ZX, Liu H, Liu XS, Roeder RG, Zhang XM,Song SJ, Fisher DE: Chromatin structure analyses identify miRNApromoters. Genes Develop 2008, 22:3172-3183.

38. Rebecca M, Euskirchen G, Bertone P, Stephen H, Thomas ER, Luscombe NM,Rinn JL, Nelson FK, Miller P, Gerstein M, Weissman S, Snyder M: Distributionof NF-κB-binding sites across human chromosome 22. Proc Natl Acad SciUSA 2003, 100(21):12247-12252.

39. Wasserman WW, Sandelin A: Applied bioinformatics for the identificationof regulatory elements. Nat Rev Genet 2004, 5(4):276-87.

doi:10.1186/1471-2164-12-72Cite this article as: Liu et al.: Analysis of nucleosome positioningdetermined by DNA helix curvature in the human genome. BMCGenomics 2011 12:72.

Submit your next manuscript to BioMed Centraland take full advantage of:

• Convenient online submission

• Thorough peer review

• No space constraints or color figure charges

• Immediate publication on acceptance

• Inclusion in PubMed, CAS, Scopus and Google Scholar

• Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit

Liu et al. BMC Genomics 2011, 12:72http://www.biomedcentral.com/1471-2164/12/72

Page 12 of 12