

Genome and Epigenome

Histone-Related Genes Are Hypermethylated in Lung Cancer and Hypermethylated HIST1H4F Could Serve as a Pan-Cancer Biomarker

Shihua Dong1, Wei Li1, Lin Wang2, Jie Hu3, Yuanlin Song3, Baolong Zhang1, Xiaoguang Ren1, Shimeng Ji3, Jin Li1, Peng Xu1, Ying Liang1, Gang Chen4, Jia-Tao Lou2, and Wenqiang Yu1

Abstract

Lung cancer is the leading cause of cancer-related deaths worldwide. Cytologic examination is the current "gold standard" for lung cancer diagnosis; however, it has low sensitivity. Here, we identified a typical methylation signature of histone genes in lung cancer by whole-genome DNA methylation analysis, which was validated in The Cancer Genome Atlas (TCGA) lung cancer cohort (n = 907) and further confirmed in 265 bronchoalveolar lavage fluid samples with specificity and sensitivity of 96.7% and 87.0%, respectively. More importantly, HIST1H4F was universally hypermethylated in all 17 tumor types from TCGA datasets (n = 7,344), which was further validated in nine different types of cancer (n = 243). These results demonstrate that HIST1H4F can function as a universal-cancer-only methylation (UCOM) marker, which may aid in understanding general tumorigenesis and improve screening for early cancer diagnosis.

Significance: These findings identify a new biomarker for cancer detection and show that hypermethylation of histone-related genes seems to persist across cancers.

Introduction

Lung cancer is one of the most common malignant tumors and the leading cause of cancer-related deaths worldwide (1, 2). Early detection and surgery offer the best chance for survival, with the 5-year survival rate as high as 80% (3). However, most patients with lung cancer are diagnosed at an inoperable advanced stage with metastasis and must undergo chemotherapy, radiotherapy, immunotherapy, or targeted therapy. The 5-year survival rate of patients in the advanced stage is below 10% (4, 5). Over the past decade, low-dose CT (LDCT) has been the most commonly used screening method for lung cancer and has been shown to improve early detection and reduce mortality (6). However, due to its low specificity, LDCT is far from satisfactory as a screening tool for clinical application, similar to other currently used cancer biomarkers, such as carcinoembryonic antigen (CEA), neuron-specific enolase, and CYFRA 21-1. Therefore, effective biomarkers for early detection, diagnosis, prognosis, and monitoring of lung cancer are urgently needed (7).

Epigenetic and genetic abnormalities are hallmarks of lung cancer (8–10). Abnormal DNA methylation is the most common epigenetic variation in the process of lung cancer. Compared with DNA mutations, DNA methylation occurs much earlier and is more stable, which is an advantage for the early diagnosis of tumors, and aberrant DNA methylation patterns can be used to predict liver cancer metastasis to the lung (11). Although many DNA methylation biomarkers have been reported, they are still under exploration and are rarely used in clinical applications. The sensitivity and specificity of current methylation markers are insufficient, with a high risk of false positives and false negatives (12, 13). Therefore, applying methylation markers in the clinic is challenging, and new biomarkers for the early detection of cancer are urgently needed (14).

Histones are major essential components of chromatin and are conserved in eukaryotic cells (15). There are five major types of histones: H1, H2A, H2B, H3, and H4. Histones H2A, H2B, H3, and H4 are known as the core histones, whereas histone H1 is known as the linker histone (16). Histones are divided into canonical replication-dependent histones, which are expressed during the S-phase of the cell cycle, and replication-independent histone variants, which are expressed during each phase of the cell cycle. Genes encoding canonical histones are intron-less and lack a polyA tail at the 3′ end, having instead a stem-loop structure, and canonical histone genes also tend to be clustered in the genome. Genes encoding histone variants are usually not clustered and have introns and polyA tails (17, 18). In the human

1Shanghai Public Health Clinical Center and Department of General Surgery, Huashan Hospital, Cancer Metastasis Institute and Laboratory of RNA Epigenetics, Institutes of Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai, China. 2Department of Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China. 3Department of Pulmonary Medicine, Zhongshan Hospital, Fudan University, Shanghai, China. 4Department of Pathology, Zhongshan Hospital, Fudan University, Shanghai, China.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

S. Dong, W. Li, L. Wang, J. Hu, and Y. Song contributed equally to this article.

Corresponding Authors: Wenqiang Yu, Fudan University, 130 Dong'an Road, West 13# Building, Room 419, Shanghai 200032, China. Phone: 8621-5423-7978; Fax: 8621-5423-7339; E-mail: [email protected]; and Jia-Tao Lou, Department of Laboratory Medicine, Shanghai Chest Hospital, 241 West Huaihai Road, Shanghai 200030, China. Phone: 86212-22000-01503; Fax: 8621-6280-8279; E-mail: [email protected]

Cancer Res 2019;79:6101–12

doi: 10.1158/0008-5472.CAN-19-1019

©2019 American Association for Cancer Research.

CancerResearch

www.aacrjournals.org 6101

on January 20, 2021. © 2019 American Association for Cancer Research. cancerres.aacrjournals.org Downloaded from

Published OnlineFirst October 1, 2019; DOI: 10.1158/0008-5472.CAN-19-1019


genome, histone genes mainly form histone cluster 1 (Chr6p21) and histone cluster 2 (Chr1q21; ref. 19). Other histone genes are distributed randomly in the human genome. Although histone modifications have been extensively studied in chromatin regulation, epigenetic variation in the family of histone genes themselves is rarely considered. It has been shown that histone gene cluster 1 is occupied by abnormally higher order chromatin organization in breast cancer (20). However, DNA methylation alteration at histone gene loci has not yet been systematically investigated, especially in cancer development.

Here, through genome-wide DNA methylation analysis with an unusual strategy, we found that many histone gene loci are abnormally hypermethylated in lung cancer, which piqued our interest for further investigation. We demonstrate that methylation of histone genes can be used as a biomarker for early detection in bronchoalveolar lavage fluid (BALF) samples. Furthermore, histone gene loci are not only abnormally hypermethylated in lung cancer but also specifically methylated in various tumors. In particular, the HIST1H4F gene is abnormally hypermethylated in 17 types of cancer, which could act as a potential universal-cancer-only methylation (UCOM) marker. We speculate that the methylation of HIST1H4F will be of great significance for early diagnosis, especially during the screening process of cancer in clinical applications.

Materials and Methods

Whole genome bisulfite sequencing data analysis

Whole genome bisulfite sequencing (WGBS) datasets were downloaded from the Encode database (https://www.encodeproject.org/) and the SRA database (https://www.ncbi.nlm.nih.gov/sra); the serial numbers are summarized in Supplementary Table S1. DNA methylation levels were calculated using BSMAP software (21) as described previously (11), where the hg19 human genome assembly and University of California, Santa Cruz (UCSC) reference gene annotations were used. Specifically, for each CpG site, reads supporting either methylation or unmethylation were counted, and the methylation value was calculated as the ratio of the number of reads supporting methylation to the total number of reads supporting either methylation or unmethylation. Only CpG sites covered by more than five reads and detected in all seven WGBS datasets were used for subsequent analysis.
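The per-site calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the BSMAP implementation: the function names, the dictionary layout (site mapped to a pair of methylated and unmethylated read counts), and the choice to report percentages are all hypothetical.

```python
def methylation_value(meth_reads, unmeth_reads, min_coverage=6):
    """Return the methylation level in percent, or None if coverage is too low.

    "Covered by more than five reads" is read here as coverage >= 6 (assumption).
    """
    coverage = meth_reads + unmeth_reads
    if coverage < min_coverage:
        return None
    return 100.0 * meth_reads / coverage


def sites_in_all_samples(samples, min_coverage=6):
    """Keep only CpG sites that pass the coverage filter in every sample.

    samples: list of dicts mapping a site key, e.g. ("chr6", 26240743),
    to a (methylated_reads, unmethylated_reads) tuple (hypothetical layout).
    Returns a dict mapping each retained site to its per-sample values.
    """
    common = set(samples[0]).intersection(*samples[1:])
    shared = {}
    for site in sorted(common):
        values = [methylation_value(*s[site], min_coverage) for s in samples]
        if all(v is not None for v in values):
            shared[site] = values
    return shared
```

A site with 6 methylated and 2 unmethylated reads would score 75%, while a site covered by only two reads in any one sample would be dropped from every sample.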

Differentially methylated sites, differentially methylated region, and differentially methylated genes definition

The methylation levels of four normal lung cell samples (NC) and three lung cancer cell samples (CC) were calculated. For each CpG site, we calculated the methylation value difference for all 12 CC − NC pairs (CCi − NCj, where i = 1, 2, or 3 and j = 1, 2, 3, or 4). CpG sites with all 12 (CCi − NCj) ≥ 50% were defined as cancer cell-differentially methylated sites (CC-DMS). Similarly, CpG sites with all 12 (NCj − CCi) ≥ 50% were defined as normal cell-differentially methylated sites (NC-DMS). In addition, CpG sites with all 12 (|CCi − NCj|) ≤ 20% were defined as NO-differentially methylated sites (NO-DMS). A differentially methylated region (DMR) was defined as at least three adjacent DMS within a 100-bp genomic window. Genes overlapping with any DMR were defined as differentially methylated genes (DMG).
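The site and region definitions above translate directly into code. The sketch below is a hedged illustration: the function names are hypothetical, methylation values are assumed to be percentages, and the region caller simplifies "adjacent" to "consecutive in the sorted list of DMS positions" without checking for intervening non-DMS CpG sites.

```python
def classify_site(cancer, normal, diff=50.0, same=20.0):
    """Classify one CpG site from its cancer (CCi) and normal (NCj) values in percent."""
    deltas = [c - n for c in cancer for n in normal]  # all CCi - NCj pairs
    if all(d >= diff for d in deltas):
        return "CC-DMS"
    if all(-d >= diff for d in deltas):
        return "NC-DMS"
    if all(abs(d) <= same for d in deltas):
        return "NO-DMS"
    return None  # site does not meet any of the three definitions


def call_dmrs(positions, min_sites=3, window=100):
    """Merge runs of DMS into regions.

    positions: DMS coordinates on one chromosome. A seed is min_sites
    consecutive DMS spanning less than `window` bp; overlapping seeds
    are merged into a single (start, end) region.
    """
    positions = sorted(positions)
    regions = []
    for i in range(len(positions) - min_sites + 1):
        start, end = positions[i], positions[i + min_sites - 1]
        if end - start >= window:
            continue
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions
```

With three cancer and four normal samples, `classify_site` evaluates exactly the 12 pairs named in the text. Genes overlapping any returned region would then be the DMG.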

The Cancer Genome Atlas DNA methylation data analysis

The Illumina 450K methylation array level three data from The Cancer Genome Atlas (TCGA) database were downloaded from the UCSC Xena browser (https://xenabrowser.net/). For each histone gene, only probes within the gene-body region (listed in Supplementary Table S2) were selected to calculate an average methylation value. Probes with "NA" values were excluded. The absolute methylation values were calculated from the β values of the 450K methylation array [methylation value = (β value + 0.5) × 100%]. For each gene, the final methylation value was calculated as the average of all CpG sites selected. The samples used from the TCGA database and the methylation levels of HIST1H4F are listed in Supplementary Table S3.
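As a worked illustration of the conversion above, the following Python sketch averages the gene-body probes of one gene in one sample. The function name and input layout are hypothetical. The +0.5 offset in the formula suggests that the downloaded β values were centered around zero (roughly −0.5 to 0.5); that reading is an assumption, not something the text states.

```python
import math


def gene_methylation(beta_values):
    """Average methylation (percent) over the gene-body probes of one sample.

    beta_values: probe values for one gene; missing probes ("NA") are
    represented here as None or NaN (assumed encoding) and are skipped.
    Applies methylation value = (beta + 0.5) * 100% to each probe.
    """
    valid = [
        b for b in beta_values
        if b is not None and not (isinstance(b, float) and math.isnan(b))
    ]
    if not valid:
        return None  # no usable probe for this gene in this sample
    return sum((b + 0.5) * 100.0 for b in valid) / len(valid)
```

For example, probes with values 0.0 and 0.5 plus one missing probe would give (50% + 100%) / 2 = 75%.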

Clinical samples

We collected 243 primary tissue samples and 265 BALF samples from Shanghai Chest Hospital and Zhongshan Hospital of Fudan University. Primary tissue samples included 25 lung cancer and 25 paired para-cancer control samples, 12 colorectal cancer and 12 paired para-cancer control samples, 10 esophagus cancer and 12 paired para-cancer control samples, 20 liver cancer and 23 para-cancer control samples, nine pancreatic cancer and nine paired para-cancer control samples, 10 cervical cancer and 10 control samples, 10 gastric cancer and 10 para-cancer control samples, 14 breast cancer and 14 paired para-cancer control samples, and 10 head and neck cancer and 10 paired para-cancer control samples. Clinical characteristics of these samples are summarized in Supplementary Table S4. BALF samples comprised a benign lung disease (BLD) control group and a lung cancer group. The BLD control group contained 59 samples, including pneumonia, emphysema, tuberculosis, etc. The lung cancer group included 92 lung squamous cell carcinoma (LUSC) samples, 70 lung adenocarcinoma samples, and 44 small-cell lung carcinoma (SCLC) samples. BALF samples were randomly assigned to a training set and a validation set. All patients provided written informed consent before their samples were collected. Institutional review board approval for research on human subjects was obtained from the hospital.

DNA extraction and bisulfite-PCR pyrosequencing

Genomic DNA from cultured cell lines and primary tissue samples was extracted with phenol-chloroform. Genomic DNA from BALF samples was extracted with the Qiagen DNA Extraction Kit (Qiagen, catalog no. 51404). Next, 20–200 ng genomic DNA was taken for bisulfite treatment (ZYMO Research, catalog no. D5006), and the recovered bisulfite-treated DNA was used as the subsequent PCR template. We detected 11 CpG sites for the HIST1H4F gene (chr6:26,240,743–26,240,800) and eight CpG sites for the HIST1H4I gene (chr6:27,107,185–27,107,239). The genomic sequences and primers designed for target genes are listed in Supplementary Table S5. Two rounds of seminested PCR were performed to produce single-band biotin-modified PCR products. The outer forward primer and the reverse primer were used for the first round of PCR amplification. The inner forward primer and the reverse primer were used for the second round of PCR amplification. Both rounds of PCR were performed with the same program: 98°C for 30 seconds for predenaturation; 30 cycles of 98°C for 10 seconds, 58°C for 30 seconds, and 72°C for 30 seconds; and 72°C for 3 minutes for a final elongation. The pyrosequencing assay was performed on a PyroMark Q96 ID Instrument (Qiagen). For each target gene, the average across the CpG sites detected by pyrosequencing was taken as the final methylation value.

Cell culture

The human lung cancer cell line A549, human lung fibroblast cell line MRC5, and human hepatocarcinoma cell line HepG2 were kindly provided by Stem Cell Bank, Chinese Academy of Sciences. All the cell lines were authenticated by the PowerPlex 16 System (Promega) and tested negative for Mycoplasma by qPCR. A549, MRC5, and HepG2 cells were cultured in DMEM supplemented with 10% v/v FBS and 1% v/v antibiotics at 37°C in a humidified atmosphere of 5% CO2. For passaging, cells were washed once with PBS and dissociated using 1 mL 0.25% trypsin, then neutralized with 1 mL DMEM, and equally plated into two 10-cm dishes.

Results

The pipeline of genome-wide WGBS data analysis and identified DMRs validated by the TCGA cohort

To screen genome-wide for DNA methylation biomarkers for the early diagnosis of lung cancer, we collected three WGBS datasets of lung cancer cells and another four WGBS datasets of cell samples derived from normal lung tissues as controls (Supplementary Table S1). To effectively screen for lung cancer biomarkers from these WGBS datasets, we developed a new data analysis strategy (Fig. 1A).

(i) We performed a genome-wide methylation analysis for each WGBS sample and obtained all CpG sites covered by more than five reads. By this process, we obtained at least 30 × 10^6 CpG sites per sample, covering at least 55.7% of the whole genome (Supplementary Table S1). To robustly analyze the difference between normal and cancer samples at single-nucleotide resolution, only CpG sites detected in all seven samples were selected for further analysis. In total, 19,461,312 CpG sites were selected, covering 34.5% of all possible sites in the human genome. This rate is much higher than that of both reduced representation bisulfite sequencing, whose coverage was estimated to be 1%–3%, and the Illumina 450K methylation array, which covers 485,455 CpG sites and accounts for approximately 2% of all possible sites (22, 23). The average methylation levels showed that the cancer samples were hypomethylated compared with normal ones (Fig. 1B), which is consistent with the previous report that cancer is globally hypomethylated. Meanwhile, the 19,461,312 CpG sites were distributed throughout the whole genome, including intergenic, intron, exon, and promoter regions (Fig. 1C). These results indicate that our approach is applicable throughout the genome with minor sequence bias (Supplementary Fig. S1A).

(ii) Based on the 19,461,312 CpG sites, by calculating the methylation differences between CCi and NCj, we found 24,257 CC-DMS, 442,233 NC-DMS, and 4,456,347 NO-DMS, which accounted for 0.12%, 2.27%, and 22.9% of all 19,461,312 CpG sites, respectively. Compared with the background distribution of all the 19,461,312 CpG sites (Fig. 1C), CC-DMS were clearly enriched in the promoter and exonic regions (hypergeometric test, P < 1e-5; Fig. 1D); NC-DMS were enriched in the intergenic region (hypergeometric test, P < 1e-5; Fig. 1E); meanwhile, NO-DMS were mostly enriched in the intronic region (hypergeometric test, P < 1e-5; Fig. 1F). In addition, 13,932 CpG sites of 24,257 CC-DMS (57.4%) were located in CpG island regions. In contrast, only 3,518 CpG sites of 442,233 NC-DMS (0.8%) were located in CpG island regions, indicating that DNA methylation in tumors usually occurred in cis-regulating elements. However, for NO-DMS, hypomethylated CpG sites (methylation level ≤ 20%) were mainly distributed in the promoter region (Supplementary Fig. S1B), whereas hypermethylated CpG sites (methylation level ≥ 80%) were mainly distributed in the intronic region (Supplementary Fig. S1C). These results reveal that the cancer cells are globally hypomethylated and locally hypermethylated, and these locally hypermethylated regions are mainly distributed in promoter and exonic regions.

(iii) Similar to the genetic linkage effect, DNA methylation within a small genome region also tends to be consistent (24). On the basis of this principle, the joint methylation behavior of adjacent CpG sites within a region is much more reliable than that of single CpG sites. For example, DMR or methylation haplotypes have been widely used for DNA methylation analysis. Therefore, we further defined a DMR as at least three DMS within a 100-bp genome region. Among the 24,257 CC-DMS, we identified 2,408 CC-DMR. From the 442,233 NC-DMS, we found 36,393 NC-DMR. Meanwhile, based on the 4,456,347 NO-DMS, we found 435,249 NO-DMR. We further analyzed these DMR-embedded genes. There were 958 CC-DMR–related genes and 1,925 NC-DMR–related genes, which we called CC-DMG (cancer cell-differentially methylated genes; Supplementary Table S6) and NC-DMG (normal cell-differentially methylated genes; Supplementary Table S7). We calculated the methylation levels of CC-DMG and NC-DMG in WGBS and TCGA data (Fig. 1G and H; Supplementary Tables S6 and S7). Kyoto Encyclopedia of Genes and Genomes pathway analysis showed that CC-DMG were mainly enriched in tumor-associated signaling pathways, such as the Hippo signaling pathway and transcriptional misregulation in cancer. NC-DMG were enriched in olfactory transduction, with less link to tumor-related signaling pathways. NO-DMG were mainly enriched in basic cellular function–related pathways (Supplementary Fig. S1D). Interestingly, both CC-DMG and NC-DMG were enriched in the neuroactive ligand–receptor interaction signaling pathway. Particularly, some adrenaline signaling–related genes, such as ADRA1A, ADRA2A, ADRA2C, and ADRBK1, appeared in the CC-DMG list, but some cholinergic signaling–related genes, such as CHRM2, CHRM3, and CHRM5, were found in the NC-DMG list. The variation in DNA methylation in nerve-related genes suggests that neuroregulation plays an important role in the genesis and development of lung cancer. This is supported by evidence from several groups showing that cancer development in a variety of tissues is controlled by an assortment of nerve-mediated signals, including neurotransmitters and other molecules (25–27), indicating that epigenetic regulation of neuron-related genes will be of great interest in cancer development. As expected, many renowned lung cancer methylation biomarkers that were reported in the



[Figure 1, panels A–H; graphics not reproduced. Recoverable labels and values:]

A, Workflow: WGBS data (normal: NC-1, NC-2, NC-3, NC-4; cancer: CC-1, CC-2, CC-3) → all detected CpG sites (19,461,312; 34.5% of genome) → DMS → DMR → DMG.
CC branch: 24,257 CC-DMS [(CCi − NCj) ≥ 50%] → 2,408 CC-DMR (17,670 CC-DMS) → 958 CC-DMG (1,508 CC-DMR, 11,233 CC-DMS). 450K methylation array: 845 CC-DMS; 488 CC-DMR (624 CC-DMS); 251 CC-DMG (401 CC-DMS).
NC branch: 442,233 NC-DMS [(NCj − CCi) ≥ 50%] → 36,393 NC-DMR (165,221 NC-DMS) → 1,925 NC-DMG (11,430 NC-DMR, 52,454 NC-DMS). 450K methylation array: 1,662 NC-DMS; 736 NC-DMR (840 CC-DMS, as labeled in the figure); 200 NC-DMG (377 CC-DMS, as labeled in the figure).
NO branch: 4,456,347 NO-DMS [|CCi − NCj| ≤ 20%] → 435,249 NO-DMR → 19,575 NO-DMG. NO-DMS by methylation level: (0,20), 961,219 CpG sites; (20,40), 426; (40,60), 429; (60,80), 5,920; (80,100), 3,488,353.

B, Average methylation; y-axis, methylation level (%). Bar values as extracted: 69.6, 74.7, 79.0, 80.9, 53.0, 46.0, 68.6.

C–F, Genomic distribution (intergenic / intron / exon / promoter):
All detected CpG sites (19,461,312): 41% / 44% / 8% / 7%
CC-DMS (24,257): 24% / 37% / 21% / 18%
NC-DMS (442,233): 68% / 27% / 3% / 2%
NO-DMS (4,456,347): 17% / 54% / 13% / 16%

G, DMG methylation in WGBS: heatmap of CC-DMG and NC-DMG across NC-1 to NC-4 and CC-1 to CC-3; color scale, methylation level (%), 0–100.

H, DMG methylation verified in TCGA data: x-axis, normal methylation level (%); y-axis, cancer methylation level (%); series, CC-DMG and NC-DMG.

Figure 1. Systemic analysis of WGBS data and validation by TCGA datasets. A, Outline of WGBS data analysis and TCGA data validation. Four NC-WGBS datasets and three CC-WGBS datasets were collected, then CpG sites detected in all seven samples were selected for subsequent analysis. DMS was defined by the methylation difference between CCi and NCj, DMR was defined by three continuous DMS in a 100-bp region, and DMG was defined by DMR-embedded genes. CCi represents any of the cancer cell samples, and NCj represents any of the normal cell samples. B, Average methylation level of each normal and cancer sample in WGBS data showed that cancer genomes are globally hypomethylated (Wilcoxon test, P = 0.057). C–F, Genomic distribution of all detected CpG sites, CC-DMS, NC-DMS, and NO-DMS; the promoter region was defined as TSS ± 1 kb. G, Heatmap of CC-DMG and NC-DMG methylation from WGBS data. Each row represents one gene. H, Validation of CC-DMG and NC-DMG methylation by TCGA datasets. Blue dot, CC-DMG; red dot, NC-DMG. The x-axis represents the average methylation of normal samples in TCGA data, and the y-axis represents the average methylation of cancer samples in TCGA data. TSS, transcriptional start sites.


literature are among our CC-DMG list, for example, SHOX2, POU4F2, BCAT1, HOXA9, and PTGDR (28–32). These results further support our strategy of analysis.

(iv) To further confirm the veracity of the WGBS analysis, we downloaded the Illumina 450K methylation array data of the TCGA lung cancer cohort. The Illumina 450K methylation array contains 485,455 CpG probes. The TCGA lung cancer cohort contained a total of 907 samples, including 75 para-cancer normal control samples and 832 lung cancer samples (Supplementary Table S8). We selected the CpG sites shared between the 450K probes and the CpGs in DMS/DMR/DMG (Fig. 1A; Supplementary Fig. S1E) to verify our WGBS analysis. Of the 485,455 450K probes, 845 and 1,662 overlapped CC-DMS and NC-DMS, respectively. CC-DMS and NC-DMS identified from WGBS were clearly hypermethylated and hypomethylated, respectively, in cancer relative to normal samples in the TCGA datasets. There were 624 and 840 CpG sites shared between the 450K probes and CC-DMR and NC-DMR, respectively. Similarly, the CC-DMR and NC-DMR obtained from WGBS were also verified by the TCGA datasets (Supplementary Fig. S1F–S1I). As for DMG, 401 and 377 CpG sites were shared between the 450K probes and CC-DMG and NC-DMG, respectively, and their DNA methylation status was fully supported by the TCGA datasets (Fig. 1H). Taken together, our results are verified by the lung cancer 450K methylation array data from TCGA, which further supports the validity of our analysis approach.
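The enrichment comparisons in step (ii) rely on a hypergeometric test. A minimal Python sketch of the upper-tail probability follows. It assumes the question is how likely it is to see at least k sites in a given region class (for example, promoters) among n DMS, when K of the N background CpG sites fall in that class. The function names are illustrative, and the paper reports only percentages and P < 1e-5, not the exact per-region counts or the software used.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    Works in log space so that genome-scale counts do not overflow.
    """
    lo = max(k, n - (N - K), 0)   # smallest feasible count at or above k
    hi = min(K, n)                # largest feasible count
    if lo > hi:
        return 0.0
    log_denom = log_comb(N, n)
    logs = [
        log_comb(K, x) + log_comb(N - K, n - x) - log_denom
        for x in range(lo, hi + 1)
    ]
    peak = max(logs)
    total = math.exp(peak) * sum(math.exp(v - peak) for v in logs)
    return min(1.0, total)
```

In this framing, N would be 19,461,312 and n would be 24,257 for the CC-DMS comparison, while K and k would come from the per-region site counts, which are not listed in the text.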

Abnormally hypermethylated signature of histone gene in lung cancer

In addition to some already acknowledged biomarkers, such as SHOX2 and POU4F2, we found many previously unreported genes on our CC-DMG list. More interestingly, some histone genes appeared on the CC-DMG list, such as HIST1H3C, HIST1H4F, and HIST1H4I, which called for further investigation.

As essential and conserved housekeeping genes, histones are stably expressed in almost all eukaryotic cells. Because of the important function of histones, each histone protein is encoded by multiple histone genes (19). In total, 85 histone genes have been found in the human genome, including 68 canonical histone genes and 17 histone variant genes. Canonical histone genes include six H1 genes, 17 H2a genes, 18 H2b genes, 13 H3 genes, and 14 H4 genes. Variant histone genes include four H1 variants, seven H2a variants, two H2b variants, and four H3 variants. Histone modifications have been widely investigated in the epigenetic field (33, 34). Unfortunately, DNA methylation of the histone gene family has not been well described in the literature. We summarize the 85 histone genes in Supplementary Table S9.

We then analyzed DNA methylation of the whole histone gene family in the WGBS data (Fig. 2A). Four histone genes (HIST2H2AA4, HIST2H3C, HIST2H4A, and H2BFS) were not detected in all WGBS datasets and were therefore excluded from the subsequent analysis. According to their DNA methylation signature in normal and cancer samples, the histone genes can be divided into seven groups. Group 1 histone genes are poorly methylated in both normal and cancer cells, whereas group 2 histone genes are highly methylated in both normal and cancer samples. Group 3 histone genes are randomly methylated in normal and cancer samples. Group 4, which includes 14 histone genes (HIST1H4I, HIST1H2BM, HIST1H3C, HIST1H4F, HIST1H2BB, HIST1H2BE, HIST1H1A, HIST1H2BI, HIST1H3G, HIST1H2AD, HIST1H2BF, HIST1H3J, HIST1H2BH, and HIST1H4D), was hypermethylated in all lung cancer samples (Fig. 2B; Supplementary Fig. S2A). To confirm this finding, we reanalyzed the methylation of these hypermethylated histone genes on the Illumina 450K methylation arrays of the TCGA lung cancer cohort (n = 907), and the results showed that nine of the 14 genes (HIST1H4I, HIST1H4F, HIST1H3C, HIST1H2BE, HIST1H2BM, HIST1H3J, HIST1H2BB, HIST1H1A, and HIST1H2BI) were significantly hypermethylated in both lung adenocarcinoma and LUSC (Fig. 2C). In addition, we found that DNA methylation of histone genes can be used to classify the three main types of lung cancer. Four histone genes (group 5: HIST1H2AG, HIST3H2A, HIST3H2BB, and HIST1H3F) were specifically hypermethylated in lung adenocarcinoma (Fig. 2D), four histone genes (group 6: HIST1H4A, HIST1H3A, HIST1H2AL, and HIST1H3I) were methylated only in LUSC samples (Fig. 2E), and another six histone genes (group 7: HIST1H2BL, HIST2H3D, HIST1H2AJ, H2AFJ, HIST1H2AI, and HIST1H1D) were highly methylated in SCLC (Fig. 2F). More importantly, these cancer type–specific hypermethylated genes can be verified in the TCGA datasets (Supplementary Fig. S2B). These results suggest that methylation of histone gene loci may be used for distinguishing lung cancer subtypes.
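The seven-group scheme can be sketched as a simple rule-based classifier. The text does not state numeric thresholds, so the `LOW` and `HIGH` values, the function name, and the toy methylation levels below are all assumptions made for illustration.

```python
# Hypothetical thresholds (% methylation); the study does not report them.
LOW, HIGH = 20.0, 40.0


def classify(normal, cancer):
    """Assign a histone gene to a group from its methylation levels.

    normal: list of % methylation in the normal samples (NC-1 to NC-4)
    cancer: dict with keys 'LUAD', 'LUSC', 'SCLC' (CC-1 to CC-3)
    """
    normal_low = all(v < LOW for v in normal)
    normal_high = all(v >= HIGH for v in normal)
    high = [name for name, v in cancer.items() if v >= HIGH]
    low = [name for name, v in cancer.items() if v < LOW]
    if normal_low and len(low) == 3:
        return "group 1"  # poorly methylated everywhere
    if normal_high and len(high) == 3:
        return "group 2"  # highly methylated everywhere
    if normal_low and len(high) == 3:
        return "group 4"  # hypermethylated in all lung cancer samples
    if normal_low and len(high) == 1 and len(low) == 2:
        # hypermethylated in exactly one subtype
        return {"LUAD": "group 5", "LUSC": "group 6", "SCLC": "group 7"}[high[0]]
    return "group 3"  # no consistent pattern


print(classify([5, 3, 4, 6], {"LUAD": 70, "LUSC": 65, "SCLC": 80}))
```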

We further performed ROC analysis on the 14 hypermethylated histone genes using the TCGA datasets. HIST1H4F and HIST1H4I showed markedly higher specificity and sensitivity than the other genes; the specificity and sensitivity of HIST1H4F were 97.3% and 82.7%, respectively, and those of HIST1H4I were 96.0% and 87.5%, respectively (Supplementary Table S10). Moreover, both genes performed well in stage I lung cancer, and ROC analysis revealed similar AUCs across stages, which indicates that methylation of HIST1H4F and HIST1H4I can serve as a biomarker for early lung cancer diagnosis (Supplementary Fig. S2C and S2D; Supplementary Table S10). Furthermore, ROC analysis showed that the maximum methylation of HIST1H4F and HIST1H4I (Max-IF) performed better than either gene alone, with an AUC of 0.95, a specificity of 96.0%, and a sensitivity of 92.9% (Supplementary Fig. S2E; Supplementary Table S10).
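To make the Max-IF score and the ROC quantities concrete, the following sketch computes them on toy values. It assumes that a sample is called positive when its score is strictly above the cutoff (the text does not say whether the comparison is strict), and it computes the AUC in its rank-based form, the probability that a cancer sample scores above a normal one. All numbers are invented.

```python
def max_if(h4f, h4i):
    """Max-IF: per-sample maximum of HIST1H4F and HIST1H4I methylation (%)."""
    return [max(f, i) for f, i in zip(h4f, h4i)]


def auc(cancer_scores, normal_scores):
    """Rank-based AUC: fraction of (cancer, normal) pairs in which the cancer
    sample scores higher; ties count one half."""
    wins = 0.0
    for c in cancer_scores:
        for n in normal_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(normal_scores))


def sens_spec(cancer_scores, normal_scores, cutoff):
    """Sensitivity and specificity when 'score > cutoff' means positive."""
    sensitivity = sum(s > cutoff for s in cancer_scores) / len(cancer_scores)
    specificity = sum(s <= cutoff for s in normal_scores) / len(normal_scores)
    return sensitivity, specificity


# Toy methylation levels (%), not data from the study.
cancer = max_if([30.0, 4.0, 55.0, 12.0], [10.0, 25.0, 40.0, 3.0])
normal = max_if([2.0, 5.0, 3.0, 15.0], [4.0, 1.0, 6.0, 2.0])

print(auc(cancer, normal), sens_spec(cancer, normal, cutoff=10.0))
```

Taking the maximum of the two genes means a sample is flagged when either locus is methylated, which can raise sensitivity relative to either gene alone at some possible cost in specificity.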

To further confirm our results, we used primary lung cancer tissue samples for verification. We collected 25 lung cancer tissue samples and paired para-cancer tissue samples as controls (Supplementary Table S11). Methylation of HIST1H4F and HIST1H4I was detected by bisulfite-PCR pyrosequencing. The results showed that HIST1H4F and HIST1H4I were significantly hypermethylated in lung cancer, and ROC analysis showed very high sensitivity and specificity for each gene (Supplementary Fig. S2F). Max-IF was significantly hypermethylated in lung cancer samples, with an AUC of 0.98, a sensitivity of 96%, and a specificity of 88% (Fig. 2G and H).

Methylation pattern of histone genes for lung cancer diagnosis by BALF samples

BALF is of great significance in the early diagnosis of lung cancer (35, 36). Therefore, we tried to diagnose lung cancer by detecting the methylation of histone genes in BALF samples.

HIST1H4F as a Universal-Cancer-Only Methylation Marker

www.aacrjournals.org Cancer Res; 79(24) December 15, 2019 6105

on January 20, 2021. © 2019 American Association for Cancer Research. cancerres.aacrjournals.org Downloaded from

Published OnlineFirst October 1, 2019; DOI: 10.1158/0008-5472.CAN-19-1019


[Figure 2 graphic, panels A–H. Panel titles: "Methylation of histone genes in lung cancer samples from WGBS data"; "Methylation of histone genes in lung cancer validated by TCGA data" (Normal, n = 75; LUAD, n = 460; LUSC, n = 372); "LUAD-specific hypermethylated genes in WGBS"; "LUSC-specific hypermethylated genes in WGBS"; "SCLC-specific hypermethylated genes in WGBS"; "Max-IF in primary tissue" (lung cancer, n = 25; para-cancer control, n = 25); "ROC: Max-IF in primary tissue" (AUC = 0.98; P < 0.0001; cutoff = 9.95%; sensitivity = 96.0%; specificity = 88.0%; 95% CI, 0.95–1.00). Axes: methylation level (%) across samples NC-1 to NC-4, CC-1 (LUAD), CC-2 (LUSC), and CC-3 (SCLC); sensitivity (%) versus 100% − specificity (%).]

Figure 2.

Histone genes are hypermethylated in lung cancer. A, The histone gene family is divided into seven groups according to the DNA methylation pattern in normal cells and cancer cells in the WGBS data. B, Fourteen histone genes in group 4 were hypermethylated in lung cancer cells in the WGBS data. C, The fourteen hypermethylated histone genes from group 4 were validated in the TCGA lung cancer cohort, and nine of the 14 were hypermethylated in both lung adenocarcinoma (LUAD) and LUSC. Box and whiskers plots: box represents the upper quartile, lower quartile, and median; whiskers represent minimum to maximum. NS, not significant. ***, P < 0.001; ****, P < 0.0001. P values were calculated using the two-tailed nonparametric Mann–Whitney test by GraphPad Prism 7.0 software. D, Four histone genes in group 5 were specifically hypermethylated in the lung adenocarcinoma sample in the WGBS data. E, Four histone genes in group 6 were specifically hypermethylated in the LUSC sample in the WGBS data. F, Six histone genes in group 7 were specifically hypermethylated in the SCLC sample in the WGBS data. G, Maximum methylation (Max-IF) of HIST1H4F and HIST1H4I was significantly higher in primary lung cancer tissues. Error bars represent the upper quartile, lower quartile, and median. The P value was calculated using the two-tailed, paired, nonparametric Wilcoxon matched-pair signed rank test by GraphPad Prism 7.0 software. H, ROC analysis of Max-IF in primary lung cancer tissue; the AUC was 0.98 (95% confidence interval, 0.95–1.00; P < 0.0001), with a specificity of 88.0% and a sensitivity of 96.0%.


We collected 265 BALF samples consisting of 59 BLD control samples and 206 lung cancer samples. The BLD control group contained pneumonia, emphysema, tuberculosis, and other samples. The lung cancer group included 92 LUSC, 70 lung adenocarcinoma, and 44 SCLC samples. After obtaining the BALF samples, we randomly divided them into a training set (n = 133) and a validation set (n = 132; Table 1).
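A random division of this kind can be sketched as follows. The text does not say whether the split was stratified by diagnosis or how randomness was generated, so this is a plain unstratified shuffle with a fixed seed; the sample identifiers and the function name are hypothetical.

```python
import random


def split_samples(sample_ids, n_train, seed=0):
    """Shuffle the sample IDs reproducibly and cut them into two sets."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers for the 265 BALF samples.
samples = [f"BALF-{i:03d}" for i in range(1, 266)]
train, valid = split_samples(samples, n_train=133)
print(len(train), len(valid))
```

Keeping the validation set untouched until the cutoff has been fixed on the training set is what makes the validation estimates independent.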

A bisulfite-PCR pyrosequencing assay was used to detect HIST1H4F and HIST1H4I methylation. To ensure the reproducibility of pyrosequencing, three technical replicates of bisulfite-PCR pyrosequencing were completed for a total of 30 BALF samples, including 10 low-methylated (0% ≤ methylation ≤ 5%), 10 middle-methylated (5% < methylation < 20%), and 10 high-methylated (20% ≤ methylation ≤ 100%) samples. The results showed excellent performance in all low-, middle-, and high-methylated samples, with methylation variation within 5% (Supplementary Fig. S3A–S3C). Our analysis of clinical samples showed that, in both the training set and the validation set, HIST1H4F and HIST1H4I were significantly hypermethylated in different types of lung cancer (Supplementary Fig. S4A and S4B). Max-IF was also significantly higher in lung adenocarcinoma, LUSC, SCLC, and all lung cancer samples (Fig. 3A). To assess the potential for lung cancer diagnosis using HIST1H4I, HIST1H4F, or Max-IF, we first performed ROC analysis in the training dataset, where the area under the ROC curve (AUC) was calculated and a cutoff value was determined accordingly; sensitivity and specificity were then calculated on the basis of this cutoff. Moreover, to robustly estimate the diagnostic accuracy, an independent evaluation was performed using the validation set, where sensitivity and specificity were calculated again on the basis of the same cutoff (Fig. 3B and C; Supplementary Table S10). For LUSC and SCLC, Max-IF achieved AUCs of 0.94 and 0.97, respectively (Fig. 3B). For LUSC, with a methylation cutoff of 6.05%, the specificity and sensitivity of Max-IF were 96.7% and 86.4% in the training set and 96.5% and 85.4% in the validation set. For SCLC, with a methylation cutoff of 7.75%, the specificity and sensitivity of Max-IF were 96.7% and 95.5% in the training set and 96.5% and 95.7% in the validation set (Fig. 3C; Supplementary Table S10).

Compared with LUSC and SCLC, which tend to be more centrally located, lung adenocarcinoma is usually observed peripherally in the lungs (37). Therefore, LUSC and SCLC BALF samples are more likely to contain cancer cells than lung adenocarcinoma BALF samples (38), and the sensitivity for lung adenocarcinoma should be lower than that for LUSC and SCLC in BALF samples. As expected, in lung adenocarcinoma, the specificity and sensitivity of Max-IF were 96.7% and 60.5% in the training set (cutoff = 6.3% and AUC = 0.84) and 96.5% and 65.6% in the validation set. To improve the detection sensitivity in lung adenocarcinoma, we combined Max-IF with serum CEA. The sensitivity of CEA alone as a biomarker for lung cancer diagnosis is very low (39). In our study, the sensitivities of CEA (cutoff = 5 ng/mL) in the training set and validation set were 27.3% and 30.7%, respectively. However, the sensitivity of CEA in lung adenocarcinoma is much higher than in LUSC or SCLC. In the training set, the sensitivities for lung adenocarcinoma, LUSC, and SCLC were 47.1%, 16.2%, and 14.3%, respectively. In the validation set, the sensitivities for lung adenocarcinoma, LUSC, and SCLC were 50%, 22.2%, and 26.1%, respectively. Therefore, we combined Max-IF with serum CEA for lung adenocarcinoma diagnosis, calling a sample positive if either marker was positive; the sensitivity increased from 60.5% to 77.8% in the training set and from 65.6% to 81.5% in the validation set (Fig. 3D). For all cancer samples, the specificity and sensitivity were 96.7% and 86.0% in the training set and 96.5% and 87.0% in the validation set, indicating that histone gene methylation has excellent accuracy as a biomarker for lung cancer diagnosis (Fig. 3E).
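The train-then-validate procedure and the either-marker rule can be sketched as below. The text does not specify how the cutoff was derived from the ROC curve, so the Youden index (maximizing sensitivity + specificity − 1) is used here as an assumed stand-in; the function names and all values are toy examples rather than study data.

```python
def sens_spec(cancer, control, cutoff):
    """Sensitivity and specificity when 'value > cutoff' means positive."""
    sens = sum(v > cutoff for v in cancer) / len(cancer)
    spec = sum(v <= cutoff for v in control) / len(control)
    return sens, spec


def youden_cutoff(cancer, control):
    """Assumed rule: choose the observed value maximizing sens + spec - 1."""
    candidates = sorted(set(cancer) | set(control))
    return max(candidates, key=lambda c: sum(sens_spec(cancer, control, c)) - 1)


def combined_positive(max_if, cea, meth_cutoff, cea_cutoff=5.0):
    """Positive if either Max-IF (%) or serum CEA (ng/mL) exceeds its cutoff."""
    return max_if > meth_cutoff or cea > cea_cutoff


# Toy Max-IF values (%): the cutoff is fixed on the training set only.
train_cancer, train_control = [2.0, 7.0, 12.0, 30.0], [1.0, 3.0, 4.0, 6.5]
cutoff = youden_cutoff(train_cancer, train_control)

# The same cutoff is then applied unchanged to the validation set.
valid_cancer, valid_control = [5.0, 9.0, 20.0], [2.0, 7.0]
print(cutoff, sens_spec(valid_cancer, valid_control, cutoff))
print(combined_positive(5.0, 8.2, cutoff), combined_positive(5.0, 3.0, cutoff))
```

An either-marker rule can only keep or raise sensitivity relative to each marker alone; its specificity can only stay the same or fall, so it should be checked on the control group as well.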

Methylation of HIST1H4F gene is a potential UCOM marker

We have demonstrated that many histone genes are abnormally hypermethylated in lung cancer, and we wondered whether histone genes are also abnormally methylated in other types of cancer. In total, 17 cancer cohorts from the TCGA were analyzed. They include bladder urothelial carcinoma (n = 433), breast-invasive carcinoma (n = 867), cervical squamous cell carcinoma and endocervical adenocarcinoma (n = 310), cholangiocarcinoma (n = 45), colon adenocarcinoma (n = 335), esophageal carcinoma (n = 201), head and neck squamous cell carcinoma (n = 578), kidney renal clear cell carcinoma (n = 479), liver hepatocellular carcinoma (n = 427), lung cancer (n = 907), pancreatic adenocarcinoma (n = 194), prostate adenocarcinoma (n = 548), rectum adenocarcinoma (n = 106), skin cutaneous melanoma (n = 476), stomach adenocarcinoma (n = 398), thyroid carcinoma (n = 563), and uterine corpus endometrioid carcinoma (n = 477; Supplementary Table S12).
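The cohort sizes listed above can be tallied to confirm the pan-cancer total of 7,344 samples quoted in the abstract. The keys are the TCGA study abbreviations used in the Figure 4 legend.

```python
# Sample counts per TCGA cohort as listed in the text.
cohorts = {
    "BLCA": 433, "BRCA": 867, "CESC": 310, "CHOL": 45, "COAD": 335,
    "ESCA": 201, "HNSC": 578, "KIRC": 479, "LIHC": 427, "LUNG": 907,
    "PAAD": 194, "PRAD": 548, "READ": 106, "SKCM": 476, "STAD": 398,
    "THCA": 563, "UCEC": 477,
}

n_cohorts = len(cohorts)
n_samples = sum(cohorts.values())
print(n_cohorts, n_samples)
```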

For each cancer type, we calculated the average methylation difference between normal and cancer samples (Fig. 4A). We found no methylation differences for most histone genes. However, some histone genes tended to be hypermethylated in

Table 1. Clinical information of the training set and validation set (BALF samples)

Training set
Characteristics   BLD (n = 30)    LUAD (n = 38)   LUSC (n = 44)   SCLC (n = 21)   Total (n = 103)
Age (years)
  Mean ± SEM      55.8 ± 2.1      62.0 ± 1.5      64.1 ± 1.4      56.6 ± 1.9      61.8 ± 0.9
  Range           34–72           44–76           31–79           44–76           31–79
Gender
  Female (%)      12 (40.0)       12 (31.6)       3 (6.8)         3 (14.3)        18 (17.5)
  Male (%)        18 (60.0)       26 (68.4)       41 (93.2)       18 (85.7)       85 (82.5)
Stage
  Stage I (%)     -               10 (26.3)       13 (30.0)       8 (38.1)        31 (30.1)
  Stage II (%)    -               11 (28.9)       12 (27.3)       3 (14.3)        26 (25.2)
  Stage III (%)   -               10 (26.3)       13 (30.0)       7 (33.3)        30 (29.1)
  Stage IV (%)    -               7 (18.4)        6 (13.6)        3 (14.3)        16 (15.5)

Validation set
Characteristics   BLD (n = 29)    LUAD (n = 32)   LUSC (n = 48)   SCLC (n = 23)   Total (n = 103)
Age (years)
  Mean ± SEM      53.5 ± 2.3      60.2 ± 1.6      61.2 ± 1.4      59.7 ± 1.7      60.7 ± 0.9
  Range           35–80           43–80           39–80           46–76           39–80
Gender
  Female (%)      12 (41.4)       10 (31.2)       4 (8.3)         5 (21.7)        19 (18.4)
  Male (%)        17 (58.6)       22 (68.8)       44 (91.7)       18 (78.3)       84 (81.6)
Stage
  Stage I (%)     -               13 (40.6)       14 (29.2)       9 (39.1)        36 (35.0)
  Stage II (%)    -               7 (21.9)        16 (33.3)       4 (17.4)        27 (26.2)
  Stage III (%)   -               5 (15.6)        13 (27.1)       6 (26.1)        24 (23.3)
  Stage IV (%)    -               7 (21.9)        5 (10.4)        4 (17.4)        16 (15.5)

Abbreviation: LUAD, lung adenocarcinoma.


[Figure 3 graphic, panels A–E. Panel titles: "Max-IF in training set"; "Max-IF in validation set"; "ROC of Max-IF in training set"; "LUAD" (sensitivity of Max-IF, CEA, and Max-IF combined with CEA); "Lung cancer" (specificity and sensitivity in the training and validation sets). Groups: training set, BLD (n = 30), LUAD (n = 38), LUSC (n = 44), SCLC (n = 21), total (n = 103); validation set, BLD (n = 29), LUAD (n = 32), LUSC (n = 48), SCLC (n = 23), total (n = 103). Axes: methylation level (%); sensitivity/specificity (%); sensitivity (%) versus 100% − specificity (%).]

Figure 3.

HIST1H4F and HIST1H4I were used as lung cancer biomarkers in BALF samples. A, Maximum methylation (Max-IF) of HIST1H4F and HIST1H4I was significantly higher in lung adenocarcinoma (LUAD), LUSC, SCLC, and total lung cancer in the BALF training set (left) and the validation set (right). Box and whiskers plots: box represents the upper quartile, lower quartile, and median; whiskers represent minimum to maximum. ****, P < 0.0001. P values for all the analyses were calculated using the two-tailed nonparametric Mann–Whitney test by GraphPad Prism 7.0 software. B, ROC analysis of Max-IF in the training set: lung adenocarcinoma [AUC, 0.84; 95% confidence interval (CI), 0.74–0.93; P < 0.0001], LUSC (AUC, 0.94; 95% CI, 0.89–1.00; P < 0.0001), SCLC (AUC, 0.97; 95% CI, 0.92–1.00; P < 0.0001), and total lung cancer (AUC, 0.91; 95% CI, 0.86–0.96; P < 0.0001). C, Sensitivity and specificity for lung adenocarcinoma, LUSC, SCLC, and total lung cancer in the training set (left) and validation set (right). D, The sensitivity for lung adenocarcinoma detected by Max-IF combined with CEA was much higher than that of Max-IF or CEA individually. E, The comprehensive sensitivity and specificity of HIST1H4I and HIST1H4F as a lung cancer diagnosis marker in the training set and validation set. BLD includes pneumonia, emphysema, tuberculosis, etc. Total, total lung cancer, includes lung adenocarcinoma, LUSC, and SCLC.


different types of cancer, including HIST1H4F, HIST1H3E, HIST1H2BB, HIST1H1A, HIST1H3C, and HIST1H4I. In contrast, H2BFM and H2BFWT tended to be hypomethylated in various types of cancer. Importantly, we found that HIST1H4F was hypermethylated in all tumor types except thyroid carcinoma (Fig. 4B). In thyroid carcinoma, only a minor methylation difference was observed between normal (median = 6.1%) and cancer (median = 5.4%) samples, although we showed that HIST1H4F was hypermethylated in different stages of cancer relative to normal samples (Supplementary Fig. S5). Therefore, we considered HIST1H4F hypermethylation a conserved feature across almost all types of cancers and named it "UCOM". Furthermore, we analyzed the relationship between HIST1H4F methylation and tumor stage or patient outcome in eight tumor types with larger sample sizes in the TCGA database (Supplementary Figs. S6A–S6C and S7A–S7G). The results showed that HIST1H4F was already hypermethylated in stage I of all eight types of cancer, without significant differences among stages. Moreover, ROC analysis showed that the AUCs were also similar in different stages (Supplementary Table S13). These results indicate that the HIST1H4F locus is methylated during the initiation of cancer development. Furthermore, the

[Figure 4 graphic, panels A–C. Panel titles: "HIST1H4F methylation in TCGA database"; "HIST1H4F methylation multi-types of cancer". Panel A is a heat map of methylation difference (%) per histone gene and cancer type, with a color scale from −30 to 45; the other panels plot methylation level (%) for cancer and control groups.]

Figure 4.

HIST1H4F as a UCOM marker. A, Histone gene family methylation in 17 different types of cancer. For each histone gene in each cancer type, the average methylation difference between normal and cancer samples was calculated. The color shows the degree of the average methylation difference; a negative value means that the histone gene is hypomethylated, and a positive value means that it is hypermethylated. B, HIST1H4F is hypermethylated in different types of cancer in the TCGA data. Ten cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), 10 stomach adenocarcinoma (STAD), and six skin cutaneous melanoma (SKCM) para-cancer samples were collected from primary tissues by us, because the TCGA database contained too few (n ≤ 3) control samples. Box and whiskers plots: box, the upper quartile, lower quartile, and median; whiskers, minimum to maximum; light-colored box, para-cancer control samples; dark-colored box, cancer samples. NS, not significant. *, P < 0.1; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001. P values for all the analyses were calculated using the two-tailed nonparametric Mann–Whitney test by GraphPad Prism 7.0 software. C, Validation of HIST1H4F methylation in eight other types of cancer besides lung cancer. Error bar, upper quartile, lower quartile, and median. P values for esophagus cancer, colorectal cancer, pancreatic cancer, and head and neck cancer were calculated using the two-tailed, paired, nonparametric Wilcoxon matched-pair signed rank test by GraphPad Prism 7.0 software. P values for cervical cancer, gastric cancer, breast cancer, and liver cancer were calculated using the two-tailed nonparametric Mann–Whitney test by GraphPad Prism 7.0 software. HNSC, head and neck squamous cell carcinoma; ESCA, esophageal carcinoma; COAD, colon adenocarcinoma; READ, rectum adenocarcinoma; PAAD, pancreatic adenocarcinoma; KIRC, kidney renal clear cell carcinoma; THCA, thyroid carcinoma; LIHC, liver hepatocellular carcinoma; PRAD, prostate adenocarcinoma; BLCA, bladder urothelial carcinoma; LUNG, lung cancer; BRCA, breast-invasive carcinoma; UCEC, uterine corpus endometrioid carcinoma; CHOL, cholangiocarcinoma.


survival analysis in these eight cancer types showed no significant differences in patient outcome among the low, middle, and high methylation groups (Supplementary Table S14). Taken together, our results suggest that hypermethylation of HIST1H4F can act as a useful early diagnostic marker for multiple types of cancer.
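The per-gene comparison behind Fig. 4A, the average methylation difference between cancer and normal samples within one cancer type, can be sketched as follows. The methylation values are invented, and the sign convention (cancer minus normal, so positive means hypermethylated in cancer) follows the figure legend.

```python
from statistics import mean


def mean_difference(cancer_values, normal_values):
    """Average methylation (%) in cancer minus average in normal samples."""
    return mean(cancer_values) - mean(normal_values)


# Toy data for one cancer type: gene -> (cancer values, normal values), in %.
toy = {
    "HIST1H4F": ([40.0, 50.0, 60.0], [5.0, 7.0, 9.0]),
    "H2BFM": ([50.0, 60.0], [80.0, 90.0]),
}

diffs = {gene: mean_difference(c, n) for gene, (c, n) in toy.items()}
print(diffs)
```

Repeating this for every histone gene and cancer type yields the gene-by-cancer matrix that a heat map such as Fig. 4A displays.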

To further confirm HIST1H4F as a UCOM marker, we selected 243 clinical samples covering nine types of cancer, including the 50 lung cancer samples shown previously and another 193 samples from eight different types of cancer (Supplementary Table S4). Methylation of HIST1H4F in these samples was detected by bisulfite-PCR pyrosequencing assay. The results showed that HIST1H4F was significantly hypermethylated in all nine types of cancer (Fig. 4C). ROC analysis of HIST1H4F methylation in the nine types of cancer showed that the AUCs in all nine cancers were above 0.87, suggesting that HIST1H4F is an ideal UCOM marker (Supplementary Table S10). If HIST1H4F is a UCOM marker, its DNA methylation level should reflect the ratio of cancer cells to noncancer cells in clinical samples. To verify this point, we mixed normal cells (lung fibroblast cell line MRC5 or normal liver cells) with cancer cells (lung cancer cell line A549 or liver cancer cell line HepG2) at proportions of 0%, 25%, 50%, 75%, and 100%. We then detected the methylation level of each sample by bisulfite-PCR pyrosequencing assay. As expected, the final methylation level reflected the percentage of cancer cell DNA mixed with normal cell DNA. These results indicate that HIST1H4F is not only a UCOM marker, but can also be used to estimate the cancer cell ratio in clinical samples (Supplementary Fig. S8A and S8B).
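The mixing experiment implies a simple linear model: if a fraction f of the DNA comes from cancer cells, the expected methylation is f × m_cancer + (1 − f) × m_normal, which can be inverted to estimate f. The sketch below assumes equal DNA contribution per cell and uses hypothetical methylation levels for the pure populations; it illustrates the idea and is not the analysis performed in the study.

```python
def expected_methylation(fraction, m_cancer, m_normal):
    """Methylation (%) expected for a mixture with the given cancer fraction."""
    return fraction * m_cancer + (1.0 - fraction) * m_normal


def estimate_cancer_fraction(m_observed, m_cancer, m_normal):
    """Invert the linear mixture model and clamp the result to [0, 1]."""
    fraction = (m_observed - m_normal) / (m_cancer - m_normal)
    return min(1.0, max(0.0, fraction))


# Hypothetical methylation levels (%) for pure cancer and pure normal cells.
M_CANCER, M_NORMAL = 80.0, 4.0

for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    m = expected_methylation(f, M_CANCER, M_NORMAL)
    print(f, m, estimate_cancer_fraction(m, M_CANCER, M_NORMAL))
```

In a clinical sample the pure-population levels are not known exactly, so any fraction estimated this way would carry that uncertainty.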

DNA methylation is usually correlated with gene expression, so we asked whether abnormal hypermethylation of HIST1H4F influenced gene expression. We analyzed HIST1H4F expression in 15 tumor types in the TCGA database (tumor types without normal controls were excluded), and the results showed that in most types of tumors, HIST1H4F has no (or very low) gene expression in both normal controls and tumors (Supplementary Fig. S9A). We verified this in the cultured normal lung fibroblast cell line MRC5 and the lung cancer cell line A549, in which we measured DNA methylation and gene expression of HIST1H4F. The results showed that HIST1H4F was hypermethylated in A549 cells and hypomethylated in MRC5 cells (Supplementary Fig. S9B), but was not expressed in either of them (Supplementary Fig. S9C). These unexpected results indicate that the expression of HIST1H4F itself may not be involved in tumorigenesis, but instead that the epigenetic status of the HIST1H4F locus may affect the chromatin information or structure, which in turn alters cancer-related gene expression during tumor initiation. This is further supported by the finding that the H4 histone genes have completely different genomic sequences yet encode almost identical amino acid sequences (Supplementary Fig. S9D and S9E).

In summary, we collected nine types of cancer, and although many other rare types of cancer have not yet been verified, we speculate that HIST1H4F is hypermethylated in many other cancer types as well. Therefore, we conclude that HIST1H4F may be a promising UCOM marker for the screening of patients with early cancer, and its role in tumorigenesis awaits further study.

Discussion

WGBS is the most comprehensive method for detecting genome-wide DNA methylation (23). However, few reports have directly investigated methylation biomarkers in WGBS datasets. Here, we developed a new strategy to analyze WGBS data and to efficiently screen for new methylation markers of lung cancer genome wide. These markers were further verified by TCGA data and clinical cancer samples. Through these analyses, we unexpectedly found that many histone genes were abnormally hypermethylated in lung cancer. The methylation status of HIST1H4F and HIST1H4I in BALF samples can be used as an effective approach for the early diagnosis of lung cancer, with a specificity of 96.7% and a sensitivity of 87.0%.

The TCGA database provides much information for the study of tumors, especially for the investigation of pan-cancer characteristics, and a series of high-profile studies have been published, including pan-cancer–related signaling pathway analysis (40, 41), genetic alteration analysis (42–44), molecular-based tumor reclassification analysis (45), and pan-cancer DNA methylation analysis (46). These studies have given us informative views of cancer from different perspectives. However, the reported pan-cancer–related markers combine many genes for cancer diagnosis, and there are few reports describing a single gene or locus that can be used for screening all cancer types. This may be because methylation data in the TCGA database were measured using the 450K methylation array, which covers only about 2% of all CpG sites in the genome, so most of the genome's information was missing. Therefore, combining WGBS data with TCGA data for analysis is an efficient strategy for screening DNA methylation biomarkers across the genome.
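The "about 2%" coverage figure can be checked with a rough calculation. The probe count comes from the text; the genome-wide total of roughly 28 million CpG sites is a commonly cited approximation for the human reference genome and is an assumption added here, not a number from this paper.

```python
N_PROBES_450K = 485_455       # probe count quoted in the text
N_CPG_GENOME = 28_000_000     # assumed approximate CpG count, human genome

coverage_percent = 100.0 * N_PROBES_450K / N_CPG_GENOME
print(f"{coverage_percent:.1f}% of CpG sites covered")
```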

Histones are an important family of housekeeping genes expressed in almost all organisms. To ensure the stability of histone expression, each histone protein is encoded by many histone genes. The regulation of the spatial and temporal expression of the histone genes is very different from that of other genes (17, 19). In addition, the modification of histones has been extensively studied. However, there is no systematic study of methylation abnormalities of the histone loci themselves. Alterations in the chromatin structure of the histone gene cluster 1 region have been found in breast cancer (20). In addition, it has been reported that the histone gene cluster 1 genomic region is abnormally enriched in H3K27me3 in acute myeloid leukemia (47). Interestingly, we found aberrant DNA methylation in many histone loci located in histone gene cluster 1. We further analyzed the expression of HIST1H4F in 15 tumor types in the TCGA database, and the results showed that HIST1H4F has no (or very low) expression in normal tissues and tumors of different cancer types. We propose that this aberrant DNA methylation may affect the binding of CCCTC-binding factor, which would in turn alter the chromatin structure of histone gene cluster 1 during cancer development (48, 49), and that the epigenetic status or higher-order chromatin structure of the histone loci, rather than their expression, may be involved in the tumor initiation process. More interestingly, histone genes in cluster 1 are also methylated in different types of cancer, which suggests that aberrant DNA methylation in the region of histone gene cluster 1 may be involved in the development of multiple types of cancer; this will be interesting to explore in the near future.


To extend these unexpected findings, we analyzed 17 cancer cohorts in the TCGA database and found that many histone genes are hypermethylated not only in lung cancer but also in many other tumors. Moreover, we were surprised to find that HIST1H4F is hypermethylated in all cancer types and is both highly sensitive and specific as a potential UCOM marker, which was further verified in a total of 243 clinical samples covering nine tumor types. Unlike most reported multigene panels for pan-cancer diagnosis (50–52), HIST1H4F is a potential single-gene UCOM marker, a completely unexpected finding that could offer great convenience in subsequent clinical applications. Meanwhile, further exploring the underlying mechanism of HIST1H4F in cancer development may help us better understand the common features of tumorigenesis. Because HIST1H4F is a potential UCOM marker, the epigenetic status and chromatin structure of its locus will be of great significance for understanding the general mechanism of cancer development, and reversing DNA methylation at specific histone loci may be a potential common strategy for future cancer treatment.
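For readers less familiar with how a marker is judged "sensitive and specific", the sketch below computes both quantities from confusion-matrix counts. The counts in the example are hypothetical placeholders chosen for illustration only; they are not data from this study, and the class and function names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing marker calls against the true diagnosis."""
    true_pos: int   # cancer samples called methylation-positive
    false_neg: int  # cancer samples called methylation-negative
    true_neg: int   # control samples called methylation-negative
    false_pos: int  # control samples called methylation-positive


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of cancer samples that the marker detects."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of control samples that the marker correctly leaves negative."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Hypothetical counts, for illustration only (not data from this study).
example = ConfusionCounts(true_pos=90, false_neg=10, true_neg=95, false_pos=5)
print(f"sensitivity = {sensitivity(example):.1%}")
print(f"specificity = {specificity(example):.1%}")
```

A single-locus marker is evaluated the same way as a multigene panel; the only difference is that the positive or negative call comes from one methylation measurement rather than a combined score.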

Disclosure of Potential Conflicts of Interest

W. Yu and Shihua Dong report having a pending patent application. No potential conflicts of interest were disclosed by the other authors.

Authors' Contributions

Conception and design: S. Dong, W. Yu
Development of methodology: S. Dong, W. Li, W. Yu
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S. Dong, W. Li, L. Wang, J. Hu, Y. Song, B. Zhang, X. Ren, S. Ji, J. Li, P. Xu, Y. Liang, G. Chen, J.-T. Lou
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S. Dong, W. Li, W. Yu
Writing, review, and/or revision of the manuscript: S. Dong, W. Li, B. Zhang, X. Ren, J. Li, P. Xu, Y. Liang, W. Yu
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S. Dong, J. Hu, B. Zhang, X. Ren, S. Ji, J. Li, P. Xu, Y. Liang, J.-T. Lou, W. Yu
Study supervision: W. Yu

Acknowledgments

We thank Yan Li, Lina Peng, Huaibing Luo, ZhiCong Chu, Yao Xiao, Min Xiao, Ying Guo, Lu Chen, and Lan Zhang for experimental help. We thank Ruitu Lv and Feizhen Wu for their help in bioinformatic analysis. We thank Yue Yu, Zhicong Yang, Ying Tong, and Zhiqiang Hu for editorial help and useful comments on the article. This work was supported by the National Key R&D Program of China (grant no. 2018YFC1005004), the Science and Technology Innovation Action Plan of Shanghai (grant no. 17411950900), the National Natural Science Foundation of China (grant nos. 31671308, 31872814, and 81272295), Major Special Projects of Basic Research of Shanghai Science and Technology Commission (grant no. 18JC1411101), the Shanghai Science and Technology Committee (grant no. 12ZR1402200), the Ministry of Education of the People's Republic of China (grant no. 2009CB825600), and the Innovation Group Project of Shanghai Municipal Health Commission (grant no. 2019CXJQ03).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received April 1, 2019; revised July 24, 2019; accepted September 27, 2019; published first October 1, 2019.

References

1. Hirsch FR, Scagliotti GV, Mulshine JL, Kwon R, Curran WJ Jr, Wu YL, et al. Lung cancer: current therapies and new targeted treatments. Lancet 2017;389:299–311.

2. Melosky B, Chu Q, Juergens R, Leighl N, McLeod D, Hirsh V. Pointed progress in second-line advanced non-small-cell lung cancer: the rapidly evolving field of checkpoint inhibition. J Clin Oncol 2016;34:1676–88.

3. Sozzi G, Boeri M. Potential biomarkers for lung cancer screening. Transl Lung Cancer Res 2014;3:139–48.

4. National Lung Screening Trial Research Team, Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395–409.

5. Kanodra NM, Silvestri GA, Tanner NT. Screening and early detection efforts in lung cancer. Cancer 2015;121:1347–56.

6. Singhal S, Vachani A, Antin-Ozerkis D, Kaiser LR, Albelda SM. Prognostic implications of cell cycle, apoptosis, and angiogenesis biomarkers in non-small cell lung cancer: a review. Clin Cancer Res 2005;11:3974–86.

7. Kathuria H, Gesthalter Y, Spira A, Brody JS, Steiling K. Updates and controversies in the rapidly evolving field of lung cancer screening, early detection, and chemoprevention. Cancers 2014;6:1157–79.

8. Risch A, Plass C. Lung cancer epigenetics and genetics. Int J Cancer 2008;123:1–7.

9. Mundbjerg K, Chopra S, Alemozaffar M, Duymich C, Lakshminarasimhan R, Nichols PW, et al. Identifying aggressive prostate cancer foci using a DNA methylation classifier. Genome Biol 2017;18:3.

10. Nguyen LV, Pellacani D, Lefort S, Kannan N, Osako T, Makarem M, et al. Barcoding reveals complex clonal dynamics of de novo transformed human mammary cells. Nature 2015;528:267–71.

11. Li J, Li Y, Li W, Luo H, Xi Y, Dong S, et al. Guide positioning sequencing identifies aberrant DNA methylation patterns that alter cell identity and tumor-immune surveillance networks. Genome Res 2019;29:270–80.

12. Dor Y, Cedar H. Principles of DNA methylation and their implications for biology and medicine. Lancet 2018;392:777–86.

13. Koch A, Joosten SC, Feng Z, de Ruijter TC, Draht MX, Melotte V, et al. Analysis of DNA methylation in cancer: location revisited. Nat Rev Clin Oncol 2018;15:459–66.

14. Vargas AJ, Harris CC. Biomarker development in the precision medicine era: lung cancer as a case study. Nat Rev Cancer 2016;16:525–37.

15. Hu Y, Lai Y. Identification and expression analysis of rice histone genes. Plant Physiol Biochem 2015;86:55–65.

16. Bhasin M, Reinherz EL, Reche PA. Recognition and classification of histones using support vector machine. J Comput Biol 2006;13:102–12.

17. Isogai Y, Keles S, Prestel M, Hochheimer A, Tjian R. Transcription of histone gene cluster by differential core-promoter factors. Genes Dev 2007;21:2936–49.

18. Buschbeck M, Hake SB. Variants of core histones and their roles in cell fate decisions, development and cancer. Nat Rev Mol Cell Biol 2017;18:299–314.

19. Braastad CD, Hovhannisyan H, van Wijnen AJ, Stein JL, Stein GS. Functional characterization of a human histone gene cluster duplication. Gene 2004;342:35–40.

20. Fritz AJ, Ghule PN, Boyd JR, Tye CE, Page NA, Hong D, et al. Intranuclear and higher-order chromatin organization of the major histone gene cluster in breast cancer. J Cell Physiol 2018;233:1278–90.

21. Xi Y, Li W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 2009;10:232.

22. Yong WS, Hsu FM, Chen PY. Profiling genome-wide DNA methylation. Epigenetics Chromatin 2016;9:26.

23. Chatterjee A, Rodger EJ, Morison IM, Eccles MR, Stockwell PA. Tools and strategies for analysis of genome-wide and gene-specific DNA methylation patterns. Methods Mol Biol 2017;1537:249–77.

24. Guo S, Diep D, Plongthongkum N, Fung HL, Zhang K, Zhang K. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat Genet 2017;49:635–42.

25. Zhao CM, Hayakawa Y, Kodama Y, Muthupalani S, Westphalen CB, Andersen GT, et al. Denervation suppresses gastric tumorigenesis. Sci Transl Med 2014;6:250ra115.

26. Zahalka AH, Arnal-Estape A, Maryanovich M, Nakahara F, Cruz CD, Finley LWS, et al. Adrenergic nerves activate an angio-metabolic switch in prostate cancer. Science 2017;358:321–6.

27. Magnon C, Hall SJ, Lin J, Xue X, Gerber L, Freedland SJ, et al. Autonomic nerve development contributes to prostate cancer progression. Science 2013;341:1236361.

28. Ilse P, Biesterfeld S, Pomjanski N, Wrobel C, Schramm M. Analysis of SHOX2 methylation as an aid to cytology in lung cancer diagnosis. Cancer Genomics Proteomics 2014;11:251–8.

29. Pradhan MP, Desai A, Palakal MJ. Systems biology approach to stage-wise characterization of epigenetic genes in lung adenocarcinoma. BMC Syst Biol 2013;7:141.

30. Ooki A, Maleki Z, Tsay JJ, Goparaju C, Brait M, Turaga N, et al. A panel of novel detection and prognostic methylated DNA markers in primary non-small cell lung cancer and serum DNA. Clin Cancer Res 2017;23:7141–52.

31. Diaz-Lagares A, Mendez-Gonzalez J, Hervas D, Saigi M, Pajares MJ, Garcia D, et al. A novel epigenetic signature for early diagnosis in lung cancer. Clin Cancer Res 2016;22:3361–71.

32. Su J, Huang YH, Cui X, Wang X, Zhang X, Lei Y, et al. Homeobox oncogene activation by pan-cancer DNA hypermethylation. Genome Biol 2018;19:108.

33. Cedar H, Bergman Y. Linking DNA methylation and histone modification: patterns and paradigms. Nat Rev Genet 2009;10:295–304.

34. Hammond CM, Stromme CB, Huang H, Patel DJ, Groth A. Histone chaperone networks shaping chromatin function. Nat Rev Mol Cell Biol 2017;18:141–58.

35. Wang H, Zhang X, Liu X, Liu K, Li Y, Xu H. Diagnostic value of bronchoalveolar lavage fluid and serum tumor markers for lung cancer. J Cancer Res Ther 2016;12:355–8.

36. Poletti V, Poletti G, Murer B, Saragoni L, Chilosi M. Bronchoalveolar lavage in malignancy. Semin Respir Crit Care Med 2007;28:534–45.

37. Collins LG, Haines C, Perkel R, Enck RE. Lung cancer: diagnosis and management. Am Fam Physician 2007;75:56–63.

38. Sareen R, Pandey CL. Lung malignancy: diagnostic accuracies of bronchoalveolar lavage, bronchial brushing, and fine needle aspiration cytology. Lung India 2016;33:635–41.

39. Holdenrieder S, Wehnl B, Hettwer K, Simon K, Uhlig S, Dayyani F. Carcinoembryonic antigen and cytokeratin-19 fragments for assessment of therapy response in non-small cell lung cancer: a systematic review and meta-analysis. Br J Cancer 2017;116:1037–45.

40. Sanchez-Vega F, Mina M, Armenia J, Chatila WK, Luna A, La KC, et al. Oncogenic signaling pathways in the cancer genome atlas. Cell 2018;173:321–37.

41. Chen H, Li C, Peng X, Zhou Z, Weinstein JN, Cancer Genome Atlas Research Network, et al. A pan-cancer analysis of enhancer expression in nearly 9000 patient samples. Cell 2018;173:386–99.

42. Korkut A, Zaidi S, Kanchi RS, Rao S, Gough NR, Schultz A, et al. A pan-cancer analysis reveals high-frequency genetic alterations in mediators of signaling by the TGF-beta superfamily. Cell Syst 2018;7:422–37.

43. Huang KL, Mashl RJ, Wu Y, Ritter DI, Wang J, Oh C, et al. Pathogenic germline variants in 10,389 adult cancers. Cell 2018;173:355–70.

44. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, et al. Comprehensive characterization of cancer driver genes and mutations. Cell 2018;174:1034–5.

45. Hoadley KA, Yau C, Hinoue T, Wolf DM, Lazar AJ, Drill E, et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell 2018;173:291–304.

46. Saghafinia S, Mina M, Riggi N, Hanahan D, Ciriello G. Pan-cancer landscape of aberrant DNA methylation across human tumors. Cell Rep 2018;25:1066–80.

47. Tiberi G, Pekowska A, Oudin C, Ivey A, Autret A, Prebet T, et al. PcG methylation of the HIST1 cluster defines an epigenetic marker of acute myeloid leukemia. Leukemia 2015;29:1202–6.

48. Bonev B, Cavalli G. Organization and function of the 3D genome. Nat Rev Genet 2016;17:661–78.

49. Dixon JR, Xu J, Dileep V, Zhan Y, Song F, Le VT, et al. Integrative detection and analysis of structural variation in cancer genomes. Nat Genet 2018;50:1388–98.

50. Yang X, Gao L, Zhang S. Comparative pan-cancer DNA methylation analysis reveals cancer common and specific patterns. Brief Bioinform 2017;18:761–73.

51. Hao X, Luo H, Krawczyk M, Wei W, Wang W, Wang J, et al. DNA methylation markers for diagnosis and prognosis of common cancers. Proc Natl Acad Sci U S A 2017;114:7414–9.

52. Brena RM, Plass C, Costello JF. Mining methylation for early detection of common cancers. PLoS Med 2006;3:e479.


Shihua Dong, Wei Li, Lin Wang, et al. Histone-Related Genes Are Hypermethylated in Lung Cancer and Hypermethylated HIST1H4F Could Serve as a Pan-Cancer Biomarker. Cancer Res 2019;79:6101-6112. Published OnlineFirst October 1, 2019. doi: 10.1158/0008-5472.CAN-19-1019
