Supplementary Information for
Whole-genome sequence of a flatfish provides insights into ZW sex
chromosome evolution and adaptation to a benthic lifestyle
Songlin Chen1,10,11, Guojie Zhang2,10, Changwei Shao1,10, Quanfei Huang2,10, Geng Liu2,10,
Pei Zhang2,10, Wentao Song1, Na An2, Domitille Chalopin3, Jean-Nicolas Volff3,
Yunhan Hong4, Qiye Li2, Zhenxia Sha1, Heling Zhou2, Mingshu Xie1, Qiulin Yu2, Yang Liu5,
Hui Xiang6, Na Wang1, Kui Wu2, Changgeng Yang1, Qian Zhou2, Xiaolin Liao1,
Linfeng Yang2, Qiaomu Hu1, Jilin Zhang2, Liang Meng1, Lijun Jin2, Yongsheng Tian1,
Jinmin Lian2, Jingfeng Yang1, Guidong Miao1, Shanshan Liu1, Zhuo Liang1, Fang Yan1,
Yangzhen Li1, Bin Sun1, Hong Zhang1, Jing Zhang1, Ying Zhu1, Min Du1, Yongwei Zhao1,
Manfred Schartl7,11, Qisheng Tang1,11 & Jun Wang2,8,9,11
1Yellow Sea Fisheries Research Institute, CAFS, Key Laboratory for Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao 266071, China.
2BGI-Shenzhen, Shenzhen 518000, China.
3Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, CNRS, INRA, Ecole Normale Supérieure de Lyon, Lyon, France.
4Department of Biological Sciences, National University of Singapore, Science Drive 4, Singapore 117543, Singapore.
5Dalian Ocean University, Heishijiao Street 52, Dalian 116023, China.
6State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, 650223, China.
7Physiologische Chemie I, University of Würzburg, Biozentrum, Am Hubland, and Comprehensive Cancer Center, University Clinic Würzburg, Josef Schneider Straße 6, D-97074 Würzburg, Germany.
8Department of Biology, University of Copenhagen, Universitetsparken 15, København, 2100, Denmark.
9Princess Al Jawhara Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia.
10These authors contributed equally to this work.
11These authors jointly directed this work.
Correspondence should be addressed to J.W. ([email protected]), S.C.
([email protected]), M.S. ([email protected]) or Q.T.
Nature Genetics: doi:10.1038/ng.2890
Supplementary Information
Supplementary Figures 1-38
Supplementary Figure 1. Distribution of sequencing depth of the assembled female genome by reads from the female and male samples.
Supplementary Figure 2. Distribution of 17-mers in the usable sequencing reads from the female sample.
Supplementary Figure 3. Distribution of 17-mers in the usable sequencing reads from the male sample.
Supplementary Figure 4. Phylogenetic tree of Cynoglossus semilaevis retroelements based on reverse transcriptase alignment.
Supplementary Figure 5. Phylogenetic tree of Cynoglossus semilaevis long terminal repeat (LTR) retroelements based on reverse transcriptase alignment.
Supplementary Figure 6. Phylogenetic tree of Cynoglossus semilaevis long interspersed nuclear elements (LINE) retroelements based on reverse transcriptase alignment.
Supplementary Figure 7. Distribution of divergence rate of each type of TEs in the tongue sole genome.
Supplementary Figure 8. Venn diagram showing supporting evidence for the reference gene set.
Supplementary Figure 9. Comparisons of gene parameters among tongue sole, medaka, Takifugu, Tetraodon, stickleback and zebrafish genomes.
Supplementary Figure 10. Statistics of orthologous families for zebrafish, tongue sole, Tetraodon, Takifugu, stickleback, and medaka (representing Osteichthyes), human (representing mammals), and chicken (representing birds).
Supplementary Figure 11. Venn diagram showing shared orthologous groups for Pleuronectiformes (tongue sole), Tetraodontidae (Takifugu and Tetraodon), Smegmamorpha (medaka and stickleback), and Cypriniformes (zebrafish).
Supplementary Figure 12. Distribution of protein identities of orthologs between human and fish species and chicken in all single-copy families.
Supplementary Figure 13. Dynamic evolution of gene families.
Supplementary Figure 14. qRT-PCR analysis of positively selected genes and differentially expressed genes between pre- and post-metamorphosis fish.
Supplementary Figure 15. Phylogenetic tree using all single-copy orthologs.
Supplementary Figure 16. Estimation of divergence time.
Supplementary Figure 17. Reconstructed vertebrate ancestral chromosomes.
Supplementary Figure 18. Model of teleost genome evolution.
Supplementary Figure 19. Rectangular dot plots show chromosomal locations of Z-orthologous genes.
Supplementary Figure 20. Structure of sex chromosomes.
Supplementary Figure 21. Distribution of Ks for Z-W gene pairs in the non-PAR region.
Supplementary Figure 22. Dosage compensation of the Z chromosome in tongue sole.
Supplementary Figure 23. Up-regulation of Z gene expression in females.
Supplementary Figure 24. Methylation status across the differentially methylated region (DMR) of dmrt1, sf-1, patched1, follistatin, and neurl3 genes.
Supplementary Figure 25. Gonad histological structure at different developmental stages in Cynoglossus semilaevis.
Supplementary Figure 26. Expression of Z chromosome sex-related genes.
Supplementary Figure 27. RT-PCR analysis of sf-1_chr.Z, dmrt1, patched1_chr.Z, and follistatin expression from various tissues from female and male Cynoglossus semilaevis.
Supplementary Figure 28. Expression pattern of sex-related genes (dmrt1, sf-1_chr.Z, patched1_chr.Z, and follistatin) during the sex reversal period treatment with high temperature.
Supplementary Figure 29. Comparison of Z and W-linked sex-related genes.
Supplementary Figure 30. qPCR analysis of the dmrt1 gene in the whole fish.
Supplementary Figure 31. Location of dmrt1 gene in the tongue sole genome: metaphases from male and female showing the hybridization signal of BAC probe containing dmrt1.
Supplementary Figure 32. Gonad in situ hybridization using a sense RNA probe to dmrt1 and no RNA probe as a control.
Supplementary Figure 33. Expression of the Z-linked E3 ubiquitin ligase gene, neurl3.
Supplementary Figure 34. Gonad in situ hybridization using a neurl3 sense RNA probe.
Supplementary Figure 35. Apparent absence of W sperm from pseudo-males using W-linked SSR marker.
Supplementary Figure 36. Gene expression profiling in sexual reversals.
Supplementary Figure 37. RT-PCR analysis of aqp1, gas8, ropn1l, nme5, tekt1, plcz1, tbpl1, spag6, gal3st1, dnajb13, cldn11, gpr64 expression from three individuals of female and pseudomale C. semilaevis.
Supplementary Figure 38. Comparison of the assembled genome with four BAC sequences.
Supplementary Tables 1-13, 15-43 and 45-55
Supplementary Table 1. Statistics for each Illumina library.
Supplementary Table 2. Summary of usable data of the tongue sole genome.
Supplementary Table 3. Summary result of the tongue sole genome assembly by SOAPdenovo.
Supplementary Table 4. Number of markers and total scaffold size for each chromosome.
Supplementary Table 5. Validation of the Z-linked scaffolds.
Supplementary Table 6. Transposable element families that are present in the genome of the tongue sole.
Supplementary Table 7. Percentage of the tongue sole genome masked as each class of transposable elements.
Supplementary Table 8. Copy number of each TE family.
Supplementary Table 9. Summary of the copy number of each TE class.
Supplementary Table 10. Transcriptome sequencing data statistics.
Supplementary Table 11. Statistics of homology-based gene sets using proteins from different species as parent proteins.
Supplementary Table 12. General statistics of each gene set.
Supplementary Table 13. General statistics of non-coding RNA genes.
Supplementary Table 15. Enrichment of GO terms in differentially expressed genes between pre- and post-metamorphosis.
Supplementary Table 16. Enrichment of GO terms in down-regulated genes in post-metamorphosis.
Supplementary Table 17. Enrichment of GO terms in up-regulated genes in post-metamorphosis.
Supplementary Table 18. Metabolism pathways (KEGG) enrichment by DGEs between pre- and post-metamorphosis.
Supplementary Table 19. Positively selected genes involved in the benthic adaptation.
Supplementary Table 20. Differential expression of visual genes in tongue sole.
Supplementary Table 21. Distribution of visual genes among different teleost species.
Supplementary Table 22. Oxford grid showing the numbers of paralogues between all pairs of tongue sole chromosomes.
Supplementary Table 23. Oxford grid showing the numbers of orthologues between tongue sole and Tetraodon chromosomes.
Supplementary Table 24. Oxford grid showing the numbers of orthologues between tongue sole and medaka chromosomes.
Supplementary Table 25. Oxford grid showing the numbers of orthologues between tongue sole and zebrafish chromosomes.
Supplementary Table 26. List of DCSs.
Supplementary Table 27. Comparison of structural features of tongue sole Z and W with autosomes.
Supplementary Table 28. PAR genes and protein function.
Supplementary Table 29. Classification of Z and W genes in non-PAR region.
Supplementary Table 30. Distribution of pseudogenes on different chromosomes.
Supplementary Table 31. Estimation of divergence rate and divergence time between Z and W chromosomes.
Supplementary Table 32. Percentage of genes expressed in testis.
Supplementary Table 33. GO enrichment of chicken Z genes (P value < 0.01, Fisher exact test).
Supplementary Table 34. GO enrichment of tongue sole Z genes (P value < 0.01, Fisher exact test).
Supplementary Table 35. GO enrichment of tongue sole Z-specific (Z-S) genes (P value < 0.01, Fisher exact test).
Supplementary Table 36. GO enrichment of orthologous Z genes between chicken and tongue sole (P value < 0.01, Fisher exact test).
Supplementary Table 37. GO enrichment of tongue sole W genes (P value < 0.01, Fisher exact test).
Supplementary Table 38. Fisher’s exact test for compensated (comp)/uncompensated (uncomp) Z genes in tongue sole and zebra finch.
Supplementary Table 39. Fisher’s exact test for compensated (comp)/uncompensated (uncomp) Z genes in tongue sole and chicken.
Supplementary Table 40. Sex reversal rate of different families including the pseudomale families, normal families and temperature-induced families.
Supplementary Table 41. Sex ratio of offspring in the pseudomale families and normal families.
Supplementary Table 42. Paternal inheritance of Z chromosome in three WZ pseudomale families determined by microsatellite analysis.
Supplementary Table 43. Characterization and expression of sex-related genes in tongue sole.
Supplementary Table 45. GO enrichment by DEGs up-regulated in normal female ovaries.
Supplementary Table 46. GO enrichment by DEGs up-regulated in pseudomale testes.
Supplementary Table 47. Sex-biased GO.
Supplementary Table 48. Metabolism Pathway (KEGG) enriched by DEGs between female ovaries and pseudomale testes.
Supplementary Table 49. Data production and alignment statistics of smRNA-Seq.
Supplementary Table 50. Differentially expressed miRNAs between female and reversed male.
Supplementary Table 51. Comparison of assembled scaffolds and independently finished 4 BACs of tongue sole genome.
Supplementary Table 52. Comparison of assembled scaffolds and ESTs.
Supplementary Table 53. Enrichment of GO terms in expanded gene families of tongue sole genome.
Supplementary Table 54. Enrichment of GO terms in contracted gene families of tongue sole genome.
Supplementary Table 55. Oligonucleotide primers used in the study.
Supplementary Note
Supplementary URLs
References
Supplementary Figures 1-38
Supplementary Figure 1. Distribution of sequencing depth of the assembled female
genome by reads from the female and male samples. The peak depth is 117 and 95 for
the female and male reads, respectively.
Supplementary Figure 2. Distribution of 17-mers in the usable sequencing reads
from the female sample. We used 185 × 10⁶ sequence reads, corresponding to 18.2 Gb
of corrected data from the short insert-size libraries (≤800 bp), and obtained 15.3 × 10⁹
17-mers. The peak depth is 28. The genome size (G) is related to the 17-mer
number (N) and the peak of 17-mer frequency (D) by the empirical formula
G = N / D. The estimated genome size is 545 Mb. The sub-peak at
about 14-fold is likely due to the sex chromosomes, whose sequencing depth is half
that of the autosomes.
Supplementary Figure 3. Distribution of 17-mers in the usable sequencing reads
from the male sample. We used 205 × 10⁶ reads, corresponding to 17.6 Gb of corrected
data from the short insert-size libraries (≤800 bp), and obtained 14.3 × 10⁹ 17-mers. The
peak depth is 29. The genome size (G) is related to the 17-mer number (N) and the
peak of 17-mer frequency (D) by the empirical formula
G = N / D. The estimated genome size is 495 Mb.
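The relationship G = N / D used in Supplementary Figures 2 and 3 can be checked with a few lines of Python. This is only an illustrative sketch: the function name `estimate_genome_size` is hypothetical, and the inputs are the rounded values quoted in the two captions, so the results may differ slightly from the reported estimates.

```python
def estimate_genome_size(total_kmers: float, peak_depth: float) -> float:
    """Return the estimated genome size in bases using G = N / D."""
    if peak_depth <= 0:
        raise ValueError("peak depth must be positive")
    return total_kmers / peak_depth


# Rounded 17-mer counts (N) and peak depths (D) quoted in the captions.
samples = {
    "female": {"kmers": 15.3e9, "peak": 28},
    "male": {"kmers": 14.3e9, "peak": 29},
}

# Genome size estimates in megabases for each sample.
estimates_mb = {
    name: estimate_genome_size(s["kmers"], s["peak"]) / 1e6
    for name, s in samples.items()
}

for name, size_mb in estimates_mb.items():
    print(f"{name}: ~{size_mb:.0f} Mb")
```

With these rounded inputs the sketch gives roughly 546 Mb (female) and 493 Mb (male), close to the reported 545 Mb and 495 Mb; the small differences presumably reflect rounding of N and D in the captions.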
Supplementary Figure 4. Phylogenetic tree of Cynoglossus semilaevis retroelements
based on reverse transcriptase alignment. Protein sequences were aligned using
ClustalW (244 amino acids) and the phylogenetic tree was constructed with the PhyML
package using maximum likelihood methods with default bootstrap calculation (shown at
the beginning of branches). Tongue sole elements are written in red.
Supplementary Figure 5. Phylogenetic tree of Cynoglossus semilaevis long terminal
repeat (LTR) retroelements based on reverse transcriptase alignment. Protein
sequences were aligned using ClustalW (180 amino acids) and the phylogenetic tree was
constructed with the PhyML package using maximum likelihood methods with default
bootstrap calculation (shown at the beginning of branches). Tongue sole elements are
written in red.
Supplementary Figure 6. Phylogenetic tree of Cynoglossus semilaevis long
interspersed nuclear elements (LINE) retroelements based on reverse transcriptase
alignment. Protein sequences were aligned using ClustalW (189 amino acids) and the
phylogenetic tree was constructed with the PhyML package using maximum likelihood
methods with default bootstrap calculation (shown at the beginning of branches). Tongue
sole elements are written in red.
Supplementary Figure 7. Distribution of divergence rate of each type of TEs in the
tongue sole genome. The divergence rate was calculated between the identified TE
elements in the genome and the consensus sequence in the de novo library we used.
Supplementary Figure 8. Venn diagram showing supporting evidence for the
reference gene set. A total of 99% (21,309 out of 21,516) of the reference genes were
supported by homology-based or RNA-seq genes. Only 207 genes were predicted by the
pure de novo method.
Supplementary Figure 9. Comparisons of gene parameters among tongue sole,
medaka, Takifugu, Tetraodon, stickleback and zebrafish genomes.
Supplementary Figure 10. Statistics of orthologous families for zebrafish, tongue
sole, Tetraodon, Takifugu, stickleback, and medaka (representing Osteichthyes),
human (representing mammals), and chicken (representing birds). Single-copy
orthologs represent single-copy genes in Osteichthyes, human, and chicken.
Multiple-copy orthologs represent genes with multiple copies in at least one genome out
of Osteichthyes, human, and chicken. Fish multiple-copy orthologs represent genes with
multiple copies in at least one Osteichthyes genome, but being single or absent in the
human and chicken genomes. Complex orthologs represent genes in other families.
Homologs represent genes that could not be clustered into any families, but could be
aligned to other genes with a cutoff E-value of <1e-20.
Supplementary Figure 11. Venn diagram showing shared orthologous groups for
Pleuronectiformes (tongue sole), Tetraodontidae (Takifugu and Tetraodon),
Smegmamorpha (medaka and stickleback), and Cypriniformes (zebrafish).
Supplementary Figure 12. Distribution of protein identities of orthologs between
human and fish species and chicken in all single-copy families.
Supplementary Figure 13. Dynamic evolution of gene families. The number of gene
families that expanded or contracted in each lineage after speciation is shown on the
corresponding branch, with “+” referring to expansion and “-” referring to contraction.
Supplementary Figure 14. qRT-PCR analysis of positively selected genes and
differentially expressed genes between pre- and post-metamorphosis fish. Vertical
bars show mean ± standard error (SE) (n = 3). * P < 0.05, ** P < 0.01.
Supplementary Figure 15. Phylogenetic tree using all single-copy orthologs.
The phylogenetic tree was constructed using 4-fold degenerate sites from 2,426 single-copy
orthologs from tongue sole, zebrafish, medaka, stickleback, Takifugu, Tetraodon, human,
and chicken. Branch lengths represent neutral divergence. Numbers on the branches
represent dN/dS. The posterior probabilities (credibility of the topology) for all inner
branches are 100%.
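To illustrate what 4-fold degenerate sites are, the following Python sketch pulls such sites out of a codon alignment. It is not the pipeline used for this figure. It assumes the standard genetic code, in-frame codon-aligned sequences of equal length, and a conservative rule (an assumption made here) that keeps a site only when all species share the same first two codon bases. The function name `fourfold_sites` and the toy sequences are hypothetical.

```python
# First two codon bases for which any third base encodes the same amino acid
# under the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_sites(aligned_cds: dict[str, str]) -> dict[str, str]:
    """Return, per species, the third-codon bases at 4-fold degenerate sites.

    A codon column is kept only if every species has the same 4-fold
    degenerate prefix and an unambiguous third base (A, C, G or T).
    """
    lengths = {len(seq) for seq in aligned_cds.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n = lengths.pop()

    kept = {species: [] for species in aligned_cds}
    for i in range(0, n - n % 3, 3):
        codons = {sp: seq[i:i + 3].upper() for sp, seq in aligned_cds.items()}
        prefixes = {codon[:2] for codon in codons.values()}
        if len(prefixes) != 1 or prefixes.pop() not in FOURFOLD_PREFIXES:
            continue
        if any(codon[2] not in "ACGT" for codon in codons.values()):
            continue
        for sp, codon in codons.items():
            kept[sp].append(codon[2])
    return {sp: "".join(bases) for sp, bases in kept.items()}


# Toy alignment (invented): codons GCT|CTA|ATG|GGA versus GCC|CTG|ATG|GGC.
toy = {"species_a": "GCTCTAATGGGA", "species_b": "GCCCTGATGGGC"}
sites = fourfold_sites(toy)
print(sites)
```

In the toy alignment the ATG column is skipped because methionine has no degenerate third position, while the Ala, Leu and Gly columns each contribute one site per species.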
Supplementary Figure 16. Estimation of divergence time. The numbers on the nodes
are the divergence times from present (million years ago, Mya). Divergence times from
human-chicken (267–325 Mya), human-zebrafish (438–455 Mya), and zebrafish-medaka
(258–307 Mya) from the TimeTree database were used as the calibration times.
Supplementary Figure 17. Reconstructed vertebrate ancestral chromosomes. Ten
proto-chromosomes in the vertebrate ancestor shown at the top are assigned distinct
colors, and their duplication-derived chromosomes in the gnathostome ancestor are
distinguished by respective vertical bars. In the genomes of the osteichthyan, teleost, and
amniote ancestors, and chicken and tongue sole genomes, genomic regions are assigned
colors and vertical bars represent the correspondence of individual regions to the
proto-chromosomes in the gnathostome ancestor, from which respective regions
originated. White blocks represent the unknown original chromosomes in the chicken
genome. Unassigned blocks are shown in the rightmost chromosome (Un) in the
osteichthyan and amniote ancestors.
Supplementary Figure 18. Model of teleost genome evolution. The 13 ancestral
chromosomes are represented by different colored bars at the top of the figure. Regions
originating from the same ancestral chromosome were assigned the corresponding color.
The black arrows indicate fusion, fission, and duplication events, while gray arrows
represent translocations. Anc, ancestral chromosome; Cse, tongue sole chromosome; Ola,
medaka chromosome; Tni, Tetraodon chromosome; Hsa, human chromosome.
Supplementary Figure 19. Rectangular dot plots show chromosomal locations of
Z-orthologous genes. a, Tongue sole Z chromosome versus selected medaka
chromosomes. The tongue sole Z chromosome is not orthologous to medaka
chromosome 1, which is considered a sex chromosome, but is orthologous to large
portions of medaka autosome 9 (blue). At right: one-colour projection of dot plots onto a
unified schematic of the tongue sole Z chromosome, showing that orthology to medaka
chromosome 9 accounts for most of the Z chromosome. b, Tongue sole Z chromosome
versus selected human chromosomes. The tongue sole Z chromosome is orthologous to
large portions of human chromosomes 9 (blue), 5 (yellow), 12 (green) and 22 (purple), and
to several segments of the human X chromosome (red). At right: five-colour projection of
dot plots onto a unified schematic of the tongue sole Z chromosome, showing orthology to
human chromosomes 5, 9, 12, 22 and X. c, Chicken Z chromosome versus selected human
chromosomes. The chicken Z chromosome is orthologous to large portions of human
chromosomes 5 (yellow), 9 (blue) and 18 (purple). At right: three-colour projection of dot
plots onto a unified schematic of the chicken Z chromosome, showing orthology to human
chromosomes 5, 9 and 18.
Supplementary Figure 20. Structure of sex chromosomes. Purple curves represent the
TE content of the Z (top) and W (bottom) chromosomes in 5 kb windows. The two bars,
which are colored according to the classification of genes, represent the Z and W
chromosomes. Regions between two genes with the same classification have the same
color. Z-S, Z specific genes; W-S, W specific genes; Z-W, genes on both Z and W;
W-Z_random, W genes homologous to unplaced Z-linked genes; Z-A, Z genes with
paralogs on autosomes or unplaced scaffolds; W-A, W genes with paralogs on autosomes
or unplaced scaffolds; PAR, genes in pseudoautosomal regions.
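A per-window TE content curve of the kind described in this caption could be computed along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: TE annotations are taken to be 0-based, half-open `(start, end)` intervals, overlapping annotations are merged so that no base is counted twice, and the function name `te_fraction_per_window` and the example intervals are hypothetical.

```python
def te_fraction_per_window(te_intervals, chrom_length, window=5000):
    """Return the fraction of bases covered by TEs in consecutive windows."""
    if chrom_length <= 0 or window <= 0:
        raise ValueError("chrom_length and window must be positive")

    # Clip intervals to the chromosome and merge overlaps.
    merged = []
    for start, end in sorted(te_intervals):
        start, end = max(0, start), min(end, chrom_length)
        if start >= end:
            continue
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    n_windows = (chrom_length + window - 1) // window
    covered = [0] * n_windows
    for start, end in merged:
        for w in range(start // window, (end - 1) // window + 1):
            w_start = w * window
            w_end = min(w_start + window, chrom_length)
            covered[w] += min(end, w_end) - max(start, w_start)

    # The last window may be shorter than the others.
    return [
        covered[w] / (min((w + 1) * window, chrom_length) - w * window)
        for w in range(n_windows)
    ]


# Invented example: a 12 kb sequence with four TE annotations, two overlapping.
fractions = te_fraction_per_window(
    [(1000, 2000), (1500, 3000), (4000, 6000), (11000, 12000)],
    chrom_length=12000,
)
print(fractions)
```

Merging before counting matters when annotations from different TE families overlap; without it, a window's fraction could exceed 1.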
Supplementary Figure 21. Distribution of Ks for Z-W gene pairs in the non-PAR
region. We used sliding windows with different sizes (1–50 genes) and the same step
(one gene) to calculate the weighted mean of Ks for all 297 Z-W gene pairs in the
non-PAR region. (These gene pairs include pseudogenes. If a Z gene is homologous to
multiple W genes, we chose the best matching W gene to calculate Ks.) Results are
plotted according to the Z-linked gene order (physical position). Most of
the calculated Ks values are distributed around 0.15.
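The sliding-window calculation described in this caption can be sketched in Python as follows. The caption does not say how the mean is weighted, so the sketch assumes, purely for illustration, that each gene pair is weighted by its number of synonymous sites. The function name `sliding_weighted_mean` and all numeric values are hypothetical.

```python
def sliding_weighted_mean(ks_values, weights, window, step=1):
    """Weighted mean of Ks in windows of `window` genes, moved by `step` genes.

    `ks_values` and `weights` must be ordered by Z-linked gene position.
    """
    if len(ks_values) != len(weights):
        raise ValueError("ks_values and weights must have the same length")
    if not 1 <= window <= len(ks_values):
        raise ValueError("window must be between 1 and the number of genes")

    means = []
    for start in range(0, len(ks_values) - window + 1, step):
        ks = ks_values[start:start + window]
        w = weights[start:start + window]
        total = sum(w)
        if total <= 0:
            raise ValueError("weights in a window must sum to a positive value")
        means.append(sum(k * x for k, x in zip(ks, w)) / total)
    return means


# Invented Ks values and synonymous-site counts for four gene pairs.
toy_ks = [0.10, 0.20, 0.15, 0.25]
toy_sites = [100, 300, 200, 200]
window_means = sliding_weighted_mean(toy_ks, toy_sites, window=2)
print(window_means)
```

Repeating the call for window sizes from 1 to 50 with `step=1` would correspond to the range of window sizes mentioned in the caption; a window of one gene simply returns the per-gene Ks values.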
Supplementary Figure 22. Dosage compensation of the Z chromosome in tongue sole.
a, Density distribution of log2 (M:F) of gene expression in the entire tongue sole body
(without gonad). Black line denotes genes on autosomes, which follow a Gaussian
distribution with a mean value of zero (M:F=1), indicating that genes on autosomes are
expressed at a similar level in males and females. Orange line denotes genes on Z, with a
peak at 0.404 (M:F=1.323), indicating incomplete dosage compensation in the female
whole body compared with the male whole body. b, log2 (M:F) of gene expression in whole
body for all chromosomes. The mean value of log2 (M:F) for every autosome (green) was
always around 0, whereas the mean value of log2 (M:F) for the Z chromosome (blue) was
significantly larger than 0.
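The conversion between log2 (M:F) and the male-to-female expression ratio quoted in this caption is a simple power of two, as the short Python sketch below shows. The helper names are hypothetical. As reference points, a ratio of 1 corresponds to log2 = 0 (equal expression), and a ratio of 2 corresponds to log2 = 1, which is what would be expected for Z-linked genes without any compensation if expression simply scaled with Z copy number (two copies in ZZ males, one in ZW females).

```python
import math


def ratio_from_log2(log2_ratio: float) -> float:
    """Convert a log2 (M:F) value into a plain M:F expression ratio."""
    return 2.0 ** log2_ratio


def log2_from_ratio(ratio: float) -> float:
    """Convert an M:F expression ratio into log2 (M:F)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.log2(ratio)


# Peak of the Z-linked distribution quoted in the caption.
z_peak_ratio = ratio_from_log2(0.404)
print(f"M:F at the Z peak: {z_peak_ratio:.3f}")

# Reference points: equal expression and a twofold male excess.
print(log2_from_ratio(1.0), log2_from_ratio(2.0))
```

The Z peak therefore lies between the two reference points, consistent with the caption's description of the compensation as incomplete.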
Supplementary Figure 23. Up-regulation of Z gene expression in females. a, Distribution
of log2 (M:F) of gene expression across Z. log2 (M:F) was at almost the same level
across Z, indicating that the compensated genes are distributed randomly along Z and are
not enriched in a specific region. b, Z:A distribution of female whole body across the Z
chromosome. c, Z:A distribution of male whole body across the Z chromosome. Blue bar
denotes the Z-specific region with a higher proportion of methylated cytosine, and green
bars denote other regions.
Supplementary Figure 24. Methylation status across the differentially methylated
region (DMR) of dmrt1, sf-1, patched1, follistatin, and neurl3 genes in male parent
(ZZ testis P), first generation pseudo-male (ZW testis F1), first generation female
(ZW ovary F1), second generation pseudo-male (ZW testis F2) and second
generation female (ZW ovary F2). Schematic diagram at the top shows the gene
structure in tongue sole. Exons are depicted as blue boxes, and 3′- and 5′-UTRs are
indicated by white boxes. The black arrowhead shows the direction of the gene from the
transcriptional start site. Methylation levels of mCpGs identified on both DNA strands in
female and male are indicated by the vertical green lines. The gray shadow indicates the
DMR. Correspondingly, open and filled circles represent unmethylated and methylated
cytosines, respectively, for two samples identified by Sanger sequencing.
Supplementary Figure 25. Gonad histological structure at different developmental
stages in Cynoglossus semilaevis. A: 25 days; B: 48 days; C: 70 days; D: 160 days; E: 1
year; F: 2 years; G, gonium; PO, primary oocyte; OC, ovarian cavity; OG, oogonium; OL,
ovarian lamellae; NU, nucleolus; YK, yolk; FO, follicle; SL, seminiferous lobula; SG,
spermatogonia; SC, spermatocyte; SP, spermatid; ST, spermatozoa.
Supplementary Figure 26. Expression of Z chromosome sex-related genes.
Reverse-transcription polymerase chain reaction (RT-PCR) analysis of sex-related genes
during developmental stages in female and male tongue sole. Vertical bars represent mean
± standard error (SE) (n = 3).
Supplementary Figure 27. RT-PCR analysis of sf-1_chr.Z, dmrt1, patched1_chr.Z,
and follistatin expression from various tissues from female and male Cynoglossus
semilaevis.
Supplementary Figure 28. Expression pattern of sex-related genes (dmrt1, sf-1_chr.Z,
patched1_chr.Z, and follistatin) during the sex reversal period treatment with high
temperature. Dmrt1 is upregulated in ZW temperature-induced female to male sex
reversal at the sex determination stage, while the other three genes have no obvious
function in the sex reversal. 25d, pretreated by temperature on day 25; 60d-T, treated by
temperature on day 60; 60d-C, untreated control; NC, water control; M, DL2000 marker.
Supplementary Figure 29. Comparison of Z and W-linked sex-related genes. a,
Comparison of dmrt1 genes. Homologous regions between Z and W sequences are linked
by green lines. Dmrt1 on Z is intact and has five exons. The DM domain is located on the
first and second exons. The W copy on a short scaffold (scaffold1544, ≈18 kb) has no
DM domain and just two exon domains with a Ka/Ks=0.6726. b, Validation of W-linked
dmrt1 by genomic PCR. The discontinuous segments covering almost the entire
scaffold1544 region were sequenced by ABI 3730 sequencer using 20 pairs of primers
that were designed from scaffold1544, revealing the sequence of the incomplete remnant
of dmrt1 on W. c, Comparison of sf-1 genes. The sf-1 copy on W-linked
scaffold260 is incomplete and shows no expression in different tissues, indicating that it is a pseudogene. d,
Comparison of patched1 genes. The structure of the two patched1 genes on the Z and W
chromosomes is very similar, other than the loss of one intron by the W ortholog.
Supplementary Figure 30. qPCR analysis of the dmrt1 gene in the whole fish. Male
(ZZ) and female (ZW) fish at the sex determination stage were cultured under normal
temperatures (28°C) and the embryo from the super-female (WW) was produced by
gynogenesis. The Y-coordinates are relative normalized expression levels. All data are
mean ± S.D. (n=3). β-Actin was used for normalization.
Supplementary Figure 31. Location of dmrt1 gene in the tongue sole genome:
metaphases from male and female showing the hybridization signal of BAC probe
containing dmrt1. A, metaphase of the male; B, karyotype of the male; C, metaphase of
the female; D, karyotype of the female. Scale bar: 5μm. Note the order of the
chromosome numbers is not consistent with the genome.
Supplementary Figure 32. Gonad in situ hybridization using a sense RNA probe to
dmrt1 and no RNA probe as a control. a, 56 days. b, 83 days. c, 150 days. OC, ovarian
cavity.
Supplementary Figure 33. Expression of the Z-linked E3 ubiquitin ligase gene,
neurl3. a, RT-PCR analysis of neurl3 expression in various tissues from female and male
tongue sole. G: gonad; Mu: muscle; L: liver; K: kidney; B: brain; I: intestine; P: pituitary;
S: spleen. b, RT-PCR analysis of neurl3 during developmental stages in female and male
tongue sole. Vertical bars represent mean ± standard error (SE) (n = 3). c, RT-PCR
analysis of neurl3 in the whole fish. Male (ZZ) and female (ZW) fish at the sex
determination stage were cultured under normal temperatures (28°C) and the embryo
from the super-female (WW) was produced by gynogenesis. The Y-coordinates are
relative normalized expression level. All data are mean ± S.D. (n=3). β-Actin was used for
normalization.
Supplementary Figure 34. Gonad in situ hybridization using a neurl3 sense RNA
probe. a, Testis of normal male. b, Testis of pseudo-male. c, Ovary of normal female. OC,
ovarian cavity; SL, seminiferous lobula; SZ, spermatozoon.
Supplementary Figure 35. Apparent absence of W sperm from pseudo-males using a
W-linked SSR marker. Fertile sperm DNA from pseudo-males: 1, 3, 5, 7, 9, 11, 13, 15,
17, 19, 21, 23; Fin DNA from corresponding pseudo-males: 2, 4, 6, 8, 10, 12, 14, 16, 18,
20, 22, 24; Fertile sperm DNA from normal males: 25, 26, 27; Fin DNA from
corresponding normal males: 28, 29, 30; Fin DNA from normal females: 31, 32.
Supplementary Figure 36. Gene expression profiling in sexual reversals. a, GO
categories with significantly different expression (P < 0.05) in the female (red) and in the
pseudomale (blue) gonad are highlighted. Data points represent pairs of female and
pseudomale log2 mean GO RPKM. A complete list of categories is provided in
Supplementary Tables 45 and 46. b, miRNAs with significantly different expression (P <
0.05) in the ovary (red) and in the testis (blue) are highlighted. Data points represent
pairs of ovary and testis log2 RPM. A complete list of miRNAs is provided in
Supplementary Table 50.
Supplementary Figure 37. RT-PCR analysis of aqp1, gas8, ropn1l, nme5, tekt1, plcz1,
tbpl1, spag6, gal3st1, dnajb13, cldn11, gpr64 expression from three individuals of
female and pseudomale C. semilaevis.
Supplementary Figure 38. Comparison of the assembled genome with four BAC
sequences.
Supplementary Tables 1-13, 15-43 and 45-55
Supplementary Table 1. Statistics for each Illumina library. All reads were
generated by Illumina paired-end sequencing. After filtering low quality data, we
obtained usable data.
Sample  Library ID  Insert size (bp)  Lanes  GC (%)  Avg. read length (bp)*  Raw reads (M)  Raw bases (G)  Avg. read length (bp)**  Usable reads (M)  Usable bases (G)
Female (ZW)
  BHSciuRBFDBAAPEI-4  164  1  42.4  100  76.5  7.65  100  72.2  7.22
  BHSciuRAADBAAPE  165  1  40.7  100  56.2  5.62  100  52.6  5.26
  BHSciuRAODBAAPE  175  1  39.9  100  61.6  6.16  100  57.5  5.75
  BHSciuRAODBABPE  172  1  40.1  100  57.6  5.76  100  47.6  4.76
  BHSciuRADDIAAPE  471  2  39.8  100  119.9  11.99  100  103.2  10.32
  BHSciuRBFDIAAPEI-5  471  1  42.4  100  68.6  6.86  100  57.1  5.71
  BHSciuRACDMAAPE  765  2  39.2  100  87.7  8.77  100  68.8  6.88
  BHSciuRAODWAAPE  2,208  2  42.2  44  125.7  5.53  44  111.9  4.93
  BHSciuRAODWBBPE  2,424  1  43.3  44  55.9  2.46  44  49.7  2.19
  BHSciuRAODLAAPE  4,936  2  42.7  44  108.1  4.76  44  97.9  4.31
  BHSciuRAODTAAPE  8,546  1  43.0  44  47.8  2.10  44  35.6  1.57
  BHSciuRAADTAAPEI-1  9,104  1  44.1  49  159.1  7.80  49  27.5  1.35
  BHSciuRAADUAAPE  19,912  1  45.5  44  53.0  2.33  44  19.3  0.85
  CYNcumDAVDVAAPE  34,419  1  41.4  49  138.8  6.80  49  45.9  2.25
  BHSciuRAADVAAPE  34,467  1  39.3  49  138.3  6.78  49  10.5  0.51
  All libraries  /  19  41.6  67  1354.7  91.35  74  857.5  63.86
Male (ZZ)
  BHScqzDAPDBBAPE  164  2  41.7  100  116.58  11.66  95  101.6  9.62
  BHScqzDAPDBBCPE  155  1  40.8  100  75.53  7.55  81  63.5  5.14
  BHScqzDAPDIAAPE  477  2  39.8  100  135.44  13.54  83  104.3  8.66
  BHScqzDAPDIBAPE  501  1  39.6  100  68.44  6.84  83  51.2  4.22
  BHScqzDAPDMAAPE  752  1  39.4  100  50.92  5.09  93  40.4  3.74
  BHScqzDAPDWAAPE  2,041  2  41.8  44  118.91  5.23  39  104.9  4.09
  BHScqzDAPDWBAPE  2,249  1  41.8  44  59.57  2.62  44  53.0  2.33
  BHScqzDAQDLAAPE  5,029  3  40.4  44  175.92  7.74  41  128.7  5.23
  BHScqzDAQDTAAPE  10,045  1  40.9  44  62.63  2.76  44  43.6  1.92
  BHScqzDBCDUAAPE  20,648  1  39.7  44  56.08  2.47  44  14.0  0.62
  BHScqzDBCDUBBPE  22,336  1  42.2  44  53.57  2.36  44  24.9  1.10
  All libraries  /  16  40.8  70  973.58  67.86  64  730.0  46.67
* Average length of raw reads
** Average length of usable reads
Supplementary Table 2. Summary of usable data of the tongue sole genome. All
reads were generated by Illumina paired-end sequencing. For our calculation of sequence
coverage and physical coverage, we assumed a genome size of 545 Mb and 495 Mb for
female and male genome, respectively.
Sample  Paired-end libraries (bp)  Paired-end insert size (bp)  Libraries  Lanes  Average read length (bp)  Sequence coverage (X)  Physical coverage (X)
Female (ZW)
  200  164~175  4  4  100.0  42  36
  500  ~471  2  3  100.0  29  69
  800  ~765  1  2  100.0  13  48
  2000  2,204~2,424  2  3  44.0  13  337
  5000  4,934~4,938  1  2  44.0  8  443
  10000  8,546~9,104  2  2  46.0  5  509
  20000  ~19,912  1  1  44.0  2  352
  40000  34,419~34,467  2  2  49.0  5  1,781
  Total  164~34,467  15  19  74.5  117  3,575
Male (ZZ)
  200  155~169  2  3  89.0  30  27
  500  475~501  2  3  83.0  26  76
  800  ~752  1  1  92.0  8  31
  2000  2,033~2,249  2  3  41.0  13  337
  5000  5,024~5,037  1  3  41.0  11  654
  10000  ~10,045  1  1  44.0  4  442
  20000  20,648~22,336  2  2  44.0  3  854
  Total  155~22,336  11  16  63.9  95  2,421
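The two coverage measures can be illustrated with a short sketch. It assumes that sequence coverage is usable bases divided by the assumed genome size, and that physical coverage is the number of read pairs times the insert size divided by the genome size, with the "usable reads" of Supplementary Table 1 counting individual reads (two per pair). These definitions are assumptions; they are not stated in the caption, although they are consistent with the female 200 bp row. The function names are hypothetical.

```python
GENOME_SIZE_FEMALE_BP = 545e6  # female genome size assumed in the caption

def sequence_coverage(usable_bases_gb, genome_size_bp):
    """Sequenced bases divided by genome size."""
    return usable_bases_gb * 1e9 / genome_size_bp

def physical_coverage(libraries, genome_size_bp):
    """libraries: iterable of (usable_reads_millions, insert_size_bp).

    Assumes two reads per pair, so pairs = reads / 2, and that each
    pair spans one insert.
    """
    spanned_bp = sum(reads_m * 1e6 / 2 * insert for reads_m, insert in libraries)
    return spanned_bp / genome_size_bp

# Female short-insert (about 200 bp) libraries from Supplementary Table 1:
# (usable reads in millions, insert size in bp)
female_200 = [(72.2, 164), (52.6, 165), (57.5, 175), (47.6, 172)]
female_200_bases_gb = 7.22 + 5.26 + 5.75 + 4.76

seq_cov = sequence_coverage(female_200_bases_gb, GENOME_SIZE_FEMALE_BP)
phys_cov = physical_coverage(female_200, GENOME_SIZE_FEMALE_BP)
```

Under these assumptions the values round to the 42X sequence coverage and 36X physical coverage listed for the female 200 bp libraries.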
Supplementary Table 3. Summary result of the tongue sole genome assembly by
SOAPdenovo.
Type Contig Scaffold
Size (bp) Num. Size (bp) Num.
N95 1,318 28,715 38,905 726
N90 4,419 20,032 272,602 526
N80 9,564 13,208 493,436 400
N70 14,668 9,388 627,811 315
N60 20,264 6,762 734,563 244
N50 26,524 4,806 867,956 185
N40 33,954 3,295 993,045 133
N30 43,078 2,108 1,132,664 88
N20 55,534 1,179 1,409,670 50
N10 75,365 467 1,818,425 20
Longest 194,815 1 4,694,140 1
Total 453,103,890 113,432 477,207,161 80,677
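For readers unfamiliar with the Nx statistics in this table, the sketch below assumes the conventional definition: sequences are sorted from longest to shortest, and Nx is the length of the sequence at which the running total first reaches x% of the total assembly length. The "Num." column would then be the number of sequences needed to reach that point. The function name and the example lengths are hypothetical; this is not the SOAPdenovo implementation.

```python
def nx_statistic(lengths, x):
    """Return (Nx length, number of sequences needed) for x in (0, 100]."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")

# Made-up contig lengths for illustration only.
toy_lengths = [100, 80, 60, 40, 20]
n50 = nx_statistic(toy_lengths, 50)
n90 = nx_statistic(toy_lengths, 90)
```

With the toy lengths, half of the 300 bp total is reached by the second-longest sequence, so N50 is 80 with a count of 2.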
Supplementary Table 4. Number of markers and total scaffold size for each
chromosome. We anchored Z-linked scaffolds from the male assembly and autosomal
scaffolds from the female assembly onto the Z chromosome and the autosomes (chr1~20).
Chr.  # SSR  # RAD-tag  # Contigs  Contig len. (bp)  # Scaffolds  Scaffold len. (bp)  Source  # Genes
1 81 1,184 2,410 32,791,084 53 34,529,112 Female 1,490
2 40 1,288 1,227 19,259,417 29 20,052,734 Female 911
3 29 810 1,189 15,467,848 25 16,253,993 Female 596
4 85 949 1,263 19,377,156 31 20,014,501 Female 871
5 43 777 1,147 18,609,661 29 19,279,693 Female 706
6 30 865 1,270 18,113,957 29 18,841,016 Female 978
7 54 642 993 13,185,383 15 13,814,722 Female 645
8 53 825 2,144 28,615,567 37 30,153,790 Female 1,397
9 50 703 1,314 18,790,677 31 19,618,599 Female 1,029
10 46 454 1,507 20,081,642 33 21,015,569 Female 1,037
11 42 553 1,428 19,676,390 34 20,528,432 Female 1,047
12 40 517 1,349 17,485,432 35 18,398,590 Female 745
13 43 323 1,518 20,959,882 34 21,922,143 Female 946
14 50 639 1,782 27,668,722 47 28,847,931 Female 1,228
15 46 430 1,478 19,132,837 32 20,094,621 Female 779
16 40 484 1,252 17,874,443 29 18,785,820 Female 814
17 38 246 1,333 15,583,495 25 16,472,647 Female 984
18 28 226 1,092 14,404,870 22 15,207,555 Female 783
19 33 54 1,108 17,115,378 24 17,747,288 Female 847
20 34 89 1,036 14,355,002 18 15,234,830 Female 881
Z 37 53 2,044 20,757,346 26 21,915,962 Male 926
W NA NA 2,436 13,020,023 306 16,461,726 Female 317
Total 942 12,111 32,320 422,326,212 944 445,191,274 NA 19,957
Supplementary Table 5. Validation of the Z-linked scaffolds. For each scaffold
preliminarily taken as putatively Z-linked, one to four genes were used to confirm the
depth ratio between male and female. Oligonucleotide primers were designed with Primer
Premier 5.
Scaffold ID Gene ID M:F Avg. Primer Sequence 5’-3’
scaffold100
Cse_R017172 2.134
2.092
1109-F: TCACAGCAGGGCTCACTTCAT
1109-R: ACATTGTGGCTGCGGTTGG
Cse_R016987 2.418 1067-F:TGTTCGTCCCAGCCAAACC
1067-R:CTGCTCCCTCCTTCTGTCCC
Cse_R016598 1.723 1068-F:GGCTCAATGTCAGAACACCAAA
1068-R:GTCCGAGCAGAAGGAGGTAAAT
scaffold569 Cse_R016528 2.135 2.135 1865-F:CCCTGGCTGTCAGCACGATA
1865-R:GGTGGAGGCTTGCAGATGTTA
scaffold55
Cse_R016709 2.130
2.211
2555-F: CAGATTTCACAGTCATCCACCAA
2555-R: AGCATTCCCGCAGTTTCGT
Cse_R017120 2.561 2631-F: ATCATCCAAGGACTGCCTCAAA
2631-R: CGGTCCCTACGTGGGAGTAAA
Cse_R016880 1.943
2626-F:GGTCGCCATCTTCCAGTTCC
2626-R:CTCCACCCTGGTCGTTCCTC
scaffold116
Cse_R016422 2.082
2.119
3668-F: CTTATCAGCCGCTCCAAACAG
3668-R: AGAGGCCATCCTCATCTACCAT
Cse_R016983 2.143 3675-F: TCACAACCCAACCAACGACG
3675-R: ACCCTCAGACCCTCCACGAA
Cse_R016526 2.134 3685-F: TGGGAGGGAAATCACAGGTC
3685-R: TCTGGAGCGCATTTAGGGAC
Scaffold710 Cse_R022132 1.997
2.023
1971-F:CGAGAGGCTGCGAGACAAACT
1971-R:GCGTCTGGGATGGCTCTTTT
Cse_R016996 1.851
1981-F:GTAAACATTCCCACAAACAACA
1981-R:AGTCCTGGAGGTGAAGGCAC
Cse_R017284 2.179
1983-F:TGCGAGTAAGACGGAACCAA
1983-R:GTGTTGCGAGTGAAAGGAGA
Cse_R016667 2.068 1984-F:CTTCAAACAAGCCTTTCCTG
1984-R:TTCGTCCGAGTCTATGTGCC
Scaffold676 Cse_R016625 2.310
2.327
5311-F:GAGGCCCTGCATCAATCTGTAT
5311-R:TCTAGGTGGAGTCTGGTGCGTAT
Cse_R017228 2.344 5333-F:TCCACAGCCTGCTTAGTCTTGC
5333-R:CCACATCCTTTGACTGCTGCTC
scaffold246
Cse_R017132 1.817
2.085
0018-F: GGACTCCTGGTCCAGCAGTAAGT
0018-R: CAGCTATGAAGGCAGATTGTCTTTT
Cse_R017325 2.138 0021-F: ACCACTGAGAAGGAGGGTTCG
0021-R: CGGGTTGAATTTGGCAAGAGT
Cse_R016499 1.824 0050-F: AGTCCTGGGTCAAAGCATTATCT
0050-R: CCTGTAGCCTCCTGAATCTCCT
Cse_R017033 2.562 0109-F: CTTTATTGCCAGACCTCAACATG
0109-R: TTACAAGCCACTGAAAGGATTACC
scaffold317
Cse_R017274 2.117
2.144
0546-F: GGATACTCCTGCTTGACACCAA
0546-R: GTGATGAGTTTACTTTCACCCTCC
Cse_R016718 2.172 0568-F:TGATCGTAGTGTTCCTGCCTCT
0568-R:GTACCTGGCGACAACATAGAGTTT
scaffold553
Cse_R016442 2.077
2.086
0745-F: CCGTCGGAGAAGAACTTGAGC
0745-R: TGGTGGAGGAGATGGTGTCG
Cse_R017143 2.096 0750-F:CCAGACACTCAGGGACAATGAA
0750-R:CCACTGTTGGCTTTGGACGA
Scaffold76 Cse_R016569 2.047 2.047 0723-F:GCGCTGTCTCGGTAAATCTCA
0723-R:GTAGAGTGACATACGGAGGCTGAC
Scaffold813 Cse_R016596 2.234
2.153
9160-F:GTTCAGGATGGATGGAGGCAG
9160-R:AGGAAGCTCTACACCACCGAGA
Cse_R016573 2.072 9165-F:CCTCCTTATGTTTGCCATCTCC
9165-R:CAATGAGGAAGGCTCCAGTGAT
Scaffold351 Cse_R016747 1.976 1.976
9827-F:GGTGGGCTTTCAGATGGGAT
9827-R:TGGTGGCGATGATTCTACTGG
Scaffold96 Cse_R017130 2.114
2.036
2278-F:GTAGGGATCGACCTGGTGTAGT
2278-R:GTGATTGGCGATGCGTAGAT
Cse_R016841 2.281
2294-F:TTCACATCTTCCCTTTCGTCAT
2294-R:CTTTGGTAGCTTTGCAGTCC
Cse_R017352 1.718
2306-F:AATGGCTCGTTGTATGACTTCC
2306-R:TCCGTTCCTCTTCACCAGTATG
Scaffold893 Cse_R016550 2.117
2.010
0424-F:ACGCTCGTGTTCATTAGATGTGG
0424-R:CAGTTTTCTCCTGTCCTCGGTC
Cse_R016660 1.903 0428-F:CCTGGTTCGTTCCTGCTTTG
0428-R:CTATTGGGATGCCCGCTTT
scaffold980 Cse_R016633 2.182 2.204 2687-F: TGGATGCCTGTAAAGTTATGGGTA
2687-R: TGAGTTCGCTTGGTTCTGCTG
Cse_R017235 2.300 2703-F: CCTCTTATGTGATGAAGGTGGATG
2703-R: CAGGGAAGGTGTCTTCTGGATAT
Cse_R017263 1.964 2693-F: GCAGGAGAAGGAGGTGAAGAAAT
2693-R: TGGTAGGAGCCTGTGATGATGTT
Cse_R016555 2.370 2717-F: CTTCATCCAGCAGTTCAGTCAGT
2717-R: TGGTCAGAGCCTTTCATTATCTC
Scaffold831 Cse_R017231 2.140
2.206
3354-F:CTGTCTGTCCGTTTACTCCTGAAT
3354-R:AGGTGCTGTCTTCTAACGCCTAC
Cse_R016515 2.272 3357-F:TAAACCTTTCTCCTTCCTGCTTC
3357-R:TTCAGATGACATCAGGGACTGC
Scaffold631 Cse_R016917 2.038
2.105
4929-F:TTGTCAACCTTTCCCACTTCCA
4929-R:TCAACTGAGGCCGGTCTGC
Cse_R016613 1.996 4932-F:CCCTCGTGCAGACTTATCAAAC
4932-R:GGCTTCCGCAACTTCAGTGTA
Cse_R016592 2.281 4933-F:TTGGTAAAGAACAGCTATGTTACCAG
4933-R:AACTCACCTAGCAGCCTTGACC
Scaffold636 Cse_R016897 2.038 2.038
8200-F:CCAGTGTTTCAGCCTTTAACCT
8200-R:GACAGACCAGCGAGTCATTAGG
Scaffold753
Cse_R016867 1.983
1.956
2382-F:CACATCCAGAGGAAACCGCA
2382-R:CATTTGGCCCAAGTCCACAG
Cse_R017111 1.934
2391-F:AAGTCCAATGAGATTCTCCCTCC
2391-R:ACTCCAAACTGAGCCACAACAC
Scaffold1068 Cse_R016793 2.169
1.924
9008-F:GCCCAGACTCAGTGGAGATGC
9008-R:CAAAGCCAACGAGCCAATAAC
Cse_R016933 1.678 9015-F:CTGTGGCAACCCGATTTCTC
9015-R:CCTCCTGTTCTCCCTGCTCC
scaffold120
Cse_R022129 2.382
2.167
4429-F: TCTCCGCTCCATCACGCTC
4429-R: GAAATGACGACGGCCACGA
Cse_R022130 2.152 4430-F:GTGACCCACCAGGACACCC
4430-R:CAGGTTCTCCCGCAGGATC
Cse_R016874 1.966
4442-F:ACCCTATCACCAAAGCCAAGA
4442-R:GGATTTCACAGCCATCACTCA
Scaffold677 Cse_R017185 1.897 1.897
3140-F:ATCCACGACGGTCTGGGTAG
3140-R:GGCTCAAAGCGTTCAAGGG
Scaffold757 Cse_R017286_E3 1.063
1.038
1296-F:TGAAGCAGGTCAGCAGCAGG
1296-R:GTGGAAGCCAACGAAGGGA
Cse_R016652 1.014 1297-F:ACTGGATTTGGAGGACAGAAGC
1297-R:TTAGCAGATTTGGTCGTGGATT
Scaffold589 Cse_R017095 1.228
1.174
6755-F:GCTGCTGGCGTTCTGCTACA
6755-R:CAGGACTTGCGTGCATTTGTC
Cse_R016437 1.121
6757-F:CTGTATGTGACTCCAGACCTCCAC
6757-R:CAGACCCTGACACTCAGTTCCTC
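The "Avg." column can be reproduced as the mean of the per-gene M:F ratios of a scaffold. The sketch below also adds a simple classification step, on the reasoning that a Z-specific sequence has two copies in ZZ males and one in ZW females (ratio near 2), while a sequence present in equal copy number in both sexes gives a ratio near 1. The cutoff of 1.5 and the function name are hypothetical and are not taken from the source.

```python
from statistics import mean

Z_LINKED_CUTOFF = 1.5  # hypothetical cutoff, not stated in the source

def summarize_scaffold(ratios, cutoff=Z_LINKED_CUTOFF):
    """Return (mean M:F ratio, label) for one scaffold."""
    avg = mean(ratios)
    label = "Z-linked" if avg >= cutoff else "not Z-linked"
    return avg, label

# Per-gene M:F values copied from the table above.
scaffold100 = summarize_scaffold([2.134, 2.418, 1.723])
scaffold757 = summarize_scaffold([1.063, 1.014])
```

The mean for scaffold100 agrees with the tabulated 2.092, and the two scaffolds with averages near 1 (Scaffold757 and Scaffold589) fall below the illustrative cutoff.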
Supplementary Table 6. Transposable element families that are present in the
genome of the tongue sole.
LTR families LINE families TIRs families Other families
Gypsy/Sushi R2 (REL endonuclease) TcMariner/Tc1 Penelope
Gypsy/Rodin RTE/Rex3-BovB Hat/Ac
Gypsy/LreO3 Rex/Babar Hat/Tol2
BEL/Suzu CR1 Buster
Supplementary Table 7. Percentage of the tongue sole genome masked as each class
of transposable elements.
Class  SINE  LINE  LTR  DNA transposons  Unclassified  Total
Percentage in genome  0.22%  1.04%  0.08%  2.45%  2.06%  5.85%
Supplementary Table 8. Copy number of each TE family. Result2: the total number of
copies, with each copy counted only once. Result4: the number of copies that are at
least 80% as long as the reference size, where the reference size is the size of each
element in the de novo library. Result5: the same with a 50% threshold. Result6: the
same with a 30% threshold.
Family  Result2 (all)  Result4 (80%)  Yield (80%)  Result5 (50%)  Yield (50%)  Result6 (30%)  Yield (30%)
DNA 1,430 36 2.52 53 3.71 75 5.24
DNA/En-Spm 1,884 581 30.84 917 48.67 1,156 61.36
DNA/Harbinger 302 155 51.32 233 77.15 283 93.71
DNA/Hat 3,062 693 22.63 1,086 35.47 1,472 48.07
DNA/Hat-Ac 976 443 45.39 677 69.36 834 85.45
DNA/Hat-Charlie 39,152 11,032 28.18 21,582 55.12 29,898 76.36
DNA/Hat-Tip100 673 162 24.07 334 49.63 498 74.00
DNA/Hat-Tol2 155 53 34.19 84 54.19 114 73.55
DNA/Helitron 91 52 57.14 74 81.32 80 87.91
DNA/PiggyBac 33 19 57.58 24 72.73 28 84.85
DNA/Sola 849 250 29.45 379 44.64 511 60.19
DNA/TcMar-Fot1 43 15 34.88 29 67.44 38 88.37
DNA/TcMar-Tc1 5,489 1,064 19.38 1,989 36.24 3,061 55.77
DNA/TcMar-Tc2 8,260 2,643 32.00 4,568 55.30 5,904 71.48
DNA/TcMar-Tigger 6,301 1,783 28.30 3,421 54.29 4,665 74.04
LINE/L2 1,237 290 23.44 437 35.33 616 49.80
LINE/Penelope 4,594 636 13.84 1,090 23.73 1,858 40.44
LINE/R2 44 20 45.45 34 77.27 40 90.91
LINE/Rex1 291 14 4.81 27 9.28 47 16.15
LINE/Rex-Babar 2,748 875 31.84 1,502 54.66 1,999 72.74
LINE/RTE 5,040 3,159 62.68 4,119 81.73 4,603 91.33
LINE/RTE-BovB 8,019 780 9.73 1,705 21.26 3,137 39.12
Low_complexity 142,314 141,677 99.55 142,171 99.90 142,253 99.96
LTR 12 11 91.67 11 91.67 11 91.67
LTR/Copia 22 16 72.73 19 86.36 21 95.45
LTR/ERV 174 0 0.00 0 0.00 26 14.94
LTR/ERV1 420 221 52.62 313 74.52 367 87.38
LTR/ERVK 39 1 2.56 1 2.56 1 2.56
LTR/ERVL 19 15 78.95 17 89.47 19 100.00
LTR/Gypsy 977 303 31.01 439 44.93 512 52.41
LTR/Pao 21 17 80.95 20 95.24 20 95.24
LTR/Viper 20 7 35.00 9 45.00 11 55.00
Satellite 263 4 1.52 4 1.52 8 3.04
Simple_repeat 198,234 187,463 94.57 190,866 96.28 192,615 97.17
SINE 7,384 2,833 38.37 4,485 60.74 6,282 85.08
SINE? 2,276 224 9.84 1,238 54.39 1,712 75.22
SINE/Alu 17 5 29.41 9 52.94 15 88.24
SINE/tRNA-Lys 160 11 6.88 143 89.38 159 99.38
SINE/V 31 2 6.45 11 35.48 31 100.00
snRNA 100 13 13.00 26 26.00 55 55.00
Unknown 79,164 31,238 39.46 50,920 64.32 65,857 83.19
Supplementary Table 9. Summary of the copy number of each TE class. Result2: the
total number of copies, with each copy counted only once. Result4: the number of
copies that are at least 80% as long as the reference size, where the reference size is
the size of each element in the de novo library. Result5: the same with a 50%
threshold. Result6: the same with a 30% threshold.
Class  Result2 (all)  Result4 (80%)  Yield (80%)  Result5 (50%)  Yield (50%)  Result6 (30%)  Yield (30%)
DNA 68,700 18,981 27.63 35,450 51.60 48,617 70.77
LINE 21,973 5,774 26.28 8,914 40.57 12,300 55.98
LTR 1,704 591 34.68 829 48.65 988 57.98
SINE 9,868 3,075 31.16 5,886 59.65 8,199 83.09
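The Yield columns in Supplementary Tables 8 and 9 are not defined in the captions. They appear to be the percentage of all copies (Result2) that pass each length threshold, and the sketch below checks that reading against the DNA row of Supplementary Table 9. The function name is hypothetical.

```python
def yield_percent(copies_passing, copies_all):
    """Percentage of all copies that pass a length threshold."""
    return 100.0 * copies_passing / copies_all

# DNA class row of Supplementary Table 9.
dna_all = 68_700
dna_yield_80 = round(yield_percent(18_981, dna_all), 2)
dna_yield_50 = round(yield_percent(35_450, dna_all), 2)
dna_yield_30 = round(yield_percent(48_617, dna_all), 2)
```

Under this reading the three values match the tabulated 27.63, 51.60 and 70.77.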
Supplementary Table 10. Transcriptome sequencing data statistics.
Sample  Accession no.  # of libraries  # of read pairs  Read1 len. (bp)  Read2 len. (bp)  # of base pairs (G)  % mapped on genome
Testis (ZZ testis P)  SRX106096  1  8,847,831  73  75  1.31  90.18
Ovary (ZW ovary F1)  SRX106097  1  8,897,059  73  75  1.32  88.68
Testis (ZW testis F1)  SRX106098  1  6,671,850  90  90  1.20  79.01
Testis (ZW testis F2)  SRX106099  1  9,827,201  90  90  1.77  73.79
Whole body (female pre-)  SRX106100  1  11,983,356  90  90  2.16  77.72
Whole body (female post-)  SRX106101  1  11,989,388  90  90  2.16  75.48
Whole body (male)  SRX106103  1  13,657,075  90  90  2.46  89.98
Total/Average  -  7  71,873,760  -  -  12.38  82.12
Supplementary Table 11. Statistics of homology-based gene sets using proteins from
different species as parent proteins.
Medaka Takifugu Tetraodon Stickleback Zebrafish Human
# of parent proteins from Ensembl 19,671 18,507 19,583 20,772 24,046 22,402
# of transcripts after rough alignment 488,444 418,643 403,897 431,484 592,590 543,494
# of transcripts after precise alignment 138,720 103,190 103,719 116,755 163,582 117,280
# of transcripts after transcript clustering 17,228 16,876 17,262 17,675 16,593 13,609
# of transcripts after filtering pseudogenes 15,191 16,221 16,751 17,158 16,237 13,263
Supplementary Table 12. General statistics of each gene set. Gene length included the
exon and intron regions but excluded UTRs.
Gene set  Number  Average transcript length (bp)  Average CDS length (bp)  # of exons per gene  Average exon length (bp)  Average intron length (bp)
Homology-based  18,284  9,252  1,595  9.4  169  910
RNA-seq  30,253  5,383  1,054  5.6  189  945
De novo  27,327  11,052  1,908  11.8  161  844
Reference  21,516  8,575  1,462  8.7  168  925
Supplementary Table 13. General statistics of non-coding RNA genes.
Type  Copy #  Average length (bp)  Total length (bp)  % of genome
miRNA 285 91 25,898 0.005
tRNA 674 77 52,204 0.109
rRNA
Total 104 107 11,175 0.002
18S 39 118 4,604 0.001
28S 32 107 3,432 0.001
5.8S 1 40 40 0.000
5S 32 97 3,099 0.001
snRNA
snRNA 221 128 28,381 0.006
CD-box 105 97 10,142 0.002
HACA-box 46 152 7,004 0.001
splicing 62 161 9,953 0.002
Supplementary Table 14. Differentially expressed genes between pre- and
post-metamorphosis. (see Excel file ‘Supplementary Table 14.xls’)
Supplementary Table 15. Enrichment of GO terms in differentially expressed genes
between pre-and post-metamorphosis.
GO_ID GO_Term GO_Class Gene no. Pvalue AdjustedPv
GO:0030246 carbohydrate binding MF 48 6.25E-33 3.09E-30
GO:0006959 humoral immune response BP 20 7.80E-17 8.68E-15
GO:0004866 endopeptidase inhibitor activity MF 26 2.12E-16 2.30E-14
GO:0042627 chylomicron CC 8 1.41E-12 8.87E-11
GO:0016491 oxidoreductase activity MF 53 3.34E-12 1.84E-10
GO:0004867 serine-type endopeptidase inhibitor activity MF 15 4.65E-11 2.28E-09
GO:0043691 reverse cholesterol transport BP 10 4.89E-11 2.37E-09
GO:0005579 membrane attack complex CC 7 5.03E-11 2.39E-09
GO:0030300 regulation of intestinal cholesterol absorption BP 7 5.03E-11 2.39E-09
GO:0005506 iron ion binding MF 21 1.19E-10 5.31E-09
GO:0046486 glycerolipid metabolic process BP 26 1.64E-10 7.07E-09
GO:0042157 lipoprotein metabolic process BP 18 1.65E-10 7.07E-09
GO:0006721 terpenoid metabolic process BP 13 1.72E-10 7.30E-09
GO:0016064 immunoglobulin mediated immune response BP 14 2.21E-10 9.06E-09
GO:0031526 brush border membrane CC 12 2.47E-10 1.00E-08
GO:0034361 very-low-density lipoprotein particle CC 8 4.22E-10 1.66E-08
GO:0034369 plasma lipoprotein particle remodeling BP 9 5.24E-10 1.99E-08
GO:0001523 retinoid metabolic process BP 12 6.15E-10 2.32E-08
GO:0042445 hormone metabolic process BP 19 1.51E-07 3.05E-06
GO:0007596 blood coagulation BP 34 1.82E-07 3.63E-06
GO:0002449 lymphocyte mediated immunity BP 15 1.91E-07 3.80E-06
GO:0015908 fatty acid transport BP 12 2.01E-07 3.98E-06
GO:0032052 bile acid binding MF 6 2.09E-07 4.11E-06
GO:0030195 negative regulation of blood coagulation BP 8 2.33E-07 4.54E-06
GO:0045923 positive regulation of fatty acid metabolic process BP 9 2.64E-07 5.10E-06
GO:0006644 phospholipid metabolic process BP 22 2.91E-07 5.56E-06
GO:0006766 vitamin metabolic process BP 17 2.94E-07 5.59E-06
GO:0009629 response to gravity BP 4 4.74E-07 6.46E-05
GO:0051234 establishment of localization BP 150 6.30E-07 1.13E-05
GO:0042439 ethanolamine-containing compound metabolic process BP 11 6.42E-07 1.14E-05
GO:0050878 regulation of body fluid levels BP 38 6.60E-07 1.15E-05
GO:0010817 regulation of hormone levels BP 27 7.31E-07 1.25E-05
GO:0050996 positive regulation of lipid catabolic process BP 8 7.58E-07 1.29E-05
GO:0010743 regulation of macrophage derived foam cell differentiation BP 7 7.88E-07 1.34E-05
GO:0000302 response to reactive oxygen species BP 7 8.53E-07 9.81E-05
GO:0009416 response to light stimulus BP 9 1.04E-06 9.81E-05
GO:0010898 positive regulation of triglyceride catabolic process BP 5 2.33E-06 3.50E-05
GO:0014070 response to organic cyclic compound BP 9 2.72E-06 0.000230196
GO:0009612 response to mechanical stimulus BP 7 3.17E-06 0.000255324
GO:0046581 intercellular canaliculus CC 5 4.17E-06 6.04E-05
GO:0051918 negative regulation of fibrinolysis BP 5 4.17E-06 6.04E-05
GO:0009628 response to abiotic stimulus BP 13 4.21E-06 0.000310857
GO:0032870 cellular response to hormone stimulus BP 11 4.38E-06 0.000310857
GO:0042730 fibrinolysis BP 6 5.27E-06 7.45E-05
GO:0019430 removal of superoxide radicals BP 6 5.27E-06 7.45E-05
GO:0030449 regulation of complement activation BP 4 5.88E-06 8.18E-05
GO:0051156 glucose 6-phosphate metabolic process BP 4 1.34E-05 0.000170447
GO:0009743 response to carbohydrate stimulus BP 16 1.35E-05 0.000170447
GO:0006711 estrogen catabolic process BP 3 1.62E-05 0.000193963
GO:0007565 female pregnancy BP 6 2.07E-05 0.001110553
GO:0019433 triglyceride catabolic process BP 6 3.02E-05 0.000335585
GO:0009991 response to extracellular stimulus BP 9 4.24E-05 0.002214245
GO:0031100 organ regeneration BP 4 0.000110264 0.004657322
GO:0060674 placenta blood vessel development BP 3 0.000136057 0.005613122
GO:0006694 steroid biosynthetic process BP 13 0.000186357 0.001615211
GO:0002755 MyD88-dependent toll-like receptor signaling pathway BP 4 0.000215013 0.00794651
GO:0048545 response to steroid hormone stimulus BP 23 0.00022706 0.0019231
GO:0070412 R-SMAD binding MF 3 0.000231981 0.008398676
GO:0034142 toll-like receptor 4 signaling pathway BP 4 0.000288057 0.00964035
GO:0004181 metallocarboxypeptidase activity MF 6 0.000306543 0.002474005
GO:0005577 fibrinogen complex CC 4 0.000333554 0.002682283
GO:0042542 response to hydrogen peroxide BP 4 0.000447005 0.013440461
GO:0009888 tissue development BP 15 0.000462133 0.013663731
GO:0009725 response to hormone stimulus BP 38 0.000466229 0.003637563
GO:0055085 transmembrane transport BP 53 0.000626006 0.004700989
GO:0003051 angiotensin-mediated drinking behavior BP 2 0.000641572 0.004700989
GO:0002019 regulation of renal output by angiotensin BP 2 0.000641572 0.004700989
GO:0060136 embryonic process involved in female pregnancy BP 2 0.000653734 0.018275894
GO:0042573 retinoic acid metabolic process BP 4 0.000746605 0.005373383
GO:0043065 positive regulation of apoptotic process BP 9 0.001033357 0.024701008
GO:0012501 programmed cell death BP 15 0.001063105 0.024701008
GO:0001661 conditioned taste aversion BP 2 0.001124652 0.024701008
GO:0032496 response to lipopolysaccharide BP 5 0.00114176 0.024701008
GO:0007612 learning BP 4 0.001249489 0.026441915
GO:0048146 positive regulation of fibroblast proliferation BP 3 0.001266946 0.026441915
GO:0071383 cellular response to steroid hormone stimulus BP 4 0.001367844 0.0275745
GO:0001938 positive regulation of endothelial cell proliferation BP 3 0.001405245 0.02777252
GO:0061113 pancreas morphogenesis BP 2 0.001892322 0.011692505
GO:0046983 protein dimerization activity MF 10 0.002031371 0.035679729
GO:0051403 stress-activated MAPK cascade BP 5 0.002055217 0.035744651
GO:0008146 sulfotransferase activity MF 3 0.002748172 0.042028079
GO:0048731 system development BP 27 0.002790516 0.042310897
GO:0045087 innate immune response BP 6 0.003042499 0.045356244
GO:0003707 steroid hormone receptor activity MF 3 0.003203879 0.046244792
GO:0048856 anatomical structure development BP 29 0.003258511 0.046244792
GO:0043401 steroid hormone mediated signaling pathway BP 3 0.003448075 0.047830647
GO:0006916 anti-apoptosis BP 5 0.003471181 0.047830647
GO:0006915 apoptotic process BP 14 0.003657897 0.049159912
GO:0019934 cGMP-mediated signaling BP 3 0.005871181 0.02999554
GO:0006107 oxaloacetate metabolic process BP 3 0.005871181 0.02999554
GO:0015671 oxygen transport BP 3 0.005871181 0.02999554
GO:0019841 retinol binding MF 3 0.005871181 0.02999554
GO:0050308 sugar-phosphatase activity MF 3 0.005871181 0.02999554
GO:0009235 cobalamin metabolic process BP 2 0.00609787 0.030353083
GO:0009268 response to pH BP 4 0.006630306 0.03263869
GO:0009651 response to salt stress BP 4 0.008361563 0.03969594
GO:0004745 retinol dehydrogenase activity MF 3 0.008451289 0.039799676
GO:0005859 muscle myosin complex CC 3 0.009953381 0.045386196
GO:0006111 regulation of gluconeogenesis BP 3 0.009953381 0.045386196
Supplementary Table 16. Enrichment of GO terms in down-regulated genes in
post-metamorphosis.
GO_ID GO_Term GO_Class Gene no. Pvalue AdjustedPv
GO:0051412 response to corticosterone stimulus BP 6 4.03E-09 3.77E-06
GO:0048545 response to steroid hormone stimulus BP 13 7.24E-09 3.77E-06
GO:0051384 response to glucocorticoid stimulus BP 9 9.17E-09 3.77E-06
GO:0032570 response to progesterone stimulus BP 6 1.09E-08 3.77E-06
GO:0010035 response to inorganic substance BP 13 1.10E-08 3.77E-06
GO:0071277 cellular response to calcium ion BP 5 3.77E-08 8.37E-06
GO:0051591 response to cAMP BP 7 5.99E-08 1.18E-05
GO:0042493 response to drug BP 12 8.45E-08 1.50E-05
GO:0009314 response to radiation BP 11 2.94E-07 4.74E-05
GO:0009605 response to external stimulus BP 18 3.95E-07 5.84E-05
GO:0009629 response to gravity BP 4 4.74E-07 6.46E-05
GO:0009725 response to hormone stimulus BP 15 7.62E-07 9.65E-05
GO:0000302 response to reactive oxygen species BP 7 8.53E-07 9.81E-05
GO:0009416 response to light stimulus BP 9 1.04E-06 9.81E-05
GO:0046022 positive regulation of transcription from RNA polymerase II promoter during mitosis BP 3 1.11E-06 9.81E-05
GO:0014070 response to organic cyclic compound BP 9 2.72E-06 0.000230196
GO:0009612 response to mechanical stimulus BP 7 3.17E-06 0.000255324
GO:0010033 response to organic substance BP 19 3.60E-06 0.000277463
GO:0009628 response to abiotic stimulus BP 13 4.21E-06 0.000310857
GO:0032870 cellular response to hormone stimulus BP 11 4.38E-06 0.000310857
GO:0051592 response to calcium ion BP 6 4.95E-06 0.000337525
GO:0070887 cellular response to chemical stimulus BP 16 8.23E-06 0.000540913
GO:0070482 response to oxygen levels BP 8 1.14E-05 0.000672302
GO:0010038 response to metal ion BP 8 1.75E-05 0.000969469
GO:0007565 female pregnancy BP 6 2.07E-05 0.001110553
GO:0009991 response to extracellular stimulus BP 9 4.24E-05 0.002214245
GO:0051726 regulation of cell cycle BP 12 4.38E-05 0.002220981
GO:0001666 response to hypoxia BP 7 5.90E-05 0.002692604
GO:0034097 response to cytokine stimulus BP 8 5.92E-05 0.002692604
GO:0042221 response to chemical stimulus BP 24 7.17E-05 0.003181564
GO:0003690 double-stranded DNA binding MF 6 7.84E-05 0.003392193
GO:0031100 organ regeneration BP 4 0.000110264 0.004657322
GO:0060674 placenta blood vessel development BP 3 0.000136057 0.005613122
GO:0002756 MyD88-independent toll-like receptor signaling pathway BP 4 0.00016522 0.006661359
GO:0034130 toll-like receptor 1 signaling pathway BP 4 0.000174431 0.006876443
GO:0034134 toll-like receptor 2 signaling pathway BP 4 0.000204289 0.00771083
GO:0034138 toll-like receptor 3 signaling pathway BP 4 0.000204289 0.00771083
GO:0002755 MyD88-dependent toll-like receptor signaling pathway BP 4 0.000215013 0.00794651
GO:0070412 R-SMAD binding MF 3 0.000231981 0.008398676
GO:0008063 Toll signaling pathway BP 4 0.000237668 0.008432454
GO:0060395 SMAD protein signal transduction BP 3 0.000280021 0.00964035
GO:0034142 toll-like receptor 4 signaling pathway BP 4 0.000288057 0.00964035
GO:0005667 transcription factor complex CC 7 0.000289006 0.00964035
GO:0043565 sequence-specific DNA binding MF 11 0.000293449 0.00964035
GO:0045655 regulation of monocyte differentiation BP 2 0.00030738 0.009914399
GO:0045944 positive regulation of transcription from RNA polymerase II promoter BP 10 0.000332291 0.010526515
GO:0046332 SMAD binding MF 4 0.000411156 0.012796334
GO:0042542 response to hydrogen peroxide BP 4 0.000447005 0.013440461
GO:0009888 tissue development BP 15 0.000462133 0.013663731
GO:0044092 negative regulation of molecular function BP 10 0.000609477 0.01743892
GO:0060136 embryonic process involved in female pregnancy BP 2 0.000653734 0.018275894
GO:0060711 labyrinthine layer development BP 3 0.000659333 0.018275894
GO:0031668 cellular response to extracellular stimulus BP 5 0.000699928 0.019102655
GO:0045088 regulation of innate immune response BP 5 0.000790853 0.021257177
GO:0071310 cellular response to organic substance BP 12 0.00084939 0.022159084
GO:0035767 endothelial cell chemotaxis BP 2 0.000954015 0.023978928
GO:0001077 RNA polymerase II core promoter proximal region sequence-specific DNA binding transcription factor activity involved in positive regulation of transcription MF 3 0.000959698 0.023978928
GO:0044451 nucleoplasm part CC 10 0.00099865 0.024605634
GO:0031099 regeneration BP 5 0.001021702 0.024701008
GO:0043065 positive regulation of apoptotic process BP 9 0.001033357 0.024701008
GO:0010628 positive regulation of gene expression BP 12 0.001060828 0.024701008
GO:0012501 programmed cell death BP 15 0.001063105 0.024701008
GO:0003700 sequence-specific DNA binding transcription factor activity MF 12 0.001111446 0.024701008
GO:2000278 regulation of DNA biosynthetic process BP 2 0.001124652 0.024701008
GO:0001661 conditioned taste aversion BP 2 0.001124652 0.024701008
GO:0032496 response to lipopolysaccharide BP 5 0.00114176 0.024701008
GO:0007612 learning BP 4 0.001249489 0.026441915
GO:0048146 positive regulation of fibroblast proliferation BP 3 0.001266946 0.026441915
GO:2000108 positive regulation of leukocyte apoptosis BP 2 0.001308811 0.026998038
GO:0006357 regulation of transcription from RNA polymerase II promoter BP 12 0.001334601 0.027213589
GO:0071383 cellular response to steroid hormone stimulus BP 4 0.001367844 0.0275745
GO:0001938 positive regulation of endothelial cell proliferation BP 3 0.001405245 0.02777252
GO:0002573 myeloid leukocyte differentiation BP 4 0.001408978 0.02777252
GO:0040029 regulation of gene expression, epigenetic BP 4 0.001537556 0.029329297
GO:0032501 multicellular organismal process BP 37 0.001597897 0.030144325
GO:0060255 regulation of macromolecule metabolic process BP 27 0.001627004 0.030144325
GO:0003008 system process BP 17 0.00163126 0.030144325
GO:0045893 positive regulation of transcription, DNA-dependent BP 11 0.001790929 0.032419476
GO:0009636 response to toxin BP 4 0.001868987 0.033490731
GO:0007049 cell cycle BP 13 0.001974191 0.035022146
GO:0046983 protein dimerization activity MF 10 0.002031371 0.035679729
GO:0051403 stress-activated MAPK cascade BP 5 0.002055217 0.035744651
GO:0005654 nucleoplasm CC 13 0.002103813 0.035886194
GO:0060716 labyrinthine layer blood vessel development BP 2 0.002178608 0.036808097
GO:0050679 positive regulation of epithelial cell proliferation BP 4 0.002304991 0.038001282
GO:0007275 multicellular organismal development BP 30 0.002307141 0.038001282
GO:0004879 ligand-activated sequence-specific DNA binding RNA polymerase II transcription factor activity MF 3 0.002334915 0.038001282
GO:0043154 negative regulation of cysteine-type endopeptidase activity involved in apoptotic process BP 3 0.002334915 0.038001282
GO:0046697 decidualization BP 2 0.002691962 0.0415904
GO:0007568 aging BP 5 0.002696108 0.0415904
GO:0008146 sulfotransferase activity MF 3 0.002748172 0.042028079
GO:0048731 system development BP 27 0.002790516 0.042310897
GO:0045087 innate immune response BP 6 0.003042499 0.045356244
GO:0003707 steroid hormone receptor activity MF 3 0.003203879 0.046244792
GO:0071478 cellular response to radiation BP 3 0.003203879 0.046244792
GO:0005634 nucleus CC 32 0.003209925 0.046244792
GO:0048856 anatomical structure development BP 29 0.003258511 0.046244792
GO:0043401 steroid hormone mediated signaling pathway BP 3 0.003448075 0.047830647
GO:0006916 anti-apoptosis BP 5 0.003471181 0.047830647
GO:0007184 SMAD protein import into nucleus BP 2 0.003557786 0.048179487
GO:0006915 apoptotic process BP 14 0.003657897 0.049159912
GO:0031323 regulation of cellular metabolic process BP 28 0.00375143 0.049664457
Supplementary Table 17. Enrichment of GO terms in up-regulated genes in
post-metamorphosis.
GO_ID GO_Term GO_Class Gene_No. Pvalue AdjustedPv
GO:0008202 steroid metabolic process BP 43 4.19E-39 3.73E-36
GO:0010876 lipid localization BP 40 1.66E-38 1.23E-35
GO:0030246 carbohydrate binding MF 48 6.25E-33 3.09E-30
GO:0071702 organic substance transport BP 63 1.04E-29 3.84E-27
GO:0072376 protein activation cascade BP 23 7.00E-29 2.23E-26
GO:0042180 cellular ketone metabolic process BP 75 6.26E-27 1.86E-24
GO:0044255 cellular lipid metabolic process BP 75 1.58E-26 4.39E-24
GO:0019752 carboxylic acid metabolic process BP 73 2.41E-26 5.65E-24
GO:0006956 complement activation BP 18 8.87E-26 1.97E-23
GO:0006066 alcohol metabolic process BP 55 3.67E-21 5.64E-19
GO:0000267 cell fraction CC 87 8.89E-18 1.04E-15
GO:0006959 humoral immune response BP 20 7.80E-17 8.68E-15
GO:0030299 intestinal cholesterol absorption BP 11 9.60E-15 8.91E-13
GO:0005792 microsome CC 29 1.97E-14 1.68E-12
GO:0017127 cholesterol transporter activity MF 11 1.55E-13 1.17E-11
GO:0044243 multicellular organismal catabolic process BP 13 2.05E-13 1.49E-11
GO:0043498 cell surface binding MF 15 2.27E-13 1.58E-11
GO:0005903 brush border CC 17 2.52E-13 1.72E-11
GO:0009308 amine metabolic process BP 49 2.62E-13 1.77E-11
GO:0065008 regulation of biological quality BP 122 7.95E-13 5.21E-11
GO:0016491 oxidoreductase activity MF 53 3.34E-12 1.84E-10
GO:0050778 positive regulation of immune response BP 26 7.03E-12 3.77E-10
GO:0042221 response to chemical stimulus BP 126 3.31E-11 1.64E-09
GO:0005506 iron ion binding MF 21 1.19E-10 5.31E-09
GO:0001523 retinoid metabolic process BP 12 6.15E-10 2.32E-08
GO:0005496 steroid binding MF 15 3.22E-09 1.01E-07
GO:0006706 steroid catabolic process BP 9 5.69E-09 1.66E-07
GO:0031406 carboxylic acid binding MF 20 5.97E-09 1.73E-07
GO:0006805 xenobiotic metabolic process BP 16 8.82E-09 2.49E-07
GO:0006775 fat-soluble vitamin metabolic process BP 13 1.33E-08 3.39E-07
GO:0032374 regulation of cholesterol transport BP 10 1.40E-08 3.54E-07
GO:0005902 microvillus CC 14 1.43E-08 3.59E-07
GO:0008237 metallopeptidase activity MF 18 1.88E-08 4.57E-07
GO:0019835 cytolysis BP 9 2.55E-08 6.01E-07
GO:0019439 aromatic compound catabolic process BP 9 3.56E-08 8.22E-07
GO:0034754 cellular hormone metabolic process BP 15 5.51E-08 1.23E-06
GO:0006982 response to lipid hydroperoxide BP 5 6.02E-08 1.31E-06
GO:0007597 blood coagulation, intrinsic pathway BP 6 1.07E-07 2.21E-06
GO:0031210 phosphatidylcholine binding MF 6 1.07E-07 2.21E-06
GO:0006776 vitamin A metabolic process BP 10 1.43E-07 2.92E-06
GO:0042445 hormone metabolic process BP 19 1.51E-07 3.05E-06
GO:0006766 vitamin metabolic process BP 17 2.94E-07 5.59E-06
GO:0051234 establishment of localization BP 150 6.30E-07 1.13E-05
GO:0042439 ethanolamine-containing compound metabolic process BP 11 6.42E-07 1.14E-05
GO:0042574 retinal metabolic process BP 6 6.49E-07 1.14E-05
GO:0033194 response to hydroperoxide BP 6 6.49E-07 1.14E-05
GO:0050878 regulation of body fluid levels BP 38 6.60E-07 1.15E-05
GO:0048037 cofactor binding MF 24 7.01E-07 1.22E-05
GO:0010817 regulation of hormone levels BP 27 7.31E-07 1.25E-05
GO:0005788 endoplasmic reticulum lumen CC 15 7.91E-07 1.34E-05
GO:0000303 response to superoxide BP 7 1.11E-06 1.84E-05
GO:0006576 cellular biogenic amine metabolic process BP 17 1.23E-06 2.03E-05
GO:0009617 response to bacterium BP 20 1.27E-06 2.09E-05
GO:0009986 cell surface CC 28 1.93E-06 3.02E-05
GO:0030141 secretory granule CC 18 2.14E-06 3.29E-05
GO:0032101 regulation of response to external stimulus BP 22 3.18E-06 4.74E-05
GO:0008206 bile acid metabolic process BP 9 3.34E-06 4.94E-05
GO:0031983 vesicle lumen CC 9 3.98E-06 5.87E-05
GO:0051918 negative regulation of fibrinolysis BP 5 4.17E-06 6.04E-05
GO:0010885 regulation of cholesterol storage BP 5 4.17E-06 6.04E-05
GO:0006950 response to stress BP 104 4.65E-06 6.69E-05
GO:0015294 solute:cation symporter activity MF 16 4.77E-06 6.84E-05
GO:0042730 fibrinolysis BP 6 5.27E-06 7.45E-05
GO:0019430 removal of superoxide radicals BP 6 5.27E-06 7.45E-05
GO:0019842 vitamin binding MF 17 5.82E-06 8.15E-05
GO:0007584 response to nutrient BP 19 6.46E-06 8.97E-05
GO:0001798 positive regulation of type IIa hypersensitivity BP 3 1.62E-05 0.000193963
GO:0006711 estrogen catabolic process BP 3 1.62E-05 0.000193963
GO:0071944 cell periphery CC 145 1.64E-05 0.000195379
GO:0009607 response to biotic stimulus BP 27 1.69E-05 0.000201647
GO:0042572 retinol metabolic process BP 6 3.02E-05 0.000335585
GO:0052548 regulation of endopeptidase activity BP 16 4.85E-05 0.000516727
GO:0002576 platelet degranulation BP 10 5.13E-05 0.00054259
GO:0005501 retinoid binding MF 6 6.02E-05 0.000628552
GO:0042744 hydrogen peroxide catabolic process BP 5 8.88E-05 0.000867574
GO:0010575 positive regulation of vascular endothelial growth factor production BP 5 8.88E-05 0.000867574
GO:0019825 oxygen binding MF 6 0.000132992 0.001219092
GO:0016298 lipase activity MF 13 0.000134788 0.001230493
GO:0006694 steroid biosynthetic process BP 13 0.000186357 0.001615211
GO:0002526 acute inflammatory response BP 9 0.000188245 0.0016275
GO:0032501 multicellular organismal process BP 195 0.000206683 0.001767317
GO:0048545 response to steroid hormone stimulus BP 23 0.00022706 0.0019231
GO:0051181 cofactor transport BP 5 0.000236341 0.001975423
GO:0005507 copper ion binding MF 8 0.000282264 0.002303086
GO:0006548 histidine catabolic process BP 3 0.000305933 0.002473563
GO:0032488 Cdc42 protein signal transduction BP 3 0.000305933 0.002473563
GO:0005577 fibrinogen complex CC 4 0.000333554 0.002682283
GO:0008201 heparin binding MF 11 0.000509559 0.003961755
GO:0016829 lyase activity MF 14 0.000517595 0.004017219
GO:0033627 cell adhesion mediated by integrin BP 7 0.000568752 0.004338684
GO:0060613 fat pad development BP 2 0.000641572 0.004700989
GO:0055004 atrial cardiac myofibril development BP 2 0.000641572 0.004700989
GO:0042573 retinoic acid metabolic process BP 4 0.000746605 0.005373383
GO:0010744 positive regulation of macrophage derived foam cell differentiation BP 3 0.001213816 0.008180862
GO:0001527 microfibril CC 3 0.001701511 0.011131031
GO:0004771 sterol esterase activity MF 2 0.001892322 0.011692505
GO:0044258 intestinal lipid catabolic process BP 2 0.001892322 0.011692505
GO:0060975 cardioblast migration to the midline involved in heart field formation BP 2 0.001892322 0.011692505
GO:0055005 ventricular cardiac myofibril development BP 3 0.002295769 0.013655073
GO:0009235 cobalamin metabolic process BP 2 0.00609787 0.030353083
GO:0031904 endosome lumen CC 2 0.00609787 0.030353083
GO:0006032 chitin catabolic process BP 2 0.00609787 0.030353083
GO:0090197 positive regulation of chemokine secretion BP 2 0.00609787 0.030353083
GO:0061302 smooth muscle cell-matrix adhesion BP 2 0.00609787 0.030353083
GO:0009268 response to pH BP 4 0.006630306 0.03263869
GO:0000302 response to reactive oxygen species BP 9 0.007174399 0.034854904
GO:0009651 response to salt stress BP 4 0.008361563 0.03969594
GO:0006909 phagocytosis BP 7 0.008366888 0.03969594
GO:0031016 pancreas development BP 12 0.008711503 0.04093855
GO:0042542 response to hydrogen peroxide BP 7 0.008864281 0.041612615
GO:0005859 muscle myosin complex CC 3 0.009953381 0.045386196
GO:0042359 vitamin D metabolic process BP 3 0.009953381 0.045386196
Note: The full lists can be obtained from the authors upon request.
Supplementary Table 18. Metabolism pathways (KEGG) enrichment by DGEs
between pre- and post-metamorphosis.
KO_ID Pvalue Gene_Num Description
ko04610 1.36E-32 28 Complement and coagulation cascades
ko04974 3.84E-15 16 Protein digestion and absorption
ko01120 7.52E-07 14 Microbial metabolism in diverse environments
ko04975 6.47E-18 14 Fat digestion and absorption
ko01110 1.49E-03 13 Biosynthesis of secondary metabolites
ko04976 2.12E-10 13 Bile secretion
ko03320 5.92E-09 11 PPAR signaling pathway
ko04151 4.07E-02 10 PI3K-Akt signaling pathway
ko04972 5.90E-07 10 Pancreatic secretion
ko05322 1.75E-08 9 Systemic lupus erythematosus
ko01200 2.13E-04 8 Carbon metabolism
ko05150 1.29E-08 8 Staphylococcus aureus infection
ko05168 6.22E-03 8 Herpes simplex infection
ko00380 7.97E-06 7 Tryptophan metabolism
ko00500 1.02E-06 7 Starch and sucrose metabolism
ko00830 4.87E-07 7 Retinol metabolism
ko00983 4.87E-07 7 Drug metabolism - other enzymes
ko04973 7.87E-08 7 Carbohydrate digestion and absorption
ko04977 1.31E-07 7 Vitamin digestion and absorption
ko05020 7.13E-07 7 Prion diseases
ko05146 4.05E-04 7 Amoebiasis
ko00010 1.64E-04 6 Glycolysis / Gluconeogenesis
ko00600 1.64E-04 6 Sphingolipid metabolism
ko02010 1.64E-04 6 ABC transporters
ko04145 1.25E-02 6 Phagosome
ko04380 1.50E-02 6 Osteoclast differentiation
ko04910 1.25E-02 6 Insulin signaling pathway
ko04920 7.60E-04 6 Adipocytokine signaling pathway
ko05164 3.30E-02 6 Influenza A
ko00260 1.60E-03 5 Glycine, serine and threonine metabolism
ko00330 2.70E-03 5 Arginine and proline metabolism
ko00340 5.92E-05 5 Histidine metabolism
ko00980 8.45E-06 5 Metabolism of xenobiotics by cytochrome P450
ko00982 6.52E-07 5 Drug metabolism - cytochrome P450
ko04260 8.38E-03 5 Cardiac muscle contraction
ko04514 3.52E-02 5 Cell adhesion molecules (CAMs)
ko04530 2.35E-02 5 Tight junction
ko04614 8.45E-06 5 Renin-angiotensin system
ko04668 3.52E-02 5 TNF signaling pathway
ko04670 2.82E-02 5 Leukocyte transendothelial migration
ko04950 2.82E-04 5 Maturity onset diabetes of the young
ko05160 3.72E-02 5 Hepatitis C
ko05204 3.00E-05 5 Chemical carcinogenesis
ko00052 1.35E-03 4 Galactose metabolism
ko00140 3.32E-03 4 Steroid hormone biosynthesis
ko00480 2.38E-03 4 Glutathione metabolism
ko00520 1.18E-02 4 Amino sugar and nucleotide sugar metabolism
ko00561 5.16E-03 4 Glycerolipid metabolism
ko00564 3.48E-02 4 Glycerophospholipid metabolism
ko00591 6.84E-06 4 Linoleic acid metabolism
ko00680 5.15E-04 4 Methane metabolism
ko04512 3.72E-02 4 ECM-receptor interaction
ko04640 1.89E-02 4 Hematopoietic cell lineage
ko04917 3.48E-02 4 Prolactin signaling pathway
ko04918 2.61E-02 4 Thyroid hormone synthesis
ko04930 5.89E-03 4 Type II diabetes mellitus
ko04978 1.65E-03 4 Mineral absorption
ko05031 1.89E-02 4 Amphetamine addiction
ko05133 1.73E-02 4 Pertussis
ko05140 2.42E-02 4 Leishmaniasis
ko00030 1.10E-02 3 Pentose phosphate pathway
ko00051 9.38E-03 3 Fructose and mannose metabolism
ko00120 3.34E-03 3 Primary bile acid biosynthesis
ko00270 3.01E-02 3 Cysteine and methionine metabolism
ko00350 2.44E-02 3 Tyrosine metabolism
ko00590 1.48E-02 3 Arachidonic acid metabolism
ko00620 2.44E-02 3 Pyruvate metabolism
ko00860 1.48E-02 3 Porphyrin and chlorophyll metabolism
ko04612 4.73E-02 3 Antigen processing and presentation
ko05130 2.44E-02 3 Pathogenic Escherichia coli infection
ko05219 4.73E-02 3 Bladder cancer
ko05321 4.00E-02 3 Inflammatory bowel disease (IBD)
ko00053 1.74E-02 2 Ascorbate and aldarate metabolism
ko00360 3.25E-02 2 Phenylalanine metabolism
ko00592 1.33E-02 2 alpha-Linolenic acid metabolism
ko00627 1.74E-02 2 Aminobenzoate degradation
ko00760 4.45E-02 2 Nicotinate and nicotinamide metabolism
ko00770 2.20E-02 2 Pantothenate and CoA biosynthesis
ko00920 3.25E-02 2 Sulfur metabolism
ko00626 2.63E-02 1 Naphthalene degradation
Supplementary Table 19. Positively selected genes involved in the benthic adaptation.
We tested for positive selection using branch-site likelihood ratio tests (LRTs) on genes
differentially expressed between pre- and post-metamorphosis fish. To reduce false
positives, we further removed genes that lacked a positively selected site with a Bayes
empirical Bayes posterior probability > 0.95. This left 15 positively selected tongue sole
genes (FDR < 0.05).
ID Gene_name P_value q_value Expression_in_pre-metamorphosis Expression_in_post-metamorphosis Pvalue
Cse_R005509 xdh 5.14E-04 1.17E-02 2.12 15.64 0.00478
Cse_R006646 cd74 6.87E-06 7.04E-06 1.38 57.02 0.00282
Cse_R006647 cdhr2 1.81E-03 2.26E-02 0.29 24.56 5.82E-07
Cse_R007133 slc15a2 6.21E-04 1.22E-02 1.72 19.23 0.000714
Cse_R010110 cp 3.38E-05 3.63E-05 2.89 192.85 2.00E-06
Cse_R010893 mep1b 4.23E-03 3.94E-02 1.54 37.81 0.000106
Cse_R011343 hnf4a 2.35E-03 2.68E-02 0.13 13.55 0.00840
Cse_R011953 Mgam 4.27E-03 1.12E-02 1.05 69.54 1.08E-07
Cse_R012520 fbn1 4.65E-06 4.89E-06 2.04 11.80 0.0227
Cse_R012542 pepd 5.99E-05 6.21E-05 15.23 80.23 0.0215
Cse_R017122 gda 1.02E-05 1.18E-05 0.70 10.13 0.0480
Cse_R017410 itih2 7.24E-05 7.40E-05 7.82 634.28 1.01E-07
Cse_R019232 ace2 4.06E-05 4.31E-05 0.64 127.63 2.39E-08
Cse_R021489 cpb1 1.98E-03 2.39E-02 22.36 1147.71 2.15E-06
Cse_R004565 tmem67 1.55E-11 1.74E-11 14.43 1.19 0.00372
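The filtering steps described for Supplementary Table 19 can be sketched as follows. This is a minimal illustration and not the pipeline used to produce the table: the log-likelihood values and gene labels are hypothetical, the chi-square reference distribution with one degree of freedom is an assumption, and Benjamini-Hochberg is assumed as the FDR procedure because the text does not name one.

```python
import math

def lrt_pvalue(lnl_null, lnl_alt):
    """P value of a likelihood ratio test between nested models.

    Assumption: the statistic 2*(lnL_alt - lnL_null) is compared with a
    chi-square distribution with 1 degree of freedom.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

def has_beb_site(site_posteriors, cutoff=0.95):
    """True if any site has a BEB posterior probability above the cutoff."""
    return any(p > cutoff for p in site_posteriors)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg q values (assumed FDR procedure), in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues

# Hypothetical inputs: (gene, lnL null, lnL alternative, per-site BEB posteriors).
genes = [
    ("geneA", -5230.4, -5222.1, [0.62, 0.97]),
    ("geneB", -8011.9, -8011.2, [0.55]),
    ("geneC", -3120.0, -3111.5, [0.91, 0.93]),
]
pvals = [lrt_pvalue(null, alt) for _, null, alt, _ in genes]
qvals = benjamini_hochberg(pvals)
selected = [g[0] for g, q in zip(genes, qvals) if q < 0.05 and has_beb_site(g[3])]
print(selected)
```

In this sketch a gene is retained only if it passes both the FDR threshold and the site-level posterior probability filter, which mirrors the two-step selection described above.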
Supplementary Table 20. Differential expression of visual genes in tongue sole.
Gene_ID Gene_name Pre-metamorphosis Post-metamorphosis DESeq_Pvalue Cuffdiff_Pvalue
Cse_R012558 arl6 6.59 2.70
Cse_R021804 arr3a 5.29 0.00
Cse_R009217 arrb2a 20.36 25.25
Cse_R013257 cngb1a 0.79 0.22
Cse_R009643 crx 8.13 6.53
Cse_R005505 cryaa 7.66 9.67
Cse_R009471 cryba1 22.03 56.49
Cse_R006443 cryba1-2 6.63 5.97
Cse_R007509 cryba2-2 53.91 83.75
Cse_R005516 cryba4 36.03 61.36
Cse_R005643 crybb1 48.15 111.86
Cse_R022133 crygm1 45.77 217.69
Cse_R022134 crygm2b 79.00 143.78
Cse_R022135 crygm2d1 0.33 0.00
Cse_R022136 crygm3 105.71 195.36
Cse_R022137 crygm4 141.11 365.55
Cse_R022138 crygm6 93.50 123.73
Cse_R022139 crygm7 29.50 71.71
Cse_R022140 crygmx 17.05 25.63
Cse_R011703 crygn1 63.38 105.15
Cse_R007152 crygs1 1.10 0.00
Cse_R020157 gnb1b 55.26 59.07
Cse_R014457 gprc5c 1.92 6.54
Cse_R006986 grk1a 0.63 1.48
Cse_R009520 grk1b 4.62 1.01
Cse_R010769 grk7a 2.94 0.94
Cse_R020372 guca1a 0.93 1.61
Cse_R012153 gucy2f 0.36 0.59
Cse_R003506 lws-1 10.02 72.00 1.65E-01 6.87E-03
Cse_R004508 nr2f5 7.42 6.39
Cse_R020265 nrl 0.00 1.75
Cse_R007130 nyx 4.17 0.21
Cse_R018786 opn3 1.06 2.02
Cse_R004111 opn4a 1.04 0.67
Cse_R012671 pax6b 1.26 1.46
Cse_R006160 pde6a 0.52 1.27
Cse_R009845 pde6d 8.00 7.27
Cse_R007917 pde6h 89.67 32.60
Cse_R021007 prph2b 3.58 1.65
Cse_R001520 prph2l 0.80 0.24
Nature Genetics: doi:10.1038/ng.2890
63
Cse_R022101 rbp4l 58.95 46.96
Cse_R021726 rcv1 17.41 4.09
Cse_R001743 rdh12 5.52 8.48
Cse_R003214 rdh5 0.89 9.58
Cse_R004398 rgra 12.68 9.88
Cse_R002190 rh1 32.40 202.00 1.72E-01 1.62E-02
Cse_R002696 rh1-2 1.32 0.25
Cse_R003604 rh2-3 116.42 0.66 1.46E-06 1.33E-01
Cse_R003115 rh2-4 5.67 1.36
Cse_R012426 rlbp1a 3.47 2.07
Cse_R012949 rlbp1b 9.33 11.19
Cse_R020449 rom1a 0.00 0.89
Cse_R010366 rpe65a 4.21 4.45
Cse_R010235 rpe65b 3.09 4.33
Cse_R001395 rrh 0.00 12.51 2.19E-02 5.11E-01
Cse_R012478 slc17a6b 3.97 3.34
Cse_R012973 slc24a1 0.09 1.25
Cse_R003307 sws2 0.55 1.50
Cse_R021296 tmt opsin 5.27 1.48
Cse_R020410 unc119b 4.82 10.12
Cse_R021776 val-opsin 0.30 0.22
Cse_R004132 vsx1 2.65 1.97
Cse_R001732 vsx2 0.77 0.72
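As a reading aid for Supplementary Table 20, the direction and size of each expression change can be summarised as a log2 fold change. The sketch below is an illustration only: the pseudocount of 1 (used so that zero expression values do not produce a division by zero) and the function name are assumptions, and the four rows are copied from the table above.

```python
import math

def log2_fold_change(pre, post, pseudocount=1.0):
    """log2 of post/pre expression; the pseudocount is an assumed choice."""
    return math.log2((post + pseudocount) / (pre + pseudocount))

# (pre-metamorphosis, post-metamorphosis) values taken from the table.
rows = {
    "lws-1": (10.02, 72.00),
    "rh1": (32.40, 202.00),
    "rh2-3": (116.42, 0.66),
    "rrh": (0.00, 12.51),
}
for gene, (pre, post) in rows.items():
    print(f"{gene}\t{log2_fold_change(pre, post):+.2f}")
```

Positive values indicate higher expression after metamorphosis and negative values indicate lower expression.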
Supplementary Table 21. Distribution of visual genes among different teleost
species.
Gene_name Zebrafish_position Tongue_sole_position Medaka_position Stickleback_position Takifugu_position Tetraodon_position
arl6 chr1:34630127:34637106 chr5:7137306:7138772 chr3:15631376:15633349 chrII:8144964:8146274 chrUn:5950961:5952185 chr5:7965139:7966331
arr3a chr10:22210250:22214264 chr15:7865396:7868083 chr14:7947483:7949787 chrIV:5743631:5746265 chrUn:257902508:257905103 chr7:10308243:10310311
arrb2a chr10:23288618:23304120 chr19:16572349:16599076 chr14:24197757:24208045 chrUn:29402686:29417341 chrUn:303685493:303693065 chr7:10573105:10580512
cngb1a chr18:46076831:46110055 chr6:2061655:2072145 chr20:9640800:9662866 chrII:22694544:22706165 chrUn:306614647:306621499 chr5:13085214:13090204
crx chr5:35041102:35043309 chr19:5803489:5805460 chr13:17569000:17570552 chrVII:12798906:12800819 chrUn:172424089:172425882 chr7:1171440:1173422
cryaa chr1:22268452:22271596 chr14:4858664:4860180 NA chrI:27017447:27018351 chrUn:217123687:217125074 chr3:7534674:7535866
cryaba chr15:16791823:16796052 NA chr14:13383276:13383930 NA NA NA
cryabb chr5:58618320:58620950 chr19:17025715:17025826 NA chrVII:21681057:21681180 chrUn:26471838:26472150 chr7:4487432:4487745
cryba1 chr15:28615131:28620923 chr19:14128450:14129422 chr13:16481334:16483413 chrI:7614263:7615844 chrUn:35523910:35525802 chr7:8298276:8299263
cryba1-2 chr1:42024656:42033958 chr15:9864342:9865758 chr10:10377305:10378365 chrIV:9405971:9407183 chrUn:223925480:223926459 chr1:7105519:7106487
cryba2 chr6:15673295:15675736 NA chr14:8524577:8525181 NA NA NA
cryba2-2 chr9:11131802:11137423 chr16:17922641:17923681 NA chrXVI:14978970:14980571 chrUn:155219702:155221085 chr2:17695776:17697141
cryba4 chr14:48367386:48372475 chr14:17892038:17894070 NA chrVII:19097782:19098722 chrUn:74765633:74766665 chr4:3449657:3450716
crybb1 chr10:43061615:43066563 chr14:17897036:17899608 chr12:1196622:1199401 chrIV:9409132:9410024 chrUn:74761122:74763299 chr4:3452912:3454842
crybb2 chr8:44340649:44346685 NA chr9:21418050:21419130 chrXIII:14810946:14812077 chrUn:29139679:29140651 chr12:11778378:11779338
crybb3 chr5:4046262:4061457 NA chr5:17098228:17099427 chrXIII:14806389:14807775 chrUn:188989243:188990331 chr11:1309064:1310005
crygm1 chr9:21365809:21366459 chr16:2421646:2422326 NA chrXVI:17433612:17434243 chrUn:397245533:397246165 chr2:9027954:9028590
crygm2a chr9:21429273:21430015 NA chr9:864770:865364 chrXVI:17415572:17416177 NA chr2:9042505:9045699
crygm2b chr9:21422138:21422798 chr16:2436089:2436699 NA chrUn:6802526:6803141 chrUn:83016035:83016625 chr2:9040268:9040856
crygm2c chr9:21375831:21376491 NA NA NA NA NA
crygm2d1 chr9:21692397:21693001 chrW:2540941:2541560 chr6:372787:373433 chrXVI:17420627:17424128 chrUn:83001430:83004747 chr2:9034905:9037898
crygm2d2 chr9:21671332:21671999 NA NA NA chrUn:83008738:83009366 NA
crygm3 chr9:21452416:21559110 chr16:2424421:2425582 NA NA NA NA
crygm4 chr6:33688:34417 chr16:2433575:2434176 NA chrXVI:17405304:17409304 chrUn:83019584:83020208 chr2:9029746:9030366
crygm6 chr9:21354619:21355403 chr16:2427241:2428202 NA NA NA NA
crygm7 chr9:21346935:21347528 chrZ:12490586:12491202 NA chrXVI:17438402:17439131 chrUn:217290278:217290877 chr3:7688566:7689169
crygmx chr12:29716014:29718879 chr8:1700678:1701431 chr19:8685105:8685706 chrV:7963350:7963977 chrUn:78951321:78951980 chr2_random:234222:234896
crygn1 chr24:39452487:39456180 chr3:11087047:11088304 chr20:2703841:2706145 chrIII:11471667:11473075 chrUn:137833855:137834986 chr15_random:2004199:2005300
crygs1 chr22:12263776:12266898 chr16:2450946:2458805 NA NA NA NA
crygs2 chr9:27530684:27530959 NA chr19:8687483:8687705 chrXVI:17677440:17677712 chrUn:217298046:217298289 chr2:9025887:9026127
crygs3 chr9:21705154:21714358 chr8:1702178:1703179 chr10:10380534:10380642 chrXVI:17444639:17445744 chrUn:82997227:82997878 chr2:9049313:9049942
crygs4 chr9:21718929:21719857 NA NA NA NA NA
exorhodopsin chr5:3289911:32901252 C18048053:0:132 NA NA chrUn:100317234:100563867 chr9:6675526:6677954
gnb1b chr6:56918432:56934276 chr10:6957465:6964184 chr5:6191224:6194609 chrXII:5927076:5936208 chrUn:214305327:214313356 chr11:5409500:5411564
gprc5c chr12:35420963:35428566 chr8:14767304:14769095 chr19:20498779:20502965 chrUn:25840490:25842902 chrUn:200648387:200650714 chr2:2277187:2278936
grk1a chr1:118715:129410 chr16:3035456:3039078 chr3:14912611:14918790 chrII:7743346:7748451 chrUn:5580656:5583387 chr5:8313086:8315595
grk1b chr5:54855883:54888962 chr19:17064421:17067690 NA chrVII:18425527:18436420 chrUn:25266325:25268939 chr7:5698768:5701355
grk7a chr2:15562539:15568697 chr20:4393705:4397167 chr17:5644952:5649212 chrIII:9026311:9031453 chrUn:227707038:227710945 chr15:3157486:3161465
grk7b chr18:40733846:40746948 NA NA NA NA NA
guca1a chr4:22136311:22142451 chr8:19759018:19761342 chr23:17668993:17671643 NA chrUn:63168264:63169704 chr19:1664665:1665572
gucy2f chr15:29896697:29912292 chr4:2154308:2160233 chr13:28006805:28018225 chrI:3129415:3136785 chrUn:201555218:201563682 chrUn_random:19059923:19069042
lws-1 chr11:25243074:25244840 chr11:13019711:13021894 chr5:27015550:27017094 chrXVII:10627215:10629057 chrUn:151405983:151407509 chr11:10122886:10124420
lws-2 chr11:25246618:25248121 NA NA NA NA NA
melanopsin chr2:30821271:30822770 NA NA NA NA NA
nr2f5 Zv8_scaffold1983:39478:70574 chr13:17583519:17596933 chr16:26934074:26952893 chrXIII:10982581:10988486 chrUn:230315105:230325107 chrUn_random:29073690:29082306
nrl chr20:679387:687840 chr2:16689639:16692983 chr4:4467562:4470352 chrXIX:15625005:15625472 chrUn:45260212:45262794 chr1:12102639:12105214
nyx chr9:33757710:33761079 chr16:903013:905666 chr21:13997418:14002239 chrXVI:16092902:16094781 chrUn:153909047:153911803 chr2:8833914:8835905
opn3 chr13:43477486:43498878 chr12:11340305:11345446 chr15:2447345:2461811 chrVI:2329590:2335342 chrUn:116719299:116724962 chrUn_random:53788012:53814378
opn4a chr13:23043402:23086206 chr12:1859621:1866421 chr15:11524970:11537721 chrVI:15314511:15318456 chrUn:193506712:193509497 chr17:12081248:12083233
opn4b chr12:21759249:21827465 chr8:2508389:2515823 NA NA NA NA
opn4d Zv8_NA655:10839:11037 NA NA NA NA NA
opn4xa Zv8_scaffold3005:11589:36184 chr14:21630852:21635640 chr12:12936170:12942529 chrXIV:6298337:6304189 chrUn:156719206:156721931 chr4:2148619:2151749
pax6b chr7:15139477:15157263 chr5:13259747:13274639 chr3:22095052:22104761 chrII:12855506:12865591 chrUn:292688922:292698404 chr5:4249477:4258815
pde6a chr14:26497193:26525473 chr15:11714942:11738426 chr12:14585041:14592731 chrIV:4187015:4201995 chrUn:159149146:159153473 chr1_random:742637:751021
pde6d chr6:31612913:31624502 chr2:8289672:8295840 chr4:5917997:5921647 chrUn:12645200:12653807 chrUn:136520885:136526092 chrUn_random:9700019:9702642
pde6h chr6:297076:297612 chr17:12893019:12893489 chr1:14878202:14878498 chrIX:585765:586081 chrUn:133985019:133985345 chr1:9679461:9679789
prph2a chr12:34128364:34131833 NA NA chrV:2546241:2549994 chrUn:300569865:300573269 chr2:302665:305548
prph2b chr13:3179895:3184398 chr8:6100658:6103500 chr15:12169467:12185745 chrVI:16880111:16881793 chrUn:309697503:309699893 chr17:7941048:7943376
prph2l chr20:27026320:27031560 chr1:4746754:4748889 chr22:12108231:12109881 chrXV:5358746:5360925 chrUn:160681709:160683450 chr10:5763182:5764682
rbp4l chr21:21626947:21628781 chr14:2861227:2862229 chr12:14599160:14600382 chrXIV:7331671:7332831 chrUn:268901711:268902547 chr4:1357247:1358072
rcv1 chr16:12621814:12630231 chr13:116279:131800 chr16:16772746:16775866 chrXX:12101113:12103944 chrUn:249252638:249255091 chr8:8417368:8419870
rdh12 chr13:32552398:32562215 chr1:15632409:15635405 chr22:1285332:1290845 chrXV:12254941:12259032 chrUn:324578526:324581199 chr10:11977602:11980953
rdh5 chr22:10202816:10207270 chr11:18375517:18378022 chr5:11670763:11672453 chrXVII:4130715:4132561 chrUn:108967023:108968429 chrUn_random:104732706:104733737
rgra chr13:28942048:28951369 chr12:6247212:6250653 chr15:21191229:21195707 chrVI:9578464:9582232 chrUn:159082004:159085928 chr17:5859132:5862467
rgrb chr12:46181534:46186396 chr8:9263190:9264701 chr19:10354791:10360151 chrV:1041957:1044100 chrUn:126204382:126207456 chr2_random:1358516:1360332
rgs9a chr3:33364845:33544686 chr8:6135569:6192105 chr8:25558944:25566994 chrV:2578466:2597207 chrUn:300538227:300548989 chr2:325346:346546
rh1 chr8:55021585:55022646 chr10:19960637:19961692 chr7:17099105:17100166 NA chrUn:236948934:236951379 chr9:6477145:6478203
rh1-2 chr11:18457297:18458361 chr10:19704632:19710318 chr7:17427109:17431438 chrXII:809524:810585 chrUn:329632812:329633870 NA
rh2-1 chr6:44758736:44760163 NA NA chrUn:4146636:4148163 chrUn:116235050:116241405 chr11:5969516:5971195
rh2-2 chr6:44763173:44765699 NA NA chrUn:4159229:4160711 chrUn:121175430:121176839 NA
rh2-3 chr6:44768320:44769925 chr11:1838439:1839836 NA NA NA NA
rh2-4 chr6:44777361:44779083 chr11:1832872:1834306 NA NA NA NA
rlbp1a chr7:13783337:13797933 chr5:11935601:11939095 NA chrII:18541351:18543730 chrUn:3546250:3548552 chr5:10250288:10252112
rlbp1b chr25:19784470:19794521 chr6:5183885:5186084 chr6:3016355:3019320 chrXIX:8198698:8200593 chrUn:177166614:177167920 chr13:12922108:12923430
rom1a chr5:69548675:69563648 chr19:3496846:3500860 chr14:20464466:20469166 chrVII:15435309:15438336 chrUn:268607997:268610395 chr1:8145042:8147561
rom1b chr14:47702028:47724669 NA NA NA NA NA
rpe65a chr18:32417354:32431572 chr20:6990336:6993526 chr17:27878325:27881385 chrIII:16242210:16245102 chrUn:180255831:180258488 chr15_random:567334:570132
rpe65b chr8:17010806:17034228 chr2:12003333:12009540 NA NA NA NA
rrh chr13:12684316:12696453 chr1:26469191:26474035 chr22:24528456:24691117 chrUn:28395602:28402886 chrUn:313296421:313310193 chrUn_random:84538671:84549916
slc17a6b chr7:33140640:33160060 chr5:5687819:5698346 chr6:10458327:10466126 chrII:8899523:8909585 chrUn:6579519:6587249 chr13:11543805:11549215
slc24a1 chr18:19367141:19397385 chr6:12425077:12429331 chr6:11551320:11558370 chrXIX:6447101:6452379 chrUn:8377228:8383521 chr13:7978598:7981851
sws1 chr4:13183278:13185192 NA NA chrUn:22274430:22275993 NA NA
sws2 chr11:25238223:25240517 chr11:12958047:12960079 chr5:27005056:27006280 chrXVII:10617511:10618964 chrUn:151401100:151403145 chr11:10118271:10120350
tmt opsin chr24:25845993:25890266 scaffold774:17374:32411 chr20:22126027:22154963 chrIII:10225105:10244263 chrUn:277070438:277089218 chr15:2765956:2777637
tmtopsa chr2:3163071:3247880 NA NA chrXXI:11213300:11225660 NA NA
unc119b chr21:40981379:41036043 chr4:10969487:10982802 chr14:8502824:8519733 chrVII:19085243:19092983 chrUn:142281722:142287312 chr7:8290900:8295504
valopb chr12:24446255:24463083 NA NA NA NA NA
val-opsin Zv8_scaffold3004:58940:90833 chr12:11916080:11925922 chr15:767496:797320 chrVI:1692943:1706349 chrUn:121171792:12117265 chrUn_random:94902109:94915058
vsx1 chr17:15568595:15570735 chr12:3663321:3664736 chr15:11908728:11910809 chrVI:16670946:16673485 chrUn:333035705:333037533 chr17:7840128:7841868
vsx2 chr17:29320140:29326566 chr1:6241492:6246999 chr24:17331779:17336544 chrXV:2100271:2104365 chrUn:67613215:67616632 chr10:4896479:4900116
Note: Pseudogenes are marked in red.
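Each position in Supplementary Table 21 is written as sequence:start:end, with NA where no copy was found. A minimal sketch of how one such field could be parsed is given below; the `Locus` type and the `parse_locus` function are hypothetical names introduced for illustration, and the sketch assumes the field is a single unwrapped token.

```python
from typing import NamedTuple, Optional

class Locus(NamedTuple):
    seq: str    # chromosome or scaffold name, e.g. "chr5" or "chr2_random"
    start: int
    end: int

def parse_locus(field: str) -> Optional[Locus]:
    """Parse 'chr5:7137306:7138772'; return None for 'NA'."""
    if field == "NA":
        return None
    # Split from the right so that only the last two fields are coordinates.
    seq, start, end = field.rsplit(":", 2)
    return Locus(seq, int(start), int(end))

locus = parse_locus("chr5:7137306:7138772")
print(locus.seq, locus.end - locus.start)
```

Returning None for NA keeps absent genes distinguishable from genes located on unplaced sequences such as chrUn.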
Supplementary Table 22. Oxford grid showing the numbers of paralogues between
all pairs of tongue sole chromosomes. Red cells indicate paralogous chromosome pairs
that share more than 20 paralogues; for all other cells, the two chromosomes are
considered non-paralogous.
Supplementary Table 23. Oxford grid showing the numbers of orthologues between
tongue sole and Tetraodon chromosomes. Chromosome numbers are shown in the order
used in Supplementary Table 22, so that paralogous chromosomes are placed in proximity.
Cells with more than 100 orthologues are highlighted in red and those with more than 20
are in yellow, except for cells labeled “Un”, which represent sequences not placed on
chromosomes.
Supplementary Table 24. Oxford grid showing the numbers of orthologues between
tongue sole and medaka chromosomes. Chromosome numbers are shown in the order
used in Supplementary Table 22, so that paralogous chromosomes are placed in proximity.
Cells with more than 100 orthologues are highlighted in red and those with more than 20
are in yellow, except for cells labeled “Un”, which represent sequences not placed on
chromosomes.
Supplementary Table 25. Oxford grid showing the numbers of orthologues between
tongue sole and zebrafish chromosomes. Chromosome numbers are shown in the order
used in Supplementary Table 22, so that paralogous chromosomes are placed in proximity.
Cells with more than 100 orthologues are highlighted in red and those with more than 20
are in yellow, except for cells labeled “Un”, which represent sequences not placed on
chromosomes.
Supplementary Table 26. List of DCSs. Each row shows the values associated with one
DCS. The columns display, from left to right, the human chromosome number, the
chromosome numbers of the two duplicated tongue sole chromosomes (Cse-a and Cse-b),
and the numbers of orthologous genes on each of these two chromosomes in the DCS.
Hsa   Cse-a   Cse-b   Cse-a orthologues   Cse-b orthologues
1 2 20 101 59
1 10 11 93 74
1 13 18 45 66
2 1 7 33 36
2 4 20 10 2
2 8 12 7 30
2 9 12 8 19
2 14 16 43 125
2 14 Z 30 101
3 2 20 12 9
3 3 20 2 2
3 4 20 22 2
3 10 11 29 127
3 13 18 22 13
4 1 9 19 46
4 9 15 43 42
4 14 Z 7 25
5 13 18 10 11
5 14 Z 26 73
5 15 19 89 19
6 1 7 22 35
6 9 12 4 33
6 10 11 8 18
6 13 18 6 15
7 3 20 2 4
7 4 19 9 11
7 6 8 8 28
7 8 17 13 20
7 13 18 34 38
8 3 20 32 24
8 13 18 43 20
8 14 Z 31 19
9 1 9 3 7
9 9 15 19 4
9 14 Z 24 74
10 3 20 38 12
10 6 8 2 5
10 8 12 52 75
11 4 19 51 21
11 5 6 52 80
12 6 8 15 74
12 10 11 56 33
12 14 Z 13 54
13 4 19 5 16
13 14 16 14 30
14 1 7 200 26
15 1 7 46 4
15 5 6 92 39
16 5 6 44 71
16 8 17 42 57
16 9 17 15 23
17 4 19 13 13
17 8 17 50 47
17 9 17 40 69
18 2 20 5 8
18 3 20 7 17
19 2 20 56 10
19 5 6 12 3
19 9 17 39 17
20 9 12 10 15
20 10 11 68 32
21 4 19 6 8
21 14 16 5 15
22 6 8 3 17
22 9 17 14 27
22 14 Z 5 22
X 14 16 8 9
X 15 19 52 18
Supplementary Table 27. Comparison of structural features of tongue sole Z and W
with autosomes.
Chr.   Genes per megabase   Total TEs   DNA TEs   LINE   LTR   SINE   Unclassified TEs   Average gene size (bp)
chr1 43 5.11% 2.41% 0.69% 0.01% 0.32% 1.68% 8,930
chr2 45 4.39% 2.15% 0.45% 0.01% 0.23% 1.55% 9,101
chr3 37 4.97% 2.15% 0.83% 0.06% 0.17% 1.76% 10,650
chr4 44 4.82% 2.17% 0.51% 0.02% 0.20% 1.92% 9,351
chr5 37 3.35% 1.55% 0.33% 0.01% 0.21% 1.25% 10,264
chr6 52 4.10% 1.73% 0.31% 0.01% 0.18% 1.87% 8,039
chr7 47 4.31% 1.90% 0.62% 0.04% 0.12% 1.63% 8,556
chr8 49 5.05% 2.42% 0.68% 0.05% 0.22% 1.68% 9,428
chr9 52 4.65% 2.16% 0.55% 0.02% 0.24% 1.68% 7,833
chr10 49 4.72% 2.34% 0.57% 0.03% 0.24% 1.54% 8,500
chr11 51 3.42% 1.42% 0.40% 0.01% 0.22% 1.37% 8,612
chr12 40 3.46% 1.70% 0.29% 0.01% 0.14% 1.32% 10,820
chr13 43 4.30% 2.15% 0.52% 0.03% 0.20% 1.40% 9,080
chr14 43 4.49% 1.93% 0.64% 0.02% 0.28% 1.62% 9,550
chr15 39 5.08% 2.04% 0.58% 0.08% 0.20% 2.18% 8,594
chr16 43 4.99% 2.42% 0.53% 0.02% 0.19% 1.83% 9,839
chr17 60 4.56% 2.23% 0.50% 0.01% 0.17% 1.65% 7,061
chr18 51 3.55% 1.62% 0.27% 0.01% 0.13% 1.52% 7,630
chr19 48 3.05% 1.32% 0.33% 0.03% 0.18% 1.19% 8,782
chr20 58 2.32% 0.91% 0.21% 0.01% 0.08% 1.11% 7,733
Autosomes 46 4.33% 1.99% 0.51% 0.02% 0.21% 1.60% 8,876
chrZ 42 13.13% 4.74% 3.95% 0.23% 0.43% 3.78% 9,857
chrW 19 29.94% 8.74% 9.39% 1.09% 0.46% 10.26% 12,156
Supplementary Table 28. PAR genes and protein function. The PAR comprises two
scaffolds, scaffold589 (398,660 bp) and scaffold757 (243,113 bp), which are anchored to
the distal end of Z and have the same coverage depth in male and female samples. We
identified 22 protein-coding genes and 1 pseudogene on the PAR and inferred their
functions from the best hit of a BLAST search against SwissProt (E-value<1e-5).
Where available, the chromosome of the human orthologue is also given.
Gene ID   Scaffold   Functional   Gene Name   Human chr.   Protein
CSZ00000142.4 scaffold589 Yes pbx3 9 Pre-B-cell leukemia transcription factor 3
CSZ00000940.4 scaffold589 Yes unknown
CSZ00000660.4 scaffold589 Yes fam125b 9 Multivesicular body subunit 12B
CSZ00000791.4 scaffold589 Yes lmx1b 9 LIM homeobox transcription factor 1-beta
CSZ00000041.4 scaffold589 Yes zbtb34 9 Zinc finger and BTB domain-containing protein 34
CSZ00000311.4 scaffold589 Yes angptl2 9 Angiopoietin-related protein 2
CSZ00000543.4 scaffold589 Yes stat2 12 Signal transducer and activator of transcription 2
CSZ00000762.4 scaffold589 Yes hmcn2 9 Hemicentin-2
CSZ00000433.4 scaffold589 Yes ncs1 9 Neuronal calcium sensor 1
CSZ00000899.4 scaffold589 Yes adamts13 9 A disintegrin and metalloproteinase with thrombospondin motifs 13
CSZ00000859.4 scaffold757 No pbx3 9 Pre-B-cell leukemia transcription factor 3
CSZ00000288.4 scaffold757 Yes unknown
CSZ00000490.4 scaffold757 Yes gapvd1 9 GTPase-activating protein and VPS9 domain-containing protein 1
CSZ00000020.4 scaffold757 Yes c9orf172 9 Uncharacterized protein
CSZ00000272.4 scaffold757 Yes syn1 X Synapsin-1
CSZ00000758.4 scaffold757 Yes vgll4 3 Transcription cofactor vestigial-like protein 4
CSZ00000040.4 scaffold757 Yes slc20a1a 2 Sodium-dependent phosphate transporter 1-A
CSZ00000664.4 scaffold757 Yes dtx1 11 Protein deltex-1
CSZ00000508.4 scaffold757 Yes rasal1 12 RasGAP-activating-like protein 1
CSZ00000897.4 scaffold757 Yes rasal1 12 RasGAP-activating-like protein 1
CSZ00000423.4 scaffold757 Yes dgcr6 22 Protein DGCR6
CSZ00000596.4 scaffold757 Yes slc7a4 22 Cationic amino acid transporter 4
CSZ00000458.4 scaffold757 Yes rnf34 12 E3 ubiquitin-protein ligase RNF34
Supplementary Table 29. Classification of Z and W genes in the non-PAR region. Z-W,
genes present on both Z and W; Z-A, Z genes that have paralogues on autosomes or
unplaced scaffolds; Z-S, Z-specific genes; W-Z_random, W genes homologous to unplaced
Z-linked genes; W-A, W genes that have paralogues on autosomes or unplaced
scaffolds; W-S, W-specific genes.
Type   Z (non-PAR) functional genes   Z (non-PAR) pseudogenes   Z (non-PAR) total   W (non-PAR) functional genes   W (non-PAR) pseudogenes   W (non-PAR) total
Z-W 286 11 297 272 67 339
Z-A 248 10 258 NA* NA NA
Z-S 370 12 382 NA NA NA
W-Z_random NA NA NA 17 7 24
W-A NA NA NA 26 4 30
W-S NA NA NA 2 0 2
Total 904 33 937 317 78 395
*NA, not available.
Supplementary Table 30. Distribution of pseudogenes on different chromosomes.
Type   # Functional genes   # Pseudogenes   % Pseudogenes
Z 926 34 3.54
W (non-PAR)* 317 78 19.74
Autosomal 18,714 475 2.48
Unplaced 1,559 18 1.14
Total 21,516 605 2.73
*Genes in non-PAR region.
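The percentages in Supplementary Table 30 are consistent with the number of pseudogenes divided by the sum of functional genes and pseudogenes. The following minimal Python sketch reproduces that arithmetic from the tabulated counts; the function name is illustrative and the formula is inferred from the table rather than stated in the text. The results agree with the table to within one unit in the last printed digit.

```python
def pseudogene_percentage(functional: int, pseudogenes: int) -> float:
    """Pseudogenes as a percentage of all annotated genes (functional + pseudogenes)."""
    return 100.0 * pseudogenes / (functional + pseudogenes)


# Counts copied from Supplementary Table 30: (functional genes, pseudogenes).
counts = {
    "Z": (926, 34),
    "W (non-PAR)": (317, 78),
    "Autosomal": (18714, 475),
    "Unplaced": (1559, 18),
}

for label, (functional, pseudo) in counts.items():
    print(f"{label}: {pseudogene_percentage(functional, pseudo):.2f}%")
```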
Supplementary Table 31. Estimation of divergence rate and divergence time
between the Z and W chromosomes. We assume that the Z and W chromosomes both
evolve at the lineage-specific rate. MY, million years.
Type Ks Time/Rate Min Mean Max
Lineage 0.47 Rate(/Site/Year) 2.76E-09 2.39E-09 2.14E-09
Time(MY) 170 197 220
Z-W 0.15 Rate(/Site/Year) 5.53E-09 4.77E-09 4.27E-09
Time(MY) 27 31 35
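The values in Supplementary Table 31 are consistent with a simple molecular-clock calculation: the lineage rate equals Ks divided by the lineage divergence time, the Z-W rate is twice the lineage rate (both chromosomes accumulate substitutions), and the Z-W divergence time is the Z-W Ks divided by that doubled rate. The sketch below illustrates this reading; it is inferred from the tabulated numbers, and the function names are hypothetical.

```python
def lineage_rate(ks: float, time_years: float) -> float:
    """Substitutions per site per year along the lineage (Ks / T)."""
    return ks / time_years


def zw_divergence_time_my(ks_zw: float, ks_lineage: float, lineage_time_my: float) -> float:
    """Z-W divergence time in million years, assuming Z and W each evolve at the lineage rate."""
    rate = lineage_rate(ks_lineage, lineage_time_my * 1e6)
    pair_rate = 2.0 * rate  # combined rate for the Z and W copies
    return ks_zw / pair_rate / 1e6


# Lineage Ks = 0.47 and Z-W Ks = 0.15, with lineage times of 170, 197 and 220 MY.
for t in (170, 197, 220):
    print(t, f"{lineage_rate(0.47, t * 1e6):.2e}", round(zw_divergence_time_my(0.15, 0.47, t), 1))
```

Under these assumptions the three lineage times give Z-W divergence times close to the 27, 31 and 35 MY listed in the table.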
Supplementary Table 32. Percentage of genes expressed in testis. “Autosomes
(ortholog)” means autosomal genes that have reciprocal best orthologs on chicken
autosomes, while “chrZ (ortholog)” means chrZ genes that have reciprocal best orthologs
on chicken Z.
Categories   % of genes expressed in testis (RPKM>=1)   % of genes expressed in testis (RPKM>=10)
Autosomes 87.52 59.47
chrZ 86.73 (0.9646*) 59.46 (0.8866*)
Autosomes(ortholog) 93.73 71.30
chrZ(ortholog) 92.26(0.8964**) 74.23(0.7592**)
*P value of Chi-square test between “chrZ” and “Autosomes”.
**P value of Chi-square test between “chrZ (ortholog)” and “Autosomes (ortholog)”.
Supplementary Table 33. GO enrichment of chicken Z genes (P value<0.01, Fisher
exact test). We annotated the motifs and domains of chicken Z genes by InterProScan
against publicly available databases including Pfam, PRINTS, PROSITE, ProDom,
SMART and PANTHER, and then retrieved Gene Ontology (GO) annotation from the
results of InterProScan.
Terms   Definition   P value   Z genes in GO   All Z genes   All genes in GO   All genes   Enrich rate   Ontology
GO:0006950 response to stress 0.0005 19 489 180 10843 2.3 BP
GO:0015057 thrombin receptor activity 0.0016 3 489 6 10843 11.1 MF
GO:0004465 lipoprotein lipase activity 0.0020 2 489 2 10843 22.2 MF
GO:0000149 SNARE binding 0.0020 2 489 2 10843 22.2 MF
GO:0015204 urea transmembrane transporter activity 0.0020 2 489 2 10843 22.2 MF
GO:0008892 guanine deaminase activity 0.0020 2 489 2 10843 22.2 MF
GO:0019905 syntaxin binding 0.0020 2 489 2 10843 22.2 MF
GO:0008531 riboflavin kinase activity 0.0020 2 489 2 10843 22.2 MF
GO:0018342 protein prenylation 0.0020 2 489 2 10843 22.2 BP
GO:0006771 riboflavin metabolic process 0.0020 2 489 2 10843 22.2 BP
GO:0042887 amide transporter activity 0.0020 2 489 2 10843 22.2 MF
GO:0042886 amide transport 0.0020 2 489 2 10843 22.2 BP
GO:0042727 riboflavin and derivative biosynthetic process 0.0020 2 489 2 10843 22.2 BP
GO:0042726 riboflavin and derivative metabolic process 0.0020 2 489 2 10843 22.2 BP
GO:0009231 riboflavin biosynthetic process 0.0020 2 489 2 10843 22.2 BP
GO:0015840 urea transport 0.0020 2 489 2 10843 22.2 BP
GO:0008152 metabolic process 0.0023 160 489 2924 10843 1.2 BP
GO:0033554 cellular response to stress 0.0028 10 489 79 10843 2.8 BP
GO:0000003 reproduction 0.0028 4 489 14 10843 6.3 BP
GO:0022414 reproductive process 0.0028 4 489 14 10843 6.3 BP
GO:0003684 damaged DNA binding 0.0028 4 489 14 10843 6.3 MF
GO:0051716 cellular response to stimulus 0.0037 10 489 82 10843 2.7 BP
GO:0008318 protein prenyltransferase activity 0.0059 2 489 3 10843 14.8 MF
GO:0055102 lipase inhibitor activity 0.0059 2 489 3 10843 14.8 MF
GO:0004859 phospholipase inhibitor activity 0.0059 2 489 3 10843 14.8 MF
GO:0006281 DNA repair 0.0082 9 489 78 10843 2.6 BP
GO:0034984 cellular response to DNA damage stimulus 0.0082 9 489 78 10843 2.6 BP
GO:0006974 response to DNA damage stimulus 0.0089 9 489 79 10843 2.5 BP
GO:0003824 catalytic activity 0.0097 176 489 3368 10843 1.2 MF
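The "Enrich rate" column in Supplementary Table 33 is consistent with the fraction of Z genes carrying a term divided by the fraction of all genes carrying it. As an illustration only, the sketch below computes that ratio together with a one-sided hypergeometric tail probability, which is one common form of the Fisher exact test for enrichment. Whether the original analysis used a one-sided or two-sided test is not stated, so the P value function is an assumption; the function names are hypothetical.

```python
from math import comb


def enrich_rate(k: int, n: int, K: int, N: int) -> float:
    """(k/n) / (K/N): term frequency in the gene set relative to the background."""
    return (k / n) / (K / N)


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from N genes, K of which carry the term."""
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return favourable / total


# Example row: GO:0004465, 2 of 489 Z genes versus 2 of 10843 annotated genes.
print(round(enrich_rate(2, 489, 2, 10843), 1))
print(round(hypergeom_upper_tail(2, 489, 2, 10843), 4))
```

For this row the sketch gives values close to the tabulated enrichment rate (22.2) and P value (0.0020).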
Supplementary Table 34. GO enrichment of tongue sole Z genes (P value<0.01,
Fisher exact test).
GO Terms   Definition   P value   Z genes in GO   All Z genes   All genes in GO   All genes   Enrich rate   Ontology
GO:0003887 DNA-directed DNA polymerase activity 0.0015 5 682 20 15403 5.6 MF
GO:0034061 DNA polymerase activity 0.0019 5 682 21 15403 5.4 MF
GO:0004375 glycine dehydrogenase (decarboxylating) activity 0.0020 2 682 2 15403 22.6 MF
GO:0005185 neurohypophyseal hormone activity 0.0020 2 682 2 15403 22.6 MF
GO:0016642 oxidoreductase activity, acting on the CH-NH2 group of donors, disulfide as acceptor 0.0020 2 682 2 15403 22.6 MF
GO:0016638 oxidoreductase activity, acting on the CH-NH2 group of donors 0.0035 4 682 15 15403 6.0 MF
GO:0050896 response to stimulus 0.0040 25 682 318 15403 1.8 BP
GO:0005730 nucleolus 0.0057 2 682 3 15403 15.1 CC
GO:0006281 DNA repair 0.0059 10 682 89 15403 2.5 BP
GO:0034984 cellular response to DNA damage stimulus 0.0059 10 682 89 15403 2.5 BP
GO:0004091 carboxylesterase activity 0.0060 5 682 27 15403 4.2 MF
GO:0006974 response to DNA damage stimulus 0.0064 10 682 90 15403 2.5 BP
GO:0006950 response to stress 0.0077 17 682 200 15403 1.9 BP
GO:0004623 phospholipase A2 activity 0.0082 3 682 10 15403 6.8 MF
GO:0003006 reproductive developmental process 0.0082 3 682 10 15403 6.8 BP
GO:0007548 sex differentiation 0.0082 3 682 10 15403 6.8 BP
GO:0033554 cellular response to stress 0.0093 10 682 95 15403 2.4 BP
Supplementary Table 35. GO enrichment of tongue sole Z-specific (Z-S) genes (P
value<0.01, Fisher exact test).
GO Terms   Definition   P value   Z-S genes in GO   All Z-S genes   All genes in GO   All genes   Enrich rate   Ontology
GO:0006281 DNA repair 0.00003 9 275 89 15403 5.7 BP
GO:0034984 cellular response to DNA damage stimulus 0.00003 9 275 89 15403 5.7 BP
GO:0006974 response to DNA damage stimulus 0.00003 9 275 90 15403 5.6 BP
GO:0033554 cellular response to stress 0.00005 9 275 95 15403 5.3 BP
GO:0050896 response to stimulus 0.00006 17 275 318 15403 3.0 BP
GO:0051716 cellular response to stimulus 0.00006 9 275 98 15403 5.1 BP
GO:0006259 DNA metabolic process 0.00011 12 275 183 15403 3.7 BP
GO:0004375 glycine dehydrogenase (decarboxylating) activity 0.00032 2 275 2 15403 56.0 MF
GO:0005185 neurohypophyseal hormone activity 0.00032 2 275 2 15403 56.0 MF
GO:0016642 oxidoreductase activity, acting on the CH-NH2 group of donors, disulfide as acceptor 0.00032 2 275 2 15403 56.0 MF
GO:0007548 sex differentiation 0.00062 3 275 10 15403 16.8 BP
GO:0003006 reproductive developmental process 0.00062 3 275 10 15403 16.8 BP
GO:0005730 nucleolus 0.00094 2 275 3 15403 37.3 CC
GO:0006950 response to stress 0.00095 11 275 200 15403 3.1 BP
GO:0005184 neuropeptide hormone activity 0.00186 2 275 4 15403 28.0 MF
GO:0008009 chemokine activity 0.00377 3 275 18 15403 9.3 MF
GO:0042379 chemokine receptor binding 0.00377 3 275 18 15403 9.3 MF
GO:0000003 reproduction 0.00512 3 275 20 15403 8.4 BP
GO:0022414 reproductive process 0.00512 3 275 20 15403 8.4 BP
GO:0001664 G-protein-coupled receptor binding 0.00512 3 275 20 15403 8.4 MF
GO:0003887 DNA-directed DNA polymerase activity 0.00512 3 275 20 15403 8.4 MF
GO:0034061 DNA polymerase activity 0.00590 3 275 21 15403 8.0 MF
GO:0006955 immune response 0.00607 6 275 92 15403 3.7 BP
GO:0004659 prenyltransferase activity 0.00629 2 275 7 15403 16.0 MF
GO:0002376 immune system process 0.00674 6 275 94 15403 3.6 BP
Supplementary Table 36. GO enrichment of orthologous Z genes between chicken
and tongue sole (P value<0.01, Fisher exact test). We aligned tongue sole genes to
chicken genes using BLASTP (E-value<1e-5) and identified reciprocal best orthologues
between tongue sole Z and chicken Z. Then we performed GO enrichment of these
orthologous genes on tongue sole Z.
GO Terms   Definition   P value   Orthologous genes in GO   All orthologous genes   All genes in GO   All genes   Enrich rate   Ontology
GO:0007548 sex differentiation 0.0001 3 127 10 15403 36.4 BP
GO:0003006 reproductive developmental process 0.0001 3 127 10 15403 36.4 BP
GO:0000003 reproduction 0.0006 3 127 20 15403 18.2 BP
GO:0022414 reproductive process 0.0006 3 127 20 15403 18.2 BP
GO:0006281 DNA repair 0.0008 5 127 89 15403 6.8 BP
GO:0034984 cellular response to DNA damage stimulus 0.0008 5 127 89 15403 6.8 BP
GO:0006974 response to DNA damage stimulus 0.0009 5 127 90 15403 6.7 BP
GO:0033554 cellular response to stress 0.0011 5 127 95 15403 6.4 BP
GO:0051716 cellular response to stimulus 0.0013 5 127 98 15403 6.2 BP
GO:0006950 response to stress 0.0014 7 127 200 15403 4.2 BP
GO:0005739 mitochondrion 0.0036 5 127 124 15403 4.9 CC
GO:0008152 metabolic process 0.0042 47 127 4017 15403 1.4 BP
GO:0008892 guanine deaminase activity 0.0083 1 127 1 15403 121.3 MF
GO:0008410 CoA-transferase activity 0.0083 1 127 1 15403 121.3 MF
GO:0043566 structure-specific DNA binding 0.0085 2 127 17 15403 14.3 MF
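The reciprocal-best-orthologue step described in the caption of Supplementary Table 36 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes BLASTP hits are available as (query, subject, E-value, bit score) tuples, keeps hits with E-value<1e-5, takes the highest-scoring hit per query as the best hit, and reports pairs that are each other's best hit. The gene identifiers and function names are hypothetical.

```python
def best_hits(rows, max_evalue=1e-5):
    """Map each query to its highest-scoring subject among hits with E-value < max_evalue."""
    best = {}
    for query, subject, evalue, bitscore in rows:
        if evalue >= max_evalue:
            continue  # hit does not pass the E-value cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-5):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    a_to_b = best_hits(a_vs_b, max_evalue)
    b_to_a = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


# Toy input with made-up identifiers (not real gene IDs).
sole_vs_chicken = [
    ("sole_g1", "chk_g1", 1e-50, 200.0),
    ("sole_g1", "chk_g2", 1e-20, 90.0),
    ("sole_g2", "chk_g2", 1e-30, 120.0),
    ("sole_g3", "chk_g1", 1e-3, 30.0),  # fails the E-value cutoff
]
chicken_vs_sole = [
    ("chk_g1", "sole_g1", 1e-48, 195.0),
    ("chk_g2", "sole_g1", 1e-25, 100.0),
    ("chk_g2", "sole_g2", 1e-22, 95.0),
]
pairs = reciprocal_best_hits(sole_vs_chicken, chicken_vs_sole)
print(pairs)
```

In the toy input only sole_g1 and chk_g1 are each other's best hit, so a single pair is reported.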
Supplementary Table 37. GO enrichment of tongue sole W genes (P value<0.01,
Fisher exact test).
GO Terms   Definition   P value   W genes in GO   All W genes   All genes in GO   All genes   Enrich rate   Ontology
GO:0015103 inorganic anion transmembrane transporter activity 0.0012 4 234 31 15403 8.5 MF
GO:0016192 vesicle-mediated transport 0.0024 7 234 120 15403 3.8 BP
GO:0015301 anion:anion antiporter activity 0.0043 3 234 22 15403 9.0 MF
GO:0005452 inorganic anion exchanger activity 0.0043 3 234 22 15403 9.0 MF
GO:0015108 chloride transmembrane transporter activity 0.0043 3 234 22 15403 9.0 MF
GO:0015106 bicarbonate transmembrane transporter activity 0.0043 3 234 22 15403 9.0 MF
GO:0015380 anion exchanger activity 0.0043 3 234 22 15403 9.0 MF
GO:0009266 response to temperature stimulus 0.0095 2 234 10 15403 13.2 BP
GO:0009408 response to heat 0.0095 2 234 10 15403 13.2 BP
Supplementary Table 38. Fisher’s exact test for compensated (comp)/uncompensated
(uncomp) Z genes in tongue sole and zebra finch.
Zebra finch   Tongue sole Comp   Tongue sole Uncomp   Fisher’s exact test
Zebra finch(d1)
Comp 5 9 0.1007
Uncomp 17 9
Zebra finch(d25)
Comp 9 9 0.7504
Uncomp 13 9
Zebra finch(d45)
Comp 6 10 0.1064
Uncomp 16 8
Zebra finch(adult)
Comp 7 9 0.3345
Uncomp 15 9
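As a check on how the values in the last column of Supplementary Table 38 can be obtained, the sketch below implements a two-sided Fisher's exact test for a 2x2 table using exact integer arithmetic. It is an illustrative implementation, with a hypothetical function name, that sums the probabilities of all tables with the same margins that are no more likely than the observed one. Applied to the zebra finch (d1) counts it gives a P value close to the tabulated 0.1007.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(x: int) -> int:
        # Unnormalised hypergeometric probability of a table with top-left cell x.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum all tables that are no more probable than the observed one.
    extreme = sum(weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed)
    return extreme / comb(n, col1)


# Zebra finch (d1): comp row 5, 9; uncomp row 17, 9.
print(round(fisher_exact_two_sided(5, 9, 17, 9), 4))
```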
Supplementary Table 39. Fisher’s exact test for compensated (comp)/uncompensated
(uncomp) Z genes in tongue sole and chicken.
Chicken   Tongue sole Comp   Tongue sole Uncomp   Fisher’s exact test
Chicken(heart)
Comp 19 28 0.3206
Uncomp 27 26
Chicken(brain)
Comp 19 20 0.6858
Uncomp 27 34
Chicken(liver)
Comp 24 30 0.841
Uncomp 22 24
Supplementary Table 40. Sex reversal rate of different families including the
pseudomale families, normal families and temperature-induced families.
Group   #Family   #Sample   Genetic sex (♀/♂)   Ratio of genotype (♀)   Physical sex (♀/♂)   Ratio of phenotype (♀)   Ratio of sex reversal
Pseudo-male family 3 87 35/52 0.4 1/86 0.0116 0.9714
6 87 39/48 0.448 5/82 0.057 0.872
56 54 24/30 0.44 0/54 0 1
60 67 48/19 0.71 1/66 0.0149 0.979
78 96 58/38 0.604 5/91 0.052 0.913
Total 391 204/187 12/379 0.9412
Normal family 5 58 27/31 0.465 24/34 0.413 0.111
38 44 18/26 0.409 13/31 0.295 0.277
39 102 45/57 0.441 45/57 0.441 0
57 75 36/39 0.48 31/44 0.413 0.139
65 168 61/107 0.363 50/118 0.297 0.1803
Total 447 187/260 163/284 0.14146
Temperature-induced family 1 90 39/51 0.433 14/76 0.156 0.641
2 90 34/56 0.377 7/83 0.07 0.794
3 87 48/39 0.555 6/81 0.068 0.875
4 52 32/26 0.615 7/45 0.134 0.865
5 70 35/35 0.5 15/55 0.21 0.57
Total 389 188/201 49/340 0.734
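The per-family sex reversal ratios in Supplementary Table 40 are consistent with the fraction of genetic females that do not develop as phenotypic females. The sketch below shows that calculation under the assumption, not stated in the table, that every phenotypic female is also a genetic female; the function name is illustrative. The "Total" rows do not all follow from pooling counts in this way, so the example is limited to single families.

```python
def sex_reversal_ratio(genetic_females: int, phenotypic_females: int) -> float:
    """Fraction of genetic (ZW) females that are not phenotypic females."""
    if genetic_females <= 0:
        raise ValueError("genetic_females must be positive")
    return (genetic_females - phenotypic_females) / genetic_females


# Pseudo-male family 3: 35 genetic females, 1 phenotypic female.
print(round(sex_reversal_ratio(35, 1), 4))
# Normal family 65: 61 genetic females, 50 phenotypic females.
print(round(sex_reversal_ratio(61, 50), 4))
```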
Supplementary Table 41. Sex ratio of offspring in the pseudomale families and
normal families.
#Family   #Genetic females   #Total   Ratio of genetic females
Normal family
5# 27 58 46.55%
28# 75 184 40.76%
30# 29 55 52.73%
38# 49 102 48.04%
39# 68 163 41.72%
40# 49 88 55.68%
44# 67 161 41.61%
57# 80 159 50.31%
61# 100 156 64.10%
69# 30 58 51.72%
total 574 1184 48.48%
Pseudomale family (2010)
2# 72 156 46.16%
3# 35 87 40.23%
4# 216 384 56.25%
6# 39 87 44.83%
7# 77 152 50.66%
9# 36 57 63.20%
13# 48 96 50.00%
56# 68 138 49.28%
60# 168 236 71.19%
Pseudomale family (2011)
7# 32 64 50%
18# 27 58 46.60%
33# 35 79 44.30%
49# 32 60 53.30%
50# 30 60 50%
56# 50 98 51%
total 965 1812 53.26%
Supplementary Table 42. Paternal inheritance of the Z chromosome in three WZ
pseudomale families determined by microsatellite analysis. We selected Z-specific
microsatellite markers to genotype the parents (pseudomale and normal female) of
pseudomale families. For families in which the parental genotypes differed, we then
genotyped the F1 individuals. Microsatellite analysis of the F1 individuals showed that
in about 84%-90% of ZW individuals the Z chromosome was inherited from the
pseudomale.
#Family genotype ZZ ZW
4# marker cyse548 cyse282 shared cyse548 cyse282 shared
homozygous 0 0 0 33 33 31
heterozygous 50 50 50 3 2 1
No Result 0 0 0 0 1 0
total 50 50 50 36 36 32
Ratio of Z inheritance of pseudomale: 90%
6# marker cyse188 cyse203 shared cyse188 cyse203 shared
homozygous 11 8 7 64 63 63
heterozygous 3 6 1 8 9 8
No Result 0 0 0 0 0 0
total 14 14 8 72 72 71
Ratio of Z inheritance of pseudomale: 86%
20# marker cyse054 cyse167 shared cyse054 cyse167 shared
homozygous 0 0 0 30 28 28
heterozygous 49 47 47 7 7 7
No Result 0 2 0 0 2 0
total 49 49 47 37 37 32
Ratio of Z inheritance of pseudomale: 84%
Supplementary Table 43. Characterization and expression of sex-related genes in
tongue sole. ZW_f1, whole body female (pre-); ZW_f2, whole body female (post-);
ZW_f3, ZW ovary F1; ZW_m1, ZW testis F1; ZW_m2, ZW testis F2; ZZ_m1, whole
body male; ZZ_m2, ZZ testis P.
Gene name Gene ID Chr. Start End ZW_f1 ZW_f2 ZW_f3 ZW_m1 ZW_m2 ZZ_m1 ZZ_m2
amh Cse_R020243 chr2 17672760 17675590 2.56 0.89 0.11 14.44 11.23 1.08 150.3
amhr2 Cse_R019270 chr11 12363596 12365593 0 0 1.39 7.06 12.3 0 31.82
aqp1 Cse_R010791 chr20 8760186 8761902 41.56 26.83 0 2.26 201.28 49 26.62
aqp1o Cse_R010765 chr20 8766748 8768041 1.55 0 828.88 39.33 0 0.97 9.44
arx Cse_R018649 chr3 5193761 5199755 8.5 7.78 0.24 1.51 0 9.76 2
atrx Cse_R006810 chr15 10672523 10695167 35.75 35.68 34.86 44.31 62.18 42.37 44.6
cbx2 Cse_R008247 chr17 567183 571221 18.48 8.91 55.35 49.87 24 16.56 24.05
cd220 Cse_R018728 chr20 9612452 9641156 22.4 27.18 1.64 4.33 3.51 16.62 22.2
cd220 Cse_R009842 chr2 825861 860099 13.36 8.62 1.59 3.49 4.58 5.97 6.18
ctnnb1 Cse_R005015 chr13 6042544 6047398 153.56 128.19 82.98 107.41 97.91 178.64 68.2
cxcr4a Cse_R019874 chr20 5928026 5930142 4.33 1.08 0.17 5.87 8.9 3.33 1.94
cyp19a1a Cse_R012562 chr5 4551572 4554029 0 0 1.03 2.31 6.26 0.1 8.11
cyp19a1b Cse_R021368 chr6 16298965 16303214 1.86 2.62 0 0 0.61 2.33 0.78
dax1 Cse_R021479 chr16 4687139 4688694 4.41 2.55 3.5 3.69 3.34 5.09 16.96
daz1 Cse_R008770 chr18 8407158 8408584 0 0.5 20.9 188.1 312.51 0 212.57
dhh Cse_R002534 chr10 9735734 9739753 1.96 4.42 0 1.6 2.9 1.11 20.74
dkk1 Cse_R004204 chr12 11745215 11747451 2.92 2.68 0.19 1.88 19.97 4.05 0.27
dkk2 Cse_R020696 chr9 4865191 4877866 2.06 4.95 0 0 0.54 3.3 2.43
dmrt1 Cse_R022120 chrZ 8547598 8568446 0 0 0 39.56 10.84 0 71.17
emx2 Cse_R021546 chr12 424796 428876 11.45 14.73 0.7 1.19 2.15 15.14 8.08
fgf20 Cse_R006206 chr15 14686469 14687686 6.01 2.26 4.43 14.35 27.05 1.51 12.1
fgf20 Cse_R015779 chr9 12042006 12044193 0.44 0.33 0.25 0.32 0.29 0.67 0.56
figalpha Cse_R022133 chrZ 1768621 1769856 0 0 0 0.7 0.32 0 0.23
figalpha Cse_R016079 chrW 14879739 14885716 0 0 34.23 2.97 11.26 0 0
follistatin Cse_R017224 chrZ 9741019 9742894 7.66 8.37 0.8 0.76 1.6 8.2 4.54
foxl2 Cse_R021526 chr4 14132731 14133654 1.32 5.19 3.02 1.67 1.08 8.73 2.77
gata4 Cse_R013761 chr7 3907315 3913489 1.7 14.67 0.33 6.79 4.64 0.43 7.98
gsdf Cse_R005415 chr14 21583496 21584102 1.54 1.93 18 318.7 327.1 0 597.83
lgr8 Cse_R009697 chr19 2894064 2924937 0.99 1.81 0.33 1.13 1.21 0.64 4.41
lhx9 Cse_R009838 chr2 3444511 3454533 23.03 18.08 1.37 18.06 13.72 42.27 21.81
lim1 Cse_R011784 chr4 2061610 2068877 11.82 4.73 0.45 0.38 1.55 11.31 1.22
patched1 Cse_R016021 chrW 14123173 14205242 3.8 4.33 0.47 0.64 0.67 0 0
patched1 Cse_R018277 Z_scaffold1319 13876 113068 2.01 4.05 0.17 0.84 0.99 3.89 3.66
pax2a Cse_R004397 chr12 459096 485494 4.46 1.67 0 0.44 0.27 3.25 2.47
pax2b Cse_R014998 chr8 20343835 20371684 3.44 1.86 0 0 0.13 3.25 0.24
pdgfrb Cse_R006353 chr15 2952085 2971257 8.65 10.32 3.55 4.36 7.52 6.61 20.47
pgd2 Cse_R011327 chr3 2242183 2243193 445.36 880.88 4.77 8.85 10.9 531.75 320.5
pod1 Cse_R018196 scaffold1282 223127 224338 4.06 6.96 57.81 369.28 196.46 0.58 346.6
rspo1 Cse_F003435 chr13 2856464 2866399 2.53 7.22 0.41 1.55 2.36 2.61 4.13
rspo2 Cse_R004637 chr13 13688188 13739758 5.47 6.56 0.48 0 0 3.06 0.35
rspo3 Cse_R005061 chr13 11648629 11653403 2 2.4 0.92 0.29 0.26 3.4 0.5
rspo4 Cse_R003828 chr11 4270345 4275630 1.81 1.36 1.73 0.44 0 5.74 2.03
sdf1a Cse_R018932 chr9 15011785 15015943 3.27 17.19 0 0 23.6 3.27 59.12
sf-1 Cse_R016234 chrW 3519616 3570157 0.91 0.68 0 0.5 1.94 0 0
sf-1 Cse_R017371 chrZ 8587385 8596522 0.64 0.48 0 3.07 5 1.27 20.47
sf-1 Cse_R005325 chr14 12590621 12593382 0 0 0 0 0.6 0 0
sox8 Cse_R007785 chr17 13747785 13749575 22.79 16.79 2.12 1.73 0.57 18.37 5.2
sox9a Cse_R008386 chr17 13941510 13944875 15.48 28.5 0 3.54 6.68 22.62 13.11
sox9b Cse_R014685 chr8 12734601 12736657 5.99 12.86 0 0.25 0.34 5.06 2.02
srd5a1 Cse_R008713 chr18 11947508 11950539 17.53 8.87 6.78 3.6 1 16.41 9.45
srd5a2 Cse_R015835 chr9 2972058 2973152 7.52 5.95 0.68 10.64 16.37 6.74 30.14
vasa Cse_R005517 chr14 8303652 8305695 0 0.35 68.01 99.6 83.28 0 135.5
wnt4a Cse_R019600 chr11 10692050 10697716 0.29 0.66 0 0.21 0.19 1.32 0.74
wnt4b Cse_F007376 chr13 2091153 2093238 2.61 1.06 0 0.17 0 1.59 0
wt1a Cse_R013026 chr6 2198125 2215107 1.6 3.61 3.83 22.86 19.43 0.13 35.45
wt1b Cse_R012523 chr5 7208403 7211392 0 0.22 1.17 5.71 2.87 0 7.1
zp3a Cse_R010502 chr20 11812163 11816744 0 0 2010.07 130.69 0 0 0.68
zp3b Cse_R018460 chr17 6833492 6835390 0 0.29 4474.61 302.65 1.16 0.29 2.06
Supplementary Table 44. Differentially expressed genes between normal female
ovaries and pseudomale testes. (see Excel file ‘Supplementary Table 44.xls’)
Supplementary Table 45. GO enrichment by DEGs up-regulated in normal female
ovaries.
GO_ID GO_Term GO_Class P-value Gene number
GO:2000194 regulation of female gonad development BP 3.88E-07 6
GO:0002081 outer acrosomal membrane CC 3.88E-07 5
GO:2000368 positive regulation of acrosomal vesicle exocytosis BP 3.88E-07 5
GO:0001809 positive regulation of type IV hypersensitivity BP 3.88E-07 5
GO:2000388 positive regulation of antral ovarian follicle growth BP 3.88E-07 5
GO:0010513 positive regulation of phosphatidylinositol biosynthetic process BP 3.88E-07 5
GO:2000386 positive regulation of ovarian follicle development BP 3.88E-07 5
GO:0035803 egg coat formation BP 1.15E-06 5
GO:0032190 acrosin binding MF 2.51E-06 5
GO:2000360 negative regulation of binding of sperm to zona pellucida BP 2.51E-06 5
GO:0071421 manganese ion transmembrane transport BP 4.75E-06 5
GO:0048599 oocyte development BP 7.52E-06 8
GO:0032753 positive regulation of interleukin-4 production BP 7.52E-06 5
GO:0005384 manganese ion transmembrane transporter activity MF 7.52E-06 5
GO:0090280 positive regulation of calcium ion import BP 1.22E-05 5
GO:2000344 positive regulation of acrosome reaction BP 1.76E-05 5
GO:0002455 humoral immune response mediated by circulating immunoglobulin BP 6.49E-05 6
GO:0032236 positive regulation of calcium ion transport via store-operated calcium channel activity BP 1.21E-04 5
GO:0002922 positive regulation of humoral immune response BP 1.21E-04 5
GO:0032729 positive regulation of interferon-gamma production BP 2.05E-04 5
GO:0070528 protein kinase C signaling cascade BP 3.17E-04 5
GO:0001825 blastocyst formation BP 3.30E-04 6
GO:0005771 multivesicular body CC 3.81E-04 5
GO:0016064 immunoglobulin mediated immune response BP 4.14E-04 7
GO:0032609 interferon-gamma production BP 4.20E-04 6
GO:0045921 positive regulation of exocytosis BP 5.34E-04 6
GO:0002687 positive regulation of leukocyte migration BP 9.81E-04 6
GO:2000242 negative regulation of reproductive process BP 1.23E-03 6
GO:0002250 adaptive immune response BP 1.40E-03 9
GO:0046545 development of primary female sexual characteristics BP 1.40E-03 9
GO:2001257 regulation of cation channel activity BP 1.67E-03 6
GO:0005529 sugar binding MF 1.86E-03 10
GO:0007338 single fertilization BP 2.33E-03 6
GO:0022602 ovulation cycle process BP 2.74E-03 8
GO:0046889 positive regulation of lipid biosynthetic process BP 2.74E-03 6
GO:0046943 carboxylic acid transmembrane transporter activity MF 3.54E-03 9
GO:0061039 ovum-producing ovary development BP 5.07E-03 7
GO:0048924 posterior lateral line neuromast mantle cell differentiation BP 5.07E-03 2
GO:0015291 secondary active transmembrane transporter activity MF 5.55E-03 12
GO:0001817 regulation of cytokine production BP 7.93E-03 12
GO:0008037 cell recognition BP 8.00E-03 7
GO:0002819 regulation of adaptive immune response BP 8.48E-03 6
GO:0015293 symporter activity MF 9.63E-03 10
GO:0017157 regulation of exocytosis BP 9.63E-03 7
GO:0010389 regulation of G2/M transition of mitotic cell cycle BP 9.96E-03 4
GO:0034381 plasma lipoprotein particle clearance BP 9.96E-03 4
GO:0002443 leukocyte mediated immunity BP 1.10E-02 8
GO:0001541 ovarian follicle development BP 1.10E-02 6
GO:0034764 positive regulation of transmembrane transport BP 1.10E-02 6
GO:0046890 regulation of lipid biosynthetic process BP 1.13E-02 7
GO:0002526 acute inflammatory response BP 1.15E-02 6
GO:0002703 regulation of leukocyte mediated immunity BP 1.15E-02 6
GO:0046486 glycerolipid metabolic process BP 1.17E-02 12
GO:0051928 positive regulation of calcium ion transport BP 1.20E-02 6
GO:0005615 extracellular space CC 1.35E-02 19
GO:0042102 positive regulation of T cell proliferation BP 1.47E-02 5
GO:0002699 positive regulation of immune effector process BP 1.47E-02 6
GO:0045017 glycerolipid biosynthetic process BP 1.47E-02 8
GO:0045137 development of primary sexual characteristics BP 1.59E-02 10
GO:0022804 active transmembrane transporter activity MF 1.65E-02 14
GO:0002252 immune effector process BP 1.72E-02 11
GO:0002697 regulation of immune effector process BP 1.85E-02 8
GO:0051897 positive regulation of protein kinase B signaling cascade BP 1.99E-02 5
GO:0005215 transporter activity MF 2.24E-02 33
GO:0022857 transmembrane transporter activity MF 2.32E-02 28
GO:0015101 organic cation transmembrane transporter activity MF 2.35E-02 3
GO:0022891 substrate-specific transmembrane transporter activity MF 2.68E-02 26
GO:0030246 carbohydrate binding MF 2.82E-02 13
GO:0042461 photoreceptor cell development BP 2.82E-02 5
GO:0050727 regulation of inflammatory response BP 2.90E-02 7
GO:0010876 lipid localization BP 3.01E-02 10
GO:0015171 amino acid transmembrane transporter activity MF 4.08E-02 6
GO:0050900 leukocyte migration BP 4.14E-02 9
GO:0004185 serine-type carboxypeptidase activity MF 4.25E-02 2
GO:0022892 substrate-specific transporter activity MF 4.28E-02 28
GO:0003006 developmental process involved in reproduction BP 4.28E-02 13
GO:0006869 lipid transport BP 4.88E-02 9
GO:0006641 triglyceride metabolic process BP 4.88E-02 6
GO:0043491 protein kinase B signaling cascade BP 4.88E-02 6
Supplementary Table 46. GO enrichment by DEGs up-regulated in pseudomale
testes.
GO_ID GO_Term GO_Class P-value Gene Number
GO:0005874 microtubule CC 8.80E-07 27
GO:0005929 cilium CC 1.33E-06 20
GO:0006928 cellular component movement BP 7.88E-03 58
GO:0044463 cell projection part CC 7.88E-03 36
GO:0004111 creatine kinase activity MF 7.88E-03 4
GO:0005827 polar microtubule CC 7.88E-03 4
GO:0030215 semaphorin receptor binding MF 7.88E-03 5
GO:0009434 microtubule-based flagellum CC 7.88E-03 6
GO:0005930 axoneme CC 8.93E-03 9
GO:0007286 spermatid development BP 1.19E-02 9
GO:0016775 phosphotransferase activity, nitrogenous group as acceptor MF 1.20E-02 6
GO:0044430 cytoskeletal part CC 1.20E-02 51
GO:0035639 purine ribonucleoside triphosphate binding MF 1.20E-02 80
GO:0008054 cyclin catabolic process BP 1.20E-02 4
GO:0005856 cytoskeleton CC 1.20E-02 66
GO:0032555 purine ribonucleotide binding MF 1.22E-02 81
GO:0007283 spermatogenesis BP 1.22E-02 18
GO:0019861 flagellum CC 1.32E-02 7
GO:0001831 trophectodermal cellular morphogenesis BP 1.74E-02 4
GO:0031463 Cul3-RING ubiquitin ligase complex CC 1.74E-02 4
GO:0072014 proximal tubule development BP 1.74E-02 4
GO:0019953 sexual reproduction BP 1.75E-02 25
GO:0005509 calcium ion binding MF 1.85E-02 37
GO:0030332 cyclin binding MF 1.89E-02 5
GO:0042995 cell projection CC 1.89E-02 55
GO:0030240 skeletal muscle thin filament assembly BP 2.13E-02 4
GO:0008603 cAMP-dependent protein kinase regulator activity MF 2.21E-02 5
GO:0005524 ATP binding MF 2.38E-02 66
GO:0032559 adenyl ribonucleotide binding MF 2.38E-02 67
GO:0006600 creatine metabolic process BP 2.38E-02 4
GO:0000090 mitotic anaphase BP 2.38E-02 4
GO:0072019 proximal convoluted tubule development BP 2.86E-02 3
GO:0015293 symporter activity MF 2.95E-02 14
GO:0001829 trophectodermal cell differentiation BP 2.95E-02 5
GO:0035024 negative regulation of Rho protein signal transduction BP 3.47E-02 4
GO:0048870 cell motility BP 3.47E-02 44
GO:0004467 long-chain fatty acid-CoA ligase activity MF 3.47E-02 4
GO:0001822 kidney development BP 3.47E-02 16
GO:0001539 ciliary or flagellar motility BP 3.47E-02 5
GO:0044441 cilium part CC 3.47E-02 9
GO:0004053 arginase activity MF 3.47E-02 2
GO:0010963 regulation of L-arginine import BP 3.47E-02 2
GO:0035379 carbon dioxide transmembrane transporter activity MF 3.47E-02 2
GO:0072230 metanephric proximal straight tubule development BP 3.47E-02 2
GO:0072220 metanephric descending thin limb development BP 3.47E-02 2
GO:0020003 symbiont-containing vacuole CC 3.47E-02 2
GO:0085018 maintenance of symbiont-containing vacuole via substance secreted by host BP 3.47E-02 2
GO:0072232 metanephric proximal convoluted tubule segment 2 development BP 3.47E-02 2
GO:0017111 nucleoside-triphosphatase activity MF 4.10E-02 40
GO:0035085 cilium axoneme CC 4.17E-02 6
GO:0032501 multicellular organismal process BP 4.17E-02 186
GO:0015291 secondary active transmembrane transporter activity MF 4.17E-02 16
GO:0015630 microtubule cytoskeleton CC 4.69E-02 35
GO:0016459 myosin complex CC 4.69E-02 8
GO:0043292 contractile fiber CC 4.99E-02 13
Supplementary Table 47. Sex-biased GO.
GO_ID GO_Term GO_Class ovary_expression testis_expression Pvalue
GO:0007340 acrosome reaction BP 62.09 2.51 7.08E-09
GO:0009988 cell-cell recognition BP 41.96 2.23 1.22E-07
GO:0030332 cyclin binding MF 3.65 35.29 4.32E-05
GO:0070528 protein kinase C signaling cascade BP 20.63 2.19 5.21E-05
GO:0005771 multivesicular body CC 36.55 4.71 2.19E-04
GO:0008603 cAMP-dependent protein kinase regulator activity MF 2.34 16.96 3.53E-04
GO:0000313 organellar ribosome CC 104.18 17.36 1.23E-03
GO:0005761 mitochondrial ribosome CC 104.18 17.36 1.23E-03
GO:0002920 regulation of humoral immune response BP 15.81 2.86 2.03E-03
GO:0001539 ciliary or flagellar motility BP 2.38 12.15 3.30E-03
GO:0035085 cilium axoneme CC 2.41 12.08 3.61E-03
GO:0006270 DNA-dependent DNA replication initiation BP 35.76 7.32 4.24E-03
GO:0009994 oocyte differentiation BP 29.48 6.15 4.72E-03
GO:0048599 oocyte development BP 29.48 6.15 4.72E-03
GO:0005930 axoneme CC 2.51 11.47 6.19E-03
GO:0032633 interleukin-4 production BP 18.49 4.10 6.59E-03
GO:0044447 axoneme part CC 2.35 10.26 7.78E-03
GO:0070206 protein trimerization BP 2.45 10.49 8.74E-03
GO:0021591 ventricular system development BP 3.00 12.74 9.13E-03
GO:0043049 otic placode formation BP 1.43 5.99 9.99E-03
GO:0009434 microtubule-based flagellum CC 3.95 16.43 1.02E-02
GO:0044241 lipid digestion BP 1.42 5.91 1.03E-02
GO:0006364 rRNA processing BP 91.74 22.57 1.14E-02
GO:0045921 positive regulation of exocytosis BP 16.15 4.01 1.20E-02
GO:0006271 DNA strand elongation involved in DNA replication BP 44.79 11.15 1.22E-02
GO:0042254 ribosome biogenesis BP 96.38 24.11 1.25E-02
GO:0060986 endocrine hormone secretion BP 2.01 7.95 1.30E-02
GO:0030916 otic vesicle formation BP 1.38 5.47 1.31E-02
GO:0022616 DNA strand elongation BP 40.80 10.40 1.37E-02
GO:0016460 myosin II complex CC 3.52 13.54 1.51E-02
GO:0016775 phosphotransferase activity, nitrogenous group as acceptor MF 2.69 10.33 1.51E-02
GO:0044452 nucleolar part CC 64.56 16.87 1.55E-02
GO:0015669 gas transport BP 3.22 12.31 1.56E-02
GO:0016072 rRNA metabolic process BP 81.15 22.09 1.90E-02
GO:0032350 regulation of hormone metabolic process BP 2.71 9.62 2.21E-02
GO:0000387 spliceosomal snRNP assembly BP 111.25 31.60 2.32E-02
GO:0002076 osteoblast development BP 2.10 7.38 2.36E-02
GO:0009125 nucleoside monophosphate catabolic process BP 2.83 9.86 2.42E-02
GO:0009651 response to salt stress BP 1.87 6.41 2.63E-02
GO:0002455 humoral immune response mediated by circulating immunoglobulin BP 8.70 2.57 2.81E-02
GO:0005201 extracellular matrix structural constituent MF 1.83 6.16 2.86E-02
GO:0040023 establishment of nucleus localization BP 4.85 16.30 2.87E-02
GO:0030553 cGMP binding MF 1.36 4.55 2.95E-02
GO:0048477 oogenesis BP 21.71 6.67 3.32E-02
GO:0007512 adult heart development BP 1.73 5.52 3.65E-02
GO:2001259 positive regulation of cation channel activity BP 18.04 5.67 3.69E-02
GO:0030532 small nuclear ribonucleoprotein complex CC 112.76 35.57 3.75E-02
GO:0032649 regulation of interferon-gamma production BP 10.28 3.26 3.82E-02
GO:0032609 interferon-gamma production BP 9.72 3.10 3.92E-02
GO:0015172 acidic amino acid transmembrane transporter activity MF 1.78 5.57 3.94E-02
GO:0005313 L-glutamate transmembrane transporter activity MF 1.78 5.57 3.94E-02
GO:0017158 regulation of calcium ion-dependent exocytosis BP 11.79 3.78 4.05E-02
GO:0030551 cyclic nucleotide binding MF 1.77 5.50 4.08E-02
GO:0008173 RNA methyltransferase activity MF 27.25 8.78 4.11E-02
GO:0030315 T-tubule CC 1.55 4.77 4.31E-02
GO:0050886 endocrine process BP 2.10 6.45 4.31E-02
GO:0002437 inflammatory response to antigenic stimulus BP 12.63 4.12 4.32E-02
GO:0044058 regulation of digestive system process BP 1.94 5.95 4.37E-02
GO:0001502 cartilage condensation BP 1.65 5.03 4.44E-02
GO:0008033 tRNA processing BP 28.55 9.37 4.44E-02
GO:0051537 2 iron, 2 sulfur cluster binding MF 25.53 8.38 4.45E-02
GO:0071599 otic vesicle development BP 1.88 5.73 4.47E-02
GO:0071600 otic vesicle morphogenesis BP 1.87 5.67 4.51E-02
GO:0032201 telomere maintenance via semi-conservative replication BP 37.27 12.30 4.55E-02
GO:0032011 ARF protein signal transduction BP 2.15 6.49 4.65E-02
GO:0007159 leukocyte cell-cell adhesion BP 2.29 6.90 4.66E-02
GO:0060174 limb bud formation BP 1.32 3.96 4.71E-02
GO:0019861 flagellum CC 4.78 14.35 4.75E-02
GO:0006261 DNA-dependent DNA replication BP 31.62 10.55 4.78E-02
GO:0005086 ARF guanyl-nucleotide exchange factor activity MF 2.31 6.87 4.93E-02
GO:0034470 ncRNA processing BP 43.42 14.67 4.95E-02
GO:0000146 microfilament motor activity MF 2.33 6.93 4.98E-02
Supplementary Table 48. Metabolism Pathway (KEGG) enriched by DEGs between
female ovaries and pseudomale testes.
KO_ID Pvalue Gene Num Description
ko01100 1.00E-02 43 Metabolic pathways
ko01110 3.79E-02 13 Biosynthesis of secondary metabolites
ko04020 5.08E-03 11 Calcium signaling pathway
ko04520 3.45E-04 9 Adherens junction
ko04114 4.77E-03 8 Oocyte meiosis
ko04910 6.16E-03 8 Insulin signaling pathway
ko04972 5.25E-04 8 Pancreatic secretion
ko05412 2.22E-03 8 Arrhythmogenic right ventricular cardiomyopathy (ARVC)
ko04540 4.03E-03 7 Gap junction
ko04723 1.30E-02 7 Retrograde endocannabinoid signaling
ko04724 2.61E-02 7 Glutamatergic synapse
ko04915 4.94E-03 7 Estrogen signaling pathway
ko04916 8.64E-03 7 Melanogenesis
ko05410 1.20E-02 7 Hypertrophic cardiomyopathy (HCM)
ko05414 2.15E-02 7 Dilated cardiomyopathy
ko03320 4.14E-03 6 PPAR signaling pathway
ko04713 4.57E-02 6 Circadian entrainment
ko04918 5.26E-03 6 Thyroid hormone synthesis
ko04970 4.67E-03 6 Salivary secretion
ko04920 2.03E-02 5 Adipocytokine signaling pathway
ko04971 1.36E-02 5 Gastric acid secretion
ko04974 2.03E-02 5 Protein digestion and absorption
ko00010 3.13E-02 4 Glycolysis / Gluconeogenesis
ko00561 1.77E-02 4 Glycerolipid metabolism
ko00601 1.97E-03 4 Glycosphingolipid biosynthesis - lacto and neolacto series
ko04070 4.18E-02 4 Phosphatidylinositol signaling system
ko04742 3.24E-03 4 Taste transduction
ko04913 4.18E-02 4 Ovarian steroidogenesis
ko04961 8.57E-03 4 Endocrine and other factor-regulated calcium reabsorption
ko04962 3.13E-02 4 Vasopressin-regulated water reabsorption
ko00051 2.46E-02 3 Fructose and mannose metabolism
ko00052 3.31E-02 3 Galactose metabolism
ko04964 5.27E-03 3 Proximal tubule bicarbonate reclamation
ko04973 2.08E-02 3 Carbohydrate digestion and absorption
ko00512 4.30E-02 2 Mucin type O-Glycan biosynthesis
ko00981 3.78E-02 1 Insect hormone biosynthesis
Supplementary Table 49. Data production and alignment statistics of smRNA-Seq.
Samples        raw reads    used reads   aligned      aligned rate (%)   used for miRNA prediction   miRNA reads number
ZW ovary F1    12,598,072   11,338,592   8,169,725    72.05              1,777,571                   630,479
ZW ovary F2    17,656,295   16,414,960   10,686,452   65.10              3,023,837                   872,401
ZW testis F1   12,377,773   11,487,874   7,871,839    68.52              3,125,824                   1,873,452
ZW testis F2   18,804,139   17,476,137   12,072,738   69.08              3,894,509                   1,380,666
ZZ testis P    19,931,625   19,182,781   12,868,705   67.08              7,826,868                   3,964,030
Supplementary Table 50. Differentially expressed miRNAs between female and
reversed male.
ID name ovary_exp testis_exp P-value
m0058 - 0.00 55.77 2.11E-02
m0059 - 0.00 37.66 2.94E-02
m0064 - 364.80 0.00 1.40E-02
m0108 miR-724 0.00 445.44 4.29E-03
m0120 miR-200a 0.00 224.53 1.02E-02
m0182 - 306.12 0.00 1.67E-02
m0212 miR-724 0.00 446.16 4.28E-03
m0248 miR-30c 0.00 170.93 1.37E-02
m0355 miR-7 0.00 816.27 1.44E-03
m0426 - 114.20 0.00 2.19E-02
m0435 miR-455 517.07 0.00 9.60E-03
m0753 - 0.00 47.08 2.35E-02
m0756 miR-125a 0.00 1176.24 6.47E-04
m0766 miR-30c 172.88 0.00 1.94E-02
m0837 miR-132 17.45 344.04 6.06E-02
m0838 miR-212 0.00 813.38 1.45E-03
m0875 miR-124 0.00 34.04 3.38E-02
m1020 miR-181b 0.00 175.28 1.33E-02
m1024 miR-101 0.00 5041.77 1.11E-05
m1123 - 0.00 20.28 8.86E-02
m1124 miR-153 0.00 19.56 9.55E-02
m1140 let-7 505.96 3879.29 5.35E-02
m1148 - 0.00 45.63 2.41E-02
m1151 miR-132 7.93 100.68 9.99E-02
m1260 - 0.00 24.63 5.99E-02
m1306 let-7 0.00 357389.12 7.93E-07
m1344 let-7i 0.00 4500.00 1.58E-05
m1367 - 0.00 68.81 1.98E-02
m1462 let-7i 0.00 4500.00 1.58E-05
m1503 let-7 211074.44 0.00 1.29E-07
m1533 - 39.65 1531.87 1.40E-02
m1567 - 0.00 203.53 1.13E-02
m1568 - 0.00 157.89 1.49E-02
m1595 - 870.77 0.00 5.05E-03
m1599 - 0.00 52.87 2.17E-02
m1614 - 870.77 0.00 5.05E-03
m1628 - 0.00 46.35 2.38E-02
m1633 - 0.00 63.01 2.02E-02
m1638 miR-22 0.00 7.97 2.43E-02
m1672 miR-219 20.92 0.00 7.42E-07
m1635 miR-455 517.07 0.00 9.60E-03
m1652 miR-130 0.00 1379.77 4.50E-04
m1667 - 0.00 159.34 1.47E-02
M1612 miR-27 486.93 60.12 0
m1704 miR-455 517.07 0.00 9.60E-03
m1713 miR-7 55.51 580.15 8.91E-02
m1752 - 880.28 0.00 4.97E-03
Supplementary Table 51. Comparison of assembled scaffolds and independently
finished 4 BACs of tongue sole genome. The scaffolds were aligned with the BACs
using BLASTN (E value<1e-20). The alignment blocks were then chained along the
BACs by SOLAR and also with manual confirmation.
BAC ID BAC Len.(bp) Coverage # of scaffold Scaffold Len(bp)
zscgax 127,603 0.97 1 1,146,347
zscgbx 145,278 0.97 1 1,989,243
zscgdx 165,379 0.96 1 1,989,243
zscgex 107,567 0.97 1 618,700
Supplementary Table 52. Comparison of assembled scaffolds and ESTs. We aligned
ESTs to scaffolds using BLAT with default parameters and chose the best hit for each
one.
Len.(bp)   Total    Total match        >=50% coverage     >=90% coverage
                    #        %         #        %         #        %
All        14,687   14,685   99.99     14,610   99.48     14,365   97.81
>200       14,629   14,622   99.95     14,547   99.44     14,307   97.8
>500       11,130   11,112   99.84     11,068   99.44     10,866   97.63
>1000      1        1        100       1        100       1        100
Supplementary Table 53. Enrichment of GO terms in expanded gene families of
tongue sole genome. The gene enrichment was compared to all the reference genes.
P-values were calculated by Fisher-exact test, and adjusted by FDR.
GO Term Description GO space Gene number P value P value (Adjusted)
GO:0007017 microtubule-based process BP 11 1.41E-12 5.85E-10
GO:0016051 carbohydrate biosynthetic process BP 7 1.65E-08 3.91E-06
GO:0007156 homophilic cell adhesion BP 11 2.09E-08 4.34E-06
GO:0008378 galactosyltransferase activity MF 6 1.35E-06 2.04E-04
GO:0006468 protein phosphorylation BP 24 1.69E-06 2.34E-04
GO:0005272 sodium channel activity MF 4 3.36E-06 3.98E-04
GO:0008146 sulfotransferase activity MF 7 3.44E-05 0.003
GO:0043687 post-translational protein modification BP 6 5.02E-05 0.005
GO:0008076 voltage-gated potassium channel complex CC 7 1.03E-04 0.009
GO:0005230 extracellular ligand-gated ion channel activity MF 6 1.98E-04 0.016
GO:0006486 protein glycosylation BP 6 2.98E-04 0.022
Note: BP, Biological Process; MF, Molecular Function; CC, Cellular Component.
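The enrichment statistics in Supplementary Tables 53 and 54 are described as Fisher exact tests with FDR adjustment. The following is a minimal sketch of that kind of calculation, not the authors' code. It assumes a one-sided (over-representation) test and the Benjamini-Hochberg procedure, neither of which is specified in the caption; the function names and argument layout are hypothetical.

```python
from math import comb


def fisher_right_tail(k, n, K, N):
    """One-sided Fisher exact P value, P(X >= k), for a hypergeometric X.

    k: genes in the family set annotated with the GO term
    n: genes in the family set
    K: reference genes annotated with the GO term
    N: all reference genes
    (Assumption: the one-sided test is an illustrative choice.)
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    (Assumption: the source only says "adjusted by FDR".)
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, one raw P value would be computed per GO term and the whole list passed to `benjamini_hochberg` to obtain the adjusted column.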
Supplementary Table 54. Enrichment of GO terms in contracted gene families of
tongue sole genome. The gene enrichment was compared to all the reference genes.
P-values were calculated by Fisher-exact test, and adjusted by FDR.
GO Term Description GO space Gene number P value P value (Adjusted)
GO:0007186 G-protein coupled receptor protein signaling pathway BP 37 0.00E+00 0
GO:0004984 olfactory receptor activity MF 21 1.18E-34 9.794E-32
GO:0016021 integral to membrane CC 37 5.16E-11 2.8552E-08
GO:0000786 nucleosome CC 7 7.28E-09 3.0212E-06
GO:0008270 zinc ion binding MF 35 2.74E-08 7.58067E-06
GO:0006334 nucleosome assembly BP 7 2.73E-08 9.0636E-06
GO:0019882 antigen processing and presentation BP 3 3.54E-05 0.008
GO:0020037 heme binding MF 6 1.82E-04 0.038
GO:0004497 monooxygenase activity MF 5 2.38E-04 0.039
GO:0005525 GTP binding MF 12 2.34E-04 0.043
Supplementary Table 55. Oligonucleotide primers used in the study.
Gene name Tm (℃) Primer Sequence 5’-3’
pepd 59.5 F: CAAAATCGGTTGTGGTGCTG
R: CTCCCATCCATGTGGCGTA
fbn1 61 F: TTCCTGGCTCCATCATTTCGT
R: GGCTCCTCTCACACTCGTCATC
mgam 61 F: GTCCCCATCAGCGACACGTT
R: GCAAAGACCAGGGGTGCTATC
ace2 60 F: TGGTGTGTCATCCCACTGCG
R: GGTAGGACAGGTTGCGATAAGC
itih2 61 F: GGAGATCCGACTGTGGGTGAG
R: GCAGCATCGTGGTTGGCATA
gda 61 F: TTCCTGCGGAGTCTCGCTTTA
R: CGGTGGTGGTTCCATTTCTCA
mep1b 60 F: ATAACAGCACCAACCCTAACGG
R: GACCCCTCAAATACAACACGAAA
hnf4a 60 F: AGAGCAGAAATCAGCCACTATCGT
R: ACACAGCCGTTACCTAAAAGCAG
cpb1 61 F: CGGCTACGACTACACCCACAAG
R: TTCTGCGGATGAAGTCGGC
cdhr2 60 F: TCTCGTTAGGGCTGAGGACTTG
R: GTCTATTACGAACACATCCACGGT
slc15a2 61 F: CACCCATCCTCGGAGCTCTTA
R: TCAAACTGGTCTCCTCCAAACG
cp 60 F: AGCCTCTATATGAGCTTCGGGA
R: CATTATGGTGTGGTAGGACCGTT
tmem67 60 F: ACCAGCAGTACAGTGGACGGTT
R: TGTCCTTGCTGCGTTCTCG
xdh 60 F: TGCTGCAAGAACGGAGGTAAC
R: ACTGAACGCTCCCCACGAA
cd74 61 F: GCCTCACTTCAATGACACCTTCC
R: CCTTCTGGCACTTGGTCTTCAC
rh1 61 F: GGGGTCGTCAGGAATCCGTAT
R: TCGTGGTGAATCCTCCGAAGA
lws1 61 F: CCTGCCACTTTCAATAATCCTCC
R: GCAGGCAAAGAAGGTGTAAGGTC
rh2 61 F: GCACCGGAACACCCATCAA
R: CCACCAGAGACCACAGAGCAAC
dmrt1(RT-PCR) 60 F: AAGCAGCGCCCTGACTACAC
R: GCCCTTCAACGGAGACACG
follistatin 60 F: GCGGTGCCAGGTGCTCTA
R: GGGTCCACAGTCAACATTATCG
patched1 60 F: GCTTCAGAGCGTGGCGAC
R: ACCACTGGCTACACGGATGAC
sf-1 60 F: GCTGCCAGTATTGCCGCT
R: GTGATGGTTGGTTGCCCTCT
neurl3 59 F: CTGGTGTTTAGCAGCCGTCCT
R: CCAGAACTCCAGCACTGACCC
dmrt1(BS-PCR,1# exon) 48 F: GGTTAAATATTGTTATAGTAGTAGTAG
R: ACRATTACCTACACCACCA
dmrt1(BS-PCR,1# intron) 50 F: GTTATTGTGATTGGAGGGA
R: ATTATAATAAATTACTCTACAACAT
dmrt1(DM domain) 60 DM-F: AAGCAGCGCCCTGACTACAC
DM-R: GCCCTTCAACGGAGACACG
gas8 45 F: GCTCAGGACCACAACA
R: TTTCCAGGTGCTTCAT
nme5 49 F: TCTTTCCCAGGTTGATTAT
R: TTTGCTCTGAGGCTTTT
ropn1l 52 F: TGCCCAACATCCTCAAA
R: GCAGCGGTTCTCCATTA
tekt1 48 F: TCCAAGAAACGGACAAA
R: TCCAGAGCCTTCAACAC
plcz1 46 F: ATCTACCAAGCCCAAAT
R: TCCTCACCCTCATCTGT
tbpl1 55 F: CACAGCCACAATCTCATCG
R: GAACGGCACGGAACAAA
spag6 51 F: TACAGACCTGGAAACCC
R: CGTCGTGCGAGAAGAT
gal3st1 47 F: TCCTCATAGCGGAACA
R: TCAGACACGGACGAAC
dnajb13 47 F: TCCTCGGTTTGTTAGAG
R: ATGTCGTTGATGGGTAT
cldn11 54 F: ACCACTGCGTCTCCCTG
R: ACCACCGTGCGTTTGTT
gpr64 53 F: CTGCTGGCTGCGTAATG
R: GCGAACAGAGCGAAGC
aqp1 54 F: GAATAGCAGCAGCCCTCA
R: TGTCATCAGCAGCATCCC
β-actin 61 F: CCTTGGTATGGAGTCCTGTGGC
R: TCCTTCTGCATCCTGTCGGC
Supplementary Note
DNA library construction and sequencing
One adult female and one adult male tongue sole were selected for whole-genome
shotgun sequencing and were temporarily maintained at 22 °C in the facilities of the Key
Laboratory of Sustainable Development of Marine Fisheries in Qingdao. Blood cells
were taken from the selected female and male using sterile injectors with pre-added
anticoagulant solution (0.5 M EDTA, pH 8.0). High-quality genomic DNA suitable for
construction of the large-insert libraries (2 kb to 40 kb) was extracted from the
blood cells using the Puregene Tissue Core Kit A (Qiagen, Maryland, USA). Fifteen
paired-end libraries for the female and 11 paired-end libraries for the male (170 bp to 40 kb)
were then constructed using the Illumina standard operating procedure. Paired-end
sequencing was performed on an Illumina HiSeq 2000 for each library.
Genome assembly
Raw data of 91.35 Gb and 67.86 Gb were obtained for the female and male individual,
respectively. Before genome assembly, we filtered out artificial and low-quality reads to
obtain a usable read set containing 857.5 M and 730.0 M reads, representing 63.86 Gb
and 46.67 Gb of data, for the female and male individual, respectively. The genome
coverage was 212-fold. In addition, we corrected sequencing errors for the 17-mers with a
frequency lower than four using a method described in a previous study1. We then
assembled the reads into contigs and scaffolds to build the male and female genomes
using SOAPdenovo2.
Identification of Z and W-linked scaffolds
With the same sequencing coverage, the depth of Z-linked scaffolds in the
non-pseudoautosomal region (non-PAR) in the female is expected to be half of that of the
male Z chromosome, the female autosomes and the male autosomes
($Z_f = \tfrac{1}{2} Z_m = \tfrac{1}{2} A_f = \tfrac{1}{2} A_m$). Accordingly, we identified 26 and 126 Z-linked scaffolds in
the male and female assembly, respectively. For the W-linked scaffolds in the female
assembly only, we expected that these scaffolds should not be covered by male reads, and
that the sequencing depth should be about half of the average value of autosomes by
female reads. Using this method, we identified 306 W-linked scaffolds in the non-PAR,
representing 16.4 Mb with a scaffold N50 of 128 Kb. Considering the interference of W
reads and the higher quality of the Z assembly in the male genome relative to the female
genome (scaffold N50 of 1,305 Kb versus 357 Kb), we chose the scaffolds from the
male assembly as the Z-linked scaffolds in the final version. For all other scaffolds,
representing autosomes, W and any undetected Z-linked scaffolds, we used
the female version. Ultimately, the final genome had contig and scaffold N50 sizes of 26
Kb and 867 Kb, respectively.
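The depth-based logic described above can be sketched as follows. This is an illustrative reading of the stated expectations, not the authors' pipeline: depths are normalized to the autosomal average of the same sex, and the tolerance `tol` and the function name `classify_scaffold` are assumptions introduced here.

```python
def classify_scaffold(female_depth, male_depth,
                      female_autosome_depth, male_autosome_depth, tol=0.2):
    """Classify a scaffold from female and male read depth.

    Expected normalized depths (female, male):
      Z-linked (non-PAR): about 0.5 and 1.0
      W-linked (non-PAR): about 0.5 and 0.0 (no male reads)
      autosome or PAR:    about 1.0 and 1.0
    The tolerance is a hypothetical parameter, not a value from the source.
    """
    f = female_depth / female_autosome_depth
    m = male_depth / male_autosome_depth

    def near(value, target):
        return abs(value - target) <= tol

    if near(f, 0.5) and near(m, 1.0):
        return "Z"
    if near(f, 0.5) and near(m, 0.0):
        return "W"
    if near(f, 1.0) and near(m, 1.0):
        return "autosome_or_PAR"
    return "unclassified"
```

For example, a scaffold with female depth 30 and male depth 60, against autosomal averages of 60 in both sexes, would be called Z-linked under these assumptions.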
Genetic map construction, ordering and chromosomal assignment of
scaffolds
An F1 cross panel between a wild male and a cultured female with 92 offspring was used
for simple sequence repeat (SSR) genetic mapping. Another mapping population
consisting of 216 individuals was used to construct a high-resolution SNP genetic map using
RAD-Seq. Linkage analysis was performed using JoinMap 4.03 with a logarithm of odds
(LOD) score of 3 for grouping. Sex-specific genetic linkage maps were constructed
independently for each parent using informative markers. In a few cases, markers
were discarded during the mapping stage because their presence caused inconsistencies in
the map. Finally, two genetic linkage maps were constructed for the tongue sole,
comprising 942 SSR markers and 12,142 SNP markers, respectively. We used BLASTN
(E-value <1E-5, identity ≥95%, and aligning rate >50%) to map SSR markers to scaffolds.
Of the 26 Z-linked scaffolds that were identified by depth comparison, 24 resided on the
Z chromosome. In addition, two other small scaffolds (243 Kb and 399 Kb) with a 1:1
sequencing depth ratio between male and female were located distal to the Z chromosome.
Furthermore, these two scaffolds were orthologous with medaka chromosome 9. We thus
inferred that they are sequences of the PAR of the tongue sole sex chromosomes. For the
W chromosome, we ordered 306 W-linked scaffolds in a pseudo-W chromosome based on
their gene synteny with the Z chromosome. We linked scaffolds onto chromosomes with a
string of 100 ‘N’s representing the gap between two adjacent scaffolds. In total, 944
scaffolds comprising 445 Mb (93.3% of the total scaffold length) were anchored to 22
chromosomes, representing the 20 autosomes, Z and W.
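Joining ordered scaffolds with a fixed gap of 100 Ns can be sketched as below. The strand handling (reverse-complementing scaffolds placed on the minus strand) is an assumption added for illustration; the text only states that adjacent scaffolds are separated by 100 Ns. The names `build_pseudochromosome` and `reverse_complement` are hypothetical.

```python
GAP = "N" * 100  # gap inserted between two adjacent scaffolds

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudochromosome(ordered_scaffolds):
    """Concatenate (sequence, strand) pairs in map order with 100-N gaps.

    strand is "+" or "-"; orientation handling is an illustrative assumption.
    """
    oriented = [
        seq if strand == "+" else reverse_complement(seq)
        for seq, strand in ordered_scaffolds
    ]
    return GAP.join(oriented)
```

With n scaffolds on a chromosome, the pseudo-chromosome length is the sum of the scaffold lengths plus 100 × (n − 1) gap bases.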
Validation of Z chromosome
To further verify the Z chromosome, all continuous exon sequences of genes in the
Z-linked scaffolds that did not match other genomic regions were selected to design
primers that would not amplify similar sequences elsewhere in the genome.
Primer Premier 5 was used to design one to four primers from compatible exons in the
specific sequences of 24 scaffolds, and quantitative PCR (qPCR) was used to confirm the
depth ratio between the male and the female. Briefly, three DNA samples from each sex
were mixed at a ratio of 1:1:1, and qPCR was then conducted on an Applied
Biosystems 7500 Real-Time PCR System following the standard protocol (SYBR®
Premix Ex Taq™ (Takara)). β-actin was used for normalization and the
$2^{-\Delta\Delta C_T}$ method
was selected as the relative quantification calculation method. The results show that the
ZZ to ZW ratio for 51 genes on 22 scaffolds was approximately 2, suggesting that
these scaffolds are located on the Z chromosome. In addition, the ratio for four genes from
the two PAR scaffolds was almost equal to 1.
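The relative quantification step can be illustrated with a short worked sketch of the standard 2^(-ΔΔCT) calculation. The Ct values in the usage note are invented for illustration only, and the function and argument names are hypothetical; here the male (ZZ) pool is treated as the sample and the female (ZW) pool as the calibrator, with β-actin as the reference gene.

```python
def relative_quantity(ct_target_sample, ct_ref_sample,
                      ct_target_calibrator, ct_ref_calibrator):
    """Relative quantity by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference), computed per pool
    ddCt = dCt(sample) - dCt(calibrator)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2 ** -(delta_sample - delta_calibrator)
```

If a target amplifies one cycle earlier in the male pool than in the female pool at equal β-actin Ct (for example 24 versus 25, reference 20 in both), the ratio is 2, as expected for a Z-linked locus; equal Ct values give a ratio of 1, as expected for the PAR.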
Genome evaluation
To assess assembly quality, we analyzed gene region coverage and assembly accuracy.
First, we compared scaffolds with four BACs independently sequenced using Sanger
sequencing technology to assess the large-scale accuracy of the assembly. A fast search
was performed to identify counterparts in scaffolds for each BAC using MUMmer4. We
then compared each sequence pair (BAC and scaffold) in detail using BLASTN5 (E value
<1e-20), and used SOLAR (Sorting Out Local Alignment Result) to chain local alignments
into global results. Up to 96.9% of the BAC sequences were well covered by
one-to-one alignments to scaffolds, with few misassembly errors. In addition, the 14,687
ESTs of tongue sole from three samples (ovary, testis and immune tissues) were aligned
to the tongue sole assembly using BLAT6. With a cutoff of identity ≥95% and aligning
rate ≥50%7, 99.48% (14,610/14,687) of the ESTs could be detected
in the scaffolds. Using a more
stringent cutoff (identity ≥95% and aligning rate ≥90%), 14,365 (97.81%) of them were
still detected. While this likely overestimates coverage because ESTs largely avoid repeated
sequences, it remains an important indicator of the representation of the transcribed
sequences in the assembly. These data indicate that our assembly has good coverage and
completeness for gene regions.
We used a k-mer-based method to estimate the genome size as described in a previous
study2. From the depth-frequency distribution curve of 17-mers, we calculated
the genome size as 545 Mb for the female and 495 Mb for the male, according to the
formula: G = kmer_num/kmer_depth. Given the presence of sequencing errors, we expect
that the 17-mer depth is an underestimate, and consequently the tongue sole genome size
should be slightly smaller than 545 Mb for the female and 495 Mb for the male. This
estimated genome size indicated that the assembled contigs and scaffolds covered about
83% and 88% of the whole genome, respectively.
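The formula G = kmer_num/kmer_depth can be sketched from a depth-frequency histogram as below. This is a simplified illustration rather than the method of the cited study: the histogram format (depth mapped to the number of distinct 17-mers at that depth), the `min_depth` cutoff used to skip the low-depth error peak when locating the main peak, and the function name are all assumptions.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as total k-mer count divided by peak k-mer depth.

    histogram: dict mapping depth -> number of distinct k-mers seen at that depth
    min_depth: hypothetical cutoff; depths below it are ignored when picking
               the peak because they are dominated by sequencing errors
    """
    # Total number of k-mer occurrences in the reads.
    kmer_num = sum(depth * count for depth, count in histogram.items())
    # Depth at which the main (genomic) peak of the distribution lies.
    peak_depth = max(
        (depth for depth in histogram if depth >= min_depth),
        key=lambda depth: histogram[depth],
    )
    return kmer_num / peak_depth
```

Because error k-mers still contribute to `kmer_num` in this sketch, the estimate is biased upward, which is in line with the expectation stated above that the true genome size is somewhat smaller than the calculated value.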
We measured the GC content in 500 bp non-overlapping sliding windows along the
genome and filtered out windows containing over 50% Ns. For each window, we divided
the number of bases that were either C or G by the total number of bases, not counting
any ambiguous bases. The average G+C content of tongue sole is consistently around
40.8%, which is approximately 5 percentage points lower than Takifugu (45.5%) and
Tetraodon (46.4%), 4 percentage points higher than zebrafish (36.8%), and almost equal to
medaka (40.5%) and human (40.9%). The GC content of the tongue sole gene-coding regions
is 52.7%, which is about 12 percentage points higher than the GC content throughout its
whole genome.
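The window-based GC calculation described above can be sketched as follows. The function name and return format are hypothetical; the window size and the 50% N threshold follow the text, and a trailing partial window is simply ignored (an assumption, as the source does not say how it was handled).

```python
def gc_windows(sequence, window=500, max_n_fraction=0.5):
    """GC fraction in non-overlapping windows, skipping windows with too many Ns.

    Returns a list of (window_start, gc_fraction) tuples. Ambiguous bases are
    excluded from the denominator.
    """
    results = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        if chunk.count("N") / window > max_n_fraction:
            continue  # window contains over 50% Ns
        gc = chunk.count("G") + chunk.count("C")
        called = gc + chunk.count("A") + chunk.count("T")
        if called == 0:
            continue  # no unambiguous bases to count
        results.append((start, gc / called))
    return results
```

A genome-wide average can then be taken over the per-window fractions, or computed directly from the pooled base counts.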
Repeat annotation
Construction of a de novo transposable element (TE) library
We used two software packages, PILER-DF8 and RepeatScout9, to construct a de novo
TE library for the tongue sole genome. We ran each package independently with default
parameters, filtered out sequences that were too short (<100 bp) or contained more than
5% gap ‘N’s, and then combined the results to obtain a consensus library. The library
contains 1182 elements that were classified using homologies with TEs from Repbase10.
Of these, 701 sequences were
considered as “Unknown”, meaning they cannot be classified. Of the 480 annotated
sequences, 325 are shorter than 500 bp and are not necessarily short interspersed elements
(SINEs). This means that only a few elements encode proteins, and even fewer elements
in the tongue sole genome may still be active.
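The length and gap filter applied to the de novo library can be expressed as a small predicate. This is a sketch of the two stated thresholds only; the function name is hypothetical and the behaviour at exactly 5% Ns (kept here) is an assumption.

```python
def keep_consensus(seq, min_length=100, max_n_fraction=0.05):
    """Return True if a consensus sequence passes the length and N-content filter.

    Sequences shorter than 100 bp or with more than 5% N are discarded.
    """
    if len(seq) < min_length:
        return False
    return seq.upper().count("N") / len(seq) <= max_n_fraction
```

Applied to the outputs of both packages before merging, this would leave only consensus sequences of at least 100 bp with at most 5% gap bases.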
Detection of long terminal repeats (LTRs) and terminal inverted repeats (TIRs)
structures
a. LTR_Finder
LTR_Finder software11, which is specific for LTR retrotransposons and also for
Dictyostelium intermediate repeat sequence (DIRS) elements, was used to detect LTRs.
No LTRs were found in the TE database. This result is consistent with the protein
prediction and phylogeny (see below): few LTR retrotransposons were found and most of
them are small. These results suggest that LTR retrotransposons are not active in the
genome of the tongue sole.
b. e-inverted
e-inverted software, which is specific for DNA transposons, was used to detect TIRs.
This software did not find any specific TIRs in the tongue sole genomes.
Phylogenies of TEs
a. Methods
Protein prediction:
Using known TE proteins, we used BLAST software on the de novo TE library to predict
proteins on a large scale. We examined all potential predicted proteins for reverse
transcriptases of different families (LINE, DIRS and LTRs) and transposases.
Number of predicted sequences:
-LINE reverse transcriptase: 33 sequences
-LTR reverse transcriptase: 8 sequences
-Penelope reverse transcriptase: 7 sequences
-Transposase: 111 sequences.
The protein sequences were then aligned and phylogenetic trees were constructed.
Phylogeny:
Phylogenetic trees were constructed using the SEAVIEW-PhyML software12.
b. All reverse transcriptase phylogeny
There are three different orders of retrotransposons (LINE, LTR and Penelope);
therefore, we decided to align them and build a complete reverse transcriptase tree. We
could easily differentiate the three different orders and the four families of LINE. The
main branches were well supported with high bootstrap values. From this tree, we also
observed that there are many more LINE elements compared with LTRs and Penelope.
Two Penelope elements, one Rex/Babar and one Gypsy, were not placed together with
their respective families.
c. DNA Transposons
We identified 90 Tc1 elements, one Tc5, eight hAT, two PiggyBac, six Buster, one
Harbinger and six Pogo-like elements in the tongue sole genomes.
TEs make up around 5.85% of the tongue sole genome and are mainly represented by Tc1
transposons and LINE RTE and Babar retrotransposons. There are few LTR
retrotransposons in the genome of the tongue sole. The main LTR elements present are
Sushi elements (from the Gypsy family) and are probably no longer active. Both APE
and REL (only one family) endonuclease elements were found; the main
elements represented belong to the RTE and Babar families. Finally, DNA transposons are
mainly represented by Tc1 elements, with 90 predicted elements.
Transcriptome sequencing
Tongue sole larvae came from one batch of fertilized eggs and were maintained at 23 °C
until 23 days post hatching (dph) at Shandong Huanghai Aquaculture Co., Ltd. Six
larvae were sampled individually at 18 dph (pre-metamorphosis stage) and at 22 dph
(post-metamorphosis stage), respectively. Sampled larvae were sacrificed by anesthetic
overdose and their gonads were isolated for genetic sex identification, and then placed in
liquid nitrogen. For the sex-reversed samples, the offspring of a normal parent male (ZZ
testis P) and female were incubated at 28 °C during the critical stage (30 dph to 80 dph) to
produce the first generation of sex-reversed fish (ZW testis F1) and females (ZW ovary F1)
in 2008. In 2010, one of the pseudomales was crossed with a normal
female to produce the next generation, consisting of spontaneously sex-reversed fish (ZW
testis F2) and normal females. In total, ten fish from each group (parent males,
first-generation pseudomales and females, and second-generation pseudomales and normal
females) were collected from Laizhou Mingbo Co., Ltd. in 2011. All fish were
sacrificed and their gonads were isolated and preserved at -80 °C for RNA and DNA
isolation.
Total RNA was isolated and purified from all samples using a traditional phenol method13.
RNA concentration was measured using Nanodrop technology. For the metamorphosis
samples, RNAs of three larvae were pooled at equal quantities for library construction.
Firstly, the oligo-dT-coupled beads were used to enrich poly-A+ RNA molecules.
Random hexamers and Superscript II reverse transcriptase (Invitrogen) were used for first
strand cDNA synthesis and E. coli DNA PolI (Invitrogen) was used for second strand
cDNA synthesis. A Qiaquick PCR purification kit (Qiagen, Germantown, MD) was used
to purify the double stranded cDNA. The cDNA was then sheared with a nebulizer
(Invitrogen) to 100–500 bp fragments. The fragments were ligated to Illumina PE adapter
oligo mix after end repair and addition of a 3' dA overhang. Then, 150 ± 20 bp/200 ± 20
bp (two sizes) cDNA fragments were collected by gel purification. After 15 cycles of
PCR amplification, the libraries were subjected to paired-end sequencing (90 bp or 75 bp
at each end) using Illumina HiSeq2000. Finally, 12.38 Gb of transcriptome sequences
were generated from the seven tissues to aid precise gene annotation.
Gene prediction and annotation
Homology-based gene prediction
Protein sequences of Oryzias latipes, Takifugu rubripes, Tetraodon nigroviridis,
Gasterosteus aculeatus, Danio rerio and Homo sapiens were obtained from Ensembl
database (release 57). Short (<50 aa) sequences were filtered out before further processing.
We used the following pipeline to project them as parent proteins onto the tongue sole
genome: (a) Rough alignment. We aligned the parent protein sequences to the tongue sole
genome by TBLASTN at E-value <1e-5, and grouped all the high-scoring segment pairs
(HSPs) into gene-like structures using SOLAR. Alignments with less than 70% aligning
sequence similarity to their parent proteins were filtered out. (b) Precise alignment. We
first isolated the target gene region in the genome by extending the alignment regions by
500 bp at both ends, including the intron regions, and then aligned the parent proteins to
these DNA fragments using GeneWise14 to predict the precise transcript structure. To
filter out low-quality results, we only retained transcripts with ≥150 bp. (c) Transcript
clustering. We clustered all the predicted transcripts by genomic overlap with a cutoff of
more than 100 bp. For each gene locus, the transcript with the longest length was chosen.
(d) Filtering out pseudogenes. There are two types of frame errors, frame shift and
internal stop codons, that identify pseudogenes. We filtered out genes containing more
than one frame error for single-exon genes, and more than two frame errors for
multiple-exon genes. Finally, we predicted 18,284 genes and 587 pseudogenes in the
homology-based gene set.
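Steps (c) and (d) above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses genomic span as the transcript length, ignores strand, and clusters greedily in a single pass; the `Transcript` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    scaffold: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    exons: int
    frame_errors: int   # frame shifts plus internal stop codons

    @property
    def length(self):
        return self.end - self.start + 1


def overlap(a, b):
    """Number of overlapping genomic bases between two transcripts."""
    if a.scaffold != b.scaffold:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def pick_longest_per_locus(transcripts, min_overlap=100):
    """Cluster transcripts overlapping by more than 100 bp; keep the longest."""
    loci = []
    for t in sorted(transcripts, key=lambda t: (t.scaffold, t.start)):
        for locus in loci:
            if any(overlap(t, member) > min_overlap for member in locus):
                locus.append(t)
                break
        else:
            loci.append([t])
    return [max(locus, key=lambda t: t.length) for locus in loci]


def is_pseudogene(t):
    """More than one frame error (single-exon) or more than two (multi-exon)."""
    limit = 1 if t.exons == 1 else 2
    return t.frame_errors > limit
```

Running `pick_longest_per_locus` first and then dropping models for which `is_pseudogene` is true mirrors the order of steps (c) and (d) in the text.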
RNA-seq models
We mapped a mixture of Illumina paired-end reads from seven tongue sole libraries using
TopHat15, and then used Cufflinks16 to construct transcripts. After that, in-house
software was used to predict potential open reading frames (ORFs; ≥150 bp) for 42,912
spliced transcripts. All predicted ORFs were aligned to Uniprot17 Protein Existence (PE)
classification level 1 and 2 proteins, using BLASTP with a cutoff E-value <1e-5.
Transcripts without any significant hits were excluded from further analysis. If a
transcript contained more than one ORF, the one with the highest alignment score was chosen.
In the end, 30,253 transcripts were considered as RNA-seq models.
De novo prediction
First, we masked all TE-related regions in the tongue sole genome. We then performed de
novo prediction using two software programs, Genscan18 and Augustus19, both with gene
model parameters trained on Homo sapiens, and filtered out partial (missing start or
stop codon) and short (coding region <150 bp) genes. To filter out TE-derived genes, we
aligned predicted protein sequences to the TE protein database in RepeatMasker using
BLASTP with an E-value ≤1e-5. Genes showing more than 50% alignment were filtered
out. We then clustered genes according to their genomic overlap (≥100 bp) and chose the
gene with the longest coding region for each cluster. Ultimately, we obtained 27,327 genes in
the de novo gene set.
Integrating gene sets
To form a comprehensive reference gene set, we integrated gene models defined by the
different methods with the following steps: a) Genes from all the input sets were clustered,
with a cutoff of genomic overlap greater than 100 bp for each gene locus. b) We chose
one representative gene model for each cluster according to the following priority:
homology-based model > RNA-seq model > de novo model. c) To complete the gene
structure in the homology-based gene set, we identified all supporting evidence in other
gene sets, and then used GLEAN to try to complete the structure of homology-based
genes presented in the reference gene set. d) We used a more stringent cutoff for de novo
genes than for homology-based genes and RNA-seq models. If de novo genes were chosen
for the reference gene set, we only retained those with more than 30% of their length aligned
when searched against the Uniprot database and with at least 3 exons. The final reference
gene set contained 21,516 genes. Ninety-nine percent of the predicted genes were
supported by homologs in other organisms or in the transcriptome. In particular, the
highly conserved structure of homologous genes in other well-annotated teleost fish
genomes confirmed the accuracy of the annotation.
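The selection rule in steps b) and d) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original implementation: the dictionary keys (source, uniprot_aligned, exons) and the function name are hypothetical, and the GLEAN completion in step c) is not modelled.

```python
# Lower value = higher priority, following step b).
PRIORITY = {"homology": 0, "rnaseq": 1, "denovo": 2}


def choose_model(cluster, min_aligned=0.30, min_exons=3):
    """Pick one representative gene model from a cluster of overlapping models.

    Each model is a dict with a "source" key; de novo models additionally
    carry "uniprot_aligned" (aligned fraction, 0..1) and "exons".
    Returns None when only an unsupported de novo model is available.
    """
    best = min(cluster, key=lambda m: PRIORITY[m["source"]])
    if best["source"] == "denovo":
        supported = (best["uniprot_aligned"] > min_aligned
                     and best["exons"] >= min_exons)
        return best if supported else None
    return best
```

A cluster containing both a homology-based and a de novo model therefore yields the homology-based model, while a cluster containing only a de novo model with two exons yields nothing.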
Functional annotation
For the tongue sole reference genes, we annotated the motifs and domains using
InterProScan20 against publicly available databases, including Pfam, PRINTS, PROSITE,
ProDom, SMART and PANTHER, and then retrieved Gene Ontology (GO) annotation
from the results of InterProScan. From the reference gene set, 17,890 and 14,935 genes
could be annotated by IPR and GO annotation, respectively. We also annotated 20,265
genes by searching the Swiss-Prot database using BLASTP at E-value ≤1e-5.
Constructing gene families and reconstructing phylogeny
Constructing gene families
With human and chicken as out-groups, we constructed gene families for sequenced fish
genomes including medaka, Takifugu, Tetraodon, stickleback, zebrafish and tongue sole
using Treefam’s methodology21. For the tongue sole, the reference gene set was used. The
protein-coding genes of other species were obtained from Ensembl (release 57). After
filtering out short genes (coding sequence <150 bp), we chose the transcript with the longest
coding sequence to represent each gene. We constructed gene families using the same
pipeline as a previous study1. In summary, we first performed an all-against-all comparison
of all proteins using BLASTP with cutoffs of E-value <1e-7 and aligning rate ≥1/3 for both
genes, and then clustered the genes into gene families using Hcluster_sg, taking into
account the proteins of the out-group species (human and chicken). Treebest was then
used to infer all the orthologous and paralogous gene relationships.
Reconstructing phylogeny
Using Treefam, 2,426 single-copy gene families were defined. These single-copy gene
families were used to reconstruct the phylogeny. Four-fold degenerate sites (4d) were
extracted from them and concatenated into one supergene for each species. Modeltest22
was used to select the best substitution model (GTR+gamma+I) and Mrbayes23 was used
to reconstruct the phylogenetic tree. The chain length was set to 50,000,000 (1
sample/1000 generations) and the first 1,000 samples were discarded as burn-in.
Branch-specific dN and dS were estimated with codeml in PAML using the branch model24.
The transition/transversion rate ratio was estimated as a free parameter, while other
parameters were set with the default settings.
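The extraction of 4d sites described above can be sketched in Python. The sketch assumes codon-aligned, in-frame coding sequences of equal length and uses a conservative rule (a column is kept only when all species share the same first two codon bases and that prefix belongs to a four-fold degenerate codon family in the standard genetic code); the source does not state the exact rule used, and the function name is hypothetical.

```python
# Codon prefixes whose third position is four-fold degenerate in the
# standard genetic code: Leu (CT), Val (GT), Ser (TC), Pro (CC),
# Thr (AC), Ala (GC), Arg (CG), Gly (GG).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_sites(aligned_cds):
    """Return, per species, the concatenated third-position bases of
    codon columns that are four-fold degenerate in every species.

    aligned_cds: dict mapping species name -> codon-aligned CDS string.
    """
    seqs = {sp: s.upper() for sp, s in aligned_cds.items()}
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n = lengths.pop()

    out = {sp: [] for sp in seqs}
    for i in range(0, n - n % 3, 3):
        codons = {sp: s[i:i + 3] for sp, s in seqs.items()}
        prefixes = {c[:2] for c in codons.values()}
        thirds_ok = all(c[2] in "ACGT" for c in codons.values())
        # Keep the column only if every species has the same 4d prefix.
        if len(prefixes) == 1 and prefixes <= FOURFOLD_PREFIXES and thirds_ok:
            for sp, c in codons.items():
                out[sp].append(c[2])
    return {sp: "".join(bases) for sp, bases in out.items()}
```

Applying this to each single-copy family and concatenating the per-species strings would give the supergene used for tree reconstruction.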
Gene family expansion and contraction
We used CAFE25, a maximum-likelihood method, to analyze gene family expansion and
contraction in all lineages with a birth/death model. The model had a global parameter λ
(lambda), representing the rate of gene birth and death (assumed equal, μ = λ) on each branch of the tree.
This method estimated the family sizes in the common ancestor, and then defined
expansion and contraction by comparing the family size between the current species and
the ancestor. To gain insight into the evolution of the gained and lost genes, we also
performed functional enrichment analysis (p<0.05 by Fisher’s exact test) based on GO
annotation of genes in gene families with significant (p<0.01) gains or losses in the
tongue sole lineage.
Estimation of divergence time
We applied the mcmctree program in the PAML package24 to estimate species divergence
times with 4d sites extracted from all single-copy gene families. The correlated molecular
clock and the JC69 model were chosen, and Root Age was set to be below 2.0. The process
was run for 20,000 samples, the first 1,000 of which were discarded as burn-in. Other
parameters were set to their defaults. We also used divergence times for human-chicken
(267-325 Mya), human-zebrafish (438-455 Mya) and zebrafish-medaka (258-307 Mya)26 as the
calibration times.
ncRNA prediction
We predicted tRNA genes by tRNAscan-SE with eukaryote mode and the default
threshold. After filtering out RNA genes covered by TE-related regions (≥80%), 674
tRNA genes, with an average length of 77 bp, were predicted. Based on sequence
conservation, we identified 104 rRNA gene fragments, with an average length of 107 bp,
by aligning the rRNA template sequences from the human genome using BLASTN with
cutoffs of E-value <1e-5, identity ≥85% and match length ≥50 bp. 285 miRNA and 434
snRNA genes were also predicted using INFERNAL27 against the Rfam database (release
9.1, 1,372 families)28 with Rfam's family-specific "gathering" cutoff.
Benthic adaptation by transcriptome analysis
To investigate the gene expression profile associated with benthic adaptation of tongue
sole, we identified the differentially expressed genes (DEGs) between pre-metamorphosis
and post-metamorphosis fish. First, the TopHat v1.2.0 package15 was used to map
transcriptome reads to the genome, with the following parameters: -a/--min-anchor 8,
-m/--splice-mismatches 0, -i/--min-intron-length 50, -I/--max-intron-length 500000,
--segment-mismatches 2 and --segment-length 25. High quality splice junctions were also
predicted by TopHat. High sequence depth regions joining known gene coding regions
directly or by high quality junction reads were considered as UTRs. Gene expression was
measured by reads per kilobase of gene per million mapped reads (RPKM)29, and adjusted
by a scaling normalization method30. Only genes with an RPKM >1 in at least one
sequenced sample were considered. Differentially expressed genes were detected using
DESeq31 and Cuffdiff32. We ran DESeq and Cuffdiff with the parameter “method = blind”
because we lacked biological replicates. The P-values were then adjusted by the false
discovery rate (FDR)33. Only genes with an adjusted P-value <0.05 in either method and a
fold change >4 were considered as true DEGs.
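The RPKM definition and the DEG filter above can be written out as a short Python sketch. The function names are hypothetical, the scaling normalization step is omitted, and the adjusted P-values are assumed to come from the external tools (DESeq and Cuffdiff) rather than being computed here.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fold_change(a, b):
    """Ratio of the larger to the smaller value (infinite if the smaller is 0)."""
    lo, hi = sorted((a, b))
    return float("inf") if lo == 0 else hi / lo


def is_deg(rpkm_pre, rpkm_post, adjusted_p,
           min_rpkm=1.0, alpha=0.05, min_fold=4.0):
    """Apply the three filters described in the text.

    adjusted_p: dict mapping method name -> FDR-adjusted P-value.
    """
    expressed = max(rpkm_pre, rpkm_post) > min_rpkm          # RPKM > 1 in at least one sample
    significant = any(p < alpha for p in adjusted_p.values())  # either method
    return expressed and significant and fold_change(rpkm_pre, rpkm_post) > min_fold
```

For example, a 2,000 bp gene with 500 mapped reads in a library of 10 million mapped reads has an RPKM of 25.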
Annotation of genes to GO categories was performed according to the orthologous
relationships between the C. semilaevis gene set and the D. rerio gene set, which has a
comprehensive GO annotation. Fisher's exact test and the Chi-squared test34 were used to
identify whether a list of genes (foreground genes) was enriched in a specific GO category
compared with background genes, by comparing the number of background genes annotated
to this specific GO, the number of foreground genes annotated to this specific GO, the total
number of background genes and the total number of foreground genes. The P-value was
adjusted for multiple testing using the Benjamini-Hochberg FDR33. The
KEGG automatic annotation server35 annotated the genes to KEGG pathways, with
zebrafish and human as references. Fisher's exact test and the Chi-squared test were then
used to identify enriched pathways36.
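The enrichment test built from the four counts listed above can be sketched in Python with the standard library only. This is an illustration, not the original code: it assumes the foreground genes are a subset of the background, uses a one-sided (over-representation) Fisher's exact test, and the function names are hypothetical. The Chi-squared alternative is not shown.

```python
from math import comb


def fisher_enrichment_p(fg_in, fg_total, bg_in, bg_total):
    """One-sided Fisher's exact test P-value for over-representation.

    fg_in / fg_total: foreground genes in the category / all foreground genes.
    bg_in / bg_total: background genes in the category / all background genes.
    Computed as the hypergeometric tail P(X >= fg_in).
    """
    upper = min(bg_in, fg_total)
    tail = sum(comb(bg_in, k) * comb(bg_total - bg_in, fg_total - k)
               for k in range(fg_in, upper + 1))
    return tail / comb(bg_total, fg_total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

One P-value is computed per GO category or pathway, and the full list is then passed to benjamini_hochberg.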
Improved branch-site model A37 was used to infer the sites under positive selection
(dN/dS >1 in the tongue sole lineage and dN/dS ≤1 in other lineages). We tested for
signatures of positive selection in 1,577 conserved tongue sole genes from all single-copy
families (2,426), using a cutoff of coverage >70% to all genes in the same family. Using a
rooted tree with human and chicken as outgroups, we required at least one positively
selected site with a Bayes empirical Bayes posterior probability >0.95, and then retained
positively selected genes with an FDR q-value <0.05. Finally, 219 genes under positive
selection in the tongue sole lineage were identified. We then compared them with the DEGs
between pre- and post-metamorphosis fish; 15 genes involved in benthic adaptation were
found to be under positive selection.
Furthermore, the 15 positively selected genes (cp, mep1b, hnf4a, ace2, tmem67,
fbn1, cdhr2, pepd, itih2, mgam, cpb1, xdh, cd74, slc15a2 and gda) were verified by qRT-PCR.
Briefly, total RNA of each individual at pre- and post-metamorphosis stages of normal
fish was isolated and reverse transcribed as described previously13. Three individuals at
each stage were collected for isolation of total RNA. Primers for qRT-PCR analysis were
designed using the Primer Premier 5 program. The final PCR reactions
contained 0.4 mM of each primer, 10 µl SYBR Green (Invitrogen) and, as template, 80 ng
of cDNA reverse transcribed from a standardized amount of total RNA. qRT-PCR was
performed on an ABI PRISM 7500 Real-Time PCR System using Hotstart Taq polymerase
(Qiagen) in a final volume of 20 μl, and the β-actin gene was used as the internal reference.
All reactions were subjected to 95℃ for 35 s, followed by 40 cycles at 95℃ for 5 s and 60℃
for 34 s. Melting curve analysis was applied to all reactions to ensure homogeneity of the
reaction product. The results were analyzed using 7500 System SDS Software.
In addition, we aligned zebrafish visual gene proteins to other teleost
genomes using BLAT, with cutoffs of 90% identity and 70% gene coverage. The corresponding regions
(extended by 500 bp at both ends) in other teleost genomes were extracted for GeneWise
gene prediction. Combining the best prediction results with the synteny analysis, we
finally identified the corresponding protein-coding genes in other teleost genomes and
determined the expression levels of visual genes in tongue sole based on the transcriptome
analysis. In addition, the significantly differentially expressed genes, including rh1, rh2 and
lws1, were verified by RT-PCR as described above.
Reconstruction of ancestral vertebrate chromosomes
We performed an all-against-all comparison between tongue sole protein sequences and
human protein sequences using BLASTP (E-value < 1e-10). Each tongue sole gene was
assigned a best-matched human gene, if any. Then, for each human gene corresponding to
more than one best-matched gene in tongue sole, we defined the first and second
orthologs as the best and second-best matches, and treated them as paralogs associated
with that human gene in tongue sole. Finally, we identified 2,733 paralogs in the tongue sole
genome, 2,365 of which were anchored on chromosomes. Moreover, we paired
paralogous chromosomes according to the number of paralogs between two chromosomes.
We obtained Tetraodon, medaka and zebrafish gene sequences from Ensembl (release 57)
and identified reciprocal best-match orthologous genes between tongue sole and each
other fish using BLASTP at an E-value of 1e-10. In total, 14,231, 14,310 and 13,084
orthologous genes were identified for Tetraodon, medaka and zebrafish, respectively. We
identified duplicated regions by synteny analysis, using a method from a previous study38.
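The reciprocal best-match criterion used for ortholog identification can be sketched in Python. The sketch assumes that BLASTP hits have already been parsed into (query, subject, E-value, bit score) tuples and that the best hit is the one with the highest bit score; both are assumptions, as are the function names.

```python
def best_hits(hits, max_evalue=1e-10):
    """Map each query to its best subject (highest bit score) among hits
    passing the E-value cutoff.

    hits: iterable of (query, subject, evalue, bitscore) tuples.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-10):
    """Return sorted (a, b) pairs that are each other's best hit."""
    ab = best_hits(a_vs_b, max_evalue)
    ba = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```

Here a_vs_b would hold tongue sole proteins searched against the other species and b_vs_a the reverse search.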
Conserved synteny is defined as orthologous genes on a pair of chromosomes from
distinct species. There have been few interchromosomal rearrangements in teleosts;
therefore, the ancestral chromosome in the human lineage, which did not have any whole
genome duplication (WGD) event after separating from the teleost lineage, was broken
into smaller blocks, which were likely to have conserved synteny on a pair of duplicated
tongue sole chromosomes. These human blocks are called doubly conserved synteny
(DCS)38. Following the above principle, we identified 61 DCS blocks. This result is
consistent with analyses of the Tetraodon and medaka genomes38,39 except for some small
DCSs that did not affect the result and thus were filtered out in our analysis. With the
human genome as the out-group, we deduced the ancestral teleost karyotype by
considering results from paralogy and orthology relations and DCSs, as done in a
previous study39. The method is summarized as follows:
a. Chromosome clustering. To reduce complexity, we tried to divide the tongue sole
chromosomes into a few sub-groups. According to the number of shared DCSs, 21 tongue
sole chromosomes were clustered into nine groups. Some chromosomes were involved in
multiple groups because of their complex evolutionary history. In each group, chromosomes are
likely to contain regions duplicated from one or more ancestral chromosomes.
b. Rearrangement detection. To infer inter-chromosomal rearrangements, such as
fusions and fissions, among chromosomes in the same group, we checked whether there
were substantial paralogs between pairs of tongue sole chromosomes that originated
from one ancestral chromosome. If two tongue sole chromosomes have many DCSs but
just a few paralogs in common, we inferred that these two chromosomes were derived
from a fission event of an ancestral chromosome.
c. Inferring when rearrangements occurred. After the previous step, we had detected
potential rearrangements from the ancestor to the current tongue sole genome. To infer
when these events occurred, we needed to further consider the orthologous relationships
between tongue sole and other fish genomes, including medaka, Tetraodon, and
stickleback, using zebrafish as the out-group. Using this method, the ancestral teleost
karyotype (gnathostome ancestor) was determined to have 13 chromosomes, represented
as Anc1~Anc13. The vertebrate ancestor, consisting of 10 proto-chromosomes,
was reconstructed and the evolutionary hierarchy from the vertebrate ancestor to
the genomes of human, chicken, and medaka was assigned, as reported by Nakatani40.
Based on the evolutionary relationship between tongue sole and medaka, we then
deduced the chromosome evolution from the vertebrate ancestor to tongue sole.
Finally, we found that the chicken and tongue sole Z chromosomes have a common origin,
being derived from proto-chromosome A in the vertebrate ancestor, and A0 in the
gnathostome ancestor.
Genomic organization and evolution of the sex chromosomes
Structural features of sex chromosomes
The gene density on CseZ (42 genes/Mb) is slightly lower than the average value of
autosomes (46 genes/Mb), and higher than on seven of the autosomes. The CseW has a
lower gene density (19 genes/Mb) than any of the autosomes, about half of the autosomal
average. Conversely, the density of interspersed repeats of both CseZ and CseW is much
higher (by ~2.3- and ~6.9-fold, respectively) than the average for the autosomes. On CseZ,
the most abundant type of interspersed repeats is DNA transposons (36.1% of all
interspersed repeats); while on CseW, LINE elements (31.4% of all interspersed repeats)
are the most abundant. In addition, the PAR includes two scaffolds, scaffold589 (398,660
bp) and scaffold757 (243,113 bp), which are anchored distally on Z and have the same
coverage depth in both male and female samples. We identified 22 protein-coding genes
and one pseudogene in the PAR, and inferred their functions by BLAST searches against
SwissProt (E-value <1e-5), retaining the best hit for further analysis.
Homologous genes in the non-PAR regions of Z and W
To identify homologous gene pairs in the non-PAR of Z and W, we compared all W and Z
genes (395 and 937) from the non-PAR, including functional genes and pseudogenes,
using BLASTP with a cutoff of identity >50% and an alignment rate >50%. We then
chose the best hit for each W gene. We found that 339 W genes are homologous to 297 Z
genes, because some Z genes have more than one homologous W gene. As described
above, there are two unplaced scaffolds, which were identified as Z-linked scaffolds by
their M:F depth ratios. We observed that 24 W genes are homologous to genes on these
two scaffolds. We also aligned Z and W genes to genes on autosomes and unplaced
scaffolds (except the two Z-linked scaffolds) using the same method, and found that 258
Z genes and 30 W genes have homologous genes. The other Z (382) and W (2) genes
without any homologous genes were defined as Z-specific and W-specific genes,
respectively. Neither the sequences of the two W-specific genes nor their expression patterns
gave any indication that they might function as ovary-determining genes. In addition,
pseudogenes were defined as having more than one frame error for single-exon genes and
more than two frame errors for multiple-exon genes; frame errors included frame shift
and internal stop codon.
Estimation of divergence time between Z and W
We obtained a lineage-specific Ks of 0.47 and a mean divergence time of 197 (170~220)
MY (million years) for the separation of the tongue sole lineage from medaka using
orthologs of the whole genome. We could then calculate a mean lineage-specific rate of
~2.39E-9/site/year for the tongue sole lineage. If we assumed that the Z and W
chromosome evolutionary rates were equal to the lineage-specific rate, the combined rate
of Z-W divergence would be ~4.77E-9/site/year. Using this rate, we estimated a mean
divergence time of ~31 MY between Z and W. In addition, we calculated the Ks for
autosomal genes and PAR genes, respectively. We first mapped reads from the ZW fish
to the WGS assembly using BWA41, and performed SNP calling using pileup in the
SAMtools package42. Then, the SnpEff package43 was used to classify each SNP site as
synonymous or non-synonymous. We also used codeml to calculate the number of
synonymous sites in every gene. Finally, the Ks was calculated as the number of
synonymous SNP sites divided by the number of synonymous sites in the coding region.
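The rate arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical, and the Z-W Ks value used in the last line is an illustrative input chosen here, since the text reports only the resulting divergence time.

```python
def substitution_rate(lineage_ks, divergence_years):
    """Per-lineage synonymous substitution rate (per site per year)."""
    return lineage_ks / divergence_years


def divergence_time(ks_between_copies, per_lineage_rate):
    """Time since two copies diverged, assuming both evolve at the same
    per-lineage rate, so the combined rate is twice that rate."""
    return ks_between_copies / (2 * per_lineage_rate)


rate = substitution_rate(0.47, 197e6)   # about 2.39e-9 per site per year
combined = 2 * rate                     # about 4.77e-9 per site per year

# Hypothetical Z-W Ks of 0.148, used only to illustrate the calculation.
example_time_my = divergence_time(0.148, rate) / 1e6
```

With these inputs the sketch reproduces the reported rates, and a Z-W Ks of roughly 0.15 corresponds to a divergence time of about 31 MY.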
Dosage compensation
We used RNA-seq data from the whole body (without gonad) of male and female fish to
test for dosage compensation of the Z chromosome. The male:female (M:F) gene
expression ratio was used to measure the dosage compensation level for every gene in the
female or sex-reversed male relative to the normal male, calculated as the RPKM ratio of
each gene between two samples. Only genes (763) with an RPKM >1 in both the normal
male and female were considered. The Z to autosome expression ratio (Z:A) for every
gene on the Z chromosome was calculated by dividing the RPKM of the gene by the
median RPKM of all autosomal genes. After filtering out genes with RPKM <1, we
calculated the M:F expression ratio for all Z-linked genes. The results show that the
tongue sole exhibits incomplete dosage compensation via an upregulation of gene
expression levels in the female. We further defined the compensated and uncompensated
genes on the Z chromosome, using the same cutoff (M:F <1.3 and M:F >1.3) as described
for zebrafinch and chicken44. Of 763 expressed Z genes (RPKM >1), 370 and 393 genes
were classified as compensated and not compensated genes, respectively. To test the
conservation of the compensated genes between tongue sole and birds (chicken and
zebrafinch), we downloaded the expression data of zebrafinch from GEO (accession no.
GSE20035)45, and calculated the average M:F expression ratio for each pair of samples,
including four pairs of brain samples from d1 (day 1 after hatching), d25, d45 and adult
male and female individuals, respectively. We also downloaded the raw expression data of
chicken from GEO (accession nos. GSE6843, GSE6844 and GSE6856)44, which are from
heart, brain and liver samples from male and female individuals, respectively.
Normalization of the raw expression data was performed in R using the Affymetrix MAS 5.0
algorithm in Bioconductor packages46, and the expression values were log2-transformed
to produce a more normal distribution. We then calculated the average M:F expression
ratio for each pair of samples. Using the reciprocal best match method with BLASTP at an
E-value <1e-5, we identified 10,111 and 9,979 reciprocal best orthologs to chicken and
zebrafinch, respectively. Of these orthologs, between tongue sole and zebrafinch, 158 are
located on the Z chromosome, and only 40 were expressed in both tongue sole and
zebrafinch. Correspondingly, 100 of 169 orthologs, which reside on the Z chromosome,
were expressed in both tongue sole and chicken. We found that the compensated Z genes
in tongue sole are not the same as those compensated in birds. Interestingly, the male to
female expression ratio was about 1.2-1.4 (less than 1.5) for all tested species with ZW
systems (1.32 for tongue sole; 1.36 for crow; 1.40 for chicken; 1.23 for zebrafinch; and
1.41 for silkworm)44,47,48.
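The M:F and Z:A ratios and the 1.3 cutoff can be sketched in Python. The function names and the input layout are hypothetical, and a ratio of exactly 1.3, which the text leaves unspecified, is assumed here to count as not compensated.

```python
from statistics import median


def z_to_a_ratio(z_gene_rpkm, autosomal_rpkms):
    """Z:A ratio: RPKM of a Z-linked gene over the median autosomal RPKM."""
    return z_gene_rpkm / median(autosomal_rpkms)


def classify_z_genes(expression, min_rpkm=1.0, cutoff=1.3):
    """Classify Z-linked genes by their male:female expression ratio.

    expression: dict mapping gene -> (male_rpkm, female_rpkm).
    Genes without RPKM > min_rpkm in both sexes are skipped.
    Returns gene -> (M:F ratio, label).
    """
    result = {}
    for gene, (male, female) in expression.items():
        if male <= min_rpkm or female <= min_rpkm:
            continue
        ratio = male / female
        label = "compensated" if ratio < cutoff else "not compensated"
        result[gene] = (ratio, label)
    return result
```

A gene with male and female RPKM values of 10 and 9 (M:F of about 1.1) would be labelled compensated, whereas one with 20 and 10 (M:F of 2) would not.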
Production and genetic analyses of sex-reversals
Treatment of tongue sole with high temperature (28°C) during the critical developmental
stage directly affects the sex ratio of progeny49. Briefly, about 3,000 larvae were collected
at 25 dph and then evenly allocated into three tanks (3 m3) at 23°C. The seawater was
heated gradually and maintained at 28°C until 100 dph. The genetic sex identification
of larvae was performed by PCR analysis using the sex-linked SSR marker CseF-SSR1,
which produced one band of 206 bp for ZZ and two bands of 206 bp and 218 bp for ZW50.
The phenotypic sex was identified by routine histology, according to whether the gonad
contained oocytes (ovary) or spermatocytes (testis)51. Under high temperature treatment, about
70% of ZW individuals developed as males, while under normal conditions (22°C), the
spontaneous sex reversal rate is about 14%. The sex-reversed fish produced by high
temperature can be crossed with normal females. Unexpectedly, there was an extremely
male-skewed sex ratio (>94%) in offspring of pseudo-male families raised under normal
conditions (22°C). This was caused by an extremely high sex reversal rate of genetic
females to phenotypic males (~94%, compared to the sex reversal rate of 14% under
normal conditions). A similar phenomenon was also detected in the offspring of
pseudo-male families produced by crossing spontaneously sex-reversed males with normal females.
Theoretically, the genotypes of progeny of pseudo-males should be Z*Z (1): Z*W (1):
ZW* (1): W*W (1) (Z* and W* are derived from the pseudo-male, and Z and W are
derived from the normal female), and thus the male to female ratio should be 1:3. We first
analyzed the genotype of fertile sperm of ZW pseudo-males using sex chromosome
specific microsatellite markers. Surprisingly, no W sperm was detected. More importantly,
the genetic sex ratio was determined to be 53.26% genetic females and 46.74% genetic males
in 1,812 progeny of pseudo-male families, which is almost a 1:1 ratio. Thus the genotypes of
the progeny of pseudo-males were Z*Z (1): Z*W (1). To verify the paternal inheritance of the
Z chromosome in the progeny, we selected Z-specific microsatellite markers and then
determined the genotypes of the parents in the pseudo-male families. For the pseudo-male
families with different microsatellite genotypes for the maternal and paternal
Z chromosomes, the offspring were analyzed. This revealed that all ZW fish carried the
paternal microsatellite marker, while the ZZ fish were heterozygous for both parental
markers.
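The expected genotype ratios discussed above follow from a simple enumeration of gamete combinations, sketched here in Python. The function name is hypothetical and all gametes are assumed to be equally frequent and equally viable.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(sperm, eggs):
    """Count offspring genotypes from all sperm x egg combinations.

    sperm, eggs: lists of sex chromosomes carried by the gametes ("Z" or "W").
    Genotypes are written with Z first, e.g. "ZW".
    """
    counts = Counter()
    for s, e in product(sperm, eggs):
        counts["".join(sorted((s, e), reverse=True))] += 1
    return counts


# Theoretical expectation: a ZW pseudo-male producing both Z and W sperm.
expected = offspring_genotypes(["Z", "W"], ["Z", "W"])   # ZZ:ZW:WW = 1:2:1

# Observation in the text: no W sperm detected, so only Z sperm contribute.
observed = offspring_genotypes(["Z"], ["Z", "W"])        # ZZ:ZW = 1:1
```

The first case gives one genetic male (ZZ) to three genetic females (two ZW and one WW), that is, the theoretical 1:3 ratio, while the second gives the near 1:1 ratio found in the 1,812 progeny.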
Sex-related genes and expression of Z-specific genes
Characterization of the sex-related genes involved in gonadal development is an initial
step towards understanding sex determination of the tongue sole. Thus, we
comprehensively searched for known sex-related genes, based on studies from other
vertebrates. Fifty-eight sex-related genes were retrieved from the tongue sole genome
sequence using BLAST against known sex-related gene sequences. As the sex reversal
experiments clearly indicated that sex determination in tongue sole is mediated by a
Z-encoded dominant factor, we determined the location of the 58 sex-related genes in the
tongue sole genome and found that five genes, namely follistatin, patched1, figalpha,
sf-1 and dmrt1, are located on the sex chromosomes.
For whole genome methylation analysis, two biological replicates were used, and each
replicate was a pool of five gonads from the same group of fish (ZZ testis P, ZW ovary
F1, ZW testis F1, ZW testis F2 and ZW ovary F2)52. Briefly, up to 25 μg of genomic DNA
was isolated from five pooled gonads of the same replicate, and 5 μg of DNA was mixed
with 25 ng of cI857 Sam7 Lambda DNA and used for BS-Seq library construction with a
modified NH4HSO3-based protocol53. Libraries were sequenced on an Illumina HiSeq
2000. Short reads were aligned onto the tongue sole genome with SOAP254. Cs in BS-Seq
reads that matched Cs on the reference genome were counted as potential mCs. The
conversion rate for each library was ~99.5%. The methylation level of an individual
cytosine was determined by the number of reads containing a C at the site of interest
divided by the total number of reads containing the site. We then detected the methylation
profile of the Z-linked sex-related genes among those samples.
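The per-cytosine methylation level defined above is a simple read-count ratio, sketched here in Python. The function names are hypothetical. The conversion-rate function assumes that the spiked-in Lambda DNA is unmethylated, so any C retained in Lambda reads is a conversion failure; the text reports the rate but not how it was computed.

```python
def methylation_level(c_reads, total_reads):
    """Fraction of reads covering a cytosine that still carry a C.

    Returns None when the site is not covered.
    """
    if total_reads == 0:
        return None
    return c_reads / total_reads


def conversion_rate(lambda_c_reads, lambda_total_reads):
    """Bisulfite conversion rate estimated from unmethylated Lambda DNA.

    lambda_c_reads: read bases still called C at Lambda cytosine positions.
    lambda_total_reads: all read bases covering those positions.
    """
    return 1 - lambda_c_reads / lambda_total_reads
```

For example, a cytosine covered by 10 reads, 8 of which carry a C, has a methylation level of 0.8.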
To identify the expression pattern of sex-related genes during sex determination stages,
we first analyzed the gonadal development of female and male tongue soles. The
gonads of tongue sole at different developmental stages, including 25 dph, 48 dph, 70 dph,
160 dph, 1 year and 2 years, were dissected, rapidly fixed in Bouin's solution and
processed by routine histological procedures55, including hematoxylin-eosin staining. All
individuals were checked for genetic sex identity. The period of gonadal
differentiation was then identified. At 25 dph, the primordial gonad, stretching from the
ventral end of the kidney to the posterior end of the abdominal cavity, was detected in
females and males, without any morphological differentiation towards testis or ovary.
After this period, the primordial gonads showed differentiation with primordial germ cells
(PGCs) uplifting inwards, and clusters of ovarian cavities appearing as a result of the
high rate of mitotic multiplication at 48 dph, which indicated ovarian differentiation of
the tongue sole. The male gonad at the corresponding stage did not yet form a testis cavity,
but the PGCs also showed faster mitotic multiplication and filled the primordial gonad. At
70 dph, the developing ovary had a relatively large ovarian cavity and the number of
primary oocytes had increased, with the appearance of a few oogonia. Although in the
male gonad the formation of spermatogonial clusters of cysts and seminal lobules, which
are cytological features of testicular differentiation, had become apparent, testis
differentiation in the tongue sole, as in other teleosts, is evidently delayed compared with
ovarian differentiation. In addition, immature spermatids were detected in the
spermatogenic cysts at higher magnification at 70 dph. At 160 dph, ovarian development
had further proceeded, visible by the presence of primary oocytes containing prominent
nucleoli. In testes, matured spermatozoa were observed to flow into the lumen of the
seminiferous lobules because of the rupture of spermatogenic cysts. Histological analysis
revealed that the mature gonads were full of oocytes or sperm in different phases,
respectively, in both 1- and 2-year-old fish.
Total RNA of gonads at different developmental stages of normal and sex-reversed fish
was isolated and reverse transcribed as described previously13. Primers for qRT-PCR
analysis were designed using the Primer Premier 5 program for Z-linked genes including
follistatin, patched1, sf-1 and dmrt1. The final PCR reactions contained 0.4 mM of each
primer, 10 µl SYBR Green (Invitrogen) and 80 ng of cDNA reverse transcribed from a
standardized amount of total RNA as the template. qRT-PCR was performed on an ABI
PRISM 7500 Real-Time PCR System using Hotstart Taq polymerase (Qiagen) in a final
volume of 20 μl, and the β-actin gene was used as an internal reference. The PCR
conditions comprised 95℃ for 35 s, followed by 40 cycles at 95℃ for 5 s and 60℃ for
34 s. Melting curve analysis was applied to all reactions to ensure homogeneity of the
reaction product. The results were analyzed using 7500 System SDS Software. In
comparison to dmrt156,57, the other three Z-linked genes, sf-1, patched1 and follistatin,
have considerably lower expression levels at the sex-determining stage in male fish. The
adult tissue expression pattern and expression at the critical sex-determining period during
temperature-induced sex reversal did not indicate a male-determining function for those
genes. Combining the expression data with the methylation pattern of these three genes in
normal and sex-reversed individuals excluded all three as male sex-determining genes.
Four of the five genes had homologs on both the Z and W chromosomes; the exception was
follistatin, which was found only on the Z chromosome. However, we found that the Z
homolog of figalpha and the W homologs of sf-1 and dmrt1 have to be
considered pseudogenes, because they appear to have lost all or some of the
conserved domains that exert the main function of the respective gene products. figalpha
plays a key role in ovarian development, while expression of sf-1 and dmrt1 is critical for
male development. In addition, there is a paralog of sf-1 on chr.14, resulting from the
fish-specific WGD. The patched1 genes on the Z and W chromosomes are very similar
except for one intron that is missing from the W copy.
Analysis of a possible primary sex-determining role of CseDmrt1
Z-specific localization of dmrt1
From the genetic map, scaffold 317, which contains the intact dmrt1 gene, was anchored
on the linkage group of Z. For qPCR analysis, a specific pair of primers designed from the
DM domain of dmrt1 was used on different samples (three males (ZZ), three females
(ZW) and two super-females (WW) that were induced by gynogenesis50, as described
above). The M:F (male versus female) ratio was about two (as expected) and the expression
level in WW super-female embryos was almost zero, indicating that dmrt1 is indeed a
Z-linked gene. To further identify the physical position of the dmrt1 gene, fluorescence in
situ hybridization (FISH) was carried out essentially according to the methods described
previously58
, with slight modifications. Briefly, the metaphases of chromosome spreads
were obtained from the head kidney of females and males, respectively, and then stored at
-20℃. Chromosome preparations were passed through an ethanol series and air-dried
before denaturation (at 70℃ for 1 min and 10 seconds in 70% formamide, 2×SSC). The
dmrt1-BAC clone was cultured in LB medium containing 30 µg/ml chloramphenicol at
37℃ for 16 h. The BAC DNA was then extracted and labeled by nick translation using a
Nick Translation System (DIG-Nick Translation Mix (Roche)). The BAC-FISH probe
contained 1 µg of labeled dmrt1-BAC DNA, 50 µg of sonicated salmon sperm DNA and
10 µg of Cot-1 DNA. After hybridization in a moist chamber at 37℃ for 24 hours,
chromosome slides were subjected to a series of washing steps (2×SSC for 5 min; 50%
formamide for 5 min; 1×SSC for 5 min). Signal detection and amplification were
performed using sheep anti-digoxigenin and FITC-conjugated donkey anti-sheep antibodies.
FISH staining was performed with propidium iodide (PI). Images were captured with a
NIS-element fluorescence microscope (Nikon) and then analyzed with the LUCIA system and
Adobe Photoshop software.
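The dosage logic behind the qPCR comparison earlier in this section can be illustrated with a short Python sketch. It is only an illustration: the 2^-ddCt calculation, the autosomal reference amplicon, the function name relative_quantity and all Ct values are assumptions and are not taken from the protocol used here. With a ZW female as calibrator, a Z-linked amplicon is expected to give about 2 in ZZ males, 1 in ZW females and close to 0 in WW super-females.

```python
def relative_quantity(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative quantity of the target versus a calibrator sample (2^-ddCt).

    Assumed method: the source text does not state how the ratio was computed.
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: (dmrt1 DM-domain amplicon, autosomal reference amplicon).
samples = {"ZZ": (24.0, 20.0), "ZW": (25.0, 20.0), "WW": (35.0, 20.0)}

# The ZW female is used as the calibrator, so its relative quantity is 1 by definition.
cal_target, cal_ref = samples["ZW"]

dosage = {
    genotype: relative_quantity(ct_t, ct_r, cal_target, cal_ref)
    for genotype, (ct_t, ct_r) in samples.items()
}
male_to_female = dosage["ZZ"] / dosage["ZW"]

for genotype, value in dosage.items():
    print(f"{genotype}: relative dosage {value:.3f}")
print(f"M:F ratio {male_to_female:.2f}")
```

A one-cycle difference in Ct corresponds to a two-fold difference in template, which is why the hypothetical ZZ sample (one cycle earlier than ZW) yields an M:F ratio of two.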
Gonad in situ hybridization for dmrt1 expression
We confirmed the distribution of the dmrt1 transcripts by in situ hybridization on gonads from
different developmental stages. Gonad complexes were dissected and fixed in 4%
paraformaldehyde in 0.1 M phosphate buffer (PB) (pH 7.4) at 4 ℃ overnight. After
fixation, gonads were embedded in paraffin. Cross-sections were cut at 6–8 micrometers
(µm). Digoxigenin (DIG)-labeled sense and antisense probes were generated by
in vitro transcription from linearized tongue sole dmrt1 cDNA plasmid, using an RNA
labeling kit (Roche Applied Science, Germany). Gonad in situ hybridization was
performed as described previously59. Sections were deparaffinized, hydrated, treated with
proteinase K (10 mg/ml) and then hybridized using sense or antisense DIG-labeled RNA
probes at 70 ℃ overnight. Hybridization signals were then detected using alkaline
phosphatase-conjugated anti-DIG antibody (Roche Applied Science, Germany) and a mix
of BCIP and NBT as the chromogens.
Determination of methylation levels in different regions of dmrt1 by bisulfite (BS)-PCR
We found an obvious differentially methylated region (DMR) between the female and the
male, ranging from 210 bases upstream of the transcription start site (TSS) to almost the
3' end of the first intron of dmrt1, based on the whole-genome methylation analysis of
five different samples: testis (ZZ testis P), ovary (ZW ovary F1), testis (ZW testis F1),
testis (ZW testis F2) and ovary (ZW ovary F2). BS-PCR was performed on the first exon
and intron of dmrt1 to verify the authenticity of the DMR. In brief, a 40 μl PCR was
carried out in 1× PCR buffer, 5 mM MgCl2, 1 mM dNTP mix, 1 unit of Taq polymerase,
50 pmol each of the forward and reverse primers and 50 ng of bisulfite-treated
genomic DNA. BS-PCR primers were designed using the sense strand of the
bisulfite-converted DNA. PCR cycling conditions were 94 ℃ for 1 min; 40 cycles of
94 ℃ for 30 s, 50 ℃ for 30 s and 72 ℃ for 30 s; followed by 72 ℃ for 5 min; and a hold at
4 ℃. The PCR products were electrophoresed on 1% agarose gels, and the bands were excised
and gel-extracted using a Zymoclean Gel DNA Recovery Kit. The purified PCR products
were cloned using the pMD18-T Simple Vector cloning kit following the manufacturer’s
protocol. For each sample, a minimum of 15 clones was sequenced. All clones were
sequenced on an ABI 3730xl DNA analyzer using SP6 or T7 primers. BS-PCR together
with sequencing of several clones provided allele-specific methylation profiles.
Methylation studies showed that the region from upstream of the TSS to the end of intron 1
of dmrt1 is specifically demethylated in male gonads, but not in female gonads.
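The clone-based BS-PCR readout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline actually used: the function names, the toy reference sequence and the toy clones are hypothetical, and clones are assumed to be already aligned to the reference without indels. Bisulfite treatment converts unmethylated C to T on the sense strand, so a C retained at a CpG in a clone is read as methylated.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """In-silico conversion of the sense strand: C -> T unless the position is methylated."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def cpg_positions(reference):
    """0-based positions of the C in every CpG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_per_cpg(reference, clones):
    """Fraction of clones that retain C (methylated) at each reference CpG."""
    levels = {}
    for pos in cpg_positions(reference):
        calls = [clone[pos] for clone in clones if clone[pos] in "CT"]
        levels[pos] = sum(c == "C" for c in calls) / len(calls) if calls else None
    return levels


# Toy reference with CpGs at positions 2 and 8; this is not a real dmrt1 sequence.
reference = "ATCGTCAACGTA"

# Hypothetical clones: two fully methylated, one methylated at the first CpG only,
# and one fully unmethylated.
clones = [
    bisulfite_convert(reference, {2, 8}),
    bisulfite_convert(reference, {2, 8}),
    bisulfite_convert(reference, {2}),
    bisulfite_convert(reference),
]

levels = methylation_per_cpg(reference, clones)
print(levels)
```

Sequencing at least 15 clones per sample, as stated above, gives each per-CpG fraction a resolution of roughly 7% or better.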
In addition, we found an E3 ubiquitin ligase gene, neurl3, that is also located only
on the Z chromosome and is absent from the W. RT-PCR and methylation profiling were also
carried out according to the methods described previously.
Sex reversal analysis by transcriptome and miRNA sequencing
The female-to-male sex reversal phenomenon in tongue sole offers a good
opportunity to investigate gene expression profiles during this process. We first
compared gene expression between the ovary of a normal female and the testis of a
pseudomale. We found 836 differentially expressed genes (with at least a 4-fold difference),
with 262 up-regulated in the female and 574 up-regulated in the pseudomale. For every GO
category containing more than 20 genes, we calculated the geometric mean expression
level in both the ovary and testis, with any RPKM <1 set to 1. Gene
ontology analyses suggested that these genes are related to sexual reproduction. The shift
from genetic females to phenotypic males is accompanied by the enhanced expression of
genes involved in spermatogenesis, flagellar assembly and sperm motility, together with
depressed expression of genes with functions in oogenesis and ovary development, such as
aqp1, gas8, ropn1l, nme5, tekt1, plcz1, tbpl1, spag6, gal3st1, dnajb13, cldn11 and gpr64.
These genes were verified by semi-quantitative RT-PCR. Total RNA of gonads from three
female and three pseudomale individuals was isolated and reverse transcribed as described
previously. Primers for these genes were designed with Primer Premier 5.0. PCR was
performed in a 25 μl volume consisting of 0.5 μl each of the forward and reverse primers
(10 μM), 12.5 μl 2× Taq MasterMix (CWBIO), 1 μl cDNA and 10.5 μl ddH2O. The PCR
conditions were as follows: 95 ℃ for 30 s, then 27 cycles of 95 ℃ for 30 s, 52 ℃ for 30 s
and 72 ℃ for 30 s. β-actin was used to calibrate the cDNA template for corresponding
samples. The final amplification products were resolved on a 1.0% agarose gel with a
DL2000 DNA marker.
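The 4-fold filter and the GO-category summary described earlier in this paragraph can be sketched in Python as follows. This is an illustration under stated assumptions rather than the scripts actually used: the function names and all RPKM values are hypothetical, no statistical test is included, and applying the RPKM floor of 1 to the fold-change calculation is an assumption (the text states the floor only for the geometric mean).

```python
import math

RPKM_FLOOR = 1.0  # any RPKM < 1 is treated as 1, as stated in the text


def floored(rpkm):
    return max(rpkm, RPKM_FLOOR)


def classify(rpkm_testis, rpkm_ovary, min_fold=4.0):
    """Label a gene by its pseudomale-testis / female-ovary fold change."""
    log_fc = math.log2(floored(rpkm_testis) / floored(rpkm_ovary))
    threshold = math.log2(min_fold)
    if log_fc >= threshold:
        return "up_in_pseudomale"
    if log_fc <= -threshold:
        return "up_in_female"
    return "unchanged"


def geometric_mean(values):
    """Geometric mean of RPKM values after applying the floor."""
    vals = [floored(v) for v in values]
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def summarize_categories(categories, min_genes=20):
    """categories maps a GO id to a list of (rpkm_ovary, rpkm_testis) pairs.

    Only categories with more than min_genes genes are summarized.
    """
    summary = {}
    for go_id, pairs in categories.items():
        if len(pairs) > min_genes:
            summary[go_id] = (
                geometric_mean(ovary for ovary, _ in pairs),
                geometric_mean(testis for _, testis in pairs),
            )
    return summary


# Hypothetical input: one category with 21 genes and one with 20 genes.
toy_categories = {
    "GO:large": [(1.0, 4.0)] * 21,
    "GO:small": [(1.0, 4.0)] * 20,
}
toy_summary = summarize_categories(toy_categories)
print(classify(40.0, 8.0), classify(2.0, 8.0), classify(0.2, 3.0))
print(toy_summary)
```

The floor prevents very lowly expressed genes from producing extreme ratios and keeps the logarithm in the geometric mean defined.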
In addition, the KEGG Automatic Annotation Server (KAAS)35 was used to annotate the genes
to KEGG pathways, with zebrafish and human as references. Fisher's exact test and the
chi-squared test36 were then used to identify enriched pathways.
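As an illustration of the pathway enrichment step, the following Python sketch computes a one-sided Fisher's exact test P value from the hypergeometric distribution. It is a sketch under stated assumptions: the one-sided form, the function name fisher_enrichment_p and the example counts are hypothetical, the chi-squared alternative is not shown, and no multiple-testing correction is applied.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: differentially expressed (DE) genes annotated to the pathway
    n: all DE genes with a pathway annotation
    K: all annotated genes in the pathway
    N: all annotated genes (background)
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Small worked case: background of 10 genes, 4 in the pathway, 3 DE genes,
# 2 of which fall in the pathway. P = (6*6 + 4*1) / 120 = 1/3.
p_small = fisher_enrichment_p(k=2, n=3, K=4, N=10)
print(round(p_small, 4))

# Hypothetical larger case with made-up counts.
p_large = fisher_enrichment_p(k=12, n=300, K=60, N=8000)
print(p_large)
```

Exact integer arithmetic via math.comb avoids the factorial overflow that a naive floating-point implementation would hit for genome-scale backgrounds.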
For small RNA, RNAs of 18-30 nt were purified from the gonads, including the ZZ testis P,
ZW ovary F1, ZW testis F1, ZW testis F2 and ZW ovary F2. Illumina 5’ and 3’ RNA
adapters were sequentially ligated to the RNA fragments and the ligated products were
size-selected on denaturing polyacrylamide gels. The adapter-linked RNA was reverse
transcribed with small RNA RT primers and amplified using 15 cycles of PCR with small
RNA PCR primers 1 and 2 (Illumina). The libraries were sequenced with the Illumina
Genome Analyzer. The adapters were removed using our custom scripts, and low-quality
reads were discarded. Reads were aligned to the genome using BWA (version 0.6.2)41. We
discarded
reads aligned to repeats, exons, tRNAs, rRNAs and snRNAs, and the remaining reads were used
for microRNA detection using our custom scripts, which use ViennaRNA (version 2.0.7)60 to
compute RNA structure. Expression levels of miRNAs were calculated as RPM (reads per million)
and DESeq was used to identify differentially expressed microRNAs32. As a result, 44
microRNAs showed differential expression between female and pseudomale gonads.
Intriguingly, several microRNAs having important regulatory roles in sex determination
are upregulated in pseudomale testes. For instance, the expression of miR-124, miR-132,
miR-212 and miR-22 was significantly induced, by up to 8-fold, in testes. In mouse,
miR-124 can directly target sox9, a gene that has a critical role in testis development,
regulating expression of amh in Sertoli cells to inhibit the development of the female
reproductive system61. miR-132 and miR-212 are products of the same pri-miRNA in
mouse and human, are highly conserved among vertebrates (from teleosts to mammals), and
share a common seed sequence62. In mouse, the two microRNAs are regulated
by GnRH, a hormone central to the regulation of reproductive function, and cooperatively
(up/down)-regulate the expression of LH/hCG, which trigger ovulation in females and
stimulate Leydig cell production of testosterone in males63. In tongue sole, miR-132 and
miR-212 were found to be highly expressed in pseudomale testes. The estrogen
receptor-alpha (ER-alpha), a gene essential for sexual development and reproductive
function, is targeted by miR-22, miR-219 and miR-27 in the human MCF-7 cell line64. In
tongue sole, miR-22 and miR-219 were Z-linked microRNAs and miR-27 was a W-linked
microRNA. Further, in the human MCF-7 cell line, miR-27 has been reported to be
co-expressed with beta-catenin, which is an essential gene for female development and
fertility65. Taken together, the expression profiling revealed an overall trend of inhibition of
ovary development and stimulation of testis development during sex reversal from genetic
females to phenotypic males.
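The RPM normalization used above for miRNA expression can be sketched in Python as follows. This is an illustration only: the read counts are hypothetical, the function names are not from the custom scripts, and the choice of denominator (reads assigned to miRNAs rather than all mapped reads) and the pseudocount in the fold-change helper are assumptions, since the text does not specify them.

```python
def reads_per_million(counts):
    """Normalize raw miRNA read counts to RPM within one library.

    The denominator is the sum of reads assigned to miRNAs (an assumption).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no miRNA reads")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def fold_change(rpm_a, rpm_b, pseudocount=1.0):
    """Ratio of two RPM values; the pseudocount avoids division by zero."""
    return (rpm_a + pseudocount) / (rpm_b + pseudocount)


# Hypothetical raw counts for one small-RNA library.
library = {"miR-124": 50, "miR-22": 150, "miR-27": 800}
rpm = reads_per_million(library)
print(rpm)

# Hypothetical RPM values for one miRNA in pseudomale testis versus ovary.
print(fold_change(799.0, 99.0))
```

RPM makes libraries of different sequencing depth comparable for display, whereas count-based tools such as DESeq operate on raw counts with their own normalization.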
Supplementary URLs
SOAP, http://soap.genomics.org.cn/; Ensembl, http://www.ensembl.org/index.html; KEGG,
http://www.genome.jp/kegg/; Repbase, http://www.girinst.org/repbase/index.html; SOLAR,
http://treesoft.svn.sourceforge.net/viewvc/treesoft/; RepeatMasker, http://repeatmasker.org/; GLEAN,
http://sourceforge.net/projects/glean-gene/; TimeTree, http://www.timetree.org.
References
1. Li, R.Q. et al. The sequence and de novo assembly of the giant panda genome. Nature 463,
311-317 (2010).
2. Li, R.Q. et al. De novo assembly of human genomes with massively parallel short read
sequencing. Genome Res. 20, 265-272 (2010).
3. Van Ooijen, J.W. JoinMap® 4, Software for the calculation of genetic linkage maps in
experimental populations. (2006).
4. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5,
R12 (2004).
5. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Res. 25, 3389-3402 (1997).
6. Kent, W.J. BLAT - The BLAST-like alignment tool. Genome Res. 12, 656-664 (2002).
7. Sha, Z. X. et al. Generation and analysis of 10 000 ESTs from the half-smooth tongue sole
Cynoglossus semilaevis and identification of microsatellite and SNP markers. J. Fish Biol. 76,
1190-1204 (2010).
8. Edgar, R.C. & Myers, E.W. PILER: identification and classification of genomic repeats.
Bioinformatics 21 Suppl 1, i152-158 (2005).
9. Price, A.L., Jones, N.C. & Pevzner, P.A. De novo identification of repeat families in large
genomes. Bioinformatics 21 Suppl 1, i351-358 (2005).
10. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet.
Genome Res. 110, 462-467 (2005).
11. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR
retrotransposons. Nucleic Acids Res. 35, W265-268 (2007).
12. Gouy, M., Guindon, S. & Gascuel, O. SeaView version 4: A multiplatform graphical user
interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 27, 221-224
(2010).
13. Chen, S.L., Hong, Y.H., Scherer, S.J. & Schartl, M. Lack of ultraviolet-light inducibility of the
medakafish (Oryzias latipes) tumor suppressor gene p53. Gene 264, 197-203 (2001).
14. Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res. 14, 988-995
(2004).
15. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq.
Bioinformatics 25, 1105-1111 (2009).
16. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated
transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511-515
(2010).
17. UniProt Consortium. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 38,
D142-148 (2010).
18. Salamov, A.A. & Solovyev, V.V. Ab initio gene finding in Drosophila genomic DNA. Genome
Res. 10, 516-522 (2000).
19. Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron
submodel. Bioinformatics 19 Suppl 2, ii215-225 (2003).
20. Zdobnov, E.M. & Apweiler, R. InterProScan--an integration platform for the
signature-recognition methods in InterPro. Bioinformatics 17, 847-848 (2001).
21. Galtier, N., Gouy, M. & Gautier, C. SEAVIEW and PHYLO_WIN: two graphic tools for
sequence alignment and molecular phylogeny. Comput. Appl. Biosci. 12, 543-548 (1996).
22. Posada, D. & Crandall, K.A. MODELTEST: testing the model of DNA substitution.
Bioinformatics 14, 817-818 (1998).
23. Huelsenbeck, J.P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees.
Bioinformatics 17, 754-755 (2001).
24. Yang, Z.H. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24,
1586-1591 (2007).
25. De Bie, T., Cristianini, N., Demuth, J.P. & Hahn, M.W. CAFE: a computational tool for the
study of gene family evolution. Bioinformatics 22, 1269-1271 (2006).
26. Hedges, S.B., Dudley, J. & Kumar, S. TimeTree: a public knowledge-base of divergence times
among organisms. Bioinformatics 22, 2971-2972 (2006).
27. Nawrocki, E.P., Kolbe, D.L. & Eddy, S.R. Infernal 1.0: inference of RNA alignments.
Bioinformatics 25, 1335-1337 (2009).
28. Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic
Acids Res. 33, D121-124 (2005).
29. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying
mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621-628 (2008).
30. Robinson, M.D. & Oshlack, A. A scaling normalization method for differential expression
analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
31. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome
Biol. 11, R106 (2010).
32. Roberts, A., Trapnell, C., Donaghey, J., Rinn, J.L. & Pachter, L. Improving RNA-Seq
expression estimates by correcting for fragment bias. Genome Biol. 12, R22 (2011).
33. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful
approach to multiple testing. J. R. Stat. Soc. B 57, 289-300 (1995).
34. Beissbarth, T. & Speed, T.P. GOstat: find statistically overrepresented Gene Ontologies within
a group of genes. Bioinformatics 20, 1464-1465 (2004).
35. Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A.C. & Kanehisa, M. KAAS: an automatic
genome annotation and pathway reconstruction server. Nucleic Acids Res. 35, W182-W185
(2007).
36. Huang, D.W., Sherman, B.T. & Lempicki, R.A. Bioinformatics enrichment tools: paths
toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1-13
(2009).
37. Zhang, J.Z., Nielsen, R. & Yang, Z.H. Evaluation of an improved branch-site likelihood
method for detecting positive selection at the molecular level. Mol. Biol. Evol. 22, 2472-2479
(2005).
38. Jaillon, O. et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the
early vertebrate proto-karyotype. Nature 431, 946-957 (2004).
39. Kasahara, M. et al. The medaka draft genome and insights into vertebrate genome evolution.
Nature 447, 714-719 (2007).
40. Nakatani, Y., Takeda, H., Kohara, Y. & Morishita, S. Reconstruction of the vertebrate
ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res.
17, 1254-1265 (2007).
41. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform.
Bioinformatics 25, 1754-1760 (2009).
42. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25,
2078-2079 (2009).
43. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide
polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2;
iso-3. Fly 6, 1-13 (2012).
44. Itoh, Y. et al. Sex bias and dosage compensation in the zebra finch versus chicken genomes:
General and specialized patterns among birds. Genome Res. 20, 512-518 (2010).
45. Tomaszycki, M.L. et al. Sexual differentiation of the zebra finch song system: potential roles
for sex chromosome genes. BMC Neurosci. 10, 24 (2009).
46. Gentleman, R.C. et al. Bioconductor: open software development for computational biology
and bioinformatics. Genome Biol. 5, R80 (2004).
47. Wolf, J.B. & Bryk, J. General lack of global dosage compensation in ZZ/ZW systems?
Broadening the perspective with RNA-seq. BMC Genomics 12, 91 (2011).
48. Zha, X.F. et al. Dosage analysis of Z chromosome genes using microarray in silkworm,
Bombyx mori. Insect Biochem. Mol. Biol. 39, 315-321 (2009).
49. Deng, S.P., Chen, S. L., Tian, Y. S., Liu, B. W. & Zhuang, Z. M. Gonadal differentiation and
effects of temperature on sex determination in half-smooth tongue sole, Cynoglossus
semilaevis. J. Fish. Sci. China, 5, 714-719 (2007).
50. Chen, S.L. et al. Induction of Mitogynogenetic Diploids and Identification of WW
Super-female Using Sex-Specific SSR Markers in Half-Smooth Tongue Sole (Cynoglossus
semilaevis). Mar. Biotechnol. 14, 120-128 (2012).
51. Chen, S.L. et al. Selection of the families with high growth rate and high female proportion in
half-smooth tongue sole (Cynoglossus semilaevis). J. Fish. China 37, 481-488 (2013).
52. Shao, C.W. et al. Epigenetic Modification and Inheritance in Sexual Reversal of Fish. Genome
Res. doi:10.1101/gr.162172.113 (2014).
53. Hayatsu, H., Tsuji, K. & Negishi, K. Does urea promote the bisulfite-mediated deamination of
cytosine in DNA? Investigation aiming at speeding-up the procedure for DNA methylation
analysis. Nucleic Acids Symp. Ser. (Oxf) 50, 69-70 (2006).
54. Li, R.Q. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25,
1966-1967 (2009).
55. Liang, Z., Chen, S.L., Zhang, J., Song, W.T. & Liu, S.S. Gonadal development process
observation of half-smooth tongue sole in rearing population. J. Southern Agr. 12, 2074-2078
(2012).
56. Deng, S.P. & Chen, S.L. Molecular cloning, characterization and RT-PCR expression analysis
of Dmrt1α from half-smooth tongue-sole, Cynoglossus semilaevis. J. Fish. Sci. China, 4,
577-584 (2008).
57. Sun, Y.Y. et al. Cloning and expression analysis of DMRT1 gene in Cynoglossus semilaevis. J.
Wuhan Univ. (Nat.Sci.Ed.), 221-226 (2008).
58. Szczerbal, I., Klukowska-Roetzler, J., Dolf, G., Schelling, C. & Switonski, M. FISH mapping
of 10 canine BAC clones harbouring genes and microsatellites in the arctic fox and the
Chinese raccoon dog genomes. J. Anim. Breed. Genet. 123, 337-342 (2006).
59. Kobayashi, T., Kajiura-Kobayashi, H. & Nagahama, Y. Differential expression of vasa
homologue gene in the germ cells during oogenesis and spermatogenesis in a teleost fish,
tilapia, Oreochromis niloticus. Mech. Develop. 99, 139-142 (2000).
60. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithm. Mol. Biol. 6, 26 (2011).
61. Cheng, L.C., Pastrana, E., Tavazoie, M. & Doetsch, F. miR-124 regulates adult neurogenesis in
the subventricular zone stem cell niche. Nat. Neurosci. 12, 399-408 (2009).
62. Wanet, A., Tacheny, A., Arnould, T. & Renard, P. miR-212/132 expression and functions:
within and beyond the neuronal compartment. Nucleic Acids Res. 40, 4742-4753 (2012).
63. Fiedler, S.D., Carletti, M.Z., Hong, X.M. & Christenson, L.K. Hormonal regulation of
microRNA expression in periovulatory mouse mural granulosa cells. Biol. Reprod. 79,
1030-1037 (2008).
64. Pandey, D.P. & Picard, D. miR-22 inhibits estrogen signaling by directly targeting the estrogen
receptor alpha mRNA. Mol. Cell. Biol. 29, 3783-3790 (2009).
65. Li, X. et al. MicroRNA-27a indirectly regulates estrogen receptor α expression and
hormone responsiveness in MCF-7 breast cancer cells. Endocrinology 151, 2462-2473 (2010).
Nature Genetics: doi:10.1038/ng.2890