Draft
Genome-wide DNA polymorphism in the indica rice varieties
RGD-7S and Taifeng B, revealed by whole genome re-sequencing
Journal: Genome
Manuscript ID gen-2015-0101.R2
Manuscript Type: Article
Date Submitted by the Author: 10-Dec-2015
Complete List of Authors:
Fu, Chongyun; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding
Liu, Wu-Ge; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding
Liu, Di-Lin; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding
Li, Jin-Hua; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding
Zhu, Man-Shan; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding
Liao, Yi-Long; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding
Liu, Zhen-Rong; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding
Zeng, Xue-Qin; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding
Wang, Feng; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding
Keyword: rice, genome resequencing, SNPs, InDels, CNVs
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Page 1 of 41
Genome-wide DNA polymorphism in the indica rice varieties RGD-7S and Taifeng B,
revealed by whole genome re-sequencing
Chong-Yun Fu1,2, Wu-Ge Liu1,2, Di-Lin Liu1,2, Ji-Hua Li1,2, Man-Shan Zhu1,2, Yi-Liao Liao1,2,
Zhen-Rong Liu1,2, Xue-Qin Zeng1,2, Feng Wang1,2*
1 Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640,
P.R. China.
2 Guangdong Provincial Key Laboratory of New Technology in Rice Breeding, Guangzhou
510640, P.R. China.
* Corresponding author:
Name: Feng Wang
Address: No. 3, Jinying East 1st Street, Tianhe District, Guangzhou City, Guangdong Province,
P.R. China.
Office Telephone: 086-020-38491901; Email: [email protected].
Abstract
Next-generation sequencing technologies provide opportunities to understand genetic
variation even among closely related cultivars. We performed whole-genome resequencing
of two elite indica rice varieties, RGD-7S and Taifeng B, whose F1 progeny showed hybrid
weakness and hybrid vigor when grown in the early- and late-cropping seasons,
respectively. Approximately 150 million 100-bp paired-end reads were generated for each
variety, covering about 86% of the rice (Oryza sativa L. japonica cv. Nipponbare) reference
genome. A total of 2,758,740 polymorphic sites, including 2,408,845 SNPs and 349,895 InDels,
were detected in RGD-7S and Taifeng B relative to the reference. Applying stringent
parameters, we identified 961,791 SNPs and 46,640 InDels between RGD-7S and Taifeng B
(RGD-7S/Taifeng B). The density of DNA polymorphisms was 256.8 SNPs and 12.5 InDels per
100 kb for RGD-7S/Taifeng B. Copy number variations (CNVs) were also investigated. In
RGD-7S, 1,989 of 2,727 CNVs overlapped 218 genes, and in Taifeng B, 1,231 of 2,010 CNVs
overlapped 175 genes. In addition, we verified a subset of InDels in the intervals
of the hybrid weakness genes Hw3 and Hw4 and obtained polymorphic InDel markers,
which will provide a sound foundation for cloning these genes. Analysis of
genomic variations will also contribute to understanding the genetic basis of hybrid weakness
and heterosis.
Key words: rice, genome resequencing, SNPs, InDels, CNVs
Introduction
Rice is one of the most important food crops for human nutrition worldwide. The utilization of
heterosis makes hybrid rice yield 10-20% higher than conventional varieties (Cheng et al., 2007).
However, hybrid weakness can hinder the full utilization of heterosis,
especially in intra-subspecific crosses. Most reported cases of hybrid weakness occurred in
crosses between wild and cultivated rice or between indica and japonica rice (Ichitani et al.,
2007, 2011; Kuboyama et al., 2009; Chen et al., 2013, 2014). We recently reported a
type of low temperature-induced hybrid weakness in crosses between two indica rice
varieties, V1134 (SH527//GD-7S/BL122) and Taifeng A, in our hybrid rice breeding
programme. The F1 progeny of this cross showed hybrid weakness in the
early-growing seasons with relatively low temperatures (average daily temperature of
19.87 °C at seedling stage) and hybrid vigor in the late-growing seasons with relatively high
temperatures (average daily temperature of 28.4 °C at seedling stage) (Fu et al., 2013). We
found that the F1 progeny from the cross between the two elite indica rice lines RGD-7S
(GD-7S/BL122, one of the parents of V1134) and Taifeng B (the maintainer line of Taifeng A
with the same nuclear genome) also showed hybrid weakness when grown in the
early seasons. Understanding the temperature-dependent switch between hybrid weakness and
hybrid vigor could contribute to elucidating the molecular mechanism of heterosis.
To fully understand the genetic basis of the low temperature-dependent hybrid weakness, we
conducted fine mapping of the hybrid weakness genes. However, the low level of genetic
polymorphism between the two indica breeding parents hampers further cloning of the hybrid
weakness genes. The advent of next-generation sequencing (NGS) technologies
enables highly efficient discovery of genome-wide genetic variation and genotyping
through large-scale re-sequencing of whole genomes (Han et al., 2013). Genetic variation
includes sequence variation and structural variation. Sequence variation generally consists of
single-nucleotide polymorphisms (SNPs) and short insertions and deletions (InDels). Structural
variation is usually considered to comprise gain/loss variations and copy number variations
(CNVs), which include large-scale deletions, insertions, duplications, inversions and
translocations. One of the important
roles of NGS technologies is to detect large numbers of sequence polymorphisms, such as
SNPs and InDels (Huang et al.,
2013), which have been widely applied in marker-assisted and genomic selection, QTL
mapping, positional cloning, and haplotype and pedigree analysis because they are cheap,
stable and high-throughput (McCouch et al., 2010). NGS technologies also help in
understanding the distribution of SNPs and InDels and the variation in copy number of DNA
fragments and in chromosome structure, which can affect gene expression and function. Based
on the reference genome sequences of japonica and indica rice cultivars, a number of rice lines
have been resequenced and a massive number of SNPs and InDels have been identified
(Arai-Kichise et al., 2011; Subbaiyan et al., 2012; Jain et al., 2014).
This study aimed to discover the genomic variations and DNA polymorphisms in two elite
indica rice varieties, RGD-7S and Taifeng B, by whole-genome re-sequencing. The reads
generated were mapped to the Nipponbare genomic sequence
(Os-Nipponbare-Reference-IRGSP-1.0 pseudomolecules). To investigate genomic
variation, SNPs, InDels and copy number variations (CNVs) were comprehensively detected,
and the DNA polymorphisms between RGD-7S and Taifeng B in the candidate regions of the
hybrid weakness genes were validated by PCR. The investigation of their genome-wide
variations will contribute to further cloning of the hybrid weakness genes and to understanding
the genetic basis underlying hybrid weakness and heterosis in hybrid rice.
Materials and Methods
Plant materials and sequencing
In a previous study of hybrid weakness (Fu et al., 2013), we found that the F1 progeny from
Taifeng A and V1134 (SH527//GD-7S/BL122) showed temperature-dependent hybrid
weakness. To understand the cause of this type of hybrid weakness, we carried out the
following crosses: (1) between SH527 and Taifeng A, and (2) between RGD-7S (GD-7S/BL122)
and Taifeng B (the maintainer line of Taifeng A with the same nuclear genome). We
found that only the F1 progeny from cross (2) displayed hybrid weakness in the
early-growing seasons. RGD-7S is a thermo-sensitive male sterile line carrying two rice blast
resistance genes, Pi1 and Pi2, and Taifeng B is a slender-grain, good-quality maintainer line.
In this study, RGD-7S and Taifeng B were used for genomic resequencing to
further understand the cause of hybrid weakness. The genomic DNA of RGD-7S and Taifeng
B was extracted from 20-day-old seedlings using a modified CTAB
(hexadecyltrimethylammonium bromide) method and purified with chloroform-phenol (1:1). RNA was
removed by treatment with RNase A for 30 min at 37°C. The quality of the isolated DNA was
checked with an Agilent Bioanalyzer 1000 (Agilent Technologies, Singapore). Library preparation
was performed according to the manufacturer’s protocol. Genomic re-sequencing was
conducted to generate paired-end 100-base reads using the Illumina HiSeq 2000 platform
(Illumina Technologies) by Macrogen (Seoul, Korea). The raw data generated were filtered
via the standard Illumina pipeline. All sequencing data of RGD-7S and Taifeng B have been
uploaded to NCBI under the accession numbers
SAMN04320540 and SAMN04320541, respectively.
Bioinformatics analysis
To obtain clean data, the raw data were processed for base calling, quality evaluation,
and removal of adaptor and low-quality sequences using CASAVA (v1.8.2) and
FastQC software. Redundant data were discarded and the high-quality filtered reads were
mapped onto the rice reference genome (Oryza sativa L. japonica cv. Nipponbare, MSU7;
http://rice.plantbiology.msu.edu/index.shtml) using BWA software (v0.5.9-r16) with default
parameters. SAMtools software (v0.1.18) was used to
detect SNPs and InDels and to investigate their quantity, type and distribution on chromosomes
and in gene coding regions. Copy number variations (CNVs) were investigated using
Control-FREEC software (FREEC v6.2) (Boeva et al., 2012). The parameters were set as
follows: ploidy: 2; breakpoint threshold: 0.8; window: 10 kb; expected GC content: 0.35-0.55;
and minimum mappability per window: 0.95.
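The CNV-calling parameters listed above could be written to a Control-FREEC style configuration file roughly as follows. This is only a sketch: the values are those stated in the text, whereas the key names are assumed to follow the Control-FREEC manual and should be checked against the v6.2 release, and input files and sample sections are omitted because they are not described here.

```python
import configparser
import io

# Values come from the Methods text; the key names are an assumption
# (taken to match the Control-FREEC manual) and should be verified.
freec_general = {
    "ploidy": "2",
    "breakPointThreshold": "0.8",
    "window": "10000",  # 10 kb window
    "minExpectedGC": "0.35",
    "maxExpectedGC": "0.55",
    "minMappabilityPerWindow": "0.95",
}


def render_freec_config(general):
    """Return the [general] section as INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the mixed-case key names unchanged
    parser["general"] = general
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


config_text = render_freec_config(freec_general)
print(config_text)
```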
InDel primer design and experimental validation
Based on the sequencing results, polymorphic InDels with an insertion or deletion size ≥10
bp were selected. The putative polymorphic InDel loci were scanned using Primer Premier
5.0 to design oligonucleotide primers. The optimized input parameters were as follows:
product size: 100–500 bp; primer size: 18–22 bp; primer Tm: 55–60°C; primer GC content:
40–60%. PCR was performed in a reaction volume of 20 µl containing 10 ng of template DNA,
0.2 mM of each primer, 0.2 mM of each dNTP, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5
mM MgCl2, and 1 U of Taq DNA polymerase. Amplification was carried out in an S1000
thermocycler (Bio-Rad, USA) using the following conditions: 5 min at 94°C; followed by 35
cycles of 0.5 min at 94°C, 0.5 min at 55°C, and 0.5 min at 72°C; and a final extension of 5
min at 72°C. PCR products were separated on 3.0% agarose gels and stained with Gold View
(Applygen Technologies Inc).
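The primer design limits given above can be illustrated with a small screening function. This is a minimal sketch, not the Primer Premier 5.0 procedure: the melting temperature is approximated with the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is a substitute for whatever model the software applies, and the example primer is hypothetical.

```python
def gc_percent(primer):
    """GC content of a primer as a percentage."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough Tm in degrees C by the Wallace rule (an assumption, see text)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def passes_design_limits(primer, product_size):
    """Apply the stated limits: length, GC content, Tm and product size."""
    return (
        18 <= len(primer) <= 22
        and 40.0 <= gc_percent(primer) <= 60.0
        and 55 <= wallace_tm(primer) <= 60
        and 100 <= product_size <= 500
    )


example_primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer
print(gc_percent(example_primer), wallace_tm(example_primer))
print(passes_design_limits(example_primer, product_size=250))
```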
Results
Reads mapping to Oryza sativa Nipponbare reference genome
Whole genome re-sequencing of RGD-7S and Taifeng B generated 148,249,780 and
153,044,582 100-bp paired-end reads, yielding 14,973,227,780 and 15,457,502,782 bases,
respectively (Table 1), with mean depths of coverage of 27.83× and 28.72×. The proportions
of high-quality (Q30) bases were 91.61% and 89.92% in RGD-7S and Taifeng B, respectively.
These reads were mapped to the Nipponbare genome (Release 7) as the reference rice genome
using Burrows-Wheeler Aligner (BWA) and Picard software. There were 320,035,665 and
324,501,839 base pairs from RGD-7S and Taifeng B mapped to the Nipponbare genome,
covering about 85.5% and 86.69% of the reference genome, respectively. Only about
73% of the high-quality reads could be mapped to the Nipponbare genome. These results
indicate substantial differences in genome sequence between Nipponbare
and the two varieties.
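The sequencing totals reported above can be cross-checked with simple arithmetic. The sketch below uses only the figures quoted in this section; it shows that the base totals equal the read counts multiplied by 101 (the text describes the reads as 100-bp, and the reason for the difference is not stated) and that the covered base pairs and coverage percentages imply the same reference length for both varieties.

```python
reads = {"RGD-7S": 148_249_780, "Taifeng B": 153_044_582}
bases = {"RGD-7S": 14_973_227_780, "Taifeng B": 15_457_502_782}
covered_bp = {"RGD-7S": 320_035_665, "Taifeng B": 324_501_839}
covered_fraction = {"RGD-7S": 0.855, "Taifeng B": 0.8669}


def bases_per_read(variety):
    """Reported bases divided by reported reads."""
    return bases[variety] / reads[variety]


def implied_reference_mb(variety):
    """Reference length (Mb) implied by covered bases and coverage fraction."""
    return covered_bp[variety] / covered_fraction[variety] / 1e6


for variety in reads:
    print(variety, bases_per_read(variety), round(implied_reference_mb(variety), 1))
```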
Detection and distribution of DNA polymorphisms in RGD-7S and Taifeng B
Compared with the Nipponbare genome sequence, a total of 2,758,740 polymorphic sites,
including 2,408,845 SNPs and 349,895 InDels, were discovered in RGD-7S and Taifeng B
using SAMtools software. Two filter conditions (coverage ≥10 and ≤100) were
applied to minimize the detection of false-positive SNPs and InDels, and heterozygous SNPs
and InDels were removed. Based on these filter conditions, the total numbers of DNA
polymorphisms were 2,082,219 (1,744,556 SNPs and 337,663 InDels) in RGD-7S and
2,156,997 (1,821,644 SNPs and 335,353 InDels) in Taifeng B (Table 2). To
understand the distribution of DNA polymorphisms, we investigated the variant
frequency (the number of DNA polymorphisms per 100 kb) in RGD-7S and Taifeng B. The
variant frequency varied from 459.8 to 675.2 in RGD-7S and from
431.5 to 677.7 in Taifeng B, with the maximum on chromosome 10 (Chr.10) for both varieties,
and the minimum on Chr.1 for RGD-7S and on Chr.9 for Taifeng B.
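The filtering step described above (coverage between 10 and 100, heterozygous sites removed) can be sketched as a simple predicate over variant calls. The record layout and the example calls are hypothetical and are not taken from the SAMtools output used in the study.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    chrom: str
    pos: int
    depth: int          # read coverage at the site
    heterozygous: bool  # True if the call is heterozygous


def keep_call(call, min_depth=10, max_depth=100):
    """Keep homozygous calls whose coverage lies within the stated bounds."""
    return min_depth <= call.depth <= max_depth and not call.heterozygous


# Hypothetical calls, for illustration only.
calls = [
    VariantCall("Chr1", 1200, 25, False),   # kept
    VariantCall("Chr1", 5400, 8, False),    # coverage too low
    VariantCall("Chr2", 9100, 140, False),  # coverage too high
    VariantCall("Chr3", 300, 30, True),     # heterozygous
]
kept = [call for call in calls if keep_call(call)]
print(len(kept), "of", len(calls), "calls retained")
```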
To develop more polymorphic molecular markers for further fine mapping and cloning of the
hybrid weakness genes, DNA polymorphisms between RGD-7S and Taifeng B were further
analyzed. Three filter conditions (coverage ≥5, removal of heterozygous sites, and InDel size
≥5 bp) were applied to screen the polymorphic SNPs and InDels between RGD-7S and Taifeng B.
Because both RGD-7S and Taifeng B are indica rice varieties, the detected RGD-7S/Taifeng
B DNA polymorphisms included only 961,791 SNPs and 46,640 InDels (Table 3). The
density of DNA polymorphisms was 256.8 SNPs and 12.5 InDels per 100 kb for
RGD-7S/Taifeng B, which was far lower than the densities between each variety and
Nipponbare. The distribution of polymorphic SNPs differed from that of polymorphic InDels.
The most polymorphic SNPs and InDels were found on Chr.12 and Chr.1, respectively, while
Chr.10 had the fewest polymorphic SNPs and InDels. We also investigated the distribution of
SNPs in different regions of genes, such as upstream, downstream, intergenic, intron and exon.
SNPs were mainly located in the downstream, upstream, intergenic, intronic and exonic regions
of genes, at proportions of about 30%, 32%, 23%, 7% and 5%, respectively (Table 4).
The proportions of SNPs were only about 0.24%, 0.01%, 0.97% and 0.64% in the splice sites,
start codon regions, and 3’ and 5’ untranslated regions, respectively.
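As a consistency check on the densities quoted above, the polymorphism counts and the densities per 100 kb can be used to back-calculate the genome length that was used as the denominator. The sketch below assumes only that each density was rounded to one decimal place; no genome length is taken from outside the text.

```python
def implied_genome_range_mb(count, density_per_100kb, half_step=0.05):
    """Range of genome lengths (Mb) compatible with a rounded density."""
    low = count / (density_per_100kb + half_step) * 100_000 / 1e6
    high = count / (density_per_100kb - half_step) * 100_000 / 1e6
    return low, high


snp_range = implied_genome_range_mb(961_791, 256.8)
indel_range = implied_genome_range_mb(46_640, 12.5)

# The two ranges are compatible if they overlap.
overlap_low = max(snp_range[0], indel_range[0])
overlap_high = min(snp_range[1], indel_range[1])
print(snp_range, indel_range, overlap_low <= overlap_high)
```

Both densities are compatible with a denominator of roughly 374 Mb, which agrees with the reference length implied by the coverage figures reported earlier in the Results.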
In addition, we analyzed the distribution of so-called large-effect SNPs, which are
considered to potentially affect gene function, in the RGD-7S and Taifeng B genomes (Table 5).
In the RGD-7S genome, 6141 SNPs were predicted to cause premature stop codons, 1135 to
lose the annotated stop codons, 598 to lose the start codons, 1430 to gain start codons, and
909 to disrupt splicing donor or acceptor sites. In Taifeng B, 5987 SNPs were predicted to
introduce premature stop codons, 1132 to alter annotated stop codons, 558 to lose start codons, 1383 to
gain start codons and 847 to disrupt splicing donor or acceptor sites.
Detection of copy number variation (CNV) in RGD-7S and Taifeng B
Copy number variants (CNVs) are currently defined as unbalanced changes in genome
structure and include deletions, insertions, and duplications of over 50 bp in size
(Muñoz-Amatriaín et al., 2013). CNVs can influence gene expression at the transcriptional and
translational levels. We used FREEC software to detect CNVs in RGD-7S
and Taifeng B, and obtained 2727 and 2010 CNVs, respectively
(Table 6). A total of 819 CNVs were detected on Chr.9 in RGD-7S, and
surprisingly there were as many as 581 gains in the first 40 kb (positions 0 to 40,000) of
Chr.9. Relatively few CNVs were found on Chr.11 in both RGD-7S and Taifeng B, with
only 49 and 29 CNVs detected, respectively. In addition, we found that 126 and 119
segments were absent in the RGD-7S and Taifeng B genomes, respectively. In the RGD-7S genome,
738 CNVs overlapped intergenic regions and 1989 CNVs overlapped 218 genes,
including 119 transposable elements, 12 genes with alternative splicing sites and 8 resistance
genes. In the Taifeng B genome, 779 CNVs occurred in intergenic
regions and 1231 CNVs occurred in 175 genes, including 80 transposable elements, 21 genes
with alternative splicing sites and 5 resistance genes.
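The CNV counts above partition cleanly into intergenic and gene-overlapping classes, which can be verified directly from the reported numbers. The sketch below uses only figures stated in this section.

```python
cnv_counts = {
    "RGD-7S": {"total": 2727, "intergenic": 738, "genic": 1989},
    "Taifeng B": {"total": 2010, "intergenic": 779, "genic": 1231},
}


def genic_share(variety):
    """Fraction of CNVs that overlap annotated genes."""
    counts = cnv_counts[variety]
    return counts["genic"] / counts["total"]


for variety, counts in cnv_counts.items():
    consistent = counts["intergenic"] + counts["genic"] == counts["total"]
    print(variety, consistent, round(100 * genic_share(variety), 1))
```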
Analysis of SNPs and InDels in RGD-7S and Taifeng B
Based on the nucleotide substitutions, the polymorphic SNPs were divided into two classes,
namely, transitions (C/T and G/A) and transversions (C/A, G/C, A/T, T/G). The total
transitions detected were 1,727,621 and 1,745,022 and the transversions detected were
681,066 and 691,942 in RGD-7S and Taifeng B, respectively (Table 7). The ratio of
transitions to transversions (Ts/Tv) was about 2.5 in both varieties. The numbers of C/T
and G/A transitions were roughly equal, while G/C transversions were considerably fewer
than the other types of transversions (C/A, A/T, T/G).
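The Ts/Tv ratio of about 2.5 follows directly from the transition and transversion counts given above, as the short calculation below shows.

```python
transitions = {"RGD-7S": 1_727_621, "Taifeng B": 1_745_022}
transversions = {"RGD-7S": 681_066, "Taifeng B": 691_942}


def ts_tv_ratio(variety):
    """Ratio of transitions to transversions for one variety."""
    return transitions[variety] / transversions[variety]


for variety in transitions:
    print(variety, round(ts_tv_ratio(variety), 2))
```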
The total InDels detected were 349,895 (177,194 deletions and 172,701 insertions) in
RGD-7S and 352,336 (174,388 deletions and 177,948 insertions) in Taifeng B. The size of
deletions ranged from 1 to 43 bp in both RGD-7S and Taifeng B, and insertions were
up to 28 and 29 bp in RGD-7S and Taifeng B, respectively. About 91% of insertions and
deletions were in the range of 1-9 bp, while large insertions and deletions (≥10 bp), which
are convenient for gel-based genotyping in genetic mapping and marker-assisted selection,
accounted for only approximately 9% (Figure 1).
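The InDel totals above are the sums of the reported deletion and insertion counts, which can be confirmed as follows. The estimate of large InDels simply applies the approximate 9% share quoted in the text and is therefore only indicative.

```python
indels = {
    "RGD-7S": {"deletions": 177_194, "insertions": 172_701, "total": 349_895},
    "Taifeng B": {"deletions": 174_388, "insertions": 177_948, "total": 352_336},
}
LARGE_INDEL_SHARE = 0.09  # approximate share of InDels >= 10 bp, from the text


def approx_large_indels(variety):
    """Rough number of InDels >= 10 bp implied by the ~9% share."""
    return round(indels[variety]["total"] * LARGE_INDEL_SHARE)


for variety, counts in indels.items():
    consistent = counts["deletions"] + counts["insertions"] == counts["total"]
    print(variety, consistent, approx_large_indels(variety))
```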
Annotation of SNPs and InDels
The annotation of the Nipponbare genome (Release 7.0) was used to investigate the distribution
of SNPs and InDels in RGD-7S and Taifeng B. Similar distribution patterns of SNPs and
InDels were found in RGD-7S and Taifeng B (Figure 2). Interestingly, variants in the
upstream (32.8%) and downstream (30.4%) regions of genes were more frequent than those in
the intergenic regions (22.8%) in both lines. Excluding the heterozygous sites, 319,506 and
311,257 SNPs and 15,316 and 14,706 InDels were detected in the coding regions of RGD-7S
and Taifeng B, respectively. To minimize the false-positive rate, a rigorous filter
condition (coverage ≥10) was used. We detected 254,155 and 263,125 SNPs and 11,233 and
12,518 InDels in 39,641 and 39,338 gene-coding regions annotated in the RGD-7S and Taifeng B
genomes, respectively. GO analysis indicated that the variant genes were mainly distributed in
the terms cellular component (9208 and 9236), catalytic activity (6319 and 6341), binding
activity (6637 and 6658), cellular process (7976 and 7982) and metabolic process (8682 and
8675) in Taifeng B and RGD-7S, respectively (Figure 3). Further enrichment
analysis showed that the variant genes from Taifeng B and RGD-7S were grouped into
nearly the same categories, such as kinase activity, carbohydrate binding, cellular protein
modification process, nucleotide binding, signal transduction, hydrolase activity, and stress
response (Figure 4). The remaining variant genes from Taifeng B were enriched in the categories
of response to biotic stimulus and cell death, while those from RGD-7S were enriched in the
categories of hydrolase activity and response to extracellular stimulus. BLAST analysis
showed that 61 and 117 variant genes were specific to RGD-7S and Taifeng B
(Supplementary Table 1), respectively.
Validation of the partial sequencing results in the surrounding intervals of two hybrid
weakness genes Hw3 and Hw4
To validate the re-sequencing results and develop polymorphic markers for further fine
mapping of the hybrid weakness genes Hw3 and Hw4, we designed 20 and 36
InDel markers around the mapping intervals of Hw3 (Supplementary Table 2) and Hw4
(Supplementary Table 3), respectively, based on the re-sequencing results. PCR analysis indicated that 13
out of 20 markers showed polymorphism between Taifeng B and RGD-7S in the interval
surrounding Hw3, while one marker failed to amplify. Three of the 13 polymorphic
InDel markers (InDel1111, InDel1112 and InDel1113) were located within the mapping interval
(136 kb) of Hw3, which will help narrow down the candidate interval of Hw3. To further test the
sequencing analysis results, the candidate gene LOC_Os11g44310 for Hw3 was analyzed (Fu
et al., 2013). Read mapping indicated that high-quality reads from RGD-7S covered the
complete sequence of LOC_Os11g44310, while only about 20 reads from Taifeng B were
discontinuously anchored in the region of LOC_Os11g44310 (Figure 4), which was consistent
with our previous PCR amplification results. In addition, Hw4 was mapped
to an interval of about 15 cM on Chr.7 in our previous study. PCR assays showed that 24 out of 36
InDel markers displayed polymorphism between Taifeng B and RGD-7S in the 15 cM
interval of Hw4, while 3 markers failed to amplify.
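The marker validation results for the two intervals can be combined into an overall validation rate, as sketched below using only the counts reported above.

```python
markers = {
    "Hw3": {"designed": 20, "failed": 1, "polymorphic": 13},
    "Hw4": {"designed": 36, "failed": 3, "polymorphic": 24},
}

designed = sum(m["designed"] for m in markers.values())
amplified = sum(m["designed"] - m["failed"] for m in markers.values())
polymorphic = sum(m["polymorphic"] for m in markers.values())

# Validation rate among markers that amplified successfully.
validation_rate = polymorphic / amplified
print(designed, amplified, polymorphic, round(100 * validation_rate, 1))
```

With these counts, 37 of the 52 amplified markers were polymorphic, which corresponds to the validation rate of about 71% discussed later in the text.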
Discussion
In this study, we comprehensively studied the genome diversity of two indica rice varieties
RGD-7S and Taifeng B, and obtained a global view of genomic variation of both varieties
including SNPs, InDels and CNVs.
Reads mapping and detection of genome-wide DNA polymorphism
NGS technology enabled us to produce massive sequence output and made high-throughput
DNA marker discovery feasible and cost-effective. The whole genomes of two indica rice
varieties were re-sequenced and mapped to Nipponbare as a reference genome to uncover
genome-wide DNA variants for further understanding the genetic basis of hybrid weakness
and to develop polymorphic molecular markers between them for map-based cloning of
hybrid weakness genes. The mean depth of coverage was 27.83× and 28.72× for RGD-7S and
Taifeng B, respectively. About 73% of the high-quality data obtained from the two varieties
could be mapped to the Nipponbare reference genome, and the mapped reads covered about
86% of the reference genome, which was similar to the coverage reported in previous
studies (Yamamoto et al., 2010; Subbaiyan et al., 2012; Jain et al., 2014). This indicated
large genetic differences between the two indica rice varieties and Nipponbare and
also reflected the inherent genomic differences accumulated through genetic differentiation
between the indica and japonica subspecies. In addition, coverage bias has been observed with
libraries prepared using both enzymatic and physical shearing, although Illumina
sequencing with libraries prepared without amplification led to the least biased
coverage (Quail et al., 2013). Recently, Schatz et al. (2014) assembled three rice genome
sequences, Nipponbare, the indica rice IR64 and the aus rice DJ123, using Illumina
HiSeq 2000 instruments, and found that the mapped reads from Nipponbare covered only
91.2% of the reference Nipponbare genome (IRGSP 1.0) and that the
unassembled/unaligned regions were highly enriched for high-copy repeats too complex to be
assembled. The mapped reads from the other two rice varieties covered about 88% of the
reference genome. In RGD-7S and Taifeng B, 91.61% and 89.92% of total bases had
quality scores of at least 30 (Q30), which was much higher than the rate of 74% reported by Minoche
(2012) using HiSeq 2000 PE100.
In the present study, we identified a total of 2,758,740 polymorphic sites, including 2,408,845
SNPs and 349,895 InDels, between the two indica rice varieties and the Nipponbare genome, and
discovered 961,791 SNPs and 46,640 InDels between the two indica rice varieties.
Polymorphism analyses of genome sequences among different rice lines have been reported in
previous studies (Feltus et al., 2004; Subbaiyan et al., 2012; Jain et al., 2014). Apart from the
genetic background of the rice lines themselves, different filtering conditions may result in
significant differences in the density of DNA polymorphisms. The average density of DNA
polymorphisms between RGD-7S and Taifeng B was 256.8 SNPs and 12.5 InDels per 100 kb,
which will facilitate marker development and accelerate the cloning of hybrid weakness
genes.
The nonrandom distribution of SNPs and the base substitutions
Similar to previous reports, we found an uneven distribution of SNPs and
InDels on the chromosomes. The investigation of the number of SNPs and InDels in the two
indica rice varieties showed that the highest variant frequency occurred on Chr.10. The DNA
polymorphisms were mainly distributed in the intergenic, upstream and downstream regions and
the introns of genes. This is probably because these regions are subject to relaxed selection
pressure during evolution, and changes in them may influence gene expression but not gene
function. Our study indicated that the proportions of genic SNPs were about 11.5%, 37.9% and
50.6% in UTRs, coding regions (including exons, splice sites and start codons) and intronic
regions, respectively. Compared with McNally’s report (2009), the intronic regions harbored
more SNPs in our study. That report found that approximately 2.7% of rice genes contained
large-effect SNPs. In contrast, there were 8,205 and 7,950 genes with large-effect SNPs in
RGD-7S and Taifeng B, respectively, which was about 20% of the total number of annotated
rice genes. RGD-7S and Taifeng B are indica rice lines, whereas the genome of the japonica
rice line Nipponbare was used as the reference, which may bias the analysis towards detecting
more large-effect SNPs. The identification of large-effect SNPs also depends on the
annotation of gene models (Zheng et al., 2011). It seems possible that
large-effect SNPs lie within alternative transcripts that have not yet been described or
within transcripts that have been incorrectly annotated as coding. Transcripts carrying
premature termination codons are usually degraded through nonsense-mediated decay, and
naturally occurring stop codon-gain variants were found to be generally not expressed unless
they have secondary annotations in or near other transcripts (Cirulli et al., 2011). Cao et al.
(2011) found 6,197 genes with 12,468 SNPs that caused large effects on gene structure in
Arabidopsis. Similarly, Tan et al. (2012) detected 7,602 genes with large-effect SNPs in
Arabidopsis.
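The genic SNP proportions quoted at the start of this subsection (about 11.5% in UTRs, 37.9% in coding regions and 50.6% in introns) can be reproduced approximately from the percentages given in the Results for Table 4. The sketch below assumes that "genic" means the sum of the intron, exon, splice site, start codon and UTR classes, and it uses the rounded percentages from the text, so small deviations are expected.

```python
# Percentages of all SNPs per annotation class, as quoted in the Results.
table4_percent = {
    "intron": 7.0,
    "exon": 5.0,
    "splice_site": 0.24,
    "start_codon": 0.01,
    "utr3": 0.97,
    "utr5": 0.64,
}
genic_total = sum(table4_percent.values())


def genic_share(*classes):
    """Share (%) of genic SNPs that fall in the given classes."""
    return 100.0 * sum(table4_percent[c] for c in classes) / genic_total


utr_share = genic_share("utr3", "utr5")
coding_share = genic_share("exon", "splice_site", "start_codon")
intron_share = genic_share("intron")
print(round(utr_share, 1), round(coding_share, 1), round(intron_share, 1))
```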
As for base substitutions, the ratio of transitions to transversions (Ts/Tv) in both rice
varieties was about 2.5 in this study, which indicates a considerable bias in favor of
transitions over transversions, similar to the findings of previous reports. For instance,
Subbaiyan et al. (2012) reported a Ts/Tv ratio of 2.0 in six elite rice lines. In the recently completed 3000 rice
genome project, a Ts/Tv ratio of 2.3 was found for a total of 20 million rice SNPs
(Alexandrov et al., 2015). Vignal et al. (2002) reported that the Ts/Tv ratio ranged
from 2.3 to 4.0 in the chicken genome. Transitions are more likely to maintain the structure of
the DNA double helix than transversions due to their conformational advantage in case of
mispairing and are better tolerated during natural selection, which contributes to a higher
frequency of transitions than transversions (Wakeley, 1996).
Copy number variations
CNVs can create new genes, alter gene dosage and reshape gene structures. They are
considered likely major sources of genetic variation, and may influence phenotypic variation,
gene expression and fitness (Yu et al., 2011). We detected a total of 2727 and 2010 CNVs in
the RGD-7S and Taifeng B genomes, respectively. Apart from those overlapping
intergenic regions, over half of the CNVs were annotated within genes. Most of these
annotated genes were transposable elements, while the rest included genes with
alternative splicing sites and resistance genes with the NBS-LRR structure. Similar to
previous reports, we also found that the CNVs tended to be located near the ends of the
chromosomes. Surprisingly, we detected 581 CNVs in the first 40 kb (positions 0 to 40,000)
of Chr.9. Further analysis indicated that these CNVs overlapped the region
between two adjacent retrotransposons. Transposable elements are considered to play
an important role in the formation of CNVs (Hastings et al., 2009; Conrad et al., 2010). Yu et
al. (2011) also found that genes in many CNVs were involved in resistance in rice, and resistance
genes in plants tend to cluster at the same loci within the genome (Huibert et al., 2001).
Glessne et al. (2013) observed significant enrichment of alternatively spliced genes
among those impacted by CNVs in humans.
Although the phenotypic effects of CNVs in plants have not been confirmed directly, recent
studies in maize showed their potential contribution to heterosis, domestication and
disease responses in this crop (Springer et al., 2009; Lai et al., 2010). Gene-containing
CNVs may influence the expression level of these genes. Hence, CNVs have the potential to
ultimately affect the downstream phenotype and reproductive fitness (Perry, 2008). We
previously fine mapped the hybrid weakness gene Hw3 and proposed
LOC_Os11g44310, which encodes a putative calmodulin-binding protein, as its candidate gene;
BLAST analysis indicated that LOC_Os11g44310 existed only in the japonica genome (Fu et al.,
2013). In this study, read mapping confirmed its presence in RGD-7S and absence in Taifeng
B. In most cases of hybrid weakness or necrosis, the causative genes existed in only one of
the parents. The potential combinations of single-copy genes with CNVs (presence or absence)
in hybrids also provide the opportunity for novel gene complements in hybrids relative to the
parental lines (Springer et al., 2009). The type of hybrid weakness we reported occurred only
under low-temperature conditions. The same F1 progeny showed hybrid vigor when
planted under high-temperature conditions. We also analyzed the transcriptional
profiles of both parental varieties and their F1 progeny grown under high- and
low-temperature conditions (unpublished data), and found that certain pathways
were specifically activated at low temperatures compared with those at high temperatures.
Therefore, understanding gene expression regulated by the single-copy gene containing CNVs
in F1 progeny under different conditions might help to bridge hybrid
weakness and vigor and accelerate the elucidation of the molecular mechanism of heterosis.
InDels as molecular markers for fine mapping hybrid weakness genes
In contrast to SNPs, which have been studied extensively, InDels have received less attention.
Although small InDels up to a few base pairs in length may be called by sensitive alignment
tools in routine re-sequencing (Grimm et al., 2013), InDels were not evaluated in
most validation studies, and where they were, only a small subset was examined (Mullaney et
al., 2010). We detected a large number of InDels in RGD-7S and Taifeng B, which lays a solid
foundation for map-based cloning of hybrid weakness genes. Based on the sequencing results, we
designed 20 and 36 InDel markers in the previous mapping intervals of Hw3 and Hw4,
respectively, to test the authenticity of these InDels and to develop polymorphic markers for
further map-based cloning of hybrid weakness genes. Excluding 4 markers that failed in PCR
amplification, 37 of the 52 successfully amplified InDel markers showed polymorphism
between RGD-7S and Taifeng B. Thus, approximately 71% of the InDels could be validated, which is
similar to a previous study in which the false negative rate ranged from 10% to 35% (Mullaney et
al., 2010). Currently, paired-end reads are widely used to detect InDels. However, the output files
generated by the relevant software are mostly lists of potential variants with probability
scores and coverage values, for which the exact level of accuracy is unknown (Pelleymounter et
al., 2011). We also found that the authenticity of these InDels was not directly associated with
coverage values. In general, the sensitivity of detection for InDels of a given size is not
overly dependent on the InDel frequency itself, and as InDel size increases, the sensitivities
differ among mapping tools in datasets with a mean sequencing depth of 18 and 36-bp
short reads (Krawitz et al., 2010). Hence, the detection and validation of InDels remain a
challenge, and bioinformatic algorithms and analytical tools need to be improved to provide
more satisfactory solutions.
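The validation rate quoted above can be re-derived from the marker outcomes listed in Supplement tables 2 and 3. The following is a minimal Python sketch and not part of the original analysis; the tallies were counted by hand from those two tables, and the names `outcomes` and `validation_rate` are illustrative assumptions.

```python
# Marker outcomes tallied by hand from Supplement table 2 (Hw3 region)
# and Supplement table 3 (Hw4 interval); names here are illustrative.
outcomes = {
    "Hw3": {"positive": 12, "negative": 7, "failure": 1},
    "Hw4": {"positive": 25, "negative": 8, "failure": 3},
}


def validation_rate(tallies):
    """Return (polymorphic, amplified, rate), ignoring PCR failures."""
    positive = sum(t["positive"] for t in tallies.values())
    negative = sum(t["negative"] for t in tallies.values())
    amplified = positive + negative  # markers that amplified successfully
    return positive, amplified, positive / amplified


positive, amplified, rate = validation_rate(outcomes)
print(f"{positive}/{amplified} amplified markers were polymorphic ({rate:.1%})")
```

Markers that failed in PCR are excluded from the denominator, matching the way the rate is stated in the text.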
Acknowledgements
This work was supported in part by grants from the Guangdong Provincial Natural Science
Fund (2014A030313573), the National ‘863’ project (2011AA10A101), the earmarked fund
for Modern Agro-industry Technology Research System (CARS-01-10), and the President
Fund of Guangdong Academy of Agricultural Sciences (201402).
References
Alexandrov, N., Tai, S., Wang, W., Mansueto, L., Palis, K., Fuentes, R.R., Ulat, V.J.,
Chebotarov, D., Zhang, G., Li, Z., Mauleon, R., Hamilton, R.S., and McNally, K.L. 2015.
SNP-Seek database of SNPs derived from 3000 rice genomes. Nucleic Acids Res.
43:1023-1027. doi: 10.1093/nar/gku1039.
Arai-Kichise, Y., Shiwa, Y., Nagasaki, H., Ebana, K., Yoshikawa, H., Yano, M., and Wakasa,
K. 2011. Discovery of genome-wide DNA polymorphisms in a landrace cultivar of Japonica
rice by whole-genome sequencing. Plant Cell Physiol. 52: 274–282. doi: 10.1093/pcp/pcr003.
Boeva, V., Popova, T., Bleakley, K., Chiche, P., Cappo, J., Schleiermacher, G.,
Janoueix-Lerosey, I., Delattre, O., and Barillot, E. 2012. Control-FREEC: a tool for assessing
copy number and allelic content using next-generation sequencing data. Bioinformatics.
28: 423–425. doi: 10.1093/bioinformatics/btr670.
Cao, J., Schneeberger, K., Ossowski, S., Gunther, T., Bender, S., Fitz, J., Koenig, D., Lanz, C.,
Stegle, O., and Lippert, C. 2011. Whole-genome sequencing of multiple Arabidopsis thaliana
populations. Nat. Genet. 43(10): 956–963. doi: 10.1038/ng.911.
Chen, C., Chen, H., Lin, Y.S., Shen, J.B., Shan, J.X., Qi, P., Shi, M., Zhu, M.Z., Huang, X.H.,
Feng, Q., Han, B., Jiang, L.W., Gao, J.P., and Lin, H.X. 2014. A two-locus interaction causes
interspecific hybrid weakness in rice. Nat. Commun. 5:3357. doi: 10.1038/ncomms4357.
Chen, C., Chen, H., Shan, J.X., Zhu, M.Z., Shi, M., Gao, J.P., and Lin, H.X. 2013. Genetic
and physiological analysis of a novel type of interspecific hybrid weakness in rice. Mol. Plant
6: 716–728. doi: 10.1093/mp/sss146.
Cheng, S.H., Cao, L.Y., Zhuang, J.Y., Chen, S.G., Zhan, X.D., Fan, Y.Y., Zhu, D.F., and Min,
S.K. 2007. Super hybrid rice breeding in China: achievements and prospects. J. Integr. Plant
Biol. 49 (6): 805−810. doi: 10.1111/j.1744-7909.2007.00514.x.
Cirulli, E.T., Heinzen, E.L., Dietrich, F.S., Shianna, K.V., Singh, A., Maia, J.M., Goedert, J.J.,
and Goldstein, D.B. 2011. A whole-genome analysis of premature termination codons. Genomics.
98:337–342. doi:10.1016/j.ygeno.2011.07.001.
Conrad, D.F., Pinto, D., Redon, R., Feuk, L., Gokcumen, O., Zhang, Y., Aerts, J., Andrews,
T.D., Barnes, C., Campbell, P., and Fitzgerald, T. 2010. Origins and functional impact of
copy number variation in the human genome. Nature. 464:704-712. doi: 10.1038/nature08516.
Feltus, A., Wan, J., Schulze, S.R., Estill, J.C., Jiang, N., and Paterson, A.H. 2004. An SNP
resource for rice genetics and breeding based on subspecies indica and japonica genome
alignments. Genome Res. 14: 1812–1819. doi:10.1007/s00122-011-1633-5.
Fu, C.Y., Wang, F., Sun, B.R., Liu, W.G., Li, J.H., Deng, R.F., Liu, D.L., and Liu, Z.R.
2013. Genetic and cytological analysis of a novel type of low temperature-dependent
intrasubspecific hybrid weakness in rice. PLoS ONE. 8(8): e73886.
doi:10.1371/journal.pone.0073886.
Glessner, J.T., Smith, A.V., Panossian, S., Kim, C.E., Takahashi, N., Thomas, K.A., Wang, F.,
Seidler, K., Harris, T.B., and Launer, L.J. 2013. Copy number variations in alternative
splicing gene networks impact lifespan. PLoS ONE. 8(1): e53846. doi:
10.1371/journal.pone.0053846.
Grimm, D., Hagmann, J., Koenig, D., Weigel, D., and Borgwardt, K. 2013. Accurate indel
prediction using paired-end short reads. BMC Genomics. 14: 132. doi:
10.1186/1471-2164-14-132.
Han, B. and Huang, X.H. 2013. Sequencing-based genome-wide association study in rice.
Curr. Opin. Plant Biol. 16: 133–138. doi: 10.1016/j.pbi.2013.03.006.
Hastings, P.J., Lupski, J.R., Rosenberg, S.M., and Ira, G. 2009. Mechanisms of change in
gene copy number. Nature Rev. Genet. 10: 551-564. doi: 10.1038/nrg2593.
Huang, X., Lu, T., and Han, B. 2013. Resequencing rice genomes: an emerging new era of
rice genomics. Trends Genet. 29: 225–232. doi: 10.1016/j.tig.2012.12.001.
Hulbert, S.H., Webb, C.A., Smith, S.M., and Sun, Q. 2001. Resistance gene complexes:
evolution and utilization. Annu. Rev. Phytopathol. 39: 285-312. doi:
10.1146/annurev.phyto.39.1.285.
Ichitani, K., Namigoshi, K., Sato, M., Taura, S., Aoki, M., Matsumoto, Y., Saitou, T.,
Marubashi, W., and Kuboyama, T. 2007. Fine mapping and allelic dosage effect of Hwc1, a
complementary hybrid weakness gene in rice. Theor. Appl. Genet. 114: 1407–1415. doi:
10.1007/s00122-007-0526-0.
Ichitani, K., Taura, S., Tezuka, T., Okiyama, Y., and Kuboyama, T. 2011. Chromosomal
location of HWA1 and HWA2, complementary hybrid weakness genes in rice. Rice. 4(2):
29–38. doi: 10.1007/s00122-007-0553-x.
Jain, M., Moharana, K.C., Shankar, R., Kumari, R., and Garg, R. 2014. Genome wide
discovery of DNA polymorphisms in rice cultivars with contrasting drought and salinity stress
response and their functional relevance. Plant Biotechnol. J. 12: 253–264. doi:
10.1093/pcp/pcr003.
Krawitz, P., Rodelsperger, C., Jager, M., Jostins, L., Bauer, S., and Robinson, P.N. 2010.
Microindel detection in short-read sequence data. Bioinformatics. 26(6):722–729. doi:
10.1093/bioinformatics/btq027.
Kuboyama, T., Matsumoto, T., Wu, J.Z., Kanamori, H., Taura, S., Sato, M., Marubashi, W.,
and Ichitani, K. 2009. Fine mapping of HWC2, a complementary hybrid weakness gene, and
haplotype analysis around the locus in rice. Rice. 2: 93–103. doi:10.1007/s00122-007-0526-0.
Lai, J., Li, R., Xu, X., Jin, W., Xu, M., Zhao, H., Xiang, Z., Song, W., Ying, K., and Zhang,
M. 2010. Genome-wide patterns of genetic variation among elite maize inbred lines. Nat.
Genet. 42: 1027-1030. doi: 10.1038/ng.684.
McCouch, S.R., Zhao, K., Wright, M., Tung, C.W., Ebana, K., Thomson, M., Reynolds, A.,
Wang, D., DeClerck, G., and Ali, M.L. 2010. Development of genome-wide SNP assays for
rice. Breed. Sci. 60: 524–535. doi: 10.1270/jsbbs.60.524.
McNally, K.L., Childs, K.L., Bohnert, R., Davidson, R.M., Zhao, K., Ulat, V.J., Zeller, G.,
Clark, R.M., Hoen, D.R., and Bureau, T.E. 2009. Genomewide SNP variation reveals
relationships among landraces and modern varieties of rice. Proc. Natl. Acad. Sci. 106:
12273-12278. doi: 10.1073/pnas.0900992106.
Minoche, A.E., Dohm, J.C., and Himmelbauer, H. 2012. Evaluation of genomic
high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems.
Genome Biol. 12: R112. doi: 10.1186/gb-2011-12-11-r112.
Mullaney, J.M., Mills, R.E., Stephen, P.W., and Devine, S.E. 2010. Small insertions and
deletions (INDELs) in human genomes. Hum. Mol. Genet. 19(2): 131–136. doi:
10.1093/hmg/ddq400.
Muñoz-Amatriaín, M., Eichten, S.R., Wicker, T., Richmond, T.A., Mascher, M., Steuernagel,
B., Scholz, U., Ariyadasa, R., Spannagl, M., and Nussbaumer, T. 2013. Distribution,
functional impact, and origin mechanisms of copy number variation in the barley genome.
Genome Biol. 14: R58. doi: 10.1186/gb-2013-14-6-r58.
Pelleymounter, L.L., Moon, I., Johnson, J.A., Laederach, A., Halvorsen, M., Eckloff, B., Abo,
R., and Rossetti, S. 2011. A novel application of pattern recognition for accurate SNP and
indel discovery from high-throughput data: Targeted resequencing of the glucocorticoid
receptor co-chaperone FKBP5 in a Caucasian population. Mol. Genet. Metab. 104: 457–469.
doi: 10.1016/j.ymgme.2011.08.019.
Perry, G.H. 2008. The evolutionary significance of copy number variation in the human
genome. Cytogenet. Genome Res. 123: 283–287. doi: 10.1159/000184719.
Quail, M.A., Smith, M., Coupland, P., Otto, T.D., Harris, S.R., Connor, T.R., Bertoni, A.,
Swerdlow, H.P., and Gu, Y. 2012. A tale of three next generation sequencing platforms:
comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC
Genomics. 13: 341. doi: 10.1186/1471-2164-13-341.
Schatz, M.C., Maron, L.G., Stein, J.C., Hernandez, W.A., Gurtowski, J., Biggers, E., Lee, H.,
Kramer, M., Antoniou, E., and Ghiban, E. 2014. New whole genome de novo assemblies of
three divergent strains of rice (O. sativa) documents novel gene space of aus and indica.
Genome Biol. 15: 506. doi:10.1186/s13059-014-0506-z.
Springer, N.M., Ying, K., Fu, Y., Ji, T., Yeh, C.T., Jia, Y., Wu, W., Richmond, T., Kitzman,
J., and Rosenbaum, H. 2009. Maize inbreds exhibit high levels of copy number variation and
presence/absence variation (PAV) in genome content. PLoS Genet. 5(11): e1000734. doi:
10.1371/journal.pgen.1000734.
Subbaiyan, G.K., Waters, D.L., Katiyar, S.K., Sadananda, A.R., Vaddadi, S., Jain, M.,
Moharana, K.C., Shankar, R., Kumari, R., and Garg, R. 2012. Genome-wide DNA
polymorphisms in elite indica rice inbreds discovered by whole-genome sequencing. Plant
Biotechnol. J. 10: 623-634. doi: 10.1111/j.1467-7652.2011.00676.x.
Tan, S.J., Zhong, Y., Hou, H., Yang, S.H., and Tian, D.C. 2012. Variation of
presence/absence genes among Arabidopsis populations. BMC Evol. Biol. 12:86. doi:
10.1186/1471-2148-12-86.
Vignal, A., Milan, D., Cristobal, M.S., and Eggen, A. 2002. A review on SNP and other types
of molecular markers and their use in animal genetics. Genet. Sel. Evol. 34: 275–305.
doi:10.1186/1297-9686-34-3-275.
Wakeley, J. 1996. The excess of transitions among nucleotide substitutions: new methods of
estimating transition bias underscore its significance. Trends Ecol. Evol. 11: 158–162.
doi:10.1016/0169-5347(96)10009-4.
Yamamoto, T., Nagasaki, H., Yonemaru, J.I., Ebana, K., Nakajima, M., Shibaya, T., and
Yano, M. 2010. Fine definition of the pedigree haplotypes of closely related rice cultivars by
means of genome-wide discovery of single-nucleotide polymorphisms. BMC Genomics. 11:
267. doi: 10.1186/1471-2164-11-267.
Yu, P., Wang, C.H., Xu, Q., Feng, Y., Yuan, X.P., Yu, H.Y., Wang, Y.P., Tang, S.X., and
Wei, X.H. 2011. Detection of copy number variations in rice using array-based comparative
genomic hybridization. BMC Genomics. 12: 372. doi: 10.1186/1471-2164-12-372.
Zheng, L.Y., Guo, X.S., He, B., Sun, L.J., Peng, Y., Dong, S.S., and Liu, T.F. 2011.
Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor).
Genome Biol. 12: R114. doi: 10.1186/gb-2011-12-11-r114.
Table 1 Coverage of the reads from resequencing of two indica rice varieties using HiSeq
2000 to the Nipponbare genome
                                  RGD-7S       Taifeng B
Site (Ref. length)           374,305,986     374,305,986
Total bases               14,973,227,780  15,457,502,782
Read count                   148,249,780     153,044,582
Mapped site (>=1X)           320,035,665     324,501,839
Coverage (>=1X)                   85.50%          86.69%
Q30 (%)                            91.61           89.92
Mapped bases rate                 73.34%          73.04%
GC (%)                             41.51           41.48
Mean depth                       27.83 X         28.72 X
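Two of the quantities in Table 1 follow directly from the other rows: the >=1X coverage is the number of mapped sites divided by the reference length, and the read length is the total bases divided by the read count. The Python sketch below recomputes them as a consistency check; it is an illustration only, and the names `samples` and `summarize` are assumptions rather than part of the original pipeline.

```python
# Values copied from Table 1; the reference length is the Nipponbare genome size used there.
REF_LENGTH = 374_305_986

samples = {
    "RGD-7S": {
        "total_bases": 14_973_227_780,
        "reads": 148_249_780,
        "mapped_sites": 320_035_665,
    },
    "Taifeng B": {
        "total_bases": 15_457_502_782,
        "reads": 153_044_582,
        "mapped_sites": 324_501_839,
    },
}


def summarize(stats, ref_length=REF_LENGTH):
    """Derive >=1X coverage (%) and mean read length (bp) from raw counts."""
    return {
        "coverage_pct": 100 * stats["mapped_sites"] / ref_length,
        "read_length": stats["total_bases"] / stats["reads"],
    }


for name, stats in samples.items():
    s = summarize(stats)
    print(f"{name}: coverage {s['coverage_pct']:.2f}%, read length {s['read_length']:.0f} bp")
```

The reported mean depth is not recomputed here, because it depends on how mapped bases were counted, which the table does not specify.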
Table 2 The distribution of SNPs and InDels in RGD-7S and Taifeng B relative to
Nipponbare
      RGD-7S                                       Taifeng B
Chr.       SNPs  Insertions  Deletions  Frequency       SNPs  Insertions  Deletions  Frequency
1        153152       24829      20733      459.8     227939       25220      21251      634.2
2        173186       20163      17455      586.6     195996       20134      17437      649.9
3        190535       20942      17467      628.7     194974       21103      17530      641.5
4        143323       13741      12053      476.3     147269       14052      12186      488.7
5        118289       12783      10719      473.3     126019       13348      11578      503.9
6        152214       14149      12369        572     160527       15174      12812      603.3
7        141106       13016      11733      558.5     138495       12655      11391      547.3
8        138697       13580      11469      575.7     135126       13194      11278      561.1
9        116863       10980       9674      597.6      83191        8635       7468      431.5
10       133108       11830      10536      675.2     133316       12006      10741      677.7
11       158935       13733      12278      637.3     154184       13348      11778      617.9
12       125148       11429      10002      532.4     124608       11364       9670        529
Total   1744556      181175     156488     564.45    1821644      180233     155120     573.83
Note: SNPs, Insertions, and Deletions are counts. Frequency is the variant frequency, i.e., the number of DNA polymorphisms per 100 kb.
Table 3 The distribution and frequency of DNA polymorphism between RGD-7S and Taifeng
B
Chr.        SNPs  Variant frequency   InDels  Variant frequency
Chr.1     103424                239     5922               13.7
Chr.2      94899              264.1     5068               14.1
Chr.3      73419              201.6     4498               12.4
Chr.4      69714              196.4     3189                  9
Chr.5      74272                248     3577               11.3
Chr.6      84362                270     3963               12.7
Chr.7      75459              254.1     3523               11.9
Chr.8      68249                240     3313               11.6
Chr.9      62397              271.1     3178               13.8
Chr.10     59561              256.6     2823               12.2
Chr.11     84408              290.9     3751               12.9
Chr.12    111627              405.4     3835               13.9
Total     961791              257.7    46640               12.5
Table 4 The number and ratio of variants in different regions
               RGD-7S                    Taifeng B
Region             Count  Percent (%)        Count  Percent (%)
DOWNSTREAM     2,466,715        30.44    2,492,284        30.43
EXON             408,077         5.04      412,706         5.04
INTERGENIC     1,846,976        22.79    1,868,258        22.81
INTRON           573,136         7.07      577,108         7.05
SPLICE SITE       19,295         0.24       19,455         0.24
TRANSCRIPT           851         0.01          832         0.01
START CODON    2,658,060        32.80    2,689,545        32.84
UTR_3_PRIME       78,674         0.97       78,496         0.96
UTR_5_PRIME       51,708         0.64       51,740         0.63
Table 5 The distribution of large-effect SNPs in genic regions
                 RGD-7S                            Taifeng B
                   No.  No. of annotated genes       No.  No. of annotated genes
Splicing site*     909                     863       847                     804
Stop gain         6141                    4496      5987                    4358
Stop loss         1135                    1109      1132                    1118
Start gain        1430                    1156      1383                    1125
Start loss         598                     581       558                     545
Total            10213                    8205      9907                    7950
*: including splicing site acceptor and donor
Table 6 The distribution of CNVs in RGD-7S and Taifeng B
Chr. 1 2 3 4 5 6 7 8 9 10 11 12 Total
RGD-7S 217 153 129 243 159 167 153 352 819 228 49 58 2727
Taifeng B 130 116 126 238 214 149 230 133 255 236 29 154 2010
Table 7 The number and ratio of base substitutions
Substitution                 RGD-7S   Taifeng B
Transitions (Ts)     A/G     862380      870351
                     C/T     865241      874671
Transversions (Tv)   C/G     129727      132246
                     T/A     188201      191670
                     A/C     182417      184581
                     G/T     180721      183445
Ts/Tv ratio                  2.5366      2.5219
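The Ts/Tv ratios in Table 7 are the summed transition counts (A/G and C/T) divided by the summed transversion counts (C/G, T/A, A/C and G/T). As a hedged illustration, the short Python sketch below recomputes the ratios from the tabulated counts; the names `counts` and `ts_tv_ratio` are assumptions introduced for this example and do not describe the software used in the study.

```python
# Substitution counts copied from Table 7.
counts = {
    "RGD-7S": {"A/G": 862380, "C/T": 865241,
               "C/G": 129727, "T/A": 188201, "A/C": 182417, "G/T": 180721},
    "Taifeng B": {"A/G": 870351, "C/T": 874671,
                  "C/G": 132246, "T/A": 191670, "A/C": 184581, "G/T": 183445},
}

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions;
# every other substitution class is a transversion.
TRANSITIONS = {"A/G", "C/T"}


def ts_tv_ratio(sample_counts):
    """Return the transition/transversion ratio for one sample."""
    ts = sum(n for kind, n in sample_counts.items() if kind in TRANSITIONS)
    tv = sum(n for kind, n in sample_counts.items() if kind not in TRANSITIONS)
    return ts / tv


for name, sample_counts in counts.items():
    print(f"{name}: Ts/Tv = {ts_tv_ratio(sample_counts):.4f}")
```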
Figure captions:
Figure 1 The distribution of different InDel sizes in RGD-7S and Taifeng B
Figure 2 The ratio of DNA variants in different regions of genes in RGD-7S and Taifeng B
Figure 3 The distribution of enriched GO terms for genes containing DNA variations in
RGD-7S and Taifeng B
Figure 4 The distribution of enrichment of variant genes in RGD-7S and Taifeng B
Blue: variant genes in a functional term; Red: the total of background genes in a functional
term. The X-axis represents the percentage of variant genes among the total genes enriched in a
given term. A blue band longer than the corresponding red band indicates significant
enrichment of variant genes in that term.
Figure 5 Read mapping from RGD-7S and Taifeng B in the interval of the Hw3 candidate gene
(LOC_Os11g44310)
Approximately 20 reads from Taifeng B were mapped discontinuously in the interval of
LOC_Os11g44310 based on the reference genome, whereas the reads from RGD-7S completely
covered the candidate region.
Supplement table 1 The specific variant genes in RGD-7S and Taifeng B
RGD-7S Taifeng B
LOC_Os01g25130 LOC_Os07g03319 LOC_Os01g01019 LOC_Os04g16794 LOC_Os07g02820 LOC_Os11g10610
LOC_Os01g46600 LOC_Os07g03409 LOC_Os01g01030 LOC_Os04g16798 LOC_Os07g02900 LOC_Os11g14490
LOC_Os01g57324 LOC_Os07g03418 LOC_Os01g01302 LOC_Os04g26740 LOC_Os07g02910 LOC_Os11g15300
LOC_Os01g71280 LOC_Os07g41510 LOC_Os01g01307 LOC_Os04g26750 LOC_Os07g02920 LOC_Os11g15730
LOC_Os01g73130 LOC_Os07g49540 LOC_Os01g11820 LOC_Os04g30750 LOC_Os07g03368 LOC_Os11g15740
LOC_Os01g73560 LOC_Os08g14460 LOC_Os01g14850 LOC_Os04g43440 LOC_Os07g03377 LOC_Os11g27730
LOC_Os01g73570 LOC_Os08g15258 LOC_Os01g16340 LOC_Os05g09340 LOC_Os07g04400 LOC_Os11g35820
LOC_Os02g03320 LOC_Os08g15264 LOC_Os01g36670 LOC_Os05g16230 LOC_Os07g04410 LOC_Os11g40900
LOC_Os02g06760 LOC_Os08g15266 LOC_Os01g57294 LOC_Os05g16290 LOC_Os07g05110 LOC_Os11g40910
LOC_Os02g06770 LOC_Os08g15268 LOC_Os01g59240 LOC_Os05g16300 LOC_Os07g05260 LOC_Os11g40920
LOC_Os02g15460 LOC_Os08g15272 LOC_Os02g03420 LOC_Os05g18270 LOC_Os07g05310 LOC_Os11g46060
LOC_Os03g59220 LOC_Os08g15276 LOC_Os02g03790 LOC_Os05g18780 LOC_Os07g06300 LOC_Os11g46150
LOC_Os04g03510 LOC_Os08g15278 LOC_Os02g04100 LOC_Os05g19430 LOC_Os07g06315 LOC_Os12g03060
LOC_Os04g03520 LOC_Os08g20340 LOC_Os02g04325 LOC_Os05g22980 LOC_Os07g06325 LOC_Os12g03070
LOC_Os04g16848 LOC_Os09g37490 LOC_Os02g16360 LOC_Os05g23430 LOC_Os07g06335 LOC_Os12g10250
LOC_Os04g16854 LOC_Os09g37495 LOC_Os02g37750 LOC_Os05g30670 LOC_Os07g09560 LOC_Os12g11260
LOC_Os04g24970 LOC_Os09g37500 LOC_Os03g01170 LOC_Os06g10290 LOC_Os07g09570 LOC_Os12g11840
LOC_Os04g25020 LOC_Os10g01008 LOC_Os03g04550 LOC_Os06g17160 LOC_Os07g24270 LOC_Os12g12120
LOC_Os04g35590 LOC_Os10g14250 LOC_Os03g04580 LOC_Os06g17170 LOC_Os08g07400 LOC_Os12g12130
LOC_Os04g35690 LOC_Os10g14260 LOC_Os03g16470 LOC_Os06g34880 LOC_Os08g07410 LOC_Os12g17400
LOC_Os04g40940 LOC_Os10g14290 LOC_Os03g17220 LOC_Os06g34890 LOC_Os08g08540 LOC_Os12g28590
LOC_Os04g50212 LOC_Os11g44310 LOC_Os03g17230 LOC_Os06g34970 LOC_Os08g08750 LOC_Os12g30270
LOC_Os04g50216 LOC_Os12g11230 LOC_Os03g17240 LOC_Os06g37170 LOC_Os08g38300 LOC_Os12g33870
LOC_Os04g54270 LOC_Os12g11240 LOC_Os03g17250 LOC_Os06g45700 LOC_Os08g40130 LOC_Os12g34056
LOC_Os05g18860 LOC_Os12g11250 LOC_Os03g17330 LOC_Os06g45710 LOC_Os08g40140
LOC_Os06g10700 LOC_Os12g34018 LOC_Os03g17340 LOC_Os07g01550 LOC_Os10g03100
LOC_Os06g15950 LOC_Os12g34064 LOC_Os03g25410 LOC_Os07g01720 LOC_Os10g04630
LOC_Os06g16530 LOC_Os12g34148 LOC_Os04g11540 LOC_Os07g02290 LOC_Os10g04640
LOC_Os06g37270 LOC_Os12g34154 LOC_Os04g14240 LOC_Os07g02640 LOC_Os10g04660
LOC_Os06g48420 LOC_Os12g35150 LOC_Os04g14250 LOC_Os07g02650 LOC_Os11g06259
LOC_Os07g03310 LOC_Os12g37010 LOC_Os04g16792 LOC_Os07g02660 LOC_Os11g09880
Supplement table 2 The InDel primers designed in the surrounding region of Hw3
Marker Forward primer Reverse primer Size (bp) Coverage Result
InDel1101 5' TAGGGGCGACTAAATGAAGG 3' 5' TTTTGTTGCTCGTTCACCTG 3' 354 17 negative
InDel1102 5' CATTGTTCCACCATCGTCCT 3' 5' ATCAATGGGACAACACTCGC 3' 108 8 positive
InDel1103 5' ATCGCCACCTTGCCGAGTTC 3' 5' AGGGAGGATGTTAGGACCAGC 3' 327 2 positive
InDel1104 5' CCATAATCTCCACAATCGGC 3' 5' AGAGGACGAAGAAGAACGCC 3' 161 5 negative
InDel1105 5' TCTTCGCCACCTTCTCCCAA 3' 5' TACAATGTGTCGTTGCCTGC 3' 148 9 positive
InDel1106 5' ATCTTCGTCTTCCACCTCGC 3' 5' GCCAACTGGGAAACTGTGTC 3' 147 6 positive
InDel1107 5' CAAGAAAGCCACTCCCTGC 3' 5' CGGCTCTTGACCTTGCTGA 3' 272 10 positive
InDel1108 5' TTCGTGACGCTGGAGGTT 3' 5' ACTTGAGGAAATCGTGTGTT 3' 251 7 failure
InDel1109 5' GAACTTATGCGGAGGAGACG 3' 5' CGAGCACATCTCTCTCACATA 3' 162 34 negative
InDel1110 5' GCGATGTCGGTATCATTGCT 3' 5' GGGTGTTGGTCCTCATTGTA 3' 117 8 positive
InDel1111 5' TCACATCACGGAGCAGGAG 3' 5' AAACGGAGGAGAGGCAAGA 3' 141 3 positive
InDel1112 5' AGAGAGAGGGGGAAGAGAGA 3' 5' GGGAATGTTTCAGGTGCGAC 3' 202 14 positive
InDel1113 5' CCTGAAGTTCCCGAAGATTT 3' 5' ACGAGCATTGGAACTCAGAT 3' 138 21 positive
InDel1114 5' CGTGCGAAGTCAGAGGAGTC 3' 5' ACACACCCTAAGCCATCCAA 3' 224 3 positive
InDel1115 5' CTCAGCCAATAGCATCTCCG 3' 5' CAGGAGCAGCAAGAGAAACG 3' 322 10 negative
InDel1116 5' TAATAGTCGTGTCCGAGATGG 3' 5' CAACCCGAAAGCAAGAACT 3' 156 21 negative
InDel1117 5' TCCATTGAAGCCAACATCG 3' 5' TGGGCGAAGTCGTAAGAACAT 3' 313 6 positive
InDel1118 5' CTCCATTGAAGCCAACATCG 3' 5' TGAAGGGAGTTCCTATTGACC 3' 202 18 negative
InDel1119 5' TGATTTGACAATGTGGTGCTAC 3' 5' TTTGGGACGGATGGAGTAAG 3' 328 11 positive
InDel1120 5' TCCTTGTGGTGGTGCCTCA 3' 5' GGTCTCGTCATCTGCTTCAA 3' 429 31 negative
Supplement table 3 The InDel primers designed in the interval of Hw4
Marker Forward primer Reverse primer Size (bp) Coverage Result
InDel701 5' ATCAGTTTCGCCTATGGACG 3' 5' TCACCGTTCTAACTCCCAGC 3' 113 16 positive
InDel702 5' ATGGCAGATGTAAGCAAGCG 3' 5' CCTTTTTGCTTATGTGGAAC 3' 195 9 positive
InDel703 5' GACGGTAACAAAGTCGTTCAGA 3' 5' ATCAAACAGCGTGTCAAACT 3' 254 23 negative
InDel704 5' CCTCCGTTTTTTCATTCCTG 3' 5' TGAAACGGAGGAGAGTAACAAA 3' 223 14 positive
InDel705 5' GAGGAGGAGGATGAGGGCTA 3' 5' TTCGCTTCTGCCTTTATTCG 3' 370 11 positive
InDel706 5' TTTATGATGGAGGGAGTGTCTG 3' 5' GCCTTCAATCCTCACCAATC 3' 356 19 positive
InDel707 5' GGGAACAATGGTGTGTGCTTT 3' 5' CACAAGATGCGGCGAAGTTT 3' 464 9 positive
InDel708 5' AAATGCTCTGACTTGGGGAC 3' 5' CACGAATGTTTGGTTTACGC 3' 152 6 positive
InDel709 5' TTCTCCGCTCCTTCCGTT 3' 5' TAGGGTTTTAGTGAGGTGGG 3' 109 7 positive
InDel710 5' TGGAACAATGCCACTGCC 3' 5' TGATGACCGCCAGCAAGT 3' 248 13 failure
InDel711 5' TTTCTTCTCCGACCCACCAC 3' 5' GCCACCCCTTTTAGTATTGCT 3' 141 11 positive
InDel712 5' TAAACATAACCCAACAGCCG 3' 5' TTTCACCAGTAATCAGCCGT 3' 245 10 positive
InDel713 5' GGGGGAGAGAGAGAACGAAT 3' 5' CACAGGGGGATGAGGATGAC 3' 249 8 positive
InDel714 5' AGCCCTCCCCCACAAATAAT 3' 5' AAATAGGACAGGCAGCAGCG 3' 309 4 positive
InDel715 5' AGTAACCCGAACCCTTAGCAT 3' 5' AAAAAAAGGTCCTCCGTCCC 3' 256 5 positive
InDel716 5' TGAGTTCTGACGCAATGGGC 3' 5' GCTTGCTCTGACCCTTCCAT 3' 172 32 positive
InDel717 5' GTGCGAAAAACCACGGCTG 3' 5' CTGCGTGTCAAAAGCGGAAT 3' 273 9 positive
InDel718 5' GATGAGTTGTTTGGTTTGTTGC 3' 5' GATTGAGACCGTCGTAAAGATG 3' 339 11 positive
InDel719 5' CGTGAACGAACGAAGAATGAT 3' 5' TGAGGGACGACGACGACAA 3' 425 8 failure
InDel720 5' TCTGTTTTCTTGGCGGTGGT 3' 5' GTTCTGGATTGGGGTTTTCA 3' 262 8 failure
InDel721 5' CCATCGCAGTAGGGTCAGTG 3' 5' GGCGTGACGATAGGTGTGAC 3' 218 8 positive
InDel722 5' AAACGGACTGTAAACTTCGGT 3' 5' AGGTTCTTTGTAGCCTTCTCA 3' 184 5 positive
InDel723 5' CGTTCATCCATCCGTCTTATTT 3' 5' ATGCTATTTGTTGGTGACTCGT 3' 238 6 negative
InDel724 5' TGTCAGTGGGCGAGTGTGTT 3' 5' TGTTTATGGTCGTTGATTGCG 3' 244 5 negative
InDel725 5' GCACAGATTACTCCTCCCGA 3' 5' AATCATCTTGCGGTTGTTGT 3' 435 10 negative
InDel726 5' GCCAACCAAAGCAAATAGAGG 3' 5' CCCCTGATGGATACGAAACTC 3' 395 5 positive
InDel727 5' GGCGACGCTTTTTGAACTAC 3' 5' TGAGATTAGCAAGTTTTTCCCC 3' 411 19 positive
InDel728 5' CTGGCACAATGGTGGCAACT 3' 5' CTGGGACTGTGAACACTGGG 3' 198 29 negative
InDel729 5' TTGGATGGTGGTCAATGGT 3' 5' TCCGAAAGGTGATGATAACA 3' 188 20 positive
InDel730 5' CCTGCTTGGTGTGTGCTCTT 3' 5' CTGCTGGTGTCTGAAACTCTG 3' 389 16 positive
InDel731 5' GCGAACTCACCGATGACCAA 3' 5' ATCTTTCCTTCTTGTCGCCG 3' 385 14 negative
InDel732 5' CGTGTCCTTCCTCATCTTGC 3' 5' ACGCTCGTAATGGCTTTGTT 3' 226 3 negative
InDel733 5' TGGCACGACAGAGTAGAGAATG 3' 5' GCTTATGATGCGTTATGCCTG 3' 174 9 positive
InDel734 5' GCTCGGAGAAACGCTCAGAT 3' 5' ACCACGGCACCATTTACCCT 3' 205 8 negative
InDel735 5' AGAAAAGTAATGCGGTGCCA 3' 5' GCAATAGCAATGCGTGTGGT 3' 107 9 positive
InDel736 5' TCTCCTGTCCTCATCTCCATT 3' 5' ACAGGGTTCATACCAGCAGG 3' 147 14 positive