
Draft

Genome-wide DNA polymorphism in the indica rice varieties

RGD-7S and Taifeng B, revealed by whole genome re-sequencing

Journal: Genome

Manuscript ID gen-2015-0101.R2

Manuscript Type: Article

Date Submitted by the Author: 10-Dec-2015

Complete List of Authors:

Fu, Chongyun; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding

Liu, Wu-Ge; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding

Liu, Di-Lin; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding

Li, Jin-Hua; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding

Zhu, Man-Shan; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding

Liao, Yi-Long; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding

Liu, Zhen-Rong; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding

Zeng, Xue-Qin; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding

Wang, Feng; Rice Research Institute, Guangdong Academy of Agricultural Sciences; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding

Keyword: rice, genome resequencing, SNPs, InDels, CNVs

https://mc06.manuscriptcentral.com/genome-pubs

Genome-wide DNA polymorphism in the indica rice varieties RGD-7S and Taifeng B,

revealed by whole genome re-sequencing

Chong-Yun Fu1,2, Wu-Ge Liu1,2, Di-Lin Liu1,2, Ji-Hua Li1,2, Man-Shan Zhu1,2, Yi-Liao Liao1,2, Zhen-Rong Liu1,2, Xue-Qin Zeng1,2, Feng Wang1,2*

1 Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640,

P.R. China.

2 Guangdong Provincial Key Laboratory of New Technology in Rice Breeding, Guangzhou

510640, P.R. China.

* Corresponding author:

Name: Feng Wang

Address: No. 3, Jinying East 1st Street, Tianhe District, Guangzhou City, Guangdong Province,

P.R. China.

Office Telephone: 086-020-38491901; Email: [email protected].

Abstract

Next-generation sequencing technologies provide opportunities to understand genetic

variation even in closely related cultivars. We performed whole-genome resequencing

of two elite indica rice varieties RGD-7S and Taifeng B, whose F1 progeny showed hybrid

weakness and hybrid vigor when they were grown in the early- and late-cropping seasons,

respectively. Approximately 150 million 100-bp paired-end reads were generated for each variety, which covered

about 86% of the rice (Oryza sativa L. japonica cv. Nipponbare) reference genome. A total of

2,758,740 polymorphic sites including 2,408,845 SNPs and 349,895 InDels were detected in

RGD-7S and Taifeng B, respectively. Applying stringent parameters, we identified

961,791 SNPs and 46,640 InDels between RGD-7S and Taifeng B (RGD-7S/Taifeng B). The

density of DNA polymorphisms was 256.8 SNPs and 12.5 InDels per 100 kb for

RGD-7S/Taifeng B. Copy number variations (CNVs) were also investigated. In RGD-7S,

1,989 of 2,727 CNVs overlapped 218 genes, and in Taifeng B 1,231 of 2,010 CNVs

overlapped 175 genes. In addition, we verified a subset of InDels in the intervals

of the hybrid weakness genes Hw3 and Hw4, and obtained some polymorphic InDel markers,

which will provide a sound foundation for cloning hybrid weakness genes. Analysis of

genomic variations will also contribute to understanding the genetic basis of hybrid weakness

and heterosis.

Key words: rice, genome resequencing, SNPs, InDels, CNVs

Introduction

Rice is one of the most important food crops for human nutrition worldwide. The utilization of heterosis

gives hybrid rice a 10-20% yield advantage over conventional varieties (Cheng et al., 2007).

However, the presence of hybrid weakness potentially hinders the full utilization of heterosis,

especially in intra-subspecific crosses. Most cases of hybrid weakness have been reported in

crosses between wild and cultivated rice or between indica and japonica rice (Ichitani et al.,

2007, 2011; Kuboyama et al., 2009; Chen et al., 2013, 2014). We have recently reported a

type of low temperature-induced hybrid weakness in the crosses between two indica rice

varieties V1134 (SH527//GD-7S/BL122) and Taifeng A in our hybrid rice breeding

programme. The F1 progeny from this cross showed hybrid weakness in the

early-growing seasons with relatively low temperatures (average daily temperature of

19.87 °C at seedling stage) and hybrid vigor in late-growing seasons with relatively high

temperatures (average daily temperature of 28.4 °C at seedling stage) (Fu et al., 2013). We

showed that the F1 progeny from the cross between the two elite indica rice lines RGD-7S

(GD-7S/BL122, one of the parents of V1134) and Taifeng B (the maintainer line of Taifeng A

with the same nuclear genome) also showed hybrid weakness when they were grown in the

early seasons. Understanding the temperature-dependent switch between hybrid weakness and

hybrid vigor could contribute to elucidating the molecular mechanism of heterosis.

To fully understand the genetic basis of the low temperature-dependent hybrid weakness, we

conducted fine mapping of the hybrid weakness genes. However, the low level of genetic

polymorphism between the two indica breeding parents hampers further cloning of the hybrid

weakness genes. The advent of next-generation sequencing (NGS) technologies

enables the discovery of genome-wide genetic variation and genotyping in a highly efficient

way through large-scale re-sequencing of whole genomes (Han et al., 2013). Genetic variation

includes sequence variation and structural variation. Sequence variation generally consists of

single-nucleotide polymorphisms (SNPs) and short insertions and deletions (InDels). Structural variation usually

comprises gain/loss variations and copy number variations (CNVs), which include large-scale

deletions, insertions, duplications, inversions and translocations. One of the important

roles of NGS technologies is to detect a large number of sequence polymorphisms, such as

SNPs and InDels (Huang et al.,

2013), which have been widely applied in marker-assisted and genomic selection, QTL

mapping, positional cloning and haplotype and pedigree analysis because they are cheap,

stable and high throughput (McCouch et al., 2010). NGS technologies also assist in

understanding the distribution of SNPs and InDels and the variations of copy number of DNA

fragments and chromosome structure which can affect gene expression and function. Based

on the reference genome sequences of the japonica and indica rice cultivars, some rice lines

have been resequenced and a massive number of SNPs and InDels have been identified

(Arai-Kichise et al., 2011; Subbaiyan et al., 2012; Jain et al., 2014).

This study aimed to discover the genomic variations and DNA polymorphisms in two elite

indica rice varieties RGD-7S and Taifeng B by whole genome re-sequencing. The reads

generated were mapped to the Nipponbare genomic sequence

(Os-Nipponbare-Reference-IRGSP-1.0 pseudomolecules). To investigate the genomic

variation, SNPs, InDels and copy number variations (CNVs) were comprehensively detected,

and the DNA polymorphisms between RGD-7S and Taifeng B in the candidate regions of the

hybrid weakness genes were validated by PCR. The investigation of their whole genomic

variations will contribute to further cloning the hybrid weakness genes and understanding the

genetic basis underlying hybrid weakness and heterosis in hybrid rice.

Materials and Methods

Plant materials and sequencing

In the previous study of hybrid weakness (Fu et al., 2013), we found that the F1 progeny from

Taifeng A and V1134 (SH527//GD-7S/BL122) showed temperature-dependent hybrid

weakness. To understand the cause of this type of hybrid weakness, we carried out the

following crosses: (1) between SH527 and Taifeng A, (2) between RGD-7S (GD-7S/BL122)

and Taifeng B (the maintainer line of Taifeng A with the same nuclear genome), and we

found that only the F1 progeny from the cross (2) displayed hybrid weakness in the

early-growing seasons. RGD-7S is a thermo-sensitive male sterile line carrying two rice blast

resistance genes, Pi1 and Pi2, and Taifeng B is a slender-grain, good-quality maintainer line.

In this study, RGD-7S and Taifeng B were used to conduct the genomic resequencing to

further understand the cause of hybrid weakness. The genomic DNA of RGD-7S and Taifeng

B was extracted from 20-day-old seedlings using a modified CTAB (hexadecyltrimethylammonium

bromide) method and purified with chloroform-phenol (1:1). RNA was

removed by treatment with RNase A for 30 min at 37°C. The quality of the isolated DNA was checked

with an Agilent Bioanalyzer 1000 (Agilent Technologies, Singapore). Library preparation was

performed according to the manufacturer’s protocol. Genomic re-sequencing was

conducted to generate paired-end 100-base reads on the Illumina HiSeq 2000 platform

(Illumina Technologies) by Macrogen (Seoul, Korea). The raw data generated were filtered

via the standard Illumina pipeline. All sequencing data of RGD-7S and Taifeng B have been

uploaded to NCBI, and the accession numbers of RGD-7S and Taifeng B are

SAMN04320540 and SAMN04320541, respectively.

Bioinformatics analysis

To obtain clean data, the raw data were processed for base calling, quality evaluation, and

removal of adaptor and low-quality sequences using CASAVA (v1.8.2) and

FastQC. Redundant data were discarded and the high-quality filtered reads were

mapped onto the rice reference genome (Oryza sativa L. japonica cv. Nipponbare, MSU7;

http://rice.plantbiology.msu.edu/index.shtml) using BWA (v0.5.9-r16) with default

parameters. SAMtools (v0.1.18) was utilized to

detect SNPs and InDels and investigate their quantity, type and distribution on chromosomes

and gene coding regions. Copy number variations (CNVs) were investigated using

Control-FREEC software (FREEC v6.2) (Boeva et al., 2012). The parameters were set as

follows: ploidy, 2; breakpoint threshold, 0.8; window, 10 kb; expected GC content, 0.35-0.55;

and minimum mappability per window, 0.95.
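
The Control-FREEC settings listed above could be written to a configuration file roughly as follows. This is a minimal sketch under stated assumptions, not the configuration actually used in this study: the option names follow the Control-FREEC manual as we understand it, and the file names (RGD-7S.sorted.bam, chr_lengths.txt), the input format and the mate orientation are hypothetical placeholders.

```python
import configparser
import io


def build_freec_config(bam_path: str, chr_len_file: str) -> str:
    """Return Control-FREEC config text carrying the parameters stated in the Methods."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names case-sensitive (e.g. breakPointThreshold)
    cfg["general"] = {
        "ploidy": "2",
        "breakPointThreshold": "0.8",
        "window": "10000",                   # 10 kb window
        "minExpectedGC": "0.35",
        "maxExpectedGC": "0.55",
        "minMappabilityPerWindow": "0.95",
        "chrLenFile": chr_len_file,          # hypothetical path
    }
    cfg["sample"] = {
        "mateFile": bam_path,                # hypothetical path
        "inputFormat": "BAM",                # assumption: alignments supplied as BAM
        "mateOrientation": "FR",             # assumption: standard paired-end library
    }
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


config_text = build_freec_config("RGD-7S.sorted.bam", "chr_lengths.txt")
print(config_text)
```

Generating the file from code keeps the five stated parameters in one place, so the same settings can be reused for both varieties by changing only the sample path.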

InDel primers design and experimental validation

Based on the sequencing results, the polymorphic InDels with insertion or deletion size ≥10

bp were selected. The putative polymorphic InDel loci were scanned using Primer Premier

5.0 to design oligonucleotide primers. The optimized input parameters were as follows:

product size: 100–500 bp; primer size: 18–22 bp; primer Tm: 55-60°C; primer GC content:

40-60%. PCR was performed in a reaction volume of 20 µl containing 10 ng of template DNA,

0.2 mM of each primer, 0.2 mM of each dNTP, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5

mM MgCl2, and 1 U of Taq DNA polymerase. Amplification was carried out in an S1000

thermocycler (Bio-Rad, USA) using the following conditions: 5 min at 94°C; followed by 35

cycles of 0.5 min at 94°C, 0.5 min at 55°C, and 0.5 min at 72°C; and a final extension of 5

min at 72°C. PCR products were separated in 3.0% agarose gels and stained with Gold View

(Applygen Technologies Inc).
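
The selection criteria above (InDel size ≥10 bp, primer size 18-22 bp, GC content 40-60%, product size 100-500 bp) can be expressed as simple checks. The sketch below is illustrative only: the function names and the example primer sequence are hypothetical, and the Tm criterion is deliberately left out because it depends on the melting-temperature model used by Primer Premier, which is not described here.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def is_candidate_indel(indel_len: int, min_len: int = 10) -> bool:
    """Keep InDels of at least min_len bp; negative lengths denote deletions."""
    return abs(indel_len) >= min_len


def primer_passes(seq: str, product_size: int) -> bool:
    """Check primer length, GC content and expected product size against the stated ranges."""
    return (
        18 <= len(seq) <= 22
        and 40.0 <= gc_content(seq) <= 60.0
        and 100 <= product_size <= 500
    )


example_primer = "ATGCGTACCTGAGCTTAGCA"  # hypothetical 20-mer, not a primer from this study
print(is_candidate_indel(-12), primer_passes(example_primer, 250))
```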

Results

Read mapping to the Oryza sativa Nipponbare reference genome

Whole genome re-sequencing of RGD-7S and Taifeng B generated 148,249,780 and

153,044,582 100-bp paired-end reads, and yielded 14,973,227,780 and 15,457,502,782 bases,

respectively (Table 1), with mean depths of coverage of 27.83× and 28.72×. The proportions

of high-quality (Q30) bases were 91.61% and 89.92% in RGD-7S and Taifeng B, respectively.

These reads were mapped to the Nipponbare genome (Release 7) as the reference rice genome

using the Burrows-Wheeler Aligner (BWA) and Picard software. There were 320,035,665 and

324,501,839 base pairs of the Nipponbare genome covered by mapped reads from RGD-7S and Taifeng B,

corresponding to about 85.5% and 86.69% of the reference genome, respectively. However, only about

73% of the high-quality reads could be mapped to the Nipponbare genome. These results

indicated that there were substantial differences in genome sequence between Nipponbare

and the two varieties.

Detection and distribution of DNA polymorphisms in RGD-7S and Taifeng B

Compared with the Nipponbare genome sequence, a total of 2,758,740 polymorphic sites

including 2,408,845 SNPs and 349,895 InDels were discovered in RGD-7S and Taifeng B

respectively, using SAMtools. Two filter conditions (coverage ≥10 and ≤100) were

applied to minimize the detection of false-positive SNPs and InDels, and heterozygous SNPs

and InDels were removed. Based on these filter conditions, the total numbers of DNA

polymorphisms were respectively 2,082,219 (1,744,556 SNPs and 337,663 InDels) and

2,156,997 (1,821,644 SNPs and 335,353 InDels) in RGD-7S and Taifeng B (Table 2). In

order to understand the distribution of DNA polymorphism, we investigated the variant

frequency (the number of DNA polymorphisms per 100 kb) in RGD-7S and Taifeng B. The

results showed that the variant frequency varied from 459.8 to 675.2 in RGD-7S and from

431.5 to 677.7 in Taifeng B, with the maximum on chromosome 10 (Chr.10) for both samples,

and the minimum on Chr.1 and Chr.9 for RGD-7S and Taifeng B, respectively.
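
The filtering step described above (read depth between 10 and 100, heterozygous calls removed) can be sketched as follows. This is a simplified illustration, not the SAMtools-based procedure itself: the Variant record and its field names are hypothetical, and real calls would be parsed from the variant caller's output.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    depth: int           # number of reads covering the site
    heterozygous: bool   # True if the call is heterozygous


def filter_variants(variants: List[Variant],
                    min_depth: int = 10,
                    max_depth: int = 100) -> List[Variant]:
    """Keep homozygous calls whose depth lies within [min_depth, max_depth]."""
    return [
        v for v in variants
        if min_depth <= v.depth <= max_depth and not v.heterozygous
    ]


# Toy records for illustration only; these are not data from this study.
calls = [
    Variant("Chr1", 1200, 8, False),    # dropped: depth below 10
    Variant("Chr1", 5400, 35, False),   # kept
    Variant("Chr9", 880, 42, True),     # dropped: heterozygous
    Variant("Chr10", 310, 250, False),  # dropped: depth above 100
]
kept = filter_variants(calls)
print(kept)
```

The upper depth bound is the part that is easy to overlook: sites with unusually high coverage often fall in repeats or duplicated regions, where mismapped reads inflate false-positive calls.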

To develop more polymorphic molecular markers for further fine mapping and cloning of the

hybrid weakness genes, DNA polymorphisms between RGD-7S and Taifeng B were further

analyzed. Three filter conditions (coverage ≥5, removal of heterozygous sites, and InDel size ≥5 bp)

were applied to screen the polymorphic SNPs and InDels between RGD-7S and Taifeng B.

As both RGD-7S and Taifeng B are indica rice varieties, the detected RGD-7S/Taifeng

B DNA polymorphisms included only 961,791 SNPs and 46,640 InDels (Table 3). The

density of DNA polymorphisms was 256.8 SNPs and 12.5 InDels per 100 kb for

RGD-7S/Taifeng B, which was far lower than the densities between either variety and Nipponbare. The

distribution of polymorphic SNPs differed from that of polymorphic InDels. The most

polymorphic SNPs and InDels were found on Chr.12 and Chr.1, respectively, while Chr.10

had the fewest polymorphic SNPs and InDels. We also investigated the distribution of SNPs in

different gene regions, namely upstream, downstream, intergenic, intronic and exonic regions. The

SNPs were mainly located in the downstream, upstream, intergenic, intronic and exonic regions of

genes, at proportions of about 30%, 32%, 23%, 7% and 5%, respectively (Table 4).

Only about 0.24%, 0.01%, 0.97% and 0.64% of SNPs were located in splice sites,

start codon regions, 3′ untranslated regions and 5′ untranslated regions, respectively.
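
The polymorphism densities reported above follow from dividing each count by the reference length and scaling to 100 kb. The short sketch below reproduces that arithmetic; the reference length of 374.5 Mb is an assumption chosen because it is consistent with the reported densities, and it is not a value stated in the text.

```python
# Assumption: approximate length of the reference used for the density
# calculation; this value is inferred for illustration, not taken from the paper.
ASSUMED_GENOME_LENGTH_BP = 374_500_000


def density_per_100kb(count: int, genome_length_bp: int) -> float:
    """Number of polymorphisms per 100 kb of reference sequence."""
    return count / genome_length_bp * 100_000


snp_density = density_per_100kb(961_791, ASSUMED_GENOME_LENGTH_BP)
indel_density = density_per_100kb(46_640, ASSUMED_GENOME_LENGTH_BP)
print(f"SNPs per 100 kb: {snp_density:.1f}")
print(f"InDels per 100 kb: {indel_density:.1f}")
```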

In addition, we further analyzed the distribution of so-called large-effect SNPs which are

considered to potentially affect gene function in RGD-7S and Taifeng B genomes (Table 5).

In the RGD-7S genome, 6141 SNPs were expected to cause premature stop codons, 1135 to

lose the annotated stop codons, 598 to lose the start codons, 1430 to gain start codons, and

909 to disrupt splicing donor or acceptor sites. In Taifeng B, 5987 were supposed to gain

premature stop codons, 1132 to alter annotated stop codons, 558 to lose start codons, 1383 to

gain start codons and 847 SNPs to affect splicing donor or acceptor sites.

Detection of copy number variation (CNV) in RGD-7S and Taifeng B

Copy number variants (CNVs) are currently defined as unbalanced changes in the genome

structure and they include deletions, insertions, and duplications of over 50 bp in size

(Muñoz-Amatriaín et al., 2013). CNVs can influence gene expression at the transcriptional and translational

levels. We utilized FREEC software to detect CNVs in RGD-7S

and Taifeng B, and obtained 2,727 and 2,010 CNVs in RGD-7S and Taifeng B, respectively

(Table 6). A total of 819 CNVs were detected on Chr.9 in RGD-7S, and

surprisingly there were as many as 581 gains in the first 40 kb (positions 0 to 40,000) of

Chr.9. Relatively few CNVs were found on Chr.11 in both RGD-7S and Taifeng B, with

only 49 and 29 CNVs detected, respectively. In addition, we found that 126 and 119

segments were absent from the RGD-7S and Taifeng B genomes, respectively. In the RGD-7S genome,

738 CNVs overlapped intergenic regions and 1,989 CNVs overlapped 218 genes,

including 119 transposable elements, 12 genes with alternative splicing sites and 8 resistance

genes. In the Taifeng B genome, 779 CNVs occurred in intergenic

regions and 1,231 CNVs overlapped 175 genes, including 80 transposable elements, 21 genes with alternative splicing

sites and 5 resistance genes.

Analysis of SNPs and InDels in RGD-7S and Taifeng B

Based on the nucleotide substitutions, the polymorphic SNPs were divided into two classes,

namely, transitions (C/T and G/A) and transversions (C/A, G/C, A/T, T/G). The total

transitions detected were 1,727,621 and 1,745,022 and the transversions detected were

681,066 and 691,942 in RGD-7S and Taifeng B, respectively (Table 7). The ratio of

transitions to transversions (Ts/Tv) was about 2.5 in the two varieties. The numbers of C/T

and G/A transitions were roughly equal, while G/C transversions were considerably fewer

than the other types of transversions (C/A, A/T, T/G).
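
The classification of substitutions and the Ts/Tv ratio described above can be illustrated with a short sketch. The function names are hypothetical; the counts passed to the ratio function are the transition and transversion totals given in the text.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine) or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_ratio(transitions: int, transversions: int) -> float:
    """Ratio of transition count to transversion count."""
    return transitions / transversions


rgd7s_ratio = ts_tv_ratio(1_727_621, 681_066)
taifengb_ratio = ts_tv_ratio(1_745_022, 691_942)
print(f"RGD-7S Ts/Tv: {rgd7s_ratio:.2f}; Taifeng B Ts/Tv: {taifengb_ratio:.2f}")
```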

The total InDels detected were 349,895 (177,194 deletions and 172,701 insertions) in

RGD-7S and 352,336 (174,388 deletions and 177,948 insertions) in Taifeng B. The size of

deletions ranged from 1 to 43 bp for both RGD-7S and Taifeng B, and insertions were

up to 28 and 29 bp in RGD-7S and Taifeng B, respectively. About 91% of insertions and

deletions were in the range of 1-9 bp, while large insertions and deletions (≥10 bp), which

are convenient for gel-based genotyping in genetic mapping and marker-assisted selection,

accounted for only approximately 9% (Figure 1).

Annotation of SNPs and InDels

The annotation of the Nipponbare genome (Release 7.0) was used to investigate the distribution

of SNPs and InDels in RGD-7S and Taifeng B. Similar distribution patterns of SNPs and

InDels were found in RGD-7S and Taifeng B (Figure 2). Interestingly, variants

upstream (32.8%) and downstream (30.4%) of genes were more numerous than those in

intergenic regions (22.8%) in both lines. Excluding the heterozygous sites, 319,506 and

311,257 SNPs and 15,316 and 14,706 InDels were detected in the coding regions of RGD-7S

and Taifeng B, respectively. To minimize the false-positive rate, a rigorous filter

condition (coverage ≥10) was used. We detected 254,155 and 263,125 SNPs and 11,233 and

12,518 InDels in 39,641 and 39,338 gene-coding regions annotated in RGD-7S and Taifeng B

genomes, respectively. GO analysis indicated that the variant genes were mainly distributed in

the terms of cell component (9208 and 9236), catalytic activity (6319 and 6341), binding

activity (6637 and 6658), cellular process (7976 and 7982) and metabolic process (8682 and

8675) in Taifeng B and RGD-7S, respectively (Figure 3). Further significant enrichment

analysis showed that the variant genes from Taifeng B and RGD-7S were mainly grouped in

nearly the same categories, such as kinase activity, carbohydrate binding, cellular protein

modification process, nucleotide binding, signal transduction, hydrolase activity, and stress

response (Figure 4). The rest of variant genes from Taifeng B were enriched in the categories

of response to biotic stimulus and cell death, while those from RGD-7S were enriched in the

categories of hydrolase activity and response to extracellular stimulus. BLAST analysis

demonstrated that 61 and 117 special variant genes were present in RGD-7S and Taifeng B

(Supplementary Table 1), respectively.

Validation of the partial sequencing results in the surrounding intervals of two hybrid

weakness genes Hw3 and Hw4

To validate the re-sequencing results and develop polymorphic markers for further fine

mapping of the hybrid weakness genes Hw3 and Hw4, we respectively designed 20 and 36

InDel markers around the mapping intervals of Hw3 (Supplementary Table 2) and Hw4

(Supplementary Table 3) based on the re-sequencing results. PCR analysis indicated that 13

out of 20 markers showed polymorphism between Taifeng B and RGD-7S in the surrounding

interval of Hw3, and one marker failed to amplify. Three of the 13 polymorphic

InDel markers (InDel1111, InDel1112 and InDel1113) were located within the mapping interval (136

kb) of Hw3, which will help narrow down the candidate interval of Hw3. To further test the

sequencing analysis results, the candidate gene LOC_Os11g44310 for Hw3 was analyzed (Fu

et al., 2013). Read mapping indicated that high-quality reads from RGD-7S encompassed the

complete sequence of LOC_Os11g44310, while only about 20 reads from Taifeng B were

non-continuously anchored in the region of LOC_Os11g44310 (Figure 4), which

was consistent with our previous PCR amplification results. In our previous study, the Hw4 gene was mapped

to an interval of about 15 cM on Chr.7. PCR assays showed that 24 out of 36

InDel markers displayed polymorphism between Taifeng B and RGD-7S in the 15 cM

interval of Hw4, and 3 markers failed to amplify.
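
The marker counts above can be pooled into an overall validation rate. The sketch below uses only the counts reported for Hw3 and Hw4; the function and variable names are hypothetical.

```python
from typing import Tuple


def validation_summary(designed: int, failed: int, polymorphic: int) -> Tuple[int, float]:
    """Return (number of amplified markers, fraction of amplified markers that were polymorphic)."""
    amplified = designed - failed
    return amplified, polymorphic / amplified


# (designed, failed in PCR, polymorphic) for each mapping interval, as reported above
hw3 = (20, 1, 13)
hw4 = (36, 3, 24)

pooled = tuple(a + b for a, b in zip(hw3, hw4))
amplified, rate = validation_summary(*pooled)
print(f"{pooled[2]} of {amplified} amplified markers were polymorphic ({rate:.0%})")
```

Computing the rate over amplified markers rather than designed markers separates PCR failure from genuine absence of polymorphism, which is how the pooled figure is reported in the Discussion.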

Discussion

In this study, we comprehensively studied the genome diversity of two indica rice varieties

RGD-7S and Taifeng B, and obtained a global view of genomic variation of both varieties

including SNPs, InDels and CNVs.

Reads mapping and detection of genome-wide DNA polymorphism

NGS technology enabled us to produce massive sequence output and made high-throughput

DNA marker discovery feasible and cost-effective. The whole genomes of two indica rice

varieties were re-sequenced and mapped to Nipponbare as a reference genome to uncover

genome-wide DNA variants for further understanding the genetic basis of hybrid weakness

and to develop polymorphic molecular markers between them for map-based cloning of

hybrid weakness genes. The mean depth of coverage was 27.83x and 28.72x for RGD-7S and

Taifeng B, respectively. About 73% of high-quality data obtained from the two varieties

could be mapped on the Nipponbare reference genome, and the mapped reads covered about

86% of the reference genome, which was similar to the coverage reported in the previous

studies (Yamamoto et al., 2010; Subbaiyan et al., 2012; Jain et al., 2014). This indicated that

there were large genetic differences between the two indica rice varieties and Nipponbare, and

also reflected the inherent genomic differences accumulated through genetic differentiation

between the indica and japonica subspecies. In addition, coverage bias has been observed with

libraries prepared using both enzymatic and physical shearing, although the use of Illumina

sequencing technology with libraries prepared without amplification led to the least biased

coverage (Quail et al., 2013). Recently, Schatz et al. (2014) assembled three rice genome

sequences, those of Nipponbare, the indica rice IR64 and the aus rice DJ123, using Illumina

HiSeq 2000 data, and found that mapped reads from Nipponbare covered only

91.2% of the reference Nipponbare genome (IRGSP 1.0) and that the

unassembled/unaligned regions were highly enriched for high-copy repeats too complex to be

assembled. The mapped reads from the other two rice varieties covered about 88% of the

reference genome. In RGD-7S and Taifeng B, 91.61% and 89.92% of total bases had high

quality scores of 30 (Q30), which was much higher than the rate of 74% reported by Minoche

(2012) using HiSeq 2000 PE100.

In the present study, we identified a total of 2,758,740 polymorphic sites including 2,408,845

SNPs and 349,895 InDels between two indica rice varieties and Nipponbare genomes, and

discovered 961,791 SNPs and 46,640 InDels between the two indica rice varieties. The

polymorphism of genome sequences among different rice lines has been analyzed in

previous studies (Feltus et al., 2004; Subbaiyan et al., 2012; Jain et al., 2014). Apart from the

genetic background of the rice lines themselves, different filtering conditions may result in

substantial differences in the density of DNA polymorphisms. The average density of DNA

polymorphism between RGD-7S and Taifeng B was 256.8 SNPs and 12.5 InDels per 100 kb,

which will facilitate marker development and accelerate the cloning for hybrid weakness

genes.

The nonrandom distribution of SNPs and the base substitutions

Similar to the previous reports, we found that there was uneven distribution of SNPs and

InDels on the chromosomes. The investigation of the number of SNPs and InDels in the two

indica rice varieties showed that the highest variant frequency occurred on Chr.10. The DNA

polymorphism was mainly distributed in the intergenic, upstream, downstream regions and

introns of genes. This is probably because these regions have been subject to relaxed selection

pressure during evolution, and changes in them could influence gene expression but not gene

function. Our study indicated that the proportions of genic SNPs were about 11.5%, 37.9% and

50.6% in UTRs, coding regions (including exons, splice sites and start codons) and intronic

regions, respectively. Compared with the report of McNally et al. (2009), the intronic regions harbored

more SNPs in our study. They found that approximately 2.7% of the rice genes contained

large-effect SNPs. However, there were 8,205 and 7,950 genes with large-effect SNPs in

RGD-7S and Taifeng B, respectively, which was about 20% of the total number of annotated

rice genes. RGD-7S and Taifeng B are indica rice lines, whereas the genome of the japonica rice line

Nipponbare was used as the reference, which may bias the analysis towards detecting

more large-effect SNPs. The identification of large-effect SNPs also depends on the

annotation of gene models (Zheng et al., 2011). It seems possible that

large-effect SNPs may lie within alternate transcripts that have not yet been described or

within transcripts that have been incorrectly annotated as coding. Transcripts carrying premature termination

codons are usually degraded through the process of nonsense-mediated decay. In addition,

naturally occurring stop codon-gain variants are generally not expressed unless

they have secondary annotations in or near other transcripts (Cirulli et al., 2011). Cao et al.

(2011) found 6,197 genes with 12,468 SNPs that caused large effects on gene structure in

Arabidopsis. Similarly, Tan et al. (2012) detected 7,602 genes with large-effect SNPs in

Arabidopsis.

As for the substitution of bases, the ratio of transition to transversion (Ts/Tv) in both rice

varieties was 2.5 in this study, which indicates a considerable bias in favor of transitions over

transversions, similar to the findings of previous reports. For instance, Subbaiyan et al.

(2012) reported a Ts/Tv ratio of 2.0 in six rice elite lines. In the recently completed 3000 rice

genome project, a Ts/Tv ratio of 2.3 was found from the total number of 20 million rice SNPs

(Alexandrov et al., 2015). Vignal et al. (2002) reported that the Ts/Tv ratio ranged

from 2.3 to 4.0 in the chicken genome. Transitions are more likely to maintain the structure of the

DNA double helix than transversions due to their conformational advantage in case of

mispairing and better tolerance during natural selection, which contributes to a higher

frequency of transitional mutations over transversions (Wakeley, 1996).

Copy number variations

CNVs can create new genes, alter gene dosage and reshape gene structures. They are

considered a likely major source of genetic variation, and may influence phenotypic variation,

gene expression and fitness (Yu et al., 2011). We detected a total of 2,727 and 2,010 CNVs in

RGD-7S and Taifeng B genomes, respectively. Although some overlapped

intergenic regions, over half of the CNVs were annotated within genes. Most of these

annotated genes were transposable elements, while the rest included genes with

alternative splicing sites and resistance genes with the NBS-LRR structure. Similar to

previous reports, we also found that the CNVs tended to be located near the ends of the

chromosomes. Surprisingly, we detected 581 CNVs in the first 40 kb (positions 0 to 40,000) of

Chr.9. Further analysis indicated that these CNVs overlapped the region

between two adjacent retrotransposons. Transposable elements are considered to play

an important role in the formation of CNVs (Hastings et al., 2009; Conrad et al., 2010). Yu et

al. (2011) also found that genes in many CNVs were involved in resistance in rice, and resistance

genes in plants tended to cluster at the same loci within genome (Huibert et al., 2001).

Glessne et al. (2013) observed significant enrichment of alternatively spliced genes

among the genes affected by CNVs in humans.

Although the phenotypic effects of CNVs in plants have not been confirmed directly, recent

studies in maize indicated their potential contribution to heterosis in this crop, as well as to

domestication and disease responses (Springer et al., 2009; Lai et al., 2010). Gene-containing

CNVs may influence the expression level of these genes. Hence, CNVs have the potential to

affect the downstream phenotype and, ultimately, reproductive fitness (Perry, 2008). We

previously fine mapped the hybrid weakness gene Hw3 and suggested the gene

LOC_Os11g44310 encoding a putative calmodulin binding protein as its candidate gene, and

BLAST analysis indicated that LOC_Os11g44310 existed only in the japonica genome (Fu et al.,

2013). In this study, read mapping confirmed its presence in RGD-7S and absence in Taifeng

B. In most cases of hybrid weakness or necrosis, the causative genes only existed in one of

the parents. The potential combinations of single-copy genes with CNVs (presence or absence)

in hybrids also provide the opportunity for novel gene complements in hybrids relative to the

parental lines (Springer et al., 2009). The type of hybrid weakness we reported only occurred

under the condition of low temperature. The same F1 progeny showed hybrid vigor when

they were planted under high-temperature conditions. We also analyzed the transcriptional

profiles of both parental varieties and their F1 progeny grown under high- and low-temperature

conditions (unpublished data), and found that certain pathways

were specifically activated at low temperatures compared with those at high temperatures.

Therefore, understanding gene expression regulated by the single-copy gene containing CNVs

in F1 progeny under different conditions might contribute to building a bridge between hybrid

weakness and vigor and accelerating the elucidation of the molecular mechanism of heterosis.
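The presence/absence call for LOC_Os11g44310 discussed above rests on how completely reads cover the gene interval. The following is a minimal sketch of such a check, assuming per-base read depths over the interval have already been extracted. The function names, the toy depth values, and the 0.9 and 0.2 thresholds are hypothetical illustrations and are not taken from this study.

```python
from typing import Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of positions in an interval covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def call_presence(depths: Sequence[int],
                  present_at: float = 0.9,
                  absent_at: float = 0.2) -> str:
    """Classify a gene interval as present, absent, or ambiguous.

    The two thresholds are illustrative assumptions, not values from the paper.
    """
    breadth = breadth_of_coverage(depths)
    if breadth >= present_at:
        return "present"
    if breadth <= absent_at:
        return "absent"
    return "ambiguous"


# Toy per-base depths (hypothetical): one interval fully covered,
# one covered only by a few scattered reads.
fully_covered = [25, 27, 30, 28, 26, 29, 31, 27, 24, 26]
sparse = [0, 0, 3, 0, 0, 0, 0, 0, 0, 0]

print(call_presence(fully_covered))
print(call_presence(sparse))
```

Using breadth of coverage rather than mean depth keeps a handful of discontinuously mapped reads from being mistaken for presence of the whole gene.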

InDels as molecular markers for fine mapping hybrid weakness genes

In contrast to SNPs, which have been studied extensively, InDels have received less attention.

Although small InDels up to a few base pairs in length may be called by sensitive alignment

tools in the routine re-sequencing process (Grimm et al., 2013), InDels were not evaluated in

most validation studies, and where they were, only a small subset was examined (Mullaney et

al., 2010). We detected a large number of InDels in RGD-7S and Taifeng B, which will lay a solid

foundation for map-based cloning of hybrid weakness genes. Based on the sequencing results, we

designed 20 and 36 InDel markers in the previous mapping intervals of Hw3 and Hw4,

respectively, to test the authenticity of these InDels and to develop polymorphic markers for

further map-based cloning of hybrid weakness genes. Excluding 4 markers that failed in PCR

amplification, 37 of the 52 successfully amplified InDel markers showed polymorphism

between RGD-7S and Taifeng B. Thus, approximately 71% of InDels could be validated, which was

similar to a previous study in which the false negative rate ranged from 10% to 35% (Mullaney et

al., 2010). Currently, paired-end reads are widely used to detect InDels. However, the output files

generated by the relevant software are mostly lists of potential variants with probability

scores and coverage values, for which the exact level of accuracy is unknown (Pelleymounter et

al., 2011). We also found that the authenticity of these InDels was not directly associated with

coverage values. In general, the sensitivity of detecting InDels of a certain size is not

overly dependent on the InDel frequency itself, and as InDel size increases, the sensitivities

of the different mapping tools diverge in datasets of a mean sequencing depth of 18 and 36 bp

short reads (Krawitz et al., 2010). Hence, the detection and validation of InDels remain a

challenge, and bioinformatic algorithms and analytical tools await improvement to provide

more satisfactory solutions.
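The validation rate quoted above follows directly from the marker counts given in the text. The arithmetic can be sketched as follows; the variable names are illustrative, and the counts are those reported in this section.

```python
# Marker counts as reported in the text.
designed = {"Hw3": 20, "Hw4": 36}   # InDel markers designed per mapping interval
failed_pcr = 4                      # markers that failed in PCR amplification
polymorphic = 37                    # amplified markers polymorphic between the two varieties

total_designed = sum(designed.values())      # 20 + 36 = 56
amplified = total_designed - failed_pcr      # 56 - 4 = 52
validation_rate = polymorphic / amplified    # 37 / 52

print(f"{amplified} amplified, validation rate = {validation_rate:.1%}")
# prints "52 amplified, validation rate = 71.2%"
```

Note that the denominator is the number of successfully amplified markers, not the number designed; markers that failed in PCR carry no information about whether the underlying InDel is real.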

Acknowledgements

This work was supported in part by grants from the Guangdong Provincial Natural Science

Fund (2014A030313573), the National ‘863’ project (2011AA10A101), the earmarked fund

for Modern Agro-industry Technology Research System (CARS-01-10), and the President

Fund of Guangdong Academy of Agricultural Sciences (201402).

References

Alexandrov, N., Tai, S., Wang, W., Mansueto, L., Palis, K., Fuentes, R.R., Ulat, V.J.,

Chebotarov, D., Zhang, G., Li, Z., Mauleon, R., Hamilton, R.S., and McNally, K.L. 2015.

SNP-Seek database of SNPs derived from 3000 rice genomes. Nucleic Acids Res.

43:1023-1027. doi: 10.1093/nar/gku1039.

Arai-Kichise, Y., Shiwa, Y., Nagasaki, H., Ebana, K., Yoshikawa, H., Yano, M., and Wakasa,

K. 2011. Discovery of genome-wide DNA polymorphisms in a landrace cultivar of Japonica

rice by whole-genome sequencing. Plant Cell Physiol. 52: 274–282. doi: 10.1093/pcp/pcr003.

Boeva, V., Popova, T., Bleakley, K., Chiche, P., Cappo, J., Schleiermacher, G.,

Janoueix-Lerosey, I., Delattre, O., Barillot, E. 2012. Control-FREEC: a tool for assessing

copy number and allelic content using next-generation sequencing data. Bioinformatics,

28:423–425. doi: 10.1093/bioinformatics/btr670.

Cao, J., Schneeberger, K., Ossowski, S., Gunther, T., Bender, S., Fitz, J., Koenig, D., Lanz, C.,

Stegle, O., and Lippert, C. 2011. Whole-genome sequencing of multiple Arabidopsis thaliana

populations. Nat. Genet. 43(10): 956–963. doi: 10.1038/ng.911.

Chen, C., Chen, H., Lin, Y.S., Shen, J.B., Shan, J.X., Qi, P., Shi, M., Zhu, M.Z., Huang, X.H.,

Feng, Q., Han, B., Jiang, L.W., Gao, J.P., and Lin, H.X. 2014. A two-locus interaction causes

interspecific hybrid weakness in rice. Nat. Commun. 5:3357. doi: 10.1038/ncomms4357.

Chen, C., Chen, H., Shan, J.X., Zhu, M.Z., Shi, M., Gao, J.P., and Lin, H.X. 2013. Genetic

and physiological analysis of a novel type of interspecific hybrid weakness in rice. Mol. Plant

6: 716–728. doi: 10.1093/mp/sss146.

Cheng, S.H., Cao, L.Y., Zhuang, J.Y., Chen, S.G., Zhan, X.D., Fan, Y.Y., Zhu, D.F., and Min,

S.K. 2007. Super hybrid rice breeding in China: achievements and prospects. J. Integr. Plant

Biol. 49 (6): 805−810. doi: 10.1111/j.1744-7909.2007.00514.x.

Cirulli, E.T., Heinzen, E.L., Dietrich, F.S., Shianna, K.V., Singh, A., Maia, J.M., Goedert, J.J.,

Goldstein, D.B. 2011. A whole-genome analysis of premature termination codons. Genomics.

98:337–342. doi:10.1016/j.ygeno.2011.07.001.

Conrad, D.F., Pinto, D., Redon, R., Feuk, L., Gokcumen, O., Zhang, Y., Aerts, J., Andrews,

T.D., Barnes, C., Campbell, P., and Fitzgerald, T. 2010. Origins and functional impact of

copy number variation in the human genome. Nature. 464:704-712. doi: 10.1038/nature08516.

Feltus, A., Wan, J., Schulze, S.R., Estill, J.C., Jiang, N., and Paterson, A.H. 2004. An SNP

resource for rice genetics and breeding based on subspecies indica and japonica genome

alignments. Genome Res. 14: 1812–1819. doi:10.1007/s00122-011-1633-5.

Fu, C.Y., Wang, F., Sun, B.R., Liu, W.G., Li, J.H., Deng, R.F., Liu, D.L., and Liu, Z.R.

2013. Genetic and cytological analysis of a novel type of low temperature-dependent

intrasubspecific hybrid weakness in rice. PLoS ONE. 8(8): e73886.

doi:10.1371/journal.pone.0073886.

Glessner, J.T., Smith, A.V., Panossian, S., Kim, C.E., Takahashi, N., Thomas, K.A., Wang, F.,

Seidler, K., Harris, T.B., and Launer, L.J. 2013. Copy number variations in alternative

splicing gene networks impact lifespan. PLoS ONE. 8(1): e53846. doi:

10.1371/journal.pone.0053846.

Grimm, D., Hagmann, J., Koenig, D., Weigel, D., and Borgwardt, K. 2013. Accurate indel

prediction using paired-end short reads. BMC Genomics, 14: 132. doi:

10.1186/1471-2164-14-132.

Han, B. and Huang, X.H. 2013. Sequencing-based genome-wide association study in rice.

Curr. Opin. Plant Biol. 16: 133–138. doi: 10.1016/j.pbi.2013.03.006.

Hastings, P.J., Lupski, J.R., Rosenberg, S.M., and Ira, G. 2009. Mechanisms of change in

gene copy number. Nature Rev. Genet. 10: 551-564. doi: 10.1038/nrg2593.

Huang, X., Lu, T., and Han, B. 2013. Resequencing rice genomes: an emerging new era of

rice genomics. Trends Genet. 29: 225–232. doi: 10.1016/j.tig.2012.12.001.

Hulbert, S.H., Webb, C.A., Smith, S.M., and Sun, Q. 2001. Resistance gene complexes:

evolution and utilization. Annu. Rev. Phytopathol. 39: 285–312. doi:

10.1146/annurev.phyto.39.1.285.

Ichitani, K., Namigoshi, K., Sato, M., Taura, S., Aoki, M., Matsumoto, Y., Saitou, T.,

Marubashi, W., and Kuboyama, T. 2007. Fine mapping and allelic dosage effect of Hwc1, a

complementary hybrid weakness gene in rice. Theor. Appl. Genet. 114: 1407–1415. doi:

10.1007/s00122-007-0526-0.

Ichitani, K., Taura, S., Tezuka, T., Okiyama, Y., and Kuboyama, T. 2011. Chromosomal

location of HWA1 and HWA2, complementary hybrid weakness genes in rice. Rice. 4(2):

29–38. doi: 10.1007/s00122-007-0553-x.

Jain, M., Moharana, K.C., Shankar, R., Kumari, R., and Garg, R. 2014. Genome wide

discovery of DNA polymorphisms in rice cultivars with contrasting drought and salinity stress

response and their functional relevance. Plant Biotechnol. J. 12: 253–264. doi:

10.1093/pcp/pcr003.

Krawitz, P., Rodelsperger, C., Jager, M., Jostins, L., Bauer, S., and Robinson, P.N. 2010.

Microindel detection in short-read sequence data. Bioinformatics. 26(6):722–729. doi:

10.1093/bioinformatics/btq027.

Kuboyama, T., Matsumoto, T., Wu, J.Z., Kanamori, H., Taura, S., Sato, M., Marubashi, W.,

and Ichitani, K. 2009. Fine mapping of HWC2, a complementary hybrid weakness gene, and

haplotype analysis around the locus in rice. Rice. 2: 93–103. doi:10.1007/s00122-007-0526-0.

Lai, J., Li, R., Xu, X., Jin, W., Xu, M., Zhao, H., Xiang, Z., Song, W., Ying, K., and Zhang,

M. 2010. Genome-wide patterns of genetic variation among elite maize inbred lines. Nat.

Genet. 42: 1027-1030. doi: 10.1038/ng.684.

McCouch, S.R., Zhao, K., Wright, M., Tung, C.W., Ebana, K., Thomson, M., Reynolds, A.,

Wang, D., DeClerck, G., and Ali, M.L. 2010. Development of genome-wide SNP assays for

rice. Breed. Sci. 60: 524–535. doi: 10.1270/jsbbs.60.524.

McNally, K.L., Childs, K.L., Bohnert, R., Davidson, R.M., Zhao, K., Ulat, V.J., Zeller, G.,

Clark, R.M., Hoen, D.R., and Bureau, T.E. 2009. Genomewide SNP variation reveals

relationships among landraces and modern varieties of rice. Proc. Natl. Acad. Sci. 106:

12273-12278. doi: 10.1073/pnas.0900992106.

Minoche, A.E., Dohm, J.C., and Himmelbauer, H. 2012. Evaluation of genomic

high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems.

Genome Biol. 12: 112. doi: 10.1186/gb-2011-12-11-r112.

Mullaney, J.M., Mills, R.E., Stephen, P.W., and Devine, S.E. 2010. Small insertions and

deletions (INDELs) in human genomes. Hum. Mol. Genet. 19(2): 131–136. doi:

10.1093/hmg/ddq400.

Muñoz-Amatriaín, M., Eichten, S.R., Wicker, T., Richmond, T.A., Mascher, M., Steuernagel,

B., Scholz, U., Ariyadasa, R., Spannagl, M., and Nussbaumer, T. 2013. Distribution,

functional impact, and origin mechanisms of copy number variation in the barley genome.

Genome Biol. 14: R58. doi: 10.1186/gb-2013-14-6-r58.

Pelleymounter, L.L., Moon, I., Johnson, J.A., Laederach, A., Halvorsen, M., Eckloff, B., Abo,

R., and Rossetti, S. 2011. A novel application of pattern recognition for accurate SNP and

indel discovery from high-throughput data: Targeted resequencing of the glucocorticoid

receptor co-chaperone FKBP5 in a Caucasian population. Mol. Genet. Metab. 104: 457–469.

doi: 10.1016/j.ymgme.2011.08.019.

Perry, G.H. 2008. The evolutionary significance of copy number variation in the human

genome. Cytogenet. Genome Res. 123: 283–287. doi: 10.1159/000184719.

Quail, M.A., Smith, M., Coupland, P., Otto, T.D., Harris, S.R., Connor, T.R., Bertoni, A.,

Swerdlow, H.P., and Gu, Y. 2012. A tale of three next generation sequencing platforms:

comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC

Genomics. 13: 341. doi: 10.1186/1471-2164-13-341.

Schatz, M.C., Maron, L.G., Stein, J.C., Hernandez, W.A., Gurtowski, J., Biggers, E., Lee, H.,

Kramer, M., Antoniou, E., and Ghiban, E. 2014. New whole genome de novo assemblies of

three divergent strains of rice (O. sativa) documents novel gene space of aus and indica.

Genome Biol. 15: 506. doi:10.1186/s13059-014-0506-z.

Springer, N.M., Ying, K., Fu, Y., Ji, T., Yeh, C.T., Jia, Y., Wu, W., Richmond, T., Kitzman,

J., and Rosenbaum, H. 2009. Maize inbreds exhibit high levels of copy number variation and

presence/absence variation (PAV) in genome content. PLoS Genet. 5(11): e1000734. doi:

10.1371/journal.pgen.1000734.

Subbaiyan, G.K., Waters, D.L., Katiyar, S.K., Sadananda, A.R., Vaddadi, S., Jain, M.,

Moharana, K.C., Shankar, R., Kumari, R., and Garg, R. 2012. Genome-wide DNA

polymorphisms in elite indica rice inbreds discovered by whole-genome sequencing. Plant

Biotechnol. J. 10: 623-634. doi: 10.1111/j.1467-7652.2011.00676.x.

Tan, S.J., Zhong, Y., Hou, H., Yang, S.H., and Tian, D.C. 2012. Variation of

presence/absence genes among Arabidopsis populations. BMC Evol. Biol. 12:86. doi:

10.1186/1471-2148-12-86.

Vignal, A., Milan, D., Cristobal, M.S., and Eggen, A. 2002. A review on SNP and other types

of molecular markers and their use in animal genetics. Genet. Sel. Evol. 34: 275–305.

doi:10.1186/1297-9686-34-3-275.

Wakeley, J. 1996. The excess of transitions among nucleotide substitutions: new methods of

estimating transition bias underscore its significance. Trends Ecol. Evol. 11: 158–162.

doi:10.1016/0169-5347(96)10009-4.

Yamamoto, T., Nagasaki, H., Yonemaru, J.I., Ebana, K., Nakajima, M., Shibaya, T., and

Yano, M. 2010. Fine definition of the pedigree haplotypes of closely related rice cultivars by

means of genome-wide discovery of single-nucleotide polymorphisms. BMC Genomics. 11:

267. doi: 10.1186/1471-2164-11-267.

Yu, P., Wang, C.H., Xu, Q., Feng, Y., Yuan, X.P., Yu, H.Y., Wang, Y.P., Tang, S.X., and

Wei, X.H. 2011. Detection of copy number variations in rice using array-based comparative

genomic hybridization. BMC Genomics. 12: 372. doi: 10.1186/1471-2164-12-372.

Zheng, L.Y., Guo, X.S., He, B., Sun, L.J., Peng, Y., Dong, S.S., Liu, T.F. 2011.

Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor).

Genome Biology. 12: R114. doi:10.1186/gb-2011-12-11-r114.

Table 1 Coverage of the Nipponbare genome by reads from resequencing of two indica rice varieties

using HiSeq 2000

Item RGD-7S Taifeng B

Sites (reference length) 374,305,986 374,305,986

Total bases 14,973,227,780 15,457,502,782

Read count 148,249,780 153,044,582

Mapped sites (≥1×) 320,035,665 324,501,839

Coverage (≥1×) 85.50% 86.69%

Q30 (%) 91.61 89.92

Mapped bases rate 73.34% 73.04%

GC (%) 41.51 41.48

Mean depth 27.83× 28.72×
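Two quantities in Table 1 can be re-derived from the other rows as a consistency check. The sketch below, offered as a reader's aid and not as the authors' pipeline, computes the coverage at ≥1× as mapped sites divided by reference length, and the implied read length as total bases divided by read count. The dictionary layout and function name are illustrative.

```python
REF_LENGTH = 374_305_986  # reference sites, from Table 1

samples = {
    "RGD-7S": {
        "total_bases": 14_973_227_780,
        "read_count": 148_249_780,
        "mapped_sites": 320_035_665,
    },
    "Taifeng B": {
        "total_bases": 15_457_502_782,
        "read_count": 153_044_582,
        "mapped_sites": 324_501_839,
    },
}


def summarize(stats: dict) -> tuple:
    """Return (implied read length in bases, leftover bases, coverage in percent)."""
    read_length, leftover = divmod(stats["total_bases"], stats["read_count"])
    coverage_pct = 100 * stats["mapped_sites"] / REF_LENGTH
    return read_length, leftover, coverage_pct


for name, stats in samples.items():
    read_length, leftover, coverage_pct = summarize(stats)
    print(f"{name}: read length {read_length} bases, coverage {coverage_pct:.2f}%")
```

For both varieties the total bases divide evenly by the read count, giving an implied read length of 101 bases, and the computed coverage values agree with the 85.50% and 86.69% listed in the table.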

Table 2 The distribution of SNPs and InDels in RGD-7S and Taifeng B relative to

Nipponbare

Chr. RGD-7S (No. of SNPs, No. of insertions, No. of deletions, variant frequency) Taifeng B (No. of SNPs, No. of insertions, No. of deletions, variant frequency)

1 153152 24829 20733 459.8 227939 25220 21251 634.2

2 173186 20163 17455 586.6 195996 20134 17437 649.9

3 190535 20942 17467 628.7 194974 21103 17530 641.5

4 143323 13741 12053 476.3 147269 14052 12186 488.7

5 118289 12783 10719 473.3 126019 13348 11578 503.9

6 152214 14149 12369 572 160527 15174 12812 603.3

7 141106 13016 11733 558.5 138495 12655 11391 547.3

8 138697 13580 11469 575.7 135126 13194 11278 561.1

9 116863 10980 9674 597.6 83191 8635 7468 431.5

10 133108 11830 10536 675.2 133316 12006 10741 677.7

11 158935 13733 12278 637.3 154184 13348 11778 617.9

12 125148 11429 10002 532.4 124608 11364 9670 529

Total 1744556 181175 156488 564.45 1821644 180233 155120 573.83

Note: The variant frequency is the number of DNA polymorphisms per 100 kb.
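As a reader's check on Table 2, and not a statement of the authors' method, the variant frequency in the last row appears to equal the unweighted mean of the 12 per-chromosome frequencies. The sketch below reproduces that arithmetic from the values in the table; the variable names are illustrative.

```python
from statistics import mean

# Per-chromosome variant frequencies (per 100 kb), chromosomes 1 to 12, from Table 2.
per_chromosome = {
    "RGD-7S": [459.8, 586.6, 628.7, 476.3, 473.3, 572, 558.5, 575.7,
               597.6, 675.2, 637.3, 532.4],
    "Taifeng B": [634.2, 649.9, 641.5, 488.7, 503.9, 603.3, 547.3, 561.1,
                  431.5, 677.7, 617.9, 529],
}

# Values given in the last row of Table 2.
reported_total = {"RGD-7S": 564.45, "Taifeng B": 573.83}

for variety, values in per_chromosome.items():
    unweighted = mean(values)
    agrees = abs(unweighted - reported_total[variety]) < 0.005
    print(f"{variety}: mean of chromosomes = {unweighted:.2f}, matches table: {agrees}")
```

An unweighted mean treats every chromosome equally regardless of its length, so it can differ from a genome-wide figure computed as total variants divided by total reference length.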

Table 3 The distribution and frequency of DNA polymorphism between RGD-7S and Taifeng B

Chr. SNP count SNP variant frequency InDel count InDel variant frequency

Chr.1 103424 239 5922 13.7

Chr.2 94899 264.1 5068 14.1

Chr.3 73419 201.6 4498 12.4

Chr.4 69714 196.4 3189 9

Chr.5 74272 248 3577 11.3

Chr.6 84362 270 3963 12.7

Chr.7 75459 254.1 3523 11.9

Chr.8 68249 240 3313 11.6

Chr.9 62397 271.1 3178 13.8

Chr.10 59561 256.6 2823 12.2

Chr.11 84408 290.9 3751 12.9

Chr.12 111627 405.4 3835 13.9

Total 961791 257.7 46640 12.5

Table 4 The number and ratio of variants in different regions

Region RGD-7S (count, percent %) Taifeng B (count, percent %)

DOWNSTREAM 2,466,715 30.44 2,492,284 30.43

EXON 408,077 5.04 412,706 5.04

INTERGENIC 1,846,976 22.79 1,868,258 22.81

INTRON 573,136 7.07 577,108 7.05

SPLICE SITE 19,295 0.24 19,455 0.24

TRANSCRIPT 851 0.01 832 0.01

START CODON 2,658,060 32.80 2,689,545 32.84

UTR_3_PRIME 78,674 0.97 78,496 0.96

UTR_5_PRIME 51,708 0.64 51,740 0.63

Table 5 The distribution of large-effect SNPs in genic regions

Type RGD-7S (No. of SNPs, No. of annotated genes) Taifeng B (No. of SNPs, No. of annotated genes)

Splicing site* 909 863 847 804

Stop gain 6141 4496 5987 4358

Stop loss 1135 1109 1132 1118

Start gain 1430 1156 1383 1125

Start loss 598 581 558 545

Total 10213 8205 9907 7950

*Including splice-site acceptors and donors.

Table 6 The distribution of CNVs in RGD-7S and Taifeng B

Chr. 1 2 3 4 5 6 7 8 9 10 11 12 Total

RGD-7S 217 153 129 243 159 167 153 352 819 228 49 58 2727

Taifeng B 130 116 126 238 214 149 230 133 255 236 29 154 2010

Table 7 The number and ratio of base substitutions

Class Substitution RGD-7S Taifeng B

Transitions (Ts) A/G 862380 870351

Transitions (Ts) C/T 865241 874671

Transversions (Tv) C/G 129727 132246

Transversions (Tv) T/A 188201 191670

Transversions (Tv) A/C 182417 184581

Transversions (Tv) G/T 180721 183445

Ts/Tv ratio 2.5366 2.5219
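The Ts/Tv ratios in Table 7 follow from the substitution counts: transitions (A/G and C/T) are summed and divided by the sum of the four transversion classes. A minimal sketch of that calculation, using the counts in the table and illustrative names:

```python
# Substitution counts from Table 7.
counts = {
    "RGD-7S": {"A/G": 862380, "C/T": 865241,
               "C/G": 129727, "T/A": 188201, "A/C": 182417, "G/T": 180721},
    "Taifeng B": {"A/G": 870351, "C/T": 874671,
                  "C/G": 132246, "T/A": 191670, "A/C": 184581, "G/T": 183445},
}

TRANSITIONS = ("A/G", "C/T")  # purine<->purine and pyrimidine<->pyrimidine changes


def ts_tv_ratio(substitutions: dict) -> float:
    """Ratio of transition count to transversion count."""
    ts = sum(n for kind, n in substitutions.items() if kind in TRANSITIONS)
    tv = sum(n for kind, n in substitutions.items() if kind not in TRANSITIONS)
    return ts / tv


for variety, substitutions in counts.items():
    print(f"{variety}: Ts/Tv = {ts_tv_ratio(substitutions):.4f}")
# prints 2.5366 for RGD-7S and 2.5219 for Taifeng B
```

Both computed values agree with the ratios listed in the table.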

Figure captions:

Figure 1 The distribution of different InDel sizes in RGD-7S and Taifeng B

Figure 2 The ratio of DNA variants in different regions of genes in RGD-7S and Taifeng B

Figure 3 The distribution of enriched GO terms for genes containing DNA variations in

RGD-7S and Taifeng B

Figure 4 The distribution of enrichment of variant genes in RGD-7S and Taifeng B

Blue: variant genes in a functional term; Red: the total of background genes in a functional

term. The x-axis represents the percentage of variant genes among the total genes enriched in a

given term. A blue band longer than the corresponding red band indicates significant

enrichment of variant genes in that term.

Figure 5 Read mapping from RGD-7S and Taifeng B in the interval of the Hw3 candidate gene

(LOC_Os11g44310)

Approximately 20 reads from Taifeng B were mapped discontinuously in the interval of

LOC_Os11g44310 based on the reference genome, whereas the reads from RGD-7S completely

covered the candidate region.

Supplement table 1 The specific variant genes in RGD-7S and Taifeng B

RGD-7S Taifeng B

LOC_Os01g25130 LOC_Os07g03319 LOC_Os01g01019 LOC_Os04g16794 LOC_Os07g02820 LOC_Os11g10610

LOC_Os05g18860 LOC_Os12g11250 LOC_Os03g17330 LOC_Os06g45710 LOC_Os08g40140

Supplement table 2 The InDel primers designed in the region surrounding Hw3

Marker Forward primer Reverse primer Size (bp) Coverage Result

InDel1101 5' TAGGGGCGACTAAATGAAGG 3' 5' TTTTGTTGCTCGTTCACCTG 3' 354 17 negative

InDel1102 5' CATTGTTCCACCATCGTCCT 3' 5' ATCAATGGGACAACACTCGC 3' 108 8 positive

InDel1103 5' ATCGCCACCTTGCCGAGTTC 3' 5' AGGGAGGATGTTAGGACCAGC 3' 327 2 positive

InDel1104 5' CCATAATCTCCACAATCGGC 3' 5' AGAGGACGAAGAAGAACGCC 3' 161 5 negative

InDel1105 5' TCTTCGCCACCTTCTCCCAA 3' 5' TACAATGTGTCGTTGCCTGC 3' 148 9 positive

InDel1106 5' ATCTTCGTCTTCCACCTCGC 3' 5' GCCAACTGGGAAACTGTGTC 3' 147 6 positive

InDel1107 5' CAAGAAAGCCACTCCCTGC 3' 5' CGGCTCTTGACCTTGCTGA 3' 272 10 positive

InDel1108 5' TTCGTGACGCTGGAGGTT 3' 5' ACTTGAGGAAATCGTGTGTT 3' 251 7 failure

InDel1109 5' GAACTTATGCGGAGGAGACG 3' 5' CGAGCACATCTCTCTCACATA 3' 162 34 negative

InDel1110 5' GCGATGTCGGTATCATTGCT 3' 5' GGGTGTTGGTCCTCATTGTA 3' 117 8 positive

InDel1111 5' TCACATCACGGAGCAGGAG 3' 5' AAACGGAGGAGAGGCAAGA 3' 141 3 positive

InDel1112 5' AGAGAGAGGGGGAAGAGAGA 3' 5' GGGAATGTTTCAGGTGCGAC 3' 202 14 positive

InDel1113 5' CCTGAAGTTCCCGAAGATTT 3' 5' ACGAGCATTGGAACTCAGAT 3' 138 21 positive

InDel1114 5' CGTGCGAAGTCAGAGGAGTC 3' 5' ACACACCCTAAGCCATCCAA 3' 224 3 positive

InDel1115 5' CTCAGCCAATAGCATCTCCG 3' 5' CAGGAGCAGCAAGAGAAACG 3' 322 10 negative

InDel1116 5' TAATAGTCGTGTCCGAGATGG 3' 5' CAACCCGAAAGCAAGAACT 3' 156 21 negative

InDel1117 5' TCCATTGAAGCCAACATCG 3' 5' TGGGCGAAGTCGTAAGAACAT 3' 313 6 positive

InDel1118 5' CTCCATTGAAGCCAACATCG 3' 5' TGAAGGGAGTTCCTATTGACC 3' 202 18 negative

InDel1119 5' TGATTTGACAATGTGGTGCTAC 3' 5' TTTGGGACGGATGGAGTAAG 3' 328 11 positive

InDel1120 5' TCCTTGTGGTGGTGCCTCA 3' 5' GGTCTCGTCATCTGCTTCAA 3' 429 31 negative

Supplement table 3 The InDel primers designed in the interval of Hw4

Marker Forward primer Reverse primer Size (bp) Coverage Result

InDel701 5' ATCAGTTTCGCCTATGGACG 3' 5' TCACCGTTCTAACTCCCAGC 3' 113 16 positive

InDel702 5' ATGGCAGATGTAAGCAAGCG 3' 5' CCTTTTTGCTTATGTGGAAC 3' 195 9 positive

InDel703 5' GACGGTAACAAAGTCGTTCAGA 3' 5' ATCAAACAGCGTGTCAAACT 3' 254 23 negative

InDel704 5' CCTCCGTTTTTTCATTCCTG 3' 5' TGAAACGGAGGAGAGTAACAAA 3' 223 14 positive

InDel705 5' GAGGAGGAGGATGAGGGCTA 3' 5' TTCGCTTCTGCCTTTATTCG 3' 370 11 positive

InDel706 5' TTTATGATGGAGGGAGTGTCTG 3' 5' GCCTTCAATCCTCACCAATC 3' 356 19 positive

InDel707 5' GGGAACAATGGTGTGTGCTTT 3' 5' CACAAGATGCGGCGAAGTTT 3' 464 9 positive

InDel708 5' AAATGCTCTGACTTGGGGAC 3' 5' CACGAATGTTTGGTTTACGC 3' 152 6 positive

InDel709 5' TTCTCCGCTCCTTCCGTT 3' 5' TAGGGTTTTAGTGAGGTGGG 3' 109 7 positive

InDel710 5' TGGAACAATGCCACTGCC 3' 5' TGATGACCGCCAGCAAGT 3' 248 13 failure

InDel711 5' TTTCTTCTCCGACCCACCAC 3' 5' GCCACCCCTTTTAGTATTGCT 3' 141 11 positive

InDel712 5' TAAACATAACCCAACAGCCG 3' 5' TTTCACCAGTAATCAGCCGT 3' 245 10 positive

InDel713 5' GGGGGAGAGAGAGAACGAAT 3' 5' CACAGGGGGATGAGGATGAC 3' 249 8 positive

InDel714 5' AGCCCTCCCCCACAAATAAT 3' 5' AAATAGGACAGGCAGCAGCG 3' 309 4 positive

InDel715 5' AGTAACCCGAACCCTTAGCAT 3' 5' AAAAAAAGGTCCTCCGTCCC 3' 256 5 positive

InDel716 5' TGAGTTCTGACGCAATGGGC 3' 5' GCTTGCTCTGACCCTTCCAT 3' 172 32 positive

InDel717 5' GTGCGAAAAACCACGGCTG 3' 5' CTGCGTGTCAAAAGCGGAAT 3' 273 9 positive

InDel718 5' GATGAGTTGTTTGGTTTGTTGC 3' 5' GATTGAGACCGTCGTAAAGATG 3' 339 11 positive

InDel719 5' CGTGAACGAACGAAGAATGAT 3' 5' TGAGGGACGACGACGACAA 3' 425 8 failure

InDel720 5' TCTGTTTTCTTGGCGGTGGT 3' 5' GTTCTGGATTGGGGTTTTCA 3' 262 8 failure

InDel721 5' CCATCGCAGTAGGGTCAGTG 3' 5' GGCGTGACGATAGGTGTGAC 3' 218 8 positive

InDel722 5' AAACGGACTGTAAACTTCGGT 3' 5' AGGTTCTTTGTAGCCTTCTCA 3' 184 5 positive

InDel723 5' CGTTCATCCATCCGTCTTATTT 3' 5' ATGCTATTTGTTGGTGACTCGT 3' 238 6 negative

InDel724 5' TGTCAGTGGGCGAGTGTGTT 3' 5' TGTTTATGGTCGTTGATTGCG 3' 244 5 negative

InDel725 5' GCACAGATTACTCCTCCCGA 3' 5' AATCATCTTGCGGTTGTTGT 3' 435 10 negative

InDel726 5' GCCAACCAAAGCAAATAGAGG 3' 5' CCCCTGATGGATACGAAACTC 3' 395 5 positive

InDel727 5' GGCGACGCTTTTTGAACTAC 3' 5' TGAGATTAGCAAGTTTTTCCCC 3' 411 19 positive

InDel728 5' CTGGCACAATGGTGGCAACT 3' 5' CTGGGACTGTGAACACTGGG 3' 198 29 negative

InDel729 5' TTGGATGGTGGTCAATGGT 3' 5' TCCGAAAGGTGATGATAACA 3' 188 20 positive

InDel730 5' CCTGCTTGGTGTGTGCTCTT 3' 5' CTGCTGGTGTCTGAAACTCTG 3' 389 16 positive

InDel731 5' GCGAACTCACCGATGACCAA 3' 5' ATCTTTCCTTCTTGTCGCCG 3' 385 14 negative

InDel732 5' CGTGTCCTTCCTCATCTTGC 3' 5' ACGCTCGTAATGGCTTTGTT 3' 226 3 negative

InDel733 5' TGGCACGACAGAGTAGAGAATG 3' 5' GCTTATGATGCGTTATGCCTG 3' 174 9 positive

InDel734 5' GCTCGGAGAAACGCTCAGAT 3' 5' ACCACGGCACCATTTACCCT 3' 205 8 negative

InDel735 5' AGAAAAGTAATGCGGTGCCA 3' 5' GCAATAGCAATGCGTGTGGT 3' 107 9 positive

InDel736 5' TCTCCTGTCCTCATCTCCATT 3' 5' ACAGGGTTCATACCAGCAGG 3' 147 14 positive
