www.sciencemag.org/content/355/6323/391/suppl/DC1

Supplementary Materials for

A chemical genetic roadmap to improved tomato flavor

Denise Tieman, Guangtao Zhu, Marcio F. R. Resende Jr., Tao Lin, Cuong Nguyen, Dawn Bies, Jose Luis Rambla, Kristty Stephanie Ortiz Beltran, Mark Taylor, Bo Zhang, Hiroki Ikeda, Zhongyuan Liu, Josef Fisher, Itay Zemach, Antonio Monforte, Dani Zamir, Antonio Granell, Matias Kirst, Sanwen Huang,* Harry Klee*

*Corresponding author. Email: [email protected] (S.H.); [email protected] (H.K.)

Published 27 January 2017, Science 355, 391 (2017)

DOI: 10.1126/science.aal1556

This PDF file includes:

Materials and Methods Figs. S1 to S25 References

Other Supplementary Material for this manuscript includes the following: (available at www.sciencemag.org/content/355/6323/391/suppl/DC1)

Tables S1 to S8 (Excel)

Supplementary Materials

Plant Material. The 398 tomato accessions used in the Florida study were collected from TGRC (Tomato Genetics Resource Center), EU-SOL (European Union Solanaceae Project), AGIS-CAAS (Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences), the U.S. National Plant Germplasm System and the University of Florida. These accessions include 15 S. pimpinellifolium, 83 S. lycopersicum var. cerasiforme and 300 S. lycopersicum tomato varieties (Table S1). Plants were grown in heated greenhouses on the University of Florida campus or in a field in Live Oak, FL, using recommended commercial practices. All fruits were harvested at the full-red ripe stage.

The Israeli set consisted of 352 varieties (Table S6), 261 of which overlapped with the Florida set. In Israel, one-month-old seedlings of each genotype were transplanted to the greenhouse in Hatzav (Israel) on October 10th 2014. The soil in the greenhouse was sandy and was irrigated daily using a drip system according to recommendations for tomato growers in the area. For each variety, two to six plants were grown with a distance of 40 cm between plants. Red ripe fruits were harvested in mid-January 2015. A section of the fruit was excised, flash frozen in liquid nitrogen, ground in a cryogenic mill and stored at -80ºC until analysis. Each sample consisted of a mixture of at least five fruits.

Transgenes. Artificial genes encoding both the reference and alternate versions of the Lin5 invertase gene were designed for overexpression in tomato (Figure S24). Both versions of the gene were cloned into a plant transformation vector under control of the figwort mosaic virus promoter and transformed into tomato plants by Agrobacterium-mediated transformation using kanamycin as a selectable marker. Expression of the transgene was confirmed by quantitative real-time RT-PCR. The homozygous E8 transgenic line in an Ailsa Craig background was previously described (14) and provided to us by Jim Giovannoni. The presence of the transgene was validated by PCR assay for the NPTII gene.

Analysis of volatiles, sugars and acids: Florida population. Plants from each variety were grown in the field or greenhouse in three randomized replicates. Fruit were obtained from three weekly harvests at the red ripe stage. At least six fruit (two fruit from each replicate) from each variety were used for biochemical analysis. E8 antisense (14), Ailsa Craig, invertase Lin5 overexpressing plants and control FLA 8059 plants were grown in a greenhouse in a randomized plot design. Fruit from three plants was combined for each sample collection. Volatile collection was performed as described previously (16). Volatile compound identification was determined by gas chromatography-mass spectrometry and co-elution with known standards (Sigma-Aldrich, St. Louis MO). Sugars, acids, and soluble solids were determined as described in (17).

Analysis of volatiles: Israel population. Volatile compounds were captured by headspace solid phase microextraction (HS-SPME) and separated and detected by gas chromatography coupled to mass spectrometry (GC/MS). Samples were processed essentially as described by Rambla et al. (18). Compounds were identified by comparing both retention time and mass spectrum with those of pure standards.

Consumer panels. Consumer panels were performed essentially as previously described (2) between 2010 and 2016. A subset of these varieties was previously analyzed using multivariate analysis (2). All consumer panels were approved by the University of Florida Institutional Review Board. Fully ripe fruit were harvested and used for taste panels, and a random subset of the fruits was used for biochemical analysis as described above. A total of 160 samples representing 96 different tomato varieties were used in the analysis. Hedonic ratings used a hedonic general labeled magnitude scale (gLMS) (19). Statistical analysis was performed using JMP Pro 12 (SAS Institute, Cary NC).

Linear least squares regression analyses were performed for each individual chemical to model the relationship between the dependent variable (overall liking or overall flavor intensity) and the explanatory variable representing the metabolite level of that chemical. In addition, a two-tailed p-value was determined. If the p-value was lower than 0.05, the relationship between the chemical and overall liking or overall flavor intensity was considered significant.
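The per-chemical regression described above can be illustrated with a minimal sketch. It is not the JMP procedure used in the study: the function names, the Simpson-rule integration used to obtain the p-value, and the example numbers are all assumptions chosen so that the example runs with the Python standard library only.

```python
import math

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value for Student's t with df degrees of freedom.

    The density is integrated from 0 to |t| with Simpson's rule
    (an illustrative numerical shortcut, not a library routine).
    """
    if math.isinf(t):
        return 0.0
    t = abs(t)
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)

def simple_regression(x, y):
    """Least squares fit of y on x; returns slope, intercept, r and two-tailed p."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    t = math.inf if abs(r) >= 1 else r * math.sqrt(df / (1 - r * r))
    return slope, intercept, r, t_two_tailed_p(t, df)

# Hypothetical metabolite levels (x) and liking scores (y), for illustration only.
levels = [1, 2, 3, 4, 5, 6]
liking = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
slope, intercept, r, p = simple_regression(levels, liking)
print(f"slope={slope:.3f} r={r:.3f} significant={p < 0.05}")
```

In this sketch a chemical would be flagged as significantly related to liking when the returned p-value is below 0.05, matching the criterion stated in the text.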

Heritability. For each metabolite in the Florida population, a statistical analysis was performed to partition the phenotypic variance into a heritable genetic component, design terms and a residual term. Univariate mixed models were fitted using the software ASReml (20).

The following model was fitted:

$$\log y_i = \mu + X_s s + X_d d + X_g g + \epsilon$$

where $y_i$ corresponds to the level of the i-th metabolite, and $X_s$, $X_d$ and $X_g$ are the incidence matrices relating the metabolite observations to the random effects of site ($s$), date ($d$) and variety ($g$). Each random effect followed the same form of assumption: $s \sim N(0, I\sigma_s^2)$; $d \sim N(0, I\sigma_d^2)$; $g \sim N(0, I\sigma_g^2)$. Heritabilities for each of the metabolic compounds were estimated (Table S8) as follows:

$$h^2 = \frac{\sigma_g^2}{\sigma_s^2 + \sigma_d^2 + \sigma_g^2 + \sigma_\epsilon^2}$$

where $\sigma_\epsilon^2$ is the residual variance.
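As a small worked illustration of the heritability ratio described above, the sketch below divides the variety (genetic) variance by the sum of the site, date, variety and residual variances. The function name and the variance values are hypothetical; in the study the variance components came from the fitted mixed models.

```python
def heritability(var_site, var_date, var_variety, var_residual):
    """Share of total variance attributable to variety (genotype)."""
    total = var_site + var_date + var_variety + var_residual
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return var_variety / total

# Hypothetical variance components, for illustration only.
h2 = heritability(var_site=1.0, var_date=1.0, var_variety=6.0, var_residual=2.0)
print(round(h2, 3))
```

A value near 1 would indicate that most of the variation in a metabolite is associated with variety rather than with site, harvest date or residual noise.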

Whole genome re-sequencing, sequence alignment and SNP identification. The 476 accessions used in this study (398 + 78) were characterized by whole genome re-sequencing. Among them, 245 had been previously genotyped and deposited in the NCBI Sequence Read Archive (SRA) under accession SRP045767 and the European Nucleotide Archive under accession PRJEB5235. The other 231 accessions were newly genotyped in this study. All data have been placed in the National Center for Biotechnology Information BioProject site under the accession PRJNA353161. DNA was isolated from young leaves using a CTAB method, and sequencing libraries with insert sizes of approximately 500 bp were constructed following Illumina recommendations. The samples were sequenced on an Illumina HiSeq 2000 platform with paired-end 100 bp and 125 bp reads. For each sample, an average of 6.5 Gb of data was generated after removal of adapter sequences and low-quality reads.

We used SOAP2 to map all the sequencing reads from each accession to the tomato reference genome (version SL2.50) with the following parameters: -m 100, -x 888, -s 35, -l 32, -v 3 (21, 22). Mapped reads were filtered to remove PCR duplicates. Both paired-end and single-end mapped reads were then used for SNP calling across the entire collection of tomato accessions using SOAPsnp with the following parameters: -L 100 -u -F 1 (23). We generated the genotype likelihood across the population for each SNP with quality >= 40 and base quality >= 40. False-positive SNPs were filtered in the population following the method previously described by Lin et al. (7). The identified SNPs were further categorized as variations in intergenic regions, UTRs, coding sequences and introns according to the tomato genome annotation (release ITAG2.4). SNPs in coding sequences were further classified into synonymous SNPs (not causing amino acid changes) and nonsynonymous SNPs (causing amino acid changes) using Python scripts.
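The authors' Python scripts are not part of this supplement. The following is only a minimal sketch of how a coding SNP could be classified as synonymous or nonsynonymous, assuming the standard genetic code, a plus-strand coding sequence that starts in frame, and a 0-based SNP position within that sequence. The function name and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_coding_snp(cds, pos, alt):
    """Classify a SNP at 0-based position `pos` of an in-frame coding sequence.

    Returns (label, reference amino acid, alternate amino acid).
    """
    cds, alt = cds.upper(), alt.upper()
    start = pos - pos % 3                      # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return label, ref_aa, alt_aa

# Toy coding sequence (Met-Asn-Leu), for illustration only.
toy_cds = "ATGAATCTT"
print(classify_coding_snp(toy_cds, 3, "G"))  # AAT -> GAT changes the amino acid
print(classify_coding_snp(toy_cds, 8, "C"))  # CTT -> CTC keeps leucine
```

A production script would also need to handle genes on the minus strand, introns and multiple transcripts per gene, which this sketch deliberately ignores.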

Population Structure. The population structure was estimated by discriminant analysis of principal components (DAPC), which clusters genetically similar individuals, using a pruned SNP set. This method partitions the variance within and among groups without any assumptions about Hardy-Weinberg equilibrium or linkage disequilibrium (24). The SNP set was selected by removing SNPs with MAF lower than 5% or missing data above 10%. In addition, this subset was filtered to retain SNPs with linkage disequilibrium below 0.1 in a genomic window of 1 Mb. This pruned set was generated using the R/Bioconductor package SNPRelate (25) and comprised a total of 5743 SNPs. Principal component analysis and DAPC were performed using the R package adegenet (26). The optimum number of clusters was evaluated by the Bayesian Information Criterion (BIC), and five clusters were selected as the lowest number beyond which the BIC changed by a negligible amount (Figure S25). All discriminant functions were retained in the analysis, and the first 20 principal components were retained because this number achieved the lowest mean squared error in a cross-validation run 100 times. Samples with a membership probability of less than 80% were not considered in the downstream analysis. The DAPC analysis revealed five clusters defined as: (1) modern (48 members); (2) transitional (46 members); (3) S. lycopersicum var. cerasiforme (27 members); (4) S. pimpinellifolium (27 members); (5) heirloom varieties (236 members). Membership of each cluster is provided in Table S1. The modern cluster contained all of the large-fruited modern commercial varieties and inbreds. The transitional cluster contained some varieties generally classified as heirlooms as well as the old commercial varieties Ailsa Craig and Moneymaker.

Genome-wide association analysis. A total of 2,014,488 SNPs (MAF > 5% and missing rate < 10%) in the 398 accessions were used to perform the genome-wide association analysis. The Efficient Mixed-Model Association eXpedited (EMMAX) software was used to conduct all the analyses (27). The matrix of pairwise genetic distances was used as the variance-covariance matrix for the random effect, and the first ten principal components were included as fixed effects.
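The SNP inclusion criteria stated above (MAF > 5% and missing rate < 10%) can be sketched as follows. The genotype coding (0, 1 or 2 copies of the alternate allele, with None for a missing call) and the function name are assumptions made for this illustration; they do not describe the file formats actually used in the study.

```python
def passes_filters(genotypes, min_maf=0.05, max_missing=0.10):
    """Return True if a SNP meets the MAF and missing-rate criteria.

    genotypes: one entry per accession, 0/1/2 alternate-allele counts or None.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))   # diploid: two alleles per call
    maf = min(alt_freq, 1 - alt_freq)
    return maf > min_maf and missing_rate < max_missing

# Hypothetical genotype vectors for 20 accessions.
common_snp = [0] * 18 + [2] * 2            # MAF 0.10, no missing calls
monomorphic = [0] * 20                     # MAF 0
sparse_snp = [0, 2] * 8 + [None] * 4       # 20% missing calls
print(passes_filters(common_snp), passes_filters(monomorphic), passes_filters(sparse_snp))
```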

A uniform genome-wide significance threshold (P = 1/n, where n is the effective number of independent SNPs) was applied to all traits. The effective number of independent SNPs was calculated using the Genetic type 1 Error Calculator (GEC) software (28). The significance threshold for the 398-member population was P = 4.0 × 10⁻⁷.
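The threshold rule above (P = 1/n) and its position on a Manhattan plot's -log10 scale can be sketched as below. The value of n used in the example is hypothetical, since the GEC estimate itself is not reported in this section; only the threshold of 4.0 × 10⁻⁷ comes from the text.

```python
import math

def significance_threshold(n_effective):
    """Uniform threshold P = 1/n for n effectively independent SNPs."""
    if n_effective <= 0:
        raise ValueError("n_effective must be positive")
    return 1.0 / n_effective

def neg_log10(p):
    """Convert a P value to the -log10 scale used on Manhattan plots."""
    return -math.log10(p)

# Hypothetical effective SNP count, for illustration only.
print(significance_threshold(1_000_000))

# The threshold reported in the text, expressed on the -log10 scale.
print(round(neg_log10(4.0e-7), 3))
```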

The Haploview software was used to calculate linkage disequilibrium (LD) with the following parameters: -maxdistance 2000 -minMAF 0.05 -hwcutoff 0 (29). To assess the linkage disequilibrium landscape of the different genome regions, the average LD decay for each 0.5 Mb region of the whole genome was calculated. Pairwise LD between the significant SNPs for each trait was evaluated, and SNPs in strong linkage disequilibrium (R2 > 0.8) within a 0.5 Mb window were treated as one signal represented by the leading SNP.
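To make the grouping rule above concrete, the sketch below computes the squared correlation between two biallelic SNPs and applies the 0.8 and 0.5 Mb cut-offs. It assumes phased haplotypes coded 0/1, which is a simplification: Haploview estimates haplotype frequencies from genotype data, and that procedure is not reproduced here. All names and data are hypothetical.

```python
def r_squared(hap_a, hap_b):
    """Squared allelic correlation between two loci from 0/1 haplotype vectors."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                      # at least one locus is monomorphic
    d = p_ab - p_a * p_b                # disequilibrium coefficient D
    return d * d / denom

def same_signal(pos1, pos2, r2, window=500_000, min_r2=0.8):
    """True if two significant SNPs would be merged into one association signal."""
    return abs(pos1 - pos2) <= window and r2 > min_r2

# Hypothetical haplotypes at two SNPs.
snp1 = [0, 0, 1, 1]
snp2 = [0, 0, 1, 1]
r2 = r_squared(snp1, snp2)
print(r2, same_signal(1_000_000, 1_300_000, r2))
```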

Genotyping and QTL mapping for the F2 population. In addition to the 398 individuals sequenced for the GWAS analysis, a linkage mapping population consisting of 235 F2 individuals from a cross between TS-532 (S. lycopersicum var. cerasiforme) and TS-640 (S. lycopersicum) was genotyped using restriction site-associated DNA sequencing (RAD-Seq) (30). In summary, each sample was digested with EcoRI, followed by ligation of barcoded adapters. An average of 0.4 Gb of data was generated for each individual after quality filtering. The short reads were aligned against the Heinz reference genome using the Burrows-Wheeler Aligner (BWA), and SNPs were identified using SAMtools (31, 32). A total of 212,024 high-quality homozygous SNPs were identified between the two parental genomes and defined as genotype a (TS-532) or b (TS-640). To identify segregating loci, the genotype of each individual at each SNP was assigned as a, b or h (heterozygous). Individuals with fewer than 1000 marker calls were dropped because of their high missing data content, leaving a total of 197 samples. To impute missing genotypes for each individual, we evaluated the similarity of SNPs between the individual and both parents in 1 Mb bins. If the evaluated region had a similarity to one of the parents higher than 80%, the missing SNPs within that bin were assigned the genotype of the corresponding parent. Otherwise, the missing SNPs were assigned as heterozygous. Loci associated with differences in biochemical levels were identified using the R/qtl program (33). QTLs were identified by simple interval mapping using a normal model with the EM algorithm (34, 35). Genome-wide LOD significance thresholds were calculated by permutation test (200 repetitions) with the significance level set to p = 0.05. Loci with a LOD score greater than 3.0 were considered significant.
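The bin-based imputation rule described above can be sketched for a single individual on a single chromosome. The data layout (a list of position and call pairs, with calls 'a', 'b', 'h' or None) and the function name are assumptions for illustration; the authors' actual implementation is not given.

```python
from collections import defaultdict

def impute_bins(markers, bin_size=1_000_000, min_similarity=0.8):
    """Fill missing calls bin by bin.

    markers: list of (position, call), call in {'a', 'b', 'h', None}.
    A bin whose called markers match one parent in more than 80% of cases
    gets that parent's genotype for its missing calls; otherwise missing
    calls in the bin are set to heterozygous ('h').
    """
    bins = defaultdict(list)
    for pos, call in markers:
        bins[pos // bin_size].append(call)

    fill = {}
    for b, calls in bins.items():
        called = [c for c in calls if c is not None]
        if not called:
            fill[b] = None              # no information in this bin
            continue
        frac_a = called.count("a") / len(called)
        frac_b = called.count("b") / len(called)
        if frac_a > min_similarity:
            fill[b] = "a"
        elif frac_b > min_similarity:
            fill[b] = "b"
        else:
            fill[b] = "h"

    return [(pos, call if call is not None else fill[pos // bin_size])
            for pos, call in markers]

# Hypothetical marker calls spanning two 1 Mb bins.
example = [(100, "a"), (200, "a"), (300, "a"), (400, "a"), (500, "a"), (600, None),
           (1_000_100, "a"), (1_000_200, "b"), (1_000_300, "h"), (1_000_400, None)]
print(impute_bins(example))
```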

Figure S1. Population structure based on the discriminant analysis of principal components.

Figure S2. Compositional differences of chemicals significantly correlated with consumer liking and overall flavor intensity in modern cultivars. All differences are expressed as percent decrease of modern varieties relative to heirloom S. lycopersicum varieties. Significant differences are indicated by * (p < 0.05).

[Figure S2 plot. Axis: percent decrease (-60 to 80). Compounds shown, with * marking those labeled significant: benzyl cyanide, 1-nitro-2-phenylethane, citrate, glucose, soluble solids, fructose, E-2-pentenal, malate, 1-penten-3-one, 1-nitro-3-methylbutane, 6-methyl-5-hepten-2-one*, E-2-heptenal*, phenylacetaldehyde, isovaleraldehyde*, guaiacol, E,E-2,4-decadienal*, isobutyl acetate, isovaleric acid*, β-ionone*, 1-octen-3-one*, 2-isobutylthiazole*, E-2-hexenal*, 3-methyl-1-butanol*, 2-phenylethanol, 2-methyl-1-butanol*, isovaleronitrile*, methional*.]

Supplementary Figure 3. Frequency distribution of 35 traits in the GWAS population. The values for 6-methyl-5-hepten-2-one, geranylacetone and guaiacol were collected in Florida and Israel.

[Supplementary Figure 3 panels (histograms; y axis: frequency): soluble solids, glucose, fructose, citric acid, malic acid, 1-nitro-2-phenylethane, 1-octen-3-one, 1-penten-3-one, 2-isobutylthiazole, 2-methyl-1-butanol, 2-methylbutyraldehyde, 2-phenylethanol, 3-methyl-1-butanol, 6-methyl-5-hepten-2-one, β-ionone, Z-3-hexen-1-ol, Z-3-hexenal, geranylacetone, hexanal, hexyl alcohol, isobutyl acetate, isovaleraldehyde, isovaleric acid, isovaleronitrile, methional, methylsalicylate, phenylacetaldehyde, E,E-2,4-decadienal, E-2-heptenal, E-2-hexenal, E-2-pentenal, guaiacol. 6-methyl-5-hepten-2-one, geranylacetone and guaiacol each appear in a second panel.]

Supplementary Figure 4. Genome-wide association analysis of soluble solid content (SSC). (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM. The horizontal axis shows the -log10-transformed expected P value, while the vertical axis shows the -log10-transformed observed P value.

Supplementary Figure 5. Genome-wide association analysis of glucose. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 6. Genome-wide association analysis of fructose. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 7. Genome-wide association analysis of malic acid. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 8. Genome-wide association analysis of citric acid. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 9. Genome-wide association analysis of 1-nitro-2-phenylethane. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 10. Genome-wide association analysis of 2-isobutylthiazole. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 11. Genome-wide association analysis of 2-methyl-1-butanol. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 12. Genome-wide association analysis of 2-phenylethanol. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 13. Genome-wide association analysis of 3-methyl-butanol. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 14. Genome-wide association analysis of 6-methyl-5-hepten-2-one in two environments. (a) Manhattan plot for this trait collected in Florida, USA. (b) Quantile-quantile plot for this trait collected in Florida, USA. (c) Manhattan plot for this trait collected in Israel. (d) Quantile-quantile plot for this trait collected in Israel.

Supplementary Figure 15. Genome-wide association analysis of cis-3-hexen-1-ol. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 16. Genome-wide association analysis of geranyl acetone in two environments. (a) Manhattan plot for this trait collected in Florida, USA. (b) Quantile-quantile plot for this trait collected in Florida, USA. (c) Manhattan plot for this trait collected in Israel. (d) Quantile-quantile plot for this trait collected in Israel.

Supplementary Figure 17. Genome-wide association analysis of guaiacol in two environments. (a) Manhattan plot for this trait collected in Florida, USA. (b) Quantile-quantile plot for this trait collected in Florida, USA. (c) Manhattan plot for this trait collected in Israel. (d) Quantile-quantile plot for this trait collected in Israel.

Supplementary Figure 18. Genome-wide association analysis of hexanal. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 19. Genome-wide association analysis of isobutyl acetate. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 20. Genome-wide association analysis of methylsalicylate. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 21. Genome-wide association analysis of phenylacetaldehyde. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

Supplementary Figure 22. Genome-wide association analysis of E-2-pentenal. (a) Manhattan plot for GWAS on chromosomes 0-12. (b) Quantile-quantile plot for the GWAS under MLM.

[Figure S23 panels (LOD plots): 2-isobutylthiazole, 2-methyl-1-butanol, 2-methylbutyraldehyde, 3-methyl-1-butanol, isobutyl acetate, isovaleric acid, isovaleronitrile, isovaleraldehyde, 1-nitro-2-phenylethane, benzyl cyanide, guaiacol, methyl salicylate, Z-3-hexen-1-ol, hexyl alcohol, hexanal, E-2-hexenal, E-2-heptenal, E-2-pentenal, 1-octen-3-one, methional, β-ionone, geranylacetone, 1-nitro-3-methylbutane, fruit weight, fructose, glucose, glucose + fructose, soluble solids, citric acid, malic acid.]

Figure S23. LOD plots of traits in F2 population derived from a cross between FLA 8059 and Maglia Rosa Cherry.

>optimized Lin5 invertase ATGGAACTCTTCATGAAGAATTCTTCATTATGGGGCTTAAAGTTTTACCTCTTCTGCCTCTTCATCATCCTTTCAAATATCAACAGGGCTTTCGCTTCACACAATATTTTCTTAGATCTTCAATCAAGCTCAGCTATCTCTGTTAAGAATGTCCACAGAACTAGGTTCCATTTCCAACCCCCTAAGCACTGGATCAACGATCCAAATGCACCTATGTACTACAATGGCGTTTATCACCTCTTCTATCAGTACAACCCTAAAGGAAGCGTTTGGGGTAACATCATCTGGGCACACTCTGTTAGTAAAGATTTGATTAACTGGATCCACTTAGAGCCCGCTATTTACCCTTCTAAAAAATTTGATAAATATGGCACATGGTCTGGGTCTTCTACCATACTCCCAAATAATAAGCCAGTCATTATATATACTGGAGTTGTTGATTCATACAACAATCAGGTGCAAAACTATGCAATTCCAGCAAATCTTTCTGATCCTTTCCTTAGGAAGTGGATAAAGCCTAATAATAATCCTTTAATCGTTCCTGATAACTCAATTAATCGTACTGAATTCAGGGATCCAACAACTGCATGGATGGGACAAGACGGACTTTGGCGTATATTGATTGCTAGTATGAGGAAGCATAGGGGTATGGCTTTGCTTTATAGGAGTCGAGACTTTATGAAGTGGATTAAGGCTCAACACCCATTGCACTCTTCAACCAATACTGGTAACTGGGAATGCCCAGACTTCTTTCCAGTTCTTTTCAACTCCACAAATGGTCTTGACGTTTCTTACCGTGGAAAGAATGTGAAATATGTGTTGAAAAACAGCCTTGACGTGGCAAGATTTGACTACTATACAATTGGAATGTACCATACTAAGATTGATAGATACATTCCTAACAATAATTCTATCGATGGATGGAAAGGACTTCGAATAGATTATGGAAACTTTTATGCTTCCAAGACTTTCTACGATCCCTCAAGAAACAGAAGGGTTATCTGGGGCTGGAGCAATGAGTCTGATGTGCTTCCAGATGATGAAATCAAAAAGGGATGGGCAGGAATCCAAGGAATTCCTAGACAGGTTTGGCTTAATCTTTCAGGAAAGCAACTTCTGCAGTGGCCAATCGAAGAGCTTGAAACTCTCAGAAAGCAGAAAGTTCAACTTAATAATAAAAAATTATCTAAGGGGGAGATGTTCGAAGTGAAAGGAATTAGCGCATCTCAGGCAGATGTGGAAGTCTTGTTTTCATTTTCTTCCCTCAATGAGGCAGAGCAATTTGATCCACGATGGGCAGATTTGTATGCTCAAGACGTGTGTGCCATCAAAGGCAGTACCATTCAGGGGGGATTAGGGCCTTTTGGACTTGTTACCCTG GCTTCAAAAAATCTTGAAGAGTATACACCCGTTTTTTTCAGGGTGTTTAAAGCTCAAAAGTCTTATAAGATCTTGATGTGTTCTGATGCCAGGAGAAGTTCAATGAGGCAAAACGAAGCTATGTATAAACCATCTTTTGCAGGTTATGTTGATGTTGATCTGGAAGATATGAAAAAGCTTTCTTTGCGAAGCTTGATAGACAACTCCGTTGTAGAGTCATTTGGTGCTGGAGGAAAAACATGTATTACAAGCCGAGTGTACCCAACATTAGCAATCTACGATAACGCTCATCTCTTTGTGTTTAATAATGGATCAGAGACTATAACTATTGAAACCCTCAACGCTTGGTCTATGGATGCTTGCAAAATGAATTGA

>optimized alternate Lin5 invertase ATGGAACTCTTCATGAAGAATTCTTCATTATGGGGCTTAAAGTTTTACCTCTTCTGCCTCTTCATCATCCTTTCAAATATCAACAGGGCTTTCGCTTCACACAATATTTTCTTAGATCTTCAATCAAGCTCAGCTATCTCTGTTAAGAATGTCCACAGAACTAGGTTCCATTTCCAACCCCCTAAGCACTGGATCAACGATCCAAATGCACCTATGTACTACAATGGCGTTTATCACCTCTTCTATCAGTACAACCCTAAAGGAAGCGTTTGGGGTAACATCATCTGGGCACACTCTGTTAGTAAAGATTTGATTAACTGGATCCACTTAGAGCCCGCTATTTACCCTTCTAAAAAATTTGATAAATATGGCACATGGTCTGGGTCTTCTACCATACTCCCAAATAATAAGCCAGTCATTATATATACTGGAGTTGTTGATTCATACAACAATCAGGTGCAAAACTATGCAATTCCAGCAAATCTTTCTGATCCTTTCCTTAGGAAGTGGATAAAGCCTAATAATAATCCTTTAATCGTTCCTGATAACTCAATTAATCGTACTGAATTCAGGGATCCAACAACTGCATGGATGGGACAAGACGGACTTTGGCGTATATTGATTGCTAGTATGAGGAAGCATAGGGGTATGGCTTTGCTTTATAGGAGTCGAGACTTTATGAAGTGGATTAAGGCTCAACACCCATTGCACTCTTCAACCAATACTGGTAACTGGGAATGCCCAGACTTCTTTCCAGTTCTTTTCAACTCCACAAATGGTCTTGACGTTTCTTACCGTGGAAAGAATGTGAAATATGTGTTGAAAAACAGCCTTGACGTGGCAAGATTTGACTACTATACAATTGGAATGTACCATACTAAGATTGATAGATACATTCCTAACAATAATTCTATCGATGGATGGAAAGGACTTCGAATAGATTATGGAAACTTTTATGCTTCCAAGACTTTCTACGATCCCTCAAGAAACAGAAGGGTTATCTGGGGCTGGAGCAATGAGTCTGATGTGCTTCCAGATGATGAAATCAAAAAGGGATGGGCAGGAATCCAAGGAATTCCTAGACAGGTTTGGCTTgATCTTTCAGGAAAGCAACTTCTGCAGTGGCCAATCGAAGAGCTTGAAACTCTCAGAAAGCAGAAAGTTCAACTTAATAATAAAAAATTATCTAAGGGGGAGATGTTCGAAGTGAAAGGAATTAGCGCATCTCAGGCAGATGTGGAAGTCTTGTTTTCATTTTCTTCCCTCAATGAGGCAGAGCAATTTGATCCACGATGGGCAGATTTGTATGCTCAAGACGTGTGTGCCATCAAAGGCAGTACCATTCAGGGGGGATTAGGGCCTTTTGGACTTGTTACCCTGGCTTCAAAAAATCTTGAAGAGTATACACCCGTTTTTTTCAGGGTGTTTAAAGCTCAAAAGTCTTATAAGATCTTGATGTGTTCTGATGCCAGGAGAAGTTCAATGAGGCAAAACGAAGCTATGTATAAACCATCTTTTGCAGGTTATGTTGATGTTGATCTGGAAGATATGAAAAAGCTTTCTTTGCGAAGCTTGATAGACAACTCCGTTGTAGAGTCATTTGGTGCTGGAGGAAAAACATGTATTACAAGCCGAGTGTACCCAACATTAGCAATCTACGATAACGCTCATCTCTTTGTGTTTAATAATGGATCAGAGACTATAACTATTGAAACCCTCAACGCTTGGTCTATGGATGCTTGCAAAATGAATTGA

Figure S24. Optimized sequences encoding the reference and alternate Lin5 proteins.
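As an aid to reading the two sequences, the sketch below shows one simple way to list the positions at which two equal-length sequences differ. The function name is hypothetical, and the inputs are short fragments copied from the two records above rather than the full sequences, so the reported position is relative to the fragment only.

```python
def sequence_differences(ref, alt):
    """List (1-based position, ref base, alt base) where two sequences differ."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length")
    return [(i + 1, r, a)
            for i, (r, a) in enumerate(zip(ref.upper(), alt.upper()))
            if r != a]

# Fragments taken from the reference and alternate records, for illustration.
ref_fragment = "GGTTTGGCTTAATCTTTCAGG"
alt_fragment = "GGTTTGGCTTgATCTTTCAGG"
print(sequence_differences(ref_fragment, alt_fragment))
```

Comparing the full records in the same way would require first removing any whitespace introduced into the sequences during text extraction.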

Fig S25 – Bayesian Information Criteria (BIC) for each number of clusters evaluated. Red circle indicates the chosen number of clusters (5).

References and Notes

1. Food and Agriculture Organization of the United Nations. http://faostat.fao.org/site/339/default.aspx

2. D. Tieman, P. Bliss, L. M. McIntyre, A. Blandon-Ubeda, D. Bies, A. Z. Odabasi, G. R. Rodríguez, E. van der Knaap, M. G. Taylor, C. Goulet, M. H. Mageroy, D. J. Snyder, T. Colquhoun, H. Moskowitz, D. G. Clark, C. Sims, L. Bartoshuk, H. J. Klee, The chemical interactions underlying tomato flavor preferences. Curr. Biol. 22, 1035–1039 (2012). doi:10.1016/j.cub.2012.04.016

3. R. G. Buttery, R. Teranishi, R. A. Flath, L. C. Ling, Fresh tomato volatiles: Composition and sensory studies. Am. Chem. Soc. Symp. 388, 213–222 (1987).

4. E. A. Baldwin, J. W. Scott, C. K. Shewmaker, W. Schuch, Flavor trivia and tomato aroma: Biochemistry and possible mechanisms for control of important aroma components. HortScience 35, 1013–1022 (2000).

5. J. Vogel, D. M. Tieman, C. Sims, A. Odabasi, D. G. Clark, H. J. Klee, Carotenoid content impacts taste perception in tomato (Solanum lycopersicum). J. Sci. Food Agric. 90, 2233–2240 (2010). doi:10.1002/jsfa.4076

6. B. Zhang, D. M. Tieman, C. Jiao, Y. Xu, K. Chen, Z. Fei, J. J. Giovannoni, H. J. Klee, Chilling-induced tomato flavor loss is associated with altered volatile synthesis and transient changes in DNA methylation. Proc. Natl. Acad. Sci. U.S.A. 113, 12580–12585 (2016). doi:10.1073/pnas.1613910113

7. T. Lin, G. Zhu, J. Zhang, X. Xu, Q. Yu, Z. Zheng, Z. Zhang, Y. Lun, S. Li, X. Wang, Z. Huang, J. Li, C. Zhang, T. Wang, Y. Zhang, A. Wang, Y. Zhang, K. Lin, C. Li, G. Xiong, Y. Xue, A. Mazzucato, M. Causse, Z. Fei, J. J. Giovannoni, R. T. Chetelat, D. Zamir, T. Städler, J. Li, Z. Ye, Y. Du, S. Huang, Genomic analyses provide insights into the history of tomato breeding. Nat. Genet. 46, 1220–1226 (2014). doi:10.1038/ng.3117

8. E. Fridman, F. Carrari, Y.-S. Liu, A. R. Fernie, D. Zamir, Zooming in on a quantitative trait for tomato yield using interspecific introgressions. Science 305, 1786–1789 (2004). doi:10.1126/science.1101666

9. M. I. Zanor, S. Osorio, A. Nunes-Nesi, F. Carrari, M. Lohse, B. Usadel, C. Kühn, W. Bleiss, P. Giavalisco, L. Willmitzer, R. Sulpice, Y.-H. Zhou, A. R. Fernie, RNA interference of LIN5 in tomato confirms its role in controlling Brix content, uncovers the influence of sugars on the levels of fruit hormones, and demonstrates the importance of sucrose cleavage for normal fruit development and fertility. Plant Physiol. 150, 1204–1218 (2009). doi:10.1104/pp.109.136598

10. D. Tieman, M. Zeigler, E. Schmelz, M. G. Taylor, S. Rushing, J. B. Jones, H. J. Klee, Functional analysis of a tomato salicylic acid methyl transferase and its role in synthesis of the flavor volatile methyl salicylate. Plant J. 62, 113–123 (2010). doi:10.1111/j.1365-313X.2010.04128.x

11. M. I. Zanor, J. L. Rambla, J. Chaïb, A. Steppa, A. Medina, A. Granell, A. R. Fernie, M. Causse, Metabolic characterization of loci affecting sensory attributes in tomato allows an assessment of the influence of the levels of primary metabolites and volatile organic contents. J. Exp. Bot. 60, 2139–2154 (2009). doi:10.1093/jxb/erp086

12. B. Zierler, B. Siegmund, W. Pfannhauser, Determination of off-flavour compounds in apple juice caused by microorganisms using headspace solid phase microextraction–gas chromatography–mass spectrometry. Anal. Chim. Acta 520, 3–11 (2004). doi:10.1016/j.aca.2004.03.084

13. J. Deikman, R. Kline, R. L. Fischer, Organization of ripening and ethylene regulatory regions in a fruit-specific promoter from tomato (Lycopersicon esculentum). Plant Physiol. 100, 2013–2017 (1992). doi:10.1104/pp.100.4.2013

14. L. Peñarrubia, M. Aguilar, L. Margossian, R. L. Fischer, An antisense gene stimulates ethylene hormone production during tomato fruit ripening. Plant Cell 4, 681–687 (1992). doi:10.1105/tpc.4.6.681

15. E. Lewinsohn, Y. Sitrit, E. Bar, Y. Azulay, A. Meir, D. Zamir, Y. Tadmor, Carotenoid pigmentation affects the volatile composition of tomato and watermelon fruits, as revealed by comparative genetic analyses. J. Agric. Food Chem. 53, 3142–3148 (2005). doi:10.1021/jf047927t

16. A. E. Oltman, S. M. Jervis, M. A. Drake, Consumer attitudes and preferences for fresh market tomatoes. J. Food Sci. 79, S2091–S2097 (2014). doi:10.1111/1750-3841.12638

17. D. M. Tieman, M. Zeigler, E. A. Schmelz, M. G. Taylor, P. Bliss, M. Kirst, H. J. Klee, Identification of loci affecting flavour volatile emissions in tomato fruits. J. Exp. Bot. 57, 887–896 (2006). doi:10.1093/jxb/erj074

18. J. L. Rambla, C. Alfaro, A. Medina, M. Zarzo, J. Primo, A. Granell, Tomato fruit volatile profiles are highly dependent on sample processing and capturing methods. Metabolomics 11, 1708–1720 (2015). doi:10.1007/s11306-015-0824-5

19. L. M. Bartoshuk, V. B. Duffy, K. Fast, B. G. Green, J. Prutkin, D. J. Snyder, Labeled scales (e.g., category, Likert, VAS) and invalid across-group comparisons. What we have learned from genetic variation in taste. Food Qual. Prefer. 14, 125–138 (2003). doi:10.1016/S0950-3293(02)00077-0

20. A. B. Gilmour, B. Gogel, B. Cullis, R. Thompson, ASReml User Guide Release 3.0. VSN International, Hemel Hempstead, UK (2009).

21. R. Li, C. Yu, Y. Li, T.-W. Lam, S.-M. Yiu, K. Kristiansen, J. Wang, SOAP2: An improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009). doi:10.1093/bioinformatics/btp336

22. S. Sato, S. Tabata, H. Hirakawa, E. Asamizu, K. Shirasawa, S. Isobe, T. Kaneko, Y. Nakamura, D. Shibata, K. Aoki, M. Egholm, J. Knight, R. Bogden, C. Li, Y. Shuang, X. Xu, S. Pan, S. Cheng, X. Liu, Y. Ren, J. Wang, A. Albiero, F. Dal Pero, S. Todesco, J. Van Eck, R. M. Buels, A. Bombarely, J. R. Gosselin, M. Huang, J. A. Leto, N. Menda, S. Strickler, L. Mao, S. Gao, I. Y. Tecle, T. York, Y. Zheng, J. T. Vrebalov, J. M. Lee, S. Zhong, L. A. Mueller, W. J. Stiekema, P. Ribeca, T. Alioto, W. Yang, S. Huang, Y. Du, Z. Zhang, J. Gao, Y. Guo, X. Wang, Y. Li, J. He, C. Li, Z. Cheng, J. Zuo, J. Ren, J. Zhao, L. Yan, H. Jiang, B. Wang, H. Li, Z. Li, F. Fu, B. Chen, B. Han, Q. Feng, D. Fan, Y.

Wang, H. Ling, Y. Xue, D. Ware, W. Richard McCombie, Z. B. Lippman, J.-M. Chia, K. Jiang, S. Pasternak, L. Gelley, M. Kramer, L. K. Anderson, S.-B. Chang, S. M. Royer, L. A. Shearer, S. M. Stack, J. K. C. Rose, Y. Xu, N. Eannetta, A. J. Matas, R. McQuinn, S. D. Tanksley, F. Camara, R. Guigó, S. Rombauts, J. Fawcett, Y. Van de Peer, D. Zamir, C. Liang, M. Spannagl, H. Gundlach, R. Bruggmann, K. Mayer, Z. Jia, J. Zhang, Z. Ye, G. J. Bishop, S. Butcher, R. Lopez-Cobollo, D. Buchan, I. Filippis, J. Abbott, R. Dixit, M. Singh, A. Singh, J. Kumar Pal, A. Pandit, P. Kumar Singh, A. Kumar Mahato, V. Dogra, K. Gaikwad, T. Raj Sharma, T. Mohapatra, N. Kumar Singh, M. Causse, C. Rothan, T. Schiex, C. Noirot, A. Bellec, C. Klopp, C. Delalande, H. Berges, J. Mariette, P. Frasse, S. Vautrin, M. Zouine, A. Latché, C. Rousseau, F. Regad, J.-C. Pech, M. Philippot, M. Bouzayen, P. Pericard, S. Osorio, A. Fernandez del Carmen, A. Monforte, A. Granell, R. Fernandez-Muñoz, M. Conte, G. Lichtenstein, F. Carrari, G. De Bellis, F. Fuligni, C. Peano, S. Grandillo, P. Termolino, M. Pietrella, E. Fantini, G. Falcone, A. Fiore, G. Giuliano, L. Lopez, P. Facella, G. Perrotta, L. Daddiego, G. Bryan, M. Orozco, X. Pastor, D. Torrents, M. G. M. van Schriek, R. M. C. Feron, J. van Oeveren, P. de Heer, L. daPonte, S. Jacobs-Oomen, M. Cariaso, M. Prins, M. J. T. van Eijk, A. Janssen, M. J. J. van Haaren, S.-H. Jo, J. Kim, S.-Y. Kwon, S. Kim, D.-H. Koo, S. Lee, C.-G. Hur, C. Clouser, A. Rico, A. Hallab, C. Gebhardt, K. Klee, A. Jöcker, J. Warfsmann, U. Göbel, S. Kawamura, K. Yano, J. D. Sherman, H. Fukuoka, S. Negoro, S. Bhutty, P. Chowdhury, D. Chattopadhyay, E. Datema, S. Smit, E. G. W. M. Schijlen, J. van de Belt, J. C. van Haarst, S. A. Peters, M. J. van Staveren, M. H. C. Henkens, P. J. W. Mooyman, T. Hesselink, R. C. H. J. van Ham, G. Jiang, M. Droege, D. Choi, B.-C. Kang, B. Dong Kim, M. Park, S. Kim, S.-I. Yeom, Y.-H. Lee, Y.-D. Choi, G. Li, J. Gao, Y. Liu, S. Huang, V. Fernandez-Pedrosa, C. 
Collado, S. Zuñiga, G. Wang, R. Cade, R. A. Dietrich, J. Rogers, S. Knapp, Z. Fei, R. A. White, T. W. Thannhauser, J. J. Giovannoni, M. Angel Botella, L. Gilbert, R. Gonzalez, J. Luis Goicoechea, Y. Yu, D. Kudrna, K. Collura, M. Wissotski, R. Wing, H. Schoof, B. C. Meyers, A. Bala Gurazada, P. J. Green, S. Mathur, S. Vyas, A. U. Solanke, R. Kumar, V. Gupta, A. K. Sharma, P. Khurana, J. P. Khurana, A. K. Tyagi, T. Dalmay, I. Mohorianu, B. Walts, S. Chamala, W. Brad Barbazuk, J. Li, H. Guo, T.-H. Lee, Y. Wang, D. Zhang, A. H. Paterson, X. Wang, H. Tang, A. Barone, M. Luisa Chiusano, M. Raffaella Ercolano, N. D’Agostino, M. Di Filippo, A. Traini, W. Sanseverino, L. Frusciante, G. B. Seymour, M. Elharam, Y. Fu, A. Hua, S. Kenton, J. Lewis, S. Lin, F. Najar, H. Lai, B. Qin, C. Qu, R. Shi, D. White, J. White, Y. Xing, K. Yang, J. Yi, Z. Yao, L. Zhou, B. A. Roe, A. Vezzi, M. D’Angelo, R. Zimbello, R. Schiavon, E. Caniato, C. Rigobello, D. Campagna, N. Vitulo, G. Valle, D. R. Nelson, E. De Paoli, D. Szinay, H. H. de Jong, Y. Bai, R. G. F. Visser, R. M. Klein Lankhorst, H. Beasley, K. McLaren, C. Nicholson, C. Riddle, G. Gianese, S. Sato, S. Tabata, L. A. Mueller, S. Huang, Y. Du, C. Li, Z. Cheng, J. Zuo, B. Han, Y. Wang, H. Ling, Y. Xue, D. Ware, W. Richard McCombie, Z. B. Lippman, S. M. Stack, S. D. Tanksley, Y. Van de Peer, K. Mayer, G. J. Bishop, S. Butcher, N. Kumar Singh, T. Schiex, M. Bouzayen, A. Granell, F. Carrari, G. De Bellis, G. Giuliano, G. Bryan, M. J. T. van Eijk, H. Fukuoka, D. Chattopadhyay, R. C. H. J. van Ham, D. Choi, J. Rogers, Z. Fei, J. J. Giovannoni, R. Wing, H. Schoof, B. C. Meyers, J. P. Khurana, A. K. Tyagi, T. Dalmay, A. H. Paterson, X. Wang, L. Frusciante, G. B. Seymour, B. A. Roe, G. Valle, H. H. de Jong, R. M. Klein Lankhorst; Tomato Genome Consortium, The tomato genome sequence provides insights
into fleshy fruit evolution. Nature 485, 635–641 (2012). doi:10.1038/nature11119

23. Y. Li, W. Chen, E. Y. Liu, Y. H. Zhou, Single nucleotide polymorphism (SNP) detection and genotype calling from massively parallel sequencing (MPS) data. Stat. Biosci. 5, 3–25 (2013). doi:10.1007/s12561-012-9067-4

24. T. Jombart, S. Devillard, F. Balloux, Discriminant analysis of principal components: A new method for the analysis of genetically structured populations. BMC Genet. 11, 94 (2010). doi:10.1186/1471-2156-11-94

25. X. Zheng, D. Levine, J. Shen, S. M. Gogarten, C. Laurie, B. S. Weir, A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012). doi:10.1093/bioinformatics/bts606

26. T. Jombart, adegenet: A R package for the multivariate analysis of genetic markers. Bioinformatics 24, 1403–1405 (2008). doi:10.1093/bioinformatics/btn129

27. H. M. Kang, J. H. Sul, S. K. Service, N. A. Zaitlen, S. Y. Kong, N. B. Freimer, C. Sabatti, E. Eskin, Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010). doi:10.1038/ng.548

28. M. X. Li, J. M. Yeung, S. S. Cherny, P. C. Sham, Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets. Hum. Genet. 131, 747–756 (2012). doi:10.1007/s00439-011-1118-2

29. J. C. Barrett, B. Fry, J. Maller, M. J. Daly, Haploview: Analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2004). doi:10.1093/bioinformatics/bth457

30. D. W. Craig, J. V. Pearson, S. Szelinger, A. Sekar, M. Redman, J. J. Corneveaux, T. L. Pawlowski, T. Laub, G. Nunn, D. A. Stephan, N. Homer, M. J. Huentelman, Identification of genetic variants using bar-coded multiplexed sequencing. Nat. Methods 5, 887–893 (2008). doi:10.1038/nmeth.1251

31. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009). doi:10.1093/bioinformatics/btp324

32. H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin; 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). doi:10.1093/bioinformatics/btp352

33. K. W. Broman, H. Wu, S. Sen, G. A. Churchill, R/qtl: QTL mapping in experimental crosses. Bioinformatics 19, 889–890 (2003). doi:10.1093/bioinformatics/btg112

34. E. S. Lander, D. Botstein, Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185–199 (1989).

35. A. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1–38 (1977).

