Supplementary Figure 1

Method used for ancestral genome reconstruction.

The MRCA (Most Recent Common Ancestor), AMK (Ancestral Monocot Karyotype), AEK (Ancestral Eudicot Karyotype) and AGK (Ancestral Grass Karyotype) were reconstructed according to a two-stage procedure. The ancestral karyotype is reconstructed in the first stage; the ancestral gene content of that karyotype, as well as the gene order, is inferred in the second stage. The schematic representation shows the two-step method: (1) karyotyping (based on core-pPGs), yielding the ancestral protochromosomes, and (2) gene order enrichment (based on pPGs), defining the complete set of oPGs on the protochromosomes. Both steps use CIP/CALP Blast parameters as well as the DRIMM-Synteny and MGRA tools, as detailed in the ‘Methods’ section of the manuscript.

[Figure: flowchart of the reconstruction method, Steps 1 to 9, with output labels PROTOGENES, CHROMOSOMES, GENE ORDER and ANCESTRAL GENOME.
Stage 1: Karyotype. Box labels: Genome alignment (pPG), Blast - CIP/CALP; pPG conserved in all species (core-pPG); Syntenic Blocks (SB) (DRIMM-Synteny); SB filtering (<5 core-pPG); SB merging (MGRA); Ancestral chromosome number.
Stage 2: Gene content & order. Box labels: Syntenic Blocks with pPG (DRIMM-Synteny); Enrichment of ancestral karyotype with pPG; Ancestral gene order (oPG).]
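The filtering label in the flowchart, "SB filtering (<5 core-pPG)", can be illustrated with a minimal sketch. It assumes that the label means syntenic blocks carrying fewer than 5 core-pPGs are discarded; the function name, the data layout and the toy gene identifiers are hypothetical and are not taken from the manuscript or from DRIMM-Synteny.

```python
from typing import Dict, List, Set

MIN_CORE_PPG = 5  # assumed reading of the "SB filtering (<5 core-pPG)" label


def filter_syntenic_blocks(
    blocks: Dict[str, List[str]], core_ppg: Set[str]
) -> Dict[str, List[str]]:
    """Keep only blocks that carry at least MIN_CORE_PPG core-pPGs."""
    kept = {}
    for block_id, genes in blocks.items():
        n_core = sum(1 for gene in genes if gene in core_ppg)
        if n_core >= MIN_CORE_PPG:
            kept[block_id] = genes
    return kept


# Hypothetical toy input: SB1 carries 5 core-pPGs, SB2 carries only 2.
blocks = {
    "SB1": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "SB2": ["g7", "g8", "g9"],
}
core_ppg = {"g1", "g2", "g3", "g4", "g5", "g7", "g8"}

kept_blocks = filter_syntenic_blocks(blocks, core_ppg)
print(sorted(kept_blocks))
```

With this toy input only SB1 passes the threshold; the real pipeline operates on blocks produced by DRIMM-Synteny, as described in the ‘Methods’ section.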

Nature Genetics: doi:10.1038/ng.3813

Supplementary Figure 2

Schematic representation of ancestral genome reconstruction.

The ancestral genome reconstruction procedure is illustrated for AGK1 (Ancestral Grass Karyotype chromosome 1, right). AGK1 derives from the orthologous chromosomes r1 (rice), s3 (sorghum) and b2 (Brachypodium, with telomere, tel.), which define A1 (cf. dotplots and chromosome schema entitled CAR1, top left), and from the orthologous chromosomes r5, s9 and b2 (with centromere, cent.), which define A5 (cf. dotplots and chromosome schema entitled CAR5, bottom left), based on pPGs and core-pPGs defining oPGs (color code). The post-duplication A5 and A1 protochromosomes are ultimately fused into a pre-duplication AGK1 protochromosome based on the paralogy observed between r1-r5, s3-s9 and b2 tel-cent (cf. dotplots entitled CAR 1/5 paralogy, middle left), as defined in the ‘Methods’ section of the manuscript.

[Figure: dotplots and chromosome schemas. CAR 1 (A1 grasses): r1, s3, b2 (tel.). CAR 5 (A5 grasses): r5, s9, b2 (cent.). CAR 1/5 paralogy: r1-r5, s3-s9, b2 (tel. / cent.). Panel labels: modern synteny/paralogy; ancestors post-WGD (A1, A5); ancestor pre-WGD (AGK1). Color code: pPG, core-pPG, oPG. T and C marks appear on the chromosome schemas.]

Supplementary Figure 3

AGK (pre-ρ / post-ρ) reconstruction.

AGK was reconstructed by comparing sorghum, rice and Brachypodium, following the strategy detailed in the ‘Methods’ section of the manuscript. This delivered a post-ρ AGK with 12 chromosomes and 14241 genes and a pre-ρ AGK with 7 chromosomes and 7010 genes.

a. Detailed procedure for AGK reconstruction from modern genomes (top) and the identification of pPGs, core-pPGs and oPGs following the karyotyping and enrichment procedures for the pre-ρ and post-ρ ancestors. These newly defined AGKs (pre-ρ / post-ρ) refine the previously reported grass ancestors of Murat et al. (Ref. 25, cf. manuscript), which were based on only 16464 pPGs, corresponding to 84 % of the 19430 pPGs defined here for the new 12-protochromosome AGK, and on 6246 oPGs, corresponding to 89 % of the 7010 oPGs defined here for the new 7-protochromosome AGK.

b. Dotplot representation of the comparison between AGK ancestors (n=7, n=12; y-axis) and modern species (x-axis), with a five-color code highlighting the AMK chromosomes. The dotplots of the post-ρ AGK and pre-ρ AGK against the sorghum, rice and Brachypodium genomes show that the reconstructed AGKs are robust and accurate: all chromosomes, whether from the modern species or from the ρ-block pairs, are entirely covered by the inferred karyotypes (i.e. 100% of modern genomes/chromosomes are integrated into the post-ρ and pre-ρ AGK protochromosomes).

c. The inferred n=7 AGK (pre-ρ, AGK1-7) defines precise and exhaustive syntenic chromosome relationships between the modern grasses, as detailed in the table. Applying the proposed method to rice, Brachypodium and sorghum, we inferred a post-ρ AGK consisting of 12 protochromosomes containing 14241 ordered protogenes (oPGs). Alignment of the two subgenomes generated by ρ made it possible to reconstruct a pre-ρ AGK consisting of 7010 ordered protogenes (oPGs) on 7 protochromosomes. The transition from the pre-ρ to the post-ρ AGK involved seven known paralogous ancestral chromosome pairs (using the rice chromosome nomenclature): A1-A5, A2-A4, A2-A6, A3-A7, A3-A10, A8-A9, A11-A12.
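As a quick consistency check on the chromosome pairs listed in the caption, the short sketch below (illustrative only; the variable names are ours) confirms that the seven pairs together cover all 12 post-ρ protochromosomes, with A2 and A3 each involved in two pairs.

```python
from collections import Counter

# Paralogous pairs quoted in the caption (rice chromosome nomenclature).
RHO_PAIRS = [(1, 5), (2, 4), (2, 6), (3, 7), (3, 10), (8, 9), (11, 12)]

# Count how many pairs each post-rho protochromosome takes part in.
usage = Counter(chrom for pair in RHO_PAIRS for chrom in pair)
covered = sorted(usage)
in_two_pairs = sorted(chrom for chrom, n in usage.items() if n == 2)

print(covered == list(range(1, 13)))
print(in_two_pairs)
```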

a

Modern genomes:
Oryza sativa v7.0: 12 chromosomes, 38864 genes
Brachypodium distachyon v2.1: 5 chromosomes, 26523 genes
Sorghum bicolor v1.0: 10 chromosomes, 27412 genes

Ancestor Post-ρ
Karyotyping
Step I. 19430 potential protogenes (pPG).
Step II. 16177 pPG conserved in the three species (core-pPG).
Step III. 11373 pPG in Syntenic Blocks (SB).
Step IV. 11193 pPG after filtering.
Step V. 11193 protogenes (oPG) in 12 protochromosomes (SB).
Enrichment
Step VI. 16453 pPG non core-pPG.
Step VII. 15122 pPG in Syntenic Blocks.
Step VIII. 14241 oPG in 12 protochromosomes.

Ancestor Pre-ρ
Karyotyping
Step II. 835 pPG (duplicated).
Step III. 794 pPG / 88 SB.
Step IV. 755 pPG / 68 SB.
Step V. 755 oPG / 68 SB, 7 protochromosomes.
Enrichment
Step VI’. 607 oPG / 68 groups, 7 protochromosomes.
Step VII’. 7010 oPG in 7 protochromosomes.

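b

[Figure: dotplots. Ancestor Pre-ρ (chromosomes 1 to 7, y-axis) against Ancestor Post-ρ (1 to 12), Rice (1 to 12), Brachy (1 to 5) and Sorghum (1 to 10); Ancestor Post-ρ (chromosomes 1 to 12, y-axis) against Rice (1 to 12), Brachy (1 to 5) and Sorghum (1 to 10).]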

c

AGK Pre-ρ   AGK Post-ρ   Rice    Brachypodium   Sorghum
AGK1        1-5          1-5     2-2            3-9
AGK2        3-10         3-10    1-3            1-1
AGK3        3-7          3-7     1-1            1-2
AGK4        2-6          2-6     1-3            4-10
AGK5        2-4          2-4     3-5            4-6
AGK6        8-9          8-9     3-4            2-7
AGK7        11-12        11-12   4-4            5-8

Supplementary Figure 4

Oil palm (pre-p / pre-τ) ancestor reconstruction.

The oil palm paleohistory was reconstructed following the strategy detailed in the ‘Methods’ section of the manuscript, delivering a pre-p oil palm ancestor of 10 chromosomes with 13916 genes and a pre-τ oil palm ancestor (or AMK) of 5 chromosomes with 6707 genes.

a. Detailed procedure for AMK (pre-p / pre-τ) reconstruction from the oil palm genome, leading to the identification of pPGs, core-pPGs and oPGs following the karyotyping and enrichment procedures for the pre-τ and post-τ ancestors. The analysis unveiled the p duplication (p1-p6, p3-p7, p4-p11, p12-p16, p5-p14, p2-p10, p5-p10, p13-p15, p8-p10, p2-p8, p2-p9, p7-p8, p3-p10, p3-p7 paralogous blocks) as well as the τ duplication (p1-p6/p3-p7, p2-p8-p10/p2-p7-p8-p9, p4-p11/p12-p16, p5-p14/p2-p5-p10, p3-p7-p10/p13-p15 paralogous blocks).

b. Dotplot representation of the comparison between oil palm (16 chromosomes), oil palm pre-p (n=10), AEK (pre-γ n=7, post-γ n=21) and AGK (pre-ρ n=7, post-ρ n=12), with a five-color code highlighting the AMK chromosomes. The observed synteny and paralogy are entirely integrated into 5 CARs (diagonal color code), delivering a pre-p oil palm ancestor of 10 chromosomes (13916 ordered protogenes, oPGs) and a pre-τ AMK of 5 chromosomes (6707 ordered protogenes, oPGs).

a

Modern genome: Elaeis guineensis (20965 genes).

Ancestor Pre-p
Karyotyping
Step I. 2998 potential protogenes (pPG).
Step II. 2998 pPG.
Step III. 2902 pPG in Syntenic Blocks (SB).
Step IV. 2861 pPG after filtering.
Step V. 2439 oPG in 10 protochromosomes.
Enrichment
Step VI’. 2141 oPG (duplicated).
Step VII’. 13916 oPG in 10 protochromosomes.

Ancestor Pre-τ
Karyotyping
Step I. 1064 potential protogenes (pPG).
Step II. 1064 pPG.
Step III. 301 pPG in Syntenic Blocks.
Step IV. 301 pPG after filtering.
Step V. 301 oPG in 5 protochromosomes.
Enrichment
Step VI’. 301 oPG (duplicated).
Step VII’. 6707 oPG in 5 protochromosomes.

b

[Figure: dotplots comparing oil palm (chromosomes 1 to 16), oil palm pre-p (1 to 10), AEK pre-γ (1 to 7), AEK post-γ (1 to 21), AGK pre-ρ (1 to 7) and AGK post-ρ (1 to 12). oPG counts shown on the panels: 5452, 13916, 5143, 6284 and 7010.]

Supplementary Figure 5

Oil palm (pre-p) and AMK (pre-τ) ancestors.

The oil palm genome allowed the characterization of the shared ancestral τ duplication through a mixed approach combining Ks and synteny inference of paralogs, for the lineage-specific p duplication (see also Supplementary Figure 4) as well as for the monocot τ duplication.

a. Schematic representation of the Ks distribution separating two WGDs (recent p WGD in red and ancient τ WGD in blue, top illustration) in the oil palm genome, and the corresponding gene pairs defining paralogous blocks (diagonals) on the oil palm-oil palm dotplot comparison (bottom illustration), with the p duplication in red (p1-p6, p3-p7, p4-p11, p12-p16, p5-p14, p2-p10, p5-p10, p13-p15, p8-p10, p2-p8, p2-p9, p7-p8, p3-p10, p3-p7 paralogous blocks) and the τ duplication in blue (p1-p6/p3-p7, p2-p8-p10/p2-p7-p8-p9, p4-p11/p12-p16, p5-p14/p2-p5-p10, p3-p7-p10/p13-p15 paralogous blocks).

b. Schematic representation of the deduced most parsimonious evolutionary model (minimizing the number of fusion and fission events): a pre-τ AMK structured in 5 chromosomes (6707 genes) gave rise to an n=10 post-τ AMK (13916 genes), followed by a chromosomal fusion (between A7 and A9) yielding an n=9 intermediate. The second WGD (p) produced an n=18 (2x9) intermediate, followed by 5 chromosomal fissions (breaks) and 7 fusions to reach the modern oil palm genome (n=16) (top illustration). Dotplot diagonals correspond to the p and τ WGDs from the oil palm-oil palm comparison, with a colour code highlighting the five AMK (oil palm pre-τ) protochromosomes (right) and entirely covering the modern oil palm genome (with the pre-τ AMK ancestor as color code at the center).
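The chromosome-number bookkeeping of this model can be followed with a short sketch. It is only an illustration of the arithmetic stated in the caption (each WGD doubles the count, each fusion removes one chromosome, each fission adds one); the function and event names are ours.

```python
def apply_events(n_start, events):
    """Return the chromosome number after applying (kind, value) events in order."""
    n = n_start
    for kind, value in events:
        if kind == "wgd":        # value is the ploidy multiplier
            n *= value
        elif kind == "fusion":   # each fusion removes one chromosome
            n -= value
        elif kind == "fission":  # each fission adds one chromosome
            n += value
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return n


# Events quoted in the caption, starting from the n=5 pre-tau AMK.
MODEL = [
    ("wgd", 2),      # tau: 5 -> 10
    ("fusion", 1),   # A7-A9 fusion: 10 -> 9
    ("wgd", 2),      # p: 9 -> 18
    ("fission", 5),  # 18 -> 23
    ("fusion", 7),   # 23 -> 16
]

modern_n = apply_events(5, MODEL)
print(modern_n)  # 16
```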

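[Figure: panel a, Ks distribution (x-axis: Ks; y-axis: number of paralogous pairs) and oil palm vs oil palm dotplot; panel b, evolutionary model and oil palm vs oil palm dotplot (chromosomes 1 to 16).]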

Supplementary Figure 6

Dotplot-based deconvolution of the AMK (pre-τ), AGK (pre-ρ) and oil palm genomes into 5 CARs.

The complete dotplot-based comparative genomics deconvolution into reconstructed CARs of the synteny and paralogy observed between the oil palm (pre-p, 10 chromosomes and 13916 genes), AMK (pre-τ, 5 chromosomes and 6707 genes) and AGK (pre-ρ, 7 chromosomes and 7010 genes) genomes validates the five proposed protochromosomes as the origin of monocots, i.e. AMK. The AGK-AMK (right), AGK-oil palm (left) and oil palm-AMK (centre) dotplot comparisons illustrate the complete deconvolution of synteny/paralogy relationships (diagonals) into 5 CARs (color code). As a case example, AK1 (external red arrows) gives rise to orthologous/paralogous blocks on AGK (chromosomes A1-4-6) and oil palm (chromosomes 1-3-6-7). The oil palm (16 chromosomes, z-axis) and AGK (7 ancestral chromosomes from the rice, Brachypodium and sorghum comparison, y-axis) genomes are entirely covered by 5 independent ancestral chromosomes (AMK CARs, x-axis) that do not share any orthologous or paralogous relationship.

[Figure: three-way dotplot with AMK (AK1 to AK5), AGK (chromosomes 1 to 7) and Oil Palm axes.]

Supplementary Figure 7

Dotplot-based deconvolution of the AMK (pre-τ) and pineapple genomes into 5 CARs.

The dotplot-based comparative genomics of the pineapple genome (25 chromosomes and 23892 genes) and the pre-τ AMK genome (5 chromosomes and 6707 ordered protogenes) revealed the precise nature of the σ event as a hexaploidization, with clear 1-to-6 chromosomal relationships (two regions inherited from τ and three from σ) between the AMK and pineapple genomes. Dotplot comparison (top) of AMK (post-τ, y-axis) and pineapple (x-axis) illustrating the 1-to-6 relationship (horizontal) observed between the 5 AMK protochromosomes (or CARs, diagonal color code) and the pineapple chromosomes. As a case example for AK1 (bottom), the number of identified conserved genes (y-axis) is shown as a distribution over 6 orthologous pineapple chromosomes (x-axis). The 1-to-6 relationship observed between AMK and pineapple involves τ (reported as a tetraploidization) and σ, a paleohexaploidization event. The difference in ancestral gene retention observed between the triplicated regions inherited from σ in pineapple (chromosomes 11-12-16 and chromosomes 1-5-10 as case examples) is in favour of a post-polyploidy subgenome dominance, with chromosomes 10-11 as the least fractionated (LF), chromosomes 5-16 as the medium fractionated (MF1) and chromosomes 1-12 as the most fractionated (MF2) compartments.

The comparison also made it possible to derive the AMK pre-σ with 9 chromosomes (from the fusion of the two ancestral post-τ chromosomes A7 and A9) and the AMK post-σ with 27 chromosomes. The σ event shows the characteristics of subgenome dominance following hexaploidization, involving MF2 (most fractionated), MF1 (medium fractionated) and LF (least fractionated) compartments, as reported for the ancestral hexaploidization in rosids by Murat et al. (Ref. 24, cf. manuscript) and in Brassicaceae by Murat et al. (Ref. 32, cf. manuscript).
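The 1-to-6 relationship and the 27-chromosome post-σ karyotype both follow from multiplying ploidy factors. The sketch below is only a worked version of that arithmetic, using the x2 (τ) and x3 (σ) factors given in the captions; the names are illustrative.

```python
from math import prod

# Ploidy factors as referenced in the captions: tau (x2), sigma (x3).
PLOIDY_FACTOR = {"tau": 2, "sigma": 3}


def expected_regions(events):
    """Number of descendant regions expected per ancestral region."""
    return prod(PLOIDY_FACTOR[event] for event in events)


amk_to_pineapple = expected_regions(["tau", "sigma"])  # 2 * 3 = 6
post_sigma_n = 9 * PLOIDY_FACTOR["sigma"]              # n=9 pre-sigma -> 27

print(amk_to_pineapple, post_sigma_n)  # 6 27
```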

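[Figure: top, dotplot of AMK (chromosomes 1 to 5, y-axis) against pineapple (chromosomes 1 to 25, x-axis), with the τ and σ events marked; bottom, gene-count distributions for AK1 across pineapple chromosomes PA1, PA5, PA10, PA11, PA12 and PA16, labelled LF, MF1 and MF2.]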

Supplementary Figure 8

Dotplot-based deconvolution of the AMK (pre-τ and post-σ) and oil palm (pre-p) genomes into 5 CARs.

The complete dotplot-based comparative genomics deconvolution of the synteny observed between the oil palm (pre-p, 10 chromosomes and 13916 genes), AMK pre-τ (5 chromosomes and 6707 genes) and AMK post-σ (27 chromosomes and 2708 genes) genomes into five independent CARs validated the origin of monocots from the five proposed protochromosomes, as well as the tetraploid and hexaploid nature of the τ (referenced as x2) and σ (referenced as x3) polyploidization events, respectively.

a. The AMK (pre-τ, 5 chromosomes)/AMK (post-σ, 27 chromosomes) (right), AMK (post-σ, 27 chromosomes)/oil palm (pre-p, 10 chromosomes) (left) and AMK (pre-τ, 5 chromosomes)/oil palm (pre-p, 10 chromosomes) (centre) dotplot comparisons illustrate the complete deconvolution of synteny/paralogy relationships (diagonals) into 5 CARs (color code, bottom right).

b. AMK pre-τ, AMK post-σ and oil palm pre-p are entirely covered by 5 independent ancestral chromosomes (AMK CARs), defining a 1-to-6 (τ(x2) and σ(x3)) chromosomal relationship between AMK pre-τ / AMK post-σ, a 1-to-6 (τ(x2) and σ(x3)) chromosomal relationship between AMK post-σ / oil palm pre-p and a 1-to-2 (τ(x2)) chromosomal relationship between AMK pre-τ / oil palm pre-p (as for AMK post-σ / AMK post-σ), as detailed in the table, with the ancestral chromosomes referred to as AK’X’ and the modern chromosomes as ’X’.

a

[Figure: three-way dotplot with AMK (AK1 to AK5), AMK post-σ (chromosomes 1 to 27) and oil palm pre-p axes; two of the comparisons are annotated x6 = τ(x2) + σ(x3).]

b

AMK Pre-τ  AMK Post-τ/Pre-p (x2)  Oil palm (x4)  AMK Post-σ (x6)    AGK Pre-ρ (oil palm) (x6)  Pineapple (x6)
AK1        1-2                    1-3-6-7        1-2-3-4-5-6        1-4-6                      1-5-10-11-12-16
AK2        3-4                    4-11-12-16     7-8-9-10-11-12     2-5                        3-7-8-9-22-24-25
AK3        5-6                    2-5-10-14      13-14-15-16-17-18  1-3-7                      4-12-13-14-18-23
AK4        7-8                    3-7-10-13-15   19-20-21-22-23-24  2-3-4-5-6-7                1-2-6-12-15-20
AK5        9-10                   2-7-8-9-10     19-20-21-25-26-27  4-5-6-7                    1-2-5-6-12-14-17-19-20-21

Supplementary Figure 9

AEK (pre-γ, post-γ) reconstruction.

AEK was reconstructed by comparing cocoa, grape and peach, following the strategy detailed in the ‘Methods’ section of the manuscript, delivering a post-γ AEK with 21 chromosomes and 9022 genes and a pre-γ AEK with 7 chromosomes and 6284 genes.

a. Detailed procedure for AEK reconstruction from modern genomes (top) and the identification of pPGs, core-pPGs and oPGs following the karyotyping and enrichment procedures for the pre-γ and post-γ AEK ancestors. The proposed pre-γ and post-γ AEK ancestors considerably refine the previously reported AEK of Murat et al. (Ref. 24, cf. manuscript), which was based on only 7072 pPGs, corresponding to 48 % of the 14730 newly defined pPGs for the 21-protochromosome AEK, and on 626 oPGs, corresponding to 10 % of the 6284 newly defined oPGs for the 7-protochromosome AEK.

b. Dotplot representation of the comparison between AEK ancestors (n=7, n=21; y-axis) and modern species (x-axis), with a five-color code highlighting the AMK chromosomes. The accuracy of the reconstructed post-γ AEK and pre-γ AEK is validated by the dotplot comparisons, which show 100% coverage of all chromosomes, either from modern species or from the γ-block triplets, by the reconstructed n=7 and n=21 AEK protochromosomes.

c. The inferred n=7 AEK (pre-γ, AEK1-7) defines exhaustive and accurate syntenic chromosome relationships between the modern eudicots, as detailed in the table. The transition from the pre-γ AEK to the post-γ AEK involved seven known paralogous ancestral chromosome triplets (using the grape chromosome nomenclature): A1-A14-A17, A2-A15-A12-A16, A3-A4-A7-A18, A4-A9-A11, A5-A7-A14, A6-A8-A13, A10-A12-A19.
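The two percentages quoted in panel a can be recomputed directly from the counts in the caption. The snippet below is only a worked check; the helper name is ours.

```python
def percent(part, whole):
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


# Counts quoted in the caption: previous AEK (Ref. 24) vs the AEK defined here.
ppg_share = percent(7072, 14730)  # pPGs, 21-protochromosome AEK
opg_share = percent(626, 6284)    # oPGs, 7-protochromosome AEK

print(f"{ppg_share:.1f} % of pPGs, {opg_share:.1f} % of oPGs")
```

Rounded to whole percentages, these agree with the 48 % and 10 % given in the caption.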

a

Modern genomes:
Theobroma cacao CocoaGen DB: 10 chromosomes, 32862 genes
Prunus persica v1.0: 8 chromosomes, 27256 genes
Vitis vinifera Genoscope 12X: 19 chromosomes, 23647 genes

Ancestor Post-γ
Karyotyping
Step I. 14730 potential protogenes (pPG).
Step II. 11500 pPG conserved in the three species (core-pPG).
Step III. 11269 pPG in Syntenic Blocks (SB).
Step IV. 5780 pPG after filtering.
Step V. 5720 protogenes (oPG) in 21 protochromosomes.
Enrichment
Step VI. 13093 pPG non core-pPG.
Step VII. 11310 pPG in Syntenic Blocks.
Step VIII. 9022 oPG in 21 protochromosomes.

Ancestor Pre-γ
Karyotyping
Step II. 499 pPG (duplicated).
Step III. 361 pPG / 42 SB.
Step IV. 361 pPG / 42 SB.
Step V. 352 oPG / 41 SB, 7 protochromosomes.
Enrichment
Step VI’. 335 oPG / 41 groups, 7 protochromosomes.
Step VII’. 6284 oPG in 7 protochromosomes.

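b

[Figure: dotplots. Ancestor Pre-γ (chromosomes 1 to 7, y-axis) and Ancestor Post-γ (chromosomes 1 to 21, y-axis) against Grape (1 to 19), Peach (1 to 8) and Cacao (1 to 10) on the x-axis; the Ancestor Post-γ (1 to 21) also appears on the x-axis of the pre-γ comparison.]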

c

AEK Pre-γ   AEK Post-γ   Grape      Peach       Cocoa
AEK1        1-2-3        2-15-16    2-2-5       1-3-3
AEK2        4-5-6        4-9-11     1-3-7       6-9-9
AEK3        7-8-9        5-7-14     1-6-8       1-4-5
AEK4        10-11-12     6-8-13     2-4-6-7     5-9-10
AEK5        13-14-15     3-4-7-18   1-4-6-7-8   1-2-8
AEK6        16-17-18     1-14-17    1-3-4-5     2-3-4
AEK7        19-20-21     10-12-19   3-4-4       1-2-6-7

Supplementary Figure 10

Dotplot-based deconvolution of the AMK, AEK and Amborella genomes into 15 CARs.

In order to identify the most parsimonious structure of the MRCA of eudicots and monocots from the comparison of AEK and AMK, an outgroup species is required. Amborella trichopoda is a basal angiosperm with no specific WGD event, making it the most interesting candidate for reconstructing the eudicot/monocot MRCA. The Amborella genome contains 13 chromosomes for an estimated size of 870 Mb. The available genome is assembled into 5745 scaffolds (706 Mb) with 26846 annotated genes. 38 scaffolds are assigned by FISH to 12 of the 13 chromosomes, representing 34% of the genome (i.e. 9252 annotated genes, 204 Mb). The dotplot-based comparative genomics of the reconstructed AEK and AMK unveiled a synteny signal between these two principal families of flowering plants, in the form of 15 CARs recovered from 1175 orthologs. These CARs display a higher degree of structural conservation between AMK and the Amborella genome (one-to-one chromosome relationships on average, i.e. 1.3) than between AEK and Amborella (one-to-two chromosome relationships on average, i.e. 1.9). The complete deconvolution of the syntenies observed between the Amborella, AMK (pre-τ) and AEK (pre-γ) genomes into independent CARs ultimately validated the existence of an angiosperm progenitor with a minimum of fifteen proposed protochromosomes.

a. The AMK (pre-τ, 5 chromosomes)/AEK (pre-γ, 7 chromosomes) (right), AMK (pre-τ, 5 chromosomes)/Amborella (13 chromosomes) (left) and AEK (pre-γ, 7 chromosomes)/Amborella (13 chromosomes) (centre) dotplot comparisons illustrate the complete deconvolution of synteny/paralogy relationships (diagonals, color code from the 5 AMK CARs, top) into 15 CARs (rectangles). A 3-D dotplot comparison is provided at the centre, with the associated 2-D dotplots provided for details.

b. Chromosome-to-chromosome synteny relationships (table lines corresponding to colored arrows in panel a) illustrate a 1.3 ratio between Amborella and AMK chromosomes and a 1.9 ratio between Amborella and AEK chromosomes.

c. Illustration of the close structural relationship at the chromosome level between Amborella and AMK compared to Amborella and AEK: Amborella chromosome 4 is structurally conserved with AMK chromosome 4, whereas two orthologous regions are identified in AEK chromosomes 5 (CAR number 14 in panel a) and 6 (CAR number 8 in panel a).

d. Illustration of the synteny relationships (15 CARs from panel a, detailed in the table of panel b) as colored blocks (reflecting the 5 AMK CARs) between Amborella, AEK and AMK, suggesting AMK as the closest representative of the angiosperm MRCA of 15 CARs. The comparison of the reconstructed AEK (pre-γ, 7 chromosomes and 6284 protogenes) and AMK (pre-τ, 5 chromosomes and 6707 protogenes) delivers a syntenic signal between these two main families of flowering plants consisting of 15 CARs recovered from 1175 orthologs.

a

[Figure: 3-D dotplot of AMK (AK1 to AK5), AEK and Amborella trichopoda, with the associated 2-D dotplots; the 15 CARs are numbered 1 to 15 on the panels.]

b

Region counts (last two columns) are given once per Amborella chromosome, on the first row of each group.

CARs AEK-AMK  AEK chr (n=7)  AMK chr (n=5)  Amborella trichopoda chr (n=13)  Nb AEK regions  Nb AMK regions
5             2              5              1                                3               2
12            7              1              1
4             1              1              1
11            6              1              2                                2               2
13            3              3              2
8             6              4              4                                2               1
14            5              4              4
2             2              1              5                                1               1
3             1              2              7                                3               1
7             4              2              7
10            3              2              7
6             5              2              9                                1               1
1             4              3              8|12                             1               1
9             6              5              NA                               NA              NA
15            2              4              NA
Average                                                                      1.9             1.3

c

[Figure: Amborella chr4 aligned with AMK chr4 and with AEK chr5 (CAR 14) and AEK chr6 (CAR 8).]

d

[Figure: schematic of Amborella trichopoda (basal angiosperms, chromosomes 1 to 13, plus x), AEK (eudicots, 1 to 7) and AMK (monocots, 1 to 5) within the angiosperms, related to the eudicot-monocot MRCA (CARs 1 to 15).]
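The 1.9 and 1.3 averages reported in panel b can be recomputed from the per-chromosome region counts of the table. The sketch below assumes our reading of the flattened table (one pair of counts per Amborella chromosome, with the NA rows left out of the average); the dictionary layout is illustrative.

```python
# (Nb AEK regions, Nb AMK regions) per Amborella chromosome, as read from
# the panel b table; CARs without an assigned Amborella chromosome (NA) are
# excluded, which is an assumption about how the averages were obtained.
REGIONS = {
    "1": (3, 2),
    "2": (2, 2),
    "4": (2, 1),
    "5": (1, 1),
    "7": (3, 1),
    "9": (1, 1),
    "8|12": (1, 1),
}

avg_aek = sum(aek for aek, _ in REGIONS.values()) / len(REGIONS)
avg_amk = sum(amk for _, amk in REGIONS.values()) / len(REGIONS)

print(f"AEK: {avg_aek:.1f}, AMK: {avg_amk:.1f}")
```

Under these assumptions the averages are 13/7 and 9/7, which round to the 1.9 and 1.3 ratios quoted in the caption.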

Supplementary Figure 11

Gene Ontology enrichment comparing MRCA and extant genomes.

We compared 37 plant species (13 monocots, 20 eudicots, one basal angiosperm (Amborella trichopoda) and three outgroup species: Picea abies as a representative of gymnosperms, Physcomitrella patens as a representative of mosses and Chlamydomonas reinhardtii as a representative of single-celled green algae) to identify 22899 orthologous gene clusters representative of the angiosperm MRCA gene pool, including 10263 gene groups specific to flowering plants (absent from the outgroup species). We performed a GO (Gene Ontology) enrichment analysis at the Biological process, Cellular component and Molecular function levels. The panels show the enriched GO terms obtained when comparing (a) the 10263 angiosperm-specific gene clusters to the 22899 MRCA gene pool, (b) the 22899 MRCA gene pool to that of modern species, taking Arabidopsis thaliana as the reference with the most robust GO terms and classification, and (c) the modern genomes to the 22899 MRCA gene pool. The GO analysis in panel a delivers terms enriched in angiosperms compared to outgroups. The GO analysis in panel b delivers terms enriched in the MRCA compared to extant species, i.e. basic functions or processes already present in the ancestor. Finally, the GO analysis in panel c delivers terms enriched in extant species compared to the MRCA, i.e. functions amplified during evolution.

[Figure, panel a: 10263 angiosperm-specific genes compared to 22899 MRCA genes (Biological process, Cellular component, Molecular function).]

[Figure, panel b: 22899 MRCA genes compared to modern genomes (Arabidopsis) (Biological process, Cellular component, Molecular function).]

[Figure, panel c: modern genomes (Arabidopsis) compared to 22899 MRCA genes (Biological process, Cellular component, Molecular function).]

Supplementary Figure 12

Genomic evolutionary plasticity of the AEK and AGK genomes.

Comparisons of the genomes of the MRCA and extant modern species provided insight into the structural plasticity of angiosperm genomes and the diversification of the response to polyploidy: (1) post-duplication chromosomes have similar gene distributions, from the gene-rich telomeric regions to the gene-poor centromeric regions; (2) lineage-specific genes (i.e. genes not mapped onto the ancestral karyotypes, referred to as ‘non-AK genes’) are preferentially located in pericentromeric and subtelomeric regions; (3) the ancestral gene pool is partitioned between paralogous blocks, forming the most fractionated (MF, also described as sensitive, S) and least fractionated (LF, also described as dominant, D) chromosomal compartments; (4) gene pairs evolved differentially between rapidly evolving and slowly evolving species (such as rice for the monocots and grape for the dicots); (5) ancestral genes tend to have larger numbers of exons than non-ancestral genes; (6) genes conserved in grasses have a higher GC content than those conserved in eudicots. The genomic features analysed (gene number, AK genes, Ks, exon number, GC content) are reported in columns for the monocot and eudicot species investigated (in rows). Statistically significant differences (t-test, p<0.05) are indicated by red stars for each genomic feature tested (columns). Avg = average, SD = standard deviation, % = percentage, Spe = specific, LF = least fractionated, MF = most fractionated.

Clade     Species  Partitioning  Genes  AK genes  Ks (median)  Exon number (avg)  % GC content (avg)  % GC content (SD)
Monocots  Rice     LF            39049  4503      1.14         5.8                57                  9.7
                   MF                   2225                   5.8                57                  9.8
                   Spe                  32321                  3.8                58                  10
          Brachy   LF            26552  4514      1.21         6                  56                  9
                   MF                   2274                   6                  57                  9
                   Spe                  19764                  4.7                56                  9
          Sorghum  LF            27607  5084      1.37         5.8                57                  9.6
                   MF                   2533                   5.7                57                  9.7
                   Spe                  19990                  4.5                57                  10
Eudicots  Grape    LF            26346  3021      1.32         7.4                46                  3.3
                   MF1                  2086                   7.2                46                  3.4
                   MF2                  1512                   7.5                46                  3.7
                   Spe                  19727                  5.5                44                  6
          Cacao    LF            46144  2609      1.9          7                  45                  3
                   MF1                  1854                   6.8                44                  3
                   MF2                  1326                   7.3                45                  3
                   Spe                  40355                  4.3                41                  5.5
          Peach    LF            27864  2719      1.65         6.7                46                  3.7
                   MF1                  1877                   6.6                46                  3.7
                   MF2                  1314                   7                  46                  4
                   Spe                  21954                  4.4                45                  4.8

Legend: red star, P-value < 0.05
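The gene counts reported for this figure can be cross-checked with a few lines of Python. The sketch below is not part of the original analysis. It assumes that, for each species, the LF, MF (or MF1/MF2) and Spe counts partition the total gene number, and that the AK genes are the LF and MF genes taken together (the caption defines lineage-specific genes as 'non-AK genes'). The names `GENE_COUNTS` and `partition_summary` are illustrative only.

```python
# Gene counts transcribed from the Supplementary Figure 12 table.
# Each entry: (total genes, {compartment: gene count}).
GENE_COUNTS = {
    "Rice":    (39049, {"LF": 4503, "MF": 2225, "Spe": 32321}),
    "Brachy":  (26552, {"LF": 4514, "MF": 2274, "Spe": 19764}),
    "Sorghum": (27607, {"LF": 5084, "MF": 2533, "Spe": 19990}),
    "Grape":   (26346, {"LF": 3021, "MF1": 2086, "MF2": 1512, "Spe": 19727}),
    "Cacao":   (46144, {"LF": 2609, "MF1": 1854, "MF2": 1326, "Spe": 40355}),
    "Peach":   (27864, {"LF": 2719, "MF1": 1877, "MF2": 1314, "Spe": 21954}),
}


def partition_summary(counts=GENE_COUNTS):
    """Per species: do the compartments sum to the total, and what share is AK?"""
    summary = {}
    for species, (total, parts) in counts.items():
        # Assumption: AK genes = every compartment except the species-specific one.
        ak_genes = sum(n for name, n in parts.items() if name != "Spe")
        summary[species] = {
            "sums_to_total": sum(parts.values()) == total,
            "ak_fraction": ak_genes / total,
        }
    return summary


if __name__ == "__main__":
    for species, row in partition_summary().items():
        print(f"{species:8s} consistent={row['sums_to_total']} "
              f"AK fraction={row['ak_fraction']:.3f}")
```

Under this reading the compartment counts add up exactly to the reported gene number for all six species, which supports the column layout used when reading the table.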


Supplementary Figure 13

Dating the Angiosperm origin.

A total of 286 orthologous gene clusters, from the 1175 angiosperm MRCA genes, containing single-copy genes from peach, grape, cocoa, Brachypodium, rice and sorghum were used to assess the age of the angiosperm origin (considered here as the monocot/eudicot speciation date) using BEAUti and BEAST, and visualized in FigTree and DensiTree (see ‘Methods’ section of the main manuscript), with minimum age calibrations from fossils of 65 mya for grasses and 125 mya for the eudicots. We inferred that angiosperms originated 214 mya (between 190 and 238 mya). The figure illustrates the maximum clade credibility tree from divergence time estimates of angiosperms based on 286 ancestral genes conserved in peach, grape, cocoa, Brachypodium, rice and sorghum. Branch lengths illustrate the average substitutions per site according to the time scale (in million years, my), which is associated with the major geological epochs (bottom). The 95% highest posterior density (HPD) estimates for each well-supported speciation event are represented by horizontal blue bars, with age intervals given in brackets. Red asterisks represent minimum age fossil calibrations of 65 my for grasses and 125 my for the eudicots. Dashed horizontal black lines represent age estimates of the angiosperm origin from the literature (references cited) and from Timetree (http://www.timetree.org/).

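Figure: maximum clade credibility tree.
Tips: Sorghum, Rice, Brachy (Grasses); Grape, Peach, Cocoa (Eudicots).
Time axis: 0 to 350 million years, ticks every 50.
Geological scale: Carboniferous, Permian (Paleozoic); Triassic, Jurassic, Cretaceous (Mesozoic); Tertiary, Quaternary (Cenozoic).
Angiosperm origin: 214 my [190-238].
Other node ages (my): 73, 68, 98, 73; 95% HPD intervals: [87-109], [64-81], [65-81], [60-75].
Fossil calibrations: two red asterisks.
Literature estimates of the angiosperm origin (my): Smith et al. (2010) [182-257]; Seng et al. (2014) [225-240]; Magallón and Sanderson (2005) [176-317]; Timetree [168-194].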
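The age intervals shown in the figure can be compared programmatically. The following Python sketch is an illustration, not part of the original BEAST analysis: it checks which literature intervals overlap the interval inferred here (190-238 my) and which contain the point estimate of 214 my. The interval values are transcribed from the figure labels; the function and variable names are hypothetical.

```python
# Values transcribed from the Supplementary Figure 13 labels (million years).
STUDY_ESTIMATE = 214
STUDY_INTERVAL = (190, 238)
LITERATURE = {
    "Smith et al. (2010)": (182, 257),
    "Seng et al. (2014)": (225, 240),
    "Magallón and Sanderson (2005)": (176, 317),
    "Timetree": (168, 194),
}


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_with_literature(estimate=STUDY_ESTIMATE,
                            interval=STUDY_INTERVAL,
                            literature=LITERATURE):
    """For each source: does it overlap the study interval, and contain the estimate?"""
    return {
        source: {
            "overlaps": overlaps(interval, bounds),
            "contains_estimate": bounds[0] <= estimate <= bounds[1],
        }
        for source, bounds in literature.items()
    }


if __name__ == "__main__":
    for source, row in compare_with_literature().items():
        print(f"{source}: overlaps={row['overlaps']}, "
              f"contains 214 my={row['contains_estimate']}")
```

With these values, all four literature intervals overlap the 190-238 my interval, while only two of them contain the 214 my point estimate.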


Supplementary Table 5

Synteny relationships between AEK, AGK, AMK based on MRCA.

Chromosomal conservation (rows) between MRCA (CARs), AMK (AK1-AK5), AEK (n=7 pre-γ and n=21 post-γ), AGK (n=7 pre-ρ and n=12 post-ρ) and the modern monocot (rice as representative, with chromosomes 1 to 12) and eudicot (grape as representative, with chromosomes 1 to 19) species (columns).

Eudicot side, grouped by AMK chromosome:

AMK (n=5)  MRCA    Pre-γ (AEK n=7)  Post-γ (AEK n=21)  Grape (reference)
AK1        CAR 4   Ch1              Ch1-2-3            Ch2-15-16
           CAR 2   Ch2              Ch4-5-6            Ch4-9-11
           CAR 11  Ch6              Ch16-17-18         Ch1-14-17
           CAR 12  Ch7              Ch19-20-21         Ch10-12-19
AK2        CAR 3   Ch1              Ch1-2-3            Ch2-15-16
           CAR 10  Ch3              Ch7-8-9            Ch5-7-14
           CAR 7   Ch4              Ch10-11-12         Ch6-8-13
           CAR 6   Ch5              Ch13-14-15         Ch3-4-7-18
AK3        CAR 13  Ch3              Ch7-8-9            Ch5-7-14
           CAR 1   Ch4              Ch10-11-12         Ch6-8-13
AK4        CAR 15  Ch2              Ch4-5-6            Ch4-9-11
           CAR 14  Ch5              Ch13-14-15         Ch3-4-7-18
           CAR 8   Ch6              Ch16-17-18         Ch1-14-17
AK5        CAR 5   Ch2              Ch4-5-6            Ch4-9-11
           CAR 9   Ch6              Ch16-17-18         Ch1-14-17

Monocot side, grouped by AMK chromosome:

AMK (n=5)  Pre-ρ (AGK n=7)  Post-ρ (AGK n=12)  Rice (reference)
AK1        Ch1              Ch1-5              Ch1-5
           Ch4              Ch2-6              Ch2-6
           Ch6              Ch8-9              Ch8-9
           Ch3              Ch3-7              Ch3-7
AK2        Ch2              Ch3-10             Ch3-10
           Ch5              Ch2-4              Ch2-4
AK3        Ch1              Ch1-5              Ch1-5
           Ch3              Ch3-7              Ch3-7
           Ch7              Ch11-12            Ch3-7
AK4        Ch2              Ch3-10             Ch3-10
           Ch3              Ch3-7              Ch3-7
           Ch4              Ch2-6              Ch2-6
           Ch5              Ch2-4              Ch2-4
           Ch6              Ch8-9              Ch8-9
           Ch7              Ch11-12            Ch11-12
AK5        Ch4              Ch2-6              Ch2-6
           Ch5              Ch2-4              Ch2-4
           Ch6              Ch8-9              Ch8-9
           Ch7              Ch11-12            Ch11-12
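The eudicot side of this table can be held in a small lookup structure and checked for internal consistency. The Python sketch below is illustrative only. It assumes that hyphenated labels such as Ch1-2-3 list individual chromosomes, and it encodes just the pre-γ to post-γ to grape relationships transcribed from the table; `PRE_GAMMA_MAP` and `summarize` are hypothetical names.

```python
from collections import Counter

# pre-γ chromosome -> (post-γ chromosomes, grape chromosomes),
# transcribed from Supplementary Table 5.
PRE_GAMMA_MAP = {
    1: ((1, 2, 3), (2, 15, 16)),
    2: ((4, 5, 6), (4, 9, 11)),
    3: ((7, 8, 9), (5, 7, 14)),
    4: ((10, 11, 12), (6, 8, 13)),
    5: ((13, 14, 15), (3, 4, 7, 18)),
    6: ((16, 17, 18), (1, 14, 17)),
    7: ((19, 20, 21), (10, 12, 19)),
}


def summarize(mapping=PRE_GAMMA_MAP):
    """Check the n=7 -> n=21 relationship and the coverage of grape chromosomes."""
    post = Counter(c for post_gamma, _ in mapping.values() for c in post_gamma)
    grape = Counter(c for _, grape_chrs in mapping.values() for c in grape_chrs)
    return {
        # Number of distinct post-γ chromosomes (expected: 21).
        "post_gamma_count": len(post),
        # Every pre-γ chromosome is represented by exactly three post-γ copies.
        "each_pre_gamma_tripled": all(len(p) == 3 for p, _ in mapping.values()),
        # Grape chromosomes that appear at least once.
        "grape_covered": sorted(grape),
        # Grape chromosomes assigned to more than one pre-γ chromosome.
        "grape_shared": sorted(c for c, n in grape.items() if n > 1),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

In this encoding each of the 7 pre-γ chromosomes corresponds to three post-γ chromosomes (21 in total), all 19 grape chromosomes are covered, and a few grape chromosomes are shared between two pre-γ chromosomes.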
