
Dealing with composition convergence to place plastids among Cyanobacteria

Blaise Li

Centro de Ciências do Mar, Universidade do Algarve, Portugal

Institut für Populationsgenetik - 18/04/2013

Blaise Li Plastids, Cyanobacteria and composition biases

The endosymbiotic origin of plastids

[Figure: the endosymbiotic origin of plastids (after Keeling, 2010). Labels: Cyanobacteria; primary endosymbiosis; Glaucophyta, red algae, green algae, land plants; secondary endosymbiosis (two events); chromalveolates..., euglenids.]

The endosymbiotic origins of plastids

There is, however, a large number of endosymbiotic relationships seemingly based on photosynthesis that are less well understood and vary across the entire spectrum of integration, from passing associations to long term and seemingly well-developed partnerships (e.g. Rumpho et al. 2008). Indeed, the line between what is an organelle and what is an endosymbiont is an arbitrary one. There are a few different, specific criteria that have been argued to distinguish the two, the most common being the genetic integration of the two partners, and the establishment of a protein-targeting system. Most photosynthetic endosymbionts probably

[Figure 2 of P. J. Keeling, Review. The origin and fate of plastids, Phil. Trans. R. Soc. B (2010), p. 732. Labels: primary endosymbiosis (two events); secondary endosymbiosis (three events); serial secondary endosymbiosis (green alga); tertiary endosymbiosis (diatom, cryptomonad, haptophyte); one event marked "?". Lineages shown: glaucophytes, red algae, green algae, land plants, Paulinella, euglenids, chlorarachniophytes, cryptomonads, haptophytes, stramenopiles, ciliates, Apicomplexa, dinoflagellates, Dinophysis, Lepidodinium, Durinskia, Karlodinium.]

The endosymbiotic origin of plastids

[Figure: plastids shown together with cyanobacterial sections I, III and IV.]

Phylogenetic difficulties

Old events are generally difficult to resolve:

- mutational saturation
- changes in evolution modalities
- enough time for divergences and convergences in these modalities or in their consequences

→ Simple evolutionary models may not be appropriate.

Phylogenetic difficulties

Difficulty amplified because of endosymbiosis:

- simplification
- gene relocation
- changes in biochemical context

→ sequences missing or with modified evolutionary trends (potentially misleading)

Phylogenetic difficulties

Too straightforward analyses give conflicting results.

- rDNA and amino-acid data: early divergence of plastids
- protein coding gene data: plastids close to pluricellular Cyanobacteria

What is the cause of this incongruence?

→ We studied the phenomenon on a dataset of protein coding genes from plastids (or relocated in the plant host nucleus) and their (cyano)bacterial homologues.

Dataset

- 42 taxa, including 8 outgroup (non-cyano)bacteria, 16 Cyanobacteria, and plastids from 1 Glaucophyta, 4 Rhodophyta (red algae) and 13 Viridiplantae (green plants)
- Cyanobacteria groups present:
  - NOST-1 (section IV)
  - OSC-2 (section III)
  - SPM-3, SO-6, GBACT, UNIT+ (section I)
- 75 protein-coding genes, but 452 missing sequences (i.e. 14% overall, and up to 38 genes missing for one of the outgroup taxa)
- Concatenated dataset (cg75) and its translation (cp75)
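The missing-data figure quoted for the dataset can be checked with a few lines of arithmetic. This is only a sketch: the function name `missing_fraction` is made up for illustration, and it assumes the 14% is counted over all (taxon, gene) cells of the 42 × 75 supermatrix.

```python
# Figures quoted on the Dataset slide.
N_TAXA = 42
N_GENES = 75
N_MISSING = 452

# Taxon counts by group, as listed on the slide.
GROUPS = {
    "outgroup bacteria": 8,
    "Cyanobacteria": 16,
    "Glaucophyta": 1,
    "Rhodophyta": 4,
    "Viridiplantae": 13,
}


def missing_fraction(n_taxa, n_genes, n_missing):
    """Fraction of (taxon, gene) cells of the supermatrix with no sequence."""
    return n_missing / (n_taxa * n_genes)


# The group counts add up to the stated number of taxa.
assert sum(GROUPS.values()) == N_TAXA

overall = missing_fraction(N_TAXA, N_GENES, N_MISSING)
worst_taxon = 38 / N_GENES  # the outgroup taxon lacking 38 of the 75 genes

print(f"overall missing: {overall:.1%}")
print(f"worst taxon missing: {worst_taxon:.1%}")
```

Under that assumption, 452 missing sequences out of 3150 cells is a little over 14%, which agrees with the slide, and the worst-covered outgroup taxon lacks about half of the genes.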

First ML bootstrap analyses

- Analyses using RAxML
  - GTR+I+Γ for nucleotides
  - CPREV+I+Γ for amino-acids
  - 200 bootstrap pseudo-replicates
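A bootstrap pseudo-replicate is an alignment of the same size as the original, built by drawing alignment columns with replacement. The sketch below illustrates only that resampling step on a toy alignment. It is not how RAxML is invoked or implemented, and the names `bootstrap_replicate` and `toy` are hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Returns a new dict with the same taxa and the same number of columns.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Draw n_sites column indices; the same column may be drawn several times.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Toy alignment (made-up data, for illustration only).
toy = {"taxon_A": "ATGGCA", "taxon_B": "ATGGCC", "taxon_C": "ATGACA"}

rng = random.Random(0)  # fixed seed so the resampling is reproducible
replicates = [bootstrap_replicate(toy, rng) for _ in range(200)]
```

A tree is then inferred from each pseudo-replicate, and the proportion of replicates recovering a given clade is reported as its bootstrap support.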

First ML bootstrap analyses

[Figure: ML bootstrap trees for cg75 (nucleotides) and cp75 (its translation), with bootstrap proportions at the nodes (0.59 to 1.00 for cg75, 0.70 to 1.00 for cp75). Tip order in cg75: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), SO-6, Thermosynechococcus_elongatus (UNIT+), Cyanothece_PCC7425 (UNIT+), Acaryochloris_marina (UNIT+), SPM-3, NOST-1, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta. Tip order in cp75: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta, UNIT+, OSC-2, SO-6, NOST-1, SPM-3. Successive overlays annotate the "basal" GBACT on both trees, the "pluricellulars" and a "grade of Cyanobacteria" on cg75, and the "core" Cyanobacteria on cp75.]

First ML bootstrap analyses

- cp75 is a direct translation of cg75
  → The trees should be the same.
- But the analyses conflict in the identification of the plastid sister-group.
  → Something is not well modelled.

→ Can we have confidence in one of these trees?

Nucleotides or amino-acids?

- Here, low bootstrap suggests conflicting signals for nucleotides
- Nucleotide sequences are more likely to randomize with time
  - codon degeneracy → lowered selective pressure
  - only 4 states → convergence likely
- Selection on protein function stabilizes the amino-acid sequence
- But estimation of substitution matrix is easier for nucleotides (fewer states)
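The remark about the number of states can be made concrete by counting the free parameters of a general time-reversible substitution matrix. This is a sketch under the usual parameterisation: n(n-1)/2 exchangeabilities with one fixed as a reference, plus n-1 free equilibrium frequencies. The function name is hypothetical.

```python
def gtr_free_parameters(n_states):
    """Free parameters of a general time-reversible model with n_states states."""
    # Symmetric exchangeabilities, one of which is fixed to set the scale.
    exchangeabilities = n_states * (n_states - 1) // 2 - 1
    # Equilibrium frequencies sum to 1, so one is determined by the others.
    frequencies = n_states - 1
    return exchangeabilities + frequencies


nucleotide_params = gtr_free_parameters(4)    # 5 + 3
amino_acid_params = gtr_free_parameters(20)   # 189 + 19

print(nucleotide_params, amino_acid_params)
```

With 4 states the matrix has 8 free parameters, against 208 with 20 states. That is one reason why amino-acid analyses commonly rely on a pre-computed empirical matrix such as CPREV rather than estimating the matrix from the data.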

Nucleotide composition attraction

We focus on a particular type of reconstruction artefact: nucleotide composition attraction.

- mutation can be variously biased across taxa
- codon preference can be too
- this influences the composition of the genomes
- sites under lower selection constraint tend to conform to that composition

→ similar mutation biases and codon preferences may induce convergence in the nucleotide sequence, especially at 3rd codon position
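One simple way to look for such biases is to measure the G+C proportion separately at each codon position of a taxon's coding sequence. The sketch below assumes an in-frame (codon-aligned) sequence. The function name `gc_by_codon_position` and the toy sequence are illustrative only.

```python
def gc_by_codon_position(cds):
    """Return the G+C proportions at codon positions 1, 2 and 3.

    cds: in-frame coding sequence. Gaps and ambiguous characters keep
    their place in the reading frame but are not counted.
    """
    gc = [0, 0, 0]
    total = [0, 0, 0]
    for i, base in enumerate(cds.upper()):
        if base in "ACGT":
            total[i % 3] += 1
            if base in "GC":
                gc[i % 3] += 1
    # A position with no countable base yields NaN rather than an error.
    return tuple(g / t if t else float("nan") for g, t in zip(gc, total))


# Toy sequence: three Ala codons (GCT, GCA, GCC).
pos1, pos2, pos3 = gc_by_codon_position("GCTGCAGCC")
print(pos1, pos2, pos3)
```

In the toy sequence the first two positions are fixed by the amino acid while the third is free to vary, which is the kind of site expected to drift towards the compositional bias of the genome.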

Nucleotide composition attraction

The standard genetic code (rows: 1st base; columns: 2nd base; lines within a row: 3rd base T, C, A, G):

        T          C          A          G
T    TTT Phe    TCT Ser    TAT Tyr    TGT Cys
     TTC Phe    TCC Ser    TAC Tyr    TGC Cys
     TTA Leu    TCA Ser    TAA Ter    TGA Ter
     TTG Leu    TCG Ser    TAG Ter    TGG Trp
C    CTT Leu    CCT Pro    CAT His    CGT Arg
     CTC Leu    CCC Pro    CAC His    CGC Arg
     CTA Leu    CCA Pro    CAA Gln    CGA Arg
     CTG Leu    CCG Pro    CAG Gln    CGG Arg
A    ATT Ile    ACT Thr    AAT Asn    AGT Ser
     ATC Ile    ACC Thr    AAC Asn    AGC Ser
     ATA Ile    ACA Thr    AAA Lys    AGA Arg
     ATG Met    ACG Thr    AAG Lys    AGG Arg
G    GTT Val    GCT Ala    GAT Asp    GGT Gly
     GTC Val    GCC Ala    GAC Asp    GGC Gly
     GTA Val    GCA Ala    GAA Glu    GGA Gly
     GTG Val    GCG Ala    GAG Glu    GGG Gly

Composition and codon usage biases

[Figure: the cg75 ML bootstrap tree (tips, in order: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), SO-6, Thermosynechococcus_elongatus (UNIT+), Cyanothece_PCC7425 (UNIT+), Acaryochloris_marina (UNIT+), SPM-3, NOST-1, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta) with two panels aligned to the tips. Panel 1: codon usage log10 ratios (LeuT/LeuC, SerAG/SerTC, ArgA/ArgC), axis from -3 to 1. Panel 2: G+C proportion by codon position (1: −, 2: +, 3: ×), axis from 0 to 1. Successive overlays highlight the 3rd pos. G+C, the 1st pos. G+C, the ArgA bias and the LeuT bias.]
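The codon usage ratios named in the figure can be sketched as follows. The definitions are an assumption, since the slide does not spell them out: "LeuT/LeuC" is read here as the count of Leu codons starting with T (TTA, TTG) over those starting with C (CTN), and likewise AGY over TCN for Ser, and AGR over CGN for Arg. The names `FAMILIES` and `codon_usage_log_ratios` are hypothetical.

```python
import math
from collections import Counter

# Assumed definitions of the three ratios (numerator codons, denominator codons).
FAMILIES = {
    "LeuT/LeuC": (("TTA", "TTG"), ("CTT", "CTC", "CTA", "CTG")),
    "SerAG/SerTC": (("AGT", "AGC"), ("TCT", "TCC", "TCA", "TCG")),
    "ArgA/ArgC": (("AGA", "AGG"), ("CGT", "CGC", "CGA", "CGG")),
}


def codon_usage_log_ratios(cds):
    """log10 ratio of codon counts between the two families of Leu, Ser and Arg."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = Counter(cds[i:i + 3].upper() for i in range(0, usable, 3))
    ratios = {}
    for name, (numerator, denominator) in FAMILIES.items():
        n = sum(codons[c] for c in numerator)
        d = sum(codons[c] for c in denominator)
        # The ratio is undefined when either family is absent.
        ratios[name] = math.log10(n / d) if n and d else float("nan")
    return ratios


# Toy sequence: 10 TTA vs 1 CTG, 1 AGT vs 1 TCT, 1 AGA vs 100 CGT.
toy_cds = "TTA" * 10 + "CTG" + "AGT" + "TCT" + "AGA" + "CGT" * 100
ratios = codon_usage_log_ratios(toy_cds)
```

A strongly positive LeuT/LeuC or ArgA/ArgC value reflects a preference for the codon family with the less G+C-rich first position, which is why these ratios relate to the 1st position G+C content.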

3rd codon position removal

- One frequent approach: removing 3rd codon positions when doing large-scale phylogeny
- A less frequent approach is to use a model that acknowledges these composition bias differences

I will present a series of analyses starting from the application of the first approach to our nucleotide dataset.

3rd codon position removal

Genetic code table with the 3rd codon position removed (each block of four codons collapses to one doublet):

        T               C          A               G
T    TT- Phe, Leu    TC- Ser    TA- Tyr, Ter    TG- Cys, Ter, Trp
C    CT- Leu         CC- Pro    CA- His, Gln    CG- Arg
A    AT- Ile, Met    AC- Thr    AA- Asn, Lys    AG- Ser, Arg
G    GT- Val         GC- Ala    GA- Asp, Glu    GG- Gly
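Removing 3rd codon positions from a codon-aligned nucleotide alignment amounts to keeping the first two characters of every triplet. A minimal sketch, assuming each sequence is in frame. The function name and the toy alignment are illustrative and are not the tooling used for cg75_no3.

```python
def remove_third_positions(cds):
    """Keep codon positions 1 and 2 of an in-frame sequence."""
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


# Toy codon-aligned alignment (made-up data); gaps stay in frame.
toy = {
    "taxon_A": "ATGTTTCTA",
    "taxon_B": "ATGTTCCTG",
    "taxon_C": "ATG---TTA",
}
no3 = {name: remove_third_positions(seq) for name, seq in toy.items()}
print(no3)
```

In the toy data, taxon_A and taxon_B differ only at 3rd positions and become identical after removal, which shows both the intent (synonymous differences are discarded) and the cost (any non-synonymous signal carried by 3rd positions goes with them).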

3rd codon position removal

[Figure: ML bootstrap trees for cg75 and for cg75_no3 (3rd codon positions removed), with bootstrap proportions at the nodes (0.59 to 1.00 for cg75, 0.54 to 1.00 for cg75_no3). Tip order in cg75: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), SO-6, Thermosynechococcus_elongatus (UNIT+), Cyanothece_PCC7425 (UNIT+), Acaryochloris_marina (UNIT+), SPM-3, NOST-1, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta. Tip order in cg75_no3: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), SO-6, UNIT+, NOST-1, SPM-3, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta.]

3rd codon position removal

- UNIT+ monophyly restored
- But some signal not corresponding to synonymous substitutions was lost
- This signal can be saved by recoding instead of removing

3rd codon position recoding

Genetic code table with the 3rd codon position recoded using IUPAC ambiguity codes (Y = C or T, R = A or G, H = A, C or T, N = any base):

        T          C          A          G
T    TTY Phe    TCN Ser    TAY Tyr    TGY Cys
     TTY Phe    TCN Ser    TAY Tyr    TGY Cys
     TTN Leu    TCN Ser    TAR Ter    TGR Ter
     TTN Leu    TCN Ser    TAR Ter    TGG Trp
C    CTN Leu    CCN Pro    CAY His    CGN Arg
     CTN Leu    CCN Pro    CAY His    CGN Arg
     CTN Leu    CCN Pro    CAR Gln    CGN Arg
     CTN Leu    CCN Pro    CAR Gln    CGN Arg
A    ATH Ile    ACN Thr    AAY Asn    AGN Ser
     ATH Ile    ACN Thr    AAY Asn    AGN Ser
     ATH Ile    ACN Thr    AAR Lys    AGN Arg
     ATG Met    ACN Thr    AAR Lys    AGN Arg
G    GTN Val    GCN Ala    GAY Asp    GGN Gly
     GTN Val    GCN Ala    GAY Asp    GGN Gly
     GTN Val    GCN Ala    GAR Glu    GGN Gly
     GTN Val    GCN Ala    GAR Glu    GGN Gly
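The recoded table is consistent with a simple rule: for each codon, replace the 3rd base by the IUPAC code for the set of 3rd bases used by any codon of the same amino acid. The sketch below implements that rule. The rule is inferred from the table and is not a description of the author's actual tool, and the names used are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset(bases): code
    for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    }.items()
}

# For each amino acid (or stop), the set of bases seen at the 3rd position.
THIRD = {}
for codon, aa in CODE.items():
    THIRD.setdefault(aa, set()).add(codon[2])


def degenerate_third(cds):
    """Recode the 3rd position of every codon of an in-frame sequence."""
    out = []
    usable = len(cds) - len(cds) % 3  # a trailing partial codon is dropped
    for i in range(0, usable, 3):
        codon = cds[i:i + 3].upper()
        aa = CODE.get(codon)
        if aa is None:
            out.append(codon)  # gapped or ambiguous codon: left untouched
        else:
            out.append(codon[:2] + IUPAC[frozenset(THIRD[aa])])
    return "".join(out)


print(degenerate_third("TTTTTAATAATGTGATGGAGT"))
```

Under this rule, synonymous changes at the 3rd position become invisible, while the 3rd positions of Met and Trp codons, and the distinction between two-fold families such as Phe (TTY) and the TTN Leu codons, are retained.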

3rd codon position recoding

[Figure: ML bootstrap trees for cg75 and for cg75_degen3 (degenerate at 3rd pos., 27.35% recoded), with bootstrap proportions at the nodes (0.59 to 1.00 for cg75, 0.60 to 1.00 for cg75_degen3). Tip order in cg75: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), SO-6, Thermosynechococcus_elongatus (UNIT+), Cyanothece_PCC7425 (UNIT+), Acaryochloris_marina (UNIT+), SPM-3, NOST-1, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta. Tip order in cg75_degen3: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), SO-6, UNIT+, NOST-1, SPM-3, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta.]

3rd codon position recoding

- Similar effect as no3: UNIT+ monophyly restored
- But codon degeneracy exists at other positions, associated with switches between Leu, Arg and Ser families.

→ We looked at the effect of this signal by selectively removing it.

Degenerating 1st and 2nd codon positionsT C A G

T

TTTPhe

TCT

Ser

TATTyr

TGTCys

TTC TCC TAC TGCTTA

LeuTCA TAA

TerTGA Ter

TTG TCG TAG TGG Trp

C

CTT

Leu

CCT

Pro

CATHis

CGT

ArgCTC CCC CAC CGCCTA CCA CAA

GlnCGA

CTG CCG CAG CGG

A

ATTIle

ACT

Thr

AATAsn

AGTSer

ATC ACC AAC AGCATA ACA AAA

LysAGA

ArgATG Met ACG AAG AGG

G

GTT

Val

GCT

Ala

GATAsp

GGT

GlyGTC GCC GAC GGCGTA GCA GAA

GluGGA

GTG GCG GAG GGG

After recoding (1st and 2nd positions of Leu, Arg and Ser codons degenerated; 3rd positions untouched):

        T          C          A          G
T    TTT Phe    WST Ser    TAT Tyr    TGT Cys
     TTC Phe    WSC Ser    TAC Tyr    TGC Cys
     YTA Leu    WSA Ser    TAA Ter    TGA Ter
     YTG Leu    WSG Ser    TAG Ter    TGG Trp
C    YTT Leu    CCT Pro    CAT His    MGT Arg
     YTC Leu    CCC Pro    CAC His    MGC Arg
     YTA Leu    CCA Pro    CAA Gln    MGA Arg
     YTG Leu    CCG Pro    CAG Gln    MGG Arg
A    ATT Ile    ACT Thr    AAT Asn    WST Ser
     ATC Ile    ACC Thr    AAC Asn    WSC Ser
     ATA Ile    ACA Thr    AAA Lys    MGA Arg
     ATG Met    ACG Thr    AAG Lys    MGG Arg
G    GTT Val    GCT Ala    GAT Asp    GGT Gly
     GTC Val    GCC Ala    GAC Asp    GGC Gly
     GTA Val    GCA Ala    GAA Glu    GGA Gly
     GTG Val    GCG Ala    GAG Glu    GGG Gly
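The recoding shown in the table can be sketched in a few lines of Python. This is only an illustration, not the code used for the talk: the function name `degenerate12` and the `FAMILIES` structure are hypothetical, and the IUPAC prefixes (YT, MG, WS) are read off the recoded table.

```python
# Minimal sketch (hypothetical names): degenerate codon positions 1 and 2
# for the Leu, Arg and Ser families, leaving 3rd positions untouched.
LEU = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}
ARG = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}
SER = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}

# family name -> (codons of the family, IUPAC prefix replacing positions 1-2)
FAMILIES = {"Leu": (LEU, "YT"), "Arg": (ARG, "MG"), "Ser": (SER, "WS")}


def degenerate12(seq, families=("Leu", "Arg", "Ser")):
    """Recode positions 1 and 2 of the codons belonging to the given families."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        for name in families:
            codons, prefix = FAMILIES[name]
            if codon in codons:
                codon = prefix + codon[2]  # keep the 3rd base as is
                break
        out.append(codon)
    return "".join(out)


print(degenerate12("TTACGTAGCATGTTT"))
```

Restricting `families` to `("Ser",)` gives a serine-only variant of the same recoding.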

[Figure: reference tree cg75 next to tree cg75_degenerate12 (degenerate at 1st and 2nd pos., 7.62% recoded). Tip order in cg75_degenerate12: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), Synechococcus_elongatus (SO-6), Synechococcus_RCC307 (SO-6), Acaryochloris_marina (UNIT+), Thermosynechococcus_elongatus (UNIT+), Cyanothece_PCC7425 (UNIT+), SPM-3, NOST-1, OSC-2, Prochlorococcus_marinus (SO-6), Rhodophyta, Glaucophyta, Streptophyta, Chlorophyta. Node supports range from 0.59 to 1.00 in both trees; their assignment to nodes is lost in the extraction.]

Degenerating 1st and 2nd codon positions

- UNIT+ monophyly not restored.
- SO-6 split.
  → Is the removed signal actually useful? (more later)
- Lower supports suggest conflicting signals (Prochlorococcus and Rhodophyta misplacement).

→ Let's try to neutralize all synonymous substitutions...

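Degenerating all synonymous codon positions

Recoded genetic code (rows: 1st base; columns: 2nd base; 3rd base in the order T, C, A, G within each cell):

        T          C          A          G
T    TTY Phe    WSN Ser    TAY Tyr    TGY Cys
     TTY Phe    WSN Ser    TAY Tyr    TGY Cys
     YTN Leu    WSN Ser    TAR Ter    TGR Ter
     YTN Leu    WSN Ser    TAR Ter    TGG Trp
C    YTN Leu    CCN Pro    CAY His    MGN Arg
     YTN Leu    CCN Pro    CAY His    MGN Arg
     YTN Leu    CCN Pro    CAR Gln    MGN Arg
     YTN Leu    CCN Pro    CAR Gln    MGN Arg
A    ATH Ile    ACN Thr    AAY Asn    WSN Ser
     ATH Ile    ACN Thr    AAY Asn    WSN Ser
     ATH Ile    ACN Thr    AAR Lys    MGN Arg
     ATG Met    ACN Thr    AAR Lys    MGN Arg
G    GTN Val    GCN Ala    GAY Asp    GGN Gly
     GTN Val    GCN Ala    GAY Asp    GGN Gly
     GTN Val    GCN Ala    GAR Glu    GGN Gly
     GTN Val    GCN Ala    GAR Glu    GGN Gly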
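A possible way to apply this full-degeneracy recoding is sketched below. It is an illustration under assumptions, not the tool used for the talk: the names `GENETIC_CODE`, `DEGENERATE`, `STOPS` and `degenerate_all` are hypothetical, the degenerate codons are copied from the recoded table, and codons that are not plain A/C/G/T triplets (gaps, ambiguities) are simply left unchanged.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One degenerate codon per amino acid, as in the recoded table.
DEGENERATE = {
    "F": "TTY", "L": "YTN", "S": "WSN", "Y": "TAY", "C": "TGY", "W": "TGG",
    "P": "CCN", "H": "CAY", "Q": "CAR", "R": "MGN", "I": "ATH", "M": "ATG",
    "T": "ACN", "N": "AAY", "K": "AAR", "V": "GTN", "A": "GCN", "D": "GAY",
    "E": "GAR", "G": "GGN",
}
# Stop codons are recoded as shown on the slide.
STOPS = {"TAA": "TAR", "TAG": "TAR", "TGA": "TGR"}


def degenerate_all(seq):
    """Replace every codon by the degenerate codon of its amino acid."""
    seq = seq.upper()
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if codon in STOPS:
            codon = STOPS[codon]
        elif codon in GENETIC_CODE:
            codon = DEGENERATE[GENETIC_CODE[codon]]
        out.append(codon)  # anything else (gaps, ambiguities) stays as is
    return "".join(out)


# Two synonymous sequences become identical after recoding.
print(degenerate_all("CTGAGATCT"), degenerate_all("TTACGTAGC"))
```

After this recoding, only non-synonymous differences remain visible between sequences, which is the intent stated on the slides.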

[Figure: reference tree cg75 next to tree cg75_degen (all synonymous codon positions degenerated). Tip order in cg75_degen: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta, UNIT+, OSC-2, SO-6, NOST-1, SPM-3. Node supports range from 0.59 to 1.00 in both trees; their assignment to nodes is lost in the extraction.]

Degenerating all synonymous codon positions

- Core Cyanobacteria sister to plastids, as when using amino acids.
- 1st and 2nd position signal actually contributes to composition attraction. (Its neutralization helps when 3rd position synonymous signal is also neutralized.)
- What happens? Is it "good" or "bad" signal?

Degenerating serine codon positions 1 and 2

Serine synonymy is different from other synonymies:

- AGY (Ser) ↔ ACY (Thr) ↔ TCY (Ser)
- AGY (Ser) ↔ TGY (Cys) ↔ TCY (Ser)

A switch between the two Ser families requires either a double simultaneous substitution or a non-Ser intermediate.

→ It might lend itself less to composition convergence than other synonymous substitutions, and therefore may contain useful signal.
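The two paths listed above can be checked by enumerating the codons that lie one substitution away from both an AGY and a TCY codon. The sketch below does this with the standard genetic code; the helper names (`neighbours`, `intermediates`) are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbours(codon):
    """All codons that differ from `codon` by exactly one substitution."""
    return {
        codon[:i] + alt + codon[i + 1:]
        for i, base in enumerate(codon)
        for alt in BASES
        if alt != base
    }


def intermediates(src, dst):
    """Codons one substitution away from both `src` and `dst`."""
    return sorted(neighbours(src) & neighbours(dst))


for src, dst in (("AGT", "TCT"), ("AGC", "TCC")):
    steps = intermediates(src, dst)
    print(src, "->", dst, "via", [(c, GENETIC_CODE[c]) for c in steps])
```

For both pairs the only single-step intermediates encode Thr (ACY) or Cys (TGY), so no path of single substitutions stays within serine.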

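Degenerating serine codon positions 1 and 2

Recoded genetic code (only the 1st and 2nd positions of Ser codons are degenerated):

        T          C          A          G
T    TTT Phe    WST Ser    TAT Tyr    TGT Cys
     TTC Phe    WSC Ser    TAC Tyr    TGC Cys
     TTA Leu    WSA Ser    TAA Ter    TGA Ter
     TTG Leu    WSG Ser    TAG Ter    TGG Trp
C    CTT Leu    CCT Pro    CAT His    CGT Arg
     CTC Leu    CCC Pro    CAC His    CGC Arg
     CTA Leu    CCA Pro    CAA Gln    CGA Arg
     CTG Leu    CCG Pro    CAG Gln    CGG Arg
A    ATT Ile    ACT Thr    AAT Asn    WST Ser
     ATC Ile    ACC Thr    AAC Asn    WSC Ser
     ATA Ile    ACA Thr    AAA Lys    AGA Arg
     ATG Met    ACG Thr    AAG Lys    AGG Arg
G    GTT Val    GCT Ala    GAT Asp    GGT Gly
     GTC Val    GCC Ala    GAC Asp    GGC Gly
     GTA Val    GCA Ala    GAA Glu    GGA Gly
     GTG Val    GCG Ala    GAG Glu    GGG Gly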

[Figure: tree cg75_degen12S (degenerate at 1st and 2nd positions of Ser codons) with two per-taxon panels: codon usage log10 ratios (LeuT/LeuC, SerAG/SerTC, ArgA/ArgC; axis from 1 to -3) and G+C proportion by codon position (1: −, 2: +, 3: ×; axis from 1 to 0). Tip order: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), Synechococcus_elongatus (SO-6), Synechococcus_RCC307 (SO-6), Thermosynechococcus_elongatus (UNIT+), Cyanothece_PCC7425 (UNIT+), Acaryochloris_marina (UNIT+), SPM-3, NOST-1, OSC-2, Prochlorococcus_marinus (SO-6), Rhodophyta, Glaucophyta, Streptophyta, Chlorophyta. Node supports range from 0.78 to 1.00. Successive overlays highlight: 3rd pos. G+C; ArgA bias; 1st pos. G+C and LeuT bias.]
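The two quantities plotted next to the tree can be computed from a coding sequence roughly as follows. This is a sketch under assumptions: the family definitions (LeuT = TTR, LeuC = CTN, SerAG = AGY, SerTC = TCN, ArgA = AGR, ArgC = CGN) are inferred from the labels on the slides, and the function names are hypothetical.

```python
import math
from collections import Counter

# Assumed codon families, inferred from the slide labels.
PAIRS = {
    "LeuT/LeuC": ({"TTA", "TTG"}, {"CTT", "CTC", "CTA", "CTG"}),
    "SerAG/SerTC": ({"AGT", "AGC"}, {"TCT", "TCC", "TCA", "TCG"}),
    "ArgA/ArgC": ({"AGA", "AGG"}, {"CGT", "CGC", "CGA", "CGG"}),
}


def gc_by_position(seq):
    """G+C proportion at codon positions 1, 2 and 3 (unambiguous bases only)."""
    props = []
    for pos in range(3):
        bases = [b for b in seq.upper()[pos::3] if b in "ACGT"]
        gc = sum(b in "GC" for b in bases)
        props.append(gc / len(bases) if bases else float("nan"))
    return props


def family_log_ratios(seq):
    """log10 usage ratio for each pair of codon families (None if undefined)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    ratios = {}
    for name, (num, den) in PAIRS.items():
        n = sum(counts[c] for c in num)
        d = sum(counts[c] for c in den)
        ratios[name] = math.log10(n / d) if n and d else None
    return ratios


seq = "TTATTACTGAGTTCTCGTAGA"
print(gc_by_position(seq))
print(family_log_ratios(seq))
```

In practice these values would be computed per taxon on the concatenated alignment, ignoring gaps.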

Do SerAG ↔ SerTC switches bring accurate information?

When SerAG ↔ SerTC is removed, composition biases at third and first codon positions seem to lead to more artefacts.

Two hypotheses about the SerAG ↔ SerTC signal:
1. significant historical signal
2. important but conflicting misleading signal

Do SerAG ↔ SerTC switches bring accurate information?

The literature says that SerAG ↔ SerTC occurs easily through Thr and Cys intermediates. Is it the case here?

- Recoding of the data into 23 amino-acid states (distinct states for the families of Leu, Arg and Ser)
- MCMC under a GTR+I+Γ model, topology fixed to the one obtained with the normal amino-acid data
- Inferred substitution matrix: highest rates are ArgA ↔ ArgC, LeuC ↔ LeuT, and SerAG ↔ SerTC.

→ SerAG ↔ SerTC is more frequent than any non-synonymous substitution.
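The 23-state recoding can be illustrated as below. The state labels (LeuT, LeuC, ArgA, ArgC, SerAG, SerTC) follow the names used on the slides; the function name `state23` and the use of string labels rather than single characters are assumptions made for readability, and how the states were actually encoded for the MCMC is not stated in the talk.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def state23(codon):
    """Map a codon to one of 23 states: 17 unsplit amino acids plus
    two states each for Leu, Arg and Ser ("*" is returned for stops)."""
    aa = GENETIC_CODE[codon.upper()]
    first_two = codon[:2].upper()
    if aa == "L":
        return "LeuT" if first_two[0] == "T" else "LeuC"
    if aa == "R":
        return "ArgA" if first_two[0] == "A" else "ArgC"
    if aa == "S":
        return "SerAG" if first_two == "AG" else "SerTC"
    return aa


states = {state23(c) for c in GENETIC_CODE} - {"*"}
print(len(states), sorted(states))
```

With this coding, a change between the two Ser families is an ordinary state change, so its rate can be estimated alongside the non-synonymous rates of the substitution matrix.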

Do SerAG ↔ SerTC switches bring accurate information?

The relatively high frequency of SerAG ↔ SerTC may be associated with composition biases.

- AG vs. TC at positions 1 and 2
  → no correlation with global G+C % expected
- A vs. T at first position
- G vs. C at second position

Which groupings of taxa may be favoured by such biases?

Composition at position 1

[Figure: scatter plot of the taxa with axes A/(A+T) and G/(C+G) at codon position 1; tick ranges 0.600 to 0.675 and 0.525 to 0.675. Trichodesmium erythraeum (OSC-2) and Prochlorococcus marinus (SO-6) are labelled.]

Composition at position 2

[Figure: scatter plot of the taxa with axes A/(A+T) and G/(C+G) at codon position 2; tick ranges 0.425 to 0.475 and 0.435 to 0.495. Trichodesmium erythraeum (OSC-2) and Prochlorococcus marinus (SO-6) are labelled.]
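The two axes of these plots are simple base-count ratios at one codon position. A minimal sketch, with a hypothetical function name and assuming an in-frame coding sequence:

```python
def position_skews(seq, position):
    """Return (A/(A+T), G/(C+G)) at codon position 1, 2 or 3."""
    bases = seq.upper()[position - 1::3]
    a, t, g, c = (bases.count(x) for x in "ATGC")
    a_ratio = a / (a + t) if a + t else float("nan")
    g_ratio = g / (c + g) if c + g else float("nan")
    return a_ratio, g_ratio


seq = "ATGGCAAGCCTG"  # codons: ATG GCA AGC CTG
print(position_skews(seq, 1), position_skews(seq, 2))
```

Computing this pair of values for each taxon gives one point per taxon in each plot.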

Do SerAG ↔ SerTC switches bring accurate information?

- Prochlorococcus (SO-6) and Trichodesmium (OSC-2) are likely attracted to plastids because of low G+C % at 1st and 3rd positions.
- But they do not stand out when it comes to biases possibly associated with Ser codon degeneracy at 1st and 2nd positions.
- So signal associated with these positions will not reinforce their tendency to be artefactually placed, and may even contribute to placing them at more correct positions.

→ What happens if we remove all synonymy-associated signal except at first and second positions of Ser codons?

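Degenerating all but Ser codon positions 1 and 2

Recoded genetic code (all synonymous positions degenerated, except that the two Ser families TCN and AGN stay distinct):

        T          C          A          G
T    TTY Phe    TCN Ser    TAY Tyr    TGY Cys
     TTY Phe    TCN Ser    TAY Tyr    TGY Cys
     YTN Leu    TCN Ser    TAR Ter    TGR Ter
     YTN Leu    TCN Ser    TAR Ter    TGG Trp
C    YTN Leu    CCN Pro    CAY His    MGN Arg
     YTN Leu    CCN Pro    CAY His    MGN Arg
     YTN Leu    CCN Pro    CAR Gln    MGN Arg
     YTN Leu    CCN Pro    CAR Gln    MGN Arg
A    ATH Ile    ACN Thr    AAY Asn    AGN Ser
     ATH Ile    ACN Thr    AAY Asn    AGN Ser
     ATH Ile    ACN Thr    AAR Lys    MGN Arg
     ATG Met    ACN Thr    AAR Lys    MGN Arg
G    GTN Val    GCN Ala    GAY Asp    GGN Gly
     GTN Val    GCN Ala    GAY Asp    GGN Gly
     GTN Val    GCN Ala    GAR Glu    GGN Gly
     GTN Val    GCN Ala    GAR Glu    GGN Gly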

[Figure: tree cg75_degen (all synonymous positions degenerated) next to tree cg75_degenLR3 (all but Ser positions 1 and 2 degenerated). Tip order in cg75_degen: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta, UNIT+, OSC-2, SO-6, NOST-1, SPM-3. Tip order in cg75_degenLR3: the same up to Chlorophyta, then UNIT+, NOST-1, SPM-3, OSC-2, SO-6. Node supports range from 0.59 to 1.00 (cg75_degen) and from 0.53 to 1.00 (cg75_degenLR3); their assignment to nodes is lost in the extraction.]

Degenerating all but Ser codon positions 1 and 2

- Not much change.
- Signal associated with switches between Ser families has a mitigating effect on artefacts associated with G+C composition biases.
- But no visible effect on the topology when combined with data already purged of potentially misleading signal.
- (In the present study...)

Conclusions

- Incongruence between nucleotide and amino-acid data is mainly due to G+C convergence biases. It is likely that plastids diverged early from the Cyanobacteria.
- rDNA is under direct selective constraints on its sequence, hence results similar to those from amino-acid data.
- Codon-degeneracy recoding permits the removal of misleading signal while retaining a part of the signal not present at the amino-acid level. This could lead to more accurate results.

Modelling composition variations

- NDCH model: composition can be different at different nodes of the tree.
- This mitigates the likelihood cost associated with grouping taxa with diverging composition.
- And so fewer composition convergence artefacts are expected.
- Number of composition vectors increased until simulated data has a composition heterogeneity compatible with that of the real data.
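The last point relies on a measure of composition heterogeneity that can be computed on both real and simulated alignments. The talk does not say which measure was used; a chi-square statistic on per-taxon base counts is one common choice and is assumed in the sketch below (the function name is hypothetical).

```python
from collections import Counter


def composition_chi2(sequences, alphabet="ACGT"):
    """Chi-square statistic of per-taxon composition against the pooled
    composition. `sequences` maps taxon names to aligned sequences."""
    counts = {
        name: Counter(b for b in seq.upper() if b in alphabet)
        for name, seq in sequences.items()
    }
    row_tot = {name: sum(c.values()) for name, c in counts.items()}
    col_tot = {s: sum(c[s] for c in counts.values()) for s in alphabet}
    grand = sum(row_tot.values())
    chi2 = 0.0
    for name, c in counts.items():
        for s in alphabet:
            expected = row_tot[name] * col_tot[s] / grand
            if expected > 0:  # skip states absent from the whole data set
                chi2 += (c[s] - expected) ** 2 / expected
    return chi2


print(composition_chi2({"a": "ACGT", "b": "ACGT"}))
print(composition_chi2({"a": "AAAA", "b": "CCCC"}))
```

Under this assumption, one would simulate data sets under the fitted model, compute the statistic on each, and add composition vectors until the value for the real data falls within the simulated distribution.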

[Figure: reference tree cg75 next to tree cg75_p4CV2 (NDCH model with 2 composition vectors). Tip order in cg75_p4CV2: Firmicutes, Proteobacteria, Chlorobi, Chloroflexi, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), SO-6, UNIT+, NOST-1, SPM-3, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta. Node supports range from 0.59 to 1.00 in cg75 and from 0.99 to 1.00 in cg75_p4CV2; their assignment to nodes is lost in the extraction.]

Modelling composition variations

- Similar effect as no3 and degen3: UNIT+ monophyly restored. Supports are high because they are Bayesian posterior probabilities, not bootstrap supports.
- 2 composition vectors, corresponding to high and low G+C %.
- Why not as efficient as full degeneracy?
