Dealing with composition convergence to place plastids among Cyanobacteria
Blaise Li
Centro de Ciências do Mar, Universidade do Algarve, Portugal
Institut für Populationsgenetik - 18/04/2013
Blaise Li Plastids, Cyanobacteria and composition biases
The endosymbiotic origin of plastids
[Figure (after Keeling, 2010): Cyanobacteria → primary endosymbiosis → Glaucophyta, red algae, green algae and land plants; secondary endosymbioses → chromalveolates… and euglenids.]
The endosymbiotic origins of plastids
There is, however, a large number of endosymbiotic relationships seemingly based on photosynthesis that are less well understood and vary across the entire spectrum of integration, from passing associations to long term and seemingly well-developed partnerships (e.g. Rumpho et al. 2008). Indeed, the line between what is an organelle and what is an endosymbiont is an arbitrary one. There are a few different, specific criteria that have been argued to distinguish the two, the most common being the genetic integration of the two partners, and the establishment of a protein-targeting system. Most photosynthetic endosymbionts probably […]
[Figure 2 of Keeling (2010): spread of plastids through primary, secondary, serial secondary (green alga) and tertiary (diatom, cryptomonad, haptophyte) endosymbioses among glaucophytes, red algae, green algae, land plants, Paulinella, cryptomonads, haptophytes, stramenopiles, ciliates, Apicomplexa, dinoflagellates (including Dinophysis, Lepidodinium, Durinskia and Karlodinium), euglenids and chlorarachniophytes.]
Source: P. J. Keeling (2010). Review. The origin and fate of plastids. Phil. Trans. R. Soc. B, p. 732.
The endosymbiotic origin of plastids
[Figure: placement of the plastids relative to Cyanobacteria of sections I, III and IV.]
Phylogenetic difficulties
Old events are generally difficult to resolve:
- mutational saturation
- changes in evolutionary modalities
- enough time for divergences and convergences in these modalities or in their consequences
→ Simple evolutionary models may not be appropriate.
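To make "mutational saturation" concrete, here is a minimal sketch under the Jukes-Cantor (JC69) model, chosen only because it is the simplest nucleotide model (the analyses below use GTR+I+Γ). The expected proportion of differing sites p levels off at 3/4 as the true distance d grows. The function names are illustrative.

```python
import math

def jc69_p_distance(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_distance(p: float) -> float:
    """Invert the JC69 relation; undefined once p reaches the 3/4 plateau."""
    if p >= 0.75:
        raise ValueError("p-distance at or beyond JC69 saturation (0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Observed divergence barely grows once the true distance is large
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"d = {d:3.1f} substitutions/site -> p = {jc69_p_distance(d):.3f}")
```

Going from d = 2 to d = 5 substitutions per site only raises p from about 0.70 to about 0.75, so for old divergences the observed differences say little about how much change actually occurred.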
Phylogenetic difficulties
The difficulty is amplified by endosymbiosis:
- simplification
- gene relocation (to the host nucleus)
- changes in biochemical context
→ sequences missing or with modified evolutionary trends (potentially misleading)
Phylogenetic difficulties
Overly simple analyses give conflicting results:
- rDNA and amino-acid data: early divergence of plastids
- protein-coding gene (nucleotide) data: plastids close to pluricellular Cyanobacteria
What is the cause of this incongruence?
→ We studied the phenomenon on a dataset of protein-coding genes from plastids (or relocated to the plant host nucleus) and their (cyano)bacterial homologues.
Dataset
- 42 taxa, including 8 outgroup (non-cyano)bacteria, 16 Cyanobacteria, and plastids from 1 Glaucophyta, 4 Rhodophyta (red algae) and 13 Viridiplantae (green plants)
- Cyanobacteria groups present:
  - NOST-1 (section IV)
  - OSC-2 (section III)
  - SPM-3, SO-6, GBACT, UNIT+ (section I)
- 75 protein-coding genes, but 452 missing sequences (i.e. 14% overall, with up to 38 genes missing for one of the outgroup taxa)
- Concatenated dataset (cg75) and its translation (cp75)
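A concatenated dataset such as cg75 is commonly built by joining per-gene alignments taxon by taxon, filling genes missing for a taxon with gaps. The sketch below does this on hypothetical toy data (gene and taxon names, and the function itself, are illustrative, not the authors' pipeline) and reports the fraction of missing taxon-gene combinations, the quantity given above as 14% (452 of 42 × 75 = 3150).

```python
def concatenate(alignments: dict[str, dict[str, str]],
                taxa: list[str]) -> tuple[dict[str, str], float]:
    """Concatenate per-gene alignments into a supermatrix.

    alignments maps gene -> {taxon -> aligned sequence}; a taxon absent from a
    gene gets an all-gap block of that gene's length. Also returns the fraction
    of taxon-gene combinations that were missing.
    """
    blocks: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    missing = 0
    for seqs in alignments.values():
        length = len(next(iter(seqs.values())))  # all sequences of a gene are aligned
        for taxon in taxa:
            if taxon in seqs:
                blocks[taxon].append(seqs[taxon])
            else:
                blocks[taxon].append("-" * length)
                missing += 1
    fraction_missing = missing / (len(taxa) * len(alignments))
    return {taxon: "".join(parts) for taxon, parts in blocks.items()}, fraction_missing

# Hypothetical toy data: two genes, the second one missing for taxon_B
toy = {
    "gene_1": {"taxon_A": "ATGACC", "taxon_B": "ATGACT"},
    "gene_2": {"taxon_A": "ATGTCA"},
}
supermatrix, fraction = concatenate(toy, ["taxon_A", "taxon_B"])
print(supermatrix, fraction)
```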
First ML bootstrap analyses
- Analyses using RAxML
  - GTR+I+Γ for nucleotides
  - CPREV+I+Γ for amino acids
  - 200 bootstrap pseudo-replicates
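Each bootstrap pseudo-replicate is an alignment of the same size whose columns are drawn with replacement from the original; RAxML generates these internally. The following is a minimal sketch of that resampling step only, not of the tree search; the function name, toy alignment and seed are illustrative.

```python
import random

def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Draw as many alignment columns as the original has, with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}

rng = random.Random(12345)  # arbitrary fixed seed, for reproducibility
alignment = {"taxon_1": "ACGTAC", "taxon_2": "ACGTTC", "taxon_3": "AGGTAC"}
replicates = [bootstrap_replicate(alignment, rng) for _ in range(200)]
```

A tree is then inferred from each replicate, and the support of a branch is the proportion of the 200 replicate trees that contain it.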
First ML bootstrap analyses
[Figure: ML bootstrap trees for cg75 (nucleotides) and cp75 (its translation), with Firmicutes, Chloroflexi, Chlorobi and Proteobacteria as outgroups. Both trees place the GBACT taxa (Gloeobacter_violaceus, Synechococcus_JA33Ab) as "basal" Cyanobacteria. cg75: a grade of Cyanobacteria (SO-6; the UNIT+ taxa Thermosynechococcus_elongatus, Cyanothece_PCC7425 and Acaryochloris_marina, not grouped together; SPM-3) leads to the pluricellulars (NOST-1, OSC-2) and the plastids (Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta); several supports are low (down to 0.59). cp75: the plastids diverge early, before the "core" Cyanobacteria (UNIT+, OSC-2, SO-6, NOST-1, SPM-3); supports range from 0.70 to 1.00.]
First ML bootstrap analyses
- cp75 is a direct translation of cg75
  → The trees should be the same.
- But the analyses disagree on the identity of the plastid sister group.
  → Something is not well modelled.
→ Can we trust either of these trees?
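Because cp75 is obtained by translating cg75 codon by codon, both datasets derive from the same sequences; only what is modelled differs. A minimal translation sketch using the standard genetic code (plastids and bacteria share its amino-acid assignments, NCBI translation table 11). Mapping "---" to a gap and any other non-standard codon to X is an assumption of this sketch, and the names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TCAG order (1st base varies slowest); '*' marks stops
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(seq: str) -> str:
    """Translate an in-frame aligned coding sequence; '---' becomes '-'."""
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3].upper()
        if codon == "---":
            protein.append("-")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))  # partial gaps, ambiguities
    return "".join(protein)

# Two Leu codons differing at 2 of 3 sites translate identically
print(translate("TTA"), translate("CTG"))
```

The last line illustrates the point of the slide: TTA and CTG differ at two nucleotide sites, yet contribute no difference at all to the amino-acid dataset.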
Nucleotides or amino acids?
- Here, low bootstrap support suggests conflicting signals in the nucleotide data.
- Nucleotide sequences are more likely to become randomized over time:
  - codon degeneracy → lower selective pressure
  - only 4 states → convergence likely
- Selection on protein function stabilizes the amino-acid sequence.
- But the substitution matrix is easier to estimate for nucleotides (fewer states).
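How much codon degeneracy lowers selective pressure differs sharply by codon position. The sketch below counts, under the standard genetic code, how many single-nucleotide changes starting from a sense codon leave the amino acid unchanged at each position (function and variable names are illustrative).

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def synonymous_counts() -> list[int]:
    """Count synonymous single-nucleotide changes from sense codons, per codon position."""
    counts = [0, 0, 0]
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # only start from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:  # changes to a stop codon count as non-synonymous
                    counts[pos] += 1
    return counts

counts = synonymous_counts()
total_per_position = 61 * 3  # 61 sense codons x 3 alternative bases
for pos, n in enumerate(counts, start=1):
    print(f"position {pos}: {n}/{total_per_position} synonymous")
```

This gives 8 synonymous changes at the 1st position (the Leu TTR ↔ CTR and Arg AGR ↔ CGR switches), none at the 2nd, and 126 of 183 at the 3rd.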
Nucleotide composition attraction
We focus on one particular type of reconstruction artefact: nucleotide composition attraction.
- mutation can be biased in different ways across taxa
- so can codon preference
- this influences the composition of the genomes
- sites under weaker selective constraint tend to conform to that composition
→ similar mutation biases and codon preferences may induce convergence in the nucleotide sequences, especially at the 3rd codon position
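One simple way to see why shared biases produce convergence: assume that, at weakly constrained sites, each lineage ends up drawing bases independently from its own equilibrium composition. The probability that two lineages carry the same base is then the sum over bases of the products of their frequencies. The compositions below are made-up illustrative values, not measurements from this dataset.

```python
def chance_identity(freqs_a: dict[str, float], freqs_b: dict[str, float]) -> float:
    """Probability that two independently drawn bases are identical."""
    return sum(freqs_a[b] * freqs_b.get(b, 0.0) for b in freqs_a)

def gc_composition(gc: float) -> dict[str, float]:
    """Symmetric composition with the given G+C proportion (G = C, A = T)."""
    return {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}

gc_rich = gc_composition(0.7)
at_rich = gc_composition(0.3)
uniform = gc_composition(0.5)

print(chance_identity(uniform, uniform))  # baseline without bias
print(chance_identity(gc_rich, gc_rich))  # two GC-rich lineages
print(chance_identity(gc_rich, at_rich))  # opposite biases
```

Two GC-rich lineages match by chance at about 29% of such sites, against 25% without bias and 21% for opposite biases, so unrelated taxa with similar biases look more similar than they are.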
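Nucleotide composition attraction
Standard genetic code (rows: 1st codon position; columns: 2nd codon position; Ter = stop):
      T         C         A         G
T     TTT Phe   TCT Ser   TAT Tyr   TGT Cys
      TTC Phe   TCC Ser   TAC Tyr   TGC Cys
      TTA Leu   TCA Ser   TAA Ter   TGA Ter
      TTG Leu   TCG Ser   TAG Ter   TGG Trp
C     CTT Leu   CCT Pro   CAT His   CGT Arg
      CTC Leu   CCC Pro   CAC His   CGC Arg
      CTA Leu   CCA Pro   CAA Gln   CGA Arg
      CTG Leu   CCG Pro   CAG Gln   CGG Arg
A     ATT Ile   ACT Thr   AAT Asn   AGT Ser
      ATC Ile   ACC Thr   AAC Asn   AGC Ser
      ATA Ile   ACA Thr   AAA Lys   AGA Arg
      ATG Met   ACG Thr   AAG Lys   AGG Arg
G     GTT Val   GCT Ala   GAT Asp   GGT Gly
      GTC Val   GCC Ala   GAC Asp   GGC Gly
      GTA Val   GCA Ala   GAA Glu   GGA Gly
      GTG Val   GCG Ala   GAG Glu   GGG Gly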
Composition and codon usage biases
[Figure: the cg75 ML tree, shown with, for each taxon, (i) codon usage log10 ratios LeuT/LeuC, SerAG/SerTC and ArgA/ArgC (scale from 1 to −3) and (ii) G+C proportion by codon position (1st: −, 2nd: +, 3rd: ×; scale from 0 to 1). Successive builds highlight 3rd-position G+C, 1st-position G+C, the ArgA bias and the LeuT bias.]
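The two statistics plotted in this figure can be computed per taxon from its coding sequences. A minimal sketch, assuming that LeuT/LeuC compares Leu codons starting with T (TTA, TTG) with those starting with C (CTN), and likewise SerAG (AGT, AGC) with SerTC (TCN) and ArgA (AGA, AGG) with ArgC (CGN). That reading of the legend, the function names and the pseudocount are assumptions of this sketch.

```python
import math
from collections import Counter

# Assumed reading of the legend: codon families told apart by their 1st (and 2nd) positions
FAMILIES = {
    "LeuT": {"TTA", "TTG"},
    "LeuC": {"CTT", "CTC", "CTA", "CTG"},
    "SerAG": {"AGT", "AGC"},
    "SerTC": {"TCT", "TCC", "TCA", "TCG"},
    "ArgA": {"AGA", "AGG"},
    "ArgC": {"CGT", "CGC", "CGA", "CGG"},
}

def split_codons(seq: str) -> list[str]:
    """Cut an in-frame coding sequence into codons (a trailing partial codon is dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def gc_by_position(seq: str) -> list[float]:
    """G+C proportion at codon positions 1, 2 and 3, ignoring gaps and ambiguity codes."""
    codons = split_codons(seq)
    proportions = []
    for pos in range(3):
        bases = [codon[pos] for codon in codons if codon[pos] in "ACGT"]
        gc = sum(base in "GC" for base in bases)
        proportions.append(gc / len(bases) if bases else math.nan)
    return proportions

def codon_usage_log_ratios(seq: str, pseudocount: float = 0.5) -> dict[str, float]:
    """log10 ratios of codon-family counts; the pseudocount avoids log(0) and division by 0."""
    counts = Counter(split_codons(seq))
    totals = {name: sum(counts[c] for c in members) + pseudocount
              for name, members in FAMILIES.items()}
    return {
        f"{num}/{den}": math.log10(totals[num] / totals[den])
        for num, den in (("LeuT", "LeuC"), ("SerAG", "SerTC"), ("ArgA", "ArgC"))
    }

print(gc_by_position("GCCGCGGCA"))
print(codon_usage_log_ratios("TTATTGCTG"))
```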
3rd codon position removal
- One frequent approach: removing 3rd codon positions in large-scale phylogenies
- A less frequent approach: using a model that accounts for these differences in composition bias
I will present a series of analyses, starting with the first approach applied to our nucleotide dataset.
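3rd codon position removal
Genetic code with 3rd codon positions removed (-):
      T         C         A         G
T     TT- Phe   TC- Ser   TA- Tyr   TG- Cys
      TT- Phe   TC- Ser   TA- Tyr   TG- Cys
      TT- Leu   TC- Ser   TA- Ter   TG- Ter
      TT- Leu   TC- Ser   TA- Ter   TG- Trp
C     CT- Leu   CC- Pro   CA- His   CG- Arg
      CT- Leu   CC- Pro   CA- His   CG- Arg
      CT- Leu   CC- Pro   CA- Gln   CG- Arg
      CT- Leu   CC- Pro   CA- Gln   CG- Arg
A     AT- Ile   AC- Thr   AA- Asn   AG- Ser
      AT- Ile   AC- Thr   AA- Asn   AG- Ser
      AT- Ile   AC- Thr   AA- Lys   AG- Arg
      AT- Met   AC- Thr   AA- Lys   AG- Arg
G     GT- Val   GC- Ala   GA- Asp   GG- Gly
      GT- Val   GC- Ala   GA- Asp   GG- Gly
      GT- Val   GC- Ala   GA- Glu   GG- Gly
      GT- Val   GC- Ala   GA- Glu   GG- Gly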
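Removing 3rd codon positions (the treatment behind cg75_no3) amounts to keeping the first two bases of every codon in the aligned, in-frame sequences. A minimal sketch; the function name and toy sequences are illustrative.

```python
def drop_third_positions(seq: str) -> str:
    """Keep codon positions 1 and 2 of an in-frame aligned coding sequence."""
    return "".join(seq[i:i + 2] for i in range(0, len(seq), 3))

alignment = {"taxon_1": "ATGCTGAAA", "taxon_2": "ATGTTAAAG"}
no3 = {taxon: drop_third_positions(seq) for taxon, seq in alignment.items()}
print(no3)
```

In this toy example the two taxa still differ at the 1st position of their second codon (CTG vs TTA, both Leu): removing 3rd positions does not remove every synonymous difference.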
3rd codon position removal
[Figure: the cg75 tree compared with the cg75_no3 tree (3rd codon positions removed). cg75_no3 taxon order: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, GBACT (Gloeobacter_violaceus, Synechococcus_JA33Ab), SO-6, UNIT+, NOST-1, SPM-3, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta; supports from 0.54 to 1.00.]
3rd codon position removal
- UNIT+ monophyly restored
- But some signal not corresponding to synonymous substitutions was lost (non-synonymous changes at the 3rd position)
- This signal can be kept by recoding instead of removing.
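The non-synonymous 3rd-position signal lost by removal can be counted directly: for each sense codon, some 3rd-position changes alter the amino acid (e.g. Phe TTY vs Leu TTR) and are discarded along with the synonymous ones. A minimal sketch under the standard genetic code; names are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def third_position_changes() -> tuple[int, int]:
    """Return (synonymous, non-synonymous) 3rd-position changes from sense codons."""
    syn = nonsyn = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # only start from sense codons
        for base in BASES:
            if base == codon[2]:
                continue
            if CODE[codon[:2] + base] == aa:
                syn += 1
            else:
                nonsyn += 1  # includes changes to stop codons
    return syn, nonsyn

print(third_position_changes())
```

Of the 183 possible 3rd-position changes from sense codons, 57 alter the encoded amino acid; removing the position discards them together with the 126 synonymous ones.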
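3rd codon position recoding
3rd codon positions recoded with IUPAC ambiguity codes (Y = C/T, R = A/G, H = A/C/T, N = any base):
      T         C         A         G
T     TTY Phe   TCN Ser   TAY Tyr   TGY Cys
      TTY Phe   TCN Ser   TAY Tyr   TGY Cys
      TTN Leu   TCN Ser   TAR Ter   TGR Ter
      TTN Leu   TCN Ser   TAR Ter   TGG Trp
C     CTN Leu   CCN Pro   CAY His   CGN Arg
      CTN Leu   CCN Pro   CAY His   CGN Arg
      CTN Leu   CCN Pro   CAR Gln   CGN Arg
      CTN Leu   CCN Pro   CAR Gln   CGN Arg
A     ATH Ile   ACN Thr   AAY Asn   AGN Ser
      ATH Ile   ACN Thr   AAY Asn   AGN Ser
      ATH Ile   ACN Thr   AAR Lys   AGN Arg
      ATG Met   ACN Thr   AAR Lys   AGN Arg
G     GTN Val   GCN Ala   GAY Asp   GGN Gly
      GTN Val   GCN Ala   GAY Asp   GGN Gly
      GTN Val   GCN Ala   GAR Glu   GGN Gly
      GTN Val   GCN Ala   GAR Glu   GGN Gly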
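The recoding table can be expressed as a rule on the first two bases of each codon: the 3rd base is replaced by the ambiguity code listed for that codon in the table. The sketch below transcribes the table as given (including its choices of TTN for Leu TTR, AGN for Ser/Arg and TAR/TGR for stops); the function name and the decision to leave gapped or ambiguous codons untouched are assumptions.

```python
# 3rd-position ambiguity code keyed by the first two bases, transcribed from the
# recoding table (IUPAC: Y = C/T, R = A/G, H = A/C/T, N = any base)
TWO_FOLD = {"T": "Y", "C": "Y", "A": "R", "G": "R"}
THIRD_POSITION_CODE = {
    "TT": {"T": "Y", "C": "Y", "A": "N", "G": "N"},  # Phe TTY; Leu TTN as in the table
    "TA": TWO_FOLD,                                  # Tyr TAY; stops TAR
    "TG": {"T": "Y", "C": "Y", "A": "R", "G": "G"},  # Cys TGY; stop TGR; Trp TGG kept
    "CA": TWO_FOLD,                                  # His CAY; Gln CAR
    "AT": {"T": "H", "C": "H", "A": "H", "G": "G"},  # Ile ATH; Met ATG kept
    "AA": TWO_FOLD,                                  # Asn AAY; Lys AAR
    "GA": TWO_FOLD,                                  # Asp GAY; Glu GAR
}
# Every other pair of first two bases (four-fold boxes, and AG for Ser/Arg) gets N

def degen3(seq: str) -> str:
    """Recode 3rd codon positions of an in-frame aligned coding sequence."""
    recoded = []
    for i in range(0, len(seq) - len(seq) % 3, 3):  # a trailing partial codon is dropped
        codon = seq[i:i + 3].upper()
        if any(base not in "ACGT" for base in codon):
            recoded.append(codon)  # leave gapped or ambiguous codons untouched
            continue
        first_two, third = codon[:2], codon[2]
        recoded.append(first_two + THIRD_POSITION_CODE.get(first_two, {}).get(third, "N"))
    return "".join(recoded)

print(degen3("TTTTTACTGATGTGG"))
```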
3rd codon position recoding
[Figure: the cg75 tree compared with the cg75_degen3 tree (degenerate at 3rd pos.; 27.35% recoded). cg75_degen3 taxon order: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, GBACT (Gloeobacter_violaceus, Synechococcus_JA33Ab), SO-6, UNIT+, NOST-1, SPM-3, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta; supports from 0.60 to 1.00.]
Blaise Li Plastids, Cyanobacteria and composition biases
3rd codon position recoding
I Similar effect as no3: UNIT+ monophyly restored
I But codon degeneracy exists at other positions, associatedwith switches between Leu, Arg and Ser families.→ We looked at the effect of this signal by selectivelyremoving it.
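Degenerating 1st and 2nd codon positions
1st and 2nd positions of Leu, Arg and Ser codons recoded (Y = C/T, M = A/C, W = A/T, S = C/G):
      T         C         A         G
T     TTT Phe   WST Ser   TAT Tyr   TGT Cys
      TTC Phe   WSC Ser   TAC Tyr   TGC Cys
      YTA Leu   WSA Ser   TAA Ter   TGA Ter
      YTG Leu   WSG Ser   TAG Ter   TGG Trp
C     YTT Leu   CCT Pro   CAT His   MGT Arg
      YTC Leu   CCC Pro   CAC His   MGC Arg
      YTA Leu   CCA Pro   CAA Gln   MGA Arg
      YTG Leu   CCG Pro   CAG Gln   MGG Arg
A     ATT Ile   ACT Thr   AAT Asn   WST Ser
      ATC Ile   ACC Thr   AAC Asn   WSC Ser
      ATA Ile   ACA Thr   AAA Lys   MGA Arg
      ATG Met   ACG Thr   AAG Lys   MGG Arg
G     GTT Val   GCT Ala   GAT Asp   GGT Gly
      GTC Val   GCC Ala   GAC Asp   GGC Gly
      GTA Val   GCA Ala   GAA Glu   GGA Gly
      GTG Val   GCG Ala   GAG Glu   GGG Gly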
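This 1st/2nd-position recoding only touches the three six-fold amino acids: Leu codons get Y at the 1st position, Arg codons get M, and both Ser blocks (TCN, AGY) get W at the 1st and S at the 2nd position, while 3rd positions are kept. A sketch transcribing that table; the function name and the handling of non-matching codons (left as they are) are assumptions.

```python
# Codon families of the three six-fold amino acids, recoded as in the table
LEU = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}  # -> Y at the 1st position
ARG = {"AGA", "AGG", "CGT", "CGC", "CGA", "CGG"}  # -> M at the 1st position
SER = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}  # -> W at the 1st, S at the 2nd

def degen12(seq: str) -> str:
    """Recode 1st/2nd positions of Leu, Arg and Ser codons; 3rd positions are kept."""
    recoded = []
    for i in range(0, len(seq) - len(seq) % 3, 3):  # a trailing partial codon is dropped
        codon = seq[i:i + 3].upper()
        if codon in LEU:
            codon = "YT" + codon[2]
        elif codon in ARG:
            codon = "MG" + codon[2]
        elif codon in SER:
            codon = "WS" + codon[2]
        recoded.append(codon)  # all other codons are left unchanged
    return "".join(recoded)

print(degen12("TTACTGAGACGTAGCTCAATG"))
```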
Degenerating 1st and 2nd codon positions
[Figure: the cg75 tree compared with the cg75_degenerate12 tree (degenerate at 1st and 2nd pos.; 7.62% recoded). cg75_degenerate12 taxon order: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, GBACT (Gloeobacter_violaceus, Synechococcus_JA33Ab), Synechococcus_elongatus (SO-6), Synechococcus_RCC307 (SO-6), Acaryochloris_marina (UNIT+), Thermosynechococcus_elongatus (UNIT+), Cyanothece_PCC7425 (UNIT+), SPM-3, NOST-1, OSC-2, Prochlorococcus_marinus (SO-6), Rhodophyta, Glaucophyta, Streptophyta, Chlorophyta; supports from 0.59 to 1.00, several around 0.60.]
Blaise Li Plastids, Cyanobacteria and composition biases
Degenerating 1st and 2nd codon positions
FirmicutesChloroflexi
ChlorobiProteobacteria
Gloeobacter_violaceus (GBACT)Synechococcus_JA33Ab (GBACT)
SO-6Thermosynechococcus_elongatus (UNIT+)Cyanothece_PCC7425 (UNIT+)Acaryochloris_marina (UNIT+)
SPM-3NOST-1
OSC-2GlaucophytaRhodophytaStreptophytaChlorophyta
0.980.97
1.001.001.000.72
1.00
0.700.720.720.720.94
0.59
1.00
cg75
FirmicutesChloroflexi
ChlorobiProteobacteria
Gloeobacter_violaceus (GBACT)Synechococcus_JA33Ab (GBACT)
Synechococcus_elongatus (SO-6)Synechococcus_RCC307 (SO-6)
Acaryochloris_marina (UNIT+)Thermosynechococcus_elongatus (UNIT)Cyanothece_PCC7425 (UNIT+)SPM-3
NOST-1OSC-2
Prochlorococcus_marinus (SO-6)Rhodophyta
GlaucophytaStreptophytaChlorophyta
0.970.97
1.001.001.00
0.600.99
0.600.590.600.601.000.691.00
cg75_degenerate12
degenerate at 1st and 2nd pos.
(7.62% recoded)
Blaise Li Plastids, Cyanobacteria and composition biases
Degenerating 1st and 2nd codon positions
- UNIT+ monophyly not restored.
- SO-6 split.
→ Is the removed signal actually useful? (more later)
- Lower supports suggest conflicting signals (misplacement of Prochlorococcus and Rhodophyta).
→ Let’s try to neutralize all synonymous substitutions. . .
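Degenerating all synonymous codon positions
Recoded code (R = A/G, Y = C/T, M = A/C, W = A/T, S = C/G, H = A/C/T, N = any base):
     T          C          A          G
T    TTY Phe    WSN Ser    TAY Tyr    TGY Cys
     TTY Phe    WSN Ser    TAY Tyr    TGY Cys
     YTN Leu    WSN Ser    TAR Ter    TGR Ter
     YTN Leu    WSN Ser    TAR Ter    TGG Trp
C    YTN Leu    CCN Pro    CAY His    MGN Arg
     YTN Leu    CCN Pro    CAY His    MGN Arg
     YTN Leu    CCN Pro    CAR Gln    MGN Arg
     YTN Leu    CCN Pro    CAR Gln    MGN Arg
A    ATH Ile    ACN Thr    AAY Asn    WSN Ser
     ATH Ile    ACN Thr    AAY Asn    WSN Ser
     ATH Ile    ACN Thr    AAR Lys    MGN Arg
     ATG Met    ACN Thr    AAR Lys    MGN Arg
G    GTN Val    GCN Ala    GAY Asp    GGN Gly
     GTN Val    GCN Ala    GAY Asp    GGN Gly
     GTN Val    GCN Ala    GAR Glu    GGN Gly
     GTN Val    GCN Ala    GAR Glu    GGN Gly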
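For every non-stop codon, this table follows a simple union rule: each codon of an amino acid is replaced by the IUPAC code of all the bases its synonymous codons use at each position (for example, Leu codons use T/C, T and all four bases, hence YTN). Below is a minimal Python sketch of that rule under the standard genetic code. The names are hypothetical. Applied to stop codons, the union rule would give TRR for all three, so their entries (TAR, TGR) are copied from the table rather than derived.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip([a + b + c for a in BASES for b in BASES for c in BASES], AMINO_ACIDS))

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset(bases): code
    for bases, code in [
        ("A", "A"), ("C", "C"), ("G", "G"), ("T", "T"),
        ("AG", "R"), ("CT", "Y"), ("CG", "S"), ("AT", "W"), ("GT", "K"), ("AC", "M"),
        ("CGT", "B"), ("AGT", "D"), ("ACT", "H"), ("ACG", "V"), ("ACGT", "N"),
    ]
}

# Copied from the slide table: the union rule would give TRR for every stop codon.
STOP_CODES = {"TAA": "TAR", "TAG": "TAR", "TGA": "TGR"}


def union_code(aa: str) -> str:
    """IUPAC string covering, position by position, all codons of amino acid `aa`."""
    synonyms = [codon for codon, a in CODE.items() if a == aa]
    return "".join(IUPAC[frozenset(c[i] for c in synonyms)] for i in range(3))


# Lookup table: original codon -> fully degenerate codon.
DEGEN = {
    codon: STOP_CODES[codon] if aa == "*" else union_code(aa)
    for codon, aa in CODE.items()
}


def degenerate_sequence(seq: str) -> str:
    """Recode an in-frame, gap-free sequence codon by codon."""
    seq = seq.upper()
    return "".join(DEGEN[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3))


print(degenerate_sequence("TTAAGCCGTGGG"))  # YTNWSNMGNGGN
```

The ambiguity codes can over-cover: YTN also matches the Phe codons TTT and TTC, and WSN also matches the Thr codons ACN. The recoded data therefore lose some non-synonymous information as well.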
Degenerating all synonymous codon positions
[Figure: trees inferred from cg75 (left) and cg75_degen (right; all synonymous codon positions degenerated), with node support values. In the right tree, UNIT+, OSC-2, SO-6, NOST-1 and SPM-3 are shown as single terminal groups.]
Degenerating all synonymous codon positions
- Core Cyanobacteria are sister to plastids, as when using amino acids.
- The 1st and 2nd position signal actually contributes to composition attraction. (Its neutralization helps when the 3rd position synonymous signal is also neutralized.)
- What happens? Is it "good" or "bad" signal?
Degenerating serine codon positions 1 and 2
Serine synonymy is different from other synonymies:
- AGY (Ser) ↔ ACY (Thr) ↔ TCY (Ser)
- AGY (Ser) ↔ TGY (Cys) ↔ TCY (Ser)
A switch between the two Ser families requires either a double simultaneous substitution or a non-Ser intermediate.
→ It might lend itself less to composition convergence than other synonymous substitutions, and may therefore contain useful signal.
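The two-step paths listed above can be checked mechanically. Below is a minimal Python sketch, assuming the standard genetic code (the function names are hypothetical). It enumerates single-nucleotide neighbours, confirms that AGY and TCN serine codons are at least two substitutions apart, and lists the only single-step intermediates between them.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip([a + b + c for a in BASES for b in BASES for c in BASES], AMINO_ACIDS))


def neighbours(codon: str) -> set[str]:
    """All codons one nucleotide substitution away from `codon`."""
    return {
        codon[:i] + b + codon[i + 1:]
        for i in range(3)
        for b in BASES
        if b != codon[i]
    }


def intermediates(start: str, end: str) -> dict[str, str]:
    """Codons one substitution away from both `start` and `end`, with their amino acid."""
    return {c: CODE[c] for c in sorted(neighbours(start) & neighbours(end))}


def ser_family_distance() -> int:
    """Minimum number of substitutions between any AGY and any TCN serine codon."""
    ag = [c for c in CODE if c.startswith("AG") and CODE[c] == "S"]
    tc = [c for c in CODE if c.startswith("TC")]
    return min(sum(x != y for x, y in zip(a, t)) for a in ag for t in tc)


print(ser_family_distance())         # 2
print(intermediates("AGT", "TCT"))   # {'ACT': 'T', 'TGT': 'C'}
```

The only intermediates are a Thr codon (ACY) and a Cys codon (TGY), matching the two paths above.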
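Degenerating serine codon positions 1 and 2
Recoded code (W = A/T, S = C/G):
     T          C          A          G
T    TTT Phe    WST Ser    TAT Tyr    TGT Cys
     TTC Phe    WSC Ser    TAC Tyr    TGC Cys
     TTA Leu    WSA Ser    TAA Ter    TGA Ter
     TTG Leu    WSG Ser    TAG Ter    TGG Trp
C    CTT Leu    CCT Pro    CAT His    CGT Arg
     CTC Leu    CCC Pro    CAC His    CGC Arg
     CTA Leu    CCA Pro    CAA Gln    CGA Arg
     CTG Leu    CCG Pro    CAG Gln    CGG Arg
A    ATT Ile    ACT Thr    AAT Asn    WST Ser
     ATC Ile    ACC Thr    AAC Asn    WSC Ser
     ATA Ile    ACA Thr    AAA Lys    AGA Arg
     ATG Met    ACG Thr    AAG Lys    AGG Arg
G    GTT Val    GCT Ala    GAT Asp    GGT Gly
     GTC Val    GCC Ala    GAC Asp    GGC Gly
     GTA Val    GCA Ala    GAA Glu    GGA Gly
     GTG Val    GCG Ala    GAG Glu    GGG Gly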
Degenerating serine codon positions 1 and 2
[Figure: tree inferred from cg75_degen12S, with node support values, annotated for each taxon with codon usage log10 ratios (LeuT/LeuC, SerAG/SerTC, ArgA/ArgC) and G+C proportion by codon position (1: −, 2: +, 3: ×). SO-6 and UNIT+ members are shown individually. Successive overlays highlight: 3rd pos. G+C; ArgA bias; 1st pos. G+C and LeuT bias.]
Do SerAG ↔ SerTC switches bring accurate information?
When SerAG ↔ SerTC switches are removed, composition biases at the third and first codon positions seem to lead to more artefacts.
Two hypotheses:
1. significant historical signal
2. important but misleading signal that happens to conflict with the composition-driven artefacts
Do SerAG ↔ SerTC switches bring accurate information?
The literature says that SerAG ↔ SerTC switches occur easily through Thr and Cys intermediates. Is that the case here?
- Recoding of the data into 23 amino-acid states (distinct states for the two codon families of each of Leu, Arg and Ser)
- MCMC under a GTR+I+Γ model, with the topology fixed to the one obtained with the normal amino-acid data
- Inferred substitution matrix: the highest rates are ArgA ↔ ArgC, LeuC ↔ LeuT, and SerAG ↔ SerTC.
→ SerAG ↔ SerTC is more frequent than any non-synonymous substitution.
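The 23-state recoding can be sketched as a codon-to-state map: the 20 amino acids, with Leu, Arg and Ser each split by codon family. Below is a minimal Python sketch assuming the standard genetic code. The state labels follow the slides (LeuT, LeuC, ArgA, ArgC, SerAG, SerTC), the family boundaries (TTR vs. CTN, AGR vs. CGN, AGY vs. TCN) follow the genetic code, and the function names are hypothetical. The GTR+I+Γ MCMC step itself is not shown.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip([a + b + c for a in BASES for b in BASES for c in BASES], AMINO_ACIDS))


def state23(codon: str) -> str:
    """Map a codon to one of 23 states (stop codons map to '*')."""
    aa = CODE[codon]
    if aa == "L":
        return "LeuT" if codon.startswith("TT") else "LeuC"   # TTR vs. CTN
    if aa == "R":
        return "ArgA" if codon.startswith("AG") else "ArgC"   # AGR vs. CGN
    if aa == "S":
        return "SerAG" if codon.startswith("AG") else "SerTC"  # AGY vs. TCN
    return aa  # one-letter code for the other 17 amino acids


def recode23(seq: str) -> list[str]:
    """Translate an in-frame, gap-free coding sequence into 23-state characters."""
    seq = seq.upper()
    return [state23(seq[i:i + 3]) for i in range(0, len(seq) - len(seq) % 3, 3)]


STATES = sorted({state23(c) for c, aa in CODE.items() if aa != "*"})
print(len(STATES))             # 23
print(recode23("TTAAGCGGG"))   # ['LeuT', 'SerAG', 'G']
```

In this state space, a SerAG ↔ SerTC switch is a single character-state change, so its rate can be estimated alongside the non-synonymous ones.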
Do SerAG ↔ SerTC switches bring accurate information?
The relatively high frequency of SerAG ↔ SerTC may be associated with composition biases.
- AG vs. TC at positions 1 and 2 → no correlation with global G+C % expected, since AG and TC each contain exactly one G or C
- A vs. T at first position
- G vs. C at second position
Which groupings of taxa may be favoured by such biases?
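The quantities involved can be computed per taxon: G+C at a given codon position, plus the A/(A+T) and G/(C+G) ratios that a SerAG ↔ SerTC switch shifts at positions 1 and 2. Below is a minimal Python sketch, assuming an in-frame coding sequence. The function names are hypothetical. Ratios with an empty denominator are reported as NaN.

```python
import math
from collections import Counter


def _ratio(x: int, y: int) -> float:
    """x / y, or NaN when the denominator is zero."""
    return x / y if y else math.nan


def position_composition(cds: str, pos: int) -> dict[str, float]:
    """G+C, A/(A+T) and G/(C+G) at codon position `pos` (1, 2 or 3)."""
    counts = Counter(cds.upper()[pos - 1::3])  # every third base, starting at `pos`
    at = counts["A"] + counts["T"]
    gc = counts["C"] + counts["G"]
    return {
        "G+C": _ratio(gc, at + gc),
        "A/(A+T)": _ratio(counts["A"], at),
        "G/(C+G)": _ratio(counts["G"], gc),
    }


def gc_count(bases: str) -> int:
    """Number of G or C bases in a string."""
    return sum(b in "GC" for b in bases.upper())


# AG and TC carry one G/C each, so a SerAG <-> SerTC switch leaves G+C unchanged.
print(gc_count("AG"), gc_count("TC"))  # 1 1
print(position_composition("AGTTCTAGC", 2))
```

A taxon can thus stand out on A/(A+T) at position 1 or G/(C+G) at position 2 without standing out on global G+C.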
Composition at position 1
[Figure: per-taxon composition at codon position 1, with axes A/(A+T) and G/(C+G); Trichodesmium erythraeum (OSC-2) and Prochlorococcus marinus (SO-6) are labelled.]
Composition at position 2
[Figure: per-taxon composition at codon position 2, with axes A/(A+T) and G/(C+G); Trichodesmium erythraeum (OSC-2) and Prochlorococcus marinus (SO-6) are labelled.]
Do SerAG ↔ SerTC switches bring accurate information?
- Prochlorococcus (SO-6) and Trichodesmium (OSC-2) are likely attracted to plastids because of low G+C % at the 1st and 3rd positions.
- But they do not stand out when it comes to biases possibly associated with Ser codon degeneracy at the 1st and 2nd positions.
- So the signal associated with these positions will not reinforce their tendency to be artefactually placed, and may even contribute to placing them in more correct positions.
→ What happens if we remove all synonymy-associated signal except at the first and second positions of Ser codons?
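Degenerating all but Ser codon positions 1 and 2
Recoded code (R = A/G, Y = C/T, M = A/C, H = A/C/T, N = any base):
     T          C          A          G
T    TTY Phe    TCN Ser    TAY Tyr    TGY Cys
     TTY Phe    TCN Ser    TAY Tyr    TGY Cys
     YTN Leu    TCN Ser    TAR Ter    TGR Ter
     YTN Leu    TCN Ser    TAR Ter    TGG Trp
C    YTN Leu    CCN Pro    CAY His    MGN Arg
     YTN Leu    CCN Pro    CAY His    MGN Arg
     YTN Leu    CCN Pro    CAR Gln    MGN Arg
     YTN Leu    CCN Pro    CAR Gln    MGN Arg
A    ATH Ile    ACN Thr    AAY Asn    AGN Ser
     ATH Ile    ACN Thr    AAY Asn    AGN Ser
     ATH Ile    ACN Thr    AAR Lys    MGN Arg
     ATG Met    ACN Thr    AAR Lys    MGN Arg
G    GTN Val    GCN Ala    GAY Asp    GGN Gly
     GTN Val    GCN Ala    GAY Asp    GGN Gly
     GTN Val    GCN Ala    GAR Glu    GGN Gly
     GTN Val    GCN Ala    GAR Glu    GGN Gly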
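This table is consistent with a two-step rule: apply the full synonymous degeneracy, then restore the original first two nucleotides of Ser codons. This keeps the SerAG vs. SerTC distinction while fully degenerating their third position, which is why AGT and AGC become AGN rather than AGY. Below is a minimal Python sketch of that rule under the standard genetic code. The rule is inferred from the table, the names are hypothetical, and stop-codon entries are copied from the table.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip([a + b + c for a in BASES for b in BASES for c in BASES], AMINO_ACIDS))

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset(bases): code
    for bases, code in [
        ("A", "A"), ("C", "C"), ("G", "G"), ("T", "T"),
        ("AG", "R"), ("CT", "Y"), ("CG", "S"), ("AT", "W"), ("GT", "K"), ("AC", "M"),
        ("CGT", "B"), ("AGT", "D"), ("ACT", "H"), ("ACG", "V"), ("ACGT", "N"),
    ]
}
STOP_CODES = {"TAA": "TAR", "TAG": "TAR", "TGA": "TGR"}  # copied from the table


def union_code(aa: str) -> str:
    """IUPAC string covering, position by position, all codons of amino acid `aa`."""
    synonyms = [codon for codon, a in CODE.items() if a == aa]
    return "".join(IUPAC[frozenset(c[i] for c in synonyms)] for i in range(3))


def degenerate_all_but_ser12(codon: str) -> str:
    """Full synonymous degeneracy, except positions 1-2 of Ser codons."""
    codon = codon.upper()
    aa = CODE[codon]
    if aa == "*":
        return STOP_CODES[codon]
    recoded = union_code(aa)
    if aa == "S":
        # Keep the Ser family (AG or TC); degenerate only the 3rd position.
        return codon[:2] + recoded[2]
    return recoded


print([degenerate_all_but_ser12(c) for c in ("AGC", "TCA", "AGA", "CTT")])
# ['AGN', 'TCN', 'MGN', 'YTN']
```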
Degenerating all but Ser codon positions 1 and 2
[Figure: trees inferred from cg75_degen (left) and cg75_degenLR3 (right), with node support values; UNIT+, OSC-2, SO-6, NOST-1 and SPM-3 are shown as single terminal groups.]
Degenerating all but Ser codon positions 1 and 2
- Not much change.
- Signal associated with switches between Ser families has a mitigating effect on artefacts associated with G+C composition biases.
- But it has no visible effect on the topology when combined with data already purged of potentially misleading signal.
- (In the present study...)
Conclusions
- Incongruence between nucleotide and amino-acid data is mainly due to G+C convergence biases. It is likely that plastids diverged early from the Cyanobacteria.
- rDNA sequences are under direct selective constraints, hence results similar to those from amino-acid data.
- Codon-degeneracy recoding permits the removal of misleading signal while retaining part of the signal not present at the amino-acid level. This could lead to more accurate results.
Modelling composition variations
- NDCH (node-discrete compositional heterogeneity) model: composition can differ at different nodes of the tree.
- This mitigates the likelihood cost associated with grouping taxa of diverging composition.
- So fewer composition convergence artefacts are expected.
- The number of composition vectors is increased until simulated data have a composition heterogeneity compatible with that of the real data.
Modelling composition variations
[Figure: trees inferred from cg75 (left) and cg75_p4CV2 (right), with node support values; in the right tree, UNIT+, NOST-1, SPM-3, OSC-2 and SO-6 are shown as single terminal groups.]
Modelling composition variations
- Similar effect as no3 and degen3: UNIT+ monophyly restored.
  Support is high because these are Bayesian posterior probabilities, not bootstrap supports.
- 2 composition vectors, corresponding to high and low G+C %
- Why not as efficient as full degeneracy?