The integrin-mediated adhesome complex, essential to multicellularity, is present in the most recent common ancestor of animals, fungi, and amoebae.

Seungho Kang 1,2, Alexander K. Tice 1,2, Courtney W. Stairs 3, Daniel J. G. Lahr 4, Robert E. Jones 1,2, Matthew W. Brown 1,2*

1. Department of Biological Sciences, Mississippi State University, USA
2. Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, USA
3. Department of Cell and Molecular Biology, Uppsala University, Sweden
4. Department of Zoology, University of São Paulo, Brazil
*Correspondence: Matthew W. Brown, [email protected]

Keywords: Integrin, Amoebozoa, Protein Domain architecture, amoeba, reductive evolution

Abstract: Integrins are transmembrane receptor proteins that activate signal transduction pathways upon extracellular matrix binding. The Integrin Mediated Adhesion Complex (IMAC) mediates various cell physiological processes. The IMAC was thought to be an animal-specific machinery until, over the last decade, these complexes were discovered in Obazoa, the group containing animals, fungi, and several microbial eukaryote lineages. Amoebozoa is the eukaryotic supergroup sister to Obazoa. Even though Amoebozoa represents the closest outgroup to Obazoa, little genomic-level data exists for the supergroup and little attention has been given to its gene inventories. To examine the evolutionary history of the IMAC, we examine gene inventories of a deeply sampled set of 100+ Amoebozoa taxa, including new data from several taxa. From these robust data, sampled from the entire breadth of known amoebozoan clades, we show the presence of an ancestral complex of integrin adhesion proteins that predates the evolution of the Amoebozoa. Our results highlight that many of these proteins appear to have evolved earlier in eukaryote evolution than previously thought. Co-option of an ancient protein complex was key to the emergence of animal-type multicellularity. The role of the IMAC in a unicellular context is unknown, but it must also play a critical role for at least some unicellular organisms.

bioRxiv preprint doi: https://doi.org/10.1101/2020.04.29.069435; this version posted April 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.



INTRODUCTION

Integrins are transmembrane signaling and adhesion heterodimers made up of α-integrin (ITGA) and β-integrin (ITGB) protein subunits [1]. They are important molecules for signal transduction across the cell membrane and associate with a set of intracellular proteins that act on the actin cytoskeleton, together known as the integrin mediated adhesion complex (IMAC) (figure 1A) [2]. The intracellular component of the IMAC includes adhesome proteins of three kinds: 1) integrin-bound proteins that can bind actin directly (talin, α-actinin, and filamin); 2) integrin-bound proteins that bind the cytoskeleton indirectly (integrin linked kinase (ILK), particularly interesting new cysteine-histidine rich protein (PINCH), and paxillin); and 3) non-integrin-binding proteins such as vinculins.

The IMAC is best known for its critical roles in cell behavior, including development, migration, proliferation, survival, and metabolism [1,3,4]. It functions by relaying outside-in and inside-out signals between the extracellular matrix and the internal cytoskeleton [5].

Because integrin proteins have been shown to be absent from other complex multicellular lineages of eukaryotes (i.e., plants and fungi) as well as from the choanoflagellates, the closest protistan relatives of Metazoa, integrins were initially believed to be an animal-specific evolutionary innovation. However, over the last decade, investigations into the genomic content of other close protistan relatives of animals have led to the discovery of integrins and other IMAC components outside of Metazoa. Integrins and complete sets of IMAC components have been reported in unicellular opisthokonts, such as Capsaspora owczarzaki [6], Pigoraptor spp. [7], Syssomonas multiformis [7], Corallochytrium limacisporum [8], Creolimax fragrantissima [9], and Sphaeroforma arctica [10], as well as in the apusomonad Thecamonas trahens [6] and the breviate Pygsuia biforma [11]. Thus, the IMAC and its associated proteins have a much more ancient evolutionary origin than previously expected, well outside of the animals, but the IMAC has so far been reported only in the lineage called Obazoa, which is composed of Opisthokonta, Breviatea, and Apusomonadida [11]. Are these complexes exclusive to Obazoa, or did they appear even deeper in evolutionary history?

Amoebozoa is the eukaryotic supergroup that is sister to Obazoa; together they form Amorphea [12]. Despite the relatively close evolutionary proximity of Amoebozoa to the animal-containing lineage, genomic-level data and relevant research into the supergroup remain sparse. The genomic complement of the last common ancestor of Amoebozoa has yet to be determined. Recent work revealed that crucial components of the IMAC, namely ITGA and ITGB, were missing in the amoebozoans examined (Dictyostelium discoideum and Acanthamoeba castellanii) [13]. Was this result simply due to a lack of data, or is it an evolutionary trend? Here, we examine the gene repertoires across the Amoebozoa supergroup to investigate the evolutionary history of the adhesome associated with the integrin proteins. The previously reported absence of integrin proteins may simply be due to the great paucity of taxonomically broad genomic data from this clade, which is a limiting factor in our knowledge of IMAC evolution.

To test whether the IMAC is truly absent or merely unobserved because of limited taxonomic sampling, we took a comparative genomic/transcriptomic approach using data recently used to resolve the tree of Amoebozoa [14]. We found a number of predicted IMAC proteins, including both ITGA and ITGB, across a wide breadth of Amoebozoa and in some previously unexamined obazoans (Ministeria vibrans (Opisthokonta [12]) and Lenisia limosa (Breviatea [15])) (figure 1B). Across the breadth of these taxa, we identified a nearly complete IMAC. Here we demonstrate that the IMAC is not exclusive to Obazoa. Rather, it was present in the last common ancestor of Amorphea (i.e., Obazoa + Amoebozoa), which, from our interpretation of the gene distributions, probably had a full set of these components (figure 1C).

BACKGROUND

Molecular details on Integrin proteins

The overall architecture of both integrin proteins includes a large extracellular N-terminal ligand-binding head domain and a C-terminal stalk (also termed “legs”). Integrin activation depends on the binding of a ligand (such as the extracellular matrix proteins collagen, fibronectin, and laminin) [5] and on the binding of divalent cations (primarily Ca2+, Mg2+, or Mn2+) at unique but well-conserved cation binding motifs [16–19]. In animals, there are universally five cation binding sites on the ITGA and three on the ITGB paralogs [16,17,19,20]. Intracellular integrin engagement with cytoskeletal proteins in the IMAC can regulate cell-signaling pathways that control differentiation, survival, and motility, making the integrins one of the most functionally diverse families of cell adhesion molecules [5,21].

Integrin Alpha (ITGA)

Metazoa ITGA

In metazoan ITGA, which we refer to herein as “canonical ITGA proteins”, there is a β-propeller made of seven blades (IPR013519) [18], an integrin leg domain (IPR013649) [18], a transmembrane domain, and a C-terminal cytoplasmic tail amino acid motif (KXGFFXR; IPR018184) [22] (numbers are INTERPRO ids derived from INTERPROSCAN version 5.27.66 (IPRSCAN) [23]). These proteins are typically 700-800 amino acids long in animals. The β-propeller blade motif is composed of an FG-GAP domain (IPR013517), and in the final 3-4 blades there is a cation binding site motif between the “FG” and “GAP” motifs. Of note, the consensus pattern of the FG-GAP domain is highly variable and does not necessarily contain those amino acids (i.e., FG and GAP) [20]. The cation binding blades of the β-propeller influence extracellular ligand affinity and ligand specificity [24,25]. The cation motifs follow the consensus pattern D/E-h-D/N-x-D/N-G-h-x-D/E (h = hydrophobic residue; x = any residue) [16]. In metazoan ITGA, there are three ITGA leg domains (IPR013649) located next to the ITGA head. These ITGA leg domains are structurally and functionally similar to an immunoglobulin (Ig) fold-like beta sandwich [18].
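
The cation motif consensus quoted above (D/E-h-D/N-x-D/N-G-h-x-D/E) can be read as a nine-residue pattern and scanned for with a regular expression. The following is a minimal sketch, not the authors' method: the residue set used for "h" is an assumption, since the text does not say which residues count as hydrophobic, and the function name is hypothetical.

```python
import re

# Assumption: this residue set for "h" (hydrophobic) is illustrative only;
# the source text does not list which residues count as hydrophobic.
HYDROPHOBIC = "AVILMFWY"

# D/E-h-D/N-x-D/N-G-h-x-D/E, wrapped in a lookahead so that
# overlapping candidate sites are all reported.
CATION_MOTIF = re.compile(
    rf"(?=([DE][{HYDROPHOBIC}][DN].[DN]G[{HYDROPHOBIC}].[DE]))"
)


def find_cation_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 9-mer) for each candidate site."""
    return [
        (m.start(), m.group(1))
        for m in CATION_MOTIF.finditer(sequence.upper())
    ]
```

A hit from such a scan is only a candidate site; in the text, cation motifs are additionally expected to sit between the FG and GAP motifs of a β-propeller blade.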

An authentic ITGA protein outside Metazoa would be classified as a membrane-docked protein with an ITGA head composed of a β-propeller made of several β-propeller blades (IPR013519) (figure 2), some with FG-GAP domains (IPR013517) only and some with a cation motif flanked by FG and GAP motifs (figure 4). However, we do not discard the possibility of truncated ITGA, because truncated forms occur in metazoans [26,27] and in the opisthokont Creolimax fragrantissima [9].

Integrin Beta (ITGB)

Metazoan ITGB

In metazoan ITGB, which we refer to herein as “canonical ITGB proteins”, there are seven domains: a plexin-semaphorin-integrin (PSI) domain (IPR002165), a hybrid domain (described below), three to four EGF modules (IPR013111), an integrin β tail subunit (IPR012896), and transmembrane and signal peptide domains [18]. The metazoan hybrid domain is composed of a unique von Willebrand factor A domain, known as the ITGB “inserted” domain or β-I domain (IPR002369), and an Ig-like domain (IPR013783) upstream of the β-I domain [18] (figure 3). EGF domains make up a cysteine rich stalk motif (CRSM) that corresponds to the β-subunit stalk region [18]. The consensus pattern of the CRSM in Opisthokonta is CXCXXCXC [6,28]; in Metazoa there are typically 3-4 CRSMs. The metazoan transmembrane domain contains a GXXXG membrane motif, which is important for oligomerization of integrin proteins [29].

In Metazoa, the β-I domain contains three metal ion binding sites (within a region of ca. 200 amino acids in total) [18]. These are a metal ion dependent adhesion site (MIDAS), a site adjacent to MIDAS (called ADMIDAS), and a synergistic metal ion binding site (SyMBS) [16]. These metal ion binding sites are key regulators of all integrin-ligand interactions and require divalent cations such as manganese, calcium, and magnesium [16]. The three metal binding sites have the consensus motifs DXSYSX95EX30D (MIDAS; at site 150 in human ITGB1 (NCBI:P05556.2)), SXXDDX121DX83A (ADMIDAS; at site 154 in human ITGB1 (NCBI:P05556.2)), and D/EX55NDXPXE (SyMBS; at site 189 in human ITGB1 (NCBI:P05556.2)) [17]. The MIDAS and SyMBS (the positive regulatory site for integrin activation [16]) motifs are essential for integrin-ligand binding [16,18], but ADMIDAS (the negative regulatory site for integrin inhibition [30]) is not absolutely necessary for integrin activation [30]. The ITGB tail subunit (IPR012896) contains two conserved motifs responsible for activating the integrin complex: the cytosolic tail motif (NPXY/F, which scaffolds cytoskeletal proteins), located after the transmembrane region, and the ITGA interacting motif (LLXXXHDRKE, a highly electrically charged amino acid motif), located downstream (C-terminal) of the NPXY/F motif, which interacts with scaffolding and signaling proteins [31].
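
The compact motif notation used above (for example DXSYSX95EX30D or NPXY/F) can be translated mechanically into regular expressions. The sketch below assumes that a number following a residue is a repeat count (so X95 means 95 arbitrary residues, which is one reading of the flattened subscripts) and that a slash separates alternative residues at one position; the function name is hypothetical.

```python
import re

# One token = a residue letter, optional "/alternatives", optional repeat count.
_TOKEN = re.compile(r"([A-Z](?:/[A-Z])*)(\d+)?")


def motif_to_regex(motif: str) -> str:
    """Translate notation such as 'D/EX55NDXPXE' into a regex string.

    Assumptions: 'X' means any residue, digits are a repeat count for the
    preceding position, and '/' separates alternatives at one position.
    """
    parts = []
    pos = 0
    for m in _TOKEN.finditer(motif):
        if m.start() != pos:  # an unsupported character was skipped
            raise ValueError(f"cannot parse motif at position {pos}: {motif!r}")
        pos = m.end()
        letters = m.group(1).split("/")
        if letters == ["X"]:
            unit = "."
        elif len(letters) == 1:
            unit = letters[0]
        else:
            unit = "[" + "".join(letters) + "]"
        count = m.group(2)
        parts.append(unit + (f"{{{count}}}" if count else ""))
    if pos != len(motif):
        raise ValueError(f"cannot parse motif at position {pos}: {motif!r}")
    return "".join(parts)
```

The same helper covers the shorter motifs named in the text, such as CXCXXCXC and GXXXG. Fixed spacer lengths make such patterns strict, so real searches may need tolerance for insertions and deletions.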

An authentic ITGB protein outside Metazoa would be classified as a membrane-docked protein with an ITGB head composed of a von Willebrand factor A domain (IPR002369) with three metal ion binding sites, and cysteine rich regions downstream (C-terminal) (figure 3). Again, we do not discard the possibility of truncated ITGB, since truncated forms are present in both metazoans [26,27] and Creolimax fragrantissima [9].

RESULTS

Integrin Alpha

Amoebozoan ITGA: General observations on ITGA

Our results show that ITGA homologs are present in 23 of the 113 amoebozoan transcriptomes examined (figure 2). Of those 23 amoebozoans, eight have more than one ITGA paralog (figure 2). Amoebozoan ITGAs vary greatly in size, ranging from 523 to 1200 amino acids in length (supplemental table 1). Many of the proteins we identified are predicted to have a diverse number of β-propeller blades, from few (3-5 blades), to canonical (6-7 blades), to many (more than 8 blades), and of cation binding sites (1 to 5 sites). However, all amoebozoan ITGA proteins have at least three β-propeller blade (IPR013517) domains (figure 2), with an FG-GAP domain and an FG-GAP/cation binding motif (figure 3A). The positions of the cation binding motif/FG-GAP domains are seemingly distributed randomly (supplemental table 1), and not necessarily in the last 3-4 blades as in canonical animal ITGA. We noticed a C-terminal transmembrane anchor (IPR021157) domain in three amoebozoan ITGAs. The animal-like KXGFFXKR cytoplasmic tail motif is absent in all but one amoebozoan (figure 3E; supplemental table 1).
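
The blade-count categories used above (few, canonical, many) can be written down as a small lookup. This is a sketch under the stated ranges only; the text gives "more than 8" for the largest class, so a count of exactly 8 is left unassigned here rather than guessed, and the function name is hypothetical.

```python
def classify_propeller(blade_count: int) -> str:
    """Bin a predicted β-propeller by its number of blades.

    Ranges follow the text: few = 3-5, canonical = 6-7, many = more than 8.
    """
    if blade_count < 3:
        return "below minimum"  # the text reports at least three blades
    if blade_count <= 5:
        return "few"
    if blade_count <= 7:
        return "canonical"
    if blade_count > 8:
        return "many"
    return "unspecified"  # exactly 8 is not covered by the stated ranges
```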

We find that amoebozoan ITGAs are present in all three major clades, but interestingly these proteins are absent from eight amoebozoan taxa that have genome sequences (Dictyostelium spp., Acytostelium spp., Heterostelium spp., Speleostelium caveatum, Physarum polycephalum, Entamoeba spp., Acanthamoeba spp., and Protostelium fungivorum [32]). Importantly, however, we were able to confidently find ITGA in Mastigamoeba balamuthi (figure 2), which has a complete genome available [33]. To confirm the presence of these genes in the genome, we used a BLAST-based homology search to map the two ITGA genes of M. balamuthi to the genome data (CBKX00000000) on NCBI (see Methods) [86]. For ITGA, there are no introns in either paralog 1 (CBKX010025823.1) or paralog 2 (CBKX010005624.1).

We classified ITGA into two representative types based on their predicted IPRSCAN domains, their cation motifs, their similarity to metazoan ITGA, and the presence of novel domains previously unobserved in ITGA proteins.

Amoebozoa ITGA: Type I ITGA: similar architecture to Metazoa

For communication purposes we have classified the amoebozoan ITGA homologs as ‘Type I’ and ‘Type II’ based on their similar or divergent domain complements, respectively, compared to metazoan ITGA sequences (figure 2). Of the 23 amoebozoan taxa in which we identified an ITGA, 19 taxa have Type I (figure 2). One amoebozoan sequence (Amphizonella sp. 9 paralog 2) is a Type I ITGA with a domain architecture and cation motifs that are virtually identical to the head region of metazoan ITGA (figure 2, supplemental table 1). The rest of the amoebozoan Type I ITGAs have canonical β-propellers, but their numbers of cation motifs vary and differ from those of the canonical ITGA proteins (supplemental table 1). Most of the cation motifs of all but four amoebozoan Type I ITGAs follow the consensus metazoan amino acid pattern (figure 3A, supplemental table 1). Thirteen of the amoebozoan Type I ITGAs have both a predicted signal peptide and a transmembrane region (figure 2). The rest of the amoebozoan Type I ITGAs have only a predicted signal peptide region or a transmembrane region. Amoebozoan ITGAs without a transmembrane region may represent endogenous soluble integrins. In a few cases we did not observe the signal peptide or the transmembrane region. This may be due to truncation, as we are predicting proteins from transcriptomes, or it may be the product of shedding [34]. However, in the genome of Mastigamoeba balamuthi all ITGA paralogs lack signal peptides and one (paralog 1) lacks a transmembrane region.

Of the 19 Type I amoebozoan ITGA homologs, four have an extracellular Ig-fold-like (IPR013783) or cadherin-like (IPR015919) domain next to the ITGA head (figure 2). The cadherin-like domain (IPR015919) and the Ig-fold-like domain (IPR013783) both belong to the immunoglobulin fold-like beta sandwich protein class, whose members fold into a similar structure [35]. The three amoebozoan ITGAs that contain either an Ig-fold-like or a cadherin-like domain also have canonical β-propellers (6-7 blades) (figure 2), although two of the four Type I ITGAs have an extra cation motif (supplemental table 1).

Amoebozoan ITGA: Type II ITGA with novel domains

We designate any ITGA homolog with a domain flanking the ITGA head that has not previously been reported in an ITGA protein as a “Type II” ITGA (figure 2). Of the 23 amoebozoan taxa in which we identified an ITGA, 12 taxa have Type II (figure 2). Of the 23 Type II amoebozoan ITGAs we discovered, two have a protein kinase domain (IPR000719) (figure 2), and five have a proline rich extension motif called “extensins” (PR01217) (numbers are SPRINTS ids derived from PRINTS version 42.0 [36] and INTERPROSCAN version 5.27.66 [23] (supplemental methods)) (figure 3F; supplemental table 1).

One Type II amoebozoan sequence (Flabellula citata paralog 1) uniquely has an epidermal growth factor (EGF)-like domain (IPR000742) as its only novel domain, located downstream (C-terminal) of the ITGA head (figure 2). Additionally, three amoebozoans uniquely have two epidermal growth factor (EGF) domains (figure 3B), a Thrombospondin repeat-1 (TSP1) domain (IPR000884) (figure 3C), and a carbohydrate binding domain called a wall stress component (WSC) (IPR002889) (figure 3D), all with cysteine rich motifs and all located upstream (N-terminal) of the ITGA head (figure 2). We were unable to detect any other protein in the nr database with significant similarity to the N-terminal domain composition of these sequences (EGF-TSP1-WSC) using an e-value cut-off < 1e-10. This suggests that the EGF-TSP1-WSC architecture may be unique to these three amoebozoans.

Seven of the amoebozoan ITGAs we designated as Type II possess canonical (6-7 blades) β-propellers (figure 2) and varying numbers of cation motifs (supplemental table 1). The other five amoebozoan Type II ITGAs have short (3-5 blades) β-propellers and a wide range of numbers of cation motifs (supplemental table 1). Four amoebozoan ITGA homologs of this type contain both a signal peptide and a transmembrane domain (figure 2). Eight others have either a signal peptide or a transmembrane region.
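
The Type I / Type II designation described above can be sketched as a check of a protein's predicted domain accessions against the novel domains the text names. This is an illustration only: the accession set is compiled from the identifiers quoted in the text and is not claimed to be exhaustive, the function name is hypothetical, and the sketch ignores whether a domain actually flanks the ITGA head.

```python
# Accessions the text names as novel domains in Type II ITGA proteins.
# Assumption: compiled from the identifiers quoted in the text; not exhaustive.
NOVEL_FLANKING_DOMAINS = {
    "IPR000719": "protein kinase",
    "PR01217": "extensin (proline rich extension)",
    "IPR000742": "EGF-like",
    "IPR000884": "TSP1",
    "IPR002889": "WSC",
}


def classify_itga(domain_ids):
    """Return ("Type I" or "Type II", sorted novel accessions found)."""
    novel = sorted(d for d in set(domain_ids) if d in NOVEL_FLANKING_DOMAINS)
    return ("Type II" if novel else "Type I", novel)
```

For example, a protein annotated with IPR013519, IPR013517, and IPR000719 would be labelled Type II because of the protein kinase domain, while one carrying only the propeller and Ig-fold-like (IPR013783) accessions would be labelled Type I.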

Obazoan ITGA: General observations on ITGA

We also found ITGA homologs in several underexplored obazoan transcriptomes: the opisthokont Ministeria vibrans and the breviate Lenisia limosa (figure 2). Obazoan ITGA proteins have at least three β-propeller blade (IPR013517) domains with an FG-GAP domain (figure 2, supplemental figure 5A), and the FG-GAP/cation binding motif domains are seemingly distributed randomly (supplemental table 1). Animal-like KXGFFXKR cytoplasmic tail motifs were observed in most obazoan transcriptomes examined in this study (figure 4E, supplemental figure 5C). Among the non-animal obazoans, the seven novel obazoan ITGAs have an Ig-fold-like domain leg next to the ITGA head (figure 2; supplemental figure 5D).

Ministeria vibrans has both Type I and Type II paralogs (figure 2). In Type II, paralogs 1 and 2 have a predicted long (10-blade) ITGA β-propeller, but they do not contain transmembrane domains. The other paralog has few (3) β-propeller blades (one with an FG-GAP/cation binding motif), a cytosolic tail motif (figure 3E), and a transmembrane domain (figure 2). In both paralogs 1 and 3, a signal peptide region was not predicted. The Ministeria vibrans Type II ITGA uniquely has an epidermal growth factor (EGF)-like domain (IPR000742) attached upstream (N-terminal) of the ITGA head (figure 2). All obazoans in our study have a signal peptide or a transmembrane region except for Pigoraptor spp., Sphaeroforma arctica, and Syssomonas multiformis (figure 2), which lack both.

Phylogenetic analysis of ITGA

The unrooted tree of ITGA recovered all amoebozoan ITGAs as a clade (figure 2), which is generally sister to most of Obazoa. However, a few opisthokont sequences are nested in this clade. These include a paralog of Capsaspora owczarzaki (ITGA4: XP_004347912), both M. vibrans paralogs, the breviates, and the lone Syssomonas multiformis ITGA. The major lineages or sublineages of Amoebozoa were not recovered. Instead, amoebozoan ITGAs are divided into two clades: one consists primarily of tubulinid amoebozoans (+ Ministeria), and the other contains the rest of Amoebozoa.

It should be noted that, in addition to these taxa, the genome sequence of the evolutionarily very distant stramenopile brown alga Ectocarpus siliculosus revealed the presence of three ITGA homologs [37] (supplemental figure 8). However, no other stramenopile genomic or transcriptomic data searched thus far (for a list of taxa searched, please see supplemental text) contain integrin homologs. Therefore, we inferred an additional ITGA tree including this species (supplemental figure 3). Through BlastP searches against NCBI-NR we identified several predicted beta propeller proteins in Archaea that have an architecture similar to that of canonical ITGA. An unrooted tree of beta propeller homologs recovered Archaea as paraphyletic across our phylogeny (supplemental figure 8); this is discussed further below.

We observed additional FG-GAP proteins on NCBI in other taxa, such as cyanobacteria (e.g. Gloeobacter violaceus; WP_011143221.1, Nostoc punctiforme; ACC84844.1 and Crocosphaera watsonii; WP_007307935.1), Haptophyta (Emiliania huxleyi; XP_005778208.1), Cryptophyta (Guillardia theta; XP_005842481.1), and Stramenopiles (e.g. Nannochloropsis spp. CCMP1776: TFJ84058.1, EWM21519 and Cafeteria roenbergensis; KAA0148838.1). These FG-GAP proteins either lack both transmembrane and signal peptide regions or possess seven or more transmembrane regions. Thus, in our estimation, these FG-GAP proteins fail to meet our minimum criteria for being an ITGA.
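As an illustration, the exclusion rule described above can be expressed as a small predicate. This is a minimal sketch, assuming that signal peptide and transmembrane predictions are already available for each candidate and that "do not possess transmembrane and signal peptide regions" means lacking both. The `Candidate` record, its field names, and the example entries are hypothetical and serve only as illustration; the sketch covers only the two exclusion conditions named here, not the full set of ITGA criteria.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one FG-GAP-containing protein."""
    name: str
    has_signal_peptide: bool
    n_transmembrane: int  # number of predicted transmembrane regions


def fails_itga_criteria(c: Candidate) -> bool:
    """Return True if the candidate is excluded by the rule in the text.

    Excluded: proteins with neither a signal peptide nor a transmembrane
    region, and proteins with seven or more transmembrane regions.
    """
    no_membrane_features = (not c.has_signal_peptide) and c.n_transmembrane == 0
    too_many_tm = c.n_transmembrane >= 7
    return no_membrane_features or too_many_tm


# Illustrative, made-up inputs (not real predictions for any accession).
examples = [
    Candidate("soluble_fg_gap", has_signal_peptide=False, n_transmembrane=0),
    Candidate("multipass_fg_gap", has_signal_peptide=True, n_transmembrane=7),
    Candidate("single_pass_fg_gap", has_signal_peptide=True, n_transmembrane=1),
]
excluded = [c.name for c in examples if fails_itga_criteria(c)]
```

Keeping the two exclusion conditions as separate named booleans makes it easy to report which condition removed a given candidate.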

Integrin Beta

Amoebozoan ITGB: General observations on non-metazoan ITGB

ITGB homologs are present in 19 of the 113 amoebozoan transcriptomes examined, with three ITGB paralogs in Rhizamoeba saxonica and two in Idionectes vortex (figure 4). Amoebozoan ITGBs are present in all three major clades of Amoebozoa. However, as with ITGA, these proteins are absent from four amoebozoan groups that have genome sequences (the dictyostelids, entamoebids, acanthamoebids, and Protostelium fungivorum[32]). We did find an ITGB gene in the genome of the amoebozoan Mastigamoeba balamuthi ATCC 30984 (figure 6). For ITGB, there are five introns in genomic scaffold CBKX010020319.1 and eight introns in CBKX010020318.1. These are canonical M. balamuthi introns, and the other genes on the scaffolds are mastigamoebid/amoebozoan in nature. We detected a tenascin ORF on ITGB genomic scaffold CBKX010020318.1 and a PA14 ORF on CBKX010020318.1. We inferred phylogenetic trees for tenascin and PA14 by searching for homologs in our extensive datasets; both clearly show amoebozoan signal (shown as a subset in figure 6). The presence of introns within the ITGB genomic scaffolds demonstrates that this gene is indeed part of the genome. Additionally, the other ORFs on these genomic scaffolds show amoebozoan phylogenetic signal, indicating that the scaffolds are not contamination in the genome project.


The size of ITGB paralogs in Amoebozoa varies widely, from 595 to 3296 amino acids (supplemental table 2). In addition to the novel ITGB proteins in Amoebozoa, we also found previously unreported ITGB homologs in two obazoan transcriptomes, as well as in CRuMs, a recently described clade closely related to Amorphea[38]. Based on domain and motif architecture from IPRSCAN outputs, we classify non-metazoan ITGBs into two representative types: ITGBs that are similar to a canonical ITGB, and ITGB-like proteins with unique domain architectures.

Type I ITGB: similar architecture to Metazoa

Our first representative type of ITGB has a domain architecture similar to that of metazoan ITGB (figure 4). Of the 19 amoebozoan taxa in which we identified an ITGB, 14 have Type I (figure 4). Most amoebozoans in this class have the vWA domain, the β-subunit stalk region, and the PSI domain; the exceptions are two amoebozoans that lack PSI domains and one amoebozoan that possesses a duplicated vWA domain (figure 4). In Amoebozoa, the CRSM pattern in the β-subunit stalk region, DXCGXCGGX(3-6)CXGC (figure 5F), is repeated three to seven times (supplemental table 2). Within the β-tail motifs (IPR012896), the cytosolic tail motif (NPXY/F) is present across Amoebozoa (figure 5G), except in one amoebozoan (supplemental table 2), whereas the ITGA-interacting motif (LLXXXHDRKE) is absent in Amoebozoa. Twelve amoebozoans in this class possess amino acids with a predicted electrostatic charge where the ITGA-interacting motif (LLXXXHDRKE) would be located in metazoan ITGB (figure 4, figure 5H). MIDAS motifs are well conserved, especially in amoebozoan β-I domains (figure 5A, B), and SyMBS motifs are well conserved across the amoebozoans (figure 5C, D). However, ADMIDAS motifs are quite variable in Amoebozoa compared to the metazoan consensus pattern (figure 5B, supplemental table 2). A signal peptide and a transmembrane domain were predicted in most amoebozoan ITGBs in this class; three amoebozoans lack signal peptides (figure 4).
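The two sequence patterns named above, the CRSM stalk pattern DXCGXCGGX(3-6)CXGC and the cytosolic tail motif NPXY/F, translate directly into regular expressions. The following is a minimal sketch of how such patterns could be counted in a protein sequence, assuming X stands for any single residue. The function names and the toy sequence are hypothetical, and this is not the motif-discovery procedure used for figure 5, which relied on the MEME suite.

```python
import re

# CRSM stalk pattern from the text: DXCGXCGGX(3-6)CXGC, where X is any residue.
CRSM = re.compile(r"D.CG.CGG.{3,6}C.GC")
# Cytosolic tail motif NPXY/F: N, P, any residue, then Y or F.
NPXYF = re.compile(r"NP.[YF]")


def count_crsm(seq: str) -> int:
    """Count non-overlapping CRSM matches in a protein sequence."""
    return len(CRSM.findall(seq.upper()))


def has_npxyf(seq: str) -> bool:
    """Report whether the sequence contains at least one NPXY/F motif."""
    return NPXYF.search(seq.upper()) is not None


# Synthetic toy sequence (not a real ITGB): two CRSM repeats, then NPLY.
toy = "MKT" + "DACGTCGGAAAACSGC" + "LL" + "DKCGQCGGSTSTSTCAGC" + "WWNPLYKK"
```

Because `findall` returns non-overlapping matches scanned left to right, the count corresponds to the number of tandem repeats only when the repeats do not share residues.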

The ITGB proteins of non-metazoan obazoans have domain and motif architectures very similar to those of a canonical metazoan ITGB (figure 4, supplemental figure 6). In the β-tail motifs (IPR012896) of Obazoa, the cytosolic tail motif (NPXY/F) is present in all obazoan ITGB homologs (supplemental figure 6G), but we did not observe the ITGA-interacting motif (LLXXXHDRKE) in the ITGB of Breviatea. We found the metazoan GXXXG motif in three amoebozoan ITGBs (supplemental table 2), suggesting that this motif is not animal specific. There are multiple tandem repeats of the opisthokont CRSM pattern in obazoan ITGB, and both breviate taxa have a clear expansion of the β-subunit stalk region, as previously reported[6–11] (figure 3). All of the obazoan ITGB MIDAS, ADMIDAS, and SyMBS motifs are well conserved (supplemental figure 6A-D). We also investigated a possible “choanoflagellate ITGB gene” that was recently reported in the transcriptome of Didymoeca costata[39]. Its protein domain architecture has a serine protease domain flanking the vWA and PSI domains. However, it is missing the CRSM, NPXY/F (IPR021157), LLXXXHDRKE (IPR014836), and GXXXG motifs. To our surprise, Rigifila ramosa (in CRuMs) has a protein that contains all the canonical domains of ITGB, such as vWA, the amoebozoan CRSM pattern, and C-terminal motifs (figure 3), as well as all typical cation motifs (supplemental table 2). We were unable to find an ITGA homolog in either Rigifila ramosa or Didymoeca costata.

Type II ITGB: unique architecture compared to Metazoa and solely found in the amoebozoan clade Variosea

Our second representative class of ITGB, which we refer to as Type II, comprises large proteins (1405 aa to 3641 aa) with novel domain architectures not previously observed in ITGB. Of the 19 amoebozoan taxa that have ITGB, five have an ITGB assigned to our Type II class (figure 4). Type II amoebozoan ITGB domain architectures include Laminin G3 (LamG3) (IPR013320), Von Willebrand Type D (vWD) (IPR001846), Ig-like fold (IPR013783), EGF-like (IPR013032), and cyclic nucleotide binding (IPR000595) domains (figure 4). A canonical human ITGB has four EGF domains[16,18] that coincide with the location of the Ig regions in amoebozoan ITGB. Three amoebozoan ITGBs in this class have an extracellular EGF-like domain at the N-terminus, at the location of the PSI domain (figure 3). LamG3 (CSM) and EGF-like domains are recovered in two amoebozoan ITGBs (figure 3). The cation-binding motifs MIDAS and ADMIDAS are hypervariable in Type II representatives compared to the metazoan consensus pattern, but SyMBS is well conserved except in Filamoeba nolandi (supplemental table 2).
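To make the Type I versus Type II distinction concrete, the domain list above can be turned into a simple lookup. The following is only an illustrative sketch under stated assumptions: it presumes a protein has already been identified as an ITGB homolog, it takes a collection of InterPro accessions as input, and it treats the presence of any listed non-canonical domain as sufficient for Type II. The rule, the function name, and the decision to omit EGF-like (IPR013032) from the marker set are assumptions made for illustration, not the classification procedure applied to the IPRSCAN outputs.

```python
from typing import Iterable

# InterPro accessions named in the text for Type II ITGB architectures.
# EGF-like (IPR013032) is left out of the marker set on purpose, because
# canonical ITGBs also carry EGF domains; this choice is an assumption.
TYPE_II_MARKERS = {
    "IPR013320": "Laminin G3 (LamG3)",
    "IPR001846": "Von Willebrand Type D (vWD)",
    "IPR013783": "Ig-like fold",
    "IPR000595": "cyclic nucleotide binding",
}


def classify_itgb(interpro_hits: Iterable[str]) -> str:
    """Label an ITGB homolog 'Type II' if any marker domain is present."""
    found = set(interpro_hits).intersection(TYPE_II_MARKERS)
    return "Type II" if found else "Type I"
```

For example, `classify_itgb({"IPR001846", "IPR013032"})` returns `"Type II"`, whereas a hit list containing only `"IPR013032"` is labeled `"Type I"` under this rule.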

Phylogenetic analysis of ITGB

We selected Homo sapiens, Gallus gallus, and Nematostella vectensis as a reference group because integrin proteins are highly conserved among Metazoa. The ITGB tree reveals three major clades when rooted with Opisthokonta as the outgroup (figure 3): Type II amoebozoan ITGB; breviate together with apusomonad ITGB; and Type I amoebozoan ITGB together with the ITGB of the CRuMs taxon Rigifila ramosa. Therefore, we do not recover Amoebozoa as a monophyletic group.


Additionally, we investigated the evolutionary history of Sib (Similar to Integrin Beta) proteins. Sib proteins are cell adhesion molecules similar to ITGB that were originally discovered in Dictyostelium discoideum[40]. However, they do not appear to have a canonical ITGB architecture (supplemental result). An ITGB homolog has also been reported in cyanobacteria, but it contained only one of the ITGB domains[6]. We therefore inferred an additional ITGB tree that includes SIBA and the putative homologs from these taxa (Didymoeca costata, Dictyostelium discoideum, and cyanobacteria) (supplemental figure 2). In this tree, a clade of Type II ITGB is placed between the cyanobacterium Trichodesmium and the choanoflagellate Didymoeca. However, these apparent ITGB homologs do not have an obvious ITGB stalk region or NPXY motifs. Although they group with other ITGBs, given the important missing motifs and domains, their phylogenetic placement alone does not clearly define them as ITGB.

Scaffold Cytoplasmic proteins

In our comparative transcriptome and genome analysis, we observed the following intracellular scaffold and cytoskeletal adhesome proteins throughout Amoebozoa: talin, α-actinin, vinculin, PINCH, paxillin, and ILK. These six adhesome proteins have previously been reported in Amoebozoa[6]. Since then, the set of core consensus adhesome proteins has been expanded[21]. Filamin, kindlin, and tensin are three consensus adhesome proteins and cross-actin linkers that interact with ITGB[21]. We observed two adhesome proteins (filamin and tensin) that were not previously known in Amoebozoa (supplemental figure 1). We suspect that these cytoskeletal proteins may be critical, and that the absence of these genes in some amoebozoan taxa is probably a result of low gene coverage in some transcriptomes. Talin, PINCH, filamin, and paxillin, key players in the integrin adhesome, were not present in seven amoebozoan transcriptomes (figure 1B); however, their absence is also suspected to be an artefact of low gene coverage in these data. Across the amoebozoan clade, we also noticed a large number of what appear to be truncated talins compared to the other adhesome proteins (figure 1B).

ILK is scattered across the Amoebozoa, unlike in Apusomonadida and Opisthokonta[6–10]. α-actinin is the only signaling protein that is present in all of the amoebozoan taxa. Consistent with previous reports, we did not detect FAK-cSRC or parvin in any Amoebozoa or Breviatea[11]. Because gene coverage for Pygsuia (Breviatea) is high and genomes are available for Lenisia limosa (Breviatea) and Entamoeba spp. (Amoebozoa), we conclude that breviates and Entamoeba spp. likely do not possess vinculin (supplemental figure 1). The genome of Ectocarpus siliculosus has talin and α-actinin homologs[37], but it does not contain any other integrin-signaling adhesome protein homologs.

DISCUSSION

Our findings show that the IMAC is not exclusive to obazoans, contrary to previous proposals[6,8–11]. We report that a nearly full set of IMAC proteins is present throughout Amorphea, including in the amoebozoans Flabellula citata, Rhizamoeba saxonica, Cryptodifflugia operculatum, Mastigamoeba balamuthi, and Nebela sp., and in two previously unsurveyed obazoan taxa, Lenisia limosa (Breviatea) and Ministeria vibrans (Opisthokonta). Our increased taxon sampling, which spans the breadth of the known diversity within Amoebozoa, permitted the discovery of these proteins. While integrin proteins undoubtedly played a role in the origins of animal multicellularity[6,41,42], we do not find any evidence of integrins in the few multicellular lineages of Amoebozoa, those that socially aggregate to form emergent fruiting body structures (Copromyxa protea and the model dictyostelids). Interestingly, secondary loss of these proteins seems to be rampant, as the majority of amoebozoan taxa appear to lack integrins or some of the IMAC components. The caveat is that some of these observations are based on incomplete transcriptomic data, so an apparent absence may reflect the limits of the available data. Nonetheless, the most parsimonious explanation of IMAC evolution is that the ancestor of Amoebozoa must have possessed these proteins. An analogous scenario of secondary loss of the IMAC plays out in obazoan evolution, in which the closest relatives of animals, the Choanoflagellata[39,43,44], as well as the Nucletmycea (also referred to as the Holomycota, the fungal lineage), are devoid of key IMAC components (namely the integrins)[6,11].

The functional domains of ITGA in amoebozoans are similar to those of the canonical ITGA found in animals and their close relatives, such as the opisthokont filastereans Capsaspora owczarzaki[6], Ministeria vibrans, and Pigoraptor spp.[7]. Given the homologous nature of the functional domain architecture in amoebozoan ITGA, it is likely that these proteins function at the cellular level in a manner similar to those in animals. Therefore, we suspect that the last common ancestor of Amorphea (LCAA) already had ITGA in its genome. Of note, there are potential ITGA homologs in Archaea; it is therefore likely that ITGA homologs predate the last eukaryotic common ancestor but have been retained mostly in Amorphea. We also suspect that the ITGA Ig-like domains possessed by some amoebozoans and obazoans are equivalent to the metazoan ITGA legs, which were originally thought to be specific to metazoans[6,11]. An ITGA leg domain was probably present in the ITGA of the LCAA and secondarily lost in some taxa.


Some novel functional domains in Type II representatives are integrin-binding domains that are present in ECM proteins, such as LamG[45], TSP-1[46], and EGF-like domains[47]. Could the last common ancestor of Type II representatives, or the LCAA, have contained these integrin-binding domains? Other functional domains in amoebozoan ITGA, such as protein kinase, TSP-1 together with WSC, and intramolecular chaperone domains, are not found in Obazoa. Domain shuffling is an important mechanism for creating novel genes in evolution[48]. It has been suggested that many domains in the ECM are found in unrelated proteins[49]. Whether these domains took part in domain shuffling in the ITGA of the last common ancestor of Amoebozoa, contributing to ECM evolution, remains to be seen, and their function remains elusive. However, based on our results, it is possible that we have predicted novel integrin proteins with catalytic as well as adhesive properties.

The domains of amoebozoan ITGB are very similar to those of the canonical metazoan ITGB, and we suspect that the last common ancestor of Amoebozoa retained these canonical ITGB structures, including the CRSM motif and three metal-binding sites. The cytosolic tail motif “GFFKR” is probably specific to Obazoa, and there may be no interaction in the C-terminal intracellular region of integrins in Amoebozoa. The presence of transmembrane and signal peptide regions is sporadic in Amorphea, including in metazoans; a previous study showed that shedding of integrins (evolution from a membrane-docked to a secretory function) occurs[27]. The amoebozoan ITGB cysteine pattern CGXCGGXXXXCXGC contains more amino acids, including glycine, than that of Obazoa (figure 5F). Glycine is a small non-polar amino acid[50], and because of its small size, glycine positions are often highly conserved within proteins[51,52], as glycine residues maintain the overall architecture[53]. The additional glycine residues may therefore help maintain the overall protein structure of amoebozoan ITGB.

The Type II ITGBs form an independent clade that is sister to the other amoebozoan ITGBs and R. ramosa in figure 4. LamG3 and vWD domains have never been observed in the ITGB of Obazoa, and the functions of these Type II ITGB proteins are unknown. Some unicellular protists, such as breviates, Capsaspora owczarzaki, and apusomonads, have a long expansion of the CRSM in ITGB[6,11]. We suspect that Breviatea, with their long integrin legs, form a long integrin heterodimerization complex at the extracellular face of the cell, as both ITGA and ITGB have long legs of nearly equal proportions in Pygsuia biforma (figures 2 and 3)[11].

We predict that most of Discosea, which includes Acanthamoeba (again with several genomes available) and many other taxa with transcriptomes sampled in our study, have lost these proteins. However, one discosean (Mycamoeba gemmipara) has a canonical ITGB homolog (figure 4). Additionally, four other amoebozoan taxa with available genome sequences also appear to have lost ITGB during their evolution. In some amoebozoans we observed only ITGA or only ITGB, without the counterpart integrin protein. Studies by Li et al. and Schneider et al. showed that homodimerization of ITGA or ITGB might occur through clustering of integrins in a lipid raft[54,55]. Our results show that many amoebozoan species possess either ITGA or ITGB alone, yet possess all of the signaling components of the IMAC. These amoebozoans with a single integrin type may be capable of initiating the integrin signaling pathway by a similar process.


A further open question is why the complexity of the extra integrin domains does not track species relationships, and why these domains are combined in single multidomain proteins that are present only in Amoebozoa. Phylogenetically, these orthologs (i.e., amoebozoan ITGA and ITGB) form novel clades independent of those from known organisms and are most likely ancestral to Amoebozoa as a whole. Genomic sequencing will be necessary to further examine the overall distributions and protein architectures, and a combination of localization and gene knockout strategies will be necessary to understand the function of these ‘multicellularity’ proteins in unicellular taxa.

Conclusion

Our comparative genomic and transcriptomic analysis of Amoebozoa, the most comprehensive to date, has revealed the presence of a full set of IMAC proteins across the Amoebozoa. Until now, this complex was believed to be specific to Obazoa; our data show that these IMAC proteins predate Obazoa. Therefore, the last common ancestor of Amorphea contained the necessary machinery of the IMAC, yet it remains unknown whether these proteins play a functional role in these unicellular protists and, if so, what that role is. In future work, the localization and function of these IMAC proteins will be investigated.

Acknowledgements

This project was supported in part by the United States National Science Foundation (NSF) Division of Environmental Biology (DEB) grant 1456054 (http://www.nsf.gov), awarded to MWB. We thank Dr. Andrew J. Roger (Dalhousie University) for advance access to the Mastigamoeba balamuthi transcriptome, which was supported by grant MOP-142349 from the Canadian Institutes of Health Research awarded to A.J. Roger.

FIGURE LEGENDS

Figure 1: A) Repertoire of IMAC proteins in amoebozoan species that have either integrin alpha or beta. Names of IMAC proteins are listed at the bottom and names of amoebozoan taxa are listed at the side of the presence/absence heat map. Arrows indicate amoebozoan species that have a nearly complete set of IMAC proteins. Dark blue indicates the presence of a canonical IMAC protein, white indicates the absence of an IMAC protein, and light blue indicates a truncated form of an IMAC protein. B) Phylogenetic distribution of the IMAC. Circle, innovation/gain; bar, loss. Abbreviations: vin, vinculin; pax, paxillin; tal, talin; par, parvin; pin, particularly interesting new cysteine-histidine-rich protein; FAK, focal adhesion kinase; ILK, integrin-linked kinase; αA, alpha-actinin; α, α-integrin; β, β-integrin. C) Cartoon schematic of the membrane-docked IMAC, which associates with intracellular actin (modified from Brown et al., 2013); colors correspond to the species tree.


Figure 2: Domain architecture of ITGA from Amoebozoa. The domain architectures of ITGA for amoebozoan species are shown on the right side of the figure and the phylogenetic tree is shown on the left. Keys to each domain’s color and shape are given at the bottom of the figure. The phylogenetic tree of amoebozoan ITGA was inferred from 160 sites and rooted with Obazoa ITGA. The tree was built using IQTREE v1.5.5 under the LG+C60+F+G model of protein evolution. Numbers at nodes are ML bootstrap (MLBS) values derived from 1000 ultrafast MLBS replicates. Values below 50% are not shown. Branches of amoebozoan sequences are colored according to Kang et al., figure 3[14]. Where domains overlap because of different software algorithms, they are shown in transparent colors.


Figure 3: Most significantly enriched motifs in 23 amoebozoan ITGA homologs, as obtained with the MEME suite, except for (E). (A) Consensus cation motif and FG-GAP motifs found in the 23 amoebozoan ITGA homologs. (B-D) Consensus motifs exclusively enriched in three of the 23 amoebozoan ITGA homologs: (B) EGF-like motif; (C) thrombospondin repeat 1 motif; (D) carbohydrate WSC motif. (E) GFFKR motif found in three ITGA homologs, from Filamoeba nolandi and two obazoan species, Ministeria vibrans and Pygsuia biforma. (F) Proline-rich extensin motif found in 8 of the 23 amoebozoan ITGA homologs.

[Figure 3 image: sequence logos titled "ITGA Motif", panels A-F, generated with MEME (no SSC); y-axis in bits (0-4), x-axis motif position.]


Figure 4: Phylogenetic tree of ITGB homologs with a focus on Amoebozoa. The domain architectures of ITGB homologs are shown to the right of the ITGB phylogenetic tree. Keys to each domain’s color and shape are given in the box at the bottom of the figure. The tree of amoebozoan ITGB was inferred from 295 amino acid sites and rooted with Obazoa. The tree was built using IQTREE v1.5.5 under the LG+C60+F+G model of protein evolution. Numbers at nodes on the phylogenetic tree are maximum likelihood bootstrap support values derived from 1000 ultrafast bootstrap replicates. Values below 50% are not shown. Branches of amoebozoan sequences are colored according to Kang et al., figure 3[14]. Where domains overlap because of different software algorithms, they are shown in transparent colors.


Figure 5: Most significantly enriched motifs in 18 amoebozoan ITGB homologs, as obtained with the MEME suite. (A-D) Three divalent cation binding motifs, MIDAS, ADMIDAS, and SyMBS, are shown. (A) MIDAS (black arrow) and ADMIDAS (red arrow) motifs. (B) An amino acid shared by the MIDAS and ADMIDAS motifs (indicated by an arrow). (C) SyMBS motif (green arrow) and an amino acid shared with the MIDAS motif (black arrow). These consensus motifs were obtained from Type I amoebozoan ITGB homolog sequences only. (D) SyMBS motif (green arrow). (E-F) Cysteine-rich motifs: (E) PSI domain and (F) cysteine-rich stalk motif. These were recovered from Type I amoebozoan ITGB sequences. (G-H) Motifs found at the C-terminus. (G) NPXY motif (blue arrow), found in most amoebozoan ITGBs except three homologs. (H) An electrostatic charge motif, found exclusively in Type I ITGB homologs.

[Figure 5 panels A-H: MEME (no SSC) sequence logos of ITGB motifs; y-axis: bits (0-4); x-axis: motif position.]


Figure 6: The presence of an ITGB gene in the Mastigamoeba balamuthi genome sequence. Intron/exon boundary sequences are shown along with the corresponding intron sizes. The genome contigs that possess the ITGB gene are colored blue and green. The protein domain architecture of the ITGB gene product is shown along with the protein size. The adjacent tenascin genes on the ITGB genome contig are shown to the left and right, along with the inferred phylogenetic tree.

Supplemental Materials

Supplemental Results: Details on the Sib (similar to integrin beta) proteins found in dictyostelids.

Supplemental Figure 1: Repertoire of IMAC including filamin and tensin in Amoebozoa.

Supplemental Figure 2: Repertoire of IMAC in Amoebozoa species.

Supplemental Figure 3: Maximum likelihood tree of ITA homologs and other cell adhesion molecules that contain beta-propellers.

Supplemental Figure 4: Maximum likelihood tree of ITB homologs and Sib proteins.

Supplemental Figure 6: Most significantly enriched motifs in Obazoan ITA, obtained by MEME-suite.

Supplemental Figure 7: Most significantly enriched motifs in Obazoan ITB, obtained by MEME-suite.

Supplemental Figure 8: Most significantly enriched motifs in Sib protein, obtained by MEME.

Supplemental Table 1: Characteristics of Amorphea ITGA proteins, including length, beta-propeller motifs, subcellular location, any extra domains attached to ITGA, C-terminal motif, signal peptide, transmembrane regions, and type of ITGA. Taxa are listed in the same order as in Figure 2. The subcellular localization is displayed with its likelihood value in parentheses. The location of extra domains attached to ITGA is given as the C-terminal end (CTE) or the start of the N-terminus (NTB). The presence of ITGA motifs that interact with ITGB is shown as GFFKR and GXXXG. The locations of beta-propeller motifs identified by MEME-suite are displayed. FG-GAP motifs are marked with asterisks. Locations of beta motifs that correspond to IPRSCAN output are enclosed in parentheses. Consensus cation motifs are marked with a superscript 2.

Supplemental Table 2: ITGB motifs, subcellular localization, signal peptide, transmembrane regions, and extra domains attached to ITGB proteins. Taxa are listed in the same order as in Figure 3. For the MIDAS, ADMIDAS, and SyMBS motifs of ITGB, non-canonical amino acids are highlighted in red; cysteine-rich motifs are shown as either EGF or PSI domains. C-terminal motifs are shown as KLXXXD, NPXY/F, and GXXXG. Numbers in parentheses indicate the number of motifs present in ITGB.
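The C-terminal motif notation used in this table (X for any residue, Y/F for either tyrosine or phenylalanine) maps directly onto regular expressions. The sketch below is only an illustration of how such patterns could be located and counted in a protein sequence; it is not the MEME-based procedure used in the study. The function name `find_motifs` and the toy sequence are hypothetical and do not correspond to any real ITGB protein.

```python
import re

# Motif notation from the table translated to regular expressions
# (X = any residue; Y/F = tyrosine or phenylalanine)
MOTIF_PATTERNS = {
    "KLXXXD": r"KL.{3}D",
    "NPXY/F": r"NP.[YF]",
    "GXXXG": r"G.{3}G",
}


def find_motifs(sequence: str) -> dict:
    """Return {motif name: [1-based start positions]}, including overlaps."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        # A lookahead makes matches zero-width so overlapping motifs are found
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", seq)]
    return hits


# Hypothetical toy sequence, for illustration only
toy_tail = "GAVLGKLSTVDRREFNPLYKS"
print(find_motifs(toy_tail))
# {'KLXXXD': [6], 'NPXY/F': [16], 'GXXXG': [1]}
```

The length of each position list gives the per-protein motif count of the kind reported in parentheses in the table.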

Supplemental Table 3: RSEM values for protists that have a complete set of IMAC proteins, integrins with extra domains, or an unexpected integrin protein. Values are given in transcripts per million (TPM).
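For readers unfamiliar with the unit, TPM normalizes each transcript's read count by its length and then rescales so that all values in a sample sum to one million. The sketch below shows only this final normalization step from already-estimated counts and effective lengths; RSEM itself estimates the counts with an expectation-maximization procedure that is not reproduced here. The function name `tpm` and the numbers are hypothetical.

```python
def tpm(counts, effective_lengths):
    """Convert estimated read counts to transcripts per million (TPM)."""
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have equal length")
    # Reads per base of transcript
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("at least one transcript must have a non-zero count")
    # Rescale so that the values sum to one million
    return [r / total * 1e6 for r in rates]


# Hypothetical counts and effective lengths for three transcripts
values = tpm([100, 200, 300], [1000, 1000, 2000])
print([round(v, 1) for v in values])
# [222222.2, 444444.4, 333333.3]
```

Because the values always sum to one million within a sample, TPM expresses relative rather than absolute transcript abundance.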

Supplemental Table 4: (A) Summary of amoebozoan type I ITA, listing the number of paralogs and the beta-propeller, FG-GAP, and canonical motifs. The table also shows the presence of signal peptide and transmembrane region, C-terminal interacting motif, GXXXG motif, and Ig-fold-like/cadherin domain. (B) Summary of amoebozoan type II ITA, listing the number of paralogs and the beta-propeller, FG-GAP, and canonical motifs. The table also shows the presence of signal peptide and transmembrane region, C-terminal interacting motif, GXXXG motif, and extra domains attached to ITA.

REFERENCES

1. Hynes, R.O. (2002). Integrins: bidirectional, allosteric signaling machines. Cell 110, 673–687.
2. LaFlamme, S.E., and Auer, K.L. (1996). Integrin signaling. Semin. Cancer Biol. 7, 111–118.
3. Harburger, D.S., and Calderwood, D.A. (2009). Integrin signalling at a glance. J. Cell Sci. 122, 159–163.
4. Horton, E.R., Byron, A., Askari, J.A., Ng, D.H.J., Millon-Frémillon, A., Robertson, J., Koper, E.J., Paul, N.R., Warwood, S., Knight, D., et al. (2015). Definition of a consensus integrin adhesome and its dynamics during adhesion complex assembly and disassembly. Nat. Cell Biol. 17, 1577.
5. LaFlamme, S.E., Mathew-Steiner, S., Singh, N., Colello-Borges, D., and Nieves, B. (2018). Integrin and microtubule crosstalk in the regulation of cellular processes. Cell. Mol. Life Sci. 75, 4177–4185.
6. Sebé-Pedrós, A., Roger, A.J., Lang, F.B., King, N., and Ruiz-Trillo, I. (2010). Ancient origin of the integrin-mediated adhesion and signaling machinery. Proc. Natl. Acad. Sci. 107, 10142.
7. Hehenberger, E., Tikhonenkov, D.V., Kolisko, M., del Campo, J., Esaulov, A.S., Mylnikov, A.P., and Keeling, P.J. (2017). Novel Predators Reshape Holozoan Phylogeny and Reveal the Presence of a Two-Component Signaling System in the Ancestor of Animals. Curr. Biol. 27, 2043-2050.e6.
8. Grau-Bové, X., Torruella, G., Donachie, S., Suga, H., Leonard, G., Richards, T.A., and Ruiz-Trillo, I. (2017). Dynamics of genomic innovation in the unicellular ancestry of animals. eLife 6, e26036.
9. de Mendoza, A., Suga, H., Permanyer, J., Irimia, M., and Ruiz-Trillo, I. (2015). Complex transcriptional regulation and independent evolution of fungal-like traits in a relative of animals. eLife 4, e08904.
10. Dudin, O., Ondracka, A., Grau-Bové, X., A.B. Haraldsen, A., Toyoda, A., Suga, H., Bråte, J., and Ruiz-Trillo, I. (2019). A unicellular relative of animals generates an epithelium-like cell layer by actomyosin-dependent cellularization. bioRxiv, 563726.
11. Brown, M.W., Sharpe, S.C., Silberman, J.D., Heiss, A.A., Lang, B.F., Simpson, A.G., and Roger, A.J. (2013). Phylogenomics demonstrates that breviate flagellates are related to opisthokonts and apusomonads. Proc. R. Soc. Lond. B Biol. Sci. 280, 20131755.
12. Adl, S.M., Bass, D., Lane, C.E., Lukeš, J., Schoch, C.L., Smirnov, A., Agatha, S., Berney, C., Brown, M.W., Burki, F., et al. (2018). Revisions to the Classification, Nomenclature, and Diversity of Eukaryotes. J. Eukaryot. Microbiol. 0. Available at: https://doi.org/10.1111/jeu.12691 [Accessed October 18, 2018].


13. Cavalier-Smith, T. (2017). Origin of animal multicellularity: precursors, causes, consequences—the choanoflagellate/sponge transition, neurogenesis and the Cambrian explosion. Philos. Trans. R. Soc. B Biol. Sci. 372. Available at: http://rstb.royalsocietypublishing.org/content/372/1713/20150476.abstract.
14. Kang, S., Tice, A.K., Spiegel, F.W., Silberman, J.D., Pánek, T., Čepička, I., Kostka, M., Kosakyan, A., Alcântara, D., and Roger, A.J. (2017). Between a pod and a hard test: the deep evolution of amoebae. Mol. Biol. Evol.
15. Hamann, E., Gruber-Vodicka, H., Kleiner, M., Tegetmeyer, H.E., Riedel, D., Littmann, S., Chen, J., Milucka, J., Viehweger, B., Becker, K.W., et al. (2016). Environmental Breviatea harbour mutualistic Arcobacter epibionts. Nature 534, 254.
16. Zhang, K., and Chen, J. (2012). The regulation of integrin function by divalent cations. Cell Adhes. Migr. 6, 20–29.
17. Xia, W., and Springer, T.A. (2014). Metal ion and ligand binding of integrin α5β1. Proc. Natl. Acad. Sci. 111, 17863–17868.
18. Campbell, I.D., and Humphries, M.J. (2011). Integrin structure, activation, and interactions. Cold Spring Harb. Perspect. Biol. 3, a004994.
19. Lee, J.-O., Rieu, P., Arnaout, M.A., and Liddington, R. Crystal structure of the A domain from the α subunit of integrin CR3 (CD11b/CD18). Cell 80, 631–638.
20. Xiong, J.-P., Stehle, T., Diefenbach, B., Zhang, R., Dunker, R., Scott, D.L., Joachimiak, A., Goodman, S.L., and Arnaout, M.A. (2001). Crystal structure of the extracellular segment of integrin αVβ3. Science 294, 339–345.
21. Humphries, J.D., Chastney, M.R., Askari, J.A., and Humphries, M.J. (2019). Signal transduction via integrin adhesion complexes. Cell Archit. 56, 14–21.
22. Srichai, M.B., and Zent, R. (2010). Integrin structure and function. In Cell-Extracellular Matrix Interactions in Cancer (Springer), pp. 19–41.
23. Finn, R.D., Attwood, T.K., Babbitt, P.C., Bateman, A., Bork, P., Bridge, A.J., Chang, H.-Y., Dosztányi, Z., El-Gebali, S., Fraser, M., et al. (2017). InterPro in 2017—beyond protein family and domain annotations. Nucleic Acids Res. 45, D190–D199.
24. Oxvig, C., and Springer, T.A. (1998). Experimental support for a beta-propeller domain in integrin alpha-subunits and a calcium binding site on its lower surface. Proc. Natl. Acad. Sci. U. S. A. 95, 4870–4875.
25. Humphries, M.J., Symonds, E.J.H., and Mould, A.P. (2003). Mapping functional residues onto integrin crystal structures. Curr. Opin. Struct. Biol. 13, 236–243.
26. Jin, R., Trikha, M., Cai, Y., Grignon, D., and Honn, K.V. (2007). A naturally occurring truncated β3 integrin in tumor cells: Native anti-integrin involved in tumor cell motility. Cancer Biol. Ther. 6, 1559–1568.
27. Gomez, I.G., Tang, J., Wilson, C.L., Yan, W., Heinecke, J.W., Harlan, J.M., and Raines, E.W. (2012). Metalloproteinase-mediated Shedding of Integrin β2 promotes macrophage efflux from inflammatory sites. J. Biol. Chem. 287, 4581–4589.
28. Tan, S.-M., Walters, S.E., Mathew, E.C., Robinson, M.K., Drbal, K., Shaw, J.M., and Law, S.K.A. (2001). Defining the repeating elements in the cysteine-rich region (CRR) of the CD18 integrin β2 subunit. FEBS Lett. 505, 27–30.
29. Russ, W.P., and Engelman, D.M. (2000). The GxxxG motif: A framework for transmembrane helix-helix association. J. Mol. Biol. 296, 911–919.


30. Mould, A.P., Barton, S.J., Askari, J.A., Craig, S.E., and Humphries, M.J. (2003). Role of ADMIDAS cation-binding site in ligand recognition by integrin α5β1. J. Biol. Chem. 278, 51622–51629.
31. Calderwood, D.A., Fujioka, Y., de Pereda, J.M., García-Alvarez, B., Nakamoto, T., Margolis, B., McGlade, C.J., Liddington, R.C., and Ginsberg, M.H. (2003). Integrin β cytoplasmic domain interactions with phosphotyrosine-binding domains: A structural prototype for diversity in integrin signaling. Proc. Natl. Acad. Sci. 100, 2272.
32. Hillmann, F., Forbes, G., Novohradská, S., Ferling, I., Riege, K., Groth, M., Westermann, M., Marz, M., Spaller, T., Winckler, T., et al. (2018). Multiple Roots of Fruiting Body Formation in Amoebozoa. Genome Biol. Evol. 10, 591–606.
33. Nývltová, E., Šuták, R., Harant, K., Šedinová, M., Hrdý, I., Pačes, J., Vlček, Č., and Tachezy, J. (2013). NIF-type iron-sulfur cluster assembly system is duplicated and distributed in the mitochondria and cytosol of Mastigamoeba balamuthi. Proc. Natl. Acad. Sci. U. S. A. 110, 7371–7376.
34. Metcalf, J.A., Funkhouser-Jones, L.J., Brileya, K., Reysenbach, A.-L., and Bordenstein, S.R. (2014). Antibacterial gene transfer across the tree of life. eLife 3, e04266.
35. Clarke, J., Cota, E., Fowler, S.B., and Hamill, S.J. (1999). Folding studies of immunoglobulin-like β-sandwich proteins suggest that they share a common folding pathway. Structure 7, 1145–1153.
36. Attwood, T.K., Bradley, P., Flower, D.R., Gaulton, A., Maudling, N., Mitchell, A.L., Moulton, G., Nordle, A., Paine, K., Taylor, P., et al. (2003). PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res. 31, 400–402.
37. Cock, J.M., Sterck, L., Rouzé, P., Scornet, D., Allen, A.E., Amoutzias, G., Anthouard, V., Artiguenave, F., Aury, J.-M., Badger, J.H., et al. (2010). The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature 465, 617.
38. Brown, M.W., Heiss, A.A., Kamikawa, R., Inagaki, Y., Yabuki, A., Tice, A.K., Shiratori, T., Ishida, K.-I., Hashimoto, T., Simpson, A.G.B., et al. (2018). Phylogenomics Places Orphan Protistan Lineages in a Novel Eukaryotic Super-Group. Genome Biol. Evol. 10, 427–433.
39. Richter, D., Fozouni, P., Eisen, M., and King, N. (2017). The ancestral animal genetic toolkit revealed by diverse choanoflagellate transcriptomes. bioRxiv. Available at: http://biorxiv.org/content/early/2017/12/26/211789.abstract.
40. Cornillon, S., Gebbie, L., Benghezal, M., Nair, P., Keller, S., Wehrle-Haller, B., Charette, S.J., Brückert, F., Letourneur, F., and Cosson, P. (2006). An adhesion molecule in free-living Dictyostelium amoebae with integrin β features. EMBO Rep. 7, 617–621.
41. Suga, H., Chen, Z., de Mendoza, A., Sebé-Pedrós, A., Brown, M.W., Kramer, E., Carr, M., Kerner, P., Vervoort, M., Sánchez-Pons, N., et al. (2013). The Capsaspora genome reveals a complex unicellular prehistory of animals. 4, 2325.
42. Brunet, T., and King, N. (2017). The Origin of Animal Multicellularity and Cell Differentiation. Dev. Cell 43, 124–140.
43. King, N., Westbrook, M.J., Young, S.L., Kuo, A., Abedin, M., Chapman, J., Fairclough, S., Hellsten, U., Isogai, Y., and Letunic, I. (2008). The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451, 783–788.
44. Fairclough, S.R., Chen, Z., Kramer, E., Zeng, Q., Young, S., Robertson, H.M., Begovic, E., Richter, D.J., Russ, C., and Westbrook, M.J. (2013). Premetazoan genome evolution and the regulation of cell differentiation in the choanoflagellate Salpingoeca rosetta. Genome Biol. 14, R15.


45. Pulido, D., Hussain, S.-A., and Hohenester, E. (2017). Crystal Structure of the Heterotrimeric Integrin-Binding Region of Laminin-111. Struct. Lond. Engl. 1993 25, 530–535.
46. Calzada, M.J., Annis, D.S., Zeng, B., Marcinkiewicz, C., Banas, B., Lawler, J., Mosher, D.F., and Roberts, D.D. (2004). Identification of novel β1 integrin binding sites in the type 1 and type 2 repeats of thrombospondin-1. J. Biol. Chem. 279, 41734–41743.
47. Hynes, R.O. (2012). The evolution of metazoan extracellular matrix. J. Cell Biol. 196, 671.
48. Babushok, D.V., Ohshima, K., Ostertag, E.M., Chen, X., Wang, Y., Mandal, P.K., Okada, N., Abrams, C.S., and Kazazian, H.H., Jr (2007). A novel testis ubiquitin-binding protein gene arose by exon shuffling in hominoids. Genome Res. 17, 1129–1138.
49. Adams, J.C. (2013). Extracellular Matrix Evolution: An Overview. In Evolution of Extracellular Matrix, F. W. Keeley and R. P. Mecham, eds. (Berlin, Heidelberg: Springer Berlin Heidelberg), pp. 1–25. Available at: https://doi.org/10.1007/978-3-642-36002-2_1.
50. Lodish, H., Darnell, J.E., Berk, A., Kaiser, C.A., Krieger, M., Scott, M.P., Bretscher, A., Ploegh, H., and Matsudaira, P. (2008). Molecular cell biology (Macmillan).
51. Creighton, T.E. (1993). Proteins: structures and molecular properties (Macmillan).
52. Branden, C.I., and Tooze, J. (2012). Introduction to protein structure (Garland Science).
53. Jacob, J., Duclohier, H., and Cafiso, D.S. (1999). The Role of Proline and Glycine in Determining the Backbone Flexibility of a Channel-Forming Peptide. Biophys. J. 76, 1367–1376.
54. Li, R., Mitra, N., Gratkowski, H., Vilaire, G., Litvinov, R., Nagasami, C., Weisel, J.W., Lear, J.D., DeGrado, W.F., and Bennett, J.S. (2003). Activation of integrin αIIbβ3 by modulation of transmembrane helix associations. Science 300, 795–798.
55. Schneider, D., and Engelman, D.M. (2004). Involvement of transmembrane domain interactions in signal transduction by α/β integrins. J. Biol. Chem. 279, 9840–9846.
56. Clarke, M., Lohan, A.J., Liu, B., Lagkouvardos, I., Roy, S., Zafar, N., Bertelli, C., Schilde, C., Kianianmomeni, A., and Burglin, T.R. (2013). Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol. 14.
57. Tice, A.K., Shadwick, L.L., Fiore-Donno, A.M., Geisen, S., Kang, S., Schuler, G.A., Spiegel, F.W., Wilkinson, K.A., Bonkowski, M., Dumack, K., et al. (2016). Expansion of the molecular and morphological diversity of Acanthamoebidae (Centramoebida, Amoebozoa) and identification of a novel life cycle type within the group. Biol. Direct 11, 69.
58. Urushihara, H., Kuwayama, H., Fukuhara, K., Itoh, T., Kagoshima, H., Shin-I, T., Toyoda, A., Ohishi, K., Taniguchi, T., Noguchi, H., et al. (2015). Comparative genome and transcriptome analyses of the social amoeba Acytostelium subglobosum that accomplishes multicellular development without germ-soma differentiation. BMC Genomics 16, 80–80.
59. Lahr, D.J.G., Kosakyan, A., Lara, E., Mitchell, E.A.D., Morais, L., Porfirio-Sousa, A.L., Ribeiro, G.M., Tice, A.K., Pánek, T., Kang, S., et al. (2019). Phylogenomics and Morphological Reconstruction of Arcellinida Testate Amoebae Highlight Diversity of Microbial Eukaryotes in the Neoproterozoic. Curr. Biol. 29, 991-1001.e3.
60. Tekle, Y.I., Anderson, O.R., Katz, L.A., Maurer-Alcalá, X.X., Romero, M.A.C., and Molestina, R. (2016). Phylogenomics of ‘Discosea’: A new molecular phylogenetic perspective on Amoebozoa with flat body forms. Mol. Phylogenet. Evol. 99, 144–154.
61. Heidel, A.J., and Glöckner, G. (2008). Mitochondrial Genome Evolution in the Social Amoebae. Mol. Biol. Evol. 25, 1440–1450.
62. Eichinger, L., Pachebat, J.A., Glockner, G., Rajandream, M.A., Sucgang, R., Berriman, M., Song, J., Olsen, R., Szafranski, K., Xu, Q., et al. (2005). The genome of the social amoeba Dictyostelium discoideum. Nature 435, 43–57.


63. Gonzales, C.M., Spencer, T.D., Pendley, S.S., and Welker, D.L. (1999). Dgp1 and Dfp1 Are Closely Related Plasmids in the Dictyostelium Ddp2 Plasmid Family. Plasmid 41, 89–96.
64. Sucgang, R., Kuo, A., Tian, X., Salerno, W., Parikh, A., Feasley, C.L., Dalin, E., Tu, H., Huang, E., Barry, K., et al. (2011). Comparative genomics of the social amoebae Dictyostelium discoideum and Dictyostelium purpureum. Genome Biol. 12, R20–R20.
65. Loftus, B., Anderson, I., Davies, R., Alsmark, U.C.M., Samuelson, J., Amedeo, P., Roncaglia, P., Berriman, M., Hirt, R.P., Mann, B.J., et al. (2005). The genome of the protist parasite Entamoeba histolytica. Nature 433, 865.
66. Ehrenkaufer, G.M., Weedall, G.D., Williams, D., Lorenzi, H.A., Caler, E., Hall, N., and Singh, U. (2013). The genome and transcriptome of the enteric parasite Entamoeba invadens, a model for encystation. Genome Biol. 14, R77.
67. Tachibana, H., Yanagi, T., Pandey, K., Cheng, X.-J., Kobayashi, S., Sherchand, J.B., and Kanbara, H. (2007). An Entamoeba sp. strain isolated from rhesus monkey is virulent but genetically different from Entamoeba histolytica. Mol. Biochem. Parasitol. 153, 107–114.
68. Schaap, P., Barrantes, I., Minx, P., Sasaki, N., Anderson, R.W., Bénard, M., Biggar, K.K., Buchler, N.E., Bundschuh, R., Chen, X., et al. (2015). The Physarum polycephalum Genome Reveals Extensive Use of Prokaryotic Two-Component and Metazoan-Type Tyrosine Kinase Signaling. Genome Biol. Evol. 8, 109–125.
69. Heidel, A.J., Lawal, H.M., Felder, M., Schilde, C., Helps, N.R., Tunggal, B., Rivero, F., John, U., Schleicher, M., and Eichinger, L. (2011). Phylogeny-wide analysis of social amoeba genomes highlights ancient origins for complex intercellular communication. Genome Res.
70. Cavalier-Smith, T., Fiore-Donno, A.M., Chao, E., Kudryavtsev, A., Berney, C., Snell, E.A., and Lewis, R. (2015). Multigene phylogeny resolves deep branching of Amoebozoa. Mol. Phylogenet. Evol. 83, 293–304.
71. Li, W., and Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659.
72. Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., and Amit, I. (2011). Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 29.
73. Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D., Bowden, J., Couger, M.B., Eccles, D., Li, B., Lieber, M., et al. (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512.
74. Croning, M.D.R., Apweiler, R., and Möller, S. (2001). Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics 17, 646–653.
75. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9.
76. Buchfink, B., Xie, C., and Huson, D.H. (2014). Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59.
77. Fischer, S., Brunk, B.P., Chen, F., Gao, X., Harb, O.S., Iodice, J.B., Shanmugam, D., Roos, D.S., and Stoeckert, C.J. (2002). Using OrthoMCL to Assign Proteins to OrthoMCL-DB Groups or to Cluster Proteomes Into New Ortholog Groups. In Current Protocols in Bioinformatics (John Wiley & Sons, Inc.).
78. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30.


79. Katoh, K., and Standley, D.M. (2013). MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 30.
80. Criscuolo, A., and Gribaldo, S. (2010). BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10.
81. Nguyen, L.T., Schmidt, H.A., Haeseler, A., and Minh, B.Q. (2014). IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol. Biol. Evol. 32.
82. Li, B., and Dewey, C.N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12.
83. Almagro Armenteros, J.J., Tsirigos, K.D., Sønderby, C.K., Petersen, T.N., Winther, O., Brunak, S., von Heijne, G., and Nielsen, H. (2019). SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 37, 420–423.
84. MacManes, M.D. (2017). The Oyster River Protocol: A Multi Assembler and Kmer Approach For de novo Transcriptome Assembly. bioRxiv. Available at: http://biorxiv.org/content/early/2017/08/16/177253.abstract.
85. Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., and Noble, W.S. (2009). MEME Suite: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208.
86. Nielsen, H., Almagro Armenteros, J.J., Sønderby, C.K., Sønderby, S.K., and Winther, O. (2017). DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics 33, 3387–3395.
87. Huerta-Cepas, J., Serra, F., and Bork, P. (2016). ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol. Biol. Evol. 33, 1635–1638.
88. El-Gebali, S., Mistry, J., Bateman, A., Eddy, S.R., Luciani, A., Potter, S.C., Qureshi, M., Richardson, L.J., Salazar, G.A., Smart, A., et al. (2018). The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432.
89. Van Dongen, S.M. (2000). Graph clustering by flow simulation.
90. Blandenier, Q., Seppey, C.V.W., Singer, D., Vlimant, M., Simon, A., Duckert, C., and Lara, E. (2017). Mycamoeba gemmipara nov. gen., nov. sp., the First Cultured Member of the Environmental Dermamoebidae Clade LKM74 and its Unusual Life Cycle. J. Eukaryot. Microbiol. 64, 257–265.
91. Picelli, S., Faridani, O.R., Bjorklund, A.K., Winberg, G., Sagasser, S., and Sandberg, R. (2014). Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9.
92. Onsbring, H., Tice, A.K., Barton, B.T., Brown, M.W., and Ettema, T.J.G. (2019). An efficient single-cell transcriptomics workflow to assess protist diversity and lifestyle. bioRxiv, 782235.
93. Boratyn, G.M., Schäffer, A.A., Agarwala, R., Altschul, S.F., Lipman, D.J., and Madden, T.L. (2012). Domain enhanced lookup time accelerated BLAST. Biol. Direct 7, 12–12.
94. Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L.L. (2001). Predicting transmembrane protein topology with a hidden markov model: application to complete genomes. J. Mol. Biol. 305, 567–580.
95. Fischer, S., Brunk, B.P., Chen, F., Gao, X., Harb, O.S., Iodice, J.B., Shanmugam, D., Roos, D.S., and Stoeckert, C.J., Jr (2011). Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Curr. Protoc. Bioinforma. Chapter 6, Unit-6.12.19.


96. Xie, Y., Wu, G., Tang, J., Luo, R., Patterson, J., and Liu, S. (2014). SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30.

METHODS

900

Method Details

We strategically sampled transcriptomes and genomes from 118 Amorphea species to cover all known Amorphea subgroups. We also searched for IMAC proteins across the eukaryotic data available in NCBI.

Culturing and Isolation of Mycamoeba gemmipara

Mycamoeba gemmipara was obtained from Q. Blandenier and cultured as in Blandenier et al. 2017 [90].

Ultra-low input/Single-Cell cDNA library construction based on Smart-Seq2

Cultured Mycamoeba gemmipara amoebae were subjected to Smart-Seq2 [91] for cDNA library construction as in Onsbring et al. 2019 [92]. About 50 cells were scraped off an agar plate using a 30-gauge platinum wire and placed directly into a 200 µL thin-walled PCR tube containing the cell lysis mixture [91]. Each reaction was then subjected to six freeze-thaw cycles, alternating between -80 °C isopropanol and ~25 °C deionized H2O, before following the remainder of the protocol. A modified version of Smart-Seq2 [91] was used for RNA isolation and dscDNA preparation. cDNA libraries were prepared using a Nextera XT DNA Library Prep Kit (Illumina, CA) following the manufacturer's protocol with dual index primers. The Mycamoeba gemmipara library was sequenced using an Illumina HiSeq 4000 at Genome Quebec.

Transcriptomic Sequencing Assembly

Raw sequence data from the Illumina sequencing platform or from BioProject accession number PRJNA380424 [14] were processed with TRIMMOMATIC V 0.35 [78] to trim poorly called bases and remove adaptor sequences from the paired-end reads. The quality-filtered reads were assembled de novo with TRINITY [73]. Nucleotide sequences were translated with TRANSDECODER V 5.5.0 (https://github.com/TransDecoder/TransDecoder/), which also predicts open reading frames (ORFs) and PFAM domains [88]. Transcriptome contigs were searched against the NCBI non-redundant database using the BLASTX program of the BLAST+ suite [93]. Each predicted ORF was assigned an ortholog number through the OrthoMCL v5 [77] database. For taxa with clearly truncated predicted proteins, the computationally laborious transcriptome assembly pipeline, the Oyster River Protocol [84], was run in an attempt to retrieve full-length predicted integrin proteins.

Integrin Protein analysis

We selected IMAC proteins of a model organism from NCBI: ITGA5:P08648, ITGB1:NP_002202, Talin:AAF27330, Parvin:AAH16713, PINCH:NP_060450.2, vinculin:AAH39174, FAK:AAA35819, Paxillin:AAC50104, ILK:NP_001014794, α-actinin:AAC17470, Filamin:AAF72339, and Tensin:AAG33700. INTERPROSCAN 5.27-66.0 [23] was used to determine the domain architecture of these IMAC proteins, along with SIGNALP v 5.0 [83] and TMHMM v 2.0 [94]. MEME-SUITE 5.0.4 [85] was used to examine integrin motifs. DEEPLOC v 1.0 [86] was used to predict the subcellular localization of integrin proteins. We set the minimum criteria for canonical IMAC proteins based on their architecture and motifs. We used OrthoMCL to assign the IMAC proteins their own ortholog numbers.

Since the OrthoMCL database is heavily biased toward metazoans, we created an ortholog database with the eukaryotes listed in the Key resources table. A modified version of OrthoMCL-DB [95] was used to create a novel ortholog database from the transcriptomes and genomes listed above together with the whole of OrthoMCL-DB v5. To do this, an all-against-all search was conducted with DIAMOND-BLASTP [76], using each protein from these data as a query. DIAMOND-BLASTP collected up to 1,000 hits, with an e-value of 1e-5 as the cutoff for putative homology. The BLAST results were clustered using the OrthoMCL pipeline methodology [95].

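The hit-collection rule described above (an e-value cutoff of 1e-5 and at most 1,000 hits) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the search results are in the 12-column tab-separated BLAST tabular layout and that the 1,000-hit cap applies per query. The function name `filter_hits` and the example protein names are hypothetical.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1e-5  # cutoff for putative homology, as stated in the text
MAX_HITS = 1000       # maximum hits kept (assumed here to be per query)


def filter_hits(lines, evalue_cutoff=EVALUE_CUTOFF, max_hits=MAX_HITS):
    """Keep at most max_hits hits per query with e-value <= evalue_cutoff.

    Assumed column order (BLAST tabular layout):
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    kept = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= evalue_cutoff and len(kept[query]) < max_hits:
            kept[query].append((subject, evalue))
    return dict(kept)


# Hypothetical example records (not real data).
example = [
    "protA\tprotB\t45.0\t300\t150\t5\t1\t300\t1\t300\t1e-50\t200",
    "protA\tprotC\t25.0\t100\t70\t3\t1\t100\t1\t100\t0.01\t30",
    "protB\tprotA\t45.0\t300\t150\t5\t1\t300\t1\t300\t1e-50\t200",
]
hits = filter_hits(example)
for query, subjects in hits.items():
    print(query, subjects)
```

The weak `protA`/`protC` pair is dropped because its e-value exceeds the cutoff; the surviving pairs are what a clustering step would receive as input.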

BLASTP was used to search the new ortholog database for novel amoebozoan IMAC proteins, using the previously identified amoebozoan ortholog proteins as queries. A modified custom pipeline was used to collect novel IMAC orthologs. These methods generated various orthologs with similar protein architectures. We created a FASTA file of all novel eukaryote integrin proteins based on their protein architecture. These integrin proteins were clustered with CD-HIT [71] at a global sequence identity threshold of 0.95. We then ran an all-vs-all search on the CD-HIT output with DIAMOND [76], using a database size of 50 million sequences. A custom script implementing the Markov cluster (MCL) algorithm [89] was then used to cluster the DIAMOND output. A custom PYTHON script was used to eliminate any MCL-clustered integrin proteins shorter than 500 AA. We used a custom PYTHON pipeline to collect ortholog sequences that clustered with the model metazoan IMAC orthologs listed above.

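The 500 AA length filter mentioned above could look roughly like the sketch below. The authors' custom script is not available here, so this is only an assumed reconstruction of that one step: it reads FASTA-formatted text and keeps sequences of at least 500 residues. The function names and the example records are hypothetical, and ignoring a trailing `*` stop symbol is our own choice.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=500):
    """Keep records with at least min_length residues.

    A trailing '*' (stop symbol written by some ORF predictors)
    is not counted toward the length.
    """
    return [(h, s) for h, s in records if len(s.rstrip("*")) >= min_length]


# Hypothetical example: one long and one short candidate.
fasta_text = (
    ">long_candidate\n" + "M" * 600 + "\n"
    ">short_candidate\n" + "M" * 120 + "*\n"
)
kept = filter_by_length(parse_fasta(fasta_text))
print([header for header, _ in kept])
```

A fixed length threshold is a blunt instrument: it removes obvious fragments but can also discard genuinely truncated transcripts, which is the limitation the authors discuss later in the Methods.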

In some cases, integrin proteins were not captured even after all of these steps. Therefore, the PFAM IDs of integrin domains were used to retrieve any putative integrins from the TRANSDECODER output. For ITGA, we used FG-GAP (PF01839), and for ITGB we used PSI_integrin (PF17205), integrin_beta (PF00362), integrin_b_cyt (PF08725), and integrin B tail (PF07965). We disregarded any protein sequence whose PFAM hit was weaker than an e-value of 1e-10. The identity of the putative integrins was confirmed with INTERPROSCAN, SIGNALP, TMHMM, DEEPLOC, and MEME-SUITE.

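As an illustration of the PFAM-based rescue step, the sketch below assigns candidates to ITGA or ITGB from their domain hits. It assumes the domain hits have already been reduced to simple (protein ID, PFAM accession, e-value) tuples, which is a simplified stand-in for the real annotation output, and it reads the 1e-10 threshold as "discard hits with a larger e-value". The PFAM accessions come from the text; the function name, protein IDs, accession version suffixes, and e-values in the example are hypothetical.

```python
# PFAM accessions named in the text.
ITGA_PFAM = {"PF01839"}  # FG-GAP
ITGB_PFAM = {
    "PF17205",  # PSI_integrin
    "PF00362",  # integrin_beta
    "PF08725",  # integrin_b_cyt
    "PF07965",  # integrin B tail
}
EVALUE_CUTOFF = 1e-10


def classify_candidates(domain_hits, evalue_cutoff=EVALUE_CUTOFF):
    """Map protein IDs to the integrin classes supported by their hits.

    domain_hits: iterable of (protein_id, pfam_accession, evalue).
    Hits with an e-value above the cutoff are ignored.
    """
    candidates = {}
    for protein_id, accession, evalue in domain_hits:
        if evalue > evalue_cutoff:
            continue
        base = accession.split(".")[0]  # drop a version suffix if present
        if base in ITGA_PFAM:
            candidates.setdefault(protein_id, set()).add("ITGA")
        elif base in ITGB_PFAM:
            candidates.setdefault(protein_id, set()).add("ITGB")
    return candidates


# Hypothetical hits (IDs, versions, and e-values are made up).
example_hits = [
    ("t1", "PF01839.23", 1e-15),  # FG-GAP, passes
    ("t2", "PF00362.18", 3e-40),  # integrin_beta, passes
    ("t2", "PF08725.11", 1e-12),  # integrin_b_cyt, passes
    ("t3", "PF01839.23", 1e-4),   # FG-GAP, too weak
    ("t4", "PF00069.25", 1e-80),  # unrelated domain
]
candidates = classify_candidates(example_hits)
print(candidates)
```

Candidates found this way would still need the confirmation steps listed in the text (domain architecture, signal peptide, transmembrane region, localization, and motifs) before being accepted.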

For each putative integrin protein, we examined protein motifs with MEME-SUITE and always searched our proteins of interest with BLASTP against NCBI’s GenBank NR database to check for possible contamination. For integrin proteins with novel domains, the read depth of the transcripts was examined with RSEM. The final amoebozoan integrin architectures were compared with those of metazoan integrin proteins. A custom PYTHON script was used to create a binary (1,0) presence/absence matrix of IMAC proteins across Amoebozoa. A custom R script was used to create a heatmap.

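A binary presence/absence matrix of the kind described above can be built in a few lines. The sketch below is an assumed minimal version, not the authors' script: the protein names follow the IMAC proteins listed in the Methods, while the function names, the taxon names (`taxon_A`, `taxon_B`), and their detections are placeholders rather than results.

```python
IMAC_PROTEINS = [
    "ITGA", "ITGB", "Talin", "Parvin", "PINCH", "Vinculin",
    "FAK", "Paxillin", "ILK", "alpha-actinin", "Filamin", "Tensin",
]


def presence_absence_matrix(detected, proteins=IMAC_PROTEINS):
    """Return rows of [taxon, 0/1, 0/1, ...], one column per protein.

    detected: dict mapping taxon name -> set of detected protein names.
    Rows are ordered alphabetically by taxon name.
    """
    rows = []
    for taxon in sorted(detected):
        found = detected[taxon]
        rows.append([taxon] + [1 if p in found else 0 for p in proteins])
    return rows


def to_tsv(rows, proteins=IMAC_PROTEINS):
    """Serialize the matrix as tab-separated text with a header line."""
    header = "\t".join(["Taxon"] + list(proteins))
    body = ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join([header] + body)


# Placeholder input (not real results).
detected = {
    "taxon_B": {"Talin"},
    "taxon_A": {"ITGA", "ITGB", "Talin"},
}
rows = presence_absence_matrix(detected)
tsv = to_tsv(rows)
print(tsv)
```

Writing the matrix as tab-separated text keeps it easy to load into a plotting environment such as R for the heatmap step.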

In our current study, these data are transcriptomic in nature, and some of the predicted transcripts may therefore be truncated or missing. Identifying full-length transcripts from short Illumina reads can be difficult because assembled transcripts can be fragmented or incomplete due to alternative splice sites and untranslated regions [96]. It is therefore possible that some transcripts are not fully assembled and that their corresponding predicted proteins are truncated, so the transmembrane region and signal peptide region may be missing. The same can be said for the absence of these proteins from our data: because transcriptomes contain only genes being expressed at the time mRNA was harvested, some genes within an organism’s genome may be missed. Nonetheless, our methodology permitted the discovery of these proteins across a whole supergroup, highlighting the utility of the method.

Manually assembled contigs in SEQUENCHER

Where necessary due to fragmentation, contigs from our automated de novo transcriptome assemblies (described above) for each gene of interest were searched with BLASTN against the raw nucleotide data and the assembly for each transcriptome. Hits were collected and assembled using SEQUENCHER v 5.4.6 (GeneCodes, Madison, WI, USA). The taxa requiring this approach were Amphizonella sp. 2, Centropyxis aerophyla, Gocevia fonbrunei, Hyalosphenia papilio, and Pellita catalonica for ITGA, and Nebela sp. for both integrin proteins.

Presence of integrin in the Mastigamoeba genome

To confirm the presence of these genes in the genome, we blasted the integrin transcripts of M. balamuthi against the whole-genome shotgun data (CBKX00000000; https://www.ncbi.nlm.nih.gov/nuccore/CBKX000000000.1) on NCBI [33]. Common splice sites were sought at the intron-exon boundaries of the integrin genes by assembling the integrin genomic contig and integrin transcript in SEQUENCHER v 5.4.6. The genomic contigs that contained an integrin gene were searched for additional ORFs to examine the phylogenetic affinity of other ORFs on those contigs. We inferred maximum likelihood phylogenetic trees of tenascin (adjacent to the ITGB gene on CBKX010020318.1) and PA14 (adjacent to the ITGB gene on CBKX010020319.1) under the same conditions as the integrin phylogenetic trees (see below).

Phylogenetic trees

To examine the evolutionary relationships of ITGA and ITGB within Amorphea, protein sequences were aligned with MAFFT-LINSI using the parameters “--maxiterate 1000” and “--localpair”. Ambiguous sites were trimmed from the alignments using BMGE [80] with a gap setting of 0.8. Maximum likelihood (ML) trees were inferred from these trimmed alignments in IQTREE v 1.5.5 [81] under the LG model with the C60 series model of site heterogeneity. Each tree was ML bootstrapped (MLBS) with 1,000 pseudoreplicates. We then used a custom PYTHON script implementing ETE-TOOLKIT (www.etetoolkit.org) to map protein domain architecture (IPRSCAN domain IDs) onto the resulting tree.

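To make the term "pseudoreplicate" concrete, the sketch below shows the resampling idea behind the nonparametric bootstrap: each pseudoreplicate is a new alignment of the same length, built by drawing alignment columns with replacement. This is only a conceptual illustration. In practice the tree-inference software performs this step internally, and the text does not state which bootstrap variant was run. The function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence.
    All sequences must have the same length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # Draw as many column indices as there are sites, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def pseudoreplicates(alignment, n_replicates=1000, seed=0):
    """Generate n_replicates bootstrap pseudoreplicate alignments."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment (made-up sequences).
toy = {
    "seq1": "MKV-LA",
    "seq2": "MRVGLA",
    "seq3": "MKIGLS",
}
reps = pseudoreplicates(toy, n_replicates=1000)
print(len(reps), reps[0])
```

A tree is inferred from each pseudoreplicate, and the support for a branch is the fraction of those trees in which the branch appears.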

Data and software availability

The accession numbers for the Mycamoeba gemmipara and Mastigamoeba balamuthi transcriptome raw reads are NCBI-SRA: XXXXX and XXXXX, as also shown in the Key resources table. All protein and nucleotide sequences of each gene, all protein alignments (trimmed and untrimmed) used for phylogenetic trees, and the phylogenetic trees are available in the DRYAD repository (https://datadryad.org/ | Accession: XXXXX).

Supplemental Text

SUPPLEMENTAL METHODS, NOTES, AND RESULTS

Key resources Table


Resources Sources Identifier

Data Examined

Acanthamoeba castellanii [56] https://doi.org/10.1186/gb-2013-14-2-r11

Acanthamoeba pyroformis [57] https://doi.org/10.1186/s13062-016-0171-0

Acanthamoeba astronyxis PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687/

Acanthamoeba comandoni PRJNA354221 https://www.ncbi.nlm.nih.gov/bioproject/PRJNA354221

Acanthamoeba culbertsoni PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687

Acanthamoeba divionensis PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687

Acanthamoeba healyi PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687

Acanthamoeba lenticulata PRJNA374454 https://www.ncbi.nlm.nih.gov/bioproject/PRJNA374454

Acanthamoeba lugdunensis PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687

Acanthamoeba mauritaniensis PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687

Acanthamoeba pearcei PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687

Acanthamoeba polyphaga PRJNA307312 https://www.ncbi.nlm.nih.gov/bioproject/PRJNA307312

Acanthamoeba quina PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687

Acanthamoeba rhysodes PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687

Acanthamoeba royreba PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687

Acytostelium ellipticum PRJEB14640 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640

Acytostelium leptosomum PRJEB14640 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640

Acytostelium subglobosum [58] https://doi.org/10.1186/s12864-015-1278-x

Amoeba proteus [14] https://doi.org/10.1093/molbev/msx162

Amphizonella sp. 2 [59] https://doi.org/10.1016/j.cub.2019.01.078

Amphizonella sp. 7 [59] https://doi.org/10.1016/j.cub.2019.01.078

Amphizonella sp. 8 [59] https://doi.org/10.1016/j.cub.2019.01.078

Amphizonella sp. 9 [59] https://doi.org/10.1016/j.cub.2019.01.078

Amphizonella sp.10 [59] https://doi.org/10.1093/molbev/msx162

Arcella gibbosa [14] https://doi.org/10.1093/molbev/msx162

Arcella vulgaris [59] https://doi.org/10.1016/j.cub.2019.01.078


Armaparvus languidus MMETSP0420 https://www.imicrobe.us/project/view/304

Cavenderia deminutiva PRJEB14640 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640

Cavostelium apophysatum [14] https://doi.org/10.1093/molbev/msx162

Centropyxis aerophyla [59] https://doi.org/10.1016/j.cub.2019.01.078

Centropyxis sp. [59] https://doi.org/10.1016/j.cub.2019.01.078

Ceratiomyxa fruticulosa [14] https://doi.org/10.1093/molbev/msx162

Ceratiomyxella tahitiensis [14] https://doi.org/10.1093/molbev/msx162

Clastostelium recurvatum [14] https://doi.org/10.1093/molbev/msx162

Clydonella sp. [60] https://doi.org/10.1016/j.ympev.2016.03.029

Cochliopodium minus [14] https://doi.org/10.1093/molbev/msx162

Copromyxa protea [14] https://doi.org/10.1093/molbev/msx162

Coremiostelium polycephalum PRJEB14640 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640

Cryptodifflugia operculata [14] https://doi.org/10.1093/molbev/msx162

Cunea sp. [JDS-Ruffled] [14] https://doi.org/10.1093/molbev/msx162

Cunea sp. [MMETSP0417] MMETSP0417 https://data.iplantcollaborative.org/dav/iplant/commons/community_released/imicrobe/projects/104/samples/1830/

Cyclopyxis sp. [59] https://doi.org/10.1016/j.cub.2019.01.078

Dermamoeba algensis [14] https://doi.org/10.1093/molbev/msx162

Dictyostelium citrinum [61] https://doi.org/10.1093/molbev/msn088

Dictyostelium discoideum [62] https://doi.org/10.1038/nature03481

Dictyostelium firmibasis [63] https://doi.org/10.1006/plas.1998.1385

Dictyostelium intermedium PRJNA45879 https://www.ncbi.nlm.nih.gov/bioproject/PRJNA45879

Dictyostelium purpureum [64] https://doi.org/10.1006/10.1186/gb-2011-12-2-r20

Difflugia bryophila [59] https://doi.org/10.1016/j.cub.2019.01.078

Difflugia sp. [14] https://doi.org/10.1093/molbev/msx162

Difflugia sp. D2 [59] https://doi.org/10.1016/j.cub.2019.01.078

Diplochlamys sp. [14] https://doi.org/10.1093/molbev/msx162

Dracoamoeba jomungandrii [57] https://doi.org/10.1186/s13062-016-0171-0

Echinamoeba exudans [14] https://doi.org/10.1093/molbev/msx162

Echinosteliopsis oligospora [14] https://doi.org/10.1093/molbev/msx162

Echinostelium bisporum [14] https://doi.org/10.1093/molbev/msx162

Echinostelium minutum [14] https://doi.org/10.1093/molbev/msx162

Endostelium zonatum [PRA-191] [57] https://doi.org/10.1186/s13062-016-0171-0

Endostelium zonatum LINKS [14] https://doi.org/10.1093/molbev/msx162

Entamoeba dispar PRJNA12916 https://www.ncbi.nlm.nih.gov/bioproject/12916


Entamoeba histolytica [65] https://doi.org/10.1038/nature03291

Entamoeba invadens [66] https://doi.org/10.1186/gb-2013-14-7-r77

Entamoeba moshkovskii PRJNA12921 https://www.ncbi.nlm.nih.gov/bioproject/12921

Entamoeba nuttalli [67] https://doi.org/10.1016/j.molbiopara.2007.02.006

Filamoeba nolandi MMETSP0413 https://www.imicrobe.us/assembly/view/293

Flabellula citata [14] https://doi.org/10.1093/molbev/msx162

Flamella aegyptia [14] https://doi.org/10.1093/molbev/msx162

Gocevia fonbrunei [14] https://doi.org/10.1093/molbev/msx162

Gocevia fonbrunei_Tekle_2016 [60] https://doi.org/10.1016/j.ympev.2016.03.029

Grellamoeba robusta [14] https://doi.org/10.1093/molbev/msx162

Heleopera sphagni [59] https://doi.org/10.1016/j.cub.2019.01.078

Heleopera sylvatica [59] https://doi.org/10.1016/j.cub.2019.01.078

Heterostelium multicystogenum PRJNA495730 https://www.ncbi.nlm.nih.gov/bioproject/PRJNA495730

Heterostelium pallidum [61] http://dictybase.org/

Hyalosphenia elegans [59] https://doi.org/10.1016/j.cub.2019.01.078

Hyalosphenia papilio [59] https://doi.org/10.1016/j.cub.2019.01.078

Idionectes vortex PRJNA531640 https://www.ncbi.nlm.nih.gov/bioproject/?term=Idionectes%20vortex

Lesquereusia sp. [59] https://doi.org/10.1016/j.cub.2019.01.078

Lingulamoeba sp. [14] https://doi.org/10.1093/molbev/msx162

Luapelamoeba arachisporum [57] https://doi.org/10.1186/s13062-016-0171-0

Luapelamoeba hula [57] https://doi.org/10.1186/s13062-016-0171-0

Mastigamoeba abducta [14] https://doi.org/10.1093/molbev/msx162

Mastigamoeba balamuthi (genome) [33] http://doi.org/10.1073/pnas.1219590110

Mastigamoeba balamuthi (transcriptome) SRA TBD To be provided

Mastigella eilhardi [14] https://doi.org/10.1093/molbev/msx162

Mayorella cantabrigiensis [14] https://doi.org/10.1093/molbev/msx162

Micriamoeba sp. [14] https://doi.org/10.1093/molbev/msx162

Microchlamys sp. [59] https://doi.org/10.1016/j.cub.2019.01.078

Mycamoeba gemmipara SRA TBD https://doi.org/10.1111/jeu.1235

Nebela sp. [59] https://doi.org/10.1016/j.cub.2019.01.078

Nematostelium gracile [14] https://doi.org/10.1093/molbev/msx162

Netzelia oviformis [59] https://doi.org/10.1016/j.cub.2019.01.078

Nolandella sp. [14] https://doi.org/10.1093/molbev/msx162

Ovalopodium desertum [14] https://doi.org/10.1093/molbev/msx162

Paradermamoeba levis [14] https://doi.org/10.1093/molbev/msx162


Paramoeba aestuarina MMETSP0161_2 https://www.imicrobe.us/assembly/view/205

Paramoeba atlantica MMETSP0151_2 https://www.imicrobe.us/assembly/view/210

Parvamoeba rugata [14] https://doi.org/10.1093/molbev/msx162

Parvamoeba monoura [32] https://doi.org/10.1016/j.ympev.2016.03.029

Pellita catalonica [14] https://doi.org/10.1093/molbev/msx162

Pelomyxa sp. [14] https://doi.org/10.1093/molbev/msx162

Phalansterium solitarium [14] https://doi.org/10.1093/molbev/msx162

Physarum polycephalum [68] https://dx.doi.org/10.1093/gbe/evv237

Pygsuia biforma [11] http://doi.org/10.1098/rspb.2013.1755

Planocarina carinata [59] https://doi.org/10.1016/j.cub.2019.01.078

Polysphondylium pallidum [69] http://dictybase.org/

Protacanthamoeba bohemica [14] https://doi.org/10.1093/molbev/msx162

Protosporangium articulatum [14] https://doi.org/10.1093/molbev/msx162

Protostelium nocturnum [14] https://doi.org/10.1093/molbev/msx162

Pyxidicula operculatum [59] https://doi.org/10.1016/j.cub.2019.01.078

Rhizamoeba saxonica [14] https://doi.org/10.1093/molbev/msx162

Rhizomastix elongata [14] https://doi.org/10.1093/molbev/msx162

Rhizomastix libera [14] https://doi.org/10.1093/molbev/msx162

Ripella sp. [14] https://doi.org/10.1093/molbev/msx162

Sapocribrum chincoteaguense MMETSP0437 https://www.imicrobe.us/assembly/view/314

Sappinia pedata [14] https://doi.org/10.1093/molbev/msx162

Schizoplasmodiopsis pseudoendospora [14] https://doi.org/10.1093/molbev/msx162

Schizoplasmodiopsis vulgaris [14] https://doi.org/10.1093/molbev/msx162

Speleostelium caveatum PRJNA495862 https://www.ncbi.nlm.nih.gov/bioproject/PRJNA495862

Soliformovum irregularis [14] https://doi.org/10.1093/molbev/msx162

Squamamoeba japonica [14] https://doi.org/10.1093/molbev/msx162

Stenamoeba limacina [14] https://doi.org/10.1093/molbev/msx162

Stenamoeba stenopodia [14] https://doi.org/10.1093/molbev/msx162

Stygamoeba regulata MMETSP0447 https://www.imicrobe.us/assembly/view/316

Synstelium polycarpum PRJEB14640 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640

Thecamoeba quadrilineata [60] https://doi.org/10.1016/j.ympev.2016.03.029

Thecamoeba sp. [14] https://doi.org/10.1093/molbev/msx162

Thecamoebidae isolate [RHP1-1] [14] https://doi.org/10.1093/molbev/msx162

Trichosphaerium sp. (ATCC 40318) MMETSP0405 https://www.imicrobe.us/assembly/view/300


Tychosporium acutostipes [14] https://doi.org/10.1093/molbev/msx162

Vannella fimicola [14] https://doi.org/10.1093/molbev/msx162

Vannella robusta MMETSP0166 https://www.imicrobe.us/?#/samples/2469

Vannella schaefferi MMETSP0417 https://www.imicrobe.us/assembly/view/295

Vannella sp. MMETSP0168 https://www.imicrobe.us/assembly/view/34

Vermamoeba sp. [14] https://doi.org/10.1093/molbev/msx162

Vermamoeba vermiformis [14] https://doi.org/10.1093/molbev/msx162

Vermistella antarctica [60] https://doi.org/10.1016/j.ympev.2016.03.029

Vexillifera bacillipedes [70] http://doi.org/10.1016/j.ympev.2014.08.011

Vexillifera sp. MMETSP0173 https://www.imicrobe.us/#/samples/1757

List of Stramenopile taxa from MMETSP

Alexandrium minutum MOORE-via-CAMERA-MMETSP0328
Aureoumbra_lagunensis CCMP1509 MMETSP0890
Bolidomonas_sp RCC1657 MMETSP1321
Bolidomonas_sp RCC2347 MMETSP1320
Bolidomonas pacifica MMETSP1319-RCC208
Cafeteria_sp CaronLabIsolate MMETSP1104
Chattonella_subsalsa CCMP2191 MMETSP0947
Chrysocystis_fragilis CCMP3189 MMETSP1165 ****
Dictyocha_speculum CCMP1381 MMETSP1174
Dinobryon_sp UTEXLB2267 MMETSP0019
Fibrocapsa japonica-CCMP1661 MOORE-via-CAMERA-MMETSP1339
Florenciella_sp RCC1587 MMETSP1324
Florenciella parvula MMETSP1323-RCC1693
Heterosigma_akashiwo CCMP2393 MMETSP0292
Love2098 non_described LOVEJOY CMP2098 MMETSP0990
Mallomonas_sp CCMP3275 MMETSP1167
Ochromonas_sp BG1 MMETSP1105
Paraphysomonas_bandaiensis CaronLabIsolate MMETSP1103
Paraphysomonas_vestita GFlagA MMETSP1107
Pelagomonas_calceolata CCMP1756 MMETSP0886
Pelagococcus_subviridis CCMP1429 MMETSP0882
Phaeomonas_parva CCMP2877 MMETSP1163
Pinguiococcus_pyrenoidosus CCMP2078 MMETSP1160
Pseudopedinella_elastica CCMP716 MMETSP1068
Rhizochromulina_marina cf CCMP1243 MMETSP1173
Spumella_elongata CCAP955 1 MMETSP1098
Thraustochytrium_sp LLF1b MMETSP0198
Vaucheria_litorea CCMP2940 MMETSP0945


Reagents

2-propanol (certified ACS)

Ethanol (pure) 200 proof

Sigma-Aldrich I9516-25ML

Critical Commercial Assay Kits

Nextera XT DNA Library Prep Kit Illumina FC-131-1096

Nextera XT index kit v2 set A Illumina FC-131-2001

Nextera XT index kit v2 set B Illumina FC-131-2002

Nextera XT index kit v2 set C Illumina FC-131-2003

Software and Algorithms

CD-HIT [71] https://github.com/weizhongli/cdhit

TRINITY V2.4.0 [72] https://github.com/trinityrnaseq/trinityrnaseq/

TRANSDECODER V 5.5.0 [73] https://transdecoder.github.io/

TMHMM V 2.0 [74] http://www.cbs.dtu.dk/services/TMHMM/

BOWTIE2 V 2.3.4.3 [75] https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.3.1

DIAMOND V 0.9.25 [76] https://github.com/bbuchfink/diamond

ORTHOMCL V 5.0 [77] http://orthomcl.org/orthomcl/

TRIMMOMATIC V 0.35 [78] https://github.com/timflutre/trimmomatic

MAFFT-LINSI V 7 [79] https://mafft.cbrc.jp/alignment/server/

BMGE V 1.12 [80] ftp://ftp.pasteur.fr/pub/GenSoft/projects/BMGE

SEQUENCHER v 5.4.6 http://www.genecodes.com/

BLAST 2.2.30+ https://blast.ncbi.nlm.nih.gov/

IQTREE v 1.5.5 [81] http://www.iqtree.org

RSEM [82] https://github.com/deweylab/RSEM

SIGNALP 5.0 [83] http://www.cbs.dtu.dk/services/SignalP/

OYSTER RIVER PROTOCOL V 2.1.1 [84] https://oyster-river-protocol.readthedocs.io/en/latest/

MEME-SUITE V 5.0.4 [85] http://meme-suite.org/index.html

DEEPLOC 1.0 [86] http://www.cbs.dtu.dk/services/DeepLoc/

INTERPROSCAN 5.27-66.0 [23] https://www.ebi.ac.uk/interpro/search/sequence-search

ETE3 [87] http://etetoolkit.org/documentation/ete-view/

PFAM [88] https://pfam.xfam.org/

MCL [89] https://micans.org/mcl/


Contact for Reagent and resources sharing

Further information and requests for resources should be directed to, and will be fulfilled by, the corresponding author Matthew W. Brown ([email protected]).

Sib (similar to integrin beta) proteins found in dictyostelids | Sib Proteins comparison with ITB-like

SibA, a cell adhesion molecule that, like ITB, binds to phagocytic particles, was discovered in Dictyostelium discoideum [45]. Sib proteins (A to E) are capable of binding to talin, and SibC regulates cell adhesion [46]. However, their protein structure differs from the canonical ITB in lacking three cation-binding motifs as well as a cysteine-rich region at the C-terminus [26]. Although the varioseans, described above, are phylogenetically close to dictyostelids, their type II ITB-like proteins, which have three cation motifs, are distinct from Sib proteins, though with great variation in amino acid sequence. Interestingly, we did not observe Sib proteins outside of the Dictyostelia clade, suggesting that they evolved recently within the dictyostelids. The protein architecture of SibA resembles that of metazoan ITB only in having a GXXXG motif in the transmembrane region, a motif that is also observed in type II ITB (see supplemental table 2). In addition, the LamG3 and vWD domains found in type II ITB have never been observed in Sib proteins.

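The GXXXG motif mentioned above is simple enough to scan for directly: a glycine, any three residues, and another glycine. The sketch below is a hypothetical helper, not the motif analysis used in the study. The function name, the optional region bounds, and the example sequence are all made up for illustration; a lookahead is used so that overlapping motifs are reported as well.

```python
import re

# G, any three residues, G. The lookahead lets overlapping motifs match.
GXXXG = re.compile(r"(?=(G[A-Z]{3}G))")


def find_gxxxg(sequence, start=0, end=None):
    """Return (position, motif) pairs for GXXXG motifs.

    Only sequence[start:end] is scanned, for example a predicted
    transmembrane region. Positions are 0-based and refer to the
    full sequence.
    """
    region = sequence[start:end].upper()
    return [(m.start() + start, m.group(1)) for m in GXXXG.finditer(region)]


# Made-up transmembrane-like segment (not a real protein).
segment = "LLIVGAALGVLLAIL"
print(find_gxxxg(segment))
```

Restricting the scan with `start` and `end` matters because a five-residue pattern like this will occur by chance in many long proteins; the motif is only informative within the transmembrane region.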


Supplemental Figure 1: Repertoire of IMAC proteins, including filamin and tensin, in Amoebozoa. Names of IMAC proteins are listed at the top and names of amoebozoan taxa are listed on the side of the map. Arrows indicate amoebozoan species that have a nearly complete set of IMAC proteins.

[Figure: presence/absence map. Column labels recovered from the figure: α-Integrin, β-Integrin, Vinculin, Paxillin, Talin, PINCH, FAK, ILK, α-Actinin, Filamin, Tensin. Rows: amoebozoan taxa grouped into Tubulinea, Evosea, and Discosea.]

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint

Page 51: 1,2 , Courtney W. Stairs , Daniel J. G. Lahr , Robert E. · 29.04.2020  · 58 Opisthokonta, Breviatea, and Apusomonadida [11]. However, are these complexes exclusive to 59 Obazoa,

51

Dark blue indicates the presence of a canonical IMAC protein, white indicates the absence of an 1101 IMAC protein and light blue indicates a truncated form of IMAC protein. 1102 1103 1104

Supplemental Figure 2: Repertoire of IMAC proteins in only those amoebozoan species that have either integrin alpha or integrin beta. Names of IMAC proteins are listed at the top and names of amoebozoan taxa are listed on the side of the map. Arrows indicate amoebozoan species that have a nearly complete set of IMAC proteins. Dark blue indicates the presence of a canonical IMAC protein, white indicates the absence of an IMAC protein, and light blue indicates a truncated form of an IMAC protein.

[Figure: two panels (A, B). Presence/absence map with columns α-Integrin, β-Integrin, α-Parvin, β-Parvin, Vinculin, Paxillin, Talin, PINCH, FAK, ILK, α-Actinin; rows are amoebozoan taxa grouped into Tubulinea, Evosea, and Discosea. Legend: Canonical Protein Present, Truncated Protein Present, Protein Absent in Data, Nearly Complete IMAC. Schematic labels: Amorphea, Amoebozoa, Obazoa, Opisthokonta, Holozoa, Animals, Choanoflagellates, Capsaspora, Fungi, Thecamonas, Pygsuia, Other Eukaryotes; Innovation/Gain, Loss; Extracellular, Cytoplasmic, Actin.]


Supplemental Figure 3: Maximum likelihood tree of ITA homologs and other cell adhesion molecules that contain beta-propellers. Bootstrap values ≥50 are shown at the branch points. The phylogenetic tree was built with IQ-TREE v1.5.5 under the LG+C60+F+G model of protein evolution, with ML bootstrap (MLBS) support from 1000 ultrafast bootstrap replicates. The protein sequences were aligned with MAFFT using the L-INS-i parameters (local pair, maxiterate 1000). BMGE was used to mask the alignment with a gap penalty parameter of 0.8. The ITA homolog tree is rooted with the Linkin protein. Amoebozoan ITA genes are mostly concentrated in a single clade (colored), except for the Flabellula citata paralog.
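The align, mask, and infer steps named in this caption can be sketched as command lines. This is a minimal illustration, not the authors' script: the file names, the `BMGE.jar` path, and the helper `build_commands` are hypothetical, and mapping the caption's "gap penalty" onto BMGE's `-g` option is an assumption. The commands are only assembled, not executed.

```python
from pathlib import Path


def build_commands(fasta, gap_value, model="LG+C60+F+G", ufboot=1000,
                   bmge_jar="BMGE.jar"):
    """Assemble MAFFT, BMGE, and IQ-TREE argument lists for one protein set."""
    stem = Path(fasta).stem
    aligned = f"{stem}.aln.fasta"    # hypothetical output name
    masked = f"{stem}.masked.fasta"  # hypothetical output name
    # MAFFT L-INS-i: local pairwise alignment plus iterative refinement.
    # MAFFT prints the alignment to stdout, so it must be redirected to `aligned`.
    mafft = ["mafft", "--localpair", "--maxiterate", "1000", fasta]
    # BMGE: -t AA selects amino acids, -of writes FASTA.
    # Assumption: the caption's gap parameter corresponds to -g.
    bmge = ["java", "-jar", bmge_jar, "-i", aligned, "-t", "AA",
            "-g", str(gap_value), "-of", masked]
    # IQ-TREE 1.x: -m sets the model, -bb the number of ultrafast bootstraps.
    iqtree = ["iqtree", "-s", masked, "-m", model, "-bb", str(ufboot)]
    return {"mafft": mafft, "bmge": bmge, "iqtree": iqtree,
            "aligned": aligned, "masked": masked}


cmds = build_commands("ita_homologs.fasta", 0.8)
print(" ".join(cmds["mafft"]), ">", cmds["aligned"])
print(" ".join(cmds["bmge"]))
print(" ".join(cmds["iqtree"]))
```

Each list could be passed to `subprocess.run` once the tools are installed; keeping the gap value as an argument makes it easy to reuse the sketch with a different masking setting.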


Supplemental Figure 4: Maximum likelihood tree of ITB homologs and Sib protein. Bootstrap values ≥50 are shown at the branch points. The phylogenetic tree was built with IQ-TREE v1.5.0 under the LG+C60+F+G model of protein evolution, with ML bootstrap (MLBS) support from 1000 ultrafast bootstrap replicates. The protein sequences were aligned with MAFFT using the L-INS-i parameters (local pair, maxiterate 1000). BMGE was used to mask the alignment with a gap penalty parameter of 0.6. The ITB homolog tree is rooted with the Sib protein. Amoebozoan ITB genes are mostly concentrated in a single clade (colored).

[Figure: maximum likelihood tree with bootstrap support values at the nodes; scale bar 0.5. Tip labels are ITGB and ITGB-like sequences, for example Metazoan_ITGB, Trichoplax sp. H2, Capsaspora owczarzaki, Pygsuia biforma, Thecamonas trahens, Ministeria vibrans, Rhizamoeba saxonica, Mastigamoeba balamuthi, Filamoeba nolandi, Trichodesmium erythraeum, and Sib of Dictyostelium discoideum.]


Supplemental Figure 6: Most significantly enriched motifs in Obazoan ITA, obtained by the MEME suite. (A) Consensus cation motif and FG-GAP obtained by RNA-seq for an Obazoan ITA, (B) EGF motif found exclusively in Ministeria vibrans, (C) GFFKR motif found in Obazoa, (D) immunoglobulin-like motif found in Obazoa.

[Figure: ITGA Obazoan Motif. Four MEME sequence logos (panels A-D, no SSC); y-axis in bits (0 to 4), x-axis motif position.]
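The logo heights in this figure are given in bits. As a rough guide to reading them, the sketch below computes the information content of a single alignment column under the standard definition, log2(20) minus the Shannon entropy, without a small-sample correction (which is what the "no SSC" label on the logos is taken to mean here). The function name and the example columns are illustrative and are not taken from the figure.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content in bits of one alignment column (no small-sample correction)."""
    counts = Counter(column)
    total = sum(counts.values())
    # Shannon entropy of the observed residue frequencies.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Maximum possible information for the alphabet minus the observed entropy.
    return math.log2(alphabet_size) - entropy


# A fully conserved column reaches the maximum, log2(20), about 4.32 bits.
conserved = column_information("DDDD")
# A column split evenly between two residues loses exactly one bit.
split = column_information("DDNN")
```

Gap characters are not treated specially in this sketch; a real logo tool handles them separately.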


Supplemental Figure 7: Most significantly enriched motifs in Obazoan ITB, obtained by MEME. (A-D) Cation-binding motifs: (A) MIDAS binding motif DXSXS, (B) possible fifth position of MIDAS and ADMIDAS, (C) ITB SyMBS (green) and MIDAS (yellow), (D) possible E of SyMBS (first amino acid of SyMBS). (E-F) Cysteine-rich motifs: (E) PSI domain, (F) cysteine-rich stalk motif. (G-H) C-terminal motifs: (G) NPXY motif, (H) possible KLXXXD motif.

[Figure: ITGB Obazoan Motif. Eight MEME sequence logos (panels A-H, no SSC); y-axis in bits (0 to 4), x-axis motif position.]
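Several of the motifs named in this caption are short fixed-spacing patterns (DXSXS, NPXY, KLXXXD) and, together with the GXXXG motif discussed for the transmembrane region, can be located with simple regular expressions. The sketch below is only an illustration of that idea: the pattern dictionary, the function name, and the toy sequence are hypothetical, and a plain pattern match says nothing about whether a hit is functional.

```python
import re

# X is treated as "any residue"; the patterns follow the motif names in the text.
MOTIF_PATTERNS = {
    "DXSXS": r"D.S.S",
    "NPXY": r"NP.Y",
    "KLXXXD": r"KL...D",
    "GXXXG": r"G...G",
}


def find_motifs(sequence):
    """Return 1-based start positions of every (possibly overlapping) match."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return hits


# Toy sequence made up for this example, not a real ITB.
toy_hits = find_motifs("MDASLSAAGAVIGKLAAADNPIY")
print(toy_hits)
```

Reporting 1-based positions keeps the output comparable with the residue numbering used in the supplemental tables.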


Supplemental Figure 8: Most significantly enriched motifs in Sib protein, obtained by MEME. (A-C) Cation-binding motifs: (A) possible binding motif of MIDAS and ADMIDAS, (B) possible fifth position of MIDAS and ADMIDAS, (C) possible binding motif of SyMBS. (D) NPXY motif.

[Figure: Sib Motif. Four MEME sequence logos (panels A-D, no SSC); y-axis in bits (0 to 4), x-axis motif position.]


Supplemental Table 1: Characteristics of Amorphea ITGA proteins, including length, beta-propeller motifs, subcellular localization, extra domains attached to ITGA, C-terminal motifs, signal peptide, transmembrane regions, and type of ITGA. Taxa are listed in the order used in Figure 2. The subcellular localization is displayed with its likelihood value in parentheses. The location of extra domains attached to ITGA is shown as the C-terminal end (CTE) or the N-terminal beginning (NTB). The ITGA motifs that interact with ITGB are shown as GFFKR and GXXXG. The locations of beta-propeller motifs identified with the MEME suite are displayed. FG-GAP motifs are marked with asterisks. Beta-propeller motif locations that correspond to IPRSCAN output are enclosed in square brackets. Consensus cation motifs are marked with a superscript 2.

Abbreviations: CTE = C-terminal end; NTB = N-terminal beginning; * = predicted to possess FG-GAP; ² = cation motif (D/E-h-D/N-x-D/N-G-h-x-D/E); SP = signal peptide; TM = transmembrane; - = not listed.

Species (sequence ID) | Size (aa) | Subcellular localization (likelihood) | Extra domain | GFFKR / GXXXG motif | Location of beta-propeller motifs | SP / TM | Type
Filamoeba nolandi (CAMPEP_0168571262) | 631 | Cell membrane protein, likelihood (0.971) | PRICHEXTENSN (CTE), protein kinase (CTE) | GFFKR; GIAVG | 16, 47, [109], 171 | TM | II
Ministeria vibrans Paralog 2 (2753598) | 917 | Extracellular (0.9862), soluble (1), likelihood (0.997) | EGF 2 repeat (NT) | - | 195², 255², [319]², [436]²*, [581]², 772, 812² | - | II
Rhizamoeba saxonica (TR11094_c0_g1_i1) | 449 | Cell membrane protein, likelihood (0.437) | None | - | [55], 122², 184, 254²* | SP, TM | I
Flabellula citata paralog3 (c2461_g1_i2) | 673 | Cell membrane protein, likelihood (0.301) | None | - | 57*, 121, 184*, 251*, 313 | SP, TM | I
Cryptodifflugia operculata Paralog 2 (c8521_g1_i1) | 1200 | Cell membrane protein, likelihood (0.257) | (NTB) Thrombospondin type-1 repeat, EGF 4 repeat, intramolecular chaperone | - | 837², 889²*, [1033]², [1107]² | SP | II
Nolandella sp. AFSM (TR12097c0_g1_i1) | 975 | Cell membrane protein, likelihood (0.672) | (NTB) Thrombospondin type-1 repeat, EGF 4 repeat | - | 613, [670]²*, [734]²*, [805]²*, 934 | SP | II
Arcella intermedia (c23786_g1_i1) | 1074 | Nucleus membrane protein, likelihood (0.58) | (NTB) Thrombospondin type-1 repeat, EGF 4 repeat, intramolecular chaperone | - | [899]²*, [965]²* | SP | II
Flabellula citata paralog1 (c9004_g2_i1) | 626 | Cell membrane protein, likelihood (0.994) | EGF (CTE) | - | 123, [259], [323]²* | SP | II
Flabellula citata paralog2 (c8227_g1_i1) | 713 | Cell membrane protein, likelihood (0.669) | EGF (CTE) | - | 88*, 150*, 255, 310 | TM | II
Ministeria vibrans Paralog 1 (2757223) | 860 | Peroxisome (0.3325), soluble (0.9929), likelihood (0.386) | None | - | 58², [125]²*, 185, 243, 310, 387², 448, [578]²*, [647], 748 | - | I
Flabellula citata paralog5 (c8994_g2_i16) | 604 | Extracellular (0.5213), soluble (0.9842), likelihood (0.622) | None | - | 113, 172, [302]* | - | I
Flabellula citata paralog4 (c8386_g1_i3) | 646 | Cell membrane protein, likelihood (0.833) | None | - | 53, [116]*, [178]* | SP, TM | I
Cryptodifflugia operculata Paralog 1 (c43306_g1_i1) | 581 | Cell membrane protein, likelihood (0.445) | None | - | 58², [120]*², [188]*, 261², 325², 383*, [449]² | SP, TM | I
Amphizonella sp 9 paralog2 (TR2587c1_g1_i1) | 565 | Extracellular (0.9902), soluble (0.9998), likelihood (0.997) | None | - | [47], 121, [198]1, 262, [344]², [414]², 482² | SP | I
Mastigamoeba balamuthi paralog1 (MMETSP0035-201108093729) | 813 | Cytoplasm (0.3215), soluble (0.8872), likelihood (0.493) | Cadherin/Ig-like fold (CTE) | - | 66, [200]², 268, [345], 416, 468, [540] | - | I
Amphizonella sp 8 Paralog 2 (TR9459c0_g1_i1) | 569 | Vacuole (0.4562), soluble (0.8119), likelihood (0.362) | None | - | [71], 138, 207, 274, [353], 419 | SP | I
Heleopera sylvatica (Gene.370_NODE_157) | 743 | Cell membrane protein, likelihood (0.493) | ADP-ribose (NTB) | GIVAG | [56], 139, 215², [280]² | TM | II
Nebela sp. (Gene.16036_NODE_111) | 839 | Peroxisome (0.3381), soluble (0.6007), likelihood (0.125) | Cadherin (CTE) | - | [36], [109], [179], [244], [320], [388], [455] | TM | I
Difflugia sp. D2 (DN18414_c0_g1_i5) | 266 | Organelle (0.3485), soluble (0.8094), likelihood (0.287) | None | - | [55], [122], [190] | - | I
Protosporangium articulatum Paralog3 (TR6387|m.15047) | 671 | Cell membrane protein, likelihood (0.937) | None | GAVVG | [48], [124], [194] | TM | I
Ceratiomyxa fruticulosa (c6850_g1_i1) | 565 | Vacuole (0.4506), membrane (0.7419), likelihood (0.215) | None | - | 74², [127], 198, 259, 321, 393, 471 | SP, TM | I
Diplochlamys sp. Paralog2 (m.4379) | 666 | Cell membrane protein, likelihood (0.989) | None | GAIAG | [120]², 249, [311]² | SP, TM | -
Paramoeba atlantica (2184673) | 623 | Cell membrane protein, likelihood (0.498) | None | - | 69, [125], 198, [337]², 425² | SP, TM | I
Mastigamoeba balamuthi paralog2 (MMETSP0035-20110809854) | 1042 | Cell membrane protein, likelihood (0.674) | Cadherin/Ig-like fold (CTE) | - | [133], 202, 265, 343, 522, 613, [685], 754 | TM | I
Amphizonella sp 9 Paralog3 (Gene.962_NODE_597) | 685 | Cell membrane protein, likelihood (0.87) | PRICHEXTENSN (CTE) | - | 68, [133]², [195], 255, [287]², [380]²*, 455, 520², 586 | SP, TM | II
Amphizonella sp 9 paralog1 (TR17939c0_g1_i1) | 677 | Extracellular (0.8519), soluble (0.8141), likelihood (0.936) | None | - | [62], [124], [199]², [268]*, [337], [409]², [488]² | SP | I
Diplochlamys sp. Paralog1 (2112184) | 540 | Vacuole (0.559), soluble (0.5526), likelihood (0.639) | None | - | [68], 143, 397, [468] | SP | I
Vermistella antarctica (156452) | 570 | Extracellular (0.7931), soluble (0.9906), likelihood (0.905) | None | - | 83, 224, 299², 368, 442², 516 | SP | I
Protosporangium articulatum Paralog4 (TR6387_c1_g4_i7) | 631 | Vacuole (0.4093), soluble (0.8173), likelihood (0.459) | None | - | [108], [180], 248, 329, [390]*, [487], [567]* | SP | I
Amphizonella sp 8 Paralog 1 (TR6668_c0_g1_i4) | 656 | Extracellular (0.8352), soluble (0.992), likelihood (0.964) | PRICHEXTENSN (CTE) | - | [53], [125], [200]², 266², 332², [403]², 473 | SP | II
Amphizonella sp. 10 Paralog1 (TR4717c0_g1_i1) | 563 | Cytoplasm (0.625), soluble (0.8464), likelihood (0.758) | PRICHEXTENSN (CTE) | - | 16, 87, [161], 237, 299², 370, [438] | - | II
Amphizonella sp 2 (TR2129_c1_g3_i9) | 544 | Endoplasmic reticulum (0.2477), soluble (0.5541), likelihood (0.224) | None | - | 111, [189]², [257], [334]², [403]², 474, [539]² | SP | I
Endostelium zonatum LINKS (789702) | 683 | Cell membrane protein, likelihood (0.451) | PRICHEXTENSN (CTE) | - | 232², [295], 432, [497]², [572]², [647] | SP | II
Endostelium zonatum (m.5907) | 681 | Cell membrane protein, likelihood (0.479) | PRICHEXTENSN (CTE) | - | 231², [293], 430, [495]², [570]², [645] | SP | II
Protosporangium articulatum Paralog5 (TR2944c0_g1_i1) | 735 | Extracellular (0.4086), soluble (0.4867), likelihood (0.572) | None | - | [70], [137], [200], [268], [344], 407, 480 | SP | I
Protosporangium articulatum Paralog1 (TR2074c0_g1_i2) | 851 | Cell membrane protein, likelihood (0.448) | Protein kinase | GIVFG | 57², [131]², [201], [275]², [344], 418, 485 | SP, 2TM | II
Protosporangium articulatum Paralog2 (TR2817c1_g1_i1_6465) | 685 | Cell membrane protein, likelihood (0.997) | None | GVGLG | [105], 176, [240], 303, [378], 450, 512 | TM | I
Stenamoeba stenopodia (2782329) | 662 | Cell membrane protein, likelihood (0.93) | None | - | [168]², 310², 444², 514² | SP, TM | I
Gocevia fonbrunei Paralog1 (DN14950_c7_g1_i2) | 869 | Cell membrane protein, likelihood (0.999) | Cadherin/Immunoglobulin-like fold (CTE) | GAIIG | [66]², [207], 281, [348], 417, [488]² | TM | II
Gocevia fonbrunei Paralog2 (DN13664_c1_g1_i1) | 665 | Extracellular (0.5821), soluble (0.9361), likelihood (0.656) | None | - | 52, [121], 194, 257, [318]², [386]²*, 463 | SP | I
Amphizonella sp. 10 Paralog2 (TR11506c6_g2_i3) | 523 | Extracellular (0.9162), soluble (0.9984), likelihood (0.929) | None | - | [56], [129]², [201], [273]², 340², 397², [468] | SP | I
Ectocarpus siliculosus | 1105 | Cell membrane protein, likelihood (0.758) | PRICHEXTENSN (CTE) | - | [56], [127], 195, 260, 326, [396], 464* | SP, TM | II
Ministeria vibrans Paralog 3 (m.23418) | 634 | Cell membrane protein, likelihood (0.214) | None | GFFKR | [18]²*, 78 | TM | I
Pygsuia biforma (ITGA) | 3312 | Extracellular (0.9162), soluble (0.9984), likelihood (0.93) | DUF11 4 repeats (CTE) | GFFKR; GLAIG | 267, [685], [807] | TM | I
Lenisia limosa (LenisMan47) | 3346 | Cytoplasm (0.5704), soluble (0.9615), likelihood (0.632) | Ig fold like 11 repeats (CTE) | GFFKR | 310², [701], [826]² | - | I
Thecamonas trahens | 2114 | Cell membrane protein, likelihood (0.975) | Ig fold like 2 repeats (CTE), Laminin_g3 (CTE) | GFFKR | 349*, [894]²*, [956]²* | SP, TM | I
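The cation motif given in the header of Supplemental Table 1, D/E-h-D/N-x-D/N-G-h-x-D/E, can be written as a regular expression once "h" (hydrophobic) and "x" (any residue) are made concrete. The sketch below is an illustration under stated assumptions: the hydrophobic residue set is a guess, since the table does not define it, and the function name and toy sequence are hypothetical. It is not the MEME-based procedure used for the table.

```python
import re

# Assumption: residues counted as hydrophobic ("h"); the source does not list them.
HYDROPHOBIC = "AVILMFWYC"

# D/E - h - D/N - x - D/N - G - h - x - D/E, wrapped in a lookahead so that
# overlapping matches are all reported.
CATION_MOTIF = re.compile(
    rf"(?=([DE][{HYDROPHOBIC}][DN].[DN]G[{HYDROPHOBIC}].[DE]))"
)


def cation_motif_positions(sequence):
    """Return (1-based start, matched 9-mer) for each cation motif occurrence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CATION_MOTIF.finditer(seq)]


# Toy sequence made up for this example.
toy_cation_hits = cation_motif_positions("AAADVDGDGIADAA")
print(toy_cation_hits)
```

Changing `HYDROPHOBIC` is the main knob: a broader set reports more candidate sites, a narrower one fewer.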

Supplemental Table 2: ITGB motifs, subcellular localization, signal peptide, transmembrane regions, and extra domains attached to ITGB proteins. Taxa are listed in the order used in Figure 3. For the ITGB motifs MIDAS, ADMIDAS, and SyMBS, non-canonical amino acids are highlighted in red; cysteine-rich motifs are shown as either EGF or PSI domains. C-terminal motifs are shown as KLXXXD, NPXY/F, and GXXXG. Numbers in parentheses indicate the number of motifs present in ITGB.


Species | MIDAS | ADMIDAS | SyMBS | Cysteine-rich motifs (CRM) | Size (aa) | Signal peptide (SP) and transmembrane (TM) | β-tail motifs | Type | Subcellular localization (DeepLoc)
Clastostelium recurvatum | 25D, 27T, 29S, 123E, 139R | 29S, 32S, 33D, 202A, 160D | 81E, 120R, 121P, 122D, 123E | None | 1405 | TM | GAVVG | II | Extracellular, likelihood (0.3436); soluble, likelihood (0.731)
Filamoeba nolandi | 893D, 895Y, 897S, 956E, 991D | 897S, 899K, 900A, 1066A, 991D | 906D, 949D, 950E, 951P, 952N | EGF, PSI (NTB) | 2714 | SP, TM | GAFIG | II | Cell membrane protein, likelihood (0.5463)
Soliformovum irregularis | 872D, 874S, 876N, 958S, 975N | 874N, 879Y, 880N, 1071A, 975N | 924D, 1012D, 1013P, 1014N, 1015E | PSI | 2218 | SP | NPXY present | II | Extracellular, likelihood (0.8972); soluble, likelihood (0.9172)
Schizoplasmodiopsis vulgaris | 839D, 841T, 845S, 928E, 944D | 845S, 846Y, 857E, 1045A, 944D | 889D, 923N, 925Q, 927L, 928E | PSI | 3641 | SP, TM | NPXY present; GAAG | II | Extracellular, likelihood (0.5465); soluble, likelihood (0.6244)
Cavostelium apophysatum | 507D, 509T, 511S, 592N, 642D | 511S, 514S, 515D, 708A, 642D | 546D, 656N, 658D, 659E, 660D | PSI | 3296 | TM | NPXY present; GAASG | II | Cell membrane protein, likelihood (0.7274)
Mycamoeba gemmipara | 277D, 279T, 281S, 375E, 416E | 281S, 285N, 286D, 530A, 390D | 328E, 403N, 411D, 414P, 416E | Unique cysteine-rich region pattern downstream, from 500 aa to 860 aa | 927 | TM | NPXY present; GVIAG | I | Cell membrane protein, likelihood (0.998)
Mastigamoeba balamuthi | 159D, 161T, 163S, 242E, 271D | 163S, 170P, 171N, 371S, 271D | 198E, 238S, 239D, 241P, 242E | PSI; EGF-like present 5 times | 1315 | SP, TM | NPXA present; GAASG | I | Cell membrane protein, likelihood (0.561)
Mastigella eilhardi | 127D, 129S, 131S, 212E, 230D | 131S, 134D, 135D, 339A, 230D | 198E, 207N, 209D, 211P, 212E | PSI; EGF-like present 5 times | 1292 | SP, TM | NPXA present; GAASG | I | Cell membrane protein, likelihood (0.7939)
Rigifila limosa | 124D, 126S, 128S, 206E, 237D | 128S, 131D, 133D, 391A, 237D | 164D, 201N, 203D, 205P, 206E | PSI; EGF-like present 13 times | 932 | SP, TM | NPXY present; GTATG | I | Extracellular, soluble, likelihood (0.681)
Echinamoeba exudans | 123D, 125S, 127S, 209E, 239D | 127S, 140G, 141E, 338A, 239D | 174E, 204N, 206E, 208P, 209E | PSI; EGF-like present 3 times | 593 | SP, TM | NPXY present (2); KMAAAD; GAVIG | I | Cell membrane protein, likelihood (0.5596)
Difflugia sp. D2 | 115D, 117S, 119S, 213E, 249D | 119S, 136D, 137E, 295I, 249D | 177E, 208N, 210E, 212P, 213E | PSI; EGF present 3 times | 605 | TM | NPXY present (2) | I | Cell membrane protein, likelihood (0.8598)
Nebela sp. | 131D, 133S, 135S, 225E, 264D | 135S, 138P, 139Y, 361A, 264D | 192E, 223N, 225E, 227P, 228E | PSI; EGF present 3 times | 630 | SP, TM | NPXY present (2) | I | Cell membrane protein, likelihood (0.7413)

Heleopera sphagni

127D 129S 131S 221E 266D

131S 134P 135H 352A 266D

174E 216N 218E 220P 221E

PSI, EGF present 3 times

622 aa SP, TM

NPXY present (2) KADX(4)E

I Cell membrane protein likelihood (0.9332)

Difflugia sp. USP

146D, 148S, 150S, 245E 280D

150S 153P 154Y 370A 280D

197E 239N 241E 243P 245E

PSI, EGF present 3 times

630 aa SP, TM

NPXY present (2)

I Cell membrane protein likelihood (0.9847)

Hyalosphenia papilio

35D 37S 39S 132E

39S 42P 43Y 270D

85D 127N 129E 131P

EGF present 3 times

531 aa TM NPXY present (2) KADX(4)D

I Cell membrane protein likelihood (0.8186)

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint

Page 64: 1,2 , Courtney W. Stairs , Daniel J. G. Lahr , Robert E. · 29.04.2020  · 58 Opisthokonta, Breviatea, and Apusomonadida [11]. However, are these complexes exclusive to 59 Obazoa,

64

172D 172D 132E Rhizamoeba saxonica (paralog 1) TR11158c0_g1_i1

143D 145S 147S 228E 259D

147S 150Q 151E 383A 259D

182E 223N, 225D, 227P, 228E

PSI, EGF present 7 times

951 aa SP, TM NPXY present (2)

I Cell membrane protein likelihood (0.7754)

Cryptodifflugia operculum

144D 146S 148S 241E 278D

148S 151P 152L 370G 278D

194E 236N 238E 240P 241E

PSI, EGF present 3 times

633aa SP, TM NPXY present (2)

I Cell membrane protein likelihood (0.9426)

Rhizamoeba Saonxica (Paralog2) TR7856c0_g1_i1

142D, 144S 146S 227E, 258D

145S 148D 149D 329A 258D

181E, 222N, 224D, 226C, 227E

PSI, EGF present 3 times

932aa TM NPXY present (2)

I Cell membrane protein likelihood (0.8204)

Flabellula citata

163D, 165S 167S 234E, 282D

167S 172M 173E 364A 282D

185D, 229G 231D, 233P 234E

PSI, EGF present 5 times

1141 aa

SP GAAVG I Cell membrane protein likelihood (0.7827)

Rhizamoeba saxonica (Paralog 3) TR7441_c0_g1_i1

832D 834S 836S 918E 949D

836S 839P 840F 1063A 949D

871E, 913N 915D 917C 918E

EGF-like present 2 times

1456 aa SP, TM

NPXY present (2)

I Extracellular (05263) soluble likelihood (0.731)

Pyxidicula operculata

133D, 135S, 137S, 229E, 264D

137S, 140P 141Y, 360A, 264D

192E, 224N, 226E, 228P, 229E

PSI, EGF present 3 times

621 aa

SP, TM NPXY present KADX(4)D GECGG

I Extracellular (0.9867) soluble likelihood (0.9995)

Pygsuia biforma

110D 112S 114S 204E 231D

114S 117D 118D 331A 231D

161D 200Q 201D 203P 204E

PSI, EGF present 34 times

1793 aa TM NPXY present KAGX(4)E present

I Cell membrane Protein likelihood (0.9792)

Lenisia limosa

130D, 132S 134S 218S 246D

134S 137N 138D 357N 246D

182N 215N 216E 217P 218S

PSI, EGF present 12 times, EGF like 4 times

SP, TM NPXY present RFNAAMKE present

I Cell membrane Protein likelihood (0.9934)

Ministeria vibrans (paralog 1) c191|m.702

85D, 87S 89S 179E

89S 92D 93D 310S

124E 174N 176D 178P

EGF-like present 5 times

SP, TM NPXY present KAWQQWEE

I Cell membrane protein likelihood (0.9827)

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint

Page 65: 1,2 , Courtney W. Stairs , Daniel J. G. Lahr , Robert E. · 29.04.2020  · 58 Opisthokonta, Breviatea, and Apusomonadida [11]. However, are these complexes exclusive to 59 Obazoa,

65

1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 Supplemental Table 3: This table shows Rsem value of protists that have either a complete set 1237 of IMAC, integrins with extra domains or unexpected discovery of an integrin protein. The 1238 values are shown in transcripts per million (TPM). 1239

Species ITA ITB Talin

PINCH

ILK Paxillin Actinin Actin eF1α Vinculin

Filamoeba nolandi

10.23 91.07(?) 45.54 30.62 11.71 0.00 138.81 2417.45

5932.12 36.07

Difflugia sp. D2 0.00 0.00 0.00 6.01 N/A 0.00 63.16 12.24 0.00 0.00

Flabellula citata

16.31(Paralog1) 9.81(Paralog2) 14.56(paralog3) 4.90 (Paralog4)

421.32 18.56 34.78 5.54 25.09 109.90 2708.65

6180.65 7.28

210D 210D 179E Ministeria vibrans (paralog 2) c46_198

73D, 75S 77S 194E 225D

77S 80N 81D 339A 225D

122D 189N 191D 193P 194E

EGF-like present 14 times

TM NPXF present (2) KLAQVAAD

I Cell membrane Protein likelihood (0.9941)

Ministeria vibrans (paralog 3) c334|m.1235

594D, 596S 598S 682E 714D

598S 601N 602D 824A 714D

633E 677N 679H 681S 682E

EGF-like present at N-terminal and C-terminal

SP, TM NPXY present (2) KLITGAAD

I Cell membrane Protein likelihood (0.9985)

Didymoeca costata

444D 446S, 448S 530E, 562D

446S 449D 451D 652A 562D

524S 527D 529P 530E

None 1673aa SP, TM None I Cell membrane protein likelihood (0.7458)

Sphaeroforma arctica

119D 121S 123S 231E 260D

123S 125D 127D 296A 260D

195E 227N 229D 230P 231E

EGF-like present 7 times

1141aa

TM KGIQMVMD

I Cell membrane protein likelihood (0.638)

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint

Page 66: 1,2 , Courtney W. Stairs , Daniel J. G. Lahr , Robert E. · 29.04.2020  · 58 Opisthokonta, Breviatea, and Apusomonadida [11]. However, are these complexes exclusive to 59 Obazoa,

66

0.67(paralog 5) Nebela sp. 3.73 11.34 3.95 85.57 N/A 25.24 444.60 653.00 2388.17 6.42

Rhizamoeba saxonica

174.00 52.55(Paralog3) 82.61(Paralog2) 121.63(paralog1)

25.63 29.96 3.23 13.13 106.07 12079.17

5322.57 20.45

Cryptodifflugia operculata

4.93 (paralog1) 1.80 (paralog2)

7.95 50.58 2.40 2.47 15.50 5.55 272.10 1857.80 (c7512_g1_i1)

9.65

Nolandella sp. AFSM

2.77 N/A 9.84 41.65 N/A 11.66 3.44 2929.71

3621.35 12.38

Arcella intermedia

2.98 N/A 4.91 1.19 N/A 11.19 43.20 110.29 (c17090_g2_i1)

2001.59 4.71

Rigifila ramosa N/A 810.35 348.93 59.65 N/A 141.87 127.94 18519.18

2284.60 7.37

Pygsuia biforma 2.14 1.45 41.95 111.50 N/A 13.47 103.13 17397.16

7071.38 N/A

Ministeria vibrans

1.71 (Paralog 1) 2.90 (Paralog 2) 20.02 (Paralog 3)

0.00(Paralog 1) 5.17(Paralog 2) 10.56 (Paralog3)

3.88

13.43 3.38 13.43 8.72 277.70 310.59 1.15
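As a reading aid for the TPM values in Supplemental Table 3, the following minimal sketch shows the standard definition of transcripts per million: each transcript's count is divided by its effective length, and the resulting rates are rescaled to sum to one million. The function name `tpm` and the toy numbers are assumptions made for illustration; RSEM derives its estimates from its own statistical model, so this is not the authors' pipeline.

```python
def tpm(counts, effective_lengths):
    """Convert per-transcript counts to TPM using the standard definition.

    counts and effective_lengths are parallel sequences (illustrative inputs).
    """
    # Length-normalized expression rate for each transcript.
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        # No expression at all: report zeros rather than dividing by zero.
        return [0.0 for _ in rates]
    # Rescale so that all values sum to one million.
    return [r / total * 1e6 for r in rates]


# Toy numbers, not data from this study.
values = tpm([10, 20, 30], [1000, 2000, 3000])
```

Because TPM is a within-sample proportion, values are comparable between genes of the same transcriptome but only loosely comparable between different organisms.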


Supplemental Table 4: A) Summary of amoebozoan ITA type I, listing the number of paralogs, beta-propeller, FG-GAP, and canonical motifs. The table also shows the presence of a signal peptide and transmembrane region, C-terminal interacting motif, GXXXG motif, and Ig-fold-like/cadherin domain. B) Summary of amoebozoan ITA type II, listing the number of paralogs, beta-propeller, FG-GAP, and canonical motifs. The table also shows the presence of a signal peptide and transmembrane region, C-terminal interacting motif, GXXXG motif, and extra domains attached to ITA.
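The short C-terminal motifs named in the supplemental table captions (KLXXXD, NPXY/F, GXXXG, where X is any residue) can be located with simple pattern matching. The sketch below is only an illustration under stated assumptions: the regular expressions are written directly from the motif names, and the function name `count_motifs` and the example sequence are hypothetical. It is not the search procedure used in this study.

```python
import re

# Patterns written from the motif names in the captions; "." stands for X (any residue).
MOTIFS = {
    "KLXXXD": re.compile(r"KL.{3}D"),
    "NPXY/F": re.compile(r"NP.[YF]"),
    "GXXXG": re.compile(r"G.{3}G"),
}


def count_motifs(sequence: str) -> dict[str, int]:
    """Count non-overlapping matches of each motif in a protein sequence."""
    seq = sequence.upper()
    return {name: len(pattern.findall(seq)) for name, pattern in MOTIFS.items()}


# Made-up example sequence, not taken from any organism in this study.
example = "GAVIGLLKLAAADWNPLYKSTNPIFE"
counts = count_motifs(example)
```

Note that `findall` reports non-overlapping matches, so tandem motifs that share a residue (for example GXXXGXXXG) would be counted once.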
