supporting information - pnas · 7/1/2010 · fig. s3. alignment of x-box dna footprints or proven...
Supporting Information
Piasecki et al. 10.1073/pnas.0914241107
SI Materials and Methods
Protein Sequence Identifications. Using T-BLASTN (http://blast.ncbi.nlm.nih.gov/Blast.cgi) analyses, we searched for similarities to the 76-amino-acid DNA-binding domain of human RFX3 (GI:57209027) in the publicly available genome sequences of 121 eukaryotic organisms (Dataset S1). Genome sequence data were available at the Joint Genome Institute, Department of Energy (Walnut Creek, CA), the J. Craig Venter Institute (Rockville, MD), the Wellcome Trust Sanger Institute (Cambridge, England, UK), the Broad Institute (Cambridge, MA), the Human Genome Sequencing Center, Baylor College of Medicine (Houston, TX), and many other public and privately funded universities and consortia. Database comparisons were performed between March and December of 2009. Using default search parameters, the presence or absence of RFX TFs in the respective genomes became readily apparent from their E-value scores. All sampled organisms that yielded E-values lower than E−6 were scored as “containing RFX,” whereas E-values greater than E−2 were scored as “RFX is absent.” No E-values between E−6 and E−2 were found for any sampled organism. Subsequent T-BLASTN and PSI-BLAST analyses of all RFX TF domains yielded no significant homology to any prokaryotic or nonunikont organism. Reverse-BLAST comparisons using RFX amino acid sequences (full-length and DNA-binding domains only) from the fungus S. cerevisiae and the amoebozoan A. castellanii revealed no significant homologies in the genomes of any non-RFX-containing organism.
For full-length comparisons, BLASTP was used to identify homologs of the complete human RFX3 sequence in the NCBI database and in select organisms in the JGI database. A single best-hit sequence was extracted from each sampled organism. No NCBI database submission was found for the single RFX TF homolog in the A. castellanii genome identified by T-BLASTN analysis (Baylor College of Medicine). Therefore, the amino acid sequence was manually assembled. We found that the putative RFX TF homolog in A. castellanii was encoded on two overlapping contigs, 793 and 941, in the current assembly of the genome (2008-01-30). These contigs were merged, and Fgenesh-M v2.6 (http://linux1.softberry.com/all.htm) was used to predict the coding regions using a model designed for the fungal genus Pyrenophora. The predicted A. castellanii RFX gene includes a nine-exon coding region, which was used to assemble the putative amino acid sequence.
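The presence/absence scoring rule described above can be written as a short sketch. It assumes that "E−6" and "E−2" denote E-values of 1e-6 and 1e-2 and that each genome is scored by its best (lowest) E-value; the function name, the "unresolved" label for the intermediate band, and the treatment of genomes without any hit are illustrative assumptions, not details taken from the study.

```python
# Sketch of the presence/absence scoring rule for T-BLASTN results.
# Assumption: "E-6" and "E-2" in the text mean 1e-6 and 1e-2.
PRESENT_BELOW = 1e-6
ABSENT_ABOVE = 1e-2


def score_genome(e_values):
    """Classify one genome from the E-values of its hits (hypothetical helper)."""
    if not e_values:
        # Assumption: a genome with no reported hit is treated as lacking RFX.
        return "RFX is absent"
    best = min(e_values)  # the lowest E-value is the most significant hit
    if best < PRESENT_BELOW:
        return "containing RFX"
    if best > ABSENT_ABOVE:
        return "RFX is absent"
    # The study reports that no sampled organism fell into this band.
    return "unresolved"


if __name__ == "__main__":
    # Made-up E-values, for illustration only.
    print(score_genome([3e-30, 0.4]))
    print(score_genome([0.5, 2.0]))
```

Keeping an explicit "unresolved" outcome makes the gap between the two thresholds visible instead of silently forcing a call.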
Phylogenetic Comparisons. All extracted protein sequences were aligned using MUSCLE v3.5 (multiple-sequence comparison by log expectation) or, for phylogenetic tree construction, MAFFT v6.234b (a multiple-sequence alignment program for amino acid or nucleotide sequences) with the L-INS-i strategy (1, 2). To identify conserved domains, aligned sequences were viewed in BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). For generating phylogenetic trees, informative characters were first selected using GBlocks v0.91b (3). A maximum-likelihood phylogenetic tree was then generated using RAxML v7.0.3 with a PROTMIXWAG model and 100 bootstrap replicates (4) and viewed using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
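As background for the bootstrap support values, the sketch below shows the general idea of a nonparametric bootstrap replicate: alignment columns are resampled with replacement to give a pseudo-alignment of the same dimensions, from which a tree would then be inferred. This is a generic illustration and not RAxML's implementation; the toy alignment, taxon names, and function name are hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column picks are applied to every sequence, so columns stay intact.
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


# Toy protein alignment, invented for illustration only.
toy = {"taxonA": "MKV-LT", "taxonB": "MRVALT", "taxonC": "MKI-LS"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(toy, rng) for _ in range(100)]

if __name__ == "__main__":
    print(replicates[0])
```

The support value at a node is then the number of replicate trees (out of 100 here) that recover the same grouping.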
X-Box DNA Footprint Identifications. When possible, the translational start site for each gene was deduced based on the respective NCBI database submission annotation. For genes without annotation, we only sampled the promoter regions of genes with strong 5′ end conservation, which were deduced from T-BLASTN analyses of the amino acid sequences encoded by each of the respective ciliary genes. Only predicted proteins with strong conservation within the first 30 amino acid residues from the beginning of the query protein were used. X-box promoter motif searches were conducted using a hidden Markov model (HMM) approach with HMMER 2.3.2 (http://hmmer.janelia.org/). Mononucleotide sequence scrambling was conducted using the Sequence Manipulation Suite (SMS) software (http://www.bioinformatics.org/sms/). Dinucleotide sequence scrambling was conducted using the uShuffle software (5). As an additional negative control, we constructed two independent replicate HMM training sets in which the 15 aligned nucleotide positions of the 17 X-boxes that were used to construct the original HMM model (Table S2) were randomized, but in which the respective nucleotide composition at each position was maintained. The resulting two randomized profile HMM models were then used to query against all 349 endogenous promoter regions sampled in this study, revealing false-positive rates of 3.7% and 4.5%, respectively (see also Fig. 3B for comparison). For data presented in Fig. 3, the standard error of percent was calculated for the percentage of sampled organisms that contain X-box DNA footprints in the promoter of each of the respective genes (Fig. 3B) and for the percentage of ciliary gene promoters that contain X-box DNA footprints in the complete sampling of genes for each of the respective organisms (Fig. 3C). Consensus motif illustrations were constructed using WebLogo V2.8.2 (http://weblogo.berkeley.edu/).
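Two of the calculations described above can be sketched briefly. The first function shows one reading of the randomized training-set control: each aligned column is permuted independently across the sequences, which scrambles the individual X-boxes while keeping the nucleotide composition at every position. The second applies the usual formula for the standard error of a percentage, sqrt(p(100 − p)/n); the text does not state the formula, so this is an assumption. Function names are hypothetical, gaps are written as ASCII "-", and the four example sequences are taken from Table S2.

```python
import math
import random


def shuffle_within_columns(alignment, rng):
    """Permute each aligned column independently across the sequences."""
    columns = [list(col) for col in zip(*alignment)]
    for col in columns:
        rng.shuffle(col)  # per-position composition is unchanged
    return ["".join(row) for row in zip(*columns)]


def standard_error_of_percent(percent, n):
    """Standard error of a percentage (assumed formula: sqrt(p(100 - p)/n))."""
    return math.sqrt(percent * (100.0 - percent) / n)


# Four of the 17 aligned X-boxes from Table S2 (15 aligned positions each).
training = [
    "GTTGCTA--GGACAC",  # H. sapiens DNAHC9
    "GCTCCCAT-GGCAAC",  # H. sapiens DYNC2LI1
    "CGTCCCCCGGGAAAC",  # H. sapiens DNAHC11
    "GTCGTCTG-GGAAAC",  # H. sapiens BBS4
]
randomized = shuffle_within_columns(training, random.Random(1))

if __name__ == "__main__":
    for seq in randomized:
        print(seq)
    print(standard_error_of_percent(50, 100))
```

Shuffling within columns is a stricter control than scrambling whole sequences, because a profile built from the result sees the same per-position base frequencies but none of the original co-occurrence between positions.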
Nomenclature. Gene and protein names follow the human nomenclature convention except when referring to a specific organism.
1. Edgar RC (2004) MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797.
2. Katoh K, Kuma K, Miyata T, Toh H (2005) Improvement in the accuracy of multiple sequence alignment program MAFFT. Genome Inform 16:22–33.
3. Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17:540–552.
4. Stamatakis A (2006) RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
5. Jiang M, Anderson J, Gillespie J, Mayne M (2008) uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts. BMC Bioinformatics 9:192.
Piasecki et al. www.pnas.org/cgi/content/short/0914241107 1 of 7
Fig. S1. Maximum-likelihood phylogenetic tree of the full-length amino acid sequences of RFX TFs from select unikont organisms. Informative characters from a MAFFT L-INS-i sequence alignment were selected using GBlocks. The phylogenetic tree was generated using RAxML with a PROTMIXWAG model and visualized using FigTree. The values generated from 100 bootstrap replicates are depicted at each node. Names of taxa are listed together with their respective NCBI/JGI protein identification numbers, when available. The basal amoebozoan A. castellanii was used to root the tree.
Fig. S2. Multiple-sequence alignment of ciliary protein B9D2 amino acid sequences from various unikont and nonunikont organisms. A BLOSUM62 matrix was used for identity and similarity shading using a 70% threshold value. No B9D2 gene homologs were identified in the genomes of any nonciliated organism, such as yeast or the plant Arabidopsis.
Fig. S3. Alignment of X-box DNA footprints or proven X-box motifs identified in ciliary gene promoters of select animal species. For every sequence motif, the distance upstream of the putative translational start site (for annotated genes) or upstream of the most highly conserved 5′ region (identified by BLAST analyses) is indicated. The experimentally verified function of each ciliary gene is depicted in quotation marks above each column. A position-weight consensus sequence for each X-box promoter motif was generated for each set of orthologous genes.
Table S1. Summary of RFX TFs from all organisms with functionally characterized homologs

                                                                                 Expression patterns
Organism                    Upstream regulator^a   RFX TFs  Domain structure^b   SAGE^c                    Experimental^d            Downstream target genes^e                       X-box consensus^f   Ref(s).
Mammals
H. sapiens and M. musculus  –                      RFX1     -A-DBD-B-C-D-        Brain/broad               Brain                     FGF1, ALMS1                                     GTNRCCN(0–3)RGYAAC  1, 2, 3
                            A-MYB                  RFX2     -A-DBD-B-C-D-        Brain and testis/broad    Testis                    IL5RA, SPAG6, PDCL2, ALF                        –                   4
                            NOTO                   RFX3     -A-DBD-B-C-D-        Brain/broad               Brain                     BBS4, DYNC2LI1, DNAHC11, DNAHC5, DNAHC9, FOXJ1  GTYBYCN(1–4)GRMAAC  5, 6
                            –                      RFX4     ---DBD-B-C-D-        Brain/testis              Brain                     CX3CL1, IFT172                                  –                   7, 8
                            –                      RFX5     ---DBD--------       Brain/broad               Broad                     HLA-DOA, HLA-DOB, HLA-DP, HLA-DR, HLA-DQ        –                   9, 10
                            NGN3                   RFX6     ---DBD-B-C-D-        Pancreas/heart and liver  Pancreas                  –                                               –                   11, 12
                            –                      RFX7     ---DBD--------       Brain/broad               –                         –                                               –                   13
Flies
Drosophila melanogaster     –                      dRFX     ---DBD-B-C-D-        –                         Nervous system and brain  >15 ciliary gene targets                        GYTRYYN(1–3)RRHRAC  14, 15
                            –                      dRFX2    ---DBD--------       –                         Eye                       –                                               –                   16
Nematodes
C. elegans                  –                      DAF-19   ---DBD-B-C-D-        Ciliated sensory neurons  Ciliated sensory neurons  >30 ciliary gene targets                        GTHNYYN(1–2)RRNAAC  17–20
Fungi
S. cerevisiae               Crt1                   Crt1     ---DBD-B------       n/a                       n/a                       >10 nonciliary gene targets                     -TYKYYN(1–2)GRGAAC  21, 22
S. pombe                    –                      Sak1     ---DBD-B-C-D-        n/a                       n/a                       –                                               –                   23

^a Most likely candidate upstream regulators as indicated by experimental evidence. Note: Mammalian upstream target genes follow the human nomenclature.
^b Domains include (A) activation domain, DNA-binding domain (DBD), (B) domain B, (C) domain C, and (D) dimerization domain. Note: Homologies to fungal domains B, C, and D are significantly weaker.
^c SAGE: Enriched/identified expression patterns revealed from serial analysis of gene expression as depicted in Aftab et al. (13) and Blacque et al. (18).
^d Experimentally verified expression patterns.
^e Putative and verified RFX targets based on experimental analyses; known ciliary genes are in boldface. Note: Mammalian downstream target genes follow the human nomenclature.
^f Consensus sequences were extracted or derived (when more than five target genes were known) from the respective references. Dash (–), unknown; n/a, not applicable.
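The X-box consensus sequences in Table S1 use IUPAC degenerate nucleotide codes and a variable-length spacer between two six-nucleotide half-sites. The following sketch shows how such a consensus can be turned into a regular expression and tested against a candidate site. The half-sites and spacer ranges are the RFX3 and DAF-19 entries of Table S1 and the test sites are ungapped sequences from Table S2; the function names and the use of exact regular-expression matching are illustrative assumptions and do not reproduce the HMM search used in the study.

```python
import re

# Standard IUPAC degenerate nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def consensus_to_regex(left, spacer, right):
    """Build a regex from two half-sites and a (min, max) spacer length."""
    def half(site):
        return "".join("[%s]" % IUPAC[ch] for ch in site)
    lo, hi = spacer
    return re.compile(half(left) + "[ACGT]{%d,%d}" % (lo, hi) + half(right))


def matches(pattern, site):
    """True if the whole site fits the consensus."""
    return pattern.fullmatch(site) is not None


RFX3 = consensus_to_regex("GTYBYC", (1, 4), "GRMAAC")   # mammalian RFX3 entry
DAF19 = consensus_to_regex("GTHNYY", (1, 2), "RRNAAC")  # C. elegans DAF-19 entry

if __name__ == "__main__":
    print(matches(RFX3, "GTCGTCTGGGAAAC"))   # H. sapiens BBS4 X-box
    print(matches(RFX3, "GTTGCTAGGACAC"))    # H. sapiens DNAHC9 X-box
    print(matches(DAF19, "GTTTCCATGGTAAC"))  # C. elegans xbx-1 X-box
```

An exact consensus match is deliberately strict: a proven X-box such as the DNAHC9 site can deviate from the consensus at a single position, which is one reason a probabilistic profile is better suited to motif discovery than a fixed pattern.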
Table S2. List of ciliary gene promoters that harbor experimentally proven X-box promoter motifs used to construct a hidden Markov model (HMM) training set for the identification of novel candidate X-box promoter motifs

Organism          Gene      X-box sequence     Ref(s).
H. sapiens        DNAHC9    GTTGCT A–– GGACAC  1
H. sapiens        DYNC2LI1  GCTCCC AT– GGCAAC  1
H. sapiens        DNAHC11   CGTCCC CCG GGAAAC  1
H. sapiens        BBS4      GTCGTC TG– GGAAAC  1
H. sapiens        FOXJ1     GTCTCC AAG GAGACC  1
H. sapiens        TRAF3IP1  GTTGCT AA– GGCCGC  2, 3^a
D. melanogaster   Dosm-6    GTTGCC G–– GGCAAC  4
D. melanogaster   CG30441   GTTGTC AAT AGCAAC  4
D. melanogaster   CG3769    GTTGCT AGT AGCAAC  4
D. melanogaster   CG9227    GTTACT TT– GACAAC  4
D. melanogaster   CG1126    GTTGCC T–– AGCAAC  4
C. elegans        daf-10    ATCTCC AT– AGCAAC  5, 6
C. elegans        xbx-1     GTTTCC AT– GGTAAC  7
C. elegans        osm-6     GTTACC AT– AGTAAC  8
C. elegans        ifta-1    GTTGCC A–– GGCAAT  2
C. elegans        che-2     GTTGTC AT– GGTGAC  8
C. elegans        ift-81    GTTGCC CT– GGTAAC  2, 9

^a X-box was deduced through the C. elegans ortholog dyf-11.
1. El Zein L, et al. (2009) RFX3 governs growth and beating efficiency of motile cilia in mouse and controls the expression of genes involved in human ciliopathies. J Cell Sci 122:3180–3189.
2. Blacque OE, et al. (2005) Functional genomics of the cilium, a sensory organelle. Curr Biol 15:935–941.
3. Efimenko E, et al. (2005) Analysis of xbx genes in C. elegans. Development 132:1923–1934.
4. Laurençon A, et al. (2007) Identification of novel regulatory factor X (RFX) target genes by comparative genomics in Drosophila species. Genome Biol 8:R195.
5. Chen N, et al. (2006) Identification of ciliary and ciliopathy genes in Caenorhabditis elegans through comparative genomics. Genome Biol 7:R126.
6. Bell LR, Stone S, Yochem J, Shaw JE, Herman RK (2006) The molecular identities of the Caenorhabditis elegans intraflagellar transport genes dyf-6, daf-10 and osm-1. Genetics 173:1275–1286.
7. Schafer JC, Haycraft CJ, Thomas JH, Yoder BK, Swoboda P (2003) XBX-1 encodes a dynein light intermediate chain required for retrograde intraflagellar transport and cilia assembly in Caenorhabditis elegans. Mol Biol Cell 14:2057–2070.
8. Swoboda P, Adler HT, Thomas JH (2000) The RFX-type transcription factor DAF-19 regulates sensory neuron cilium formation in C. elegans. Mol Cell 5:411–421.
9. Kobayashi T, Gengyo-Ando K, Ishihara T, Katsura I, Mitani S (2007) IFT-81 and IFT-74 are required for intraflagellar transport in C. elegans. Genes Cells 12:593–602.
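The 17 aligned X-boxes of Table S2 can be summarized as per-position nucleotide counts, which is the kind of information a profile model is built from. The sketch below tabulates these counts over the 15 aligned positions. It is a simple counting illustration, not the HMMER training procedure; gaps are written as ASCII "-" instead of the en dashes used in the table, and the function name is hypothetical.

```python
from collections import Counter

# The 17 aligned X-boxes from Table S2 (half-site, 3-column spacer, half-site).
XBOXES = [
    "GTTGCTA--GGACAC", "GCTCCCAT-GGCAAC", "CGTCCCCCGGGAAAC",
    "GTCGTCTG-GGAAAC", "GTCTCCAAGGAGACC", "GTTGCTAA-GGCCGC",  # H. sapiens
    "GTTGCCG--GGCAAC", "GTTGTCAATAGCAAC", "GTTGCTAGTAGCAAC",
    "GTTACTTT-GACAAC", "GTTGCCT--AGCAAC",                     # D. melanogaster
    "ATCTCCAT-AGCAAC", "GTTTCCAT-GGTAAC", "GTTACCAT-AGTAAC",
    "GTTGCCA--GGCAAT", "GTTGTCAT-GGTGAC", "GTTGCCCT-GGTAAC",  # C. elegans
]


def column_counts(seqs):
    """Count symbols (nucleotides and gaps) at each aligned position."""
    return [Counter(col) for col in zip(*seqs)]


counts = column_counts(XBOXES)

if __name__ == "__main__":
    for position, counter in enumerate(counts, start=1):
        print(position, dict(counter))
```

Positions dominated by a single nucleotide, such as the first and last columns, carry most of the information in the motif, whereas the spacer columns are variable and frequently gapped.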
References for Table S1:
1. Emery P, Durand B, Mach B, Reith W (1996) RFX proteins, a novel family of DNA binding proteins conserved in the eukaryotic kingdom. Nucleic Acids Res 24:803–807.
2. Hsu YC, Liao WC, Kao CY, Chiu IM (2010) Regulation of FGF1 gene promoter through transcription factor RFX1. J Biol Chem 285:13885–13895.
3. Purvis TL, et al. (2010) Transcriptional regulation of the Alström syndrome gene ALMS1 by members of the RFX family and Sp1. Gene 460:20–29.
4. Horvath GC, Kistler MK, Kistler WS (2009) RFX2 is a candidate downstream amplifier of A-MYB regulation in mouse spermatogenesis. BMC Dev Biol 9:63.
5. Beckers A, Alten L, Viebahn C, Andre P, Gossler A (2007) The mouse homeobox gene Noto regulates node morphogenesis, notochordal ciliogenesis, and left right patterning. Proc Natl Acad Sci USA 104:15765–15770.
6. El Zein L, et al. (2009) RFX3 governs growth and beating efficiency of motile cilia in mouse and controls the expression of genes involved in human ciliopathies. J Cell Sci 122:3180–3189.
7. Zhang D, et al. (2006) Identification of potential target genes for RFX4_v3, a transcription factor critical for brain development. J Neurochem 98:860–875.
8. Ashique AM, et al. (2009) The Rfx4 transcription factor modulates Shh signaling by regional control of ciliogenesis. Sci Signal 2:ra70.
9. Reith W, LeibundGut-Landmann S, Waldburger JM (2005) Regulation of MHC class II gene expression by the class II transactivator. Nat Rev Immunol 5:793–806.
10. Seguín-Estévez Q, et al. (2009) The transcription factor RFX protects MHC class II genes against epigenetic silencing by DNA methylation. J Immunol 183:2545–2553.
11. Smith SB, et al. (2010) Rfx6 directs islet formation and insulin production in mice and humans. Nature 463:775–780.
12. Soyer J, et al. (2010) Rfx6 is an Ngn3-dependent winged helix transcription factor required for pancreatic islet cell development. Development 137:203–212.
13. Aftab S, Semenec L, Chu JS, Chen N (2008) Identification and characterization of novel human tissue-specific RFX transcription factors. BMC Evol Biol 8:226.
14. Dubruille R, et al. (2002) Drosophila regulatory factor X is necessary for ciliated sensory neuron differentiation. Development 129:5487–5498.
15. Laurençon A, et al. (2007) Identification of novel regulatory factor X (RFX) target genes by comparative genomics in Drosophila species. Genome Biol 8:R195.
16. Otsuki K, Hayashi Y, Kato M, Yoshida H, Yamaguchi M (2004) Characterization of dRFX2, a novel RFX family protein in Drosophila. Nucleic Acids Res 32:5636–5648.
17. Efimenko E, et al. (2005) Analysis of xbx genes in C. elegans. Development 132:1923–1934.
18. Blacque OE, et al. (2005) Functional genomics of the cilium, a sensory organelle. Curr Biol 15:935–941.
19. Williams CL, Winkelbauer ME, Schafer JC, Michaud EJ, Yoder BK (2008) Functional redundancy of the B9 proteins and nephrocystins in Caenorhabditis elegans ciliogenesis. Mol Biol Cell 19:2154–2168.
20. Efimenko E, et al. (2006) Caenorhabditis elegans DYF-2, an orthologue of human WDR19, is a component of the intraflagellar transport machinery in sensory cilia. Mol Biol Cell 17:4801–4811.
21. Huang M, Zhou Z, Elledge SJ (1998) The DNA replication and damage checkpoint pathways induce transcription by inhibition of the Crt1 repressor. Cell 94:595–605.
22. Zaim J, Speina E, Kierzek AM (2005) Identification of new genes regulated by the Crt1 transcription factor, an effector of the DNA damage checkpoint pathway in Saccharomyces cerevisiae. J Biol Chem 280:28–37.
23. Wu SY, McLeod M (1995) The sak1+ gene of Schizosaccharomyces pombe encodes an RFX family DNA-binding protein that positively regulates cyclic AMP-dependent protein kinase-mediated exit from the mitotic cell cycle. Mol Cell Biol 15:1479–1488.
Table S3. Average number of ciliary genes that contain X-box DNA footprints or proven X-box promoter motifs in endogenous and sequence-scrambled 1-kb promoter regions from various unikont and nonunikont organisms

                                                Endogenous^a  Scrambled^b
             Phylum            Organism         Avg.   STDV   Avg.          STDV          n   P value^c
Unikonts     Chordata          H. sapiens       0.67   0.49   0.08 / 0.00   0.29 / 0.00   12  ***0.002 / <0.001     With RFX
                               M. musculus      0.67   0.49   0.00 / 0.00   0.00 / 0.00   12  ***<0.001 / <0.001
                               D. rerio         0.27   0.47   0.09 / 0.18   0.30 / 0.40   11  0.297 / 0.634
                               C. intestinalis  0.00   0.00   0.08 / 0.08   0.29 / 0.29   12  0.350 / 0.350
             Echinodermata     S. purpuratus    0.11   0.33   0.00 / 0.11   0.00 / 0.33   9   0.332 / 1.000
             Arthropoda        D. melanogaster  0.89   0.33   0.11 / 0.00   0.33 / 0.00   9   ***<0.001 / <0.001
                               D. pulex         0.64   0.50   0.09 / 0.09   0.30 / 0.30   11  ***0.005 / 0.005
             Nematoda          C. elegans       0.80   0.42   0.10 / 0.00   0.32 / 0.00   10  ***0.005 / <0.001
             Annelida          H. robusta       0.33   0.49   0.00 / 0.00   0.00 / 0.00   12  **0.029 / 0.029
                               C. sp. I         0.18   0.40   0.00 / 0.00   0.00 / 0.00   11  0.151 / 0.151
             Cnidaria          N. vectensis     0.17   0.39   0.08 / 0.00   0.29 / 0.00   12  0.528 / 0.145
             Placozoa          T. adhaerens     0.45   0.52   0.00 / 0.00   0.00 / 0.00   11  ***0.010 / 0.010
             Choanozoa         M. brevicollis   0.17   0.39   0.17 / 0.33   0.39 / 0.49   12  1.000 / 0.386
             Chytridiomycota   B. dendrobatidis 0.00   0.00   0.00 / 0.00   0.00 / 0.00   9   –                     Without RFX
Nonunikonts  Chlorophyta       V. carteri       0.17   0.39   0.17 / 0.08   0.39 / 0.29   12  1.000 / 0.528
                               C. reinhardtii   0.08   0.29   0.00 / 0.08   0.00 / 0.29   12  0.350 / 1.000
                               M. sp. RC299     0.00   0.00   0.00 / 0.00   0.00 / 0.00   12  –
             Ciliophora        T. thermophila   0.00   0.00   0.00 / 0.00   0.00 / 0.00   11  –
                               P. sojae         0.08   0.29   0.08 / 0.08   0.29 / 0.29   12  1.000 / 1.000
             Heterokontophyta  P. tetraurelia   0.08   0.29   0.00 / 0.17   0.00 / 0.39   12  0.350 / 0.538
             Euglenozoa        T. brucei        0.09   0.30   0.18 / 0.00   0.40 / 0.00   11  0.553 / 0.332
             Metamonada        T. vaginalis     0.00   0.00   0.08 / 0.08   0.29 / 0.29   12  0.350 / 0.350
             Percolozoa        N. gruberi       0.00   0.00   0.09 / 0.09   0.30 / 0.30   11  0.332 / 0.332

***99% confidence interval, **95% confidence interval. Avg., average; STDV, SD; n, no. of ciliary genes sampled. Paired values for scrambled promoters and P values are given as mononucleotide / dinucleotide scrambling.
^a Promoter regions sampled include B9D2, BBS1, BBS5, WDR10, WDR35, IFT52, IFT81, IFT172, SPAG6, CCDC147, KAP1, and IFT74.
^b All sampled promoter regions were individually scrambled and resampled identically, maintaining mono-/dinucleotide frequencies.
^c Results from a comparison of means t test of endogenous and scrambled mono-/dinucleotide promoter regions.
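The entries of Table S3 can be checked for internal consistency with a short sketch. It assumes that each sampled promoter is scored 1 if it contains an X-box footprint and 0 otherwise, so that, for example, 8 positive promoters out of 12 reproduce the average of 0.67 and the sample SD of 0.49 listed for H. sapiens. It also computes the two-sample t statistic directly from the tabulated means and SDs; with equal group sizes the pooled-variance and unequal-variance forms give the same statistic, and the text does not say which was used. The helper name is hypothetical, and no P value is computed here because that requires the t distribution.

```python
import math
import statistics


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic from summary statistics (hypothetical helper).

    Returns None when both groups have zero variance, in which case the
    statistic is undefined.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    if se == 0:
        return None
    return (mean1 - mean2) / se


# Assumption: 8 of 12 H. sapiens promoters scored positive (1) and 4 negative (0).
calls = [1] * 8 + [0] * 4
avg = statistics.mean(calls)
stdv = statistics.stdev(calls)  # sample SD (n - 1 in the denominator)

# H. sapiens, endogenous vs. mononucleotide-scrambled promoters (Table S3 values).
t_stat = t_from_summary(0.67, 0.49, 12, 0.08, 0.29, 12)

# A row in which every value is zero, e.g. T. thermophila.
undefined = t_from_summary(0.0, 0.0, 11, 0.0, 0.0, 11)

if __name__ == "__main__":
    print(round(avg, 2), round(stdv, 2))
    print(round(t_stat, 2))
    print(undefined)
```

The undefined case corresponds to rows in which neither the endogenous nor the scrambled promoters yielded any footprint; the table shows a dash in place of a P value for those rows.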
Dataset S1. BLAST analysis survey of the presence or absence of the RFX DNA binding domain in the genomes of >120 eukaryotic organisms.
Dataset S2. List of X-box DNA footprints or proven X-box motifs identified in various unikont gene promoters using a hidden Markov model (HMM) prediction method.