supporting information - pnas · 7/1/2010 · fig. s3. alignment of x-box dna footprints or proven...

Supporting Information
Piasecki et al. 10.1073/pnas.0914241107

SI Materials and Methods

Protein Sequence Identifications. Using T-BLASTN (http://blast.ncbi.nlm.nih.gov/Blast.cgi) analyses, we searched for similarities to the 76-residue DNA-binding domain of human RFX3 (GI:57209027) in the publicly available genome sequences of 121 eukaryotic organisms (Dataset S1). Genome sequence data were available at the Joint Genome Institute, Department of Energy (Walnut Creek, CA), the J. Craig Venter Institute (Rockville, MD), the Wellcome Trust Sanger Institute (Cambridge, England, UK), the Broad Institute (Cambridge, MA), the Human Genome Sequencing Center, Baylor College of Medicine (Houston, TX), and many other publicly and privately funded universities and consortia. Database comparisons were performed between March and December 2009. Using default search parameters, the presence or absence of RFX TFs in the respective genomes became readily apparent from their E-value scores. All sampled organisms that yielded E-values lower than 1E−6 were scored as "containing RFX," whereas those with E-values greater than 1E−2 were scored as "RFX is absent." No E-values between 1E−6 and 1E−2 were found for any sampled organism. Subsequent T-BLASTN and PSI-BLAST analyses of all RFX TF domains yielded no significant homology to any prokaryotic or nonunikont organism. Reverse-BLAST comparisons using RFX amino acid sequences (full-length and DNA-binding domains only) from the fungus S. cerevisiae and the amoebozoan A. castellanii revealed no significant homologies in the genomes of any non-RFX-containing organism.

For full-length comparisons, BLASTP was used to identify homologs of the complete human RFX3 sequence in the NCBI database and in select organisms in the JGI database. A single best-hit sequence was extracted from each sampled organism. No NCBI database submission was found for the single RFX TF homolog identified by T-BLASTN analysis in the A. castellanii genome (Baylor College of Medicine); therefore, its amino acid sequence was assembled manually. The putative RFX TF homolog in A. castellanii was encoded on two overlapping contigs, 793 and 941, in the current assembly of the genome (2008-01-30). These contigs were merged, and Fgenesh-M v2.6 (http://linux1.softberry.com/all.htm) was used to predict the coding regions using a model designed for the fungal genus Pyrenophora. The predicted A. castellanii RFX gene includes a nine-exon coding region, which was used to assemble the putative amino acid sequence.
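The presence/absence rule above is a simple threshold on each genome's best T-BLASTN E-value. The Python sketch below encodes it. The function names, the "ambiguous" label for the unused 1E−6 to 1E−2 window, and the convention of passing float("inf") for a genome with no hit at all are illustrative assumptions; the E-values in the example call are invented.

```python
# Cutoffs from the text: best E-value < 1E-6 -> RFX present; > 1E-2 -> absent.
PRESENT_BELOW = 1e-6
ABSENT_ABOVE = 1e-2


def classify_rfx(best_evalue):
    """Score one genome from its best T-BLASTN E-value against the RFX DBD.

    Pass float("inf") for a genome that returned no hit at all (assumption).
    """
    if best_evalue < PRESENT_BELOW:
        return "containing RFX"
    if best_evalue > ABSENT_ABOVE:
        return "RFX is absent"
    # According to the text, no sampled organism fell into this window.
    return "ambiguous"


def classify_genomes(best_hits):
    """Map {organism: best E-value} to {organism: call}."""
    return {organism: classify_rfx(e) for organism, e in best_hits.items()}


# Invented E-values, for illustration only.
print(classify_genomes({"genome_A": 3e-25, "genome_B": 4.7, "genome_C": float("inf")}))
```

Keeping the middle window as an explicit third outcome makes it visible if a new genome ever lands between the two cutoffs, instead of silently forcing it into one class.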

Phylogenetic Comparisons. All extracted protein sequences were aligned using MUSCLE v3.5 (multiple sequence comparison by log-expectation) or, for phylogenetic tree construction, MAFFT v6.234b, a multiple-sequence alignment program for amino acid or nucleotide sequences, with the L-INS-i strategy (1, 2). To identify conserved domains, aligned sequences were viewed in BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). For generating phylogenetic trees, informative characters were first selected using Gblocks v0.91b (3). A maximum-likelihood phylogenetic tree was then generated using RAxML v7.0.3 with the PROTMIXWAG model and 100 bootstrap replicates (4) and viewed using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
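The 100 bootstrap replicates follow the standard nonparametric bootstrap for alignments: each replicate resamples alignment columns with replacement, a tree is inferred from every replicate (RAxML did this here), and the value at a node is the percentage of replicate trees containing that split. The Python sketch below shows only the resampling and the support calculation; tree inference is not reproduced, splits are represented (hypothetically) as frozensets of taxon names, and the toy alignment is invented.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One nonparametric bootstrap replicate: draw L column indices with
    replacement (L = alignment length) and rebuild every sequence from them."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in cols) for name in names}


def bootstrap_replicates(alignment, n=100, seed=1):
    """n replicate alignments, reproducible through a fixed seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


def support(split, replicate_splits):
    """Percentage of replicate trees (each given as a set of splits) that
    contain `split`; this is the number drawn at a node."""
    hits = sum(1 for splits in replicate_splits if split in splits)
    return 100.0 * hits / len(replicate_splits)


# Toy, Gblocks-style filtered alignment (invented sequences).
toy = {"taxon1": "MKVLAT", "taxon2": "MKVLST", "taxon3": "MRILAT"}
reps = bootstrap_replicates(toy, n=100)
```

Because whole columns are resampled, every replicate keeps the same taxa and alignment length while varying which characters carry the phylogenetic signal.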

X-Box DNA Footprint Identifications. When possible, the translational start site for each gene was deduced from the respective NCBI database submission annotation. For genes without annotation, we sampled only the promoter regions of genes with strong 5′-end conservation, which were deduced from T-BLASTN analyses of the amino acid sequences encoded by each of the respective ciliary genes. Only predicted proteins with strong conservation within the first 30 amino acid residues from the beginning of the query protein were used. X-box promoter motif searches were conducted using a hidden Markov model (HMM) approach with HMMER 2.3.2 (http://hmmer.janelia.org/). Mononucleotide sequence scrambling was conducted using the Sequence Manipulation Suite (SMS) software (http://www.bioinformatics.org/sms/). Dinucleotide sequence scrambling was conducted using the uShuffle software (5). As an additional negative control, we constructed two independent replicate HMM training sets in which the 15 aligned nucleotide positions of the 17 X-boxes used to construct the original HMM (Table S2) were randomized, while the nucleotide composition at each position was maintained. The two resulting randomized profile HMMs were then used to query all 349 endogenous promoter regions sampled in this study, revealing false-positive rates of 3.7% and 4.5%, respectively (see also Fig. 3B for comparison). For the data presented in Fig. 3, the standard error of the percentage was calculated for the percentage of sampled organisms that contain X-box DNA footprints in the promoter of each of the respective genes (Fig. 3B) and for the percentage of ciliary gene promoters that contain X-box DNA footprints in the complete sampling of genes for each of the respective organisms (Fig. 3C). Consensus motif illustrations were constructed using WebLogo v2.8.2 (http://weblogo.berkeley.edu/).
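The scrambling controls and the column-randomized training sets described above can be illustrated with the Python standard library. This is a sketch, not the SMS or uShuffle code actually used: the hypothetical mono_shuffle permutes bases (composition preserved); di_shuffle is a doublet-preserving shuffle built on a random Eulerian path, in the same spirit as uShuffle run with k = 2; and randomize_columns reflects our reading of "randomized, while the nucleotide composition at each position was maintained" as an independent permutation of each alignment column across the X-boxes.

```python
import random
from collections import Counter


def mono_shuffle(seq, rng):
    """Random permutation of the bases: keeps base composition only."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def di_shuffle(seq, rng):
    """Dinucleotide-preserving shuffle via a random Eulerian path.
    Keeps the first base, the last base and every dinucleotide count."""
    if len(seq) < 3:
        return seq
    exits = {}  # base -> list of bases that follow it in seq
    for a, b in zip(seq, seq[1:]):
        exits.setdefault(a, []).append(b)
    end = seq[-1]
    # Wilson's algorithm: choose a random "last exit" for every base so that
    # the last exits form a tree pointing to the final base.
    in_tree, last_exit = {end}, {}
    for start in exits:
        u = start
        while u not in in_tree:
            last_exit[u] = rng.choice(exits[u])
            u = last_exit[u]
        u = start
        while u not in in_tree:
            in_tree.add(u)
            u = last_exit[u]
    # Shuffle the remaining exits; each base's tree exit is used last.
    for base, outs in exits.items():
        if base != end:
            outs.remove(last_exit[base])
            rng.shuffle(outs)
            outs.append(last_exit[base])
        else:
            rng.shuffle(outs)
    # Walk the resulting Eulerian path from the original first base.
    used = dict.fromkeys(exits, 0)
    cur, out = seq[0], [seq[0]]
    for _ in range(len(seq) - 1):
        step = exits[cur][used[cur]]
        used[cur] += 1
        out.append(step)
        cur = step
    return "".join(out)


def randomize_columns(aligned, rng):
    """Shuffle each alignment column independently across sequences:
    per-position composition is kept, linkage between positions is lost."""
    columns = [list(col) for col in zip(*aligned)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*columns)]


rng = random.Random(2010)
promoter = "ATGCGTTGCCATGGCAACGGATCCATTTGCA"  # invented test sequence
print(mono_shuffle(promoter, rng))
print(di_shuffle(promoter, rng))
# Three of the Table S2 X-boxes, alignment gaps written as "-".
print(randomize_columns(["GTTGCTA--GGACAC", "GCTCCCAT-GGCAAC", "GTTGCCG--GGCAAC"], rng))
```

Reserving each base's tree edge for its final exit is what guarantees that the walk uses every dinucleotide exactly once, so the shuffled sequence has the same doublet counts as the original.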

Nomenclature. Gene and protein names follow the human nomenclature convention except when referring to a specific organism.

1. Edgar RC (2004) MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797.

2. Katoh K, Kuma K, Miyata T, Toh H (2005) Improvement in the accuracy of multiple sequence alignment program MAFFT. Genome Inform 16:22–33.

3. Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17:540–552.

4. Stamatakis A (2006) RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

5. Jiang M, Anderson J, Gillespie J, Mayne M (2008) uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts. BMC Bioinformatics 9:192.

Piasecki et al. www.pnas.org/cgi/content/short/0914241107


Fig. S1. Maximum-likelihood phylogenetic tree of the full-length amino acid sequences of RFX TFs from select unikont organisms. Informative characters from a MAFFT L-INS-i sequence alignment were selected using Gblocks. The phylogenetic tree was generated using RAxML with the PROTMIXWAG model and visualized using FigTree. The values generated from 100 bootstrap replicates are depicted at each node. Names of taxa are listed together with their respective NCBI/JGI protein identification numbers, when available. The basal amoebozoan A. castellanii was used to root the tree.



Fig. S2. Multiple-sequence alignment of ciliary protein B9D2 amino acid sequences from various unikont and nonunikont organisms. A BLOSUM62 matrix was used for identity and similarity shading with a 70% threshold. No B9D2 gene homologs were identified in the genome of any nonciliated organism, such as yeast or the plant Arabidopsis.



Fig. S3. Alignment of X-box DNA footprints or proven X-box motifs identified in ciliary gene promoters of select animal species. For every sequence motif, the distance upstream of the putative translational start site (for annotated genes) or upstream of the most highly conserved 5′ region (identified by BLAST analyses) is indicated. The experimentally verified function of each ciliary gene is given in quotation marks above each column. A position-weight consensus sequence for each X-box promoter motif was generated for each set of orthologous genes.
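One common way to turn an alignment of orthologous X-boxes into a position-weight consensus is to compute per-column base frequencies and then write, for each column, the IUPAC code of the smallest set of most frequent bases that together reach a coverage threshold. The legend does not state the rule that was used; the 75% threshold, the gap handling, and the alphabetical tie-breaking in this Python sketch are assumptions for illustration, and the function names are hypothetical.

```python
from collections import Counter

# IUPAC nucleotide code for each set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G",
    frozenset("T"): "T", frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W", frozenset("GT"): "K",
    frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def frequency_matrix(aligned):
    """Per-column base frequencies; alignment gaps ('-') are ignored."""
    matrix = []
    for col in zip(*aligned):
        counts = Counter(b for b in col if b != "-")
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in "ACGT"} if total else {})
    return matrix


def iupac_consensus(aligned, threshold=0.75):
    """For each column, the IUPAC code of the smallest set of most frequent
    bases covering >= threshold of the non-gap bases (ties: A<C<G<T)."""
    letters = []
    for col in zip(*aligned):
        counts = Counter(b for b in col if b != "-")
        total = sum(counts.values())
        if total == 0:
            letters.append("-")  # all-gap column
            continue
        chosen, covered = set(), 0
        for base, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            chosen.add(base)
            covered += n
            if covered >= threshold * total:
                break
        letters.append(IUPAC[frozenset(chosen)])
    return "".join(letters)


# Example input: three H. sapiens X-boxes from Table S2 (gaps as "-").
print(iupac_consensus(["GTTGCTA--GGACAC", "GCTCCCAT-GGCAAC", "CGTCCCCCGGGAAAC"]))
```

Raising the threshold yields more degenerate letters (Y, R, N); lowering it collapses columns toward their single most frequent base.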



Table S1. Summary of RFX TFs from all organisms with functionally characterized homologs

Group | Organism | Upstream regulator (a) | RFX TF | Domain structure (b) | SAGE expression (c) | Experimental expression (d) | Downstream target genes (e) | X-box consensus (f) | Ref(s).
Mammals | H. sapiens and M. musculus | – | RFX1 | A-DBD-B-C-D | Brain/broad | Brain | FGF1, ALMS1 | GTNRCCN0–3RGYAAC | 1, 2, 3
Mammals | H. sapiens and M. musculus | A-MYB | RFX2 | A-DBD-B-C-D | Brain and testis/broad | Testis | IL5RA, SPAG6, PDCL2, ALF | – | 4
Mammals | H. sapiens and M. musculus | NOTO | RFX3 | A-DBD-B-C-D | Brain/broad | Brain | BBS4, DYNC2LI1, DNAHC11, DNAHC5, DNAHC9, FOXJ1 | GTYBYCN1–4GRMAAC | 5, 6
Mammals | H. sapiens and M. musculus | – | RFX4 | DBD-B-C-D | Brain/testis | Brain | CX3CL1, IFT172 | – | 7, 8
Mammals | H. sapiens and M. musculus | – | RFX5 | DBD | Brain/broad | Broad | HLA-DOA, HLA-DOB, HLA-DP, HLA-DR, HLA-DQ | – | 9, 10
Mammals | H. sapiens and M. musculus | NGN3 | RFX6 | DBD-B-C-D | Pancreas/heart and liver | Pancreas | – | – | 11, 12
Mammals | H. sapiens and M. musculus | – | RFX7 | DBD | Brain/broad | – | – | – | 13
Flies | Drosophila melanogaster | – | dRFX | DBD-B-C-D | – | Nervous system and brain | >15 ciliary gene targets | GYTRYYN1–3RRHRAC | 14, 15
Flies | Drosophila melanogaster | – | dRFX2 | DBD | – | Eye | – | – | 16
Nematodes | C. elegans | – | DAF-19 | DBD-B-C-D | Ciliated sensory neurons | Ciliated sensory neurons | >30 ciliary gene targets | GTHNYYN1–2RRNAAC | 17–20
Fungi | S. cerevisiae | Crt1 | Crt1 | DBD-B | n/a | n/a | >10 nonciliary gene targets | TYKYYN1–2GRGAAC | 21, 22
Fungi | S. pombe | – | Sak1 | DBD-B-C-D | n/a | n/a | – | – | 23

(a) Most likely candidate upstream regulators as indicated by experimental evidence. Note: Mammalian upstream target genes follow the human nomenclature.
(b) Domains include (A) activation domain, DNA-binding domain (DBD), (B) domain B, (C) domain C, and (D) dimerization domain. Note: Homologies to fungal domains B, C, and D are significantly weaker.
(c) SAGE: Enriched/identified expression patterns revealed from serial analysis of gene expression as depicted in Aftab et al. (13) and Blacque et al. (18).
(d) Experimentally verified expression patterns.
(e) Putative and verified RFX targets based on experimental analyses; known ciliary genes are in boldface. Note: Mammalian downstream target genes follow the human nomenclature.
(f) Consensus sequences were extracted or derived (when more than five target genes were known) from the respective references. Dash (–), unknown; n/a, not applicable.
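The degenerate X-box consensus strings in Table S1 combine IUPAC codes with a variable spacer written as Nx–y. The Python sketch below (hypothetical helper names) compiles such a string into a regular expression so candidate sites can be located in a promoter; it reads Nx–y as "x to y arbitrary bases" and searches only the strand that is passed in.

```python
import re

# Regular-expression class for each IUPAC nucleotide code.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]",
    "M": "[AC]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}
# Either a variable spacer such as "N0–3" (en dash or hyphen) or one IUPAC code.
TOKEN = re.compile(r"N(\d+)[–-](\d+)|([ACGTRYSWKMBDHVN])")


def consensus_to_regex(consensus):
    """Compile a Table S1-style X-box consensus into a regex."""
    parts, pos = [], 0
    for m in TOKEN.finditer(consensus):
        if m.start() != pos:
            break  # an unrecognized character sits before this token
        if m.group(3):
            parts.append(IUPAC_CLASS[m.group(3)])
        else:
            parts.append("[ACGT]{%s,%s}" % (m.group(1), m.group(2)))
        pos = m.end()
    if pos != len(consensus):
        raise ValueError(f"cannot parse {consensus!r} at position {pos}")
    return re.compile("".join(parts))


def find_xboxes(consensus, promoter):
    """(start, site) of non-overlapping hits on the given strand only."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group()) for m in pattern.finditer(promoter.upper())]


print(find_xboxes("GTNRCCN0–3RGYAAC", "aaaagttgccatggcaacaaaa"))
```

Because many X-boxes are near-palindromic, a real scan would normally also search the reverse complement; that step is left out to keep the sketch short.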



Table S2. List of ciliary gene promoters that harbor experimentally proven X-box promoter motifs used to construct a hidden Markov model (HMM) training set for the identification of novel candidate X-box promoter motifs

Organism | Gene | X-box sequence | Ref(s).
H. sapiens | DNAHC9 | GTTGCT A–– GGACAC | 1
H. sapiens | DYNC2LI1 | GCTCCC AT– GGCAAC | 1
H. sapiens | DNAHC11 | CGTCCC CCG GGAAAC | 1
H. sapiens | BBS4 | GTCGTC TG– GGAAAC | 1
H. sapiens | FOXJ1 | GTCTCC AAG GAGACC | 1
H. sapiens | TRAF3IP1 | GTTGCT AA– GGCCGC | 2, 3 (a)
D. melanogaster | Dosm-6 | GTTGCC G–– GGCAAC | 4
D. melanogaster | CG30441 | GTTGTC AAT AGCAAC | 4
D. melanogaster | CG3769 | GTTGCT AGT AGCAAC | 4
D. melanogaster | CG9227 | GTTACT TT– GACAAC | 4
D. melanogaster | CG1126 | GTTGCC T–– AGCAAC | 4
C. elegans | daf-10 | ATCTCC AT– AGCAAC | 5, 6
C. elegans | xbx-1 | GTTTCC AT– GGTAAC | 7
C. elegans | osm-6 | GTTACC AT– AGTAAC | 8
C. elegans | ifta-1 | GTTGCC A–– GGCAAT | 2
C. elegans | che-2 | GTTGTC AT– GGTGAC | 8
C. elegans | ift-81 | GTTGCC CT– GGTAAC | 2, 9

(a) X-box was deduced through the C. elegans ortholog dyf-11.

References cited in Table S2

1. El Zein L, et al. (2009) RFX3 governs growth and beating efficiency of motile cilia in mouse and controls the expression of genes involved in human ciliopathies. J Cell Sci 122:3180–3189.
2. Blacque OE, et al. (2005) Functional genomics of the cilium, a sensory organelle. Curr Biol 15:935–941.
3. Efimenko E, et al. (2005) Analysis of xbx genes in C. elegans. Development 132:1923–1934.
4. Laurençon A, et al. (2007) Identification of novel regulatory factor X (RFX) target genes by comparative genomics in Drosophila species. Genome Biol 8:R195.
5. Chen N, et al. (2006) Identification of ciliary and ciliopathy genes in Caenorhabditis elegans through comparative genomics. Genome Biol 7:R126.
6. Bell LR, Stone S, Yochem J, Shaw JE, Herman RK (2006) The molecular identities of the Caenorhabditis elegans intraflagellar transport genes dyf-6, daf-10 and osm-1. Genetics 173:1275–1286.
7. Schafer JC, Haycraft CJ, Thomas JH, Yoder BK, Swoboda P (2003) XBX-1 encodes a dynein light intermediate chain required for retrograde intraflagellar transport and cilia assembly in Caenorhabditis elegans. Mol Biol Cell 14:2057–2070.
8. Swoboda P, Adler HT, Thomas JH (2000) The RFX-type transcription factor DAF-19 regulates sensory neuron cilium formation in C. elegans. Mol Cell 5:411–421.
9. Kobayashi T, Gengyo-Ando K, Ishihara T, Katsura I, Mitani S (2007) IFT-81 and IFT-74 are required for intraflagellar transport in C. elegans. Genes Cells 12:593–602.
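To make the role of the Table S2 training set concrete, the Python sketch below builds position-specific log-odds matrices from the left and right half-sites of the 17 listed X-boxes and scans a promoter for the best pair of half-sites separated by a 1–3 nt spacer (the spacer lengths that occur in the table). This is a deliberately simpler technique than the profile HMM built with HMMER in the study: it scores ungapped half-sites independently and does not model insertions or deletions. The pseudocount, the uniform background, the helper names, and the example promoter are assumptions.

```python
import math

# Left and right half-sites of the 17 Table S2 X-boxes (spacers removed).
XBOXES = [
    ("GTTGCT", "GGACAC"), ("GCTCCC", "GGCAAC"), ("CGTCCC", "GGAAAC"),
    ("GTCGTC", "GGAAAC"), ("GTCTCC", "GAGACC"), ("GTTGCT", "GGCCGC"),
    ("GTTGCC", "GGCAAC"), ("GTTGTC", "AGCAAC"), ("GTTGCT", "AGCAAC"),
    ("GTTACT", "GACAAC"), ("GTTGCC", "AGCAAC"), ("ATCTCC", "AGCAAC"),
    ("GTTTCC", "GGTAAC"), ("GTTACC", "AGTAAC"), ("GTTGCC", "GGCAAT"),
    ("GTTGTC", "GGTGAC"), ("GTTGCC", "GGTAAC"),
]


def log_odds(sites, pseudocount=0.5, background=0.25):
    """Per-position log2 odds of each base against a uniform background."""
    n = len(sites)
    return [
        {b: math.log2((col.count(b) + pseudocount) / (n + 4 * pseudocount) / background)
         for b in "ACGT"}
        for col in zip(*sites)
    ]


LEFT = log_odds([left for left, _ in XBOXES])
RIGHT = log_odds([right for _, right in XBOXES])


def score(matrix, word):
    """Sum of per-position scores; expects A/C/G/T only."""
    return sum(row[base] for row, base in zip(matrix, word))


def best_hit(promoter, spacers=range(1, 4)):
    """Best (score, start, spacer) on the given strand, or None if too short."""
    promoter = promoter.upper()
    best = None
    for i in range(len(promoter) - 5):
        left = score(LEFT, promoter[i:i + 6])
        for s in spacers:
            j = i + 6 + s
            if j + 6 > len(promoter):
                break
            total = left + score(RIGHT, promoter[j:j + 6])
            if best is None or total > best[0]:
                best = (total, i, s)
    return best


print(best_hit("A" * 10 + "GTTGCCATGGCAAC" + "A" * 9))
```

Scoring the two half-sites separately and maximizing over the spacer is a crude stand-in for the gap states of a profile HMM, which learns where the variable spacer tends to fall.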

References cited in Table S1

1. Emery P, Durand B, Mach B, Reith W (1996) RFX proteins, a novel family of DNA binding proteins conserved in the eukaryotic kingdom. Nucleic Acids Res 24:803–807.
2. Hsu YC, Liao WC, Kao CY, Chiu IM (2010) Regulation of FGF1 gene promoter through transcription factor RFX1. J Biol Chem 285:13885–13895.
3. Purvis TL, et al. (2010) Transcriptional regulation of the Alström syndrome gene ALMS1 by members of the RFX family and Sp1. Gene 460:20–29.
4. Horvath GC, Kistler MK, Kistler WS (2009) RFX2 is a candidate downstream amplifier of A-MYB regulation in mouse spermatogenesis. BMC Dev Biol 9:63.
5. Beckers A, Alten L, Viebahn C, Andre P, Gossler A (2007) The mouse homeobox gene Noto regulates node morphogenesis, notochordal ciliogenesis, and left right patterning. Proc Natl Acad Sci USA 104:15765–15770.
6. El Zein L, et al. (2009) RFX3 governs growth and beating efficiency of motile cilia in mouse and controls the expression of genes involved in human ciliopathies. J Cell Sci 122:3180–3189.
7. Zhang D, et al. (2006) Identification of potential target genes for RFX4_v3, a transcription factor critical for brain development. J Neurochem 98:860–875.
8. Ashique AM, et al. (2009) The Rfx4 transcription factor modulates Shh signaling by regional control of ciliogenesis. Sci Signal 2:ra70.
9. Reith W, LeibundGut-Landmann S, Waldburger JM (2005) Regulation of MHC class II gene expression by the class II transactivator. Nat Rev Immunol 5:793–806.
10. Seguín-Estévez Q, et al. (2009) The transcription factor RFX protects MHC class II genes against epigenetic silencing by DNA methylation. J Immunol 183:2545–2553.
11. Smith SB, et al. (2010) Rfx6 directs islet formation and insulin production in mice and humans. Nature 463:775–780.
12. Soyer J, et al. (2010) Rfx6 is an Ngn3-dependent winged helix transcription factor required for pancreatic islet cell development. Development 137:203–212.
13. Aftab S, Semenec L, Chu JS, Chen N (2008) Identification and characterization of novel human tissue-specific RFX transcription factors. BMC Evol Biol 8:226.
14. Dubruille R, et al. (2002) Drosophila regulatory factor X is necessary for ciliated sensory neuron differentiation. Development 129:5487–5498.
15. Laurençon A, et al. (2007) Identification of novel regulatory factor X (RFX) target genes by comparative genomics in Drosophila species. Genome Biol 8:R195.
16. Otsuki K, Hayashi Y, Kato M, Yoshida H, Yamaguchi M (2004) Characterization of dRFX2, a novel RFX family protein in Drosophila. Nucleic Acids Res 32:5636–5648.
17. Efimenko E, et al. (2005) Analysis of xbx genes in C. elegans. Development 132:1923–1934.
18. Blacque OE, et al. (2005) Functional genomics of the cilium, a sensory organelle. Curr Biol 15:935–941.
19. Williams CL, Winkelbauer ME, Schafer JC, Michaud EJ, Yoder BK (2008) Functional redundancy of the B9 proteins and nephrocystins in Caenorhabditis elegans ciliogenesis. Mol Biol Cell 19:2154–2168.
20. Efimenko E, et al. (2006) Caenorhabditis elegans DYF-2, an orthologue of human WDR19, is a component of the intraflagellar transport machinery in sensory cilia. Mol Biol Cell 17:4801–4811.
21. Huang M, Zhou Z, Elledge SJ (1998) The DNA replication and damage checkpoint pathways induce transcription by inhibition of the Crt1 repressor. Cell 94:595–605.
22. Zaim J, Speina E, Kierzek AM (2005) Identification of new genes regulated by the Crt1 transcription factor, an effector of the DNA damage checkpoint pathway in Saccharomyces cerevisiae. J Biol Chem 280:28–37.
23. Wu SY, McLeod M (1995) The sak1+ gene of Schizosaccharomyces pombe encodes an RFX family DNA-binding protein that positively regulates cyclic AMP-dependent protein kinase-mediated exit from the mitotic cell cycle. Mol Cell Biol 15:1479–1488.



Table S3. Average number of ciliary genes that contain X-box DNA footprints or proven X-box promoter motifs in endogenous and sequence-scrambled 1-kb promoter regions from various unikont and nonunikont organisms

Group | Phylum | Organism | Endogenous (a) Avg. | Endogenous (a) STDV | Scrambled (b) Avg. (mono / di) | Scrambled (b) STDV (mono / di) | n | P value (c) (mono / di)
Unikonts, with RFX | Chordata | H. sapiens | 0.67 | 0.49 | 0.08 / 0.00 | 0.29 / 0.00 | 12 | ***0.002 / <0.001
Unikonts, with RFX | Chordata | M. musculus | 0.67 | 0.49 | 0.00 / 0.00 | 0.00 / 0.00 | 12 | ***<0.001 / <0.001
Unikonts, with RFX | Chordata | D. rerio | 0.27 | 0.47 | 0.09 / 0.18 | 0.30 / 0.40 | 11 | 0.297 / 0.634
Unikonts, with RFX | Chordata | C. intestinalis | 0.00 | 0.00 | 0.08 / 0.08 | 0.29 / 0.29 | 12 | 0.350 / 0.350
Unikonts, with RFX | Echinodermata | S. purpuratus | 0.11 | 0.33 | 0.00 / 0.11 | 0.00 / 0.33 | 9 | 0.332 / 1.000
Unikonts, with RFX | Arthropoda | D. melanogaster | 0.89 | 0.33 | 0.11 / 0.00 | 0.33 / 0.00 | 9 | ***<0.001 / <0.001
Unikonts, with RFX | Arthropoda | D. pulex | 0.64 | 0.50 | 0.09 / 0.09 | 0.30 / 0.30 | 11 | ***0.005 / 0.005
Unikonts, with RFX | Nematoda | C. elegans | 0.80 | 0.42 | 0.10 / 0.00 | 0.32 / 0.00 | 10 | ***0.005 / <0.001
Unikonts, with RFX | Annelida | H. robusta | 0.33 | 0.49 | 0.00 / 0.00 | 0.00 / 0.00 | 12 | **0.029 / 0.029
Unikonts, with RFX | Annelida | C. sp. I | 0.18 | 0.40 | 0.00 / 0.00 | 0.00 / 0.00 | 11 | 0.151 / 0.151
Unikonts, with RFX | Cnidaria | N. vectensis | 0.17 | 0.39 | 0.08 / 0.00 | 0.29 / 0.00 | 12 | 0.528 / 0.145
Unikonts, with RFX | Placozoa | T. adhaerens | 0.45 | 0.52 | 0.00 / 0.00 | 0.00 / 0.00 | 11 | ***0.010 / 0.010
Unikonts, with RFX | Choanozoa | M. brevicollis | 0.17 | 0.39 | 0.17 / 0.33 | 0.39 / 0.49 | 12 | 1.000 / 0.386
Unikonts, without RFX | Chytridiomycota | B. dendrobatidis | 0.00 | 0.00 | 0.00 / 0.00 | 0.00 / 0.00 | 9 | –
Nonunikonts, without RFX | Chlorophyta | V. carteri | 0.17 | 0.39 | 0.17 / 0.08 | 0.39 / 0.29 | 12 | 1.000 / 0.528
Nonunikonts, without RFX | Chlorophyta | C. reinhardtii | 0.08 | 0.29 | 0.00 / 0.08 | 0.00 / 0.29 | 12 | 0.350 / 1.000
Nonunikonts, without RFX | Chlorophyta | M. sp. RC299 | 0.00 | 0.00 | 0.00 / 0.00 | 0.00 / 0.00 | 12 | –
Nonunikonts, without RFX | Ciliophora | T. thermophila | 0.00 | 0.00 | 0.00 / 0.00 | 0.00 / 0.00 | 11 | –
Nonunikonts, without RFX | Heterokontophyta | P. sojae | 0.08 | 0.29 | 0.08 / 0.08 | 0.29 / 0.29 | 12 | 1.000 / 1.000
Nonunikonts, without RFX | Ciliophora | P. tetraurelia | 0.08 | 0.29 | 0.00 / 0.17 | 0.00 / 0.39 | 12 | 0.350 / 0.538
Nonunikonts, without RFX | Euglenozoa | T. brucei | 0.09 | 0.30 | 0.18 / 0.00 | 0.40 / 0.00 | 11 | 0.553 / 0.332
Nonunikonts, without RFX | Metamonada | T. vaginalis | 0.00 | 0.00 | 0.08 / 0.08 | 0.29 / 0.29 | 12 | 0.350 / 0.350
Nonunikonts, without RFX | Percolozoa | N. gruberi | 0.00 | 0.00 | 0.09 / 0.09 | 0.30 / 0.30 | 11 | 0.332 / 0.332

***Significant at the 99% confidence level; **significant at the 95% confidence level. Avg., average; STDV, SD; n, no. of ciliary genes sampled; –, not calculable (all values zero).
(a) Promoter regions sampled include B9D2, BBS1, BBS5, WDR10, WDR35, IFT52, IFT81, IFT172, SPAG6, CCDC147, KAP1, and IFT74.
(b) All sampled promoter regions were individually scrambled and resampled identically, maintaining mononucleotide or dinucleotide frequencies.
(c) Results from a comparison-of-means t test of endogenous vs. scrambled (mononucleotide/dinucleotide) promoter regions.
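Footnote (c) names a comparison-of-means t test but not its variant. The Python sketch below is our reading, not a documented method: it treats each sampled promoter as a 0/1 score (footprint present or absent) and applies a pooled-variance (Student's) two-sample, two-sided t test, with the P value computed from the regularized incomplete beta function so that only the standard library is needed. Applied to the per-gene scores implied by the H. sapiens row (8 of 12 endogenous vs. 1 of 12 mononucleotide-scrambled promoters) it gives P ≈ 0.002, and for S. purpuratus (1 of 9 vs. 0 of 9) P ≈ 0.332, in line with the table; other rows can differ slightly in the third decimal, so treat this as an approximation. Function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def pooled_t_test(x, y):
    """Student's two-sample t test; returns (t, two-sided P) or None."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    df = n1 + n2 - 2
    se = math.sqrt(ss / df * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        # Both groups constant: t is undefined when the means agree.
        return None if m1 == m2 else (math.copysign(math.inf, m1 - m2), 0.0)
    t = (m1 - m2) / se
    return t, reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


# H. sapiens row, mononucleotide scrambling (0/1 score per promoter).
print(pooled_t_test([1] * 8 + [0] * 4, [1] + [0] * 11))
```

The two-sided P value uses the identity P = I_{df/(df+t²)}(df/2, 1/2) for the Student t distribution, which avoids any dependence on a statistics package.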

Dataset S1. BLAST analysis survey of the presence or absence of the RFX DNA binding domain in the genomes of >120 eukaryotic organisms.


Dataset S2. List of X-box DNA footprints or proven X-box motifs identified in various unikont gene promoters using a hidden Markov model (HMM) prediction method.





