protocol rapid curation of gene disruption collections ... · quorum sensing, or, in our case,...

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

protocol

2110 | VOL.12 NO.10 | 2017 | nature protocols


IntroDuctIonDevelopment of the protocol and applications of the methodOver the past decade, next-generation sequencing has markedly improved the accessibility of genetic information. However, the percentage of genes of unknown function in a genome sequenced today remains approximately the same as it did a decade ago, 30–40% (refs. 1,2). This situation is particularly acute for the most esoteric microbes, which offer the most unique resources to genetic engineering and synthetic biology3.

Genetic screening of arrayed collections of clonal isolates remains the preferred method of characterization for many non-fitness-related microbial phenotypes, including drug targets4, virulence factors5–7, and secondary and cryptic metabolism8,9. Genetic screens conducted with whole-genome knockout collec-tions such as the Yeast Knockout Collection (YKO)10 and the Keio Collection of Escherichia coli gene deletion mutants11 set the gold standard for gene function discovery12. However, construction of these collections requires extremely large technical, time, and cost investments12. As a result, only a small number of their type have been built to date13. This has motivated the development of meth-ods that use massively parallel (next-generation) sequencing and combinatorial pooling14 to markedly reduce the cost and increase the ease of annotation of arrayed collections of mutants created by random transposon mutagenesis15,16. These exciting break-throughs have facilitated the construction of a growing number of condensed curated gene knockout collections of pathogenic organisms and surrogates17–19.

Despite the considerable cost- and labor-saving advantages of recent combinatorial pooling methods, their reliance upon liq-uid-handling robotics remains an obstacle to widespread adop-tion. This barrier has spurred ongoing development of rapid, easy-to-use, and extremely low-cost methods for combinatorial pooling20, including Knockout Sudoku3, which we have previ-ously introduced. Knockout Sudoku’s name is derived from the similarity of the problem solved by the method—reconstructing the location of genomic coordinates on a grid of wells—to the one that is encountered in the number placement puzzle Sudoku14. Knockout Sudoku3 uses a set of Bayesian inference algorithms that compensate for the low-complexity combinatorial pools produced

by manual methods during the solution of sequencing data, allowing this method to annotate the extremely large random mutant collections needed to ensure high coverage of complicated microbial genomes. The software underlying Knockout Sudoku is available in the Kosudoku package (https://www.github.com/buzbarstow/kosudoku/).

ApplicationsKnockout Sudoku was designed to democratize the creation of whole-genome knockout collections. The low cost and high speed of the method mean that it can be used with organisms normally considered too esoteric to attract the resources necessary to con-struct a whole-genome knockout collection. This also makes it possible to consider building whole-genome knockout collections around parental strains engineered for specific experiments, such as a strain that contains a biosensor to aid in the discovery of genes underlying complex behaviors such as biofilm formation, quorum sensing, or, in our case, extracellular electron transfer3. Finally, the high speed and low cost of the method increase the feasibility of annotating the extremely large transposon mutant collections needed to achieve saturating coverage of extremely large and complex genomes such as those of eukaryotes.

Overview of procedureKnockout Sudoku constructs a curated, condensed, nonredun-dant whole-genome knockout collection by annotation of a highly oversampled progenitor transposon insertion mutant collection and condensation by choosing a single representative disruption mutant for each nonessential gene. A simple 4D combinatorial pooling scheme is used to prepare a library for a massively parallel sequencing experiment (Figs. 1 and 2). The sequencing data set is deconvolved to identify and locate mutants making up the pro-genitor collection3 (Fig. 3). A full list of the symbols and abbrevia-tions used in this protocol can be found in Box 1.

All plates in the progenitor collection are arranged in a virtual, approximately square plate grid and are assigned a plate-row (PR) and a plate-column (PC) coordinate (Fig. 1). The example in Figure 1 shows the pooling of a collection of 1,536 mutants

Rapid curation of gene disruption collections using Knockout SudokuIsao A Anzai1,3, Lev Shaket1,3, Oluwakemi Adesina1,3, Michael Baym2 & Buz Barstow1

1Department of Chemistry, Princeton University, Princeton, New Jersey, USA. 2Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA. 3These authors contributed equally to this work. Correspondence should be addressed to M.B. ([email protected]) or B.B. ([email protected]).

Published online 14 September 2017; doi:10.1038/nprot.2017.073

Knockout sudoku is a method for the construction of whole-genome knockout collections for a wide range of microorganisms with as little as 3 weeks of dedicated labor and at a cost of ~$10,000 for a collection for a single organism. the method uses manual 4D combinatorial pooling, next-generation sequencing, and a Bayesian inference algorithm to rapidly process and then accurately annotate the extremely large progenitor transposon insertion mutant collections needed to achieve saturating coverage of complex microbial genomes. this method is ~100× faster and 30× lower in cost than the next comparable method (In-seq) for annotating transposon mutant collections by combinatorial pooling and next-generation sequencing. this method facilitates the rapid, algorithmically guided condensation and curation of the progenitor collection into a high-quality, nonredundant collection that is suitable for rapid genetic screening and gene discovery.

https://www.github.com/buzbarstow/kosudoku/


http://orcid.org/0000-0003-1303-5598

http://orcid.org/0000-0003-2218-6152

http://dx.doi.org/10.1038/nprot.2017.073

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.

© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. © 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

protocol

nature protocols | VOL.12 NO.10 | 2017 | 2111

arrayed onto 16 96-well plates, but, in both principle and prac-tice, much larger progenitor collections can be pooled. Aliquots of culture from each well in the collection are dispatched to four pools that uniquely correspond to the coordinates of the well: the well’s row on its plate; the well’s column on its plate; the PR of the well’s plate; and the PC of the well’s plate. For example, a mutant with a disruption at position x in the parental organism’s genome in well F8 on plate 10 is placed into the PR3, PC 2, Row F, and Column 8 pools (as detailed in Fig. 1). The example collec-tion shown in Figure 1 will be pooled into eight row pools (A–H), 12 column pools (1–12), 4 PR pools (PR1–PR4), and 4 PC pools (PC1–PC4) (28 pools in total).

The protocol is optimized for use with a 96-channel pipettor and allows the user to pool a single 96-well plate in only ~1 min, compared with almost 2 h in a robotic scheme16 (Supplementary Video 1). Each plate is pooled with four pipetting operations by the 96-channel pipettor (Fig. 1b and Supplementary Video 1). The first transfers aliquots from the plate to a specialty row 96-well-format plate in which there is no separation between wells on the same row, performing the row pooling operation and combining aliquots from all wells in the collection that have the same row coordinate. The second operation transfers aliquots to a specialty column plate, performing the column pooling opera-tion. The third operation transfers aliquots to an OmniTray—in which there are no barriers between wells—corresponding to the plate’s PR coordinate. This combines aliquots from every well on every plate in the same plate row. Finally, the last operation transfers aliquots to an OmniTray corresponding to the plate’s PC coordinate.

This means that a small team can pool and cryopreserve an ≈40,000-member progenitor collection contained on 417 96-well plates into eight row pools (A–H), 12 column pools (1–12), 20 plate-row pools (PR1–PR20), and 21 plate-column pools (PC1–PC21) (61 pools in total) in just a single day. A progenitor collec-tion of this size is sufficiently large to achieve high coverage of a microbial genome with ~3,500–4,000 nonessential genes3. This pooling method was foreshadowed by the single transposon inser-tion mutant search strategy developed by Lane and Rubin21.

An amplicon library is then constructed from each of these pools using a semi-random nested PCR reaction22,23 that amplifies a portion of the chromosome adjacent to the transposon present in each collection member in each pool (Fig. 4). The same reaction adds Illumina TruSeq flow-cell-binding and read-primer-bind-ing sequences to the 3′ and 5′ ends of the amplicon while replac-ing the standard Illumina index sequence with a custom barcode sequence for each pool, allowing all pools to be combined and sequenced in parallel. Although we chose to ligate adaptors via semi-nested PCR, in theory any technique for high-throughput single-molecule identification of transposon insertion sites would work (e.g., via MmeI13). The amplicon libraries for each pool are then combined and sequenced in a single next-generation-sequencing run. Each read from the resulting sequencing data set is parsed into a transposon insertion location (x, for instance) in the genome by sequence alignment and a pool coordinate (for instance, row F). These parsed reads are then assembled into a pool presence table (Fig. 1c) that contains a line for each trans-poson insertion location seen in the sequencing data set, and enumerates the number of reads with a given pool coordinate. An example pool presence table line for transposon insertion

location x is shown in Figure 1c. The appearance of amplicons within the sequencing data set with a transposon insertion loca-tion of x and barcodes corresponding to pools PR3, PC2, F, and 8 permits the mutant to be located to well F8 on the plate at PR 3, PC 2 in the plate grid (plate 10).

As the progenitor collection size grows, the number of trans-poson mutants that appear at multiple locations rises in tandem. The appearance of identical mutants in multiple wells, along with cross-contamination at the pooling step, complicates mutant location by producing a large number of artifactual assignments relative to the number of real locations3,14. The likelihood of each of these location assignments is calculated by a Bayesian infer-ence algorithm informed by internal self-consistency within the sequencing data set, allowing us to disregard the artifacts and find the real locations of these mutants3 (Fig. 3). Sample input files for the kosudoku software are included as Supplementary data with this article and in the sample input included with the kosudoku package. Sanger sequencing verification of Knockout Sudoku pre-dictions relies on a two-step nested PCR reaction similar to that used for pool amplicon library generation.

The automatic, as compared with manual20, localization of mutants that appear at multiple locations by Knockout Sudoku allows the rapid analysis of large (at least several tens of thousands of mutants) progenitor collections that afford a wide choice of transposon insertion mutants for each gene in a genome with sev-eral thousand members. For example, during construction of the Shewanella oneidensis whole-genome knockout collection, we were able to choose from approximately nine transposon mutants per nonessential gene3. In addition, the computational assessment of cross-contamination in the progenitor collection allows a simul-taneous reduction of the construction time by avoiding colony purification of mutants that do not require it, as well as enabling an increase in the quality of the final condensed collection by allowing more time and effort to be dedicated to colony purifica-tion of the smaller number of mutants that do require it.

Although the complexity of Sudoku puzzles increases rapidly with grid size, as the number in each grid must be consistent with all the numbers in the same column, row, and subgrid, the prob-lem that Knockout Sudoku solves appears to be less constrained and to scale less quickly with collection size. Having a mutant of one type in any well has very little influence on what the other mutants in the same row or column are, especially those more than a single well away. Although we are certain that an upper limit on the maximum collection size that can be analyzed with Knockout Sudoku does exist and is probably set by the ratio of the number of individual molecules that the sequencer can read out to the minimum number of reads required for Bayesian inference, we do not believe that the solution of a 40,000-member collec-tion3 came anywhere close to this limit.

The progenitor collection catalog is used to direct the construc-tion of a nonredundant quality-controlled whole-genome knock-out collection (Figs. 5 and 6). A single representative mutant is selected for each gene disrupted in the progenitor collection by an algorithm that balances the likelihood that the mutant will knock out the function of a gene with the ease of isolation from any other mutants that co-occupy its well. These selected mutants go through a purification triage in which those in singly-occupied wells (i.e., a well that contains a single mutant species) are simply re-arrayed into the first portion of the condensed collection plates.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



Mutants that co-occupy a well but are still desirable because of the proximity of their transposon to the translation start of a gene are colony purified. For each co-occupied well, the Knockout Sudoku algorithm predicts the smallest number of colonies that must be picked to isolate the mutant of interest. These colonies are picked and then added to the condensed collection plates. The entire condensed collection is then repooled and validated by a second round of sequencing and an alternative, orthogonal sequence analysis algorithm. One representative of each type of mutant in the colony purified set is selected for insertion into the second section of the quality-controlled collection; furthermore, mutants

for any genes still lacking representatives are added from the progenitor collection into the third section of the quality-controlled collection (Figs. 5 and 6).

Current limitations and comparison with other methodsEach class of whole-genome knockout collection construction has its own unique set of advantages and disadvantages. Unlike the targeted deletion construction methods used to build the Keio and YKO collections, more modern methods (In-seq15,16, Cartesian pooling–coordinate sequencing (CPCS)20, and Knockout Sudoku3) all rely upon the annotation of large transposon

Col. 8pool

Row Fpool

PR3 pool

PC2pool

PR1

PR2

PR3

1 10-µl aliquots from all eight wells on column 8 on plate 1

10-µl aliquots from all 96wells on all four plates in

plate column 2

PC1 PC2 PC3 PC4

PR4

1 3 4

5 76 8

10

13 15

12

16

PC2OmniTray

PR3 OmniTray

2

10

16

Row tray

Column tray

2

14

119

Mutant x

Mutant x

PR3OmniTray

PC2OmniTray

Column tray

Row tray

(i) All wells arepooled into row

pools using aspecialized row tray

(ii) All wells are pooled intocolumn pools using a

specialized column tray

(iii) All wells are pooled byplate column using an OmniTray shared by all platesin the same plate column

(iv) All wells are pooled byplate row using an OmniTray

shared by all plates in thesame plate row

x F: 24 8: 60 3: 93 2: 93

Transposon location

Row pool:reads

Column pool:reads

Plate row pool: reads

Plate columnpool: reads

Most likely address(es)

F_8_PR3_PC2 (F8, plate 10)

10

b

a

c

Figure 1 | Combinatorial pooling scheme for Knockout Sudoku. (a) Knockout Sudoku uses a manual Cartesian-mapping combinatorial pooling scheme that can be performed rapidly and at low cost with a 96-channel pipettor. The example in this figure shows the combinatorial pooling of a 1,536-member transposon mutant collection contained in 16 96-well plates; however, much larger collections can be pooled. In practice, it is possible to pool a 40,000-member transposon mutant collection on 417 96-well plates in a single day, although in principle even larger collections can be pooled. (a,b) Each 96-well plate in the transposon mutant collection is pooled by four operations with a 96-channel pipettor. (b, i) Aliquots from all wells on the 96-well plate are transferred to a specialized row tray that combines aliquots from the same row into a single channel. Aliquots from all plates in the collection are added here. (ii) Aliquots from all wells on the plate are transferred to a specialized column tray that combines aliquots from the same column into a single channel. Aliquots from all plates in the collection are added here. (iii) Aliquots from all wells on the plate are transferred to a specialized OmniTray that combines all aliquots from the plate. Aliquots from all plates in the same plate column (plates 2, 6, 10, and 14 in this example) are added to this OmniTray. (iv) Aliquots from all wells on the plate are transferred to an OmniTray that combines all aliquots from the plate. Aliquots from all plates in the same plate row (plates 9, 10, 11, and 12 in this example) are added to this OmniTray. In total, these operations take ~1 min to perform. This entire operation is shown in supplementary Video 1. (c) An example pool presence table. The pool presence table is generated from next-generation sequencing of the pooled collection (Step 35). Each line corresponds to a different transposon mutant insertion location and enumerates the number of reads for each pool barcode that is observed in the sequencing data set. The example shown here (using simulated data) contains only a line for the mutant x shown in the figure.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


mutant collections by combinatorial pooling and next-generation sequencing. Although the careful choice of transposon sequence can result in a high probability of gene inactivation, it does not guarantee this with the same certainty as targeted deletion24. For this reason, best practice requires that any genotype–phenotype association identified by a screen of any transposon insertion mutant collection be confirmed by precise deletion or comple-mentation. However, the cost and time savings afforded by these transposon insertion mutant collection annotation methods often outweigh this disadvantage.

Knockout Sudoku, CPCS, and In-seq all rely upon the mari-ner transposon system for gene disruption mutant creation. Overall, this system shows a very low degree of bias in transposon

insertion, other than a small reduction in transposon insertion frequency with increasing distance from the genome’s origin of replication3,25. However, this system requires the presence of an AT dinucleotide sequence for transposon insertion. This means that mutants for some nonessential genes that either completely lack or have low numbers of the AT sequence will not be represented or will appear only infrequently in a collec-tion made by the mariner system3. The Tn5 transposon system has no absolute insertion site requirement and can produce transposon insertion libraries with extremely high insertion densities26. Recent advances in the democratization of the Tn5 system27 raise the possibility of adapting this system for use with Knockout Sudoku and other transposon insertion mutant

Progenitorcollection

batch

Calculate minimum progenitorcollection size

> kosudoku-collectionmc

Calculate maximum batch size

> kosudoku-batch

> kosudoku-egeneindex

Completedprogenitor

collection andpooled

genomic DNA

Yes

Go to troubleshooting

No

Yes Yes

Sequencing analysis≈ 1 d

> kosudoku-seqanalyze

Sequences OK?No

TS

Yes

> kosudoku-poolanalyze

> kosudoku-poolfit

> kosudoku-poolsolve

Provisional progenitorcollection catalog

Yes


No

Validated progenitorcollection catalog

Yes

Yes

Mating ofbatch

≈ 4 d

Steps 8–18

Ampliconlibrary sizeselection

≈ 6 h

Step 29

Picking ofbatch

≈ 1 d per20,000 colonies

Steps 19–21

Steps 25–26

Genomic DNAextraction

≈ 2 h

Box 4

Pooling andcryopreservation of

progenitor collection batch≈ 1 d per 400 96-well

plates

Steps 22–24

Sangersequencing of

amplicons

Step 46

Ampliconlibrary

generation≈ 1 d

Steps 27–28

Amplicon sequenceanalysis

≈ 1 d

Steps 47–49> kosudoku-verify

Amplicongeneration from

randomly chosenmembers of progenitor

collection≈ 1 d

Steps 44–45

Illuminasequencing

Steps 30–31

Colony batchcomplete?

Ampliconsequences match

predictions?

Ampliconlibrary OK?

Progenitorcollectioncomplete?

Poolingcomplete?

Return to step 22

No

Return to step 8

No

Return to step 8

No


No

Steps 1–6

Step 7

Steps 32–43

Box 2

Ampliconlibrary OK?

Box 3 Box 5

Figure 2 | Workflow for creation and annotation of progenitor collection (Steps 1–49). Knockout Sudoku builds a whole-genome knockout collection by annotation of a transposon mutant collection that is suitably large to achieve saturating coverage of the genome of the parental organism (the progenitor collection) through rapid combinatorial pooling and next-generation sequencing. TS, Troubleshooting.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



annotation technologies. However, at the time of writing, if an organism cannot be transformed with a transposon system (typically by conjugation), a transposon insertion mutant col-lection cannot be built for it.

Each of the different combinatorial pooling methods for trans-poson insertion mutant collection annotation comes with its own set of advantages and disadvantages. Robotically pooled methods such as In-seq use a binary scheme to encode the originating loca-tion of each mutant within the sequencing library15. This method uses a robot to place aliquots of the contents of each well in a gridded collection of mutants into a subset of a series of pools. The presence or absence of the mutant sequence in the complete series of pools makes a binary string that encodes the location of the mutant. This is analogous to the encoding of a letter or a string of letters in a series of binary registers in a computer memory or as a series of on–off pulses in a digital communication system. This type of method makes no use of the amount of reads correspond-ing to each mutant in each pool, simply its presence (above a set threshold), or its absence. This and other similar schemes allow for redundancy by encoding the location of the originating well multiple times in the binary string, and error-checking through the inclusion of a checksum bit. As a result, this method is highly tolerant of errors in sample placement and cross-contamination, and it can achieve extremely high levels of reconstruction of the original progenitor collection. Most importantly, this scheme is completely tolerant of sequences that appear more than once in the collection14.

By contrast, Knockout Sudoku and CPCS use Cartesian map-ping schemes to encode the originating location of each mutant in the sequencing library. In comparison with the binary encoding scheme used by In-seq and similar methods, these Cartesian meth-ods are much faster, easier to implement, and lower in cost. We calculated that we achieved 100× time savings and 30× cost reduc-tions relative to In-seq for the construction of the S. oneidensis Sudoku collection3. CPCS uses only widely available multichannel pipettors, which makes it extremely accessible. Knockout Sudoku, by contrast, relies upon a 96-channel pipettor, which, although

lower in cost and faster than many liquid-handling robots, is still a specialty item.

However, the Cartesian mapping methods are less tolerant of sample placement error and cross-contamination than the binary encoding methods, as they allow for no redundancy or checksumming. Although the Cartesian methods use more pools than the binary coding methods, we have not found this to be cost-prohibitive, given current DNA synthesis costs. Most impor-tantly, if used in a binary form, Cartesian mapping methods are completely intolerant of sequences that appear in more than one place14. Samples that appear at more than one originating loca-tion, either because of cross-contamination or random chance in transposon insertion, can map back to many more artifactual locations. During annotation, it is imprudent to simply ignore these sequences for two reasons: first, some of them are the best, if not only, mutants for some genes in the genome; second, some of them may co-occupy wells in the collection with other mutants, and, by avoiding them, a warning sign for cross-contamination is ignored that could mask the results of an assay performed as part of a genome-wide screen. Knockout Sudoku solves this problem by an automated probabilistic solution that uses the number of reads associated with each sequence in each pool to calculate the probability of each of the possible location assignments.

Experimental designAn overview of the Knockout Sudoku method is shown in Figures 1, 2 and 6. Before building a Knockout Sudoku collec-tion, carefully adapt this protocol to your experimental situa-tion by following the guidelines given below, calculate the time required for specific tasks, and thoughtfully schedule them. Some of the planning steps listed here are essential to proceed, whereas others will greatly reduce stress, save time, and produce a higher-quality collection.

Essential: calculation of minimum progenitor collection size (Steps 1–7). Before picking a progenitor collection, calculate the minimum number of members needed to ensure saturating

Box 1 | Symbols and abbreviations symbolsNp Minimum size of progenitor collectionrpick Colony-picking ratendish Number of colonies that can be reliably identified on and picked from a Petri dish or a colony-picker tray.

nplate Number of colonies grown per 96-well plate. For example, if a checkerboard pattern is used, nplate = 48.rpool Pooling rate measured in plates per day.rincubate Rate of incubation of picked colonies in liquid medium.Nbatch The maximum number of mutant colonies that can be processed in a single batch. Typically, this is the smallest of Nplate,

Npick, and Npool.Nplate The maximum number of mutant colonies that can be plated in a single batch.Nincubate The maximum number of plated colonies that can be grown in a single batch.Npick The maximum number of mutant colonies that can be picked in a single batch.Npool The maximum number of mutant colonies that can be pooled in a single batch.∆Tsolid Useful lifetime of plated colonies. Note that this is not simply the time over which colonies can be used to inoculate

liquid medium, but the shorter time over which individual colonies can be discriminated by a robotic picking system and used to inoculate the liquid medium.

∆Tliquid Lifetime of cultures after reaching saturation in the liquid medium under optimal storage conditions.∆Tpick Time needed to pick a batch of colonies.∆Tsaturation Time needed for picked colonies to reach saturating density in liquid medium.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


coverage of the genome of the parental organism. We have found that simple analytical formulas based upon nonessential gene counts tend to underestimate the minimum progenitor collection size and instead recommend algorithmic methods. We have devel-oped two programs, kosudoku-egeneindex and kosudoku-collectionmc, to aid in estimation of the minimum progenitor collection size, Np, through a Monte Carlo algorithm that uses the whole-genome sequence of the parental organism and essentiality data for each open reading frame28.

Essential: transposon mutagenesis and library construction (Steps 8–18). The current version of Knockout Sudoku relies upon generation of a transposon insertion mutant library (in the context of this article, a set of transposon insertion mutants that have not been clonally isolated or arrayed and are typically mixed in liquid suspension) through mutagenesis with a modi-fied pMiniHimar transposon delivered by conjugation with E. coli WM3064 (refs. 3,29). This transposon system has been reliably used to generate single-gene transposon insertion mutants in organisms such as Azospira suillum PS30, Dickeya dadantii31, Francisella tularensis32, Geobacter sulfurreducens33, Marinobacter subterrani34, Myxococcus xanthus35–37, and S. oneidensis3,29. In prin-ciple, any transposon system can be used with Knockout Sudoku, given the suitable modifications to the sequencing library genera-tion primers. If the pMiniHimar system is not satisfactory, recent reviews by Barquist et al.26 and Kwon et al.38 provide an exten-sive resource for possible alternatives. Transposon mutagenesis

is not restricted to prokaryotic systems; for example, proce-dures for constructing transposon insertion mutant libraries for Saccharomyces cerevisiae39 and Arabidopsis thaliana40 are well established. The high speed and low cost of combinatorial pooling in Knockout Sudoku mean that it may well be financially reason-able to annotate saturating coverage of genomes even in large and complex eukaryotic organisms.

If a well-established transposon library construction protocol for the parental organism already exists that can produce at least ten times more mutants than the minimum progenitor collection size, we suggest that it be used. In this case, make sure to modify the amplicon library generation and Sanger sequencing verifica-tion primers to work with the transposon sequence introduced by this system.

If you choose to use the pMiniHimar system, carefully adapt the transposon library construction protocol detailed in Steps 8–18 to the parental organism of the progenitor collection by prototyping. Systematically vary the parameters of the transposon library construction protocol (Steps 8–18) and carefully measure the number of transposon mutants produced (Step 18) until the protocol produces a satisfactory product.

To minimize multiplicity in the progenitor collection, we rec-ommend that the library construction protocol be modified until it yields a progenitor library (the unplated progenitor collection in a liquid medium) that contains at least ten times more mem-bers than will be picked to form the progenitor collection.

In our experience, the transposon insertion mutant generation rate (and hence the total number of mutants produced) is par-ticularly sensitive to the composition of the solid medium used in the mating steps (Steps 14 and 18). Should the transposon library construction protocol not immediately produce a satisfac-tory number of mutants, this should be the first parameter that is varied. If the preferred medium conditions of the transposon donor (E. coli WM3064) and the recipient are substantially dif-ferent, we have found that choosing a medium composition that is a compromise between the two has been successful.

After establishing an adequate transposon insertion mutant library construction protocol, test a pilot transposon insertion library with a range of antibiotic concentrations in both solid and liquid media to select conditions that will avoid biasing the pro-genitor collection toward mutants with higher antibiotic tolerance

Reaction 1

HimarSeq1.2

HimarSeq2.4-xN R2-BC-Adaptor_Yx

CEKG2C-IIIR2

CEKG2D-IIIR2

Reaction 2

5′

5′

3′

3′3′

Finished amplicon

5′ 3′

3′5′

5′

Genome

Genome

Genome TS Y BC TS

TS

Transposon

Transposon

TransposonUA xN

Genome

Figure 4 | Detailed schematic of pool amplicon library generation. Primer sequences for the two-step-nested PCR reaction are color-coded to match the block diagrams shown above or below them. TS (gray) = the Illumina TruSeq adaptor sequence, UA (coral red) = Illumina universal adaptor sequence, xN (dark blue) = a 4- to 7-bp random sequence added to avoid saturation of any color channel while sequencing the transposon sequence shared by all amplicons, and Y BC (lavender) = an 8-bp barcode sequence unique to each pool.

Illumina read 1 (transposon–genome junction) sequences

read1.fastq

Illumina read 2 (pool barcode) sequences

read2.fastq

Barcode sequences and pool assignments

barcodes_and_pools.csv

Bowtie2 reference indices

shewy

Sequencing data setanalysis program


Pool presence table

pool_presence_table.csv

Pool presence tableanalysis program

> kosudoku-poolfit

Bayesian inference parameters

bayesian_parameters.csv

Pool presence tablesolution program

> kosudoku-poolsolve

Progenitor collection summary

progenitor_summary.csv

Progenitor collection catalog

progenitor_catalog.xml

Plate grid layout

plate_grid_progenitor.csv

Figure 3 | Software workflow and important input and output files for progenitor collection catalog solution (Steps 32–43). Knockout Sudoku uses a Bayesian inference algorithm to deconvolve the next-generation sequencing data set produced by combinatorial pooling into the contents of the progenitor collection. This process is enabled by the kosudoku software package. Programs are marked with a “>” symbol.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



while still maintaining sterility and elimination of unmutagenized cells. We plated an S. oneidensis transposon insertion library onto a series of Petri dishes with antibiotic concentrations ranging from 2.5 to 50 µg/ml kanamycin (Supplementary Fig. 1). Based upon the results of this titration, we decided upon a working con-centration of 30 µg/ml kanamycin for creation of the S. oneidensis Sudoku collection. Some organisms that are naturally tolerant of antibiotics may require a much higher working concentration (sometimes as high as 300 µg/ml).

Determine the length of time, ∆Tsolid, over which plated mutant colonies are viable and pickable. For example, whereas E. coli colo-nies can be picked even after a week of storage at 4 °C, S. oneidensis colonies continue to grow at this temperature. This makes colony discrimination by colony-picking robots challenging and reduces the viable life span of cells on a Petri dish to only a few days. For S. oneidensis, we found that colonies were best picked no more than 2 d (preferably 1 d) after plating. Finally, determine the shelf life, ∆Tliquid, of transposon insertion mutants of the parental strain in a liquid medium, as this will set the maximum time available for pooling and cryopreservation of the progenitor collection after picking.

Plating and picking of the full progenitor collection that will be used to produce the whole-genome knockout collection should proceed only after determining an adequate transposon inser-tion mutant library construction protocol, suitable antibiotic concentrations, and the viability time of plated mutant colonies. To minimize multiplicity in the plated progenitor collection, we strongly suggest that you perform the adapted transposon insertion mutant library construction protocol, but do not pause to calculate the transposon insertion mutant density (Step 17), instead relying on the density that was calculated during the opti-mization of the protocol and immediately plating the library.

Robotic picking (Steps 19–21 and 59–62). For organisms with genomes that require large progenitor collections, we recommend the use of colony-picking robots to speed construction. Although they are an expensive investment, some models can be rented at a low cost for several weeks and some core facilities will include them. Before picking a complete collection, adjust the robot to accommodate the sensitivities of the organism while maintaining a high picking rate and minimizing cross-contamination events. We suggest picking into a checkerboard pattern (48 colonies per 96-well plate) to assess the effect of splatter on cross-con-tamination. If this is an issue, adjust the speed of picking pin descent during inoculation. If considerable growth failure dur-ing test picking occurs, increase both the inoculation time and pin cooling time after sterilization. We used a Norgren Systems CP7200 robot to pick the progenitor collection for the S. oneidensis Sudoku collection3. After all adjustments, we achieved an average picking rate, rpick, of ~20,000 colonies per d (or ≈ 1,500 colonies per h). We were able to pick ~200 colonies from each 16-mm Petri dish (ndish).

Growth of collection (Step 20). Use a high-speed, tight-orbit incubator shaker specifically modified for growth of cultures on microwell plates to incubate the picked progenitor collection. Measure the mass of a 96-well plate that is filled with growth medium. Consult the shaker’s documentation or contact its man-ufacturer to determine its maximum safe loading capacity and

divide this by the mass of a filled 96-well plate to calculate the maximum number of plates and hence the number of colonies that can be grown in a single load, Nincubate. Divide the maximum loading capacity by the time needed for the organism to reach saturation, ∆Tsaturation, to calculate the rate of mutant incuba-tion, rincubate. Pay attention to special growth requirements of the organism of interest, including high illumination and high CO2 concentrations. Note that if the progenitor collection plates are not fully filled with colonies, for instance by picking into a checkerboard pattern, there will be almost no effect on picking rate, but the number of colonies that can be incubated in a single load will be substantially reduced.

Estimate pooling rate (Steps 22–24 and 65). Practice pooling a small test collection (using either freshly prepared LB or water with a small amount of detergent in place of microbial culture) with a full team of assistants following the protocol detailed in

1 3 4

5 6 8

10

13

12

16

2

14

11

15

Progenitorcollection

Condensedcollection

Single-occupancywells are

re-arrayed

Multiple-occupancywells are struck outfor single colonies

Colonies picked

Quality-controlledcollection

Second section

Re-array is validated byrepooling and

sequencing

Desired isolated mutantsare identified by

repooling and sequencingand re-arrayed

First section Second section Third section

Second-choicemutants formissing genesadded fromprogenitorcollection

9

7

C1 C3

C4 C6

P1 P25

QC1 QC3 QC4 QC5

First-choice mutantin single-occupancywell

Second-choice mutantin single-occupancywell

First-choice mutant inmultiple-occupancywell

Other mutant

Blank well

First section

Figure 5 | Construction of the condensed and quality-controlled collections from the progenitor collection (Steps 50–76). First-choice mutants for each gene in the genome are identified by the progenitor collection condensation program (Step 51) (purple and purple–orange wells). First-choice mutants that are in single-occupancy wells (purple wells) are re-arrayed to make the first section of the condensed collection (Step 57A(i–ix) or Step 57B(i–vii)). Multiple-occupancy wells that contain first-choice mutants (purple–orange wells) are struck out onto Petri dishes for colony purification. Several colonies (the minimum number likely to contain at least one copy of the desired mutant is calculated by the progenitor collection condensation program (Step 51)) are picked into the wells of the second section of the condensed collection. The entire condensed collection is repooled and resequenced (Steps 63–76). The first section is simply validated, whereas the wells containing first-choice mutants in the second section are identified and one representative for each gene is re-arrayed into the second section of the quality-controlled collection. Finally, the second-choice mutants for any genes that were not identified in the condensed collection by the validation (Step 75) are added from the progenitor collection to the third section of the quality-controlled collection (green wells) (Steps 77–79).

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


Box 2 to estimate your achievable plate pooling rate, rpool. In our experience, a maximum pooling rate of ≈ one 96-well plate per min, or ≈ 400 plates per d when allowing time for cryopreserva-tion, was achievable with a team of five people. Note that if the

progenitor collection plates are not fully filled with colonies, for instance by picking into a checkerboard pattern, there will be no effect on the rate of plates that can be pooled per day but the number of colonies that can be pooled per day will be reduced.

Condensed collection

Yes

Validated condensed collection catalog

Error rate acceptable?

Yes


No

Quality- controlled collection

Amplicon library generation

≈ 1 d

Box 3

Steps 68–69

Illumina sequencing

Sequencing analysis≈ 1 d


Sequences OK?

NoTS

Yes

> kosudoku-isitinthere

Validated progenitor collection catalog

Produce list of desired mutants for condensed

collection

Steps 50–56> kosudoku-condense

All plates colony

purified?

Return to Step 57

Yes

No

Genomic DNA extraction

≈ 2 h

Box 4

Step 67

No


Amplicon library size selection

≈ 6 h

Box 5

Step 70

Yes

No


Steps 77–78

Generate instructions for construction of

quality-controlled collection

> kosudoku-qc

Re-array single representative from colony purified mutant

section of condensed collection and add any

missing mutants

Step 79

Steps 73–76

Steps 63–66

Pooling and re-cryopreservation of condensed collection

≈ 1 d per 400 96-well plates

Box 2

Robot available?

Steps 71–72

Yes

No

Amplicon library OK?

Amplicon library OK?

Manual re-array of mutants

from singly occupied wells

Step 57A

(i–ix)

Robotic re-array of mutants

from singly occupied wells

Step 57B

(i–vii)

Cryopreservation ofre-arrayed mutants

Step 58

Colony purification of mutants from

multiply occupied wells

Steps 59–62

Cryopreservation of re-arrayed mutants

Step 58

Colony purification of mutants from

multiply occupied wells

Steps 59–62

Figure 6 | Workflow for creation of condensed and quality-controlled collections (Steps 50–79). Knockout Sudoku contains a mutant selection algorithm that picks a progenitor collection mutant for each gene that is most likely to disrupt gene function and is most easily isolated. Mutants that show no signs of cross-contamination are re-arrayed into the first section of the condensed collection. Mutants with well co-occupants are colony purified, and the number of colonies needed to isolate the desired mutant are picked into the second section of the condensed collection. The condensed collection is repooled and resequenced to validate the identity of each mutant. Finally, single validated representatives for each desired mutant from the second section of the condensed collection are re-arrayed and combined with the first section of the condensed collection to make the quality-controlled collection. TS, Troubleshooting.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



Calculate maximum pickable and poolable mutant batch size (Step 7). Many microorganisms are highly perishable on agar plates or in liquid medium. Use the useful life span of the parental organism on solid (∆Tsolid) and liquid (∆Tliquid) media as inputs to the kosudoku-batch program in the kosudoku package to estimate the maximum mutant batch size, Nbatch, that can be plated, picked, and pooled without risking the viability of the collection. The maximum batch size is equal to the smallest of the following: the maximum batch that can be plated (Nplate); the maximum batch that can be picked (Npick); and the maximum batch that can be pooled (Npool).

If the minimum progenitor collection size needed to achieve saturating coverage of the genome, Np, exceeds Nbatch, the con-struction of the progenitor collection should be split into at least Np/Nbatch batches of mutants that are plated, picked, pooled, and cryopreserved before another batch is started.

The kosudoku-batch program estimates the maximum batch size and provides guidelines for the maximum picking time per day and the number of days to be dedicated to colony picking. Use of this program is described in Step 7.

It is safe to assume that the maximum number of colonies that can be plated onto agar in a single day is far larger than any other limiting factor. The upper limit on Nplate is more likely to be the maximum number of colonies that can be incubated, Nincubate,

N Nplate incubate≤ ( )1 (1)(1)

We estimate the upper limit on the number of colonies that can be picked, Npick, as the smaller of the maximum picking rate per day and the maximum number that can be incubated per day multiplied by the useful lifetime of plated colonies, ∆Tsolid,

N r r Tpick pick incubate solid= min ( , ) ( )∆ 2

In the event that the life span of the organism on a liquid medium is shorter than that on a solid medium,

N r r Tpick pick incubate liquid= min ( , ) ( )∆ 3

We strongly recommend completing picking before moving on to pooling. If the time needed for picked colonies to reach satu-rating density in liquid medium, ∆Tsaturation, exceeds 24 h for the organism of interest, pick for a single day only, waiting for colonies to reach saturation and immediately moving on to pooling.

Estimate the time available for pooling by considering the maxi-mum viability time of the progenitor collection. This is dependent upon the shelf life of the organism in liquid medium; the time spent picking, ∆Tpick; the amount of rest taken between picking and pooling, ∆Trest; and the time needed for picked colonies to reach saturating density,

∆ ∆ ∆ ∆ ∆T T T T Tpool liquid pick rest saturation≤ − − +( )

(2)(2)

(3)(3)

(4)(4)

Box 2 | Pooling and cryopreservation of transposon insertion mutant collection ● tIMInG ≈1 d per 400 96-well plates or ≈1 min per 96-well plate 1. Set the vertical stop posts on the left-side Liquidator pads so that the tips do not touch the destination plates.2. Wipe down the Liquidator and surrounding workspace with 70% ethanol.3. Label all PR and PC OmniTrays.4. Prepare a stack of PR OmniTrays and a stack of PC OmniTrays in descending order (The PR1 OmniTray should be on the top of the PR stack and the PR20 OmniTray should be on the bottom).5. Prepare a set of row and column plates (96-well plate format trays in which each row or column is an entire well). These can be reused, but we find that it is more convenient to use new ones each time a plate is almost full to avoid spillover between wells. Carefully mark the orientation of these plates.6. Place the PC OmniTray stack next to the Liquidator. Place the PR OmniTray stack further away. Place the first (next) PR OmniTray onto the back-left Liquidator pad and leave it there until all plates in the plate row have been pooled.7. Load the Liquidator with a 20-µl sterile nonfilter tip box (from the back-right pad). Make sure to press down firmly on the loading arm.8. Place the current progenitor collection source plate onto the front-right Liquidator pad.9. Place the row plate onto the front-left Liquidator pad. Withdraw 10 µl of culture from the source plate and deposit into the row plate. Make sure that the tips do not touch the destination plate.10. Place the column plate onto the front-left Liquidator pad. Withdraw 10 µl of culture from the source plate and deposit into the column plate.11. Place the PC tray onto the front-left Liquidator pad. Withdraw 10 µl of culture from the source plate and deposit into the PC tray.12. Move the left-side sliding tray on the Liquidator forward so that the PR tray can now be accessed by the pipettor. Withdraw 10 µl of culture from the source plate and deposit into the PR tray.13. Eject the Liquidator tips and discard. Return to step 6 in this box and place the next plate on the front-right Liquidator pad. Continue this loop until all plates in the plate row are pooled. Set aside the PR tray, as it is complete. Place the next PR OmniTray onto the back-left Liquidator pad and continue. After all plates for that day have been pooled, proceed to cryopreserving the plates as follows.14. Sample the volume of culture left in the source plates (there should be ~50 µl), and add an equal volume of 20% (vol/vol) glycerol using sterile, 200-µl Liquidator filter tips.15. Seal each plate with aluminum sealing foil.16. Place the sealed plates in the shaker incubator and shake them for ~2 min to mix the glycerol. Freeze the plates at −80 °C. Plates can be stored at −80 °C for at least 1 year.17. Transfer the contents of the pooling plates to Falcon tubes for genomic DNA extraction.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


The number of colonies that can be pooled can be estimated as

N r T npool pool pool plate= ∆ ( )5

where nplate is the number of colonies per plate. Thus, the maxi-mum colony batch size that can be processed without risking the viability of any member is

N N N Nbatch plate pick pool= min ( , , ) ( )6

Essential: cryopreservation (Steps 23 and 58). Before com-mencing construction of the progenitor collection, deter-mine the optimal freezing storage conditions necessary for prolonged viability of the target organism. In particular, con-ditions should not only ensure retrieval of the wild type but also be sufficiently mild for compromised mutants. In many cases, these conditions will be well known from literature or experience in the laboratory. For example, it is commonly accepted that the optimal cryopreservation condition for S. oneidensis is 15–20% (vol/vol) glycerol. However, we do rec-ommend testing variations on this condition to find potential improvements in the accuracy and speed of pipetting of the cryoprotectant into microwell plates and optimize compa-tibility with the antibiotic used for selection of transposon insertion mutants.

For construction of the S. oneidensis Sudoku collection, we tested a range of antibiotic and glycerol concentrations for growth and retrieval from frozen storage at −80 °C. We found that a final concentration of 10% (vol/vol) glycerol produced by mix-ing an equal volume of culture containing 30 µg/ml kanamycin with a stock of 20% (vol/vol) glycerol offered the best combina-tion of cryopreservation and fast and accurate pipetting from a 96-channel pipettor. We also independently tested the antibi-otic concentration in media used for resuscitation from cryop-reservation and found that 30 µg/ml kanamycin was adequate. Finally, make sure to test storage and retrieval of mutants of the parental organism in the exact type of plate that will be used for storage of the progenitor collection. Should the cryopreserva-tion conditions used to store the collection substantially deviate from well-established conditions, we suggest that the viability of the collection be tested over long time spans (ranging from months to years).

It is useful to measure the number of freeze–thaw cycles that members of the collection can withstand. If the progenitor collec-tion can tolerate multiple freeze–thaw cycles, the need to prevent it from thawing during re-arraying to produce the condensed collection is removed, allowing either much faster manual re-arraying or even faster robotic re-arraying. We suggest picking a test set of mutants into a 96-well plate and freezing it under the chosen conditions. Thaw this plate and replicate it before refreezing. Repeat this multiple times and note how many wells are successfully replicated after each thaw–freeze cycle. Although the results of these short-term tests do not guarantee the long-term viability of the collection, they will highlight any egregious problems introduced by thawing and refreezing the collection. Simply put, although the tests do not guarantee that you can thaw and freeze the collection multiple times, they will definitely tell you if you cannot. Example data from an experiment of this type can be found in Supplementary Figure 2.

(5)(5)

(6)(6)

Essential: combinatorial pooling layout (Steps 22 and 51). Decide upon a combinatorial pooling layout before picking and pooling of the progenitor collection. Estimate how many colonies you plan to pick and how many plates will be required to store these. Assign each plate to a position in a square grid and prepare labels for each plate that include its number and coordinates in the grid. A sample plate grid is shown in Table 1. The kosudoku-condense code used to select mutants for the condensed col-lection will also provide a plate grid layout for pooling of the condensed collection for sequence validation (Step 51).

Essential: primer design for amplicon library generation (Steps 27 and 68). Primers used for the generation of the amplicon libraries in the construction of the S. oneidensis Sudoku collec-tion3 are shown in Supplementary Table 1. To guide adaptation of these primers to an alternative transposon system, a detailed schematic of the amplicon library generation procedure is shown in Figure 4. Should a transposon system other than pMiniHimar be used, place the defined transposon binding primers as close to the transposon–genome junction as possible to avoid creation of extremely short amplicons that contain no genomic sequence. Ensure that the primer melting temperatures are ~58 °C.

Barcode sequence design (Steps 6, 27 and 68). Supplementary Tables 1 and 2 contain the 64 unique pool barcode sequences used in construction of the S. oneidnesis whole-genome knockout collection3. These sequences are sufficient to label eight row pools (A–H), 12 column pools (1–12), 22 PR pools (PR1–PR22), and 22 PC pools (PC1–PC22). In total, these barcodes will allow labeling of a collection of 22 × 22 = 484 96-well plates containing up to 46,464 transposon insertion mutants. If pooling of a larger trans-poson insertion mutant collection is required, a barcode primer design program (kosudoku-barcodes) has been included in the kosudoku package (Step 6).

Controls and validation (Steps 15, 28, 44–49 and 63–76). Before, during, and after the construction of a whole-genome knockout collection with Knockout Sudoku, we suggest that several valida-tion and control experiments be performed to test the quality of the collection.

Knockout Sudoku is capable of detecting cross-contaminated wells (multiple-occupancy wells) in the progenitor collection and suggesting that the contents of these wells be separated by colony purification or avoided. Because the method relies on a transpo-son system that produces only single-gene disruption mutants, one can be certain that multiple-occupancy wells are not a result of multiple transposon insertions in the same genome but rather indicate the presence of two or more different mutants. Before proceeding with construction of a whole-genome knockout col-lection, if you have any doubt that the transposon system that you are using inserts only into single locations in the genome, we strongly suggest that you build a pilot transposon collection and verify single transposon insertion through next-generation sequencing. Although we were confident that the pMiniHimar sys-tem that we used produces only single-gene insertion mutants29, we nonetheless performed this control before constructing the S. oneidensis Sudoku collection.

During collection construction, we suggest that you perform sev-eral biological controls and validation procedures. When generating

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



taBl

e 1

| Pl

ate

grid

for co

mbi

nato

rial p

oolin

g of

the

S. o

neid

ensi

s pr

ogen

itor

col

lect

ion.

pc01

pc02

pc03

pc04

pc05

pc06

pc07

pc08

pc09

pc10

pc11

pc12

pc13

pc14

pc15

pc16

pc17

pc18

pc19

pc20

pc21

PR01

12

34

56

78

910

1112

1314

1516

1718

1920

21

PR02

2223

2425

2627

2829

3031

3233

3435

3637

3839

4041

42

PR03

4344

4546

4748

4950

5152

5354

5556

5758

5960

6162

63

PR04

6465

6667

6869

7071

7273

7475

7677

7879

8081

8283

84

PR05

8586

8788

8990

9192

9394

9596

9798

9910

010

110

210

310

410

5

PR06

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

PR07

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

PR08

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

PR09

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

PR10

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

PR11

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

PR12

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

PR13

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

PR14

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

PR15

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

PR16

316

317

318

319

320

321

322

323

324

325

326

327

328

329

330

331

332

333

334

335

336

PR17

337

338

339

340

341

342

343

344

345

346

347

348

349

350

351

352

353

354

355

356

357

PR18

358

359

360

361

362

363

364

365

366

367

368

369

370

371

372

373

374

375

376

377

378

PR19

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

PR20

400

401

402

403

404

405

406

407

408

409

410

411

412

413

414

415

416

417

Blan

kBl

ank

Blan

kEa

ch o

f th

e 41

7 96

-wel

l pla

tes

that

mak

e up

the

S.

onei

dens

is p

roge

nito

r co

llect

ion

wer

e as

sign

ed p

osit

ions

in a

21-

PC ×

20-

PR g

rid.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


the initial transposon mutant library by conjugation, we suggest that you build no-donor and no-recipient controls and plate these out and test for the absence of transformants (Step 15).

After the generation of pooled amplicon libraries from the progenitor collection (Step 27 and Box 3), we suggest that you run out each pool on a large-format agarose gel and compare the molecular weight distributions with those of the S. oneidensis progenitor collection amplicon pools shown in Supplementary Figure 3 and those of the S. oneidensis condensed collection amplicon pools in Supplementary Figure 4. The molecular weight distributions of the pooled amplicon libraries should be continuous, similar to those shown in Supplementary Figure 3 (i.e., they should look like smears), rather than the discrete bands that appear Supplementary Figure 4. Similarly, after genera-tion of pooled amplicon libraries from the condensed collection (Step 68 and Box 3), we suggest that you run out each pool on a large-format gel and compare the gel image with Supplementary Figure 4. The molecular weight distribution of condensed col-lection amplicon libraries should be similarly discrete as those of the S. oneidensis condensed collection amplicon pools shown in Supplementary Figure 4.

After the initial annotation of the progenitor collection, it is useful to test the predicted identity of several mutants with con-ventional Sanger sequencing (Steps 44–49).

After construction of the condensed collection, Knockout Sudoku includes a resequencing procedure to test the identity of each mutant in the condensed collection (Steps 63–76).

Once construction of the quality-controlled whole-genome knockout collection is complete, we suggest that you perform a genetic screen for a well-known phenotype for that organism. For instance, we tested the S. oneidensis collection for reduction of anthraquinone-2,6-disulfonate, an analog for the redox active moieties found in humic substances that can act as electron accep-tors during anaerobic respiration3,41. The screen of the collection should recapitulate all previously reported results.

Timing estimates and scheduling. The Knockout Sudoku proto-col benefits greatly from careful time management and schedul-ing. Carefully estimate timings for each step, schedule time for the implementation of the PROCEDURE, and closely track adherence to this schedule.

Execution of most steps in Knockout Sudoku can typically be performed with a team of two, but pooling and re-array do benefit from additional personnel. Before beginning any section of this pro-tocol, we predetermine a schedule that ensures these labor-intensive steps fall on weekdays, when help is most easily available.

We have found that interchanging roles during any labor-inten-sive step helps with focus, although some researchers may prefer to stay with a single role for the entire procedure. For example, dur-ing the combinatorial pooling steps (Box 2), we recommend that one person operate the 96-channel pipettor while another acts as a spotter and handles the operator pooling plates. A third person should be responsible for organizing the progenitor collection plates and a fourth should be responsible for sealing and unsealing the plates. Finally, a reserve fifth person should be resting.

The timing estimates given in this protocol are based upon our experience in the construction of the S. oneidensis Sudoku col-lection. Many of the timing estimates for steps involving micro-bial growth assume the use of microorganisms that can grow to saturation overnight. When creating the S. oneidensis Sudoku collection, we made careful measurements of the time needed to reach saturating density on 96-well plates in a high-speed incubator. For S. oneidensis, we found that 20–21 h was typically required to reach saturating density (≈108 colony-forming units (CFU) per ml), as compared with ~16 h for E. coli under similar conditions (≈109 CFU per ml).

Timing estimates for picking of the progenitor collection assumes the use of a Norgren Systems CP7200 colony-picking robot adjusted to accommodate the sensitivities of an organism such as S. oneidensis. We suggest that you perform control experi-ments to determine timing of picking for your specific situation.

Timing estimates for Sanger validation of Knockout Sudoku pre-dictions assume the availability of an overnight sequencing service such as GeneWiz and that samples for sequencing can be delivered to a dropbox before the pickup deadline. Make sure to start this protocol early in the day to ensure that you meet this deadline.

Manual re-arraying can be done by one person if necessary but can be readily parallelized, so this section of the protocol benefits from a large organized team. The kosudoku-condensetiming program will allow you to calculate estimates of the time needed for condensation based upon the number of people, robots, and pipettors available.

MaterIalsREAGENTS

LB medium powder, containing 10 g tryptone, 5 g yeast extract, and 10 g NaCl per liter of rehydrated medium (MP Biomedicals, cat. no. MP113002042 or equivalent or alternative for chosen organism)Kanamycin sulfate (Sigma-Aldrich, cat. no. K1377 or equivalent or alternative for chosen transposon system)Isopropanol, low grade for sterilization (Sigma-Aldrich, cat. no. W292912) ! cautIon Isopropanol is flammable.Glycerol, molecular-biology-grade (Sigma-Aldrich, cat. no. G5516 or equivalent)OneTaq DNA polymerase (New England BioLabs, cat. no. M0480S) crItIcal At the time of writing, we have not tested Knockout Sudoku with any other DNA polymerase and cannot guarantee its success with an alternative choice.Deoxyribonucleotide (dNTP) solution set; 100 mM of dNTP (New England BioLabs, cat. no. N0446S)OneTaq Standard Reaction Buffer (New England BioLabs, cat. no. B9022S)

•

•

•

•

•

•

•

2,6-Diaminopimelic acid (DAP; Sigma-Aldrich, cat. no. D1377-5G) ! cautIon DAP is an irritant. We recommend handling under a fume hood or wearing a dust mask and eye protection while handling.Genomic DNA Mini-prep Kit (Zymo Research, cat. no. D3006) crItIcal At the time of writing, we have not tested Knockout Sudoku with any other genomic DNA extraction kit, and so we cannot guarantee the success of the PROCEDURE when using an alternative kit.Gel DNA Extraction Kit (Zymo Research, cat. no. D4001)DNA Clean and Concentrator Kit, 5 µg capacity (Zymo Research, cat. no. D4003)SYBR Safe DNA Gel Stain (Thermo Fisher Scientific, cat. no. S33102) ! cautIon Although SYBR Safe is less mutagenic than ethidium bromide, we still recom-mend caution when handling and prefer to dispose of it as hazardous waste.6× Gel loading dye; blue (New England BioLabs, cat. no. B7021S)1-kb DNA ladder (New England BioLabs, cat. no. N3232S)Primers (Integrated DNA Technologies; sequences are listed in Supplementary Table 1)

•

•

••

•

•••

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



Agarose (Seakem LE; Lonza, cat. no. 50004 or equivalent)Tris acetate-EDTA (TAE) buffer concentrate (Sigma-Aldrich, cat. no. 67996-10L-F)pMiniHimarFRT transposon delivery plasmid (available from author B.B.)E. coli strain WM3064 (available from W. Metcalf, University of Illinois at Urbana-Champaign)Shewanella oneidensis MR-1 (available from J. Gralnick, University of Minnesota)

EQUIPMENTDurable equipment

Colony-picking robot (Norgren Systems, cat. no. CP7200 or equivalent) ! cautIon Exercise caution when operating the robot, as picking pins can pierce the skin.Shaking incubator, 3-mm orbit, 900 r.p.m. maximum speed (Infors HT; Multitron Std., cat. no. I10003 or equivalent) crItIcal Cultures in microwell plates should be grown in a specially adapted shaking incubator.Humidity control option for 3-mm orbit shaker (Infors HT, cat. no. I52000 or equivalent)Shaking incubator; 25-mm orbit (Multitron Standard; Infors HT, cat. no. I10102 or equivalent)Microplate Post Kit for 3-mm orbit shaker, 7 inch (Infors HT)Microplate spacers for high-speed tight-orbit shaker incubator (Infors HT, cat. no. I20099 or equivalent)Tube rack for shaking incubator, 39 × 18 mm (ATR18; Infors HT, cat. no. 12305 or equivalent)PCR thermocycler, 96-well aluminum block (Eppendorf; Nexus Gradient, cat. no. 6331000025 or equivalent) crItIcal The nested PCR reactions used in Knockout Sudoku seem to be far more tolerant of differences and deficiencies in primer design in PCR cyclers with silver heating blocks. The primers presented here work reliably in a cycler with an aluminum block, but should you need to change them, be aware that you may meet success sooner when using a cycler with a silver block.50-ml Erlenmeyer flask (Corning, cat. no. 7098050, or equivalent)PCR capping aid tool (Eppendorf, cat. no. 951023108 or equivalent)Storage box, airtight, gasketed, 19 liter (Sterilite, cat. no. 1932 or equivalent)Agarose gel electrophoresis box, wide format (Thermo Fisher Scientific, cat. no. D314 or equivalent)Agarose gel electrophoresis box, narrow format (Thermo Fisher Scientific, cat. no. B1A or equivalent)Power supply for gel electrophoresis (Thermo Fisher Scientific, cat. no. EC300XL2 or equivalent)Eight-channel pipettor (2–20 µl; Rainin, cat. no. L8-20XLS+ or equivalent)96-Channel pipettor (5–200 µl; Rainin, cat. no. LIQ-96-200 or equivalent)Petri-dish-pouring robot (Integra Biosciences MediaJet, or equivalent)Microcentrifuge (Eppendorf, cat. no. 5418 or equivalent)IR thermometer gun (62 Max Plus; Fluke, cat. no. FLUKE-62MAX or equivalent)Ice bucket; 9 liter (Diversified Biotech, cat. no. IPAN-3000 or equivalent)Cooling block for PCR microplates (Eppendorf, cat. no. 022510541 or equivalent)Massively parallel sequencing device (Illumina, model no. HiSeq 2500 or equivalent)BioAnalyzer (Agilent, model no. 2100, or equivalent)Gel documentation systemScalpel (Elmers X-acto no. 2 or equivalent)Unix-compatible computer (Apple iMac, 3.5 GHz quad core i7 processor, 16 GB RAM)

ConsumablesPetri dishes, sterile, 15 mm (Fisher Scientific, cat. no. FB0875712) crItIcal The large volume of Petri dishes used in Knockout Sudoku means that robotic plate pouring is extremely useful. The specific brand of Petri dish is not important, but the fit of the dish in the plate-pouring robot is.

••

••

•

•

•

•

•

••

•

•

••••

•

•

•••••

••

•

••••

•

96-Well storage microplates, sterile, polypropylene, U-bottom (USA Scientific, cat. no. 1830-9610) crItIcal The specific brand of 96-well plate is unimportant, but the fit of the plate in the shaking incubator in which they are used is. We recommend finding a plate that fits snugly, but not so tightly that it is difficult to remove, in the shaker that will be used for collection construction.96-Well microplate lids, sterile, low evaporation, polystyrene (Corning, cat. no. 3931)96-Well format row plates, sterile (Phenix Research Products, cat. no. RRI-3028-ST)96-Well format column plates, sterile (Phenix Research Products, cat. no. RRI-3029-ST)96-Well format OmniTray single-well microplates, sterile (Nalgene Nunc, cat. no. 242811)Aluminum foil seals for 96-well microplates, nonsterile (USA Scientific, cat. no. 2923-0100)AeraSeal membranes for 96-well microplates, sterile (Excel Scientific, cat. no. BS-25)Snap cap tubes, 14 ml, sterile (Corning, cat. no. 352001)50-ml Serological pipettes (Genesee Scientific, cat. no. 12-107)10-ml Serological pipettes (Genesee Scientific, cat. no. 12-104)5-ml Serological pipettes (Genesee Scientific, cat. no. 12-102)Liquidator tips, 20 µl, sterile, nonfilter (Rainin, cat. no. LQR-20S)Liquidator tips, 200 µl, sterile, filter (Rainin, cat. no. LQR-20F)Liquidator medium-dispensing reservoirs (Rainin, cat. no. LR-R2-PB-5-S)Pipette tips for gel loading, 20 µl, nonsterile (Rainin, cat. no. SS-L10)Pipette tips for enzyme handling, 20 µl, sterile, filter, low retention (Rainin, cat. no. RT-L10FLR)Pipette tips for PCR preparation, 20 µl, sterile, filter (Rainin, cat. no. RT-L10F)Pipette tips for PCR preparation, 200 µl, sterile, filter (Rainin, cat. no. RT-L200F)Cell-scraping tools, sterile (Corning, cat. no. 353085)Disposable cell spreaders, sterile (Heathrow Scientific, cat. no. HS8151)Rattler Plating Beads, sterile (Zymo Research, cat. no. S1001)96-Well full-skirt PCR microplates (Axygen, cat. no. PCR-96-FS-C)Eight-cap PCR microplate sealing strips (Eppendorf, cat. no. 0030124847)Centrifuge tubes, 50 ml, sterile (Chemglass, cat. no. CLS-4302)Tough-Spots, 3/8 inch (Diversified Biotech, cat. no. T-SPOTS)Microcentrifuge tubes, low DNA binding, 2.0 ml (Eppendorf, cat. no. 022431048)

Softwarebowtie2 (available at http://www.bowtie-bio.sourceforge.net/bowtie2/ index.shtml)kosudoku (available at https://www.github.com/buzbarstow/kosudoku/)python (available on Unix operating systems through package managers such as apt-get and fink or at https://www.python.org)ipython (available on Unix operating systems through package managers such as apt-get and fink or at http://www.ipython.org)matplotlib (available on Unix operating systems through package managers such as apt-get and fink or at http://www.matplotlib.org)numpy (available on Unix operating systems through package managers such as apt-get and fink or at http://numpy.org)scipy (available on Unix operating systems through package managers such as apt-get and fink or at https://www.scipy.org)blast (available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ LATEST/)biopython (available on Unix operating systems through package managers such as apt-get and fink or at http://www.biopython.org)

•

•

•

•

•

•

•

•••••••

••

•

•

•••••

•••

•

••

•

•

•

•

•

•

proceDurecalculation of the progenitor collection construction timing schedule ● tIMInG ≈1 d1| Obtain a GenBank-format whole-genome sequence for the collection parental organism and a listing of gene essentiality data. A sample gene essentiality data file can be found in supplementary table 2.

http://www.bowtie-bio.sourceforge.net/bowtie2/index.shtml

http://www.bowtie-bio.sourceforge.net/bowtie2/index.shtml


https://www.python.org

http://www.ipython.org

http://www.matplotlib.org

http://numpy.org

https://www.scipy.org

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

http://www.biopython.org

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


2| Run the kosudoku-egeneindex program using the following command to generate an index of all possible transposon insertion locations in the parental organism genome and the associated locus. An example input file is shown in supplementary Data 1.

> kosudoku-egeneindex ../input/kosudoku-egeneindex/kosudoku-egeneindex.inp

crItIcal step Sample input files for all programs are found in supplementary Data 1–13 and can be found in the sample input section of the kosudoku package.

3| Prepare an input file for the kosudoku-collectionmc program, which lists the location of the output of kosudoku-egeneindex from Step 2. An example input file is shown in supplementary Data 3.

4| Run the kosudoku-collectionmc program using the following command to generate a rarefaction curve that estimates genome coverage as a function of progenitor collection size.

> kosudoku-collectionmc ../input/kosudoku-collectionmc/kosudoku-collectionmc.inp

5| Should the number of mutants required for complete coverage of the genome exceed the number that the barcoding primers attached in supplementary tables 1 and 3 can accommodate, prepare an input file for the barcode primer generation software that contains the number of 96-well plates that you intend to use and the common sequences of the barcoding primers. An example input file for the barcoding primer generation software is included as supplementary Data 4.

6| Run the barcoding primer generation as follows:

> kosudoku-barcodes ../input/kosudoku-barcodes/kosudoku-barcodes.inp

7| Run the kosudoku-batch program using the following command to calculate the maximum size of a batch of transposon mutants that can be pooled without compromising sample viability (Nbatch), and the maximum break time allowed between finishing picking the progenitor collection (Step 19) and beginning pooling the progenitor collection (Step 23). A sample input file for kosudoku-batch is included as supplementary Data 5.

> kosudoku-batch ../input/kosudoku-batch/kosudoku-batch.inp

Building of a transposon insertion library ● tIMInG ~4 d crItIcal Building a high-quality transposon insertion library is one of the most important steps in Knockout Sudoku. Perform trial experiments to carefully adapt this section of the protocol to your specific situation, as detailed in the Experimental Design section.8| Day 1: Streak out E. coli WM3064 carrying pMiniHimarFRT for single colonies on LB agar with 50 µg/ml kanamycin (kan) and 90 µM diaminopimelic acid (DAP). Grow the colonies to saturating density overnight at 37 °C. crItIcal step Recovery of DAP auxotrophs such as E. coli WM3064 from frozen stocks can sometimes be challenging, so be prepared to streak out more material from the frozen stock than from conventional E. coli frozen stocks.

9| Day 1: Streak out the parental strain for single colonies on LB agar (or a preferred alternative medium). Grow the colonies to saturating density (≈21 h for S. oneidensis) at a preferred temperature (30 °C for S. oneidensis).

10| Day 2: Pick a single colony of E. coli WM3064 and inoculate a 20-ml culture of LB medium with 50 µg/ml kan and 90 µM DAP. Grow the culture to saturating density overnight at 37 °C.

11| Day 2: Pick a single colony of the parental strain, inoculate a 3-ml culture of LB (or a preferred alternative medium), and grow the culture to saturating density (≈ 21 h for S. oneidensis).

12| Day 3: Resuspend the cells in 5 ml of freshly prepared LB by centrifuging the E. coli culture at 500g for 5 min at room temperature (22.7 °C) to pellet the cells; carefully remove the supernatant with a sterile serological pipette; add 5 ml of freshly prepared LB, and resuspend the cells by gently vortexing (this step is commonly referred to as resuspending the cells in 5 ml of freshly prepared LB). Repeat this procedure to wash the cells in 5 ml of freshly prepared LB. Finally, resuspend the E. coli cell pellet in 800 µl of LB.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



13| Day 3: Resuspend the parental strain culture in 5 ml of LB, and then resuspend the culture again in 600 µl of LB.

14| Day 3: Gently mix 400 µl of resuspended parental strain culture with 400 µl of resuspended E. coli. Gently spot the 800 µl of mixed cell suspension onto an LB agar (or an alternative medium) Petri dish and incubate the dish overnight (≈21 h for S. oneidensis and E. coli) at the preferred growth temperature of the parental strain.

15| Day 3: As controls, also gently spot the remaining 200 µl of resuspended parental strain (for adequate killing of nonmutagenized cells) and 400 µl of E. coli (for adequate DAP starvation of the transposase delivery plasmid donor) onto separate LB Petri dishes. No colonies should appear on either of these dishes.

16| Day 4: Wet the mating mixture and controls with 5 ml of freshly prepared LB each and gently scrape them from their respective plates (from Steps 14 and 15) with sterile cell-scraping tools. Carefully collect the resuspended cells into sterile disposable test tubes with sterile serological pipettes. If optimizing the progenitor collection construction protocol, proceed to Step 17 and calculate the transposon insertion mutant density. Once an acceptable mutant density has been achieved and you are satisfied with the mating procedure parameters, we recommend that you discard the mating mix from the successful trial and restart the mating procedure from the beginning (Step 8), using exactly the same parameters to construct a fresh progenitor library, skipping Step 17 and proceeding immediately to Step 18. crItIcal step Although the mating mixture can be stored for several days at 4 °C, we have found that collections that were not plated rapidly (i.e., immediately) tended to be highly repetitive (containing many more copies of identical transposon insertion mutants than random chance would suggest). If constructing the progenitor collection, we do not recommend pausing at this point, or creating delay by measuring the transposon mutant density (as detailed in Step 17). We have found that the transposon insertion mutant density varies very little from trial to trial, so users can instead move immediately to plate out the entire progenitor collection (Step 18) using the transposon insertion mutant density calculated from a previous run through this protocol.

17| (Optional) Days 4 and 5: If optimizing the protocol, make a dilution series of the resuspended mating mixture and controls from Step 16 at dilution factors ranging from 10−1 to 10−8 in an appropriate minimal medium without a carbon source (for instance, Shewanella basal medium) or PBS. Immediately plate 100 µl of each cell suspension dilution onto an LB-kan agar Petri dish with a sterile spreading tool. Incubate these plates overnight (≈21 h for S. oneidensis) at 30 °C. On the following day, calculate the density of transposon insertion mutants in the mating mixture. To do this, count colonies on the dilution series Petri dishes and prepare a log–log plot of the dilution factor versus the colony count. Use the linear part of the curve (usually between colony counts of ≈1 and ≈1,000) to extrapolate the colony count at no dilution (0 on the logarithmic axis). Multiply the no-dilution colony count by 10 to calculate the number of CFU per milliliter in the mating mixture. Check that there is no cell growth on either of the control plates.

18| Day 4: To construct the progenitor collection, dilute the library (the resuspended mating mixture from Step 16) to 2,000 CFU per ml in LB using the transposon mutant density calculated in Step 17. Dilute the control mixtures by the same amount (although there should be no CFU in these mixtures). If plating the progenitor collection onto 16-mm Petri dishes (for instance, if picking the progenitor collection by hand with toothpicks or with a colony-picking robot that uses only 16-mm Petri dishes such as the Norgren Systems CP7200), spread 100 µl (200 colonies) with plating beads. If plating onto colony-picker trays (for instance, if you are using a colony-picking robot that accepts large-format colony-picker trays such as a Molecular Devices QPix series device), plate 1 ml of the progenitor collection (2,000 colonies) with a cell-spreading tool. Plate 100 µl of each of the control mixtures onto 16-mm Petri dishes in both cases. Consider plating an excess of colonies (25–50% more than needed) when preparing a progenitor collection batch. Incubate these plates overnight (≈21 h for S. oneidensis) at the preferred growth temperature of the parental strain. Check that there is no cell growth on either of the control plates.

picking of the progenitor collection ● tIMInG ≈1 d per 20,000 colonies19| Robotically pick colonies into 96-well plates containing 100 µl of LB (or a preferred growth medium) with antibiotics. Seal each plate with a sterile AeraSeal membrane.! cautIon Exercise caution when operating the colony-picking robot, as picking pins can descend rapidly.! cautIon When sealing 96-well plates with AeraSeal membranes, it is often useful to trim them with a disposable razor blade or scalpel. If you choose to do this, exercise caution.

20| Grow mutant colonies for ∆Tsaturation (21 h for S. oneidensis) in a shaking incubator at 900 r.p.m. at a preferred growth temperature (30 °C for S. oneidensis).

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


Box 3 | Generation of pool amplicon libraries ● tIMInG ≈1 d 1. Pool amplicon library construction relies upon a two-step nested PCR reaction. Set up the first-step reactions according to the table below, which lists the contents of each reaction well. All components except for the isolated pool genomic DNA can be premixed as a master mix for convenience. Primer sequences are listed in supplementary table 1.

Component Stock concentration Volume to add per pool reaction Working concentration

Sterile deionized water – 7.2 µl –5× OneTaq standard reaction buffer 5× 4 µl 1×HimarSeq1.2 2 µM 2 µl 200 nMCEKG2C-IllR2 2 µM 2 µl 200 nMCEKG2D-IllR2 2 µM 2 µl 200 nMdNTPs 10 mM 0.4 µl 200 µMOneTaq DNA Polymerase 5 units per µl 0.4 µl 2 units

Total volume 18 µl

2. Dispense the master mix into a skirted 96-well PCR plate.3. Add 2 µl per well of purified pooled genomic DNA at a concentration of ≈100 ng/µl as a template. crItIcal step Do not attempt to normalize the amount of genomic DNA template loaded into the amplicon generation reaction for the amount of each species estimated to be in each pool. Add equal volumes of template to each reaction, unless there is a gross discrepancy in concentration between pools of extracted genomic DNA.4. Seal the plate with PCR strip caps.5. Transfer the plate to the PCR cycler and run the following program:

Step Temperature (°C) Time (min) Cycles Total minimum time (min)

Initial melting 95 5 1 5Melting 95 0.5 6 24Annealing 42 (lowering by 1 °C per cycle) 0.5 Extension 68 3 Melting 95 0.5 24 96Annealing 45 0.5 Extension 68 3 Final extension 68 5 1 5Storage 4 Indefinitely

Total minimum time for the procedure (min) 130

pause poInt The first-step PCR products can be stored at 4 °C overnight.6. Set up the second-step reactions according to the table below, which lists the contents of each reaction well. Primer sequences are listed in supplementary table 1.

Component Stock concentration Volume to add Working concentration

Sterile deionized water – 12.5 µl –5× OneTaq standard reaction buffer 5× 10 µl 1×HimarSeq2.4-4N 500 nM 5 µl 50 nMHimarSeq2.4-5N 500 nM 5 µl 50 nMHimarSeq2.4-6N 500 nM 5 µl 50 nMHimarSeq2.4-7N 500 nM 5 µl 50 nMdNTPs 10 mM 1 µl 200 µMOneTaq DNA Polymerase 5 units per µl 1 µl 5 units

Total volume 44.5 µl

7. Dispense the master mix into the PCR plate.8. Add 200 nM (5 µl) of the appropriate pool barcoding primer (supplementary table 1) to each well. crItIcal step We recommend ordering the barcoding primers on a 96-well plate for convenience and transferring them, using a multichannel or 96-channel pipettor.9. Add 0.5 µl of corresponding reaction 1 product as a template from step 5 of this box. crItIcal step We recommend transferring the reaction products from step 5 of this box using a multichannel or a 96-channel pipettor.10. Seal the plate with PCR strip caps.

(continued)

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



21| If the number of picking days calculated in Step 7 is >1, return to Step 19 the next day until the colony batch is complete. pause poInt Although a break here is certainly possible, do not risk the viability of the progenitor collection by exceeding the break time determined in Step 7.

pooling and cryopreservation of transposon insertion mutant collection ● tIMInG ≈1 d per 400 96-well plates or ≈1 min per 96-well plate22| For each plate in the progenitor collection, ensure that it is laid out in the order specified in the plate grid (table 1). We recommend stacking the progenitor collection plates in descending order by plate row (for example, plate 1 should be at the top of the PR1 stack and plate 21 should be at the bottom; plate 22 should be on the top of the PR2 stack and plate 42 should be at the bottom).

23| Pool the progenitor collection using the pooling protocol described in Box 2.

24| If pooling of the progenitor is not completed by the end of the day, return to Step 23 the next day. pause poInt Pooled, unlysed cultures appear to be stable for at least several days when stored at 4 °C without any impact on the quality of the extracted genomic DNA or mutation location. However, we strongly recommend that you be mindful of the viability time of the organism in liquid medium and the time since the first 96-well plates in the batch reached saturation after picking.

extraction of genomic Dna from pooled mutants ● tIMInG ≈6 h25| Extract genomic DNA from the pools of mutants prepared in Step 23 using the genomic DNA extraction protocol in Box 4. pause poInt The extracted genomic DNA can be stored for at least 1 year at −80 °C.

26| If more batches of colonies need to be picked to achieve saturating coverage of the genome, return to Step 8. Otherwise, continue to Step 27.

Generation of pool amplicon libraries ● tIMInG ≈1 d27| Use the PCR reactions detailed in Box 3 to generate amplicon libraries for next-generation sequencing and examine them by agarose gel electrophoresis.

Box 3 | Generation of pool amplicon libraries ● tIMInG ≈1 d (continued) 11. Transfer the plate to the PCR cycler and run the following program:

Step Temperature (°C) Time (min) Cycles Total minimum time (min)

Melting 95 0.5 30 120Annealing 56 0.5 Extension 68 3 Final extension 68 5 1 5Storage 4 Indefinitely

Total minimum time for procedure (min) 125

pause poInt The second-step PCR products can be stored for several months at −20 °C.12. To validate the pool amplicon library generation reaction, cast a large (≈200 ml) 1% (wt/vol) agarose gel with 1× TAE and SYBR-SAFE DNA-binding stain. Use a high-well-density comb to allow the transfer of samples directly from a PCR plate with a multichannel pipettor. Allow the gel to solidify for at least 1 h.13. Combine 5 µl of each pool amplicon library with 1 µl of 6× loading dye in a clean PCR plate. Mix by vortexing.14. Load the mixed pool amplicon libraries onto the gel with a multichannel pipettor and 1 µl of kb DNA ladder to at least one well per row on the gel.15. Run the gel at 150 V for 30 min.16. Image the gel with a gel documentation system with appropriate filters for SYBR-SAFE gels.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


28| Compare the molecular weight distributions of the pool amplicon libraries in the images of the gels generated in Step 27 with the example images in supplementary Figures 3 and 4. If the distributions have more distinct bands such as those in supplementary Figure 4 rather than the continuous bands in supplementary Figure 3, see the Troubleshooting section.? trouBlesHootInG pause poInt The pool amplicon libraries can be stored for several months at −20 °C.

combining of pool amplicon libraries and purification by molecular weight ● tIMInG ≈6 h29| Combine the pool amplicon libraries generated in Step 28 using the protocol in Box 5. crItIcal step Do not attempt to normalize the amount of PCR product added to the sequencing library for the amount of each species estimated to be in each pool. Instead, add equal volumes from each library. pause poInt The combined and purified pool amplicon library can be stored for several months at −20 °C.

sequencing of pool amplicon libraries by Illumina sequencing ● tIMInG ≈ 16 h30| Submit the combined pool amplicon libraries for quality control by your local sequencing service. The suitability of the molecular weight distribution of the pool amplicon libraries for next-generation sequencing will be determined by the sequencing service. If a problem arises, see the Troubleshooting section.? trouBlesHootInG

31| Sequence combined libraries on two lanes of an Illumina HiSeq in 67-bp single-end read mode, which should take ~16 h. pause poInt As long as the progenitor collection is stable at −80 °C, there can be an indefinite pause for data analysis.

sequencing data analysis: prediction of the contents of each well in the progenitor collection ● tIMInG ≈1 d 32| Build a bowtie2 index for your genome of interest using the following command:

> bowtie2-build sample_genome.fa sample_genome

33| Generate a barcode file that contains a list of your barcode sequences and the corresponding pool that they are used with. An example is shown in supplementary table 3.

34| Prepare an input file for the sequencing data set analysis software containing the location of the sequencing data acquired in Step 31, the bowtie2 index prepared in Step 32, the barcode file prepared in Step 33, and the desired locations of the output files. An example file is shown in supplementary Data 6.

35| Run the sequencing data set analysis software to build a pool presence table that, for each transposon insertion location observed in the sequencing data set collected in Step 31, enumerates the number of reads with a particular barcode sequence (see Fig. 1c for an example). This table will facilitate the location of mutants in the progenitor collection. Use the input file from Step 34 and the following command:

> kosudoku-seqanalyze ../input/kosudoku-seqanalyze/kosudoku-seqanalyze.inp

Box 4 | Extraction of genomic DNA from pooled mutants ● tIMInG ≈6 h 1. Obtain Falcon tubes containing each pool and vortex them briefly to resuspend the cells.2. Using a pipettor, extract 2 ml of culture from each pool into a 2-ml low-DNA-binding microcentrifuge tube.3. Spin down the cells at 21,000g for 30 s at room temperature (22.7 °C) and discard the supernatant.4. Add 500 µl of genomic lysis buffer to each tube and immediately vortex the tubes. Allow the tubes to stand for 5–10 min. pause poInt Cell extracts can be stored at 4 °C for several hours.5. Transfer the lysed culture to a spin column in a collection tube.6. Centrifuge the tube at 10,000g for 1 min at room temperature and discard the flow-through.7. Add 200 µl of prewash buffer.8. Centrifuge the tube at 10,000g for 1 min at room temperature and discard the flow-through.9. Add 500 µl of wash buffer.10. Centrifuge the tube at 10,000g for 1 min at room temperature and discard the flow-through.11. Place the spin column in a fresh low-DNA-binding microcentrifuge tube.12. Add 50 µl of elution buffer and incubate the tube for 2–5 min.13. Centrifuge the tube at 16,000g (or the highest acceleration that your centrifuge can achieve) for 30 s at room temperature to elute.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



36| Examine the output of the sequence analysis program to find the percentage of reads that contain valid pool barcode and transposon sequences and that can be aligned to the reference genome. If this percentage is below ≈90%, you should see the Troubleshooting section.? trouBlesHootInG

37| Prepare an input file for the pool presence table analysis code containing the location of the pool presence table generated in Step 35 and the plate grid file (table 1) prepared in Step 22. An example input file is shown in supplementary Data 7.

38| Run the pool presence table analysis code, which will calculate the effect of the read count threshold (the minimum number of reads needed for a coordinate in the pool presence table to be used in mutant location, typically set to 5 reads) on the taxonomy of the pool presence table solution (for example, how many mutants can be directly located at single locations in the progenitor collection, how many mutants can lack a full set of location coordinates, and how many mutants will require probabilistic location solution as a function of the read count threshold) using the input file from Step 37 and the following command:

> kosudoku-poolanalyze ../input/kosudoku-poolanalyze/kosudoku-poolanalyze.inp

39| Use the output of the pool presence table analysis code to decide upon a read count threshold. We typically use a read count threshold of 5 reads, but this should be decided only after careful examination of the output of kosudoku-poolanalyze. We suggest a read count threshold that maximizes the number of transposon locations that unambiguously map to locations within the progenitor collection. crItIcal step Should the number of unique mutant sequences identified at this stage be considerably lower than the number of colonies picked to generate to the progenitor collection, see the Troubleshooting section.? trouBlesHootInG

40| Use the read count threshold determined in Step 39, the plate grid file (table 1) prepared in Step 22, and the pool presence table prepared in Step 35 to prepare an input file for the pool presence table read count ratio-fitting program. An example input file is shown in supplementary Data 8.

41| Run the pool presence table read count ratio-fitting program, which will determine the Bayesian inference parameters needed to locate mutants that appear at multiple locations within the progenitor collection using the input file from Step 40 and the following command:

> kosudoku-poolfit ../input/kosudoku-poolfit/kosudoku-poolfit.inp

Box 5 | Combining of pool amplicon libraries and purification by molecular weight ● tIMInG ≈6 h 1. Cast a small (≈ 8-cm-long or 50-ml) 2% (wt/vol) agarose gel with 1× TAE and SYBR-SAFE DNA-binding stain. Allow to solidify for at least 1 h.2. Combine all amplicon samples into a single microcentrifuge tube. crItIcal step Do not attempt to normalize the amount of PCR product added to the sequencing library for the amount of each species estimated to be in each pool.3. Measure the concentration of the sample by UV–vis spectrophotometry. The concentration should be ~200 ng/µl.4. Combine 10–15 µl of combined PCR product (≈3 µg) with an appropriate volume of 6× loading dye (2 µl of dye for 10 µl of product). Mix by vortexing.5. Load the PCR sample onto the gel and add 1 µl of kb DNA ladder into the neighboring well.6. Run the gel at 150 V for 30 min.7. Image the gel with a gel documentation system with appropriate filters for SYBR-SAFE gels. An example gel is shown in supplementary Figure 5.8. With a scalpel, carefully excise the section of the gel containing amplicons with lengths from 500 to 1,000 bp.9. Use a gel extraction kit to extract DNA from the gel fragment. Follow manufacturer’s instructions and elute into a DNA low-binding microcentrifuge tube.10. Measure the concentration of the sample by UV–vis spectrophotometry.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


42| Use the location of the Bayesian inference parameters file generated in Step 41, the read count threshold deter-mined in Step 39, the plate grid file (table 1) prepared in Step 22, and the pool presence table prepared in Step 35 to prepare an input file for the pool presence table solution program that will locate mutants in the progenitor collec-tion. An example input file is shown in supplementary Data 9 and example Bayesian inference parameters are shown in supplementary table 4.

43| Run the pool presence table solution program to gener-ate a file listing the transposon mutants in each well of each plate of the progenitor collection (the progenitor collection catalog) as follows:

> kosudoku-poolsolve ../input/kosudoku-poolsolve/kosudoku-poolsolve.inp

Verification of the predictions of progenitor collection contents with sanger sequencing ● tIMInG ≈2 d44| Pick a random set of mutants from the progenitor collection (the entire set of transposon insertion mutants that were generated, picked, and pooled in Steps 19–24, which will be condensed to form a whole-genome knockout collection) and re-array them into a single 96-well plate. Grow the mutants to saturation overnight (≈21 h for S. oneidensis). crItIcal step If you are new to this protocol or do not feel confident in the Knockout Sudoku predictions, we suggest that you verify the identity of a large number of mutants. For example, when developing Knockout Sudoku, we verified the identity of 174 unique mutants3. In this case, we recommend that you verify the identities of as many mutants as your sequencing service will allow you to place on a 96-well plate (GeneWiz allows up to 94). If, on the other hand, you have performed this protocol before and are confident in its performance, you can choose to verify fewer. However, we do recommend that you verify the identity of at least ten mutants.

45| Prepare and run the nested two-step PCR as described in steps 1–11 of Box 3, with the following adaptations—in step 1 of Box 3, use the HimarSeq2, CEKG2C, and CEKG2D primers (supplementary table 1) and adjust the volume to 19 µl per well with deionized sterile water. In step 3 of Box 3, add 1 µl per well of fresh saturated bacterial culture from Step 44 as template. In step 6 of Box 3, use a total reaction volume of 20 µl per well containing 4 µl of 5× OneTaq standard reaction buffer and two units of OneTaq DNA Polymerase (0.4 µl). Add 200 nM of each primer (HimarFRTSeq2 and CEKG4) and 200 µM dNTPs, and adjust the volume with deionized sterile water so that volume per well is 19.5 µl.

46| Send the amplicons for standard Sanger sequencing with PCR product clean up.

47| After receiving Sanger sequencing data for verification of Knockout Sudoku predictions, prepare a verification data description file describing the location of the Sanger sequencing data files and the corresponding originating wells in the progenitor collection (as shown in supplementary table 5). In addition, prepare an input file for the verification program containing the location of the verification data description file, the location of a file containing the genome sequence of the parental organism for the collection, and the location of the progenitor collection solution file prepared in Step 43 (as shown in supplementary Data 10). The verification program will align each Sanger read to the parental organism genome and calculate the transposon location in the genome and then compare this to the predicted location in the progenitor collection solution file. A workflow for the verification program is shown in Figure 7.

48| Run the prediction verification program as follows:

> kosudoku-verify ../input/kosudoku-verify/kosudoku-verify.inp

49| If the predictions match the results of Sanger sequencing, continue to Step 50. If not, see the Troubleshooting section.? trouBlesHootInG

Sanger sequencing reads

read*.seq

Progenitor collection catalog

progenitor_catalog.xml

Summary table describing sanger reads

sanger_summary_table.csv

Reference genomeGenBank fileshewy.gb

Knockout Sudoku prediction verification program> kosudoku-verify

Table of prediction verification results

verification_summary.csv

Figure 7 | Software workflow for Sanger verification of Knockout Sudoku predictions (Steps 47–49). We developed the kosudoku-verify program to assist in verification of Knockout Sudoku predictions through conventional Sanger sequencing of individual transposon insertion mutants. Programs are marked with a “>” symbol.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



selection of mutants from the progenitor collection for preparation of the condensed collection and estimation of timing for condensed collection construction ● tIMInG ≈3 h50| Prepare an input file for the progenitor collection condensation program that contains the progenitor collection solution file prepared in Step 43, the location of a file containing an annotated genome sequence of the parental organism for the collection, and the location of the output file for the condensation instructions. The progenitor collection condensation program will select a complete, nonredundant set of gene disruption mutants from the progenitor collection. For example, we found disruption mutants for 3,667 genes in the S. oneidensis progenitor collection, and picked a single disruption mutant for each one. Example input and output files for the progenitor collection condensation program are shown in supplementary Data 11, supplementary table 6, and the kosudoku package. A schematic showing how the condensed collection is composed by re-array of single occupancy wells in the progenitor collection (first section) and colony purification of multiple-occupancy wells in the progenitor collection (second section) is shown in Figure 5.

51| Run the progenitor collection condensation program as follows:

> kosudoku-condense ../input/kosudoku-condense/kosudoku-condense.inp

52| If you will be manually re-arraying selected mutants (identified in Step 51) from the progenitor collection in Step 57 (option A), use a spreadsheet program to sort the condensed collection re-array instructions generated in Step 51 so that all wells on the same progenitor collection plate can be processed at the same time. If you are using a colony-picking robot for re-array of mutants from the progenitor collection that do not require colony purification (Step 57, option B), use a spreadsheet program or text editor to convert the condensed collection re-array instructions (output from Step 51) into a format that is understood by the colony-picking robot controller software. Include only wells in this file that need re-array. Use a spreadsheet program and the condensed collection re-array instructions (output from Step 51) to prepare a second human-readable file that only includes instructions for wells that require colony purification.

53| Estimate the time needed to prepare the condensed collection with the progenitor collection condensation timing program. Prepare an input file for the progenitor collection condensation timing program listing the number of humans, robots, and multichannel pipettors available for condensation; the hours available per day; the average time needed to streak out a sample from a 96-well plate onto a Petri dish; the average times needed to seal and unseal a 96-well plate with aluminum seals and air-porous AeraSeal membranes; the average time needed to add cryoprotectant to a 96-well plate; and the time needed to pick a colony from a Petri dish and transfer it to a 96-well plate. An example input file is shown in supplementary Data 2.

54| Run the progenitor collection condensation timing program as follows:

> kosudoku-condensetiming ../input/kosudoku-condensetiming/kosudoku-condensetiming.inp

55| Prepare an appropriate number of Petri dishes for colony purification with preferred growth medium, agar, and an appropriate antibiotic concentration (Experimental design). Make sure that the dishes have plenty of time (typically at least 16 h) to solidify.

56| Label and fill condensed collection plates with 100 µl per well of the preferred growth medium and antibiotics using sterile, 200-µl Liquidator filter tips. To make pooling of the condensed collection easier, add PR and PC assignments to the condensed collection plates. The first section of the condensed collection will be used as destinations for progenitor collection wells that need only re-array (in Steps 57A(iv) and 57B(ii)). The second section will be used as destinations for source wells that will be colony purified (Step 59).

re-array and colony purification of progenitor collection for construction of condensed collection57| To manually re-array the progenitor collection, follow option A; alternatively, if you are using a colony-picking robot for re-array of the progenitor collection, follow option B. crItIcal step Before taking samples from the progenitor collection, make sure to determine how many freeze–thaw cycles the organism of interest can tolerate (Experimental design). If the survivability of the collection is low, we recommend manual re-array of the collection, even if a colony-picking robot is available for re-array.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


(a) Manual re-array and colony purification of progenitor collection for construction of condensed collection ● tIMInG variable (i) Remove approximately ten progenitor collection plates from the freezer. If the plates are to be kept frozen during

re-arraying, transfer them to a foam cooler filled with dry ice and cover them with an inverted foam cooler. Alternatively, allow the plates to thaw on the bench.

(ii) Remove a single progenitor collection plate from the batch. If the plates are to be kept frozen, place them on a bed of dry ice. (iii) Wipe the foil seal with 70% (vol/vol) ethanol using a KimWipe. (iv) Select a well to be struck out or re-arrayed. Pierce the foil seal with a sharp sterile toothpick. If it is a single-

occupancy well and is to be re-arrayed, transfer a small amount of cells to the new condensed collection plate (prepared in Step 56) determined by the progenitor collection condensation program (output from Step 51). If it is a multiple-occupancy well and is to be colony purified, streak out the cells onto a Petri dish (prepared in Step 55) clearly labeled with the source and destination wells for the sample.

(v) Cover the hole created in the seal with a Tough-Spot. (vi) Return to Step 57A(iv) and continue until all designated wells on the source plate have been sampled. (vii) Return the plate to the freezer or a dry-ice-filled foam cooler. (viii) Once the condensed collection plate has been filled, cover it with an AeraSeal and grow it to saturating density

(≈21 h for S. oneidensis) in a shaking incubator at 900 r.p.m. at an appropriate temperature (30 °C for S. oneidensis). Continue retrieving samples from the progenitor collection until the end of the day, and the next day proceed to Step 58 for cryopreservation.

(ix) Take the Petri dishes to a static incubator and grow them overnight (≈21 h for S. oneidensis) at an appropriate temperature (30° C for S. oneidensis). The next day proceed to Step 59 for colony picking from streaked plates.

(B) robotic re-array and colony purification of progenitor collection for construction of condensed collection ● tIMInG variable (i) Remove approximately ten progenitor collection plates from the freezer. Dust ice from the progenitor collection plates

and remove the aluminum seals before the well contents thaw. Set the plates aside under a sterile flame to thaw. crItIcal step Ensure that the progenitor collection plates are fully thawed to avoid risking damage to the robot’s picking pins.

(ii) Use the robot to re-array the selected single-occupancy wells on the progenitor collection plate onto a condensed collection plate (prepared in Step 56) using the condensation instructions generated in Step 51.

(iii) Transfer the progenitor collection plate to a person for streak-out of multiple-occupancy wells for colony purification. Streak out the cells onto a Petri dish (prepared in Step 55) clearly labeled with the source and destination wells for the sample.

(iv) Take Petri dishes to a static incubator and grow them overnight (≈21 h for S. oneidensis) at an appropriate temperature (30° C for S. oneidensis). The next day, proceed to Step 59 for colony picking from streaked plates.

(v) Once all samples from the progenitor collection plate have been either re-arrayed or struck out, reseal this plate with fresh aluminum foil and return it to a dry-ice-filled foam cooler.

(vi) Repeat Step 57B(ii–v) for each of the progenitor collection plates that were thawed in Step 57B(i) and then return to Step 57B(i) to process further batches of progenitor collection plates.

(vii) Once each condensed collection plate has been filled, cover it with an AeraSeal and grow each well to saturating den-sity in a shaking incubator at 900 r.p.m. (≈21 h for S. oneidensis) at an appropriate temperature (30° C for S. oneidensis). Continue retrieving samples from the progenitor collection until the end of the day, and the next day proceed to Step 58 for cryopreservation.

cryopreservation of re-arrayed condensed collection plates ● tIMInG variable58| The next day, when the cultures in the condensed collection plates have reached saturation, sample the volume of culture left in a few of the wells of the condensed collection plates (two to five wells in total, but this is left to your judgment; there should be ~90 µl of culture medium left from an original volume of 100 µl), and add an equal volume of 20% (vol/vol) glycerol using sterile, 200-µl Liquidator filter tips. Cover the samples with an aluminum seal for cryopreservation and store them at −80 °C. The samples should be stable for several years at −80 °C.

picking of colonies from colony purification petri dishes ● tIMInG variable59| When colonies have appeared on the Petri dishes from Step 57A(ix) or 57B(iv), pick the number of colonies specified in the progenitor collection condensation instructions (generated in Step 51) into the corresponding specified locations on a new set of fresh condensed collection plates (filled in Step 56). crItIcal step The orthogonal data analysis algorithm used by kosudoku-isitinthere (Step 75) tests for the presence of a particular sequence in a particular well rather than locating a sequence. Make sure to carefully follow the re-array and colony purification instructions and note any deviations from them.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



60| Cover each new condensed collection plate with an AeraSeal membrane and grow the plate to saturating density in a shaking incubator at 900 r.p.m. (≈21 h for S. oneidensis) at an appropriate temperature.

61| The next day, when the cultures in the condensed collection plates have reached saturation, repeat Step 58 to cryopreserve the re-arrayed condensed collection plates.

62| If more progenitor collection plates must be condensed, immediately return to Step 57; otherwise, proceed to Step 63. pause poInt As long as the condensed collection is stable at −80 °C, there can be an indefinite pause before sequence validation.

Validation of the identities of mutants in the condensed collection by resequencing ● tIMInG variable63| Plan out an approximately square plate grid for resequencing of the condensed collection. A suggested grid will have been included in the output of kosudoku-condense from Step 51.

64| Retrieve the condensed collection from the freezer.

65| Pool the condensed collection using the pooling procedure in Box 2.

66| Refreeze the condensed collection. We recommend this be done promptly before proceeding to genomic DNA extraction.

67| Extract genomic DNA using the procedure in Box 4. pause poInt. The extracted genomic DNA can be stored for at least 1 year at −80 °C.

68| Generate pool amplicon libraries using the protocol in Box 3.

69| Inspect the pool amplicon libraries produced from the condensed collection by gel electrophoresis. Compare the image of the gel with supplementary Figures 3 and 4. Although the molecular weight distribution of each pool should look considerably less continuous than that shown in supplementary Figure 3, it should not show considerably more band structure than that shown in supplementary Figure 4. If the molecular weight distribution is not as expected, see Troubleshooting section.? trouBlesHootInG pause poInt The pool amplicon libraries can be stored for several months at −20 °C.

70| Collect pool amplicon libraries into a single vial and purify by molecular weight using the protocol in Box 5. An image of the combined amplicon libraries run on an agarose gel is shown in supplementary Figure 5. pause poInt The combined and purified pool amplicon library can be stored for several months at −20 °C.

71| Submit the purified amplicon libraries for quality control by your local sequencing service. Should the libraries not be acceptable to the sequencing service, see the Troubleshooting section.? trouBlesHootInG

72| Sequence the libraries on one lane of an Illumina HiSeq 2500 in single-end 67-bp read mode, which should take ~16 h.

73| Run sequencing data set analysis software to build a pool presence table.

> kosudoku-seqanalyze ../input/kosudoku-seqanalyze/kosudoku-seqanalyze.inp

74| Prepare an input file for the condensed collection validation code that contains the progenitor collection solution file prepared in Step 43, the location of a file containing an annotated genome sequence of the parental organism for the collection, the location of the condensed collection pool presence table generated in Step 73, the collection re-array instructions generated in Step 51 that define the contents of the condensed collection, and locations for output files containing the validated contents of the condensed collection. If deviations from the progenitor collection condensation instructions prepared in Step 51 were made during the construction of the condensed collection Steps 57–62, these instructions can be easily edited with a spreadsheet or text editor program by a human operator.

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


75| Run the condensed collection validation code to validate the identities of the mutants in the condensed collection and generate a file listing each of the transposon mutants in each well of each plate of the condensed collection whose identities can be validated (the condensed collection catalog) as follows:

> kosudoku-isitinthere ../input/kosudoku-isitinthere/kosudoku-isitinthere.inp

76| Check the error rate in the condensed collection reported by kosudoku-isitinthere. We have found that a 4–8% error rate is acceptable, but if it is higher, see the Troubleshooting section.? trouBlesHootInG

Generation of the quality-controlled collection ● tIMInG variable77| Prepare an input file for the condensed collection quality-control program containing the condensed collection catalog generated in Step 75, the progenitor collection catalog generated in Step 43, and an annotated genome sequence file for the parental organism. An example input file is in supplementary Data 12.

78| The condensed collection quality-control program will generate instructions for further condensation of the colony puri-fied second section of the condensed collection (generated in Steps 57–62) and the addition of second-choice mutants for any genes not yet represented in the condensed collection to the third section of the quality-controlled collection. A sche-matic of this process is shown in Figure 5. Run the condensed collection quality-control program as follows:

> kosudoku-qc ../input/kosudoku-qc/kosudoku-qc.inp

79| Follow the re-array instructions generated in Step 78 to generate the quality-controlled collection. The quality-controlled collection for S. oneidensis contains a single representative for 3,667 genes out of a total of 4,587 genes in the S. oneidensis genome.

? trouBlesHootInGTroubleshooting advice can be found in table 2.

taBle 2 | Troubleshooting table.

step problem possible reason solution

28 and 69

The molecular weight distributions of the progenitor or condensed col-lection pool amplicon libraries show discrete rather than continuous (supplementary Fig. 3) band struc-ture. Be aware that the condensed collection pool amplicon libraries are always likely to have more discrete band structure (supplementary Fig. 4) than the progenitor collec-tion pool amplicon libraries

This is probably due to a lack of diversity in the progenitor or condensed collection. This could be due to several factors, including death of samples, either before or after pooling, and lack of diversity in either the plated progenitor collection or unplated progenitor library possibly because of the out-competition of some mutants relative to others. We have not experienced this problem at the stage of pooling the condensed collec-tion, but we have seen it at the stage of pooling the progenitor collection

Carefully examine the viable life span of the progenitor collection in liquid medium. In addition, consult your notes to check the time between pooling and genomic DNA extraction (Steps 22 and 25). To address lack of diversity in the progenitor library, carefully examine the mating protocol (Steps 8–18), paying particular attention to the time between scraping the mating mix and plating (Steps 16 and 18). In all cases, remaking the progenitor collection is highly recommended

30 and 71

When performing quality control on the Illumina-compatible amplicon library built by Knockout Sudoku (Box 3), we have often seen a discrepancy between the molecular weight distribution meas-ured by agarose gel electrophoresis (supplementary Fig. 4) and that measured by a BioAnalyzer, Genomic Tape Station or similar device. Your sequencing provider may advise against sequencing because the center of molecular weight distribu-tion of the library is too high

Cause unknown Although your sequencing facility may recommend against sequencing, we have found that these libraries sequence very well in all cases. We recommend that you take responsibility for any potential failure, but press ahead with sequencing

(continued)

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



taBle 2 | Troubleshooting table (continued).


36 Upon receipt of sequencing data, there is a high percentage of reads that do not contain a significant amount of, if any, genomic sequence

The random primers used in the library generation reaction (Box 3) are probably binding inside the transposon, before the junction with the genome

Redesign the transposon-specific primers to bind at sites closer to the junction while maintaining the same melting temperature (≈58 °C). Simply rerun the pool amplicon construction reaction (Box 3), combine, and repurify by molecular weight (Box 5). We have found that these reactions are more tolerant to deficiencies in primer design when run in PCR cyclers with a silver heating block

39 Analysis of the progenitor collection sequencing data set does not find many unique transposon sites (at the very least, there should be approximately as many sites as mutants that were picked to form the progenitor collection)

This is probably due to a lack of diversity in the progenitor collection. This could be due to several factors, including death of samples, either before or after pooling, and lack of diversity in either the plated progenitor collection or the unplated progenitor library, possibly because of the out-competition of some mutants rela-tive to others

To address a lack of diversity in the progeni-tor library, carefully examine the mating protocol (Steps 8–18), paying particular attention to the time between scraping the mating mix and plating. Remaking the progenitor collection is recommended

49 The results of Sanger sequencing verification of identities of mutants from the progenitor collection do not match predictions

Mutants were not picked from the correct places

When picking mutants for testing, sterilize a toothpick and then pierce the aluminum seal covering the plate with it. After retrieving the sample, cover the hole with a Tough-Spot so that it is possible to later check that the correct mutant was retrieved. To date, we have not seen a failure of pre-diction by Knockout Sudoku that could not be explained by a failure to retrieve the correct mutant from the progenitor collec-tion. In all these cases, we were able to use the progenitor collection catalog to track the mutant identified by Sanger sequencing back to a well adjacent to the one that was supposed to be picked from and were able to identify this failure thanks to the Tough-Spot patch

69 There is considerably more promi-nent band structure present in the molecular weight distribution of these amplicon libraries compared with that shown in supplementary Figure 4

This is probably due to a lack of diversity in the progenitor or condensed collection. This could be due to several factors, including perishment of samples either before or after pooling, and lack of diversity in either the plated progeni-tor collection or unplated progenitor library, possibly due to the out-competi-tion of some mutants relative to others. We have not experienced this problem at the stage of pooling the condensed collection but have seen it at the stage of pooling the progenitor collection

Repool the condensed collection

(continued)

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


● tIMInGTiming of steps is estimated for cataloging a progenitor collection containing ≈40,000 mutants.Steps 1–7, calculation of the progenitor collection construction timing schedule: ≈1 dSteps 8–18, building of a transposon insertion library: ≈4 dSteps 19–21, picking of the progenitor collection: ≈1 d per 20,000 colonies (at a rate of 20,000 colonies per day)Steps 22–24, pooling and cryopreservation of transposon insertion mutant collection: ≈1 d per 400 96-well plates or ≈1 min per 96-well plateSteps 25 and 26, extraction of genomic DNA from pooled mutants: ≈6 hSteps 27 and 28, generation of pool amplicon libraries: ≈1 dStep 29, combining of pool amplicon libraries and purification by molecular weight: ≈6 hSteps 30 and 31, sequencing of pool amplicon libraries by Illumina sequencing: ≈16 hSteps 32–43, prediction of the contents of each well in the progenitor collection: ≈1 dSteps 44–49, verification of the predictions of progenitor collection contents with Sanger sequencing: ≈2 dSteps 50–56, selection of mutants from the progenitor collection for the condensed collection and estimation of timing for condensed collection construction: ≈3 hStep 57A, manual re-array and colony purification of progenitor collection for construction of condensed collection: variable—estimate the timing with kosudoku-condensetiming code in Step 54Step 57B, robotic re-array and colony purification of progenitor collection for construction of condensed collection: variable—estimate the timing with kosudoku-condensetiming code in Step 54Step 58, cryopreservation of re-arrayed condensed collection plates: variable—estimate the timing with kosudoku-condensetiming code in Step 54Steps 59–62, picking of colonies from colony purification Petri dishes: variable—estimate the timing with kosudoku-con-densetiming code in Step 54Steps 63–76, validation of the identities of mutants in the condensed collection by re-sequencing: variable—estimate the timing with kosudoku-condensetiming code in Step 54Steps 77–79, generation of the quality-controlled collection: variable—estimate the timing with kosudoku-condensetiming code in Step 54Box 2, pooling and cryopreservation of transposon insertion mutant collection: ≈1 d per 400 96-well plates or ≈1 min per 96-well plateBox 3, generation of pool amplicon libraries: ≈1 dBox 4, extraction of genomic DNA from pooled mutants: ≈6 hBox 5, combining of pool amplicon libraries and purification by molecular weight: ≈6 h

antIcIpateD resultsThe goal of this protocol is to produce a whole-genome knockout collection. In our experience with creating a whole-genome knockout collection for S. oneidensis by cataloging a progenitor collection of 40,000 transposon insertion mutants, we were able to find representative mutants for 3,667 genes, which we estimated to represent ≈97% of nonessential genes in the

taBle 2 | Troubleshooting table (continued).


76 The error rate in the condensed collection is high (>≈8%)

This is probably due to errors in manual re-arraying and colony picking

When picking mutants for testing, sterilize a toothpick and then pierce the aluminum seal covering the plate with it. After retrieving the sample, cover the hole with a Tough-Spot so that it is possible to later check that the correct mutant was retrieved. To date, we have not seen a failure of prediction by Knockout Sudoku that could not be explained by a failure to retrieve the correct mutant from the progenitor collection. In all these cases, we were able to use the progenitor col-lection catalog to track the mutant identified by Sanger sequencing back to a well adjacent to the one that was supposed to be picked from and were able to identify this failure; thanks to the Tough-Spot patch

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol



S. oneidensis genome3. When using this protocol, if the progenitor collection size is chosen appropriately, similar coverage should be achieved. To assist in reaching this goal, we have summarized anticipated intermediate results from the protocol to allow the experimenters to check their progress.step 28 and Box 3

The molecular weight distributions of the pool amplicon libraries should show continuous bands as in supplementary Figure 3; if they have more distinct bands, such as those in supplementary Figure 4, see the Troubleshooting section.step 51

Using the progenitor collection condensation program (kosudoku-condense) to select a complete, nonredundant set of gene disruption mutants from the progenitor collection, for the S. oneidensis progenitor collection, we found disruption mutants for 3,667 genes.step 69 and Box 3

The molecular weight distribution of each pool should look considerably less continuous than that shown in supplementary Figure 3, but it should not show considerably more band structure than that shown in supplementary Figure 4. If the molecular weight distribution is not as expected, see the Troubleshooting section.step 70 and Box 5

The anticipated molecular weight distribution for the combined pool amplicon libraries is shown in supplementary Figure 5.steps 75 and 76

When validating the identities of the mutants in the condensed collection, the number of correct predictions should be ~92% or greater. If not, see the Troubleshooting section.step 79

The quality-controlled collection for S. oneidensis contains a single representative for 3,667 genes out of a total of 4,587 genes in the S. oneidensis genome and for ≈99% of the genes disrupted in the progenitor collection.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

acKnowleDGMents We thank N. Ando, K. Davis, A. Palmer, S. Meisburger, B. Chang, E. Adler, K. Malzbender, and C. Kyauk for experimental assistance; W. Metcalf for providing the E. coli strain WM3064; J. Gralnick for providing Shewanella oneidensis MR-1; L. Kovacs, J. Miller, L.R. Parsons, S. Silverman, W. Wang, and J. Wiggins for assistance with next-generation sequencing and media preparation; and N. Ando for critical reading of the manuscript. This work was supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and Princeton University startup funds (B.B.) and Fred Fox Class of 1939 funds (I.A.A.).

autHor contrIButIons M.B. and B.B. conceived the Knockout Sudoku method; L.S., M.B., and B.B. conceived and designed the experiments. I.A.A., L.S., O.A., and B.B performed the experiments; I.A.A., O.A., and B.B. wrote the manuscript; I.A.A., O.A., and B.B. prepared the figures and/or tables; B.B. analyzed the data and developed the kosudoku software; all authors reviewed and revised the manuscript.

coMpetInG FInancIal Interests The authors declare competing financial interests: details are available in the online version of the paper.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html. Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1. van Opijnen, T. & Camilli, A. Transposon insertion sequencing: a new tool for systems-level analysis of microorganisms. Nat. Rev. Microbiol. 11, 435–442 (2013).

2. Hutchison, C.A. et al. Design and synthesis of a minimal bacterial genome. Science 351, aad6253 (2016).

3. Baym, M., Shaket, L., Anzai, I.A., Adesina, O. & Barstow, B. Rapid construction of a whole-genome transposon insertion collection for Shewanella oneidensis by Knockout Sudoku. Nat. Commun. 7, 13270 (2016).

4. Richie, D.L. et al. Identification and evaluation of novel acetolactate synthase inhibitors as antifungal agents. Antimicrob. Agents Chemother. 57, 2272–2280 (2013).

5. Brotcke, A. & Monack, D.M. Identification of fevR, a novel regulator of virulence gene expression in Francisella novicida. Infect. Immun. 76, 3473–3480 (2008).

6. Ahlund, M.K., Rydén, P., Sjöstedt, A. & Stöven, S. Directed screen of Francisella novicida virulence determinants using Drosophila melanogaster. Infect. Immun. 78, 3118–3128 (2010).

7. Fey, P.D. et al. A genetic resource for rapid and comprehensive phenotype screening of nonessential Staphylococcus aureus genes. MBio 4, e00537-12 (2013).

8. Zerikly, M., & Challis, G.L. Strategies for the discovery of new natural products by genome mining. Chembiochem 10, 625–633 (2009).

9. Seyedsayamdost, M.R. High-throughput platform for the discovery of elicitors of silent bacterial gene clusters. Proc. Natl. Acad. Sci. USA 111, 7266–7271 (2014).

10. Winzeler, E.A. et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901–906 (1999).

11. Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol. 2, 2006.0008 (2006).

12. Giaever, G. & Nislow, C. The yeast deletion collection: a decade of functional genomics. Genetics 197, 451–465 (2014).

13. van Opijnen, T., Bodi, K.L. & Camilli, A. Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nat. Methods 6, 767–772 (2009).

14. Erlich, Y. et al. DNA Sudoku--harnessing high-throughput sequencing for multiplexed specimen analysis. Genome Res. 19, 1243–1253 (2009).

15. Goodman, A.L. et al. Identifying genetic determinants needed to establish a human gut symbiont in its habitat. Cell Host Microbe 6, 279–289 (2009).

16. Goodman, A.L., Wu, M. & Gordon, J.I. Identifying microbial fitness determinants by insertion sequencing using genome-wide transposon mutant libraries. Nat. Protoc. 6, 1969–1980 (2011).

17. Gallagher, L.A. et al. Sequence-defined transposon mutant library of Burkholderia thailandensis. MBio 4, e00604–e00613 (2013).

18. Porwollik, S. et al. Defined single-gene and multi-gene deletion mutant collections in Salmonella enterica sv Typhimurium. PLoS One 9, e99820 (2014).

19. Gallagher, L.A. et al. Resources for genetic and genomic analysis of emerging pathogen Acinetobacter baumannii. J. Bacteriol. 197, 2027–2035 (2015).

20. Vandewalle, K. et al. Characterization of genome-wide ordered sequence-tagged Mycobacterium mutant libraries by Cartesian pooling-coordinate sequencing. Nat. Commun. 6, 7106 (2015).

21. Lane, J.M. & Rubin, E.J. Scaling down: a PCR-based method to efficiently screen for desired knockouts in a high density Mycobacterium tuberculosis picked mutant library. Tuberculosis (Edinb) 86, 310–313 (2006).



http://www.nature.com/reprints/index.html

http://www.nature.com/reprints/index.html

©20

17 M

acm

illan

Pu

blis

her

s L

imit

ed, p

art

of

Sp

rin

ger

Nat

ure

. All

rig

hts

res

erve

d.


protocol


22. Chun, K.T., Edenberg, H.J., Kelley, M.R. & Goebl, M.G. Rapid amplification of uncharacterized transposon-tagged DNA sequences from genomic DNA. Yeast 13, 233–240 (1997).

23. Manoil, C. Tagging exported proteins using Escherichia coli alkaline phosphatase gene fusions. Methods Enzymol. 326, 35–47 (2000).

24. Jacobs, M.A. et al. Comprehensive transposon mutant library of Pseudomonas aeruginosa. Proc. Natl. Acad. Sci. USA 100, 14339–14344 (2003).

25. Spradling, A.C., Bellen, H.J. & Hoskins, R.A. Drosophila P elements preferentially transpose to replication origins. Proc. Natl. Acad. Sci. USA 108, 15948–15953 (2011).

26. Barquist, L., Boinett, C.J. & Cain, A.K. Approaches to querying bacterial genomes with transposon-insertion sequencing. RNA Biol. 10, 1161–1169 (2014).

27. Picelli, S. et al. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res. 24, 2033–2040 (2014).

28. Deutschbauer, A. et al. Evidence-based annotation of gene function in Shewanella oneidensis MR-1 using genome-wide fitness profiling across 121 conditions. PLoS Genet. 7, e1002385 (2011).

29. Bouhenni, R., Gehrke, A. & Saffarini, D. Identification of genes involved in cytochrome c biogenesis in Shewanella oneidensis, using a modified mariner transposon. Appl. Environ. Microbiol. 71, 4935–4937 (2005).

30. Melnyk, R.A., Clark, I.C., Liao, A. & Coates, J.D. Transposon and deletion mutagenesis of genes involved in perchlorate reduction in Azospira suillum PS. MBio 5 e00769-13 (2013).

31. Zou, L. et al. SlyA regulates Type III Secretion System (T3SS) genes in parallel with the T3SS master regulator HrpL in Dickeya dadantii 3937. Appl. Environ. Microbiol. 78, 2888–2895 (2012).

32. Maier, T.M., Pechous, R., Casey, M., Zahrt, T.C. & Frank, D.W. In vivo Himar1-based transposon mutagenesis of Francisella tularensis. Appl. Environ. Microbiol. 72, 1878–1885 (2006).

33. Rollefson, J.B., Levar, C.E. & Bond, D.R. Identification of genes involved in biofilm formation and respiration via mini-Himar transposon mutagenesis of Geobacter sulfurreducens. J. Bacteriol. 191, 4207–4217 (2009).

34. Bonis, B.M. & Gralnick, J.A. Marinobacter subterrani, a genetically tractable neutrophilic Fe(II)-oxidizing strain isolated from the Soudan Iron Mine. Front. Microbiol. 6, 719 (2015).

35. Yu, R. & Kaiser, D. Gliding motility and polarized slime secretion. Mol. Microbiol. 63, 454–467 (2007).

36. Dey, A. & Wall, D. A genetic screen in Myxococcus xanthus identifies mutants that uncouple outer membrane exchange from a downstream cellular response. J. Bacteriol. 196, 4324–4332 (2014).

37. Zhu, L.-P. et al. Allopatric integrations selectively change host transcriptomes, leading to varied expression efficiencies of exotic genes in Myxococcus xanthus. Microb. Cell Fact. 14, 105 (2015).

38. Kwon, Y.M., Ricke, S.C. & Mandal, R.K. Transposon sequencing: methods and expanding applications. Appl. Microbiol. Biotechnol. 100, 31–43 (2015).

39. Kumar, A. Using yeast transposon-insertion libraries for phenotypic screening and protein localization. Cold Spring Harb. Protoc. 2016, http://dx.doi.org/10.1101/pdb.prot085217 (2016).

40. Alonso, J.M. et al. Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301, 653–657 (2003).

41. Newman, D. & Kolter, R. A role for excreted quinones in extracellular electron transfer. Nature 405, 94–97 (2000).

http://dx.doi.org/10.1101/pdb.prot085217

protocol rapid curation of gene disruption collections ... · quorum sensing, or, in our case,...

Documents