Introduction

In addition to protein-coding sequences, the essential genome of any organism contains essential structural elements, non-coding RNAs and regulatory sequences. We have identified the Caulobacter crescentus essential genome to 8 bp resolution by performing ultrahigh-resolution transposon mutagenesis followed by high-throughput DNA sequencing to determine the transposon insertion sites. A notable feature of C. crescentus is that the regulatory events that control polar differentiation and cell-cycle progression are highly integrated and occur in a temporally restricted order (McAdams and Shapiro, 2011). Many components of the core regulatory circuit have been identified, and simulation of the circuitry has been reported (Shen et al, 2008). Identifying all essential DNA elements is necessary for a complete understanding of the regulatory networks that run a bacterial cell.

Essential protein-coding sequences have been reported for several bacterial species using relatively low-throughput transposon mutagenesis (Hutchison et al, 1999; Jacobs et al, 2003; Glass et al, 2006) and in-frame deletion libraries (Kobayashi et al, 2003; Baba et al, 2006). Two recent studies used high-throughput transposon mutagenesis for fitness and genetic interaction analysis (Langridge et al, 2009; van Opijnen et al, 2009). Here, we have reliably identified all essential coding and non-coding chromosomal elements using a hyper-saturated transposon mutagenesis strategy that is scalable and can be extended to rapidly and accurately identify the entire essential genome of any bacterial species at a resolution of a few base pairs.

Results and discussion

We engineered a Tn5 derivative transposon (Tn5Pxyl) that carries at one end an inducible, outward-pointing Pxyl promoter (Christen et al, 2010; Supplementary Figure 1A; Materials and methods).
Thus, the Tn5Pxyl element can activate or disrupt transcription at any site of integration, depending on the insertion orientation. About 8 × 10^5 viable Tn5Pxyl transposon insertion mutants capable of colony formation on rich media (PYE) plates were pooled. Next, DNA from hundreds of thousands of transposon insertion sites, reading outwards into flanking genomic regions, was amplified by parallel PCR and sequenced by Illumina paired-end sequencing (Figure 1; Supplementary Figure 1B; Materials and methods). A single sequencing run yielded 118 million raw sequencing reads. Of these, >90 million (>80%) read outward from the transposon element into adjacent genomic DNA regions (Supplementary Figure 1C) and were subsequently mapped to the 4-Mbp genome, allowing us to determine the location and orientation of 428 735 independent transposon insertions with base-pair accuracy (Figure 2A; Materials and methods). Eighty percent of the genome sequence showed an ultrahigh density of transposon hits, with an average of one insertion event every 7.65 bp. The largest gap detectable between consecutive insertions was [...]

[...] 6% of all essential ORFs (30 out of 480) appear to be shorter than the annotated ORF (Supplementary Table 1), suggesting that these are probably mis-annotated as well. Thus, 145 ORFs were essential along their entire length, 60 ORFs had non-essential C-termini and the start sites of 30 ORFs were mis-annotated. The remaining 245 ORFs tolerated occasional insertions within a few amino acids of the ORF boundaries (Supplementary Figure 3; Materials and methods).

The majority of the essential ORFs have annotated functions. They participate in diverse core cellular processes such as ribosome biogenesis, energy conversion, metabolism, cell division and cell-cycle control. Forty-nine of the essential proteins are of unknown function (Table I; Supplementary Table 2).
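As an illustration of how insertion density and insertion-free gaps of the kind reported above might be computed from mapped insertion coordinates, the following is a minimal sketch and not the pipeline used in this study. It assumes 0-based coordinates on a circular chromosome; the function names and the `min_gap` threshold are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def insertion_gaps(positions: List[int], genome_length: int) -> List[Tuple[int, int, int]]:
    """Return (left_site, right_site, gap_length) for every pair of
    consecutive unique insertion sites on a circular chromosome.

    gap_length counts the bases strictly between the two sites.
    Coordinates are assumed to be 0-based (an assumption of this sketch).
    """
    sites = sorted(set(positions))
    if not sites:
        raise ValueError("at least one insertion site is required")
    gaps = []
    for left, right in zip(sites, sites[1:]):
        gaps.append((left, right, right - left - 1))
    # Close the circle: bases after the last site plus bases before the first.
    wrap = (genome_length - sites[-1] - 1) + sites[0]
    gaps.append((sites[-1], sites[0], wrap))
    return gaps


def candidate_essential_regions(gaps: List[Tuple[int, int, int]],
                                min_gap: int) -> List[Tuple[int, int, int]]:
    """Keep gaps at least min_gap bases long (min_gap is a hypothetical cutoff)."""
    return [g for g in gaps if g[2] >= min_gap]


def mean_spacing(positions: List[int], genome_length: int) -> float:
    """Average number of bases per unique insertion site."""
    return genome_length / len(set(positions))
```

Long insertion-free gaps are only candidates for essential elements; deciding which gaps are statistically unexpected requires the essentiality analysis described in the Supplementary information.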
We attempted to delete 11 of the genes encoding essential hypothetical proteins and recovered no in-frame deletions, confirming that these proteins are indeed essential (Supplementary Table 3). Among the 480 essential ORFs, there were 10 essential transcriptional regulatory proteins (Supplementary Table 4), including the cell-cycle regulators ctrA, gcrA, ccrM, sciP and dnaA (McAdams and Shapiro, 2003; Holtzendorff et al, 2004; Collier and Shapiro, 2007; Gora et al, 2010; Tan et al, 2010), plus 5 uncharacterized putative transcription factors. We surmise that these five uncharacterized transcription factors are either transcriptional activators of essential genes or repressors of genes that would move the cell out of its replicative state. In addition, two RNA polymerase sigma factors, RpoH and RpoD, as well as the anti-sigma factor ChrR, which mitigates the rpoE-dependent stress response under physiological growth conditions (Lourenco and Gomes, 2009), were found to be essential. Thus, a set of 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti-sigma factor comprises the essential core transcriptional regulators for growth on rich media.

Essential promoter elements

To characterize the core components of the Caulobacter cell-cycle control network, we identified essential regulatory sequences and operon transcripts (Supplementary Data-DT3 and DT4). Figure 3A illustrates the transposon scanning strategy used to locate essential promoter sequences. The promoter regions of 210 essential genes were fully contained within the upstream intergenic sequences, and the promoter regions of 101 essential genes extended upstream into flanking ORFs (Table I). We also identified 206 essential genes that are co-transcribed with the corresponding flanking gene(s) and experimentally mapped 91 essential operon transcripts (Table I; Supplementary Data-DT4). One example of an essential operon is the transcript encoding ATP synthase components (Figure 3B).
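One simple way to approximate the extent of an essential promoter region from insertion data is to measure the insertion-free stretch immediately upstream of a start codon. The sketch below assumes exactly that definition; it ignores transposon orientation and chromosome circularity, and the function name and coordinate convention (0-based position of the first base of the start codon) are hypothetical. It should not be read as the scanning procedure of Figure 3A.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence


def upstream_insertion_free_length(sorted_sites: Sequence[int],
                                   start_codon: int,
                                   strand: str) -> Optional[int]:
    """Bases between the start codon and the nearest upstream insertion.

    sorted_sites must be sorted in ascending order. For a '+' strand gene
    upstream means lower coordinates; for a '-' strand gene it means higher
    coordinates. Returns None when no insertion lies upstream on the linear
    coordinate axis (wrap-around is deliberately not handled here).
    """
    if strand == "+":
        i = bisect_left(sorted_sites, start_codon)  # number of sites < start_codon
        if i == 0:
            return None
        return start_codon - sorted_sites[i - 1] - 1
    if strand == "-":
        i = bisect_right(sorted_sites, start_codon)  # first site > start_codon
        if i == len(sorted_sites):
            return None
        return sorted_sites[i] - start_codon - 1
    raise ValueError("strand must be '+' or '-'")
```

Comparing this length with the distance to the neighbouring ORF would indicate whether the insertion-free region stays within the intergenic sequence or extends into the flanking gene.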
Altogether, the 480 essential protein-coding and 37 essential RNA-coding Caulobacter genes are organized into operons such that 402 individual promoter regions are sufficient to regulate their expression (Table I). Of these 402 essential promoters, the transcription start sites (TSSs) of 105 were previously identified (McGrath et al, 2007). We found that 79 of these 105 essential promoter regions extended, on average, 53 bp upstream of the previously identified TSS (Figure 3C; McGrath et al, 2007). These essential control elements accommodate binding sites for transcription factors and RNA polymerase sigma factors (Supplementary Table 5). Of the 402 essential promoter regions, 26 mapped downstream of the predicted TSS. To determine whether these contained an additional TSS, we fused the newly identified promoter regions with lacZ and found that 24 contained an additional TSS (Supplementary Table 6). Therefore, 24 genes contain at least two TSSs, of which only the downstream site was found to be essential during growth on rich media. The upstream TSS may be required under alternative growth conditions.

Cell cycle-regulated essential genes

Of the essential ORFs, 84 have a cell cycle-dependent transcription pattern (McGrath et al, 2007; Supplementary Data-DT5). The cell cycle-regulated essential genes had significantly longer promoter regions than non-cell cycle-regulated genes (median length 87 versus 41 bp; Mann–Whitney test, P = 0.0018). The genes with longer promoter regions generally have more complex transcriptional control. Among these are key genes critical for the commitment to energy requirements and for the regulatory control of cell-cycle progression. For example, the cell-cycle master regulators ctrA, dnaA and gcrA (Collier et al, 2006) ranked among the genes with the longest essential promoter regions (Figure 3D and E; Supplementary Data-DT5).
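For readers unfamiliar with the Mann–Whitney test used for the promoter-length comparison, the sketch below shows one common formulation: the U statistic with average ranks for ties and a two-sided P-value from the normal approximation, without continuity correction. It is an illustrative, standard-library-only implementation under those assumptions, not the statistical code used for the analysis reported here, and the normal approximation is only reasonable for moderately large samples.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided P) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u_x, n1 * n2 - u_x)
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_two_sided
```

In use, `x` and `y` would hold the essential promoter lengths (in bp) of the cell cycle-regulated and non-cell cycle-regulated genes, respectively; a rank-based test is a natural choice here because it does not assume that promoter lengths are normally distributed.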
Other essential cell cycle-regulated genes with exceptionally long essential promoters included ribosomal genes, gyrB encoding DNA gyrase and the ftsZ cell-division gene (Figure 3E). The essential promoter region of ctrA extended 171 bp upstream of the start codon (Figure 3F) and included two previously characterized promoters that control its transcription by both positive and negative feedback regulation (Domian et al, 1999; Tan et al, 2010). Only one of the two upstream SciP binding sites in the ctrA promoter (Tan et al, 2010) was contained within the essential promoter region (Figure 3F), suggesting that the regulatory function of the second, more upstream SciP binding site is non-essential for growth on rich media.

Altogether, the essential Caulobacter genome contains at least 492 941 bp. Essential protein-coding sequences comprise 90% of the essential genome. The remaining 10% consists of essential non-coding RNA sequences, gene regulatory elements and essential genome replication features (Table I). Essential genome features are non-uniformly distributed along the Caulobacter genome and are enriched near the origin and terminus regions, indicating that there are constraints on the chromosomal positioning of essential elements (Figure 4A). The published E. coli essential coding sequences are preferentially located on either side of the origin (Figure 4A; Rocha, 2004).

The minimum gene set required for prokaryotic life has generally been estimated by comparative essentiality analysis (Carbone, 2006) and, for a few species, determined experimentally via large-scale gene perturbation studies (Akerley et al, 1998; Hutchison et al, 1999; Kobayashi et al, 2003; Salama et al, 2004). Of the 480 essential Caulobacter ORFs, 38% are absent in most species outside the α-proteobacteria and 10% are unique to Caulobacter (Figure 4B). Interestingly, among 320 essential Caulobacter proteins that are conserved in E. coli, more than one third are non-essential in E. coli (Figure 4C). The variations in essential gene complements relate to differences in bacterial physiology and lifestyle. For example, ATP synthase components are essential for Caulobacter, but not for E. coli, since Caulobacter cannot produce ATP through fermentation. Thus, the essentiality of a gene is also defined by non-local properties that depend not only on its own function but also on the functions of all other essential elements in the genome. The strategy described here provides a direct experimental approach that, because of its simplicity and general applicability, can be used to quickly determine the essential genome for a large class of bacterial species.

Materials and methods

Supplementary information includes descriptions of (i) transposon construction and mutagenesis, (ii) DNA library preparation and sequencing, (iii) sequence processing, (iv) essentiality analysis and (v) statistical data analysis.

Supplementary Material

Supplementary Information: Supplementary Figures S1–3, Supplementary Tables S1–7
Dataset 1: Excel file containing several supplementary data tables in different worksheets
Review Process File