      Genomic Analysis of Bacillus licheniformis CBA7126 Isolated from a Human Fecal Sample

      data-paper

          Abstract

Introduction

Bacillus licheniformis is a Gram-positive, endospore-forming, saprophytic organism that occurs on plants and in soil (Veith et al., 2004). Taxonomically, it is closely related to Bacillus subtilis (Lapidus et al., 2002; Xu and Côte, 2003; Rey et al., 2004). Most bacilli are predominantly aerobic; B. licheniformis, however, is a facultative anaerobe, which may allow it to occupy additional ecological niches (Alexander, 1977). The commercial utility of its extracellular products makes B. licheniformis an economically interesting species (Kovács et al., 2009). For example, B. licheniformis is used industrially to manufacture biochemicals, enzymes, antibiotics, and aminopeptidase. Several enzymes, including proteases, α-amylase, penicillinase, pentosanase, cycloglucosyltransferase, β-mannanase, and certain pectinolytic enzymes, are produced industrially using B. licheniformis (Rodríguez-Absi and Prescott, 1978; Rey et al., 2004). The proteases are used in the detergent industry, and the amylases are used for starch hydrolysis, desizing of textiles, and sizing of paper (Erickson, 1976). In addition, certain strains are used to produce peptide antibiotics, specialty chemicals, and poly-γ-glutamic acid (Nierman and Maglott, 1989; Rey et al., 2004).

The annotated genome sequence of B. licheniformis has previously been analyzed to assess the biotechnological importance of the organism (Veith et al., 2004). Since that first sequencing, the genomes of additional B. licheniformis strains have been sequenced to more fully realize the industrial potential of the species. In this study, the genome of B. licheniformis CBA7126, isolated from a human fecal sample, was sequenced to understand the specific features of this strain. The genome sequence of CBA7126 revealed features such as stress response genes, antibiotic-resistance genes, and genes for resistance to toxic compounds, which are of considerable biotechnological value.
Materials and methods

Bacterial isolation, culture conditions, and DNA extraction

B. licheniformis CBA7126 was isolated from the feces of a 74-year-old man in Geochang-gun, South Korea, and was cultured under anaerobic conditions at 37°C for 48 h in Gifu Anaerobic Medium (GAM), containing per liter of deionized distilled water: 10 g peptone, 3 g soytone, 10 g proteose peptone, 13.5 g bovine serum albumin, 5 g yeast extract, 2.2 g beef extract, 2.5 g monopotassium phosphate, 1.2 g liver extract, 3 g sodium chloride, 0.3 g L-cysteine, 0.3 g sodium thioglycollate, 3 g dextrose, and 5 g soluble starch. Genomic DNA of strain CBA7126 was extracted using the QIAamp DNA extraction kit (Qiagen, USA) and the QuickGene DNA tissue kit S (Kurabo, Japan), and purified using the MG genomic DNA purification kit (Doctor Protein, Korea) according to the manufacturers' instructions. The purity and concentration of the extracted genomic DNA were measured using a NanoDrop spectrophotometer (NanoDrop Technologies, UK).

Genome sequencing, assembly, and annotation

The genome of B. licheniformis CBA7126 was sequenced using a 20-kb SMRTbell library and the PacBio RS II system (Pacific Biosciences, USA), and de novo assembly was performed using the HGAP2 protocol in PacBio SMRT Analysis version 2.3.0. rRNAs and tRNAs were identified using RNAmmer 1.2 (Lagesen et al., 2007) and tRNAscan-SE 1.21 (Lowe and Eddy, 1997), respectively. Potential coding regions and functional genes were predicted using a combination of Glimmer 3.02 (Delcher et al., 1999), the COG database (Tatusov et al., 2003), Rapid Annotation using Subsystem Technology (RAST) (Aziz et al., 2008), and the National Center for Biotechnology Information (NCBI) prokaryotic genome annotation pipeline (PGAP) 4.1 (Tatusova et al., 2016). Prophages in the genome were identified using the PHAge Search Tool (PHAST) (Zhou et al., 2011). In addition, the pathogenicity of strain CBA7126 was predicted using PathogenFinder 1.1 (Cosentino et al., 2013). Carbohydrate-active enzymes were annotated using dbCAN (Yin et al., 2012).

Comparative genomic analysis

To identify the unique features of strain CBA7126, the genomes of related B. licheniformis and Bacillus sp. strains (B. licheniformis B4164, B. licheniformis VTM3R78, B. licheniformis V30, B. licheniformis B4124, and Bacillus sp. H15-1) were selected from the NCBI genome database (http://www.ncbi.nlm.nih.gov/genome/) for comparative genomic analysis. To assess overall genome relatedness, average nucleotide identity (ANI) and orthologous average nucleotide identity (OrthoANI) between B. licheniformis CBA7126 and the related strains were calculated using the ANI calculator (http://enve-omics.ce.gatech.edu/ani/) and the orthologous average nucleotide identity tool (OAT) of ChunLab (Lee et al., 2016), respectively. The genome structure of strain CBA7126 was compared with those of B. licheniformis B4164 (LQYQ00000000.1), B. licheniformis VTM3R78 (FOFE00000000.1), B. licheniformis V30 (LQRR00000000.1), Bacillus sp. H15-1 (CP018249.1), and B. licheniformis B4124 (LKPQ00000000.1), all of which have symmetric identity >97% with strain CBA7126, using the alignment program MAUVE (Darling et al., 2004). Pan-genome Orthologous Groups (POGs) were analyzed using the BIOiPLUG Comparative Genomics Database (https://www.bioiplug.com/). A Venn diagram was constructed based on the number of POGs of strain CBA7126 and the related strains. Clustered regularly interspaced short palindromic repeats (CRISPRs) were identified using CRISPRfinder (Grissa et al., 2007).

Multilocus sequence typing (MLST)

MLST analysis was performed based on internal sequences of the adk, ccpA, recF, rpoB, spo0A, and sucC genes (Larsen et al., 2012; Madslien et al., 2012). The sequence type of strain CBA7126 was determined using the MLST 1.8 database (https://cge.cbs.dtu.dk/services/MLST/) for B. licheniformis (Larsen et al., 2012).
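
As a rough illustration of how a six-locus MLST scheme assigns a sequence type, the sketch below looks up an allele profile in a profile table by exact match. The only profile included is the one reported for strain CBA7126 in the Results (adk_2, ccpA_1, recF_1, rpoB_1, spo0A_1, sucC_2; sequence type 3). The table layout and the function name `assign_sequence_type` are assumptions made for illustration; the sketch does not reproduce the MLST 1.8 service or its full database.

```python
# Hypothetical sketch of MLST sequence-type assignment by exact profile match.
LOCI = ("adk", "ccpA", "recF", "rpoB", "spo0A", "sucC")

# Profile table: allele numbers in LOCI order -> sequence type.
# Only the ST3 profile reported for strain CBA7126 is listed here;
# a real scheme contains many more rows.
PROFILES = {
    (2, 1, 1, 1, 1, 2): 3,
}


def assign_sequence_type(alleles):
    """Return the sequence type for a dict of locus -> allele number.

    Returns None when the profile has no exact match in PROFILES.
    """
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise ValueError(f"missing loci: {missing}")
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


cba7126 = {"adk": 2, "ccpA": 1, "recF": 1, "rpoB": 1, "spo0A": 1, "sucC": 2}
print(assign_sequence_type(cba7126))  # 3
```

A profile that differs at any single locus returns `None` in this sketch, which corresponds to an unassigned or novel sequence type.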
Ethics approval

The study protocol was approved by the institutional review board of the Theragen ETEX Bio Institute (700062-20160804-JR-005-02).

Results

General genomic features of B. licheniformis CBA7126

The genome of B. licheniformis CBA7126 was 4,216,931 bp long with a G + C content of 46.24 mol% (Table 1). The assembly comprised two contigs of 4,209,959 and 6,972 bp. The genome contained 4,276 coding sequences, 24 rRNA genes (eight copies of the 16S-5S-23S rRNA gene operon), and 81 tRNA genes (Figure 1). For functional classification, the genome of strain CBA7126 was analyzed using the Cluster of Orthologous Groups (COG) database (http://www.ncbi.nlm.nih.gov/COG/), and 3,743 genes were annotated. The annotated genes belonged to the following categories: function unknown (S; 884 genes), general function prediction only (R; 344 genes), transcription (K; 319 genes), carbohydrate transport and metabolism (G; 316 genes), amino acid transport and metabolism (E; 298 genes), inorganic ion transport and metabolism (P; 219 genes), energy production and conversion (C; 180 genes), replication, recombination, and repair (L; 140 genes), and secondary metabolite biosynthesis, transport, and catabolism (Q; 62 genes) (Supplementary Table 1). In addition, SEED viewer version 2.0 revealed that >9% of the major categories contained genes required for metabolism of “carbohydrates” (610 genes), “amino acids and derivatives” (457 genes), and “cofactors, vitamins, prosthetic group, pigments” (280 genes) (Supplementary Figure 1). A total of 193 CAZyme-encoding genes were annotated using dbCAN, including 5 for auxiliary activities (AAs), 39 for carbohydrate-binding modules (CBMs), 36 for carbohydrate esterases (CEs), 68 for glycoside hydrolases (GHs), 39 for glycosyl transferases (GTs), and 6 for polysaccharide lyases (PLs).

Table 1. Features of the Bacillus licheniformis CBA7126 genome.

Feature                       B. licheniformis CBA7126
Sequencing platform           PacBio RS II system
Assembler                     PacBio SMRT Analysis 2.3.0
Assembly accession            GCA_001950175.1
Methods reads                 90,824
Assembly size (bp)            4,216,931
Contig numbers                2
N50                           4,209,959
L50                           1
Genome coverage               319.23
DNA G + C content (mol%)      46.24
CDSs                          4,276
rRNA number                   24
tRNA number                   81
Genes assigned to COGs        3,743
CRISPRs                       0

Figure 1. Graphic circular map of the Bacillus licheniformis CBA7126 genome. The outer circle shows RNA genes (red, tRNA; blue, rRNA) and genes on the sense and antisense strands (colored according to COG categories), shown from the outside of the circle to the center. The inner circle shows the GC skew, with yellow and blue indicating positive and negative values, respectively; the GC content is indicated in red and green. This genome map was visualized using CLgenomics 1.55 (Chun Lab Inc.).

Comparative genomic data

Analysis of the OrthoANI values among Bacillus genome sequences with symmetric identity of >97% revealed that B. licheniformis CBA7126 shares more than 99% genome sequence similarity with the other strains. The genome of strain CBA7126 was closest to that of B. licheniformis VTM3R78 (99.99% OrthoANI), followed by B. licheniformis B4164 (99.98%), Bacillus sp. H15-1 (99.85%), B. licheniformis B4124 (99.81%), and B. licheniformis V30 (99.80%) (Supplementary Figure 2). Similar results were obtained using ANI. According to Lee et al. (2016), similarity values above 95–96% indicate that two strains belong to the same species. Strain CBA7126 was therefore confirmed to belong to B. licheniformis.

The genome of strain CBA7126 was aligned using MAUVE with those of B. licheniformis B4164, B. licheniformis VTM3R78, B. licheniformis V30, Bacillus sp. H15-1, and B. licheniformis B4124, all of which share more than 97% symmetric identity with it. The genomic representations of the other strains were rearranged based on the structure of strain CBA7126.
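
The assembly statistics in Table 1 can be cross-checked from the two contig lengths reported above. The sketch below is a minimal illustration that assumes the common definitions of N50 (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly) and L50 (the number of contigs in that set). The function name `n50_l50` is hypothetical, and this is not the assembler's own code.

```python
def n50_l50(contig_lengths):
    """Return (total_size, N50, L50) for a list of contig lengths in bp."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    # Walk the contigs from longest to shortest until half the assembly is covered.
    for count, length in enumerate(lengths, start=1):
        running += length
        if running * 2 >= total:
            return total, length, count
    raise ValueError("contig_lengths must not be empty")


contigs = [4_209_959, 6_972]  # contig lengths reported for strain CBA7126
total, n50, l50 = n50_l50(contigs)
print(total, n50, l50)  # 4216931 4209959 1
```

With these inputs the sketch reproduces the assembly size (4,216,931 bp), N50 (4,209,959), and L50 (1) listed in Table 1.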
Gene order was compared for seven regions defined as Local Collinear Blocks (LCBs). The structure of strain CBA7126 was similar to that of B. licheniformis B4124 and B. licheniformis V30 (Supplementary Figure 3). Comparison of the genomic structure of strain CBA7126 with that of Bacillus sp. H15-1 showed that two major regions were in the opposite orientation. POG analysis of strain CBA7126 and the closely related strains identified 4,108 shared genes and 137 unique genes (Supplementary Figure 4). Of the unique genes, 19 belonged to strain CBA7126: one poly (glycerol-phosphate) alpha-glucosyltransferase, one thymidylate synthase (FAD), two prophage-derived proteins, and 15 hypothetical proteins. Based on KEGG analysis, three of these 19 genes were assigned to one carbon pool by folate, pyrimidine metabolism, and metabolic pathways. In addition, CRISPR analysis indicated that strain CBA7126 did not harbor any known CRISPRs.

Phage and pathogenesis-related genes

PHAST analysis was performed to identify prophage sequences in the genome of strain CBA7126. Contig 1 contained three intact and two incomplete prophages, whereas contig 2 contained only one incomplete prophage (Supplementary Figure 5). The intact prophage regions were located at positions 1,596,547–1,623,555, 1,775,723–1,820,161, and 3,429,284–3,483,201 bp. PathogenFinder 1.1 predicted strain CBA7126 to be a human pathogen with a probability of 0.81. Analysis of pathogenesis-related genes showed that all 238 analyzed genes encoded pathogenesis-associated proteins.

Multilocus sequence typing (MLST) analysis

MLST analysis of strain CBA7126 was performed using six housekeeping genes (adk, ccpA, recF, rpoB, spo0A, and sucC). The analysis showed that strain CBA7126 belongs to sequence type 3, as it harbors the alleles adk_2, ccpA_1, recF_1, rpoB_1, spo0A_1, and sucC_2 (Supplementary Table 2). Previously reported isolates of sequence type 3 are B. licheniformis NVH1023, F5520, CCUG41412, NVH1111, NVH1113, LMG17661, and M3.

Stress response genes and resistance to toxic compounds

Annotation with NCBI PGAP 4.1 showed that the genome of strain CBA7126 harbors several stress response genes and various genes required for resistance to antibiotics and toxic compounds (Tatusova et al., 2016). The identified stress tolerance genes encode general stress proteins (WP_003179040.1; WP_009329495.1; WP_011198337.1; WP_003186243.1), universal stress proteins (WP_011197701.1; WP_003178013.1), cold shock proteins (WP_003153604.1; WP_003179166.1), and the UV-damage repair protein UvrX (WP_003183238.1). These genes are closely associated with bacterial survival in the natural environment. The genes identified for resistance to toxic compounds encode monooxygenase (required for antibiotic resistance) (WP_017473926.1; WP_003181975.1), L-asparaginase (WP_003183042.1; WP_003183042.1; WP_061565867.1), the multidrug resistance protein NorM (WP_009328059.1), YkkD (WP_003180981.1), YkkC (WP_003180979.1), arginase (WP_009330115.1; WP_009330115.1; WP_003178878.1; WP_003178436.1), chemical damaging agent resistance protein C (WP_017474008.1; WP_003178723.1), toxic anion resistance protein (WP_003178733.1), lantibiotic-related proteins (WP_003186355.1; WP_003186351.1; WP_003186379.1; WP_003186381.1), bacitracin, and various proteins of the ABC transporter family. Among the products of these genes, L-asparaginase, arginase, lantibiotics, and bacitracin have industrial applications.

Data access

The genome sequence of B. licheniformis CBA7126 has been deposited in DDBJ/ENA/GenBank under the accession numbers BDJJ01000001–BDJJ01000002.

Author contributions

SR and YN designed and coordinated all the experiments. HS performed cultivation, DNA extraction, and purification.
CL, JK, HS, YK, YC, and CY performed the sequencing, genome assembly, gene prediction, gene annotation, and comparative genomic analysis. CL, YN, and SR wrote the manuscript. All authors have read and approved the manuscript.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

          Most cited references

          tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence

            The complete genome sequence of Bacillus licheniformis DSM13, an organism with great industrial potential.

            The genome of Bacillus licheniformis DSM13 consists of a single chromosome that has a size of 4,222,748 base pairs. The average G+C ratio is 46.2%. 4,286 open reading frames, 72 tRNA genes, 7 rRNA operons and 20 transposase genes were identified. The genome shows a marked co-linearity with Bacillus subtilis but contains defined inserted regions that can be identified at the sequence as well as at the functional level. B. licheniformis DSM13 has a well-conserved secretory system, no polyketide biosynthesis, but is able to form the lipopeptide lichenysin. From the further analysis of the genome sequence, we identified conserved regulatory DNA motifs, the occurrence of the glyoxylate bypass and the presence of anaerobic ribonucleotide reductase explaining that B. licheniformis is able to grow on acetate and 2,3-butanediol as well as anaerobically on glucose. Many new genes of potential interest for biotechnological applications were found in B. licheniformis; candidates include proteases, pectate lyases, lipases and various polysaccharide degrading enzymes. Copyright 2004 S. Karger AG, Basel

              Complete genome sequence of the industrial bacterium Bacillus licheniformis and comparisons with closely related Bacillus species

              Background Bacillus licheniformis is a Gram-positive, spore-forming bacterium widely distributed as a saprophytic organism in the environment. This species is a close relative of Bacillus subtilis, an organism that is second only to Escherichia coli in the level of detail at which it has been studied. Unlike most other bacilli, which are predominantly aerobic, B. licheniformis is a facultative anaerobe, which may allow it to grow in additional ecological niches. Certain B. licheniformis isolates are capable of denitrification; the relevance of this characteristic to environmental denitrification may be small, however, as the species generally persists in soil as endospores [1]. There are numerous commercial and agricultural uses for B. licheniformis and its extracellular products. The species has been used for decades in the manufacture of industrial enzymes including several proteases, α-amylase, penicillinase, pentosanase, cycloglucosyltransferase, β-mannanase and several pectinolytic enzymes. The proteases from B. licheniformis are used in the detergent industry as well as for dehairing and bating of leather [2,3]. Amylases from B. licheniformis are deployed for the hydrolysis of starch, desizing of textiles and sizing of paper [3]. Specific B. licheniformis strains are also used to produce peptide antibiotics such as bacitracin and proticin in addition to a number of specialty chemicals such as citric acid, inosine, inosinic acid and poly-γ-glutamic acid [4]. Some B. licheniformis isolates can mitigate the effects of fungal pathogens on maize, grasses and vegetable crops [5]. As an endospore-forming bacterium, the ability of the organism to survive under unfavorable environmental conditions may enhance its potential as a natural biocontrol agent. B. 
licheniformis can be differentiated from other bacilli on the basis of metabolic and physiological tests [6,7]; however, biochemical and phenotypic characteristics may be ambiguous among closely related species. Recent taxonomic studies indicate that B. licheniformis is closely related to B. subtilis and Bacillus amyloliquefaciens on the basis of comparisons of 16S rDNA and 16S-23S internal transcribed spacer (ITS) nucleotide sequences [8]. Lapidus et al. [9] recently constructed a physical map of the B. licheniformis chromosome using a PCR approach, and established a number of regions of colinearity where gene content and organization were conserved with the B. subtilis genome. Given that B. licheniformis is an industrial organism used for the manufacture of enzymes, antibiotics, and chemicals, important in nutrient cycling in the environment, and a species that is taxonomically related to B. subtilis, perhaps the best studied of all Gram-positive bacteria, we derived the complete nucleotide sequence of the B. licheniformis type strain (ATCC 14580) genome. With this data in hand, functional and comparative genomics studies can be initiated that may ultimately lead to new strategies for improving industrial strains as well as better understanding of genome evolution among the species within the subtilis-licheniformis group. Results and discussion General features of the B. licheniformis genome The genome of B. licheniformis ATCC 14580 consists of a circular chromosome of 4,222,336 base-pairs (bp) with an average G+C content of 46.2% (Table 1). No plasmids were found during the genome analysis, and none were found by agarose gel electrophoresis (data not shown). Using a combination of several gene-finding programs and manual inspection, 4,208 protein-coding sequences (CDSs) were predicted. These CDSs constitute 87% of the genome and have an average length of 873 bp (ranging from 78 to 10,767 bp). 
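
The coding-density figure quoted above can be reproduced approximately from the other numbers in this passage. The sketch below multiplies the CDS count by the average CDS length and divides by the chromosome size. Because it uses the rounded average length and ignores any overlap between CDSs, it is only an estimate, and the variable names are illustrative rather than taken from the source.

```python
genome_size_bp = 4_222_336   # circular chromosome of ATCC 14580
cds_count = 4_208            # predicted protein-coding sequences
mean_cds_length_bp = 873     # reported average CDS length

# Approximate coding bases as count x mean length (overlaps between CDSs ignored).
coding_bp = cds_count * mean_cds_length_bp
coding_fraction = coding_bp / genome_size_bp
print(f"{coding_bp} bp coding, {coding_fraction:.1%} of the genome")
# 3673584 bp coding, 87.0% of the genome
```

The estimate agrees with the reported 87% coding fraction.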
They are oriented on the chromosome primarily in the direction of replication (Figure 1) with 74.4% of the genes on the leading strand and 25.6% on the lagging strand. Among the 4,208 protein coding genes, 3,948 (94%) had significant similarity to proteins in PIR, 3,187 (76%) of these gene models contain Interpro motifs, and 2,895 (69%) contain protein motifs found in PFAM. The number of hypothetical and conserved hypothetical proteins in the B. licheniformis genome with hits in the PIR database was 1,318 (212 conserved hypothetical proteins). Among the list of hypothetical and conserved hypothetical gene products, 683 (52%) have protein motifs contained in PFAM (148 conserved hypothetical proteins). There are 72 tRNA genes representing all 20 amino acids and seven rRNA operons. The likely origin of replication (Figure 1) was identified by similarities to several features of the corresponding regions in B. subtilis and other bacteria. These included co-localization of four genes (rpmH, dnaA, dnaN, and recF) near the origin, GC nucleotide skew ((G-C)/(G+C)) analysis, and the presence of multiple dnaA-boxes and AT-rich sequences immediately upstream of the dnaA gene [10-12]. On the basis of these observations we assigned a cytosine residue of the BstBI restriction site between the rpmH and dnaA genes to be the first nucleotide of the B. licheniformis genome. The replication termination site was localized near 2.02 megabases (Mb) by GC skew analysis. This region lies roughly opposite the origin of replication (Figure 1). Unlike B. subtilis, there was no apparent gene encoding a replication terminator protein (rtp) in B. licheniformis. The Bacillus halodurans genome also lacks an obvious rtp function [13]; therefore, it seems likely that B. subtilis acquired the rtp gene following its divergence from B. halodurans and B. licheniformis. Transposable elements, prophages and atypical regions The genome of B. 
licheniformis ATCC 14580 contains nine identical copies of a 1,285 bp insertion sequence element termed IS3Bli1 [9]. This sequence shares a number of features with other IS3 family elements [9] including direct repeats of 3-5 bp, a 10-bp left inverted repeat, and a 9 bp right inverted repeat (Figure 2). IS3Bli1 encodes two predicted overlapping CDSs, designated orfA and orfB in relative translational reading frames of 0 and -1. The presence of a 'slippery heptamer' motif, AAAAAAG, before the stop codon in orfA may indicate that programmed translational frameshifting occurs between these two coding sequences, resulting in a single gene product [14]. The orfB gene product harbors the DD(35)E(7)K motif, a highly conserved pattern among insertion sequences. Eight of these IS3Bli1 elements lie in intergenic regions, and one interrupts the comP gene as noted previously [9]. In addition to these insertion sequences, the genome encodes a putative transposase that is most closely related (E = 1.8 × 10⁻¹¹) to one identified in the Thermoanaerobacter tengcongensis genome [15]; however, similar transposase genes are also found in the chromosomes of B. halodurans [13], Oceanobacillus iheyensis [16], Streptococcus agalactiae [17] and Streptococcus pyogenes [18]. The presence of several bacteriophage lysogens or prophage-like elements was revealed by Smith-Waterman comparisons to other bacterial genomes and by their AT-rich signatures (Figure 3, Table 2). Prophage sequences, designated NZP1 and NZP3 (similar to B. subtilis prophages PBSX and φ-105), were discovered by noting the presence of nearby genes that code for the large subunit of terminase, a signature protein that is highly conserved among prophages [19]. Interestingly, a terminase gene was not observed in the third putative prophage, termed NZP2 (similarity to B. subtilis phage SPP1); however, its absence may be the result of genome deterioration during evolution. 
Interestingly, we observed that regions in which the G+C content is less than 39% usually encoded proteins that have no B. subtilis ortholog and share identity only with hypothetical and conserved hypothetical genes. Two of these AT-rich segments correspond to the NZP2 and NZP3 prophages. An isochore plot (Figure 3) also revealed the presence of a region with an atypically high (62%) G+C content. This segment contains two hypothetical genes whose sizes (3,831 and 2,865 bp) greatly exceed the size of an average CDS in B. licheniformis. The first gene encodes a protein of 1,277 amino acids for which Interpro predicts 16 collagen triple-helix repeats, and the amino acid pattern TGATGPT is repeated 75 times within the polypeptide. The second CDS is smaller, and encodes a protein with 11 collagen triple-helix repeats; the same TGATGPT motif recurs 56 times. The primary translation products from these genes do not contain canonical signal peptides for secretion, and they do not contain motifs for the twin-arginine or sortase-mediated translocation pathways. Therefore, it is not likely that they are exported to the cell surface or the extracellular medium. Interestingly, the chromosomal region (19 kb) adjacent to these genes is clearly non-colinear with the B. subtilis genome, and virtually all of the predicted genes encode hypothetical or conserved hypothetical proteins. There are a number of bacterial proteins listed in PIR that also contain collagen triple-helix repeat regions, including two from Mesorhizobium loti (accession numbers NF00607049 and NF00607035) and three from B. cereus (accession numbers NF01692528, NF01269899 and NF01694666). These putative orthologs share 53-76% amino-acid sequence identity with their counterparts in B. licheniformis, and their functions are unknown. Extracellular enzymes and metabolic activities In the Bacillus licheniformis genome, 689 of the 4,208 gene models have signal peptides forecast by SignalP [20]. 
Of these, 309 have no transmembrane domain predicted by TMHMM [21] and 134 are hypothetical or conserved hypothetical genes. Based on a manual examination of the remaining 175 genes, at least 82 are likely to encode secreted proteins and enzymes. Moreover, there are 27 predicted extracellular proteins encoded by the B. licheniformis ATCC 14580 genome that are not found in B. subtilis 168. In accordance with its saprophytic lifestyle, the secretome of B. licheniformis encodes numerous secreted enzymes that hydrolyze polysaccharides, proteins, lipids and other nutrients. Cellulose is the most abundant polysaccharide on Earth, and microorganisms that hydrolyze cellulose contribute to the global carbon cycle. Interestingly, two gene clusters involved in cellulose degradation and utilization were discovered in B. licheniformis, and there are no counterparts in B. subtilis 168. The enzymes encoded by the first gene cluster include two putative endoglucanases belonging to glycoside hydrolase families GH9 and GH5, a probable cellulose-1,4-β-cellobiosidase of family GH48, and a putative β-mannanase of family GH5. The β-mannanase (GH5) and endoglucanase (GH9) both harbor carbohydrate-binding motifs. With the exception of the cellulose-1,4-β-cellobiosidase (GH48), all of the gene products encoded in this cluster have secretory signal peptides, and all have homologs in Bacillus species other than B. subtilis. The overall G+C content of this cluster (48%) does not appear to differ appreciably from that of the genome average (46%). The second gene cluster encodes a putative β-glucosidase (GH1) and three components of a cellobiose-specific PTS transport complex. A second β-glucosidase (GH3) gene is present at an unlinked locus in the genome. Collectively, the genes in these two clusters should enable B. licheniformis to utilize cellulose as a carbon and energy source, converting it to cellobiose and ultimately glucose. In this regard, we have confirmed that B. 
licheniformis ATCC 14580 is capable of growth on carboxymethyl cellulose as a sole carbon source (not shown). The chromosome of B. licheniformis ATCC 14580 encodes a number of additional carbohydrase activities that may allow the organism to grow on a broad range of polysaccharides. These include xylanase, endo-arabinase and pectate lyase that may be involved in degradation of hemicellulose, α-amylase and α-glucosidase for starch hydrolysis, chitinases for the breakdown of chitooligosaccharides from fungi and insects, and levanase for utilization of β-D-fructans (levans). Several of these activities are marketed as industrial enzymes. Saprophytic organisms must utilize a variety of nitrogenous compounds as nutrients for growth and metabolism. On the basis of the information encoded in its genome, B. licheniformis ATCC 14580 possesses the ability to acquire nitrogen from exogenous proteins, peptides, amino acids, ammonia, nitrate and nitrite. Like B. subtilis, the repertoire of extracellular proteases produced by B. licheniformis includes serine proteases (aprE, epr, vpr), metalloprotease (mpr), and an assortment of endo- and exopeptidases (yjbG, ydiC, gcp, ykvY, ampS, bpr (two copies), yfxM, yuiE, yusX, ywaD, pepT). However, B. licheniformis also has the capacity to produce a number of additional proteases and peptidases that are not encoded in the B. subtilis genome. These include a clostripain-like protease, a zinc-metallopeptidase, a probable glutamyl endopeptidase, an aminopeptidase C homolog, two putative dipeptidases and a zinc-carboxypeptidase. B. licheniformis also has the ability to utilize amino and imino nitrogen from arginine, asparagine and glutamine via arginine deiminase, arginase, asparaginase and glutaminase activities. Interestingly, there appear to be two genes each for arginase, asparaginase and glutaminase. 
Presumably, the arginine deiminase activity is expressed during anaerobic growth on arginine, whereas the arginase activities are predominant during aerobic growth. The occurrence of putative arginase genes is somewhat of an enigma in B. licheniformis, because there are no genes encoding urease activity for the hydrolysis of urea that is generated by the arginase reaction. In addition to the absence of urease gene homologs (ureABC) in B. licheniformis, the glutamine ABC transporters (glnH, glnM, glnP, glnQ gene products) are also lacking. It appears that nitrogen assimilation and transport pathways may be coordinated similarly in B. licheniformis and B. subtilis owing to the presence of key genes such as glnA, glnR, tnrA and nrgA in both species. Likewise, the pathways for nitrate/nitrite transport and metabolism in B. licheniformis appear to be analogous to the corresponding pathways in B. subtilis as suggested by the presence of nasABC (nitrate transport), narGHIJ (respiratory nitrate reductase), and nasDEF (NADH-dependent nitrite reductase) genes. Unlike B. subtilis, B. licheniformis evidently possesses the capability for anaerobic respiration using nitric oxide reductase. Moreover, the gene encoding this activity lies in a cluster that includes CDSs for narK (nitrite extrusion protein), two putative fnr proteins (transcriptional regulators of anaerobic growth), and a dnrN-like gene product (nitric oxide-dependent regulator). These observations are consistent with previous findings that certain B. licheniformis isolates are capable of denitrification [22]. While denitrification is a process of major ecological importance, the contribution of B. licheniformis may be small as the species exists predominantly as endospores in soil [1]. Microbial D-hydantoinase enzymes have been applied to the industrial production of optically pure D-amino acids for synthesis of antibiotics, pesticides, sweeteners and therapeutic amino acids [23]. 
This enzyme catalyzes the hydrolysis of cyclic ureides such as dihydropyrimidines and 5-monosubstituted hydantoins to N-carbamoyl amino acids. Hydantoinase activities have been detected in a variety of bacterial genera, and a cluster of six genes in B. licheniformis appears to confer a similar capability. This gene cluster encodes N-methylhydantoinase (ATP-hydrolyzing), hydantoin utilization proteins A and B (hyuAB homologs), a possible transcriptional regulator (TetR/AcrR family), a putative pyrimidine permease, and a hypothetical protein that contains an Interpro domain (IPR004399) for phosphomethylpyrimidine kinase. Protein export, sporulation and competence pathways Kunst et al. [10] listed 18 genes that have a major role in the secretion of extracellular enzymes by the classical (Sec) pathway in B. subtilis 168. This list includes several chaperonins, signal peptidases, components of the signal recognition particle and protein translocase complexes. All members of this list have B. licheniformis counterparts. In addition to the Sec pathway, some B. subtilis proteins are directed into the twin-arginine (Tat) export pathway, possibly in a Sec-independent manner. Curiously, the B. licheniformis genome encodes three tat gene orthologs (tatAY, tatCD, and tatCY), but two others (tatAC and tatAD) are conspicuously absent. Furthermore, specific proteins may be exported to the cell surface via lipoprotein signal peptides or sortase factors. Lipoprotein signal peptides are cleaved with a specific signal peptidase (Lsp) encoded by the lspA gene in B. subtilis. An lspA homolog can be found in B. licheniformis as well, suggesting that this species may possess the ability to export lipoproteins via a similar mechanism. Lastly, surface proteins in Gram-positive bacteria are frequently attached to the cell wall by sortase enzymes, and genome analyses have revealed that more than one sortase is often produced by a given species. 
In this regard, three possible sortase gene homologs were detected in the genome of B. licheniformis ATCC 14580. Together these observations suggest that the central features of the protein export machinery are principally conserved in B. subtilis and B. licheniformis. From the list of 139 sporulation genes tabulated by Kunst et al. [10], all but six have obvious counterparts in B. licheniformis. These six exceptions (spsABCEFG) comprise an operon involved in synthesis of a spore coat polysaccharide in B. subtilis. In addition, the response regulator gene family (phrACEFGI) appears to have a low level of sequence conservation between B. subtilis and B. licheniformis. Natural competence (the ability to take up and process exogenous DNA in specific growth conditions) is a feature of few B. licheniformis strains [24]. The reasons for variability in competence phenotype have not been explored at the genetic level, but the genome data offer several possible explanations. Although the B. licheniformis genome encodes all of the late competence functions ascribed in B. subtilis (for example, comC, comEFG operons, comK, mecA), it lacks an obvious comS gene, and the comP gene is punctuated by an insertion sequence element (IS3Bli1), suggesting that the early stages of competence development have been pre-empted in B. licheniformis ATCC 14580. Whether these early functions can be restored by introducing the corresponding genes from B. subtilis is unknown. In addition to an apparent deficiency in DNA uptake, two type I restriction-modification systems were discovered that may also contribute to diminished transformation efficiencies. These are distinct from the ydiOPS genes of B. subtilis, and could participate in degradation of improperly modified DNA from heterologous hosts used during construction of recombinant expression vectors. Each of these loci in B. 
licheniformis (designated as BliI and BliII) encodes putative HsdS, HsdM and HsdR subunits that share significant amino-acid sequence identity with type I restriction-modification proteins in other bacteria. Curiously, the G+C content of these loci (37%) is substantially lower than the overall genome average (46%), which may hint that they are the result of gene acquisitions. Lastly, the synthesis of a glutamyl polypeptide capsule has also been implicated as a potential barrier to transformation of B. licheniformis strains [25]. While laboratory strains of B. subtilis usually do not produce significant capsular material, the genome sequence of B. subtilis 168 indicates that they may harbor the genes required for synthesis of polyglutamic acid. In contrast, many B. licheniformis isolates produce copious amounts of capsular material, giving rise to colonies with a wet or slimy appearance. Six genes were predicted (ywtABDEF and ywsC orthologs) that may be involved in the synthesis of polyglutamic acid capsular material in B. licheniformis.

Antibiotics, secondary metabolites and siderophores

Bacitracin is a cyclic peptide antibiotic that is synthesized non-ribosomally by some B. licheniformis isolates [26]. While there is variation in the prevalence of bacitracin synthase genes among laboratory strains of this species, one study suggested that up to 50% may harbor the bac operon [27]. Interestingly, the bac operon is not present in the type strain (ATCC 14580) genome. Seemingly, the only non-ribosomal peptide synthase operon encoded by the B. licheniformis type strain genome is that responsible for lichenysin biosynthesis. Lichenysin structurally resembles surfactin from B. subtilis [28], and their respective biosynthetic operons are highly similar. Surprisingly, we found no B. licheniformis counterparts for the pps (plipastatin synthase) and polyketide synthase (pks) operons of B. subtilis.
Collectively, these two regions represent sizeable portions (80 kb and 38 kb, respectively) of the chromosome in B. subtilis, although they are reportedly dispensable [29]. Unexpectedly, a cluster of 11 genes was found encoding a lantibiotic, with its associated modification and transport functions. We designated this peptide of 75 amino acids as lichenicidin, and its closest homolog is mersacidin from Bacillus sp. strain HIL-Y85/54728 [30]. Lantibiotics are ribosomally synthesized peptides that are modified post-translationally so that the final molecules contain rare thioether amino acids such as lanthionine and/or methyl-lanthionine [31]. Like mersacidin, lichenicidin appears to be a type B lantibiotic, comprising a rigid globular peptide with no net charge (7 acidic residues, 7 basic residues) and a leader peptide with a conserved double glycine cleavage site (GG-type leader peptide). These antimicrobial compounds have attracted much attention in recent years as models for the design and genetic engineering of improved antimicrobial agents [32]. However, since several post-translational modifications and product-specific export functions are required, a dedicated expression system is a prerequisite to provide all the factors necessary to synthesize, modify and transport the lantibiotic peptide. With its history of use in industrial microbiology, B. licheniformis may be an attractive candidate for the development of such an expression system. Like B. subtilis 168, the B. licheniformis ATCC 14580 chromosome harbors a siderophore biosynthesis gene cluster (dhbABCEF), and the organization of the cluster is similar to the corresponding chromosomal segment in B. subtilis. In addition, the B. licheniformis genome contains a second gene cluster of four genes (iucABCD) that show significant similarity to proteins involved in aerobactin biosynthesis in E. coli. Surprisingly, a gene encoding the receptor protein (iutA homolog) was not found in B. licheniformis. The B. 
halodurans genome also contains genes that are homologous to iucABCD, but like B. licheniformis, no iutA homolog could be found using BLAST or Smith-Waterman searches.

Comparison of the B. licheniformis genome with those of other bacilli

The B. licheniformis ATCC 14580 gene models were compared to the list of essential genes in B. subtilis [33]. Predictably, all of the essential genes in B. subtilis have orthologs in B. licheniformis, and most are present in a wide range of bacterial taxa. In pairwise BLAST comparisons, 66% of the predicted B. licheniformis genes have orthologs in B. subtilis, and 55% of the gene models are represented by orthologous sequences in B. halodurans (E-value threshold of 1 × 10⁻⁵; Figure 4). Using a reciprocal BLASTP analysis we found 1,719 orthologs that are common to all three species (E-value threshold of 1 × 10⁻⁵). As noted by Lapidus et al. [9], there are broad regions of colinearity between the genomes of B. licheniformis and B. subtilis (Figure 5). Less conservation of genome organization exists between B. licheniformis and B. halodurans, and substantial genomic segments have been inverted in B. halodurans with respect to B. licheniformis and B. subtilis. These observations clearly support previous hypotheses [8] that B. subtilis and B. licheniformis are phylogenetically and evolutionarily closer to each other than to B. halodurans.

Conclusions

In comparisons of shared regions, the genomes of B. licheniformis ATCC 14580 and B. subtilis 168 are approximately 84.6% identical at the nucleotide level and show extensive organizational similarity. Accordingly, their genome sequences represent potentially useful instruments for comparative and evolutionary studies among species within the subtilis-licheniformis group, and they may offer new information regarding the evolution and ecology of these closely related species. Despite the broad colinearity of B. licheniformis and B.
subtilis genomes, there are local regions that are individually unique. These include chromosome segments that comprise prophage and insertion sequence elements, DNA restriction-modification systems, antibiotic synthases, and a number of extracellular enzymes and metabolic activities that are not present in B. subtilis. It is tempting to speculate that the presence of these genes forecasts the ability of B. licheniformis to grow on an expanded array of substrates and/or in additional ecological niches compared to B. subtilis. Together, the similarities and differences may hint at overlapping but non-identical environmental niches for these taxa. The subtilis-licheniformis group of bacilli includes many strains that are used to manufacture industrial enzymes, antibiotics and biochemicals. The availability of a complete genome from B. licheniformis should permit a thorough comparison of the biochemical pathways and regulatory networks in B. subtilis and B. licheniformis, thereby offering new opportunities and strategies for improvement of industrial strains. When considering the safety of B. licheniformis as an industrial organism, it should be noted that the species is considered neither a human pathogen nor a toxigenic microorganism [34]; however, there are reports in the literature implicating it as a causal agent of food poisoning. In these isolated cases, specific strains were shown to produce a toxin similar to cereulide, the emetic toxin of B. cereus [35]. Cereulide is a cyclic depsipeptide synthesized non-ribosomally [36]. Importantly, the only non-ribosomal peptide synthase genes found in the B. licheniformis ATCC 14580 genome are those involved in the synthesis of lichenysin. Similarly, we detected no homologs of the B. cereus hemolytic and non-hemolytic enterotoxins (Swiss-Prot accession numbers P80567, P80568, P80172, and P81242). In a comparison of the genotypic and phenotypic characteristics among 182 soil isolates of B. licheniformis, Manachini et al.
[37] observed that while this bacterial species appears to be phenotypically homogeneous, clear genotypic differences are evident between isolates. They postulated the existence of three genomovars for B. licheniformis. Similarly, De Clerck and De Vos [38] proposed that this species consists of two lineages that can be distinguished using several molecular genotyping methods. The genome sequence data presented in this work should provide a solid foundation on which to conduct future studies to elucidate the genotypic variation among B. licheniformis isolates.

Materials and methods

Shotgun DNA sequencing and genome assembly

The genome of B. licheniformis ATCC 14580 was sequenced by a combination of the whole-genome shotgun method [39] and fosmid end sequencing [40]. Plasmid libraries were constructed using randomly sheared and MboI-digested genomic DNA that was enriched for fragments of 2-3 kb by preparative agarose gel electrophoresis. Approximately 49,000 random clones were sequenced using dye-terminator chemistry (Applied Biosystems) with ABI 377 and ABI 3700 automated sequencers, yielding approximately 6× coverage of the genome. A combination of methods was used for gap closure, including sequencing on fosmids [40] and primer-walking on selected clones and PCR-amplified DNA fragments. We also incorporated data from both ends of approximately 1,975 fosmid clones with an average insert size of 40 kb to aid in validating the final assembly. In total, the number of input reads was 62,685, with 78.6% of these incorporated into the assembled genome sequence. Individual nucleotides were called using TraceTuner 2.0 (Paracel), and sequence reads were assembled into contigs using the Paracel Genome Assembler with optimized parameters and the quality score set to >20. Phrap, Crossmatch and Consed were used for sequence finishing [41].
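The read counts and coverage quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not the authors' procedure: the 62,685 input reads and the 78.6% incorporation rate come from the text, whereas the genome length (about 4.22 Mb) and the 500-base mean read length are assumptions that this section does not state.

```python
def incorporated_reads(total_reads: int, fraction_used: float) -> int:
    """Number of reads that ended up in the assembly (rounded to a whole read)."""
    return round(total_reads * fraction_used)


def fold_coverage(n_reads: int, mean_read_len: float, genome_len: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return n_reads * mean_read_len / genome_len


if __name__ == "__main__":
    TOTAL_READS = 62_685        # from the text
    FRACTION_USED = 0.786       # from the text
    GENOME_LEN = 4_222_336      # ASSUMPTION: genome length in bp, not given in this section
    MEAN_READ_LEN = 500         # ASSUMPTION: hypothetical mean usable read length in bases

    used = incorporated_reads(TOTAL_READS, FRACTION_USED)
    depth = fold_coverage(used, MEAN_READ_LEN, GENOME_LEN)
    print(f"reads incorporated: {used}")
    print(f"estimated coverage: {depth:.1f}x")
```

Under these assumed values the estimate falls close to the reported figure of approximately 6×; a different mean read length would shift it proportionally.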
Prediction and annotation of CDSs

Protein-coding regions in the assembled genome sequence were identified using a combination of previously described software tools including EasyGene [42], Glimmer [43] and FrameD [44]. EasyGene was used as the primary gene finder in these studies. It searches for protein matches in the raw genome data to derive a good training set, and an HMM with states for coding regions as well as ribosome-binding sites (RBSs) is estimated from the dataset. This HMM is used to score all the predicted CDSs in the genome, and the score is converted to a measure of significance (R-value), which is the expected number of CDSs that would be predicted in 1 Mb of random DNA. Gene models with R-values lower than 10 and a log-odds score greater than -10 were considered significant and included. The principal advantage of this significance measure is that it properly takes into account the length distribution of random CDSs. EasyGene has been shown to match or exceed other gene finders currently available [42]. Glimmer was used as a secondary gene finder to aid in identification of small genes (<100 bp) that were sometimes missed by EasyGene. Glimmer models were post-processed with RBSFINDER [45] to pinpoint the positions of start codons by searching for consensus Shine-Dalgarno sequences. According to the RBS states in the EasyGene HMM model, the bases with the highest probability were AAAAGGAG (the final six bases, AAGGAG, had distinctly higher probabilities than the initial AA). This motif concurs with the consensus Shine-Dalgarno sequence for B. subtilis (AAAGGAGG) [46]. RBSFINDER identified the core AAGGAG motif in around 80% of the cases for Glimmer gene predictions and adjusted the start codon accordingly. Manual inspection and alignments to B. subtilis homologs were also used to determine the incidence of specific genes.
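The idea of locating a Shine-Dalgarno core upstream of a candidate start codon can be illustrated with a short sketch. This is not RBSFINDER's actual algorithm, which is not described here; it is a plain exact-match search for the AAGGAG core, and the function name, the 20-base window and the example sequence are hypothetical.

```python
from typing import Optional


def find_sd_core(genome: str, start: int, window: int = 20,
                 motif: str = "AAGGAG") -> Optional[int]:
    """Look for an exact Shine-Dalgarno core upstream of a start codon.

    genome : DNA sequence of the coding strand
    start  : 0-based index of the first base of the candidate start codon
    window : how many upstream bases to inspect (ASSUMPTION: 20 is illustrative)

    Returns the spacer length (bases between the end of the motif and the
    start codon) for the motif occurrence closest to the start codon,
    or None if the motif is absent from the window.
    """
    upstream = genome[max(0, start - window):start].upper()
    pos = upstream.rfind(motif)  # rightmost hit = closest to the start codon
    if pos == -1:
        return None
    return len(upstream) - (pos + len(motif))


if __name__ == "__main__":
    # Hypothetical sequence: AAGGAG, an 8-base spacer, then ATG at index 18.
    seq = "TTTTAAGGAGGTGATCATATGAAA"
    print(find_sd_core(seq, start=18))
```

A real start-site refinement tool would also score partial matches and compare alternative start codons; the exact-match search above only shows the core lookup.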
During the gene-finding process, possible errors and frameshifts were detected both by visual inspection of the CDSs to look for interrupted or truncated genes and by running FrameD software [44]. Frameshifts were resolved by re-sequencing of PCR-amplified segments or subclones. After re-sequencing and manual editing, a total of 27 frameshifts remain in the genome assembly (excluding those contained in the IS3Bli1 elements). It is not known at present whether these represent pseudogenes or instances of programmed translational frameshifting. The positions of rRNA operons in the genome assembly were confirmed by long-range PCR amplification using primers that annealed to genes flanking the rRNA genes. These PCR fragments were sequenced to high redundancy and the consensus sequences were manually inserted into the assembly. Among the seven rRNA operons, the nucleotide sequences of 16S and 23S genes are at least 99% identical, differing by only one to three nucleotides in pairwise comparisons. Protein-coding sequences were annotated in an automated fashion with the following software applications. Predicted proteins were compared to the nonredundant database PIR-NREF [47] and the B. subtilis genome [48] using BLASTP with an E-value threshold of 1 × 10⁻⁵. InterProScan was used to predict putative function [49]. The InterPro analysis included comparison to PFAM [50], TIGRFAM [51] and InterPro [52], signal peptide prediction using SignalP [20], and transmembrane domain prediction using TMHMM [21]. These CDSs were assigned to functional categories based on the Cluster of Orthologous Groups (COG) database [53] with manual verification as described [54,55]. Phage gene boundaries were predicted using gene-finding algorithms and by homology to known bacteriophage genes. Transfer RNA genes were identified using tRNAscan-SE [56]. B. licheniformis genes that shared significant homology with B.
subtilis counterparts were named using the nomenclature in the SubtiList database [48] with updated gene names from the BSORF [57] and UniProt [58] databases.

Comparative analyses

VisualGenome software (Rational Genomics) was used for comparisons of ortholog distribution among B. licheniformis, B. subtilis and B. halodurans genomes with precomputed BLAST results stored in a local database.

Accession of genome sequence information

The GenBank accession number for the B. licheniformis ATCC 14580 genome is CP000002. An interactive web portal for viewing and searching the assembled genome, based on the generic genome browser developed by Stein et al. [59], is available at [60].
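The ortholog counts reported earlier rest on a reciprocal BLASTP analysis with an E-value threshold of 1 × 10⁻⁵, but the exact pairing rule is not spelled out. The sketch below assumes the commonly used reciprocal-best-hit criterion (each protein is the other's top-scoring hit) and takes hits as simple tuples; the function names, the tuple layout and the gene identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# One BLAST hit: (query_id, subject_id, e_value, bit_score)
Hit = Tuple[str, str, float, float]


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, str]:
    """Map each query to its highest-scoring subject among hits passing the E-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # discard hits weaker than the threshold
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit],
                         max_evalue: float = 1e-5) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Toy data with made-up identifiers; not taken from the genomes discussed.
    a_vs_b = [("bliA", "bsuA", 1e-50, 200.0), ("bliA", "bsuB", 1e-20, 90.0),
              ("bliB", "bsuB", 1e-30, 120.0), ("bliC", "bsuC", 1e-3, 30.0)]
    b_vs_a = [("bsuA", "bliA", 1e-48, 198.0), ("bsuB", "bliA", 1e-20, 90.0),
              ("bsuC", "bliC", 1e-3, 30.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

A three-species ortholog set, like the one counted in the text, could then be approximated by keeping only the genes that appear in the pairwise results against both of the other genomes.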

                Author and article information

                Contributors
                Journal
                Front Pharmacol
                Frontiers in Pharmacology
                Frontiers Media S.A.
                1663-9812
                13 October 2017
                2017
                : 8
                : 724
                Affiliations
                [1] 1Microbiology and Functionality Research Group, World Institute of Kimchi , Gwangju, South Korea
                [2] 2Division of Environmental Science & Ecological Engineering, Korea University , Seoul, South Korea
                [3] 3Research Group of Gut Microbiome, Korea Food Research Institute , Sungnam, South Korea
                [4] 4Department of Food Biotechnology, University of Science and Technology , Daejeon, South Korea
                Author notes

                Edited by: Annalisa Bruno, Università degli Studi “G. d'Annunzio” Chieti - Pescara, Italy

                Reviewed by: Vasvi Chaudhry, Institute of Microbial Technology (CSIR), India; Georgios Paschos, University of Pennsylvania, United States

                *Correspondence: Young-Do Nam youngdo98@ 123456kfri.re.kr

                This article was submitted to Inflammation Pharmacology, a section of the journal Frontiers in Pharmacology

                Article
                10.3389/fphar.2017.00724
                5645497
                29081747
                Copyright © 2017 Lee, Kim, Song, Kim, Choi, Yoon, Nam and Roh.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                : 01 August 2017
                : 25 September 2017
                Page count
                Figures: 1, Tables: 1, Equations: 0, References: 23, Pages: 5, Words: 2979
                Funding
                Funded by: Ministry of Science ICT and Future Planning 10.13039/501100004083
                Award ID: World Institute of Kimchi (KE1702-2)
                Funded by: Korea Food Research Institute 10.13039/501100003712
                Award ID: E0170602-01
                Funded by: National Research Foundation of Korea 10.13039/501100003725
                Award ID: 2015R1D1A1A09061039
                Categories
                Pharmacology
                Data Report

                Pharmacology & Pharmaceutical medicine
                bacillus licheniformis,genome sequence,human fecal sample,stress response genes,multilocus sequence typing
