1
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      High-quality pan-genome of Escherichia coli generated by excluding confounding and highly similar strains reveals an association between unique gene clusters and genomic islands

      ,
      Briefings in Bioinformatics
      Oxford University Press (OUP)

      Read this article at

      ScienceOpenPublisher
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          The pan-genome analysis of bacteria provides detailed insight into the diversity and evolution of a bacterial population. However, the genomes involved in the pan-genome analysis should be checked carefully, as the inclusion of confounding strains would have unfavorable effects on the identification of core genes, and the highly similar strains could bias the results of the pan-genome state (open versus closed). In this study, we found that the inclusion of highly similar strains also affects the results of unique genes in pan-genome analysis, which leads to a significant underestimation of the number of unique genes in the pan-genome. Therefore, these strains should be excluded from pan-genome analysis at the early stage of data processing. Currently, tens of thousands of genomes have been sequenced for Escherichia coli, which provides an unprecedented opportunity as well as a challenge for pan-genome analysis of this classical model organism. Using the proposed strategies, a high-quality E. coli pan-genome was obtained, and the unique genes was extracted and analyzed, revealing an association between the unique gene clusters and genomic islands from a pan-genome perspective, which may facilitate the identification of genomic islands.

          Related collections

          Most cited references66

          • Record: found
          • Abstract: found
          • Article: not found

          Prokka: rapid prokaryotic genome annotation.

          T Seemann (2014)
          The multiplex capability and high yield of current day DNA-sequencing instruments has made bacterial whole genome sequencing a routine affair. The subsequent de novo assembly of reads into contigs has been well addressed. The final step of annotating all relevant genomic features on those contigs can be achieved slowly using existing web- and email-based systems, but these are not applicable for sensitive data or integrating into computational pipelines. Here we introduce Prokka, a command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer. It produces standards-compliant output files for further analysis or viewing in genome browsers. Prokka is implemented in Perl and is freely available under an open source GPLv2 license from http://vicbioinformatics.com/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Fast and sensitive protein alignment using DIAMOND.

            The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              Interactive Tree Of Life (iTOL) v4: recent updates and new developments

              Abstract The Interactive Tree Of Life (https://itol.embl.de) is an online tool for the display, manipulation and annotation of phylogenetic and other trees. It is freely available and open to everyone. The current version introduces four new dataset types, together with numerous new features. Annotation options have been expanded and new control options added for many display elements. An interactive spreadsheet-like editor has been implemented, providing dataset creation and editing directly in the web interface. Font support has been rewritten with full support for UTF-8 character encoding throughout the user interface. Google Web Fonts are now fully supported in the tree text labels. iTOL v4 is the first tool which supports direct visualization of Qiime 2 trees and associated annotations. The user account system has been streamlined and expanded with new navigation options, and currently handles >700 000 trees from more than 40 000 individual users. Full batch access has been implemented allowing programmatic upload and export of trees and annotations.
                Bookmark

                Author and article information

                Contributors
                Journal
                Briefings in Bioinformatics
                Oxford University Press (OUP)
                1467-5463
                1477-4054
                July 18 2022
                July 18 2022
                July 18 2022
                July 18 2022
                July 09 2022
                : 23
                : 4
                Article
                10.1093/bib/bbac283
                d484f375-95c2-4295-9c59-07297061fbe3
                © 2022

                https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model

                History

                Comments

                Comment on this article