High-quality pan-genome of Escherichia coli generated by excluding confounding and highly similar strains reveals an association between unique gene clusters and genomic islands

Abstract

Pan-genome analysis of bacteria provides detailed insight into the diversity and evolution of a bacterial population. However, the genomes included in a pan-genome analysis should be checked carefully: confounding strains can distort the identification of core genes, and highly similar strains can bias the assessment of the pan-genome state (open versus closed). In this study, we found that including highly similar strains also affects the identification of unique genes, leading to a significant underestimation of the number of unique genes in the pan-genome. These strains should therefore be excluded at an early stage of data processing. Tens of thousands of Escherichia coli genomes have now been sequenced, which presents both an unprecedented opportunity and a challenge for pan-genome analysis of this classical model organism. Using the proposed strategies, we obtained a high-quality E. coli pan-genome and then extracted and analyzed its unique genes. This analysis revealed an association between unique gene clusters and genomic islands from a pan-genome perspective, which may facilitate the identification of genomic islands.
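
The underestimation of unique genes can be illustrated with a minimal Python sketch. It rests on stated assumptions rather than on the study's method: each genome is reduced to a hypothetical set of gene-cluster IDs, genome similarity is approximated by the Jaccard index of gene content, and the 0.95 cut-off is purely illustrative (the abstract does not give the measure or threshold the authors used). A gene cluster carried by only one lineage stops counting as unique as soon as a second, near-identical genome from that lineage is added; a greedy filter that keeps one representative per group of highly similar genomes restores it.

```python
from typing import Dict, Set

# Hypothetical toy representation: strain name -> set of gene-cluster IDs.
PanGenome = Dict[str, Set[str]]


def unique_gene_counts(pan: PanGenome) -> Dict[str, int]:
    """For each strain, count gene clusters present in no other strain."""
    counts = {}
    for strain, clusters in pan.items():
        others: Set[str] = set()
        for other, other_clusters in pan.items():
            if other != strain:
                others |= other_clusters
        counts[strain] = len(clusters - others)
    return counts


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Gene-content Jaccard index (an assumed stand-in for genome similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dereplicate(pan: PanGenome, threshold: float = 0.95) -> PanGenome:
    """Greedily keep a strain only if its similarity to every strain already
    kept is below `threshold` (the threshold is an illustrative assumption)."""
    kept = []
    for strain in sorted(pan):  # alphabetical order, for determinism only
        if all(jaccard(pan[strain], pan[k]) < threshold for k in kept):
            kept.append(strain)
    return {s: pan[s] for s in kept}


# Hypothetical toy data: A_clone has the same gene content as A.
pan = {
    "A": {"core1", "core2", "a1", "a2"},
    "A_clone": {"core1", "core2", "a1", "a2"},
    "B": {"core1", "core2", "b1"},
}

print(sum(unique_gene_counts(pan).values()))               # 1: a1, a2 hidden by A_clone
print(sum(unique_gene_counts(dereplicate(pan)).values()))  # 3: a1, a2, b1
```

The greedy pass visits strains alphabetically only to keep the example deterministic; a real pipeline would more plausibly pick the representative by assembly quality and measure similarity on sequence (for example with average nucleotide identity or Mash distance) rather than on gene content.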

Author and article information

Contributors

Feng Gao

Journal

Title: Briefings in Bioinformatics

Publisher: Oxford University Press (OUP)

ISSN (Print): 1467-5463

ISSN (Electronic): 1477-4054

Publication date (created): July 18, 2022

Publication date (other): July 18, 2022

Publication date (print): July 18, 2022

Publication date (electronic): July 9, 2022

Volume: 23

Issue: 4

Article

DOI: 10.1093/bib/bbac283

SO-VID: d484f375-95c2-4295-9c59-07297061fbe3

License: https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
