      Is Open Access

      GET_PHYLOMARKERS, a Software Package to Select Optimal Orthologous Clusters for Phylogenomics and Inferring Pan-Genome Phylogenies, Used for a Critical Geno-Taxonomic Revision of the Genus Stenotrophomonas

      research-article


          Abstract

          The massive accumulation of genome-sequences in public databases promoted the proliferation of genome-level phylogenetic analyses in many areas of biological research. However, due to diverse evolutionary and genetic processes, many loci have undesirable properties for phylogenetic reconstruction. These, if undetected, can result in erroneous or biased estimates, particularly when estimating species trees from concatenated datasets. To deal with these problems, we developed GET_PHYLOMARKERS, a pipeline designed to identify high-quality markers to estimate robust genome phylogenies from the orthologous clusters, or the pan-genome matrix (PGM), computed by GET_HOMOLOGUES. In the first context, a set of sequential filters are applied to exclude recombinant alignments and those producing anomalous or poorly resolved trees. Multiple sequence alignments and maximum likelihood (ML) phylogenies are computed in parallel on multi-core computers. A ML species tree is estimated from the concatenated set of top-ranking alignments at the DNA or protein levels, using either FastTree or IQ-TREE (IQT). The latter is used by default due to its superior performance revealed in an extensive benchmark analysis. In addition, parsimony and ML phylogenies can be estimated from the PGM. We demonstrate the practical utility of the software by analyzing 170 Stenotrophomonas genome sequences available in RefSeq and 10 new complete genomes of Mexican environmental S. maltophilia complex (Smc) isolates reported herein. A combination of core-genome and PGM analyses was used to revise the molecular systematics of the genus. An unsupervised learning approach that uses a goodness of clustering statistic identified 20 groups within the Smc at a core-genome average nucleotide identity (cgANIb) of 95.9% that are perfectly consistent with strongly supported clades on the core- and pan-genome trees. In addition, we identified 16 misclassified RefSeq genome sequences, 14 of them labeled as S. maltophilia, demonstrating the broad utility of the software for phylogenomics and geno-taxonomic studies. The code, a detailed manual and tutorials are freely available for Linux/UNIX servers under the GNU GPLv3 license at https://github.com/vinuesa/get_phylomarkers. A docker image bundling GET_PHYLOMARKERS with GET_HOMOLOGUES is available at https://hub.docker.com/r/csicunam/get_homologues/, which can be easily run on any platform.
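
          To make the idea of delimiting groups at an identity cutoff concrete, the following is a minimal sketch, not the authors' method: the abstract describes an unsupervised learning approach based on a goodness of clustering statistic, whereas this example simply joins genomes whose pairwise identity meets a fixed cutoff (single-linkage grouping via union-find). The function name `ani_groups`, the genome labels, and all identity values are hypothetical; only the 95.9% cutoff comes from the abstract.

```python
from itertools import combinations


def ani_groups(names, ani, cutoff=95.9):
    """Group genomes linked by pairwise identity >= cutoff (single linkage).

    names  : list of genome labels (hypothetical here)
    ani    : dict mapping (label_a, label_b) -> percent identity
    cutoff : identity threshold in percent; 95.9 is the value quoted
             in the abstract, used here as a fixed cutoff (an assumption)
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        # Accept the pair stored in either orientation.
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy identity values, invented purely for illustration.
names = ["g1", "g2", "g3", "g4"]
ani = {
    ("g1", "g2"): 98.2,
    ("g1", "g3"): 91.0,
    ("g1", "g4"): 90.8,
    ("g2", "g3"): 90.5,
    ("g2", "g4"): 91.1,
    ("g3", "g4"): 96.4,
}
groups = ani_groups(names, ani)
print(groups)
```

          With the toy values above, g1 and g2 fall into one group and g3 and g4 into another. Single linkage is used only because it is easy to follow; a fixed-cutoff grouping of this kind does not evaluate clustering quality the way the approach described in the abstract does.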


                Author and article information

                Journal: Frontiers in Microbiology (Front. Microbiol.)
                Publisher: Frontiers Media S.A.
                ISSN: 1664-302X
                Published: 01 May 2018
                Volume: 9
                Article: 771
                Affiliations
                [1] Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
                [2] Estación Experimental de Aula Dei – Consejo Superior de Investigaciones Científicas, Zaragoza, Spain
                [3] Fundación Agencia Aragonesa para la Investigación y el Desarrollo (ARAID), Zaragoza, Spain
                Author notes

                Edited by: Jesus L. Romalde, Universidade de Santiago de Compostela, Spain

                Reviewed by: Francisco Jose Roig, Universitat de València, Spain; Anne-Kristin Kaster, Karlsruher Institut für Technologie, Germany

                This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

                †Bruno Contreras-Moreira orcid.org/0000-0002-5462-907X

                Article
                DOI: 10.3389/fmicb.2018.00771
                PMCID: PMC5938378
                PMID: 29765358
                Record ID: 991a9ea0-4ae9-4bfc-9f29-252afafbc9c1
                Copyright © 2018 Vinuesa, Ochoa-Sánchez and Contreras-Moreira.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 16 January 2018
                Accepted: 05 April 2018
                Page count
                Figures: 7, Tables: 3, Equations: 0, References: 139, Pages: 22, Words: 16428
                Funding
                Funded by: Dirección General de Asuntos del Personal Académico, Universidad Nacional Autónoma de México 10.13039/501100006087
                Award ID: IN201806-2
                Award ID: IN211814
                Award ID: IN206318
                Funded by: Consejo Nacional de Ciencia y Tecnología 10.13039/501100007350
                Award ID: P1-60071
                Award ID: 179133
                Funded by: Consejo Superior de Investigaciones Científicas 10.13039/501100003339
                Award ID: 200720I038
                Award ID: FC-2105-2-879
                Funded by: Ministerio de Economía y Competitividad 10.13039/501100003329
                Award ID: AGL2013-48756-R
                Categories
                Microbiology
                Original Research

                Microbiology & Virology
                Keywords: phylogenetics, genome-phylogeny, maximum-likelihood, species-tree, species delimitation, Stenotrophomonas maltophilia complex, Mexico
