
      Insights from the reanalysis of high-throughput chemical genomics data for Escherichia coli K-12

      research-article


          Abstract

          Despite the demonstrated success of genome-wide genetic screens and chemical genomics studies at predicting functions for genes of unknown function or predicting new functions for well-characterized genes, their potential to provide insights into gene function has not been fully explored. We systematically reanalyzed a published high-throughput phenotypic dataset for the model Gram-negative bacterium Escherichia coli K-12. The availability of high-quality annotation sets allowed us to compare the power of different metrics for measuring phenotypic profile similarity to correctly infer gene function. We conclude that there is no single best method; the three metrics tested gave comparable results for most gene pairs. We also assessed how converting quantitative phenotypes to discrete, qualitative phenotypes affected the association between phenotype and function. Our results indicate that this approach may allow phenotypic data from different studies to be combined to produce a larger dataset that may reveal functional connections between genes not detected in individual studies.
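The two ideas in the abstract, scoring phenotypic profile similarity and converting quantitative phenotypes to discrete ones, can be illustrated with a minimal sketch. The abstract does not name the three metrics tested, so Pearson correlation is used here only as an assumed example of such a metric. The gene names, fitness scores, the cutoff of 2.0, and the sign convention (negative scores meaning reduced fitness) are all hypothetical and are not taken from the article.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def discretize(profile, threshold=2.0):
    """Map each score to -1, 0, or +1 using a symmetric cutoff.

    Assumed convention: -1 = reduced fitness, +1 = increased fitness,
    0 = no phenotype called. The threshold is a hypothetical value.
    """
    return [-1 if s <= -threshold else 1 if s >= threshold else 0
            for s in profile]


def shared_phenotypes(a, b):
    """Count conditions where both discrete profiles share a nonzero call."""
    return sum(1 for p, q in zip(a, b) if p != 0 and p == q)


# Hypothetical fitness scores for three mutants across six conditions.
profiles = {
    "geneA": [-3.1, 0.2, -2.5, 0.1, 2.4, 0.0],
    "geneB": [-2.8, 0.1, -2.9, -0.3, 2.1, 0.4],
    "geneC": [0.3, -2.6, 0.2, 2.2, -0.1, 0.1],
}

if __name__ == "__main__":
    for g1, g2 in [("geneA", "geneB"), ("geneA", "geneC")]:
        r = pearson(profiles[g1], profiles[g2])
        shared = shared_phenotypes(discretize(profiles[g1]),
                                   discretize(profiles[g2]))
        print(f"{g1} vs {g2}: r = {r:.2f}, shared discrete phenotypes = {shared}")
```

Discrete calls lose magnitude information, but they do not depend on a study-specific score scale, which is one plausible reason they could make it easier to pool data from different studies, as the abstract suggests.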


          Most cited references (39)


          KEGG as a reference resource for gene and protein annotation

          KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks.

            The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets

            Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets in which the number of negatives outweighs the number of positives significantly. While ROC plots are visually appealing and provide an overview of a classifier's performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance due to the fact that they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.

              Lipopolysaccharide endotoxins.

              Bacterial lipopolysaccharides (LPS) typically consist of a hydrophobic domain known as lipid A (or endotoxin), a nonrepeating "core" oligosaccharide, and a distal polysaccharide (or O-antigen). Recent genomic data have facilitated study of LPS assembly in diverse Gram-negative bacteria, many of which are human or plant pathogens, and have established the importance of lateral gene transfer in generating structural diversity of O-antigens. Many enzymes of lipid A biosynthesis like LpxC have been validated as targets for development of new antibiotics. Key genes for lipid A biosynthesis have unexpectedly also been found in higher plants, indicating that eukaryotic lipid A-like molecules may exist. Most significant has been the identification of the plasma membrane protein TLR4 as the lipid A signaling receptor of animal cells. TLR4 belongs to a family of innate immunity receptors that possess a large extracellular domain of leucine-rich repeats, a single trans-membrane segment, and a smaller cytoplasmic signaling region that engages the adaptor protein MyD88. The expanding knowledge of TLR4 specificity and its downstream signaling pathways should provide new opportunities for blocking inflammation associated with infection.

                Author and article information

                Journal
                G3 (Bethesda)
                Genetics
                g3journal
                G3: Genes|Genomes|Genetics
                Oxford University Press
                2160-1836
                January 2021
                22 December 2020
                22 December 2020
                Volume: 11
                Issue: 1
                Pages: 1-13
                Affiliations
                [1] Department of Biochemistry and Biophysics, Texas A&M University and Texas Agrilife Research, College Station, TX 77843-2128, USA
                [2] Department of Biology, Texas A&M University, College Station, TX 77843-3258, USA
                Author notes

                We mourn the unexpected death of J.C.H., who led this project and passed away on January 23, 2020. We hope this publication will continue his scientific legacy.

                Corresponding author: Department of Biology, Texas A&M University, College Station, TX 77843-3258, USA. siegele@bio.tamu.edu
                Author information
                https://orcid.org/0000-0001-5570-4871
                https://orcid.org/0000-0001-8935-0696
                Article
                Publisher ID: jkaa035
                DOI: 10.1093/g3journal/jkaa035
                PMCID: PMC8022724
                PMID: 33561236
                © The Author(s) 2020. Published by Oxford University Press on behalf of Genetics Society of America.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 14 July 2020
                : 11 November 2020
                Page count
                Pages: 13
                Funding
                Funded by: National Institutes of Health, DOI 10.13039/100000002;
                Award ID: R01GM089636
                Categories
                Investigation

                Genetics
                phenotypic profiling, functional genomics, microbial genomics, microbial genetics, high-throughput studies
