      Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment

      review-article
      Frontiers in Microbiology
      Frontiers Media S.A.
      Keywords: microbiome, machine learning, disease prediction, biomarker identification, feature selection


          Abstract

          The growing number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material for exploring host-microbiome associations in depth and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all the information in these biological datasets while taking into account the peculiarities of microbiome data, i.e., their compositional, heterogeneous and sparse nature. The possibility of predicting host phenotypes through taxonomy-informed feature selection, both to establish associations between the microbiome and disease and to predict disease states, is beneficial for personalized medicine. In this regard, machine learning (ML) offers new ways to develop models that can classify samples and predict outcomes in microbiology, infer host phenotypes to predict diseases, and use microbial communities to stratify patients according to state-specific microbial signatures. Here we review the state-of-the-art ML methods and software applied in human microbiome studies; this work was performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on applications of ML in microbiome studies concerned with association and with clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here relate mostly to bacterial communities, many of the algorithms could be applied more generally, regardless of the feature type. This literature and software review of a broad topic follows the scoping review methodology.
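Compositionality and sparsity are the two properties of microbiome data that most directly shape preprocessing before ML. As a minimal sketch, and not a method taken from this review, the centred log-ratio (CLR) transform is one common way to turn relative-abundance counts into values a standard classifier can use. The pseudocount of 0.5 used to handle zeros is an assumption, and the function names `clr` and `clr_table` are hypothetical.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centred log-ratio (CLR) transform of one sample's taxon counts.

    A pseudocount (an assumed, tunable value) is added before taking logs
    because log(0) is undefined and microbiome count tables are sparse.
    """
    if not counts:
        raise ValueError("a sample needs at least one taxon")
    if any(c + pseudocount <= 0 for c in counts):
        raise ValueError("every count plus pseudocount must be positive")
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]


def clr_table(table: Sequence[Sequence[float]], pseudocount: float = 0.5) -> List[List[float]]:
    """Apply the CLR transform row by row to a samples x taxa count table."""
    return [clr(row, pseudocount) for row in table]


if __name__ == "__main__":
    # Hypothetical toy table: 2 samples x 4 taxa, including zeros (sparsity).
    table = [[120, 0, 35, 5],
             [60, 3, 0, 40]]
    for row in clr_table(table):
        print([round(v, 3) for v in row])
```

Because CLR values depend only on ratios between taxa, samples sequenced to different depths become directly comparable; the pseudocount, however, is a modelling choice whose influence is largest for low-count taxa.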
The manual identification of data sources was complemented by (1) an automated publication search through the digital libraries of three major publishers using a natural language processing (NLP) toolkit, and (2) automated identification of relevant software repositories on GitHub, with the related research papers ranked using a learning-to-rank approach.
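The abstract does not specify which ranking model or features were used. Purely as an illustration of the general technique, the following sketch implements a pairwise perceptron, one of the simplest pairwise learning-to-rank methods: it learns a linear score from pairs of papers whose relevance labels differ. The feature names (`keyword_hits`, `repo_stars`, `citations`), the relevance labels and all function names are hypothetical and are not taken from the authors' pipeline.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-paper features; the review does not list the ones it used.
FEATURES = ("keyword_hits", "repo_stars", "citations")
Paper = Dict[str, float]


def vec(paper: Paper) -> List[float]:
    """Feature vector of one candidate paper, in FEATURES order."""
    return [float(paper[name]) for name in FEATURES]


def score(w: Sequence[float], paper: Paper) -> float:
    """Linear relevance score w . x."""
    return sum(wi * xi for wi, xi in zip(w, vec(paper)))


def preference_pairs(relevance: Sequence[int]) -> List[Tuple[int, int]]:
    """(better, worse) index pairs for every pair with different labels."""
    pairs = []
    for i, j in combinations(range(len(relevance)), 2):
        if relevance[i] > relevance[j]:
            pairs.append((i, j))
        elif relevance[j] > relevance[i]:
            pairs.append((j, i))
    return pairs


def train_pairwise(papers: List[Paper], relevance: Sequence[int],
                   epochs: int = 100, lr: float = 0.1) -> List[float]:
    """Pairwise perceptron: whenever a preferred paper does not score
    strictly higher than a less relevant one, move w toward their feature
    difference. Stops early once every pair is ordered correctly."""
    w = [0.0] * len(FEATURES)
    pairs = preference_pairs(relevance)
    for _ in range(epochs):
        mistakes = 0
        for better, worse in pairs:
            diff = [a - b for a, b in zip(vec(papers[better]), vec(papers[worse]))]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


def rank(papers: List[Paper], w: Sequence[float]) -> List[int]:
    """Indices of papers sorted from highest to lowest learned score."""
    return sorted(range(len(papers)), key=lambda i: score(w, papers[i]), reverse=True)


if __name__ == "__main__":
    # Toy training data with hypothetical graded labels (2 = most relevant).
    papers = [
        {"keyword_hits": 5, "repo_stars": 1, "citations": 2},
        {"keyword_hits": 3, "repo_stars": 4, "citations": 1},
        {"keyword_hits": 0, "repo_stars": 2, "citations": 5},
    ]
    w = train_pairwise(papers, [2, 1, 0])
    print(rank(papers, w))  # [0, 1, 2]
```

Pairwise methods only need relative judgements (paper A is more relevant than paper B), which are often easier to collect than absolute relevance scores; margin-based or gradient-boosted rankers are common, more robust alternatives to a plain perceptron.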

                Author and article information

                Contributors
                URI : http://loop.frontiersin.org/people/614300/overview
                URI : http://loop.frontiersin.org/people/1202983/overview
                URI : http://loop.frontiersin.org/people/889468/overview
                URI : http://loop.frontiersin.org/people/1152668/overview
                URI : http://loop.frontiersin.org/people/599842/overview
                URI : http://loop.frontiersin.org/people/1229302/overview
                URI : http://loop.frontiersin.org/people/1228668/overview
                URI : http://loop.frontiersin.org/people/1227259/overview
                URI : http://loop.frontiersin.org/people/1031651/overview
                URI : http://loop.frontiersin.org/people/689327/overview
                URI : http://loop.frontiersin.org/people/1228499/overview
                URI : http://loop.frontiersin.org/people/1195979/overview
                URI : http://loop.frontiersin.org/people/39693/overview
                URI : http://loop.frontiersin.org/people/1196718/overview
                URI : http://loop.frontiersin.org/people/668409/overview
                URI : http://loop.frontiersin.org/people/1227246/overview
                URI : http://loop.frontiersin.org/people/1164241/overview
                URI : http://loop.frontiersin.org/people/1227104/overview
                URI : http://loop.frontiersin.org/people/59664/overview
                URI : http://loop.frontiersin.org/people/542326/overview
                URI : http://loop.frontiersin.org/people/62776/overview
                URI : http://loop.frontiersin.org/people/1000913/overview
                URI : http://loop.frontiersin.org/people/1197349/overview
                URI : http://loop.frontiersin.org/people/829866/overview
                URI : http://loop.frontiersin.org/people/1005587/overview
                URI : http://loop.frontiersin.org/people/90461/overview
                Journal
                Frontiers in Microbiology (Front. Microbiol.)
                ISSN: 1664-302X
                Published: 19 February 2021
                Volume: 12
                Article: 634511
                Affiliations
                [1] Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
                [2] Faculty of Engineering and Natural Sciences, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina
                [3] Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
                [4] Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
                [5] Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
                [6] Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
                [7] Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
                [8] Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
                [9] Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
                [10] University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
                [11] Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia
                [12] Department of Microbiology, University of Innsbruck, Innsbruck, Austria
                [13] South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
                [14] Department of Computing, University of Turku, Turku, Finland
                [15] NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal
                [16] Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal
                [17] Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), Barcelona, Spain
                [18] Colorectal Cancer Group, Institut de Recerca Biomedica de Bellvitge (IDIBELL), Barcelona, Spain
                [19] Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Barcelona, Spain
                [20] Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
                [21] EPIUnit – Instituto de Saúde Pública da Universidade do Porto, Porto, Portugal
                [22] Department of Computer Science, University of Crete, Heraklion, Greece
                [23] Department of Clinical Science, University of Bergen, Bergen, Norway
                [24] Group for Microbiology and Microbial Biotechnology, Department of Animal Science, University of Ljubljana, Ljubljana, Slovenia
                [25] Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia
                [26] Department of Information Systems, Zefat Academic College, Zefat, Israel
                [27] Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
                [28] School of Microbiology & APC Microbiome Ireland, University College Cork, Cork, Ireland
                [29] Unidad de Gestión Clínica de Endocrinología y Nutrición, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain
                [30] Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
                [31] Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
                Author notes

                Edited by: George Tsiamis, University of Patras, Greece

                Reviewed by: Jonathan Badger, National Cancer Institute (NCI), United States; Suleyman Yildirim, Istanbul Medipol University, Turkey

                *Correspondence: Laura Judith Marcos-Zambrano, judith.marcos@imdea.org

                This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology

                Article identifiers
                DOI: 10.3389/fmicb.2021.634511
                PMCID: PMC7962872
                PMID: 33737920
                Copyright © 2021 Marcos-Zambrano, Karaduzovic-Hadziabdic, Loncar Turukalo, Przymus, Trajkovik, Aasmets, Berland, Gruca, Hasic, Hron, Klammsteiner, Kolev, Lahti, Lopes, Moreno, Naskinova, Org, Paciência, Papoutsoglou, Shigdel, Stres, Vilne, Yousef, Zdravevski, Tsamardinos, Carrillo de Santa Pau, Claesson, Moreno-Indias and Truu.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 27 November 2020
                Accepted: 01 February 2021
                Page count
                Figures: 3, Tables: 4, Equations: 0, References: 192, Pages: 25
                Funding
                Funded by: Estonian Research Competency Council 10.13039/501100005189
                Award ID: PRG548
                Funded by: Ministerio de Ciencia e Innovación 10.13039/501100004837
                Award ID: IJC2019-042188-I
                Categories
                Microbiology
                Review

                Microbiology & Virology