      Is Open Access

      Beegle: from literature mining to disease-gene discovery

      research-article


          Abstract

          Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at http://beegle.esat.kuleuven.be/.
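          The search-engine figure quoted above, recall within the top 100 returned genes, can be illustrated with a minimal Python sketch. The function names `recall_at_k` and `mean_recall_at_k` and the toy gene lists are illustrative assumptions; they are not taken from Beegle and do not reproduce its evaluation setup.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant genes found among the first k ranked genes."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    # Use a set so a gene repeated in the ranking is only counted once.
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    queries: Iterable[Tuple[Sequence[str], Set[str]]], k: int
) -> float:
    """Average recall@k over several (ranking, relevant-set) pairs."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in queries]
    if not scores:
        raise ValueError("at least one query is required")
    return sum(scores) / len(scores)


# Toy data (hypothetical, not results from the paper).
ranking = ["BRCA1", "TP53", "BRCA2", "ATM"]
known_genes = {"BRCA1", "BRCA2", "PALB2"}

recall_top2 = recall_at_k(ranking, known_genes, k=2)
recall_top4 = recall_at_k(ranking, known_genes, k=4)
```

          In this toy case one of the three known genes appears in the top 2 and two of the three appear in the top 4; the third is never returned, so recall cannot reach 1 for any cutoff.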


          Most cited references (25)


          ToppGene Suite for gene list enrichment analysis and candidate gene prioritization

          ToppGene Suite (http://toppgene.cchmc.org; this web site is free and open to all users and does not require a login to access) is a one-stop portal for (i) gene list functional enrichment, (ii) candidate gene prioritization using either functional annotations or network analysis and (iii) identification and prioritization of novel disease candidate genes in the interactome. Functional annotation-based disease candidate gene prioritization uses a fuzzy-based similarity measure to compute the similarity between any two genes based on semantic annotations. The similarity scores from individual features are combined into an overall score using statistical meta-analysis. A P-value of each annotation of a test gene is derived by random sampling of the whole genome. The protein–protein interaction network (PPIN)-based disease candidate gene prioritization uses social and Web networks analysis algorithms (extended versions of the PageRank and HITS algorithms, and the K-Step Markov method). We demonstrate the utility of ToppGene Suite using 20 recently reported GWAS-based gene–disease associations (including novel disease genes) representing five diseases. ToppGene ranked 19 of 20 (95%) candidate genes within the top 20%, while ToppNet ranked 12 of 16 (75%) candidate genes among the top 20%.

            The genetic association database.


              Gene prioritization through genomic data fusion.

              The identification of genes involved in health and disease remains a challenge. We describe a bioinformatics approach, together with a freely accessible, interactive and flexible software termed Endeavour, to prioritize candidate genes underlying biological processes or diseases, based on their similarity to known genes involved in these phenomena. Unlike previous approaches, ours generates distinct prioritizations for multiple heterogeneous data sources, which are then integrated, or fused, into a global ranking using order statistics. In addition, it offers the flexibility of including additional data sources. Validation of our approach revealed it was able to efficiently prioritize 627 genes in disease data sets and 76 genes in biological pathway sets, identify candidates of 16 mono- or polygenic diseases, and discover regulatory genes of myeloid differentiation. Furthermore, the approach identified a novel gene involved in craniofacial development from a 2-Mb chromosomal region, deleted in some patients with DiGeorge-like birth defects. The approach described here offers an alternative integrative method for gene discovery.
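              The fusion step described above, where per-source rankings are combined into a global ranking using order statistics, can be sketched in Python as follows. This assumes the recursive Q-statistic commonly used for rank aggregation (the joint probability that uniform order statistics fall below the observed rank ratios); whether Endeavour computes it in exactly this form is an assumption. The names `q_statistic` and `fuse_rankings`, and the toy sources and gene labels, are hypothetical.

```python
from math import factorial
from typing import Dict, List, Sequence


def q_statistic(rank_ratios: Sequence[float]) -> float:
    """Joint order-statistic score for one gene; smaller means better.

    rank_ratios holds the gene's rank divided by the list length in each
    data source, so every value lies in (0, 1].
    """
    r = sorted(rank_ratios)
    n = len(r)
    v = [1.0]  # V_0 = 1
    for k in range(1, n + 1):
        x = r[n - k]  # r_{N-k+1} in 1-based notation
        v_k = sum(
            (-1) ** (i - 1) * v[k - i] / factorial(i) * x ** i
            for i in range(1, k + 1)
        )
        v.append(v_k)
    return factorial(n) * v[n]


def fuse_rankings(rankings: Dict[str, List[str]]) -> List[str]:
    """Fuse per-source rankings (best gene first) into one global ranking.

    Assumes every source ranks the same set of genes.
    """
    genes = next(iter(rankings.values()))
    scores = {}
    for gene in genes:
        ratios = [
            (ranked.index(gene) + 1) / len(ranked)
            for ranked in rankings.values()
        ]
        scores[gene] = q_statistic(ratios)
    return sorted(genes, key=lambda g: scores[g])


# Toy example with three hypothetical data sources and four genes.
toy_rankings = {
    "expression": ["A", "B", "C", "D"],
    "literature": ["A", "C", "B", "D"],
    "interactions": ["B", "A", "C", "D"],
}
fused = fuse_rankings(toy_rankings)
```

              A gene ranked last in every source has rank ratios of 1 and therefore a score of 1, the worst possible value, while a gene that is near the top of most sources gets a score close to 0.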

                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res, nar)
                Oxford University Press
                ISSN: 0305-1048 (print), 1362-4962 (electronic)
                29 January 2016
                17 September 2015
                Volume 44, Issue 2, e18
                Affiliations
                [1] Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, Leuven 3001, Belgium
                [2] iMinds Future Health Department, KU Leuven, Leuven 3001, Belgium
                [3] Inserm UMR-S1052, CNRS UMR5286, Cancer Research Centre of Lyon, Lyon, France
                [4] Université de Lyon 1, Villeurbanne, France
                [5] Centre Léon Bérard, Lyon, France
                [6] Wellcome Trust Genome Campus, Hinxton, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK
                [7] Department of Computer Science (DTAI), KU Leuven, Leuven 3001, Belgium
                Author notes
                [*] To whom correspondence should be addressed. Tel: +32 16 32 73 86; Fax: +32 16 32 19 70; Email: sarah.elshal@esat.kuleuven.be
                Article
                DOI: 10.1093/nar/gkv905
                PMC ID: 4737179
                PubMed ID: 26384564
                © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                29 August 2015
                25 August 2015
                03 March 2015
                Page count
                Pages: 8
                Categories
                Methods Online

                Genetics
