
      Reactome pathway analysis: a high-performance in-memory approach

      research-article


          Abstract

          Background

          Reactome aims to provide bioinformatics tools for the visualisation, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Pathway analysis methods have a broad range of applications in physiological and biomedical research; from a performance point of view, one of their main problems is the constantly increasing size of the data samples.

          Results

          Here, we present a new high-performance in-memory implementation of the well-established over-representation analysis method. To achieve this, the method is divided into four steps and, for each of them, specific data structures are used to improve performance and minimise the memory footprint. The first step, finding out whether an identifier in the user’s sample corresponds to an entity in Reactome, is addressed using a radix tree as a lookup table. The second step, modelling the proteins, chemicals, their orthologues in other species and their composition in complexes and sets, is addressed with a graph. The third and fourth steps, which aggregate the results and calculate the statistics, are solved with a double-linked tree.
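          The first, third and fourth steps can be illustrated with a minimal Python sketch. It is not the Reactome implementation: an uncompressed trie stands in for the radix tree, the graph of step two is omitted, and the statistic is a hypergeometric tail probability, which is an assumption because the abstract does not name the test. All class names, function names and identifiers (IdentifierTrie, PathwayNode, hypergeom_tail, "P1" and so on) are hypothetical.

```python
from math import comb


class TrieNode:
    """One character step in the identifier lookup table."""
    def __init__(self):
        self.children = {}
        self.entity = None  # set only where a complete identifier ends


class IdentifierTrie:
    """Uncompressed trie; a radix tree would also merge single-child chains."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, identifier, entity):
        node = self.root
        for ch in identifier:
            node = node.children.setdefault(ch, TrieNode())
        node.entity = entity

    def lookup(self, identifier):
        node = self.root
        for ch in identifier:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.entity


class PathwayNode:
    """Hierarchy node that links to its parent and to its children."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.entities = set()  # entities annotated at or below this node
        self.hits = set()      # sample entities found at or below this node
        if parent is not None:
            parent.children.append(self)

    def _propagate(self, attribute, entity):
        # Walk the parent links so every ancestor sees the entity too.
        node = self
        while node is not None:
            getattr(node, attribute).add(entity)
            node = node.parent

    def annotate(self, entity):
        self._propagate("entities", entity)

    def add_hit(self, entity):
        self._propagate("hits", entity)


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n of N entities are drawn and K belong to the pathway."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy data (hypothetical): two child pathways under one top-level pathway.
top = PathwayNode("TopLevel")
child = PathwayNode("Child", parent=top)
other = PathwayNode("Other", parent=top)
annotations = {"P1": child, "P2": child, "P3": other, "P4": other}

trie = IdentifierTrie()
for identifier, pathway in annotations.items():
    trie.insert(identifier, identifier)
    pathway.annotate(identifier)

sample = ["P1", "P2", "X9"]  # "X9" is unknown and is dropped by the lookup
found = [e for e in (trie.lookup(s) for s in sample) if e is not None]
for entity in found:
    annotations[entity].add_hit(entity)

p_child = hypergeom_tail(len(child.hits), len(found),
                         len(child.entities), len(top.entities))
```

          Propagating each hit along the parent links updates every ancestor pathway in a single pass, while the child links allow the results to be traversed from the top of the hierarchy downwards.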

          Conclusion

          Through the use of highly optimised, in-memory data structures and algorithms, Reactome has achieved a stable, high-performance pathway analysis service that analyses genome-wide datasets within seconds and allows interactive exploration and analysis of high-throughput data. The proposed pathway analysis approach is available on the Reactome production web site, either via the AnalysisService for programmatic access or via the user submission interface integrated into the PathwayBrowser. Reactome is an open data and open source project, and all of its source code, including the code described here, is available in the AnalysisTools repository in the Reactome GitHub (https://github.com/reactome/).


                Author and article information

                Contributors
                fabregat@ebi.ac.uk
                ksidiro@ebi.ac.uk
                gviteri@ebi.ac.uk
                oscar.forner.martinez@gmail.com
                pablo.marin@uv.es
                vicente.arnau@uv.es
                Peter.D'Eustachio@nyumc.org
                Lincoln.Stein@oicr.on.ca
                hhe@ebi.ac.uk
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN: 1471-2105
                Published: 2 March 2017
                Volume 18, article 142
                Affiliations
                [1] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
                [2] Open Targets, Wellcome Genome Campus, Hinxton, UK
                [3] Fundación Investigación INCLIVA, Universitat de València, Valencia, Spain (ISNI 0000 0001 2173 938X, GRID grid.5338.d)
                [4] Instituto de Medicina Genomica, Valencia, Spain
                [5] Escuela Técnica Superior de Ingenierías, Universitat de València, Valencia, Spain (ISNI 0000 0001 2173 938X, GRID grid.5338.d)
                [6] Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, Valencia, Spain
                [7] NYU Langone Medical Center, New York, USA (ISNI 0000 0001 2109 4251, GRID grid.240324.3)
                [8] Ontario Institute for Cancer Research, Toronto, Canada (ISNI 0000 0004 0626 690X, GRID grid.419890.d)
                [9] Department of Molecular Genetics, University of Toronto, Toronto, Canada (GRID grid.17063.33)
                [10] State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine; National Center for Protein Sciences, 102206, Beijing, China
                Article
                Publisher ID: 1559
                DOI: 10.1186/s12859-017-1559-2
                PMCID: PMC5333408
                PMID: 28249561
                © The Author(s). 2017

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 14 July 2016
                Accepted: 22 February 2017
                Funding
                Funded by: National Human Genome Research Institute at the National Institutes of Health
                Award ID: U41 HG003751
                Funded by: NIH BD2K grant
                Award ID: U54 GM114833
                Funded by: Ontario Research (GL2) Fund
                Funded by: European Bioinformatics Institute (EMBL-EBI)
                Funded by: Open Targets (The target validation platform)
                Categories
                Software

                Bioinformatics & Computational biology
                pathway analysis, over-representation analysis, data structures
