      Is Open Access

      Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing

      review-article


          Abstract

          Background

          A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community such a framework also needs to be inclusive and intuitive for both computational novices and experts alike.

          Aim of Review

          To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science.

          Key Scientific Concepts of Review

          This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.
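As a rough illustration of how the three pieces fit together, a notebook stored in a public GitHub repository can be opened in a Binder session through a launch link. The sketch below builds such a link using the public mybinder.org URL pattern; the owner, repository, and notebook names are hypothetical placeholders, not resources from the article.

```python
from urllib.parse import quote


def binder_launch_url(owner, repo, ref="HEAD", notebook=None):
    """Build a mybinder.org launch link for a public GitHub repository.

    owner, repo, and notebook are placeholders supplied by the caller;
    ref is a branch, tag, or commit (HEAD means the default branch).
    """
    url = "https://mybinder.org/v2/gh/{}/{}/{}".format(
        quote(owner, safe=""), quote(repo, safe=""), quote(ref, safe="")
    )
    if notebook:
        # labpath asks Binder to open this file in JupyterLab on start-up.
        url += "?labpath=" + quote(notebook, safe="")
    return url


# Hypothetical repository and notebook names, for illustration only.
print(binder_launch_url("example-lab", "metabolomics-tutorial",
                        notebook="notebooks/Tutorial1.ipynb"))
```

Sharing a link of this kind alongside a publication lets a reader rerun the analysis in a browser without installing anything locally.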

          Electronic supplementary material

          The online version of this article (10.1007/s11306-019-1588-0) contains supplementary material, which is available to authorized users.


          Most cited references (23)


          mixOmics: An R package for ‘omics feature selection and multiple data integration

          The advent of high throughput technologies has led to a wealth of publicly available ‘omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a ‘molecular signature’) to explain or predict biological conditions, but mainly for a single type of ‘omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous ‘omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple ‘omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of ‘omics data available from the package.

            MassBank: a public repository for sharing mass spectral data for life sciences.

            MassBank is the first public repository of mass spectra of small chemical compounds for life sciences (<3000 Da). The database contains 605 electron-ionization mass spectrometry (EI-MS), 137 fast atom bombardment MS and 9276 electrospray ionization (ESI)-MS(n) data of 2337 authentic compounds of metabolites, 11 545 EI-MS and 834 other-MS data of 10,286 volatile natural and synthetic compounds, and 3045 ESI-MS(2) data of 679 synthetic drugs contributed by 16 research groups (January 2010). ESI-MS(2) data were analyzed under nonstandardized, independent experimental conditions. MassBank is a distributed database. Each research group provides data from its own MassBank data servers distributed on the Internet. MassBank users can access either all of the MassBank data or a subset of the data by specifying one or more experimental conditions. In a spectral search to retrieve mass spectra similar to a query mass spectrum, the similarity score is calculated by a weighted cosine correlation in which weighting exponents on peak intensity and the mass-to-charge ratio are optimized to the ESI-MS(2) data. MassBank also provides a merged spectrum for each compound prepared by merging the analyzed ESI-MS(2) data on an identical compound under different collision-induced dissociation conditions. Data merging has significantly improved the precision of the identification of a chemical compound by 21-23% at a similarity score of 0.6. Thus, MassBank is useful for the identification of chemical compounds and the publication of experimental data. 2010 John Wiley & Sons, Ltd.
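The weighted cosine correlation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not MassBank's implementation: each peak is weighted as intensity^a × (m/z)^b, peaks are aligned by simple m/z binning, and the exponent values and bin width used as defaults here are assumptions chosen for the example rather than the optimized values the abstract refers to.

```python
import math
from collections import defaultdict


def weighted_vector(peaks, intensity_exp, mz_exp, bin_width):
    """Turn (m/z, intensity) pairs into {bin index: summed peak weight}."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        # Weight each peak by intensity**a * mz**b, then pool it into its m/z bin.
        vec[round(mz / bin_width)] += (intensity ** intensity_exp) * (mz ** mz_exp)
    return vec


def weighted_cosine(query, reference, intensity_exp=0.5, mz_exp=2.0, bin_width=1.0):
    """Cosine similarity between two spectra after peak weighting.

    The default exponents and bin width are illustrative assumptions.
    """
    q = weighted_vector(query, intensity_exp, mz_exp, bin_width)
    r = weighted_vector(reference, intensity_exp, mz_exp, bin_width)
    dot = sum(w * r[b] for b, w in q.items() if b in r)
    norm = math.sqrt(sum(w * w for w in q.values())) * math.sqrt(
        sum(w * w for w in r.values())
    )
    return dot / norm if norm else 0.0


# Made-up spectra as (m/z, relative intensity) pairs.
query = [(85.03, 20.0), (127.04, 100.0), (145.05, 45.0)]
reference = [(85.03, 25.0), (127.04, 100.0), (163.06, 10.0)]
print(weighted_cosine(query, reference))
```

Raising the m/z exponent gives more influence to high-mass fragments, which tend to be more compound-specific, while an intensity exponent below 1 damps the dominance of the base peak.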

              METLIN: a metabolite mass spectral database.

              Endogenous metabolites have gained increasing interest over the past 5 years largely for their implications in diagnostic and pharmaceutical biomarker discovery. METLIN (http://metlin.scripps.edu), a freely accessible web-based data repository, has been developed to assist in a broad array of metabolite research and to facilitate metabolite identification through mass analysis. METLIN includes an annotated list of known metabolite structural information that is easily cross-correlated with its catalogue of high-resolution Fourier transform mass spectrometry (FTMS) spectra, tandem mass spectrometry (MS/MS) spectra, and LC/MS data.

                Author and article information

                Contributors
                +61 (0)8-6304-2705, stacey.n.reinke@ecu.edu.au
                +61 (0)8-6304-2705, d.broadhurst@ecu.edu.au
                Journal
                Metabolomics
                Springer US (New York)
                ISSN: 1573-3882, 1573-3890
                14 September 2019
                2019; 15(10): 125
                Affiliations
                [1] ISNI 0000 0004 0389 4302, GRID grid.1038.a, Centre for Metabolomics & Computational Biology, School of Science, Edith Cowan University, Joondalup, 6027, Australia
                [2] ISNI 0000 0001 2113 8138, GRID grid.11984.35, Strathclyde Institute of Pharmacy & Biomedical Sciences, University of Strathclyde, Cathedral Street, Glasgow, G1 1XQ, Scotland, UK
                Author information
                http://orcid.org/0000-0002-8832-2607
                http://orcid.org/0000-0002-8392-2822
                http://orcid.org/0000-0002-0758-0330
                http://orcid.org/0000-0003-0775-9581
                Article
                1588
                DOI: 10.1007/s11306-019-1588-0
                PMCID: PMC6745024
                PMID: 31522294
                8cec5c74-d161-4547-858c-fd1bd2f62d02
                © The Author(s) 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

                History
                Received: 30 May 2019
                Accepted: 7 September 2019
                Funding
                Funded by: Australian Research Council
                Award ID: LE170100021
                Categories
                Review Article
                Custom metadata
                © Springer Science+Business Media, LLC, part of Springer Nature 2019

                Molecular biology
                open access, reproducibility, data science, statistics, cloud computing, jupyter
