      R/parallel – speeding up bioinformatics analysis with R

      product-review
      BMC Bioinformatics
      BioMed Central

          Abstract

          Background

          R is the preferred tool for statistical analysis of many bioinformaticians due in part to the increasing number of freely available analytical methods. Such methods can be quickly reused and adapted to each particular experiment. However, in experiments where large amounts of data are generated, for example using high-throughput screening devices, the processing time required to analyze data is often quite long. A solution to reduce the processing time is the use of parallel computing technologies. Because R does not support parallel computations, several tools have been developed to enable such technologies. However, these tools require multiple modifications to the way R programs are usually written or run. Although these tools can ultimately speed up the calculations, the time, skills and additional resources required to use them are an obstacle for most bioinformaticians.

          Results

          We have designed and implemented an R add-on package, R/parallel, that extends R by adding user-friendly parallel computing capabilities. With R/parallel any bioinformatician can now easily automate the parallel execution of loops and benefit from the multicore processor power of today's desktop computers. Using a single and simple function, R/parallel can be integrated directly with other existing R packages. With no need to change the implemented algorithms, the processing time can be reduced approximately N-fold, N being the number of available processor cores.
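
The idea of handing independent loop iterations to several cores can be illustrated with a minimal sketch. It is written in Python with the standard library only and is an analogy, not the R/parallel API: the names `simulate_trait_scan`, `sequential_loop` and `parallel_loop` are hypothetical, and the workload is a made-up stand-in for one CPU-heavy iteration. The speed-up approaches N-fold only when iterations are independent and expensive enough to outweigh the cost of starting workers and moving data between them.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os


def simulate_trait_scan(marker_index):
    """Hypothetical stand-in for one independent, CPU-heavy loop iteration."""
    total = 0
    for k in range(1, 20_000):
        total += (marker_index * k) % 7
    return total


def sequential_loop(func, items):
    """Baseline: run every iteration one after another on a single core."""
    return [func(item) for item in items]


def parallel_loop(func, items, workers=None, executor_cls=ProcessPoolExecutor):
    """Run independent iterations on a pool of workers, keeping input order.

    The default uses worker processes (one per core) for CPU-bound work.
    Passing executor_cls=ThreadPoolExecutor swaps in threads, which is mainly
    useful for I/O-bound iterations or for quick checks of the logic.
    """
    workers = workers or os.cpu_count() or 1
    with executor_cls(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input items
        return list(pool.map(func, items))
```

In a script, the process-based call should sit under an `if __name__ == "__main__":` guard, for example `parallel_loop(simulate_trait_scan, range(200))`. The loop body itself is unchanged between the sequential and parallel versions, which mirrors the claim above that the implemented algorithms need not be modified.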

          Conclusion

          R/parallel saves bioinformaticians time in their daily tasks of analyzing experimental data. It achieves this objective on two fronts: first, by reducing development time of parallel programs by avoiding reimplementation of existing methods and second, by reducing processing time by speeding up computations on current desktop computers. Future work is focused on extending the envelope of R/parallel by interconnecting and aggregating the power of several computers, both existing office computers and computing clusters.

          Most cited references (11)


          Interval mapping of multiple quantitative trait loci.

          The interval mapping method is widely used for the mapping of quantitative trait loci (QTLs) in segregating generations derived from crosses between inbred lines. The efficiency of detecting and the accuracy of mapping multiple QTLs by using genetic markers are much increased by employing multiple QTL models instead of the single QTL models (and no QTL models) used in interval mapping. However, the computational work involved with multiple QTL models is considerable when the number of QTLs is large. In this paper it is proposed to combine multiple linear regression methods with conventional interval mapping. This is achieved by fitting one QTL at a time in a given interval and simultaneously using (part of) the markers as cofactors to eliminate the effects of additional QTLs. It is shown that the proposed method combines the easy computation of the single QTL interval mapping method with much of the efficiency and accuracy of multiple QTL models.

            MetaNetwork: a computational protocol for the genetic study of metabolic networks.

            We here describe the MetaNetwork protocol to reconstruct metabolic networks using metabolite abundance data from segregating populations. MetaNetwork maps metabolite quantitative trait loci (mQTLs) underlying variation in metabolite abundance in individuals of a segregating population using a two-part model to account for the often observed spike in the distribution of metabolite abundance data. MetaNetwork predicts and visualizes potential associations between metabolites using correlations of mQTL profiles, rather than of abundance profiles. Simulation and permutation procedures are used to assess statistical significance. Analysis of about 20 metabolite mass peaks from a mass spectrometer takes a few minutes on a desktop computer. Analysis of 2,000 mass peaks will take up to 4 days. In addition, MetaNetwork is able to integrate high-throughput data from subsequent metabolomics, transcriptomics and proteomics experiments in conjunction with traditional phenotypic data. This way MetaNetwork will contribute to a better integration of such data into systems biology.

              Squid – a simple bioinformatics grid

              Background: BLAST is a widely used genetic research tool for analysis of similarity between nucleotide and protein sequences. This paper presents a software application entitled "Squid" that makes use of grid technology. The current version, as an example, is configured for BLAST applications, but adaptation for other computing intensive repetitive tasks can be easily accomplished in the open source version. This enables the allocation of remote resources to perform distributed computing, making large BLAST queries viable without the need of high-end computers. Results: Most distributed computing / grid solutions have complex installation procedures requiring a computer specialist, or have limitations regarding operating systems. Squid is a multi-platform, open-source program designed to "keep things simple" while offering high-end computing power for large scale applications. Squid also has an efficient fault tolerance and crash recovery system against data loss, being able to re-route jobs upon node failure and recover even if the master machine fails. Our results show that a Squid application, working with N nodes and proper network resources, can process BLAST queries almost N times faster than if working with only one computer. Conclusion: Squid offers high-end computing, even for the non-specialist, and is freely available at the project web site. Its open-source and binary Windows distributions contain detailed instructions and a "plug-n-play" installation containing a pre-configured example.

                Author and article information

                Journal
                BMC Bioinformatics
                BioMed Central
                ISSN: 1471-2105
                Published: 22 September 2008
                Volume: 9
                Article number: 390
                Affiliations
                [1 ]Groningen Bioinformatics Centre (GBiC), Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Haren, The Netherlands
                [2 ]Computer Architecture and Operating Systems Department (CAOS), Universitat Autònoma de Barcelona, Bellaterra, Spain
                Article
                Publisher ID: 1471-2105-9-390
                DOI: 10.1186/1471-2105-9-390
                PMCID: PMC2557021
                PMID: 18808714
                f528aab8-2f00-4e06-ab20-3beffbb1355e
                Copyright © 2008 Vera et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 16 June 2008
                Accepted: 22 September 2008
                Categories
                Software

                Bioinformatics & Computational biology
