R/parallel – speeding up bioinformatics analysis with R


Abstract

Background

R is the preferred statistical analysis tool of many bioinformaticians, due in part to the growing number of freely available analytical methods. Such methods can be quickly reused and adapted to each particular experiment. However, in experiments that generate large amounts of data, for example using high-throughput screening devices, the processing time required to analyze the data is often long. One way to reduce processing time is to use parallel computing technologies. Because R does not natively support parallel computation, several tools have been developed to enable it. However, these tools require multiple modifications to the way R programs are usually written or run. Although these tools can eventually speed up the calculations, the time, skills and additional resources needed to use them are an obstacle for most bioinformaticians.

Results

We have designed and implemented an R add-on package, R/parallel, that extends R with user-friendly parallel computing capabilities. With R/parallel, any bioinformatician can easily automate the parallel execution of loops and benefit from the multicore processors of today's desktop computers. Because it relies on a single, simple function, R/parallel can be integrated directly with other existing R packages. Without any change to the implemented algorithms, processing time can be reduced approximately N-fold, where N is the number of available processor cores.
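The idea described above can be illustrated outside R. The core technique is to take a loop whose iterations are independent of one another and spread those iterations across processor cores without changing the per-iteration code. The following is a minimal sketch of that idea using Python's standard `concurrent.futures.ProcessPoolExecutor`. It is a swapped-in illustration, not the R/parallel API, and it makes these assumptions:

* `analyze_trait` is a hypothetical CPU-bound stand-in for one loop iteration, such as the analysis of one marker or one metabolite.
* The function names are illustrative.
* The "fork" start method is assumed to be available, as on Linux and macOS.

```python
"""Sketch: run the independent iterations of a loop on all CPU cores.

Illustrates the idea of parallelizing a loop whose iterations do not
depend on each other, using only Python's standard library.
This is NOT the R/parallel API; all names here are hypothetical.
"""
import math
import multiprocessing as mp
import os
import time
from concurrent.futures import ProcessPoolExecutor


def analyze_trait(trait_id: int) -> float:
    """Hypothetical per-iteration workload: a deterministic,
    CPU-bound computation standing in for one real analysis step."""
    total = 0.0
    for k in range(1, 20_001):
        total += math.sin(trait_id * k) / k
    return total


def run_sequential(items):
    # The ordinary loop: one iteration after another on a single core.
    return [analyze_trait(i) for i in items]


def run_parallel(items, workers=None):
    # Distribute the independent iterations across worker processes.
    # Assumption: prefer "fork" (Linux/macOS) so workers inherit the
    # already-defined functions; otherwise use the platform default.
    methods = mp.get_all_start_methods()
    ctx = mp.get_context("fork" if "fork" in methods else None)
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        # map() preserves input order, so results line up with the loop.
        return list(pool.map(analyze_trait, items))


if __name__ == "__main__":
    traits = list(range(1, 41))  # hypothetical: 40 independent traits

    start = time.perf_counter()
    serial = run_sequential(traits)
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    parallel = run_parallel(traits)
    t_parallel = time.perf_counter() - start

    # Same results in the same order; only the wall-clock time differs.
    assert parallel == serial
    print(f"cores={os.cpu_count()} serial={t_serial:.2f}s "
          f"parallel={t_parallel:.2f}s speedup={t_serial / t_parallel:.1f}x")
```

On a machine with N idle cores and iterations of similar cost, the printed speedup typically approaches but stays below N. Starting worker processes and passing arguments and results between processes cost time, which is why the reduction is only approximately N-fold. The per-iteration function is left untouched; only the loop driver changes.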

Conclusion

R/parallel saves bioinformaticians time in their daily task of analyzing experimental data. It achieves this on two fronts. First, it reduces the development time of parallel programs, because existing methods do not need to be reimplemented. Second, it reduces processing time by speeding up computations on current desktop computers. Future work will focus on extending R/parallel to interconnect and aggregate the computing power of several computers, including both existing office computers and computing clusters.


Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date Collection: 2008

Publication date (Electronic): 22 September 2008

Volume: 9

Page: 390

Affiliations

[1] Groningen Bioinformatics Centre (GBiC), Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Haren, The Netherlands

[2] Computer Architecture and Operating Systems Department (CAOS), Universitat Autònoma de Barcelona, Bellaterra, Spain

Article

Publisher ID: 1471-2105-9-390

DOI: 10.1186/1471-2105-9-390

PMC ID: 2557021

PubMed ID: 18808714

SO-VID: f528aab8-2f00-4e06-ab20-3beffbb1355e

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.