
      VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database

      research-article

          Abstract

          Background

          Targeted resequencing has become the most widely used and cost-effective approach for identifying causative mutations of Mendelian diseases, for both diagnostic and research purposes. Owing to rapid technological progress, next-generation sequencing (NGS) laboratories are expanding their capacity to handle an increasing number of analyses. Several open-source tools are available to build a generic variant calling pipeline, but a tool that can simultaneously execute multiple analyses and organize and categorize the samples is still missing.

          Results

          Here we describe VarGenius, a Linux-based command-line tool that executes customizable pipelines for the analysis of multiple targeted resequencing datasets using parallel computing. VarGenius provides a database to store the output of the analysis (calling quality statistics, variant annotations, internal allelic variant frequencies) and sample information (personal data, genotypes, phenotypes). VarGenius can also perform the “joint analysis” of hundreds of samples with a single command, drastically reducing the time needed to configure and execute the analysis.
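Keeping every sample's genotypes in one database is what makes cohort-level statistics, such as internal allelic variant frequencies, cheap to compute. The following is a minimal sketch under stated assumptions:

- The table and column names are hypothetical and do not reproduce VarGenius's actual PostgreSQL schema.
- Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a server.
- The internal frequency is taken as ALT alleles divided by twice the number of genotyped samples, which assumes diploid calls.

```python
import sqlite3

# HYPOTHETICAL schema for illustration only; VarGenius's real PostgreSQL tables differ.
SCHEMA = """
CREATE TABLE samples (
    sample_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    phenotype TEXT
);
CREATE TABLE variants (
    variant_id INTEGER PRIMARY KEY,
    chrom      TEXT NOT NULL,
    pos        INTEGER NOT NULL,
    ref        TEXT NOT NULL,
    alt        TEXT NOT NULL,
    UNIQUE (chrom, pos, ref, alt)
);
-- One row per genotyped sample and variant; alt_count = 0 means homozygous reference.
CREATE TABLE genotypes (
    sample_id  INTEGER NOT NULL REFERENCES samples(sample_id),
    variant_id INTEGER NOT NULL REFERENCES variants(variant_id),
    alt_count  INTEGER NOT NULL CHECK (alt_count IN (0, 1, 2)),
    PRIMARY KEY (sample_id, variant_id)
);
"""


def internal_allele_frequencies(conn):
    """Map (chrom, pos, ref, alt) -> ALT alleles / (2 * genotyped samples), assuming diploid calls."""
    rows = conn.execute(
        """
        SELECT v.chrom, v.pos, v.ref, v.alt,
               CAST(SUM(g.alt_count) AS REAL) / (2 * COUNT(*)) AS af
        FROM variants AS v
        JOIN genotypes AS g ON g.variant_id = v.variant_id
        GROUP BY v.variant_id, v.chrom, v.pos, v.ref, v.alt
        """
    ).fetchall()
    return {(chrom, pos, ref, alt): af for chrom, pos, ref, alt, af in rows}


def load_demo(conn):
    """Create the schema and insert two toy samples sharing one variant (made-up data)."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO samples (sample_id, name) VALUES (?, ?)", [(1, "S1"), (2, "S2")])
    conn.execute("INSERT INTO variants VALUES (1, 'chr1', 12345, 'A', 'G')")
    conn.executemany("INSERT INTO genotypes VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 2)])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_demo(conn)
    print(internal_allele_frequencies(conn))  # {('chr1', 12345, 'A', 'G'): 0.75}
```

In this design the frequencies are computed by the query rather than stored. Adding a sample therefore only means inserting rows, and every frequency reflects the current cohort.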

          VarGenius executes the standard pipeline of the Genome Analysis Toolkit (GATK) best practices (GBP) for germline variant calling, annotates the variants using ANNOVAR, and generates user-friendly output that displays the results on a web page.
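Joint analysis shapes how such a pipeline can be parallelized. Per-sample steps can run concurrently across cluster nodes, but joint genotyping must wait for every sample's result.

The sketch below models that dependency structure with Python's standard `graphlib` module (Python 3.9+). The step names are a simplified, hypothetical outline of a GATK best-practices germline workflow followed by ANNOVAR annotation. They are not VarGenius task names, and no tools are actually invoked.

```python
from graphlib import TopologicalSorter

# HYPOTHETICAL step names: a simplified outline of a GATK best-practices germline
# workflow plus ANNOVAR annotation. Each entry maps a step to its prerequisites.
PER_SAMPLE_STEPS = {
    "align": set(),                                   # read alignment (e.g. BWA-MEM)
    "mark_duplicates": {"align"},
    "base_recalibration": {"mark_duplicates"},        # BQSR
    "haplotype_caller_gvcf": {"base_recalibration"},  # per-sample GVCF
}


def build_cohort_graph(samples):
    """Return {task: set of prerequisite tasks} for a joint analysis of `samples`."""
    graph = {}
    for sample in samples:
        for step, deps in PER_SAMPLE_STEPS.items():
            graph[f"{sample}:{step}"] = {f"{sample}:{dep}" for dep in deps}
    # Cohort-level steps: joint genotyping needs every sample's GVCF.
    graph["joint_genotyping"] = {f"{s}:haplotype_caller_gvcf" for s in samples}
    graph["variant_filtering"] = {"joint_genotyping"}  # e.g. VQSR or hard filters
    graph["annovar_annotation"] = {"variant_filtering"}
    return graph


def parallel_waves(graph):
    """Group tasks into waves; tasks within one wave have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    trio = ["child", "father", "mother"]
    for i, wave in enumerate(parallel_waves(build_cohort_graph(trio)), 1):
        print(i, wave)
```

For a trio this yields seven waves. Each of the first four runs three per-sample tasks side by side, while the three cohort-level steps run one at a time. A real cluster scheduler would submit each task as soon as it becomes ready instead of waiting for a whole wave.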

          VarGenius has been tested on a parallel computing cluster of 52 machines with 120 GB of RAM each. Under this configuration, a 50 M whole exome sequencing (WES) analysis of a family (trio or quartet) was executed in about 7 h, a joint analysis of 30 WES samples in about 24 h, and the parallel analysis of 34 single samples from a 1 M panel in about 2 h.
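Dividing the reported wall-clock times by the number of samples gives a rough amortized figure per sample. This is simple arithmetic on the numbers above, not an additional benchmark. It is not per-sample compute time, since the jobs ran concurrently on the cluster:

```latex
\begin{aligned}
\text{joint WES:}\quad & \frac{24\ \text{h}}{30\ \text{samples}} = 0.8\ \text{h} = 48\ \text{min per sample} \\
\text{panel:}\quad & \frac{2\ \text{h}}{34\ \text{samples}} = \frac{120\ \text{min}}{34} \approx 3.5\ \text{min per sample}
\end{aligned}
```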

          Conclusions

          We developed VarGenius, a “master” tool that addresses the increasing demand for heterogeneous NGS analyses and allows maximum flexibility for downstream analyses. It paves the way for a different kind of analysis, centered on cohorts rather than on single samples. Patient and variant information is stored in the database, and any output file can be accessed programmatically. VarGenius can be used for routine analyses by biomedical researchers with basic Linux skills, while giving computational biologists the flexibility to develop their own algorithms for comparing and analyzing the data.
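As a hypothetical illustration of the kind of cohort-level comparison one could build on stored genotypes, the sketch below flags candidate de novo variants in a trio. It is not part of VarGenius. The input dictionary stands in for genotypes that might be queried from a variant database, and the sample names and 0/1/2 encoding are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL data layout for illustration: genotypes as they might be
# queried from a variant database, not VarGenius's actual output format.
VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)
Genotypes = Dict[str, Optional[int]]    # sample -> ALT allele count (0, 1, 2); None = no call


def de_novo_candidates(
    calls: Dict[VariantKey, Genotypes], child: str, father: str, mother: str
) -> List[VariantKey]:
    """Return variants carried by the child while both parents are called homozygous reference."""
    hits = []
    for key, gts in calls.items():
        c, f, m = gts.get(child), gts.get(father), gts.get(mother)
        # Skip sites with any missing call instead of assuming homozygous reference.
        if c is None or f is None or m is None:
            continue
        if c >= 1 and f == 0 and m == 0:
            hits.append(key)
    return sorted(hits)


if __name__ == "__main__":
    toy_calls = {
        ("chr1", 100, "A", "G"): {"kid": 1, "dad": 0, "mom": 0},     # candidate de novo
        ("chr2", 200, "C", "T"): {"kid": 1, "dad": 1, "mom": 0},     # inherited from father
        ("chr3", 300, "G", "A"): {"kid": 1, "dad": None, "mom": 0},  # father not called
    }
    print(de_novo_candidates(toy_calls, "kid", "dad", "mom"))  # [('chr1', 100, 'A', 'G')]
```

A production filter would also consider read depth, genotype quality, and parental allele balance. Genotype state alone yields many false positives.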

          The software is freely available at: https://github.com/frankMusacchia/VarGenius

          Electronic supplementary material

          The online version of this article (10.1186/s12859-018-2532-4) contains supplementary material, which is available to authorized users.

          Author and article information

          Contributors
          f.musacchia@tigem.it
          andrea.ciolfi@opbg.net
          mutarelli@tigem.it
          a.bruselles@gmail.com
          castello@tigem.it
          m.pinelli@tigem.it
          swaraj.basu@workmail.com
          banfi@tigem.it
          casari@tigem.it
          marco.tartaglia@opbg.net
          nigro@tigem.it
          Journal: BMC Bioinformatics
          Publisher: BioMed Central (London)
          ISSN: 1471-2105
          Published: 12 December 2018
          Volume 19, Article 477 (2018)
          Affiliations
          [1] Telethon Institute for Genetics and Medicine, Viale Campi Flegrei, 34, 80078 Pozzuoli (Naples), Italy
          [2] Genetics and Rare Diseases Research Division, Bambino Gesù Children’s Hospital, Istituto di Ricovero e Cura a Carattere Scientifico, Rome, Italy
          [3] Department of Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome, Italy
          [4] Università degli studi della Campania “Luigi Vanvitelli”, Caserta, Italy
          [5] Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
          Author information
          http://orcid.org/0000-0001-9440-1080
          Article ID: 2532
          DOI: 10.1186/s12859-018-2532-4
          PMCID: PMC6291943
          PMID: 30541431
          © The Author(s) 2018

          Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

          History
          Received: 12 June 2018
          Accepted: 21 November 2018
          Funding
          Funded by: Fondazione Telethon (FundRef http://dx.doi.org/10.13039/501100002426)
          Award ID: GSP15001
          Categories
          Software

          Bioinformatics & Computational biology