
      Vipie: web pipeline for parallel characterization of viral populations from multiple NGS samples

      research-article


          Abstract

          Background

          Next generation sequencing (NGS) technology allows laboratories to investigate virome composition in clinical and environmental samples in a culture-independent way. There is a need for bioinformatic tools capable of parallel processing of virome sequencing data by exactly identical methods: this is especially important in studies of multifactorial diseases, or in parallel comparison of laboratory protocols.

          Results

          We have developed a web-based application allowing direct upload of sequences from multiple virome samples using custom parameters. The samples are then processed in parallel using an identical protocol, and can be easily reanalyzed. The pipeline performs de-novo assembly, taxonomic classification of viruses as well as sample analyses based on user-defined grouping categories. Tables of virus abundance are produced from cross-validation by remapping the sequencing reads to a union of all observed reference viruses. In addition, read sets and reports are created after processing unmapped reads against known human and bacterial ribosome references. Secured interactive results are dynamically plotted with population and diversity charts, clustered heatmaps and a sortable and searchable abundance table.
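The abundance-table step described above can be illustrated with a minimal sketch. It is not Vipie's code: it assumes the remapping has already been done and that each sample is given as a plain list of reference names, one per mapped read. The function name `abundance_table` and the input format are hypothetical. What it shows is the "union of all observed reference viruses" idea: every sample is counted against the same set of references, so viruses absent from a sample appear with a count of zero instead of being missing.

```python
from collections import Counter


def abundance_table(hits_per_sample):
    """Count mapped reads per reference virus for every sample.

    hits_per_sample: dict of sample name -> list of reference names,
    one entry per mapped read (hypothetical input format, assumed here).
    Returns a dict of reference name -> {sample name: read count}.
    """
    counts = {sample: Counter(refs) for sample, refs in hits_per_sample.items()}
    # Union of every reference observed in any sample, sorted for stable row order.
    union = sorted(set().union(*counts.values()))
    samples = sorted(counts)
    # A Counter returns 0 for a missing key, so absent viruses get explicit zeros.
    return {ref: {s: counts[s][ref] for s in samples} for ref in union}


# Toy input, invented for illustration only.
hits = {
    "sample_A": ["Enterovirus A", "Enterovirus A", "Norovirus GII"],
    "sample_B": ["Norovirus GII", "Human adenovirus C"],
}
table = abundance_table(hits)
for ref, row in table.items():
    print(ref, row)
```

Counting against a shared reference set is what makes the rows comparable across samples; a real pipeline would obtain the per-read assignments from an aligner rather than from hand-written lists.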

          Conclusions

          The Vipie web application is a unique tool for multi-sample metagenomic analysis of viral data, producing searchable hits tables, interactive population maps, alpha diversity measures and clustered heatmaps that are grouped in applicable custom sample categories. Known references such as human genome and bacterial ribosomal genes are optionally removed from unmapped (‘dark matter’) reads. Secured results are accessible and shareable on modern browsers. Vipie is a freely available web-based tool whose code is open source.
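The abstract does not say which alpha diversity measures Vipie reports. As a hedged illustration only, the sketch below computes the Shannon index, one commonly used alpha diversity measure, from per-virus read counts of a single sample. The function name `shannon_index` and the example counts are assumptions made for this example.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over per-virus read counts.

    counts: iterable of non-negative read counts for one sample.
    Zero counts are skipped, since they contribute nothing to the sum.
    """
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Two equally abundant viruses give H = ln 2; a single virus gives H = 0.
even = shannon_index([10, 10])
single = shannon_index([5, 0, 0])
print(round(even, 4), abs(single))
```

A higher value indicates a sample whose reads are spread more evenly over more viruses; comparing the index across user-defined sample groups is one way such a measure could be used.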

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12864-017-3721-7) contains supplementary material, which is available to authorized users.


          Most cited references (20)


          ART: a next-generation sequencing read simulator.

          ART is a set of simulation tools that generate synthetic next-generation sequencing reads. This functionality is essential for testing and benchmarking tools for next-generation sequencing data analysis including read alignment, de novo assembly and genetic variation discovery. ART generates simulated sequencing reads by emulating the sequencing process with built-in, technology-specific read error models and base quality value profiles parameterized empirically in large sequencing datasets. We currently support all three major commercial next-generation sequencing platforms: Roche's 454, Illumina's Solexa and Applied Biosystems' SOLiD. ART also allows the flexibility to use customized read error model parameters and quality profiles. Both source and binary software packages are available at http://www.niehs.nih.gov/research/resources/software/art.

            Database resources of the National Center for Biotechnology Information

            In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications is custom implementation of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

              An Integrated Pipeline for de Novo Assembly of Microbial Genomes

              Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages, by integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning required by the other assembly algorithm. In particular, the assemblies produced by A5 exhibit 50% or more reduction in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics to mechanically sheared libraries. Finally, A5 has modest compute requirements, and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage.

                Author and article information

                Contributors
                jake.lin@uta.fi
                lenka.kramna@lfmotol.cuni.cz
                reija.autio@uta.fi
                heikki.hyoty@uta.fi
                matti.nykter@uta.fi
                ondrej.cinek@lfmotol.cuni.cz
                Journal
                BMC Genomics
                BioMed Central (London)
                ISSN: 1471-2164
                Published: 15 May 2017
                Volume: 18
                Article number: 378
                Affiliations
                [1] BioMediTech and Faculty of Medicine and Life Sciences, University of Tampere, PB 100, FI-33014 Tampere, Finland (ISNI 0000 0001 2314 6254, GRID grid.5509.9)
                [2] Department of Pediatrics, 2nd Faculty of Medicine, Charles University and University Hospital Motol, V Úvalu 84, 150 06 Praha 5, Czech Republic
                [3] School of Social Sciences, University of Tampere, Kalevantie 4, 33100 Tampere, Finland (ISNI 0000 0001 2314 6254, GRID grid.5509.9)
                [4] Fimlab Laboratories, Pirkanmaa Hospital District, Tampere, Finland (ISNI 0000 0004 0472 1956, GRID grid.415018.9)
                Author information
                http://orcid.org/0000-0001-9928-1663
                Article
                Publisher ID: 3721
                DOI: 10.1186/s12864-017-3721-7
                PMCID: PMC5430618
                PMID: 28506246
                © The Author(s). 2017

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 28 January 2017
                Accepted: 25 April 2017
                Funding
                Funded by: Technologická Agentura České Republiky (CZ)
                Award ID: 15-31426A
                Funded by: Tampereen Yliopisto (FundRef http://dx.doi.org/10.13039/501100004371)
                Award ID: BMT Grad School
                Categories
                Software
                Subject: Genetics
                Keywords: metagenomics, viromes, virus, assembly, NGS analysis, visualization, parallel processing, viral dark matter
