      PRICE: Software for the Targeted Assembly of Components of (Meta) Genomic Sequence Data

      research-article


          Abstract

          Low-cost DNA sequencing technologies have expanded the role for direct nucleic acid sequencing in the analysis of genomes, transcriptomes, and the metagenomes of whole ecosystems. Human and machine comprehension of such large datasets can be simplified via synthesis of sequence fragments into long, contiguous blocks of sequence (contigs), but most of the progress in the field of assembly has focused on genomes in isolation rather than metagenomes. Here, we present software for paired-read iterative contig extension (PRICE), a strategy for focused assembly of particular nucleic acid species using complex metagenomic data as input. We describe the assembly strategy implemented by PRICE and provide examples of its application to the sequence of particular genes, transcripts, and virus genomes from complex multicomponent datasets, including an assembly of the BCBL-1 strain of Kaposi’s sarcoma-associated herpesvirus. PRICE is open-source and available for free download (derisilab.ucsf.edu/software/price/ or sourceforge.net/projects/pricedenovo/).
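          The idea of seeded, iterative extension described in the abstract can be illustrated with a toy sketch. This is not the PRICE implementation: the function names (`overlap`, `extend_seed`), the parameters (`min_overlap`, `max_cycles`), and the example sequences are all hypothetical, and the sketch assumes error-free reads whose mates are already given in the same orientation as the contig. Each cycle recruits the mates of reads that already lie within the contig and uses any mate overlapping the contig's end to extend it, so read pairs from unrelated sequences in the mixture are never pulled in.

```python
def overlap(contig, read, min_overlap):
    """Length of the longest contig suffix that equals a read prefix, or 0."""
    max_len = min(len(contig), len(read))
    for k in range(max_len, min_overlap - 1, -1):
        if contig.endswith(read[:k]):
            return k
    return 0


def extend_seed(seed, read_pairs, min_overlap=4, max_cycles=10):
    """Greedily extend a seed to the right using mates of reads inside the contig.

    Toy assumptions: exact matches only, one strand, extension in one direction.
    """
    contig = seed
    for _ in range(max_cycles):
        # Recruit the mate of every read that is already contained in the contig.
        recruited = []
        for first, second in read_pairs:
            if first in contig:
                recruited.append(second)
            if second in contig:
                recruited.append(first)

        # Among recruited reads, pick the one that adds the most new sequence.
        best_tail = ""
        for read in recruited:
            k = overlap(contig, read, min_overlap)
            if k and len(read) - k > len(best_tail):
                best_tail = read[k:]

        if not best_tail:
            break  # no recruited read extends the contig any further
        contig += best_tail
    return contig


# Hypothetical data: three pairs from one target plus one off-target pair.
seed = "ATGCGTAC"
read_pairs = [
    ("ATGCGT", "GTACCTGA"),
    ("ACCTGA", "CTGAAGTCGG"),
    ("AGTCGG", "TCGGATCCA"),
    ("TTTTGGGG", "GGGGCCCC"),  # off-target pair, never recruited
]
contig = extend_seed(seed, read_pairs)
print(contig)
```

          In this toy run, the second and third pairs only become usable after earlier cycles have lengthened the contig, which is the point of iterating rather than assembling all reads at once.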


                Author and article information

                Journal
                G3: Genes, Genomes, Genetics (G3 (Bethesda))
                Publisher: Genetics Society of America
                ISSN: 2160-1836
                Published: 1 May 2013
                Volume: 3
                Issue: 5
                Pages: 865-880
                Affiliations
                Department of Biochemistry and Biophysics, University of California, San Francisco, California 94044
                GW Hooper Foundation Laboratories, University of California, San Francisco, California 94044
                Howard Hughes Medical Institute, Chevy Chase, Maryland 20815
                Author notes

                Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.113.005967/-/DC1

                Sequence data from this article have been deposited in the NCBI SRA (NCBI Sequence Read Archive) under accession no. SRS367470.

                [1] Corresponding author: UCSF/HHMI, Byers Hall, Room s403c, 1700 4th Street, San Francisco, CA 94044. E-mail: grahamruby@yahoo.com

                [2] Present address: Novartis Institutes for Biomedical Research, Emeryville, CA.

                Article
                Publisher ID: GGG_005967
                DOI: 10.1534/g3.113.005967
                PMCID: 3656733
                PMID: 23550143
                Copyright © 2013 Ruby et al.

                This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License ( http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 01 November 2012
                Accepted: 19 March 2013
                Page count
                Pages: 16
                Categories
                Investigations
                Genetics
                KSHV, de novo genome assembly, high-throughput DNA sequencing, metagenomics
