33
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      aTRAM - automated target restricted assembly method: a fast method for assembling loci across divergent taxa from next-generation sequencing data

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Background

          Assembling genes from next-generation sequencing data is not only time consuming but computationally difficult, particularly for taxa without a closely related reference genome. Assembling even a draft genome using de novo approaches can take days, even on a powerful computer, and these assemblies typically require data from a variety of genomic libraries. Here we describe software that will alleviate these issues by rapidly assembling genes from distantly related taxa using a single library of paired-end reads: aTRAM, automated Target Restricted Assembly Method. The aTRAM pipeline uses a reference sequence, BLAST, and an iterative approach to target and locally assemble the genes of interest.

          Results

          Our results demonstrate that aTRAM rapidly assembles genes across distantly related taxa. In comparative tests with a closely related taxon, aTRAM assembled the same sequence as reference-based and de novo approaches taking on average < 1 min per gene. As a test case with divergent sequences, we assembled >1,000 genes from six taxa ranging from 25 – 110 million years divergent from the reference taxon. The gene recovery was between 97 – 99% from each taxon.

          Conclusions

          aTRAM can quickly assemble genes across distantly-related taxa, obviating the need for draft genome assembly of all taxa of interest. Because aTRAM uses a targeted approach, loci can be assembled in minutes depending on the size of the target. Our results suggest that this software will be useful in rapidly assembling genes for phylogenomic projects covering a wide taxonomic range, as well as other applications. The software is freely available http://www.github.com/juliema/aTRAM.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12859-015-0515-2) contains supplementary material, which is available to authorized users.

          Related collections

          Most cited references12

          • Record: found
          • Abstract: not found
          • Article: not found

          MapReduce

            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Quake: quality-aware detection and correction of sequencing errors

            We introduce Quake, a program to detect and correct errors in DNA sequencing reads. Using a maximum likelihood approach incorporating quality values and nucleotide specific miscall rates, Quake achieves the highest accuracy on realistically simulated reads. We further demonstrate substantial improvements in de novo assembly and SNP detection after using Quake. Quake can be used for any size project, including more than one billion human reads, and is freely available as open source software from http://www.cbcb.umd.edu/software/quake.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Ultra-barcoding in cacao (Theobroma spp.; Malvaceae) using whole chloroplast genomes and nuclear ribosomal DNA.

              To reliably identify lineages below the species level such as subspecies or varieties, we propose an extension to DNA-barcoding using next-generation sequencing to produce whole organellar genomes and substantial nuclear ribosomal sequence. Because this method uses much longer versions of the traditional DNA-barcoding loci in the plastid and ribosomal DNA, we call our approach ultra-barcoding (UBC). We used high-throughput next-generation sequencing to scan the genome and generate reliable sequence of high copy number regions. Using this method, we examined whole plastid genomes as well as nearly 6000 bases of nuclear ribosomal DNA sequences for nine genotypes of Theobroma cacao and an individual of the related species T. grandiflorum, as well as an additional publicly available whole plastid genome of T. cacao. All individuals of T. cacao examined were uniquely distinguished, and evidence of reticulation and gene flow was observed. Sequence variation was observed in some of the canonical barcoding regions between species, but other regions of the chloroplast were more variable both within species and between species, as were ribosomal spacers. Furthermore, no single region provides the level of data available using the complete plastid genome and rDNA. Our data demonstrate that UBC is a viable, increasingly cost-effective approach for reliably distinguishing varieties and even individual genotypes of T. cacao. This approach shows great promise for applications where very closely related or interbreeding taxa must be distinguished.
                Bookmark

                Author and article information

                Contributors
                juliema@illinois.edu
                daisie@mac.com
                quentin.cronk@ubc.ca
                kpjohnso@illinois.edu
                Journal
                BMC Bioinformatics
                BMC Bioinformatics
                BMC Bioinformatics
                BioMed Central (London )
                1471-2105
                25 March 2015
                25 March 2015
                2015
                : 16
                : 1
                : 98
                Affiliations
                [ ]Illinois Natural History Survey, University of Illinois, Champaign, IL 61820 USA
                [ ]Department of Botany and Beaty Biodiversity Centre, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
                Article
                515
                10.1186/s12859-015-0515-2
                4380108
                ac97731c-0974-4b5a-8e77-5fb9798562d5
                © Allen et al.; licensee BioMed Central. 2015

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                : 11 December 2014
                : 24 February 2015
                Categories
                Software
                Custom metadata
                © The Author(s) 2015

                Bioinformatics & Computational biology
                massively parallel sequence data,next-generation sequencing,targeted gene assembly,short-read archive,phylogenomics,phylogenetics

                Comments

                Comment on this article