
      Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers

      research-article


          Abstract

          Background

          The yield obtained from next-generation sequencers has increased almost exponentially in recent years, making sample multiplexing common practice. While barcodes (known sequences of fixed length) primarily encode the sample identity of sequenced DNA fragments, barcodes made of random sequences (Unique Molecular Identifiers, or UMIs) are often used to distinguish PCR duplicates from independent molecules when estimating transcript abundance in, for example, single-cell RNA sequencing (scRNA-seq). In paired-end sequencing, different barcodes can be inserted at each fragment end, either to increase the number of multiplexed samples in the library or to use one of the barcodes as a UMI. Alternatively, UMIs can be combined with the sample barcodes into composite barcodes, or with standard Illumina® indexing. Subsequent analysis must identify the UMIs so that both read duplicates and sample identity are taken into account.
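To make the idea of a composite barcode concrete, here is a minimal Python sketch. It assumes a hypothetical read layout in which a fixed-length sample barcode is followed directly by a fixed-length UMI and then the insert; the function name, the layout, and the exact-match lookup are illustrative assumptions, not Je's actual implementation or options.

```python
from typing import Dict, NamedTuple, Optional


class TaggedRead(NamedTuple):
    sample: str   # sample name looked up from the known barcode
    umi: str      # random molecular identifier
    insert: str   # remaining sequence of the fragment


def split_composite_barcode(
    read_seq: str,
    sample_barcodes: Dict[str, str],
    barcode_len: int,
    umi_len: int,
) -> Optional[TaggedRead]:
    """Split a read assumed to start with <sample barcode><UMI><insert>.

    Returns None when the barcode is unknown or the read is too short.
    Exact barcode matching is an assumption of this sketch.
    """
    barcode = read_seq[:barcode_len]
    umi = read_seq[barcode_len:barcode_len + umi_len]
    insert = read_seq[barcode_len + umi_len:]
    sample = sample_barcodes.get(barcode)
    if sample is None or len(umi) < umi_len:
        return None
    return TaggedRead(sample, umi, insert)


# Hypothetical example: 4-base sample barcode followed by a 4-base UMI.
known = {"ACGT": "sampleA"}
tagged = split_composite_barcode("ACGTTTGCAAGGCTA", known, 4, 4)
```

In this example the read is assigned to `sampleA`, the UMI is `TTGC`, and the remaining `AAGGCTA` is the insert that would go on to alignment.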

          Results

          Existing tools do not support these complex barcoding configurations, so custom code development is frequently required. Here, we present Je, a suite of tools that accommodates complex barcoding strategies, extracts UMIs, and filters read duplicates while taking UMIs into account. When Je was applied to publicly available scRNA-seq and iCLIP data containing UMIs, the number of unique reads increased by up to 36% compared to when UMIs were ignored.
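The effect of taking UMIs into account during duplicate filtering can be illustrated with a short Python sketch. It assumes that reads are already aligned and that duplicates are defined by identical chromosome, position, and strand, with exact UMI matching; the record type and function name are hypothetical and this is not Je's actual algorithm.

```python
from typing import Iterable, NamedTuple


class AlignedRead(NamedTuple):
    chrom: str
    pos: int
    strand: str
    umi: str


def count_unique(reads: Iterable[AlignedRead], use_umi: bool) -> int:
    """Count reads that survive duplicate filtering.

    Without UMIs, reads sharing (chrom, pos, strand) collapse into one.
    With UMIs, reads at the same position are kept apart when their
    UMIs differ (exact matching is an assumption of this sketch).
    """
    seen = set()
    for r in reads:
        key = (r.chrom, r.pos, r.strand)
        if use_umi:
            key = key + (r.umi,)
        seen.add(key)
    return len(seen)


# Toy data: three reads at one position, two of them true PCR duplicates.
reads = [
    AlignedRead("chr1", 100, "+", "AAAA"),
    AlignedRead("chr1", 100, "+", "AAAA"),  # same UMI: PCR duplicate
    AlignedRead("chr1", 100, "+", "CCGT"),  # different UMI: distinct molecule
    AlignedRead("chr2", 50, "-", "GGGG"),
]
without_umi = count_unique(reads, use_umi=False)
with_umi = count_unique(reads, use_umi=True)
```

On this toy input, position-only filtering keeps 2 reads while UMI-aware filtering keeps 3, which is the kind of gain in unique reads the abstract reports.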

          Conclusions

          Je is implemented in Java and uses the Picard API. Code, executables and documentation are freely available at http://gbcs.embl.de/Je. Je can also be easily installed in Galaxy through the Galaxy Tool Shed.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12859-016-1284-2) contains supplementary material, which is available to authorized users.

          Most cited references (11)

          Quantitative single-cell RNA-seq with unique molecular identifiers.

          Single-cell RNA sequencing (RNA-seq) is a powerful tool to reveal cellular heterogeneity, discover new cell types and characterize tumor microevolution. However, losses in cDNA synthesis and bias in cDNA amplification lead to severe quantitative errors. We show that molecular labels--random sequences that label individual molecules--can nearly eliminate amplification noise, and that microfluidic sample preparation and optimized reagents produce a fivefold improvement in mRNA capture efficiency.

            Accounting for technical noise in single-cell RNA-seq experiments.

            Single-cell RNA-seq can yield valuable insights about the variability within a population of seemingly homogeneous cells. We developed a quantitative statistical method to distinguish true biological variability from the high levels of technical noise in single-cell experiments. Our approach quantifies the statistical significance of observed cell-to-cell variability in expression strength on a gene-by-gene basis. We validate our approach using two independent data sets from Arabidopsis thaliana and Mus musculus.

              Counting absolute numbers of molecules using unique molecular identifiers.

              Counting individual RNA or DNA molecules is difficult because they are hard to copy quantitatively for detection. To overcome this limitation, we applied unique molecular identifiers (UMIs), which make each molecule in a population distinct, to genome-scale human karyotyping and mRNA sequencing in Drosophila melanogaster. Use of this method can improve accuracy of almost any next-generation sequencing method, including chromatin immunoprecipitation-sequencing, genome assembly, diagnostics and manufacturing-process control and monitoring.

                Author and article information

                Contributors
                charles.girardot@embl.de
                jelle.scholtalbers@embl.de
                sajoscha.sauer@embl.de
                shu-yi.su@embl.de
                eileen.furlong@embl.de
                Journal
                Title: BMC Bioinformatics
                Publisher: BioMed Central (London)
                ISSN: 1471-2105
                Published: 8 October 2016
                Volume: 17
                Article number: 419
                Affiliations
                European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, D-69117 Germany
                Author information
                http://orcid.org/0000-0003-4301-3920
                Article
                Publisher ID: 1284
                DOI: 10.1186/s12859-016-1284-2
                PMCID: 5055726
                PMID: 27717304
                Record ID: b7228d2c-358b-4529-91ac-3c23c961768f
                © The Author(s). 2016

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 1 December 2015
                Accepted: 28 September 2016
                Categories
                Software
                Subject: Bioinformatics & Computational biology
                Keywords: software, genomics, ngs, umi, multiplexing, duplicates
