+1 Recommend
0 collections
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      UCHIME improves sensitivity and speed of chimera detection

      Read this article at

          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.


          Motivation: Chimeric DNA sequences often form during polymerase chain reaction amplification, especially when sequencing single regions (e.g. 16S rRNA or fungal Internal Transcribed Spacer) to assess diversity or compare populations. Undetected chimeras may be misinterpreted as novel species, causing inflated estimates of diversity and spurious inferences of differences between populations. Detection and removal of chimeras is therefore of critical importance in such experiments.

          Results: We describe UCHIME, a new program that detects chimeric sequences with two or more segments. UCHIME either uses a database of chimera-free sequences or detects chimeras de novo by exploiting abundance data. UCHIME has better sensitivity than ChimeraSlayer (previously the most sensitive database method), especially with short, noisy sequences. In testing on artificial bacterial communities with known composition, UCHIME de novo sensitivity is shown to be comparable to Perseus. UCHIME is >100× faster than Perseus and >1000× faster than ChimeraSlayer.

          Contact: robert@

          Availability: Source, binaries and data:

          Supplementary information: Supplementary data are available at Bioinformatics online.

          Related collections

          Most cited references 20

          • Record: found
          • Abstract: not found
          • Article: not found

          Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

           S Altschul (1997)
          The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
            • Record: found
            • Abstract: found
            • Article: not found

            Search and clustering orders of magnitude faster than BLAST.

             Robert Edgar (2010)
            Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification. UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets. Binaries are available at no charge for non-commercial use at
              • Record: found
              • Abstract: found
              • Article: not found

              Recent developments in the MAFFT multiple sequence alignment program.

              The accuracy and scalability of multiple sequence alignment (MSA) of DNAs and proteins have long been and are still important issues in bioinformatics. To rapidly construct a reasonable MSA, we developed the initial version of the MAFFT program in 2002. MSA software is now facing greater challenges in both scalability and accuracy than those of 5 years ago. As increasing amounts of sequence data are being generated by large-scale sequencing projects, scalability is now critical in many situations. The requirement of accuracy has also entered a new stage since the discovery of functional noncoding RNAs (ncRNAs); the secondary structure should be considered for constructing a high-quality alignment of distantly related ncRNAs. To deal with these problems, in 2007, we updated MAFFT to Version 6 with two new techniques: the PartTree algorithm and the Four-way consistency objective function. The former improved the scalability of progressive alignment and the latter improved the accuracy of ncRNA alignment. We review these and other techniques that MAFFT uses and suggest possible future directions of MSA software as a basis of comparative analyses. MAFFT is available at

                Author and article information

                Oxford University Press
                15 August 2011
                23 June 2011
                23 June 2011
                : 27
                : 16
                : 2194-2200
                1Tiburon, CA, USA, 2Genome Sequencing and Analysis Program, The Broad Institute, Cambridge, MA 02142, 3Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA and 4School of Engineering, University of Glasgow, Glasgow G12 8LT, UK
                Author notes
                * To whom correspondence should be addressed.

                Associate Editor: Martin Bishop

                © The Author(s) 2011. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Pages: 7
                Original Papers
                Sequence Analysis

                Bioinformatics & Computational biology


                Comment on this article