      Alignment-free sequence comparison: benefits, applications, and tools

      review-article

          Abstract

          Alignment-free sequence analyses have been applied to problems ranging from whole-genome phylogeny to the classification of protein families, identification of horizontally transferred genes, and detection of recombined sequences. The strength of these methods makes them particularly useful for processing and analyzing next-generation sequencing data. However, many researchers are unclear about how these methods work, how they compare to alignment-based methods, and what potential they hold for their own research. We address these questions and provide a guide to the currently available alignment-free sequence analysis tools.
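
As a rough illustration of what comparing sequences without aligning them can look like, the sketch below counts overlapping words (k-mers) in each sequence and measures the Euclidean distance between the two normalized frequency profiles. This is only a minimal example of the general word-frequency idea, not the method of any specific tool covered by the review; the function names `kmer_counts` and `kmer_distance` and the default `k=3` are assumptions made here for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Euclidean distance between the normalized k-mer frequency profiles of a and b.

    Hypothetical helper for illustration: no alignment is computed, only word counts.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both sequences must be at least k characters long")
    # A Counter returns 0 for words that are absent, so the union of words is safe to scan.
    words = set(counts_a) | set(counts_b)
    return sqrt(sum((counts_a[w] / total_a - counts_b[w] / total_b) ** 2 for w in words))
```

Identical sequences give a distance of zero, and sequences that share no words give the largest values. The choice of word length and distance measure strongly affects the result, so both should be treated as tunable assumptions rather than fixed settings.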

          Electronic supplementary material

          The online version of this article (doi:10.1186/s13059-017-1319-7) contains supplementary material, which is available to authorized users.

          Most cited references (142)

          Improved tools for biological sequence comparison.

          We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.

            Human-mouse alignments with BLASTZ.

            The Mouse Genome Analysis Consortium aligned the human and mouse genome sequences for a variety of purposes, using alignment programs that suited the various needs. For investigating issues regarding genome evolution, a particularly sensitive method was needed to permit alignment of a large proportion of the neutrally evolving regions. We selected a program called BLASTZ, an independent implementation of the Gapped BLAST algorithm specifically designed for aligning two long genomic sequences. BLASTZ was subsequently modified, both to attain efficiency adequate for aligning entire mammalian genomes and to increase its sensitivity. This work describes BLASTZ, its modifications, the hardware environment on which we run it, and several empirical studies to validate its results.

              Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms

              We introduce Sailfish, a computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Because Sailfish entirely avoids mapping reads, a time-consuming step in all current methods, it provides quantification estimates much faster than do existing approaches (typically 20 times faster) without loss of accuracy. By facilitating frequent reanalysis of data and reducing the need to optimize parameters, Sailfish exemplifies the potential of lightweight algorithms for efficiently processing sequencing reads.

                Author and article information

                Contributors
                wmk@amu.edu.pl
                Journal
                Genome Biology (Genome Biol)
                BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                3 October 2017
                2017; 18: 186
                Affiliations
                [1] ISNI 0000 0001 2097 3545, GRID grid.5633.3, Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614 Poznan, Poland
                [2] ISNI 0000 0001 2181 4263, GRID grid.9983.b, IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisbon, Portugal
                [3] ISNI 0000 0001 2216 9681, GRID grid.36425.36, Stony Brook University (SUNY), 101 Nicolls Road, Stony Brook, NY 11794, USA
                Article
                1319
                10.1186/s13059-017-1319-7
                5627421
                28974235
                26feb4e9-068b-40b9-88d1-f2d4ace0752d
                © The Author(s). 2017

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                Funding
                Funded by: KNOW RNA Research Centre in Poznan
                Award ID: 01/KNOW2/2014
                Categories
                Review

                Genetics