
      SPARK-MSNA: Efficient algorithm on Apache Spark for aligning multiple similar DNA/RNA sequences with supervised learning

      research-article
      Scientific Reports
      Nature Publishing Group UK
      Data mining, Software


          Abstract

          Multiple sequence alignment (MSA) is an integral part of molecular biology, but handling massive numbers of large sequences is still a bottleneck for most state-of-the-art software tools. Knowledge-driven algorithms that exploit features of the input sequences, such as the high similarity typical of DNA sequences, can improve the efficiency of DNA MSA and so assist in phylogenetic tree construction, comparative genomics, etc. This article showcases the benefit of utilizing similarity features while performing the alignment. The algorithm uses a suffix tree to identify common substrings and a modified Needleman-Wunsch algorithm for pairwise alignments. To improve the efficiency of pairwise alignments, a knowledge base is created and supervised learning with a nearest neighbor algorithm is used to guide the alignment. The algorithm provided linear complexity O(m) compared to O(m^2). Compared with state-of-the-art algorithms (e.g., HAlign II), SPARK-MSNA provided a 50% improvement in memory utilization when processing human mitochondrial genomes (mt genomes, 100x, 1.1 GB), with better alignment accuracy in terms of average SP score and comparable execution time. The algorithm is implemented on the big data framework Apache Spark in order to improve scalability. The source code and test data are available at: https://sourceforge.net/projects/spark-msna/.
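For orientation, the two building blocks named in the abstract, pairwise Needleman-Wunsch alignment and the sum-of-pairs (SP) score, can be sketched as follows. This is the textbook quadratic-time Needleman-Wunsch, not the modified, knowledge-base-guided variant the article describes. The scoring constants `MATCH`, `MISMATCH` and `GAP` and the function names are assumptions chosen for illustration and are not taken from the article or its source code.

```python
from __future__ import annotations

from itertools import combinations

# Illustrative scoring scheme (an assumption, not the article's parameters).
MATCH, MISMATCH, GAP = 1, -1, -2


def substitution(x: str, y: str) -> int:
    """Score for aligning two residues against each other."""
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str) -> tuple[str, str, int]:
    """Textbook global alignment; O(len(a) * len(b)) time and memory."""
    m, n = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * GAP
    for j in range(1, n + 1):
        score[0][j] = j * GAP
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + substitution(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[m][n]


def sp_score(rows: list[str]) -> int:
    """Sum-of-pairs score: add up column scores over every pair of rows."""
    total = 0
    for x, y in combinations(rows, 2):
        for cx, cy in zip(x, y):
            if cx == "-" and cy == "-":
                continue  # a gap facing a gap contributes nothing
            if cx == "-" or cy == "-":
                total += GAP
            else:
                total += substitution(cx, cy)
    return total


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "ACT"))
    print(sp_score(["ACGT", "AC-T", "ACGT"]))
```

The full dynamic-programming table is what makes the baseline quadratic in sequence length. The abstract's claim of linear complexity refers to the article's own modified procedure, which this sketch does not attempt to reproduce.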

                Author and article information

                Contributors
                vineevishnu@gmail.com
                Journal
                Scientific Reports (Sci Rep)
                Nature Publishing Group UK (London)
                ISSN 2045-2322
                29 April 2019
                Volume 9, Article 6631
                Affiliations
                Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram, Kerala, India (ISNI 0000 0001 2179 5111, GRID grid.413002.4)
                Author information
                http://orcid.org/0000-0001-6445-7833
                Article
                42966
                10.1038/s41598-019-42966-5
                6488671
                31036850
                b4146d8a-7666-4e31-96cb-65c118bcb5a4
                © The Author(s) 2019

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 4 July 2018
                Accepted: 12 April 2019
                Categories
                Article