      Is Open Access

      Natural Language or Not (NLoN) - A Package for Software Engineering Text Analysis Pipeline

      Preprint

          Abstract

          The use of natural language processing (NLP) is gaining popularity in software engineering. To perform NLP correctly, we must pre-process the textual information to separate natural language from other information, such as log messages, that is often part of the communication in software engineering. We present a simple approach for classifying whether a textual input is natural language or not. Although our NLoN package relies on only 11 language features and character tri-grams, we achieve area under the ROC curve (AUC) values between 0.976 and 0.987 on three different data sources, with Lasso regression from Glmnet as our learner and two human raters providing the ground truth. Cross-source prediction performance is lower and fluctuates more, with the best AUC values ranging from 0.913 to 0.980. Compared with prior work, our approach offers similar performance but is considerably more lightweight, making it easier to apply in software engineering text mining pipelines. Our source code and data are provided as an R package for further improvements.
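To make the feature idea in the abstract concrete, the following is a minimal Python sketch of character tri-gram counting together with a few hand-crafted text features. It is an illustration only: the function names `char_trigrams` and `simple_features`, the five features shown, and the two sample strings are assumptions made for this example, not the 11 features or the API of the NLoN R package. The Lasso learner is not shown.

```python
from collections import Counter


def char_trigrams(text: str) -> Counter:
    """Count overlapping character tri-grams in the lower-cased text."""
    lowered = text.lower()
    return Counter(lowered[i:i + 3] for i in range(len(lowered) - 2))


def simple_features(text: str) -> dict:
    """Illustrative hand-crafted features (hypothetical, not NLoN's own 11)."""
    n = max(len(text), 1)  # avoid division by zero on empty input
    return {
        "ratio_upper": sum(c.isupper() for c in text) / n,
        "ratio_special": sum(not c.isalnum() and not c.isspace() for c in text) / n,
        "ratio_digits": sum(c.isdigit() for c in text) / n,
        "n_words": len(text.split()),
        "ends_with_punct": text.rstrip().endswith((".", "!", "?")),
    }


# Hypothetical inputs: one prose sentence and one log-like line.
prose = "The build fails when I run the tests."
log_line = "2018-03-20 12:00:01 ERROR [main] NullPointerException at Foo.java:42"

print(char_trigrams("hello"))
print(simple_features(prose))
print(simple_features(log_line))
```

In a pipeline of this kind, such feature values and tri-gram counts would form the input matrix for a regularized classifier. In this sketch, the log-like line has higher digit and special-character ratios than the prose sentence, which is the kind of signal a classifier can exploit.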


                Author and article information

                Date: 20 March 2018
                DOI: 10.1145/3196398.3196444
                arXiv: 1803.07292
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Venue: MSR '18: 15th International Conference on Mining Software Repositories, May 28-29, 2018, Gothenburg, Sweden
                arXiv category: cs.SE
