
      Identifying Unmaintained Projects in GitHub

      Preprint

          Abstract

          Background: Open source software is increasingly important in modern software development. However, there is also growing concern about the sustainability of such projects, which are usually managed by a small number of developers, frequently working as volunteers.

          Aims: In this paper, we propose an approach to identify GitHub projects that are not actively maintained. Our goal is to alert users to the risks of using these projects and possibly to motivate other developers to take over their maintenance.

          Method: We train machine learning models to identify unmaintained or sparsely maintained projects, based on a set of features describing project activity (commits, forks, issues, etc.). We empirically validate the best-performing model with the principal developers of 129 GitHub projects.

          Results: The proposed machine learning approach achieves a precision of 80%, based on feedback from real open source developers, and a recall of 96%. We also show that our approach can be used to assess the risk of a project becoming unmaintained.

          Conclusions: The model proposed in this paper can be used by open source users and developers to identify GitHub projects that are no longer actively maintained.
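
The precision and recall figures reported in the Results can be read against their standard definitions. The following is a minimal sketch, not the authors' evaluation code: the function name `precision_recall` and the ten project labels are hypothetical and chosen only for illustration, with "unmaintained" treated as the positive class.

```python
from typing import Sequence, Tuple


def precision_recall(actual: Sequence[bool],
                     predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class.

    True means "unmaintained". `actual` holds the ground-truth labels and
    `predicted` holds the classifier's output for the same projects.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)

    # Precision: share of flagged projects that really are unmaintained.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of unmaintained projects that the model flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for ten projects (illustration only, not study data).
actual = [True, True, True, True, False, False, False, False, False, True]
predicted = [True, True, True, True, True, False, False, False, False, False]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, one maintained project is wrongly flagged and one unmaintained project is missed, so both metrics fall below 1.0. High recall with lower precision, as in the abstract, would mean that few unmaintained projects are missed while some flagged projects are still maintained.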


                Author and article information

                Date: 11 September 2018
                arXiv: 1809.04041
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Note: Accepted at 12th International Symposium on Empirical Software Engineering and Measurement (ESEM), 10 pages, 2018
                Subject: cs.SE (Software engineering)
