
      The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation

      research-article


          Abstract

          Background

          To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has yet been reached on a single preferred measure. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular metrics adopted in binary classification tasks. However, these statistical measures can show dangerously overoptimistic, inflated results, especially on imbalanced datasets.
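To make the overoptimism concrete, here is a minimal Python sketch on a hypothetical confusion matrix (the counts are invented for illustration and are not taken from the article's use cases). It computes accuracy, (TP + TN) / (TP + FN + TN + FP), and the F1 score, 2TP / (2TP + FP + FN), for a trivial classifier that labels every sample as positive on a dataset with 95 positives and 5 negatives. The helper names `accuracy` and `f1_score` are our own.

```python
def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Fraction of correct predictions over all predictions."""
    return (tp + tn) / (tp + fn + tn + fp)


def f1_score(tp: int, fn: int, tn: int, fp: int) -> float:
    """Harmonic mean of precision and recall; note that TN never appears."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced dataset: 95 positives, 5 negatives.
# A trivial classifier that predicts "positive" for every sample gets
# TP = 95, FN = 0, TN = 0, FP = 5: it never recognizes a single negative.
tp, fn, tn, fp = 95, 0, 0, 5
print(f"accuracy = {accuracy(tp, fn, tn, fp):.3f}")  # accuracy = 0.950
print(f"F1 score = {f1_score(tp, fn, tn, fp):.3f}")  # F1 score = 0.974
```

Both scores look excellent even though the classifier misclassifies every negative sample; F1 in particular cannot detect this, because true negatives do not enter its formula.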

          Results

          The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate that produces a high score only if the prediction obtained good results in all four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of the positive elements and to the size of the negative elements in the dataset.
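The MCC is commonly written in terms of the four confusion-matrix counts as

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

and ranges from -1 (total disagreement) through 0 (no better than random) to +1 (perfect prediction). Below is a minimal Python sketch of that formula. The function name `mcc` is ours, and returning 0.0 when one of the marginal sums is zero (where the formula becomes 0/0) is an assumed convention, not something stated in this abstract.

```python
import math


def mcc(tp: int, fn: int, tn: int, fp: int) -> float:
    """Matthews correlation coefficient from the four confusion-matrix counts.

    If any marginal sum (TP+FP, TP+FN, TN+FP, TN+FN) is zero, the numerator
    is zero as well, so the formula is 0/0; this sketch returns 0.0 in that
    case (an assumed convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical trivial classifier: 95 positives and 5 negatives, all predicted positive.
print(mcc(95, 0, 0, 5))   # 0.0 (no negative is ever predicted)
# Hypothetical perfect classifier on a balanced dataset.
print(mcc(50, 0, 50, 0))  # 1.0
```

Because every count appears in the formula, a classifier cannot reach a high MCC by doing well on the majority class alone.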

          Conclusions

          In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, first by explaining its mathematical properties, and then by illustrating its advantages in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
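One mathematical property behind this claim can be checked directly: if the positive and negative labels are swapped (TP exchanged with TN, FN with FP), MCC stays the same, whereas F1 can change drastically because it ignores true negatives. The sketch below uses hypothetical counts and our own helper names (`f1_score`, `mcc`, `swap_classes`), and restates both formulas so that it runs on its own; it illustrates the property and is not the article's use-case code.

```python
import math


def f1_score(tp: int, fn: int, tn: int, fp: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); TN is ignored."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fn: int, tn: int, fp: int) -> float:
    """Matthews correlation coefficient (0.0 when a marginal sum is zero, assumed)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def swap_classes(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Relabel positives as negatives and vice versa: TP<->TN, FN<->FP."""
    return tn, fp, tp, fn


counts = (90, 5, 4, 1)           # hypothetical (TP, FN, TN, FP)
swapped = swap_classes(*counts)  # (4, 1, 90, 5)
print(f"F1:  {f1_score(*counts):.3f} -> {f1_score(*swapped):.3f}")  # F1:  0.968 -> 0.571
print(f"MCC: {mcc(*counts):.3f} -> {mcc(*swapped):.3f}")            # MCC: 0.569 -> 0.569
```

The same predictions receive very different F1 scores depending only on which class is called "positive", while MCC reports one value for both labelings.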


                Author and article information

                Contributors
                davidechicco@davidechicco.it
                jurman@fbk.eu
                Journal
                BMC Genomics
                BioMed Central (London)
                ISSN: 1471-2164
                Published: 2 January 2020
                Volume 21, Article 6 (2020)
                Affiliations
                [1] Krembil Research Institute, Toronto, Ontario, Canada (ISNI 0000 0004 0474 0428; GRID grid.231844.8)
                [2] Peter Munk Cardiac Centre, Toronto, Ontario, Canada
                [3] Fondazione Bruno Kessler, Trento, Italy (ISNI 0000 0000 9780 0901; GRID grid.11469.3b)
                Author information
                ORCID: http://orcid.org/0000-0001-9655-7142
                ORCID: http://orcid.org/0000-0002-2705-5728
                Article
                Article ID: 6413
                DOI: 10.1186/s12864-019-6413-7
                PMCID: PMC6941312
                PMID: 31898477
                © The Author(s) 2019

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 24 May 2019
                Accepted: 18 December 2019
                Categories
                Research Article
                Custom metadata
                © The Author(s) 2020

                Genetics
                Keywords: Matthews correlation coefficient, binary classification, F1 score, confusion matrices, machine learning, biostatistics, accuracy, dataset imbalance, genomics
