
      The Holy Grail and the Bad Sampling - A test for the homogeneity of missing proportions for evaluating the agreement between peer review and bibliometrics in the Italian research assessment exercises

      Preprint


          Abstract

          Two experiments for evaluating the agreement between bibliometrics and informed peer review - based on two large samples of journal articles - were performed by the Italian governmental agency for research evaluation. They were presented as successful and as warranting the combined use of peer review and bibliometrics in research assessment exercises. However, while the results of both experiments were supposed to rest on a stratified random sampling of articles with proportional allocation, only subsets of the original samples in the strata were actually selected, owing to the presence of missing articles. This kind of selection has the potential to bias the results of the experiments, since different proportions of articles could be missed in different strata. In order to assess the 'representativeness' of the sampling, we develop a novel statistical test for the homogeneity of missing proportions between strata and apply it to the data of both experiments. The outcome of the testing procedure shows that the null hypothesis of homogeneity of missing proportions should be rejected for both experiments. As a consequence, the obtained samples cannot be considered 'representative' of the population of articles submitted to the research assessments. It is therefore impossible to exclude that the combined use of peer review and bibliometrics introduced uncontrollable major biases in the final results of the Italian research assessment exercises. Moreover, the two experiments should not be considered valid pieces of knowledge in the ongoing search for the Holy Grail of a definite agreement between peer review and bibliometrics.
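          The abstract's core question - do missing proportions differ across strata? - can be illustrated with a standard chi-square test of homogeneity on a strata-by-(missing, present) contingency table. This is a minimal sketch, not the authors' novel test, and the stratum sizes and missing counts below are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical data: articles sampled per stratum (e.g. research area)
# and how many of them turned out to be missing. Illustrative only --
# these are not the ANVUR experiment figures.
sampled = np.array([120, 95, 140, 80])
missing = np.array([10, 30, 12, 25])

# 4x2 contingency table: (missing, present) counts for each stratum
table = np.column_stack([missing, sampled - missing])

# Chi-square test of homogeneity: H0 = the missing proportion is the
# same in every stratum.
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.2g}")

# A small p-value rejects H0: articles go missing at different rates in
# different strata, so the realized sample is no longer 'representative'
# of the stratified design.
```

With missing rates ranging from roughly 8% to 31% across strata, the test rejects homogeneity at any conventional level.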


          Most cited references (9)


          Bibliometric evaluation vs. informed peer review: Evidence from Italy


            Evaluating scientific research in Italy: The 2004–10 research evaluation exercise


              Do they agree? Bibliometric evaluation vs informed peer review in the Italian research assessment exercise

               (2016)
              During the Italian research assessment exercise, the national agency ANVUR performed an experiment to assess the agreement between grades attributed to journal articles by informed peer review (IR) and by bibliometrics. A sample of articles was evaluated using both methods, and agreement was analyzed by weighted Cohen's kappas. ANVUR presented the results as indicating an overall 'good' or 'more than adequate' agreement. This paper re-examines the experiment's results according to the available statistical guidelines for interpreting kappa values, showing that the degree of agreement, always in the range 0.09-0.42, has to be interpreted, for all research fields, as unacceptable, poor or, in a few cases, at most fair. The only notable exception, confirmed also by a statistical meta-analysis, was a moderate agreement for economics and statistics (Area 13) and its sub-fields. We show that the experiment protocol adopted in Area 13 was substantially modified with respect to all the other research fields, to the point that the results for economics and statistics have to be considered fatally flawed. The evidence of poor agreement supports the conclusion that IR and bibliometrics do not produce similar results, and that the adoption of both methods in the Italian research assessment possibly introduced systematic and unknown biases into its final results. The conclusion reached by ANVUR must be reversed: the available evidence does not in any way justify the joint use of IR and bibliometrics within the same research assessment exercise.
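              The weighted Cohen's kappa used in the cited agreement analysis can be sketched in a few lines of NumPy. This is a minimal, self-contained implementation of quadratically weighted kappa for two raters on an ordinal scale; the grade scale and the example ratings are hypothetical, not the ANVUR data:

```python
import numpy as np

def weighted_kappa(rater1, rater2, k, weights="quadratic"):
    """Weighted Cohen's kappa for two raters grading the same items
    on an ordinal scale 0..k-1 (e.g. peer-review vs bibliometric grades)."""
    obs = np.zeros((k, k))
    for a, b in zip(rater1, rater2):
        obs[a, b] += 1
    obs /= obs.sum()                                   # observed joint proportions
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))   # chance-expected proportions
    i, j = np.indices((k, k))
    if weights == "quadratic":
        w = ((i - j) ** 2) / (k - 1) ** 2              # disagreement weights
    else:                                              # linear weights
        w = np.abs(i - j) / (k - 1)
    return 1.0 - (w * obs).sum() / (w * exp).sum()

# Hypothetical grades (0 = lowest, 3 = highest) for ten articles
peer = [3, 2, 2, 1, 0, 3, 1, 2, 0, 1]
biblio = [3, 2, 1, 1, 0, 2, 1, 3, 0, 2]
print(round(weighted_kappa(peer, biblio, k=4), 3))
```

Perfect agreement yields kappa = 1 and perfect ordinal reversal yields -1; the guideline thresholds mentioned above (e.g. values below about 0.4 read as poor to fair) are applied to this statistic.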

                Author and article information

                Journal: arXiv (preprint)
                Date: 29 October 2018
                Article: 1810.12430

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Custom metadata
                97k80
                14 pages, 3 tables
                stat.AP physics.soc-ph stat.OT

                General physics, Applications, General statistics
