      Altmetrics in Plastic Surgery Journals: Does It Correlate With Citation Count?

      Aesthetic Surgery Journal
      Oxford University Press (OUP)

      Read this article at: ScienceOpen | Publisher | PubMed

          Abstract

          Background

          Altmetrics (alternative metrics) have become among the most commonly used measures for tracking the impact of research articles across electronic and social media platforms.

          Objectives

          The goal of this study was to determine whether the Altmetric Attention Score (AAS) is a good proxy for citation counts and whether it can be employed as an accurate complement to the current gold standard.

          Methods

          The authors conducted a citation analysis of all articles published in 6 plastic surgery journals during the 2016 calendar year. Citation counts and AAS were abstracted and analyzed.
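For readers who want to reproduce this kind of data collection, a minimal sketch is shown below. It assumes a plain-text list of DOIs and Altmetric's public details endpoint (https://api.altmetric.com/v1/doi/<doi>), whose JSON response includes a `score` field; the endpoint, response schema, and rate limits should be verified against current Altmetric documentation, and citation counts would still need to be pulled from a bibliographic database (the abstract does not specify which citation source the authors used). The file names `dois.txt` and `aas.csv` are hypothetical.

```python
# Hedged sketch: fetch Altmetric Attention Scores (AAS) for a list of DOIs.
# The endpoint and the "score" field are assumptions to verify against Altmetric's docs.
import csv
import time
import requests

ALTMETRIC_URL = "https://api.altmetric.com/v1/doi/{doi}"  # public details endpoint (assumption)

def fetch_aas(doi: str) -> float | None:
    """Return the Altmetric Attention Score for a DOI, or None if Altmetric has no record."""
    resp = requests.get(ALTMETRIC_URL.format(doi=doi), timeout=30)
    if resp.status_code == 404:          # DOI not tracked by Altmetric
        return None
    resp.raise_for_status()
    return resp.json().get("score")      # AAS as reported by Altmetric

if __name__ == "__main__":
    # Hypothetical input/output files: one DOI per line in, a doi/aas table out.
    with open("dois.txt") as f, open("aas.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["doi", "aas"])
        for doi in (line.strip() for line in f if line.strip()):
            writer.writerow([doi, fetch_aas(doi)])
            time.sleep(1)                # stay well under the free tier's rate limit
```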

          Results

          A total of 1420 articles were identified. The mean AAS was 11 and the median AAS was 1. The journal with the highest mean AAS was Aesthetic Surgery Journal (31), followed by Plastic and Reconstructive Surgery (19). A weak positive correlation was identified between AAS and citation counts (r = 0.33, P < .0001). Articles in the top 1% by citation count showed a strong positive correlation between AAS and citation counts (r = 0.64, P = .01). In contrast, articles in the top 1% by AAS showed no significant correlation with citation counts (r = −0.31, P = .29).
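As an illustration of the analysis reported above, the sketch below computes the overall AAS–citation correlation and the two top-1% subgroup correlations from a hypothetical table with `aas` and `citations` columns. The abstract does not state whether Pearson or Spearman correlation was used; Pearson's r is shown here only because the results are reported as r values, and the file name and column names are assumptions, not the authors' actual dataset.

```python
# Hedged sketch: overall and top-1% subgroup correlations between AAS and citation counts.
# Input file and column names are hypothetical; the paper's exact method may differ.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("articles_2016.csv")   # hypothetical: one row per article, columns "aas" and "citations"

def top_percent(frame: pd.DataFrame, column: str, pct: float = 0.01) -> pd.DataFrame:
    """Return the rows whose value in `column` falls in the top `pct` fraction."""
    cutoff = frame[column].quantile(1 - pct)
    return frame[frame[column] >= cutoff]

# Overall correlation across all articles.
r_all, p_all = pearsonr(df["aas"], df["citations"])

# Subgroup correlations within the top 1% by citations and by AAS.
top_cited = top_percent(df, "citations")
r_cit, p_cit = pearsonr(top_cited["aas"], top_cited["citations"])

top_aas = top_percent(df, "aas")
r_aas, p_aas = pearsonr(top_aas["aas"], top_aas["citations"])

print(f"All articles:        r = {r_all:.2f}, P = {p_all:.4f}")
print(f"Top 1% by citations: r = {r_cit:.2f}, P = {p_cit:.4f}")
print(f"Top 1% by AAS:       r = {r_aas:.2f}, P = {p_aas:.4f}")
```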

          Conclusions

          Overall correlation between citations and AAS was weak, and therefore AAS may not be an accurate early predictor of future citations. The 2 metrics appear to measure different aspects of the impact of scholarly work and should be used in tandem to determine the reach of a scientific article.


          Most cited references (34)


          The history and meaning of the journal impact factor.


            Can Tweets Predict Citations? Metrics of Social Impact Based on Twitter and Correlation with Traditional Metrics of Scientific Impact

Background: Citations in peer-reviewed articles and the impact factor are generally accepted measures of scientific impact. Web 2.0 tools such as Twitter, blogs, or social bookmarking tools provide the possibility to construct innovative article-level or journal-level metrics to gauge impact and influence. However, the relationship of these new metrics to traditional metrics such as citations is not known.

Objective: (1) To explore the feasibility of measuring social impact of and public attention to scholarly articles by analyzing buzz in social media, (2) to explore the dynamics, content, and timing of tweets relative to the publication of a scholarly article, and (3) to explore whether these metrics are sensitive and specific enough to predict highly cited articles.

Methods: Between July 2008 and November 2011, all tweets containing links to articles in the Journal of Medical Internet Research (JMIR) were mined. For a subset of 1573 tweets about 55 articles published between issues 3/2009 and 2/2010, different metrics of social media impact were calculated and compared against subsequent citation data from Scopus and Google Scholar 17 to 29 months later. A heuristic to predict the top-cited articles in each issue through tweet metrics was validated.

Results: A total of 4208 tweets cited 286 distinct JMIR articles. The distribution of tweets over the first 30 days after article publication followed a power law (Zipf, Bradford, or Pareto distribution), with most tweets sent on the day when an article was published (1458/3318, 43.94% of all tweets in a 60-day period) or on the following day (528/3318, 15.9%), followed by a rapid decay. The Pearson correlations between tweetations and citations were moderate and statistically significant, with correlation coefficients ranging from .42 to .72 for the log-transformed Google Scholar citations, but were less clear for Scopus citations and rank correlations. A linear multivariate model with time and tweets as significant predictors (P < .001) could explain 27% of the variation of citations. Highly tweeted articles were 11 times more likely to be highly cited than less-tweeted articles (9/12 or 75% of highly tweeted articles were highly cited, while only 3/43 or 7% of less-tweeted articles were highly cited; rate ratio 0.75/0.07 = 10.75, 95% confidence interval 3.4–33.6). Top-cited articles can be predicted from top-tweeted articles with 93% specificity and 75% sensitivity.

Conclusions: Tweets can predict highly cited articles within the first 3 days of article publication. Social media activity either increases citations or reflects the underlying qualities of the article that also predict citations, but the true use of these metrics is to measure the distinct concept of social impact. Social impact measures based on tweets are proposed to complement traditional citation metrics. The proposed twimpact factor may be a useful and timely metric to measure uptake of research findings and to filter research findings resonating with the public in real time.
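The headline numbers in that abstract follow directly from its 2×2 table (12 highly tweeted articles, 9 of them highly cited; 43 less-tweeted articles, 3 of them highly cited). The short check below reproduces the rate ratio, sensitivity, and specificity from those counts; it is only a worked arithmetic example, not code from either study.

```python
# Worked check of the tweet-vs-citation 2x2 table reported in the abstract above.
highly_tweeted_cited, highly_tweeted_total = 9, 12   # 75% of highly tweeted articles were highly cited
less_tweeted_cited, less_tweeted_total = 3, 43       # 7% of less-tweeted articles were highly cited

rate_ratio = (highly_tweeted_cited / highly_tweeted_total) / (less_tweeted_cited / less_tweeted_total)

# Treat "highly cited" as the condition and "highly tweeted" as the test result.
true_pos = highly_tweeted_cited                          # highly tweeted and highly cited
false_neg = less_tweeted_cited                           # highly cited but not highly tweeted
false_pos = highly_tweeted_total - highly_tweeted_cited  # highly tweeted but not highly cited
true_neg = less_tweeted_total - less_tweeted_cited       # neither

sensitivity = true_pos / (true_pos + false_neg)          # 9 / 12  = 0.75
specificity = true_neg / (true_neg + false_pos)          # 40 / 43 ≈ 0.93

print(f"rate ratio  ≈ {rate_ratio:.2f}")   # ≈ 10.75, reported as ~11 times more likely
print(f"sensitivity = {sensitivity:.0%}")  # 75%
print(f"specificity = {specificity:.0%}")  # 93%
```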

              The Assessment of Science: The Relative Merits of Post-Publication Review, the Impact Factor, and the Number of Citations

Author summary: Subjective assessments of the merit and likely impact of scientific publications are routinely made by scientists during their own research, and as part of promotion, appointment, and government committees. Using two large datasets in which scientists have made qualitative assessments of scientific merit, we show that scientists are poor at judging scientific merit and the likely impact of a paper, and that their judgment is strongly influenced by the journal in which the paper is published. We also demonstrate that the number of citations a paper accumulates is a poor measure of merit, and we argue that, although it is likely to be poor, the impact factor of the journal in which a paper is published may be the best measure of scientific merit currently available.

                Author and article information

                Contributors
                Journal
                Aesthetic Surgery Journal
                Oxford University Press (OUP)
                ISSN: 1090-820X (print); 1527-330X (electronic)
                Publication dates: June 07 2020; October 24 2020; November 2020
                Volume: 40
                Issue: 11
                Pages: NP628-NP635
                Affiliations
                [1] Division of Plastic Surgery, Mayo Clinic, Rochester, MN
                [2] McGovern Medical School, Houston, TX
                [3] Department of Surgery, Mayo Clinic, Rochester, MN
                Article
                DOI: 10.1093/asj/sjaa158
                PMID: 32506129
                fe6c53c6-d71f-432e-981a-f925d7487e2b
                © 2020

                https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model

                History
