      Is Open Access

      Hundred top-cited articles focusing on acute kidney injury: a bibliometric analysis

      research-article


          Abstract

          Background

          Acute kidney injury (AKI) is a major global health issue, associated with poor short-term and long-term outcomes. Research on AKI is increasing, with numerous articles published. However, the quantity and quality of research production in the field of AKI are unclear.

          Methods and analysis

          To analyse the characteristics of the most cited articles on AKI and to provide information about achievements and developments in AKI, we searched the Science Citation Index Expanded for citations of AKI articles. For the top 100 most frequently cited articles (T100), we evaluated the number of citations, publication time, country of origin, journal, impact factor, topic or subspecialty of the research, and publication type.
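The ranking step described above can be sketched in a few lines. This is a minimal illustration only, not the authors' procedure: the records, the `top_cited` and `summarise` helpers, and the cut-off of four articles are all hypothetical, chosen to show how a top-N list is selected by citation count and then summarised by range, median and decade of publication.

```python
from collections import Counter
from statistics import median

# Hypothetical records: (title, publication year, citation count).
# Illustrative values only; these are not data from the study.
articles = [
    ("Article A", 1951, 215),
    ("Article B", 2004, 1971),
    ("Article C", 2005, 302),
    ("Article D", 2011, 250),
    ("Article E", 1998, 120),
]


def top_cited(records, n):
    """Return the n most-cited records, highest citation count first."""
    return sorted(records, key=lambda r: r[2], reverse=True)[:n]


def summarise(records):
    """Summarise the citation range, median and publications per decade."""
    counts = [citations for _, _, citations in records]
    decades = Counter((year // 10) * 10 for _, year, _ in records)
    return {
        "max": max(counts),
        "min": min(counts),
        "median": median(counts),
        "per_decade": dict(decades),
    }


top = top_cited(articles, 4)  # the study used n = 100
summary = summarise(top)
print(summary)
```

With a real export from a citation index, only the `articles` list would change; the selection and summary logic stays the same.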

          Results

          The T100 articles ranged from a maximum of 1971 citations to a minimum of 215 citations (median 302 citations). T100 articles were published from 1951 to 2011, with most articles published in the 2000s (n=77), especially the 5-year period from 2002 to 2006 (n=51). The publications appeared in 30 journals, predominantly in the general medical journals, led by New England Journal of Medicine (n=17), followed by expert medical journals, led by the Journal of the American Society of Nephrology (n=16) and Kidney International (n=16). The majority (83.7%) of T100 articles were published by teams involving ≥3 authors. T100 articles originated from 15 countries, led by the USA (n=81) followed by Italy (n=9). Among the T100 articles, 69 were clinical research, 25 were basic science, 21 were reviews, 5 were meta-analyses and 3 were clinical guidelines. Most clinical articles (55%) included patients with any cause of AKI, followed by the specific causes of contrast-induced AKI (25%) and cardiac surgery-induced AKI (15%).

          Conclusions

          This study provides a historical perspective on the scientific progress on AKI, and highlights areas of research requiring further investigations and developments.

          Most cited references (31)


          The history and meaning of the journal impact factor.


            A European Renal Best Practice (ERBP) position statement on the Kidney Disease Improving Global Outcomes (KDIGO) Clinical Practice Guidelines on Acute Kidney Injury: Part 1: definitions, conservative management and contrast-induced nephropathy†

            Introduction The broad clinical syndrome of acute kidney injury (AKI) encompasses various aetiologies, including specific kidney diseases (e.g. acute interstitial nephritis), non-specific conditions (e.g. renal ischaemia) as well as extrarenal pathology (e.g. post-renal obstruction). AKI is a serious condition that affects kidney structure and function acutely, but also in the long term. Recent epidemiological evidence supports the notion that even mild, reversible AKI conveys the risk of persistent tissue damage, and severe AKI can be accompanied by an irreversible decline of kidney function and progression to end-stage kidney failure [1–3]. The Kidney Disease Improving Global Outcomes (KDIGO) Clinical Practice Guidelines for AKI [4] were designed to systematically compile information on this topic by experts in the field. These guidelines are based on the systematic review of relevant trials published before February 2011. Nevertheless, for many sections of the guidelines, appropriate supporting evidence is lacking in the literature. As a consequence, variations in practice will inevitably occur when clinicians take into account the needs of individual patients, available resources and limitations unique to a region, an institution or type of practice. Therefore, in line with its philosophy [5], the European Renal Best Practice (ERBP) wanted to issue a position statement on these guidelines. A working group was established to produce guidance from the European nephrology perspective, based on the compiled evidence as presented, with an update of the literature up to March 2012, following the methodology as explained in the ERBP instructions to authors [6]. The present document will deal with the diagnosis and prevention of AKI, and contrast-induced nephropathy (CIN) (Sections 1–4 of the KDIGO document), and other chapters will be discussed in a separate position statement. 
As a general rule, we will only mention those guideline statements of the KDIGO document that we have amended, even when the change is small. If a KDIGO recommendation is not repeated, it can be considered as endorsed by ERBP as is, unless specifically stated otherwise. 1: AKI definition 1.1: Definition and classification of AKI 1.1.1 We recommend using a uniform definition of AKI, based on urinary output and on changes in serum creatinine (SCr) level. It is important that both criteria are taken into account. (1C) 1.1.2 We recommend diagnosing and indicating the severity of AKI according to the criteria in the table below: (ungraded statement)
Stage 1: one of the following:
Serum creatinine increased 1.5–1.9 times baseline
Serum creatinine increase >0.3 mg/dl (26.5 µmol/l)
Urinary output [...]
[...] 3 times baseline
Serum creatinine increases to >4.0 mg/dl (353 µmol/l)
Initiation of renal replacement therapy
Urinary output [...]
[...] 210 mg/dL) in the critically ill. However, this was a single-centre trial, and in a larger randomized multicentre trial of intensive versus conventional insulin therapy, the NICE-SUGAR trial [36], a blood glucose target of 81–108 mg/dL resulted in higher mortality than a target of <180 mg/dL, without any benefit in preventing or improving AKI. The same study also confirmed previous findings of increased incidence of hypoglycaemia, and the associated risk of death, when targeting low glycaemia levels. In two recent meta-analyses of trials on intensive versus conventional glycaemic control, pooled relative risk of death with intensive insulin therapy was only slightly lower, whereas relative risk of hypoglycaemia was much higher [37]. Lowering glycaemia could thus potentially be beneficial, but this small benefit is easily offset by the much higher risk of hypoglycaemia [38]. Overall, these data do not support the use of intensive insulin therapy aiming to control plasma glucose at 110 mg/dL or lower in critically ill patients as a general rule.
On the other hand, it cannot be denied that insulin therapy for preventing severe hyperglycaemia is beneficial. Based on these considerations, ERBP suggests keeping glycaemia between 110 and 180 mg/dL. We strongly recommend regular control of glycaemia, with appropriate instructions on what action should be undertaken based on the result of a certain glycaemic value, when insulin therapy is initiated. In epidemiological studies, protein–calorie malnutrition is an important independent predictor of in-hospital mortality in patients with AKI, but very few systematic studies have assessed the impact of nutrition on clinical end points. Recommendations are therefore largely based on expert opinion. There is no evidence that giving proteins can reverse the catabolic process in patients with AKI. According to ERBP, no meaningful guidance can be provided. As such, the ERBP group does not endorse the KDIGO statements relating to administration of proteins. As there is no proven benefit of administering high quantities of protein to patients with AKI, initiating high-volume CRRT with the sole aim of removing extra uraemic waste products resulting from high protein loading cannot be recommended. Several RCTs have demonstrated the beneficial effect of providing enteral versus parenteral nutrition in different conditions as soon as possible in ICU patients [39, 40]. A recent large RCT indicated that early initiation of parenteral nutrition in patients not meeting the recommended caloric intakes by enteral feeding leads to higher mortality rates and longer ICU stay [41]. Although these studies have mostly not reported patients with AKI as a separate subgroup, there is no reason to believe that results would be different in this patient group.
As parenteral feeding does not seem to improve outcomes in a general ICU population, and as it can lead to accumulation of uraemic waste products and increased fluid loading, and thus to a greater need for ultrafiltration, it should only be used cautiously in AKI patients. 2.2.3 The use of diuretics in AKI. 2.2.3.1 We recommend diuretics should not be used to prevent AKI. (1B) 2.2.3.2 We suggest not using diuretics to increase urinary volume in established AKI, except for the management of volume overload. (2C) Rationale Since fluid retention is one of the major symptoms of impaired kidney function, diuretics are often used for patients with or developing AKI. Mostly, loop diuretics such as furosemide are administered to patients with AKI to convert oliguric to non-oliguric AKI, and to facilitate fluid management. However, some reports have indicated that the use of diuretics is associated with harmful effects, possibly because circulating volume is reduced excessively, thereby worsening renal haemodynamics. The use of diuretics can also delay the recognition of AKI and nephrology consultation [42]. In meta-analyses, the use of furosemide was not associated with any significant clinical benefits in the prevention and treatment of AKI in adults, and high doses were associated with an increased risk of ototoxicity [43, 44]. The ERBP work group therefore endorses both recommendations on the use of diuretics in patients with AKI. 2.2.4 Pharmacological interventions. 2.2.4.1 We recommend low-dose dopamine should not be used to prevent or treat AKI. (1A) 2.2.4.2 We do not recommend using fenoldopam to prevent or treat AKI. (1C) 2.2.4.3 We do not recommend using atrial natriuretic peptide (ANP) to prevent (1C) or treat (1B) AKI. 2.2.4.4 We do not recommend using recombinant human (rh)IGF-1 to prevent or treat AKI.
(1B) Rationale With multiple negative studies, including a randomized, double-blind, placebo-controlled trial of adequate size and power, ‘low-dose’ (1–3 µg/kg/min) dopamine has been abandoned for the prevention and treatment of AKI [32]. Smaller clinical studies have reported a potentially beneficial effect (prevention of need for RRT) of fenoldopam, a pure dopamine Type-1 receptor agonist, in patients with established AKI after cardiothoracic surgery [45], but larger trials are lacking. In contrast, results on the use of fenoldopam for the prevention of AKI were not positive. Taken together, no data from adequately powered multicentre trials with clinically significant end points and adequate safety are available to recommend fenoldopam to either prevent or treat AKI. In addition, concerns about a potentially harmful dose-dependent hypotensive action, and about the high cost, remain. Also, the beneficial impact of norepinephrine on mortality and AKI is well established [30] in these conditions, and norepinephrine should remain first-line therapy, also in view of its low cost. As a consequence, ERBP does not recommend the use of fenoldopam. There are no trials to support the use of ANP, urodilatin and brain natriuretic peptide (BNP, nesiritide) for prevention or treatment of AKI. In view of the paucity of robust data from large intervention trials, and the fact that all substances may induce serious adverse effects such as hypotension and arrhythmias, the ERBP group considers that their use cannot be recommended. The list of substances tested in the setting of experimental and clinical AKI is long, and among them are recombinant human insulin-like growth factor-1 (IGF-1) and recombinant human erythropoietin. As with many other agents, clinical studies on IGF-1 were disappointing. Under these circumstances, the ERBP feels that their use cannot be recommended until proof of a beneficial effect is provided. 2.2.5 Prevention of aminoglycoside- and amphotericin-related AKI.
2.2.5.1 We suggest not using more than one shot of aminoglycosides for the treatment of infections unless no suitable, less nephrotoxic, therapeutic alternatives are available. (2A) 2.2.5.2 We recommend that, in patients with normal kidney function in steady state, aminoglycosides are administered as a single-dose daily rather than multiple-dose daily treatment regimens. (1B) An exception to this recommendation can be patients with endocarditis, where inconsistent evidence on non-inferiority of single versus multiple daily dosing is reported. (1D) 2.2.5.3 We recommend monitoring aminoglycoside drug levels when treatment with multiple daily dosing is used for more than 24h. (1A) 2.2.5.4 We suggest monitoring aminoglycoside drug levels when treatment with single-daily dosing is used for more than 48h. (2C) 2.2.5.5 We suggest using topical or local applications of aminoglycosides (e.g. respiratory aerosols, instilled antibiotic beads), rather than intravenous (i.v.) application, when feasible and suitable. (2B) 2.2.5.6 We recommend that patients receiving whatever formulation of amphotericin B should receive adequate sodium loading and potassium suppletion (1B). We suggest balancing the presumed lower nephrotoxicity of lipid formulations against their higher cost. (2D) 2.2.5.7 We suggest balancing the need for adequate antimycotic treatment against the potential risk of nephrotoxicity in selecting the most suitable antimycotic agent. (Ungraded statement) Rationale Aminoglycosides are highly potent, bactericidal antibiotics. They have many favourable attributes, including their remarkable stability, predictable pharmacokinetics, low incidence of immunologically mediated side effects and lack of haematologic or hepatic toxicity. Although nephrotoxicity, and ototoxicity, remain major concerns, these events appear to be due to cumulative exposure, and their occurrence after single shot administration is exceptional. 
On the other hand, due to their potent bactericidal activity, aminoglycosides can help to reverse sepsis-related haemodynamic instability, and thus risk for AKI. In the light of recent developments with progressive antimicrobial resistance to a number of other classes of agents, aminoglycosides remain useful antibiotics. In this perspective, ERBP does not object to the use of aminoglycosides as a single-shot administration in certain conditions. However, careful dosing and therapeutic drug monitoring should be applied to mitigate the risk of AKI with these antibiotics when more than one dose is administered. We recommend that they should be used for as short a period of time as possible. There are several approaches to avoid nephrotoxicity of amphotericin B in patients at risk. In the opinion of ERBP, the KDIGO guideline has focused too little attention to sodium loading as a potential nephroprotective strategy. Although there is no hard evidence to support the protective effect of sodium loading, the cost is low, and therefore ERBP recommends that it should be implemented in all patients receiving any formulation of amphotericin B. Numerous studies with lipid formulations of this drug have been published. However, a well-performed review on the topic pointed to the high risk of bias in these studies, making the conclusions rather weak [46]. The ERBP believes that there is insufficient evidence to recommend the use of the lipid formulations of amphotericin B as being clearly superior to the conventional formulation. Another approach to prevent amphotericin B nephrotoxicity is to use alternative agents, such as the azoles (voriconazole, fluconazole, itraconazole and posaconazole) and echinocandins (caspofungin, anidulafungin and micafungin). Although these agents have clearly a better record with regard to nephrotoxicity, there is the potential of hepatotoxicity, and there is uncertainty on the therapeutic equivalence. 
A Cochrane review [47] pointed to substantial biases in the RCTs dealing with this question. In this setting, ERBP believes that the recommendation as issued by KDIGO is too strong, ambivalent and not supported by the evidence. The ERBP work group judged that azoles and echinocandins can be used in low-grade infections, but that their role in life-threatening infections is unclear, and that in these conditions, the risk of AKI should not outweigh the risk of death by uncontrolled infection. 3. Contrast-induced nephropathy Besides the KDIGO guidelines, many other bodies issued recommendations on the treatment and prevention of CIN. As early as 2007, a series of guidelines on the prevention of CIN in high-risk patients undergoing cardiovascular procedures were released [48], and in 2011, the European Society of Urogenital Radiology (ESUR) released their new guidelines on CIN [49]. 3.1 Definition, epidemiology and prognosis 3.1.1 We recommend that for CIN, the same definition and grading is used as for AKI (see 1.1). (Ungraded statement) 3.1.2 We recommend that before an intervention which encompasses a risk for CIN, a baseline serum creatinine should be determined. (Ungraded statement) 3.1.3 We suggest that in high-risk patients, a repeat serum creatinine is performed 12 and 72 h after administration of contrast media. (2D) 3.1.4 We suggest not considering only CIN in individuals who develop changes in kidney function after administration of intravascular contrast media, but also other possible causes of AKI. (Not Graded) Rationale The ERBP work group is not aware of any pathophysiological or epidemiological reason why the definition and staging of CIN should be different from the general AKI definition.
This definition is slightly different from the ESUR criteria [49] for contrast-induced nephropathy, which requires an increase in SCr by more than 25% or 44 µmol/L in the 3 days following intravascular administration of contrast medium (CM) in the absence of an alternative aetiology. Thus, many patients with an SCr increase ranging from 26.5 to 44 µmol/L following CM administration would be considered as presenting Stage 1 AKI but not as CI nephropathy. However, for the sake of clarity and uniformity, ERBP recommends to use the general AKI criteria. Remarkably, studies have also pointed out that in many hospitalized patients not receiving contrast, an increase in serum creatinine was observed [50]. As such, in patients who did receive contrast, one should be cautious to attribute AKI to the contrast, and other underlying causes for AKI should be explored. The moment when the repeat serum creatinine should be measured is a matter of debate. According to ESUR, it should be done in the 3 days following intravascular administration of CM. Some studies suggest that the peak of SCr could even occur later, especially in patients with diabetes and pre-existing CKD [51–56] which really underlines the need for an extended period of renal function survey. On the other hand, the percentage increase in serum creatinine from baseline after 12h showed a good prediction for later development of renal impairment [57]. The reliability of other renal function markers such as cystatin C should be further evaluated. On the other hand, the importance of urinary output for diagnosing CIN should be emphasized. 3.2.1 We recommend balancing the risk for CIN against the benefit of administering contrast. (Not Graded) 3.2.2 We recommend considering alternative imaging methods not requiring contrast administration in patients at increased risk for CIN, so long as these yield the same diagnostic accuracy. 
(Not Graded) Rationale Although these recommendations seem trivial, it is important to balance the potential risk of CIN against the potential gain of administering contrast in the clinical decision process. Risk for CIN increases with decreasing pre-existing GFR. A CIN Consensus Working Panel [58] agreed that CIN risk becomes clinically significant when the baseline SCr concentration is ≥1.3 mg/dL (≥115 µmol/L) in men and ≥1.0 mg/dL (≥88.4 µmol/L) in women, mostly equivalent to an eGFR <60 mL/min/1.73 m². In light of more recent work [50], the ERBP work group agrees with KDIGO that this threshold could be lowered to 45 mL/min/1.73 m². The risk of CIN is also increased in the presence of diabetes and dehydration. The risk may be lower when simple i.v. contrast is administered for imaging versus when contrast is used during an invasive intra-arterial procedure, where the risk of cholesterol embolization should also be taken into account [59]. It is unclear whether simple intra-arterial injection, e.g. digital subtraction angiography, has a different risk from i.v. [60, 61]. The risk increases with the volume of contrast applied. There are no data to show whether the effect of repeated contrast administration is simply a consequence of the cumulative dosage of iodine, or whether repeated administrations are disproportionately more toxic than the administration of a certain volume of contrast in one shot. Another risk factor is the use of concurrent nephrotoxic medication: non-steroidal anti-inflammatory drugs, aminoglycosides, amphotericin B, high doses of loop diuretics and antiviral drugs like acyclovir and foscarnet, in particular. Special mention should be made of metformin, as accumulation of this drug in CIN can lead to dangerous situations. The ERBP group wants to point out that several drugs have a prolonged nephrotoxic action as a consequence of a long-lasting cellular accumulation in the kidney.
In order to minimize the risk of kidney damage, these drugs would have to be stopped for days or even weeks, and not only hours, before contrast administration. The rationale for stopping loop diuretics is mainly based on their detrimental effect if used as pharmacological prevention against CIN [62]. Not only must loop diuretics be discontinued during and after contrast administration, but they should be stopped for as long as possible before the procedure in order to reduce the possibility of volume depletion. From this point of view, it is surprising to note that the possible detrimental effect of thiazide diuretics, which have a much longer duration of action, is almost never mentioned. It should be stressed that dehydration or any degree of volume depletion makes medullary renal perfusion closely dependent on vasoactive hormones, and extremely sensitive to microvascular effects of intravascular contrast administration [63]. Apart from diuretics, clinical circumstances such as gastro-intestinal fluid losses may induce dehydration, and if possible it is wise to delay contrast administration until volume status has been corrected. To date, there is very little evidence on the detrimental effects of angiotensin-converting enzyme inhibitors (ACE-I) concerning the renal risk of contrast administration. A randomized study showed a decreased incidence of CIN following the administration of captopril in diabetic patients undergoing coronary angiography [64], and more recently, it was observed that a captopril treatment stopped 36 h before CM administration was neither associated with nor increased the risk of CIN in hydrated patients [65]. However, the risk associated with long-acting ACE-I and ARB is poorly defined and should be assessed through specific studies.
Pharmacological prevention strategies of CIN 3.4.1 We recommend volume expansion with either isotonic sodium chloride or sodium bicarbonate solutions, rather than no volume expansion, in patients at increased risk for CIN. (1A) 3.4.2 We suggest using the oral route for hydration, on the premise that adequate intake of fluid and salt is assured. (2C) We suggest that, when oral intake of fluid and salt is deemed cumbersome in patients at increased risk of CIN, hydration should be performed by the intravenous route. (2C) 3.4.3 We suggest using oral N-acetyl cysteine (NAC) only in patients who receive appropriate fluid and salt loading (2D). We recommend not using oral NAC as the only method for prevention of CIN. (1D) 3.4.4 We do not suggest using theophylline to prevent CIN. (2C) 3.4.5 We do not recommend using fenoldopam to prevent CIN. (1B) Rationale There is no doubt that before contrast media administration, adequate salt and fluid should be provided to prevent CIN. The ERBP work group amended the statement on oral fluid loading by the KDIGO work group, as this was based on two small and relatively old studies, in which oral fluid intake did not confer the same degree of protection against CIN as i.v. fluid administration [66, 67]. However, a recent observational study showed a significant inverse correlation between the amount of oral fluid intake and the percentage changes in SCr as well as the absolute changes in eGFR in patients undergoing coronary computed tomography angiography [68], and a prospective randomized trial comparing i.v. fluids with oral hydration with or without sodium bicarbonate found no differences in the incidence of CIN in patients with mild CKD [69]. It should be noted that the main difference between oral and i.v. fluid administration concerns not only the volume but the sodium content of the fluids as well [70]. In ambulatory patients, the i.v.
route leads to a substantial increase in costs, and a risk for destruction of future vascular access. The ERBP work group accordingly does not recommend hospitalizing low-risk patients just for hydration. Most of the ambulatory patients have a relatively low risk for CIN, and in these patients, oral hydration should be recommended. When i.v. access is in place anyway, e.g. in hospitalized patients, the i.v. route can be used. NAC has a number of beneficial properties, including anti-oxidant functions and mediation of renal vasodilation, making it a suitable candidate to help prevent CIN. However, NAC has been the subject of a series of comprehensive reviews, and overall there appears to be insufficient evidence to support the universal use of NAC to prevent CIN despite its ease of administration [63]. It should be noted that in most trials reporting a benefit, NAC administration was associated with i.v. hydration using bicarbonate. Studies of NAC with bicarbonate administration have found a moderate benefit for this combination, compared with the combination of NAC–saline, and it is unclear in how far the benefit can be attributed to NAC per se. To date, 7 out of the 11 meta-analyses that have been published on this subject found a net benefit for NAC in the prevention of CIN [71]. NAC, however, has been reported to decrease SCr levels in normal volunteers with normal kidney function. This reduction in SCr was not accompanied by a change in serum cystatin C levels, suggesting an effect independent of a change in GFR, such as an increase in tubular secretion of creatinine or a decrease in creatinine production [72]. In conclusion, in view of its low costs and the high likelihood of absence of harm, there is no objection against oral NAC administration, but this should never replace adequate fluid loading. 
Effects of haemodialysis or haemofiltration 4.5.1: We do not recommend using prophylactic intermittent haemodialysis (IHD) or haemofiltration (HF) for the purpose of prevention of CIN only. (1C) Rationale The evidence collected by KDIGO demonstrates that IHD to prevent CIN in well pre-hydrated patients at risk is not effective, and that there is even a trend to more harm (more CIN, and more need for RRT) [73–75]. High-volume HF in this setting has been reported to be beneficial [76, 77]. The protocol used in these studies included HF at ICU, and with high volumes of bicarbonate fluid. It seems likely that under these conditions, the beneficial effects observed were due to volume expansion and loading with bicarbonate rather than to the removal of contrast media by the HF. In view of the high costs and logistical problems, the evidence seems too weak to recommend prophylactic HF at this moment.

              The Assessment of Science: The Relative Merits of Post-Publication Review, the Impact Factor, and the Number of Citations

              Author summary Subjective assessments of the merit and likely impact of scientific publications are routinely made by scientists during their own research, and as part of promotion, appointment, and government committees. Using two large datasets in which scientists have made qualitative assessments of scientific merit, we show that scientists are poor at judging scientific merit and the likely impact of a paper, and that their judgment is strongly influenced by the journal in which the paper is published. We also demonstrate that the number of citations a paper accumulates is a poor measure of merit and we argue that although it is likely to be poor, the impact factor, of the journal in which a paper is published, may be the best measure of scientific merit currently available. Introduction How should we assess the merit of a scientific publication? Is the judgment of a well-informed scientist better than the impact factor (IF) of the journal the paper is published in, or the number of citations that a paper receives? These are important questions that have a bearing upon both individual careers and university departments. They are also critical to governments. Several countries, including the United Kingdom, Canada, and Australia, attempt to assess the merit of the research being produced by scientists and universities and then allocate funds according to performance. In the United Kingdom, this process was known until recently as the Research Assessment Exercise (RAE) (www.rae.ac.uk); it has now been rebranded the Research Excellence Framework (REF) (www.ref.ac.uk). The RAE was first performed in 1986 and has been repeated six times at roughly 5-yearly intervals. Although, the detailed structure of these exercises has varied, they have all relied, to a large extent, on the subjective assessment of scientific publications by a panel of experts. 
In a recent attempt to investigate how good scientists are at assessing the merit and impact of a scientific paper, Allen et al. [1] asked a panel of experts to rate 716 biomedical papers, which were the outcome of research funded, at least in part, by the Wellcome Trust (WT). They found that the level of agreement between experts was low, but that rater score was moderately correlated to the number of citations the paper had obtained 3 years after publication. However, they also found that the assessor score was more strongly correlated to the IF of the journal in which the paper was published than to the number of citations; it was therefore possible that the correlation between assessor scores, and between assessor scores and the number of citations was a consequence of assessors rating papers in high profile journals more highly, rather than an ability of assessors to judge the intrinsic merit or likely impact of a paper. Subsequently, Wardle [2] has assessed the reliability of post-publication subjective assessments of scientific publications using the Faculty of 1000 (F1000) database. In the F1000 database, a panel of experts is encouraged to select and recommend the most important research papers from biology and medicine to subscribers of the database. Papers in the F1000 database are rated “recommended,” “must read,” or “exceptional.” He showed, amongst ecological papers, that selected papers were cited more often than non-selected papers, and that papers rated must read or exceptional garnered more citations than those rated recommended. However, the differences were small; the average numbers of citations for non-selected, recommended, and must read/exceptional were 21.6, 30.9, and 37.5, respectively. Furthermore, he noted that F1000 faculty had failed to recommend any of the 12 most heavily cited papers from the year 2005. 
Nevertheless there is a good correlation between rates of article citation and subjective assessments of research merit at an institutional level for some subjects, including most sciences [3]. The RAE and similar procedures are time consuming and expensive. The last RAE, conducted in 2008, cost the British government £12 million to perform [4], and universities an additional £47 million to prepare their submissions [5]. This has led to the suggestion that it might be better to measure the merit of science using bibliometric methods, either by rating the merit of a paper by the IF of the journal in which it is published, or directly through the number of citations a paper receives [6]. Here we investigate three methods of assessing the merit of a scientific publication: subjective post-publication peer review, the number of citations a paper accrues, and the IF. We do not attempt to define merit rigorously; it is simply the qualities in a paper that lead a scientist to rate a paper highly; it is likely that this largely depends upon the perceived importance of the paper. We also largely restrict our analysis to the assessment of merit rather than impact; for example, as we show below, the number of citations, which is a measure of impact, is a very poor measure of the underlying merit of the science, because the accumulation of citations is highly stochastic. We have considered the IF, rather than other measures of journal impact, of which there are many (see [7] for list of 39 measures), because it is simple and widely used. 
Results

Datasets

To investigate methods of assessing scientific merit we used two datasets [8] in which the merit of a scientific publication had been subjectively assessed by a panel of experts: (i) 716 papers from the WT dataset mentioned in the introduction, each of which had been scored by two assessors and which had been published in 2005, and (ii) 5,811 papers, also published in 2005, from the F1000 database, 1,328 of which had been assessed by more than one assessor. For each of these papers we collated citation information ∼6 years after publication. We also obtained the IF of the journal in which the paper had been published (further details in the Materials and Methods). The datasets have strengths and weaknesses. The F1000 dataset is considerably larger than the WT dataset, but it consists of papers that the assessors considered good enough to be featured in F1000; the papers therefore probably represent a narrower range of merit than in the WT dataset. Furthermore, the scores of two assessors are not independent in the F1000 dataset because the second assessor might have known the score of the first assessor, and F1000 scores have the potential to affect rates of citation, whereas the WT assessments were independent and confidential. The papers in both datasets are drawn from a diverse set of journals covering a broad range of IFs (Figure 1). Perhaps not surprisingly the F1000 data tend to be drawn from journals with higher IF, because they have been chosen by the assessors for inclusion in the F1000 database (Mean IF: WT = 6.6; F1000 = 13.9).

Figure 1. The distribution of the impact factor in the two datasets. (doi: 10.1371/journal.pbio.1001675.g001)

Subjective Assessment of Merit

If scientists are good at assessing the merit of a scientific publication, and they agree on what merit is, then there should be a good level of agreement between assessors.
Indeed assessors gave the same score in 47% and 50% of cases in the WT and F1000 datasets, respectively (Tables 1 and 2). However, we would have expected them to agree 40% of the time by chance alone in both datasets, so the excess agreement above these expectations is small. The correlations between assessor scores are correspondingly modest (WT r = 0.36, p 20 compared to those with IF 20, respectively. If we remove the influence of IF upon assessor score, the correlations between assessor scores drop below 0.2 (partial correlations between assessor scores controlling for IF: WT, r = 0.15, p 30 in the F1000 dataset.

Number of Citations

An alternative to the subjective assessment of scientific merit is the use of bibliometric measures such as the IF of the journal in which the paper is published or the number of citations the paper receives. The number of citations a paper accumulates is likely to be subject to random fluctuation: two papers of similar merit will not accrue the same number of citations even if they are published in similar journals. We can infer the relative error variance associated with this process as follows. Let us assume that the number of citations within a journal is due to the intrinsic merit of the paper plus some error. The correlation between assessor score and the number of citations is therefore expected to be

$$\mathrm{Corr}(s, c) = \frac{1}{\sqrt{(1 + r_s)(1 + r_c)}}$$

where $r_c = \sigma_c^2 / \sigma_m^2$ and $\sigma_c^2$ is the error variance associated with the accumulation of citations; $r_s = \sigma_s^2 / \sigma_m^2$ is the corresponding ratio for assessor scores, and $\sigma_m^2$ is the variance in merit (see Materials and Methods for derivation). Hence we can estimate the error variance associated with the accumulation of citations relative to variance in merit by simultaneously considering the correlation between assessor scores and the correlation between assessor scores and the number of citations.
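As a rough numerical illustration of this argument, the sketch below inverts the two relations, assuming they take the forms Corr(s1, s2) = 1/(1 + r_s) for two assessor scores and Corr(s, c) = 1/sqrt((1 + r_s)(1 + r_c)) for a score and the citation count. The function names are hypothetical and not taken from the article; the only number used from the text is the WT correlation between assessor scores (r = 0.36).

```python
def error_ratio_scores(corr_ss: float) -> float:
    """Ratio of assessor error variance to merit variance (r_s).

    Assumes Corr(s1, s2) = 1 / (1 + r_s), so r_s = 1 / Corr(s1, s2) - 1.
    """
    if not 0.0 < corr_ss <= 1.0:
        raise ValueError("corr_ss must lie in (0, 1]")
    return 1.0 / corr_ss - 1.0


def error_ratio_citations(corr_ss: float, corr_sc: float) -> float:
    """Ratio of citation error variance to merit variance (r_c).

    Assumes Corr(s, c) = 1 / sqrt((1 + r_s)(1 + r_c)). Substituting
    1 + r_s = 1 / corr_ss gives r_c = corr_ss / corr_sc**2 - 1.
    """
    if not 0.0 < corr_ss <= 1.0 or not 0.0 < corr_sc <= 1.0:
        raise ValueError("correlations must lie in (0, 1]")
    return corr_ss / corr_sc**2 - 1.0


# WT correlation between the two assessor scores, as quoted in the text.
wt_r_s = error_ratio_scores(0.36)
print(round(wt_r_s, 2))  # 1.78
```

Under these assumptions a score correlation of 0.36 already implies that the error variance in a single assessment is larger than the variance in merit. A lower score-citation correlation for the same score correlation gives a larger r_c.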
If we assume that assessors and the number of citations are unaffected by the IF of the journal, then we estimate the error variance associated with citations to be approximately 1.5 times the variance in merit (WT $r_c$ = 1.5 [0.83–2.7]; F1000 $r_c$ = 1.6 [0.86–2.6]). If we instead assume that the correlation between assessor score and IF is entirely due to bias, then we estimate, using the partial correlation between score and citations controlling for IF, that the error variance within journals is more than 5-fold the variance in merit (WT $r_c$ = 5.6 [1.2–42]; F1000 $r_c$ = 9.8 [4.0–31]). These estimates underestimate the error variance because they do not take into account the variance associated with which journal a paper gets published in; the stochasticity associated with this process will generate additional variance in the number of citations a paper accumulates if the journal affects the number of citations a paper receives, as analyses of duplicate papers suggest [9]–[11].

Impact Factor

The IF might potentially be a better measure of merit than either a post-publication assessment or the number of citations, since several individuals are typically involved in a decision to publish, so the error variance associated with their combined assessment should be lower than that associated with the number of citations; although such benefits can be partially undermined by having a single individual determine whether a manuscript should be reviewed or by rejecting manuscripts if one review is unsupportive. Unfortunately, it seems likely that the IF will also be subject to considerable error. If we combine n independent assessments we expect the ratio of the error variance to the variance in merit in their combined qualitative assessment to be reduced by a factor of n.
Hence, if we assume that pre-publication assessments are of similar quality to post-publication assessments, and that three individuals have equal influence over the decision to publish a paper, their combined assessment is still likely to be dominated by error, not merit; e.g., if we average the estimates of $r_s$ from the correlation between scores and between scores controlling for IF we have $\bar{r}_s$ = 3.7 and 3.9 for the WT and F1000 datasets, respectively, which means that the error variance associated with the combined assessor score will be ∼1.2× the variance in merit; i.e., the error variance is still larger than the variance in merit.

Discussion

Our results have some important implications for the assessment of science. We have shown that scientists are poor at estimating the merit of a scientific publication; their assessments are error prone and biased by the journal in which the paper is published. In addition, subjective assessments are expensive and time-consuming. Scientists are also poor at predicting the future impact of a paper, as measured by the number of citations a paper accumulates. This appears to be due to two factors: scientists are not good at assessing merit, and the accumulation of citations is a highly stochastic process, such that two papers of similar merit can accumulate very different numbers of citations just by chance. The IF and the number of citations are also likely to be poor measures of merit, though they may be better measures of impact. The number of citations is a poor measure of merit for two reasons. First, the accumulation of citations is a highly stochastic process, so the number of citations is only poorly correlated to merit.
It has previously been suggested that the error variance associated with the accumulation of citations is small based on the strong correlation between the number of citations in successive years [12], but such an analysis does not take into account the influence that citations have on subsequent levels of citation—the citations in successive years are not independent. Second, as others have shown, the number of citations is strongly affected by the journal in which the paper is published [9]–[11]. There are also additional problems associated with using the number of citations as a measure of merit since it is influenced by factors such as the geographic origin of the authors [13],[14], whether they are English speaking [14],[15], and the gender of the authors [16],[17] (though see [15]). The problems of using the number of citations as a measure of merit are also likely to affect other article level metrics such as downloads and social network activity. The IF is likely to be poor because it is based on subjective assessment, although it does have the benefit of being a pre-publication assessment, and hence not influenced by the journal in which the paper has been published. In fact, given that the scientific community has already made an assessment of a paper's merit in deciding where it should be published, it seems odd to suggest that we could do better with post-publication assessment. Post-publication assessment cannot hope to be better than pre-publication assessment unless more individuals are involved in making the assessment, and even then it seems difficult to avoid the bias in favour of papers published in high-ranking journals that seems to pervade our assessments. However, the correlation between merit and IF is likely to be far from perfect. In fact the available evidence suggests there is little correlation between merit and IF, at least amongst low IF journals. 
The IF depends upon two factors, the merit of the papers being published by the journal and the effect that the journal has on the number of citations for a given level of merit. In the most extensive analysis of its kind, Lariviere and Gingras [11] analysed 4,532 cases in which the same paper had been published in two different journals; on average the two journals differed by 2.4-fold in their IFs and the papers differed 1.9-fold in the number of citations they had accumulated, suggesting that the higher IF journals in their analysis had gained their higher IF largely through positive feedback, not by publishing better papers. However, the mean IF of the journals in this study was less than one, and it seems unlikely that the IF is entirely a function of positive feedback amongst higher IF journals. Nevertheless the tendency for journals to affect the number of citations a paper receives means that IFs are not a quantitative measure of merit; a paper published in a journal with an IF of 30 is not on average six times better than one published in a journal with an IF of 5. The IF has a number of additional benefits over subjective post-publication review and the number of citations as measures of merit. First, it is transparent. Second, it removes the difficult task of determining which papers should be selected for submission to an assessment exercise such as the RAE or REF; is it better to submit a paper in a high IF journal, a paper that has been highly cited, even if it appears in a low IF journal, or a paper that the submitter believes is their best work? Third, it is relatively cheap to implement. And fourth, it is an instantaneous measure of merit. The use of IF as a measure of merit is unpopular with many scientists, a dissatisfaction that has recently found its voice in the San Francisco Declaration on Research Assessment (DORA) (http://am.ascb.org/dora/).
The declaration urges institutions, funding bodies, and governments to avoid using journal level metrics, such as the IF, to assess the merit of scientific papers. Instead it promotes the use of subjective review and article level metrics. However, as we have shown, both subjective post-publication review and the number of citations, an example of an article level metric, are highly error prone measures of merit. Furthermore, the declaration fails to appreciate that journal level metrics are a form of pre-publication subjective review. It has been argued that the IF is a poor measure of merit because the variation in the number of citations accumulated by papers published in the same journal is large [9],[18]; the IF is therefore unrepresentative of the number of citations that individual papers accumulate. However, as we have shown, the accumulation of citations is highly stochastic, so we would expect a large variance in the number of citations even if the IF were a perfect measure of merit. There are, however, many problems with using the IF besides the error associated with the assessment. The IF is influenced by the type of papers that are published and by the way in which the IF is calculated [18],[19]. Furthermore, it clearly needs to be standardized across fields. A possible solution to these problems may be to get leading scientists to rank the journals in their field, and to use these ranks as a measure of merit, rather than the IF. Finally, possibly the biggest problem with the IF is simply our reaction to it; we have a tendency to overrate papers published in high IF journals. So if we are to use the IF, we need to reduce this tendency; one approach might be to rank all papers by their IF and assign scores by rank. The REF will be performed in the United Kingdom next year, in 2014. The assessment of publications forms the largest component of this exercise.
This will be done by subjective post-publication review, with citation information being provided to some panels. However, as we have shown, both subjective review and the number of citations are very error prone measures of merit, so it seems likely that these assessments will also be extremely error prone, particularly given the volume of assessments that need to be made. For example, sub-panel 14 in the 2008 version of the RAE assessed ∼9,000 research outputs, each of which was assessed by two members of a 19 person panel; therefore each panel member assessed an average of just under 1,000 papers within a few months. We have also shown that assessors tend to overrate science in high IF journals, and although the REF [20], like the RAE before it [21], contains a stipulation that the journal of publication should not be taken into account in making an assessment, it is unclear whether this is possible. In our research we have not been able to address another potential problem for a process such as the REF. It seems very likely that assessors will differ in their mean score—some assessors will tend to give higher scores than other assessors. This could potentially affect the overall score for a department, particularly if the department is small and its outputs scored by relatively few assessors. The REF actually represents an unrivalled opportunity to investigate the assessment of scientific research and to assess the quality of the data produced by such an exercise. We would therefore encourage the REF to have all components of every submission assessed by two independent assessors and then investigate how strongly these are correlated and whether some assessors score more generously than others. Only then can we determine how reliable the data are. In summary, we have shown that none of the measures of scientific merit that we have investigated are reliable. 
In particular, subjective peer review is error prone, biased, and expensive; we must therefore question whether using peer review in exercises such as the RAE and the REF is worth the huge amount of resources spent on them. Ultimately the only way to obtain a (largely) unbiased estimate of merit is to have pre-publication assessment, by several independent assessors, of manuscripts devoid of authors' names and addresses. Nevertheless this will be a noisy estimate of merit unless we are prepared to engage many reviewers for each paper.

Materials and Methods

We compiled subjective assessments from two sources. The larger of these datasets was from the F1000 database (www.F1000.com). In the F1000 database a panel of experts selects and recommends papers from biology and medicine to subscribers of the database. Papers in the F1000 database are rated “recommended” (numerical score 6), “must read” (8), or “exceptional” (10). We chose to take all papers that had been published in a single year, 2005; this was judged to be sufficiently recent to reflect current trends and biases in publishing, but sufficiently long ago to allow substantial numbers of citations to have accumulated. We restricted our analysis to those papers that had been assessed within 12 months of publication to minimize the influence that subsequent discussion and citation might have on the assessment. This gave us a dataset of 5,811 papers, with 1,328 papers having been assessed by two or more assessors within 12 months. We chose to consider the 5-year IFs, since they cover a similar time-scale to the period over which we collected citations. However, in our dataset the 2-year and 5-year IFs are very highly correlated (r = 0.99). Citations were obtained from Google Scholar in 2011. We also analysed the WT data collected by Allen et al. [1]. This is a dataset of 716 biomedical papers, which were published in 2005, and assessed within 6 months by two assessors.
Papers were given scores of 4, landmark; 3, major addition to knowledge; 2, useful step forward; and 1, for the record. The scores were sorted such that the higher score was usually allocated to the first assessor; this will affect the correlations by reducing the variance within the first (and second) assessor scores. As a consequence the scores were randomly re-allocated to the first and second assessor. Citations were collated from Google Scholar in 2011. As with the F1000 data we used 5-year IFs from 2010. Data have been deposited with Dryad [8]. Because most journals are poorly represented in each dataset we estimated the within and between journal variance in the number of citations as follows. We rounded the IF to the nearest integer then grouped journals according to the integer value. We then performed ANOVA on those groups for which we had ten or more publications.

The error variance in assessment relative to the variance in merit can be estimated as follows. Let us assume that the score ($s$) given by an assessor is linearly dependent upon the merit ($m$) and some error ($e_s$): $s = m + e_s$. Let the variance in merit be $\sigma_m^2$ and that for the error be $\sigma_s^2$, so the variance in the score is $\sigma_m^2 + \sigma_s^2$. If two assessors score the same paper the covariance between their scores will simply be $\sigma_m^2$ and hence the correlation between scores is

$$\mathrm{Corr}(s_1, s_2) = \frac{\sigma_m^2}{\sigma_m^2 + \sigma_s^2} = \frac{1}{1 + r_s} \qquad (1)$$

where $r_s = \sigma_s^2 / \sigma_m^2$. If we similarly assume that the number of citations a paper accumulates depends linearly on the merit and some error (with variance $\sigma_c^2$) then the covariance between an assessor's score and the number of citations is $\sigma_m^2$ and the correlation is

$$\mathrm{Corr}(s, c) = \frac{\sigma_m^2}{\sqrt{(\sigma_m^2 + \sigma_s^2)(\sigma_m^2 + \sigma_c^2)}} = \frac{1}{\sqrt{(1 + r_s)(1 + r_c)}} \qquad (2)$$

where $r_c = \sigma_c^2 / \sigma_m^2$. It is therefore straightforward to estimate $r_s$ and $r_c$, and to obtain confidence intervals by bootstrapping the data.

Supporting Information

Table S1. The correlations, partial correlations, and standardized regression coefficients between assessor score (AS) and IF and the number of citations (CIT). ***p<0.001. (DOCX)
Table S2. Spearman correlation coefficients between assessor scores, and between assessor scores and the number of citations and the IF. ***p<0.001. (DOCX)

Table S3. The correlations, partial correlations, and standardized regression coefficients between assessor score (AS) and the log of IF and the log of the number of citations (CIT). ***p<0.001. (DOCX)
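The estimation procedure outlined in the Materials and Methods (estimate r_s and r_c from correlations, then bootstrap for confidence intervals) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code or data: the function names, the simulated variances (r_s = 2, r_c = 4), the choice to average the two score-citation correlations, and the use of percentile bootstrap intervals are all assumptions made for the example. It assumes the relations Corr(s1, s2) = 1/(1 + r_s) and Corr(s, c) = 1/sqrt((1 + r_s)(1 + r_c)).

```python
import numpy as np


def estimate_ratios(s1, s2, c):
    """Return (r_s, r_c) estimated from two score vectors and citations."""
    corr_ss = np.corrcoef(s1, s2)[0, 1]
    # Assumption: average the score-citation correlation over both assessors.
    corr_sc = 0.5 * (np.corrcoef(s1, c)[0, 1] + np.corrcoef(s2, c)[0, 1])
    r_s = 1.0 / corr_ss - 1.0
    r_c = corr_ss / corr_sc**2 - 1.0
    return r_s, r_c


def bootstrap_ci(s1, s2, c, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap intervals, resampling papers with replacement."""
    rng = np.random.default_rng(seed)
    n = len(s1)
    draws = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample paper indices
        draws[b] = estimate_ratios(s1[idx], s2[idx], c[idx])
    lower, upper = np.percentile(
        draws, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0
    )
    return lower, upper  # each is an array: [r_s bound, r_c bound]


# Synthetic papers: merit variance 1, score error variance 2, citation
# error variance 4, so the true ratios are r_s = 2 and r_c = 4.
rng = np.random.default_rng(1)
n_papers = 20000
merit = rng.normal(size=n_papers)
score1 = merit + rng.normal(scale=np.sqrt(2.0), size=n_papers)
score2 = merit + rng.normal(scale=np.sqrt(2.0), size=n_papers)
citations = merit + rng.normal(scale=2.0, size=n_papers)

est = estimate_ratios(score1, score2, citations)
lower, upper = bootstrap_ci(score1, score2, citations)
print("r_s estimate:", est[0], "interval:", (lower[0], upper[0]))
print("r_c estimate:", est[1], "interval:", (lower[1], upper[1]))
```

With far fewer papers than in this simulation the interval for r_c becomes wide, because r_c depends on the inverse square of a small correlation; this is in line with the broad intervals reported for the WT dataset.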
                Author and article information

                Journal
                BMJ Open
                BMJ Publishing Group (BMA House, Tavistock Square, London, WC1H 9JR )
                2044-6055
                2016
                27 July 2016
                : 6
                : 7
                : e011630
                Affiliations
                [1 ]Department of Mammary Disease, Guangdong Provincial Hospital of Chinese Medicine, The Second Clinical College of Guangzhou University of Chinese Medicine , Guangzhou, China
                [2 ]Department of Pharmacy, Nanfang Hospital, Southern Medical University, 1038 Guangzhou, China
                [3 ]Department of Pathophysiology, School of Basic Medical Sciences, Southern Medical University , Guangzhou, China
                [4 ]Department of Physiology, School of Basic Medical Sciences, Gannan Medical University , Ganzhou, China
                Author notes
                [Correspondence to ] Dr Ning Tan; gdtanning@ 123456126.com and Peng Cheng He; gdhpc100@ 123456126.com

                Y-hL, S-qW and J-hX contributed equally.

                Article
                bmjopen-2016-011630
                10.1136/bmjopen-2016-011630
                4964173
                27466238
                7a472f24-08e9-48be-8d37-30614b4cf848
                Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

                This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

                History
                : 22 February 2016
                : 12 May 2016
                : 4 July 2016
                Categories
                Renal Medicine
                Research

                Medicine
                bibliometric analysis,acute kidney injury,top 100 cited articles
