      Power failure: why small sample size undermines the reliability of neuroscience


          Abstract

          A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles.

          Most cited references 92

          G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences.

          G*Power (Erdfelder, Faul, & Buchner, 1996) was designed as a general stand-alone power analysis program for statistical tests commonly used in social and behavioral research. G*Power 3 is a major extension of, and improvement over, the previous versions. It runs on widely used computer platforms (i.e., Windows XP, Windows Vista, and Mac OS X 10.4) and covers many different statistical tests of the t, F, and chi2 test families. In addition, it includes power analyses for z tests and some exact tests. G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested. Like its predecessors, G*Power 3 is free.
            Why Most Published Research Findings Are False

            Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof. Modeling the Framework for False Positive Findings Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. “Negative” research is also very useful. “Negative” is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings. It can be proven that most claimed research findings are false As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11]. 
Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of “true relationships” to “no relationships” among those tested in the field. R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R/(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 − β)R/(R − βR + α). A research finding is thus more likely true than false if (1 − β)R > α. Since usually the vast majority of investigators depend on α = 0.05, this means that a research finding is more likely true than false if (1 − β)R > 0.05. 
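The relationship between pre-study odds, power, and significance level can be made concrete with a short calculation. The following is a minimal sketch in Python, assuming only the formula PPV = (1 − β)R/(R − βR + α) quoted above. The function names and the two example scenarios are illustrative choices, not values taken from the article.

```python
def ppv(R: float, power: float, alpha: float = 0.05) -> float:
    """Post-study probability that a statistically significant finding is true.

    R     : pre-study odds (true relationships : no relationships)
    power : 1 - beta, the chance of detecting a true relationship
    alpha : Type I error rate
    """
    beta = 1.0 - power
    return (power * R) / (R - beta * R + alpha)


def more_likely_true_than_false(R: float, power: float, alpha: float = 0.05) -> bool:
    """Shortcut stated in the text: PPV > 0.5 exactly when (1 - beta) * R > alpha."""
    return power * R > alpha


# Hypothetical inputs, chosen for illustration only.
well_powered = ppv(R=1.0, power=0.8)    # even pre-study odds, high power
underpowered = ppv(R=0.05, power=0.2)   # long-shot hypotheses, low power

print(f"well powered: {well_powered:.3f}")
print(f"underpowered: {underpowered:.3f}")
```

Under these assumed inputs the first scenario clears the (1 − β)R > α threshold and the second does not, which is the contrast the paragraph describes.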
What is less well appreciated is that bias and the extent of repeated independent testing by different teams of investigators around the globe may further distort this picture and may lead to even smaller probabilities of the research findings being indeed true. We will try to model these two factors in the context of similar 2 × 2 tables. Bias First, let us define bias as the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Let u be the proportion of probed analyses that would not have been “research findings,” but nevertheless end up presented and reported as such, because of bias. Bias should not be confused with chance variability that causes some findings to be false by chance even though the study design, data, analysis, and presentation are perfect. Bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias. We may assume that u does not depend on whether a true relationship exists or not. This is not an unreasonable assumption, since typically it is impossible to know which relationships are indeed true. In the presence of bias (Table 2), one gets PPV = ([1 - β]R + uβR)/(R + α − βR + u − uα + uβR), and PPV decreases with increasing u, unless 1 − β ≤ α, i.e., 1 − β ≤ 0.05 for most situations. Thus, with increasing bias, the chances that a research finding is true diminish considerably. This is shown for different levels of power and for different pre-study odds in Figure 1. Conversely, true research findings may occasionally be annulled because of reverse bias. For example, with large measurement errors relationships are lost in noise [12], or investigators use data inefficiently or fail to notice statistically significant relationships, or there may be conflicts of interest that tend to “bury” significant findings [13]. 
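The effect of bias on the post-study probability can be explored numerically. This is a hedged sketch that implements the bias-adjusted expression quoted above, PPV = ([1 − β]R + uβR)/(R + α − βR + u − uα + uβR). The function name `ppv_with_bias` and the scenario (R = 1, power 0.8) are assumptions made for illustration.

```python
def ppv_with_bias(R: float, power: float, u: float, alpha: float = 0.05) -> float:
    """PPV when a proportion u of would-be null analyses are reported as findings."""
    beta = 1.0 - power
    numerator = power * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator


# Hypothetical scenario: even pre-study odds, 80% power, increasing bias.
bias_levels = [0.0, 0.1, 0.3, 0.5]
ppv_by_bias = [ppv_with_bias(R=1.0, power=0.8, u=u) for u in bias_levels]

for u, value in zip(bias_levels, ppv_by_bias):
    print(f"u = {u:.1f}  ->  PPV = {value:.3f}")
```

With u = 0 the expression reduces to the bias-free formula, and for power above α the values fall as u grows, in line with the statement that PPV decreases with increasing bias.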
There is no good large-scale empirical evidence on how frequently such reverse bias may occur across diverse research fields. However, it is probably fair to say that reverse bias is not as common. Moreover measurement errors and inefficient use of data are probably becoming less frequent problems, since measurement error has decreased with technological advances in the molecular era and investigators are becoming increasingly sophisticated about their data. Regardless, reverse bias may be modeled in the same way as bias above. Also reverse bias should not be confused with chance variability that may lead to missing a true relationship because of chance. Testing by Several Independent Teams Several independent teams may be addressing the same sets of research questions. As research efforts are globalized, it is practically the rule that several research teams, often dozens of them, may probe the same or similar questions. Unfortunately, in some areas, the prevailing mentality until now has been to focus on isolated discoveries by single teams and interpret research experiments in isolation. An increasing number of questions have at least one study claiming a research finding, and this receives unilateral attention. The probability that at least one study, among several done on the same question, claims a statistically significant research finding is easy to estimate. For n independent studies of equal power, the 2 × 2 table is shown in Table 3: PPV = R(1 − β^n)/(R + 1 − [1 − α]^n − Rβ^n) (not considering bias). With increasing number of independent studies, PPV tends to decrease, unless 1 − β < α, i.e., typically 1 − β < 0.05. This is shown for different levels of power and for different pre-study odds in Figure 2. For n studies of different power, the term β^n is replaced by the product of the terms β_i for i = 1 to n, but inferences are similar. Corollaries A practical example is shown in Box 1. 
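The multi-team expression can be sketched the same way. The code below assumes the equal-power formula quoted above, PPV = R(1 − β^n)/(R + 1 − [1 − α]^n − Rβ^n), and ignores bias. The function name `ppv_multiple_teams` and the scenario (R = 1, power 0.8) are hypothetical.

```python
def ppv_multiple_teams(R: float, power: float, n: int, alpha: float = 0.05) -> float:
    """PPV of a claim that at least one of n independent, equally powered studies is significant."""
    beta = 1.0 - power
    numerator = R * (1.0 - beta ** n)
    denominator = R + 1.0 - (1.0 - alpha) ** n - R * beta ** n
    return numerator / denominator


# Hypothetical scenario: even pre-study odds, 80% power, a growing number of teams.
team_counts = [1, 2, 5, 10]
ppv_by_teams = [ppv_multiple_teams(R=1.0, power=0.8, n=n) for n in team_counts]

for n, value in zip(team_counts, ppv_by_teams):
    print(f"n = {n:2d}  ->  PPV = {value:.3f}")
```

For n = 1 the expression collapses to the single-study formula. In this assumed scenario, where power exceeds α, the values fall as n increases.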
Based on the above considerations, one may deduce several interesting corollaries about the probability that a research finding is indeed true. Box 1. An Example: Science at Low Pre-Study Odds Let us assume that a team of investigators performs a whole genome association study to test whether any of 100,000 gene polymorphisms are associated with susceptibility to schizophrenia. Based on what we know about the extent of heritability of the disease, it is reasonable to expect that probably around ten gene polymorphisms among those tested would be truly associated with schizophrenia, with relatively similar odds ratios around 1.3 for the ten or so polymorphisms and with a fairly similar power to identify any of them. Then R = 10/100,000 = 10^−4, and the pre-study probability for any polymorphism to be associated with schizophrenia is also R/(R + 1) = 10^−4. Let us also suppose that the study has 60% power to find an association with an odds ratio of 1.3 at α = 0.05. Then it can be estimated that if a statistically significant association is found with the p-value barely crossing the 0.05 threshold, the post-study probability that this is true increases about 12-fold compared with the pre-study probability, but it is still only 12 × 10^−4. Now let us suppose that the investigators manipulate their design, analyses, and reporting so as to make more relationships cross the p = 0.05 threshold even though this would not have been crossed with a perfectly adhered to design and analysis and with perfect comprehensive reporting of the results, strictly according to the original study plan. Such manipulation could be done, for example, with serendipitous inclusion or exclusion of certain patients or controls, post hoc subgroup analyses, investigation of genetic contrasts that were not originally specified, changes in the disease or control definitions, and various combinations of selective or distorted reporting of the results. 
Commercially available “data mining” packages actually are proud of their ability to yield statistically significant results through data dredging. In the presence of bias with u = 0.10, the post-study probability that a research finding is true is only 4.4 × 10^−4. Furthermore, even in the absence of any bias, when ten independent research teams perform similar experiments around the world, if one of them finds a formally statistically significant association, the probability that the research finding is true is only 1.5 × 10^−4, hardly any higher than the probability we had before any of this extensive research was undertaken! Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. Small sample size means smaller power and, for all functions above, the PPV for a true research finding decreases as power decreases towards 1 − β = 0.05. Thus, other factors being equal, research findings are more likely true in scientific fields that undertake large studies, such as randomized controlled trials in cardiology (several thousand subjects randomized) [14] than in scientific fields with small studies, such as most research of molecular predictors (sample sizes 100-fold smaller) [15]. Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. Power is also related to the effect size. Thus research findings are more likely true in scientific fields with large effects, such as the impact of smoking on cancer or cardiovascular disease (relative risks 3–20), than in scientific fields where postulated effects are small, such as genetic risk factors for multigenetic diseases (relative risks 1.1–1.5) [7]. Modern epidemiology is increasingly obliged to target smaller effect sizes [16]. Consequently, the proportion of true research findings is expected to decrease. 
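The single-study and biased figures in Box 1 can be checked with a few lines of arithmetic. This sketch plugs the Box 1 inputs (R = 10/100,000, 60% power, α = 0.05, u = 0.10) into the bias-adjusted PPV expression given earlier in the text. The function name is a hypothetical helper, and the ten-team figure is not recomputed here.

```python
def ppv_with_bias(R: float, power: float, u: float = 0.0, alpha: float = 0.05) -> float:
    """Bias-adjusted PPV; with u = 0 this is the plain single-study PPV."""
    beta = 1.0 - power
    numerator = power * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator


# Inputs taken from Box 1.
R = 10 / 100_000          # about ten true polymorphisms among 100,000 tested
power = 0.60              # 60% power at alpha = 0.05

pre_study = R / (R + 1)                       # pre-study probability
no_bias = ppv_with_bias(R, power, u=0.0)      # single study, no bias
biased = ppv_with_bias(R, power, u=0.10)      # single study, bias u = 0.10
fold_increase = no_bias / pre_study

print(f"pre-study probability : {pre_study:.2e}")
print(f"post-study, no bias   : {no_bias:.2e}  ({fold_increase:.1f}-fold)")
print(f"post-study, u = 0.10  : {biased:.2e}")
```

The results come out at roughly 12 × 10^−4 without bias (about a 12-fold increase over the pre-study probability) and roughly 4.4 × 10^−4 with u = 0.10, matching the values quoted in Box 1.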
In the same line of thinking, if the true effect sizes are very small in a scientific field, this field is likely to be plagued by almost ubiquitous false positive claims. For example, if the majority of true genetic or nutritional determinants of complex diseases confer relative risks less than 1.05, genetic or nutritional epidemiology would be largely utopian endeavors. Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. As shown above, the post-study probability that a finding is true (PPV) depends a lot on the pre-study odds (R). Thus, research findings are more likely true in confirmatory designs, such as large phase III randomized controlled trials, or meta-analyses thereof, than in hypothesis-generating experiments. Fields considered highly informative and creative given the wealth of the assembled and tested information, such as microarrays and other high-throughput discovery-oriented research [4,8,17], should have extremely low PPV. Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. Flexibility increases the potential for transforming what would be “negative” results into “positive” results, i.e., bias, u. For several research designs, e.g., randomized controlled trials [18–20] or meta-analyses [21,22], there have been efforts to standardize their conduct and reporting. Adherence to common standards is likely to increase the proportion of true findings. The same applies to outcomes. True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes) [23]. 
Similarly, fields that use commonly agreed, stereotyped analytical methods (e.g., Kaplan-Meier plots and the log-rank test) [24] may yield a larger proportion of true findings than fields where analytical methods are still under experimentation (e.g., artificial intelligence methods) and only “best” results are reported. Regardless, even in the most stringent research designs, bias seems to be a major problem. For example, there is strong evidence that selective outcome reporting, with manipulation of the outcomes and analyses reported, is a common problem even for randomized trials [25]. Simply abolishing selective publication would not make this problem go away. Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias, u. Conflicts of interest are very common in biomedical research [26], and typically they are inadequately and sparsely reported [26,27]. Prejudice may not necessarily have financial roots. Scientists in a given field may be prejudiced purely because of their belief in a scientific theory or commitment to their own findings. Many otherwise seemingly independent, university-based studies may be conducted for no other reason than to give physicians and researchers qualifications for promotion or tenure. Such nonfinancial conflicts may also lead to distorted reported results and interpretations. Prestigious investigators may suppress via the peer review process the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. Empirical evidence on expert opinion shows that it is extremely unreliable [28]. Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. 
This seemingly paradoxical corollary follows because, as stated above, the PPV of isolated findings decreases when many teams of investigators are involved in the same field. This may explain why we occasionally see major excitement followed rapidly by severe disappointments in fields that draw wide attention. With many teams working on the same field and with massive experimental data being produced, timing is of the essence in beating competition. Thus, each team may prioritize on pursuing and disseminating its most impressive “positive” results. “Negative” results may become attractive for dissemination only if some other team has found a “positive” association on the same question. In that case, it may be attractive to refute a claim made in some prestigious journal. The term Proteus phenomenon has been coined to describe this phenomenon of rapidly alternating extreme research claims and extremely opposite refutations [29]. Empirical evidence suggests that this sequence of extreme opposites is very common in molecular genetics [29]. These corollaries consider each factor separately, but these factors often influence each other. For example, investigators working in fields where true effect sizes are perceived to be small may be more likely to perform large studies than investigators working in fields where true effect sizes are perceived to be large. Or prejudice may prevail in a hot scientific field, further undermining the predictive value of its research findings. Highly prejudiced stakeholders may even create a barrier that aborts efforts at obtaining and disseminating opposing results. Conversely, the fact that a field is hot or has strong invested interests may sometimes promote larger studies and improved standards of research, enhancing the predictive value of its research findings. 
Or massive discovery-oriented testing may result in such a large yield of significant relationships that investigators have enough to report and search further and thus refrain from data dredging and manipulation. Most Research Findings Are False for Most Research Designs and for Most Fields In the described framework, a PPV exceeding 50% is quite difficult to get. Table 4 provides the results of simulations using the formulas developed for the influence of power, ratio of true to non-true relationships, and bias, for various types of situations that may be characteristic of specific study designs and settings. A finding from a well-conducted, adequately powered randomized controlled trial starting with a 50% pre-study chance that the intervention is effective is eventually true about 85% of the time. A fairly similar performance is expected of a confirmatory meta-analysis of good-quality randomized trials: potential bias probably increases, but power and pre-test chances are higher compared to a single randomized trial. Conversely, a meta-analytic finding from inconclusive studies where pooling is used to “correct” the low power of single studies, is probably false if R ≤ 1:3. Research findings from underpowered, early-phase clinical trials would be true about one in four times, or even less frequently if bias is present. Epidemiological studies of an exploratory nature perform even worse, especially when underpowered, but even well-powered epidemiological studies may have only a one in five chance being true, if R = 1:10. Finally, in discovery-oriented research with massive testing, where tested relationships exceed true ones 1,000-fold (e.g., 30,000 genes tested, of which 30 may be the true culprits) [30,31], PPV for each claimed relationship is extremely low, even with considerable standardization of laboratory and statistical methods, outcomes, and reporting thereof to minimize bias. 
Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias As shown, the majority of modern biomedical research is operating in areas with very low pre- and post-study probability for true findings. Let us suppose that in a research field there are no true findings at all to be discovered. History of science teaches us that scientific endeavor has often in the past wasted effort in fields with absolutely no yield of true scientific information, at least based on our current understanding. In such a “null field,” one would ideally expect all observed effect sizes to vary by chance around the null in the absence of bias. The extent that observed findings deviate from what is expected by chance alone would be simply a pure measure of the prevailing bias. For example, let us suppose that no nutrients or dietary patterns are actually important determinants for the risk of developing a specific tumor. Let us also suppose that the scientific literature has examined 60 nutrients and claims all of them to be related to the risk of developing this tumor with relative risks in the range of 1.2 to 1.4 for the comparison of the upper to lower intake tertiles. Then the claimed effect sizes are simply measuring nothing else but the net bias that has been involved in the generation of this scientific literature. Claimed effect sizes are in fact the most accurate estimates of the net bias. It even follows that between “null fields,” the fields that claim stronger effects (often with accompanying claims of medical or public health importance) are simply those that have sustained the worst biases. For fields with very low PPV, the few true relationships would not distort this overall picture much. Even if a few relationships are true, the shape of the distribution of the observed effects would still yield a clear measure of the biases involved in the field. This concept totally reverses the way we view scientific results. 
Traditionally, investigators have viewed large and highly significant effects with excitement, as signs of important discoveries. Too large and too highly significant effects may actually be more likely to be signs of large bias in most fields of modern research. They should lead investigators to careful critical thinking about what might have gone wrong with their data, analyses, and results. Of course, investigators working in any field are likely to resist accepting that the whole field in which they have spent their careers is a “null field.” However, other lines of evidence, or advances in technology and experimentation, may lead eventually to the dismantling of a scientific field. Obtaining measures of the net bias in one field may also be useful for obtaining insight into what might be the range of bias operating in other fields where similar analytical methods, technologies, and conflicts may be operating. How Can We Improve the Situation? Is it unavoidable that most research findings are false, or can we improve the situation? A major problem is that it is impossible to know with 100% certainty what the truth is in any research question. In this regard, the pure “gold” standard is unattainable. However, there are several approaches to improve the post-study probability. Better powered evidence, e.g., large studies or low-bias meta-analyses, may help, as it comes closer to the unknown “gold” standard. However, large studies may still have biases and these should be acknowledged and avoided. Moreover, large-scale evidence is impossible to obtain for all of the millions and trillions of research questions posed in current research. Large-scale evidence should be targeted for research questions where the pre-study probability is already considerably high, so that a significant research finding will lead to a post-test probability that would be considered quite definitive. 
Large-scale evidence is also particularly indicated when it can test major concepts rather than narrow, specific questions. A negative finding can then refute not only a specific proposed claim, but a whole field or considerable portion thereof. Selecting the performance of large-scale studies based on narrow-minded criteria, such as the marketing promotion of a specific drug, is largely wasted research. Moreover, one should be cautious that extremely large studies may be more likely to find a formally statistical significant difference for a trivial effect that is not really meaningfully different from the null [32–34]. Second, most research questions are addressed by many teams, and it is misleading to emphasize the statistically significant findings of any single team. What matters is the totality of the evidence. Diminishing bias through enhanced research standards and curtailing of prejudices may also help. However, this may require a change in scientific mentality that might be difficult to achieve. In some research designs, efforts may also be more successful with upfront registration of studies, e.g., randomized trials [35]. Registration would pose a challenge for hypothesis-generating research. Some kind of registration or networking of data collections or investigators within fields may be more feasible than registration of each and every hypothesis-generating experiment. Regardless, even if we do not see a great deal of progress with registration of studies in other fields, the principles of developing and adhering to a protocol could be more widely borrowed from randomized controlled trials. Finally, instead of chasing statistical significance, we should improve our understanding of the range of R values—the pre-study odds—where research efforts operate [10]. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. 
Speculated high R values may sometimes then be ascertained. As described above, whenever ethically acceptable, large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established “classics” will fail the test [36]. Nevertheless, most new discoveries will continue to stem from hypothesis-generating research with low or very low pre-study odds. We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large. Despite a large statistical literature for multiple testing corrections [37], usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding. Even if determining this were feasible, this would not inform us about the pre-study odds. Thus, it is unavoidable that one should make approximate assumptions on how many relationships are expected to be true among those probed across the relevant research fields and research designs. The wider field may yield some guidance for estimating this probability for the isolated research project. Experiences from biases detected in other neighboring fields would also be useful to draw upon. Even though these assumptions would be considerably subjective, they would still be very useful in interpreting research claims and putting them in context.

              Improving Bioscience Research Reporting: The ARRIVE Guidelines for Reporting Animal Research

              In the last decade the number of bioscience journals has increased enormously, with many filling specialised niches reflecting new disciplines and technologies. The emergence of open-access journals has revolutionised the publication process, maximising the availability of research data. Nevertheless, a wealth of evidence shows that across many areas, the reporting of biomedical research is often inadequate, leading to the view that even if the science is sound, in many cases the publications themselves are not “fit for purpose,” meaning that incomplete reporting of relevant information effectively renders many publications of limited value as instruments to inform policy or clinical and scientific practice [1]–[21]. A recent review of clinical research showed that there is considerable cumulative waste of financial resources at all stages of the research process, including as a result of publications that are unusable due to poor reporting [22]. It is unlikely that this issue is confined to clinical research [2]–[14],[16]–[20]. Failure to describe research methods and to report results appropriately therefore has potential scientific, ethical, and economic implications for the entire research process and the reputation of those involved in it. This is particularly true for animal research, one of the most controversial areas of science. The largest and most comprehensive review of published animal research undertaken to date, to our knowledge, has highlighted serious omissions in the way research using animals is reported [5]. The survey, commissioned by the National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs), a UK Government-sponsored scientific organisation, found that only 59% of the 271 randomly chosen articles assessed stated the hypothesis or objective of the study, and the number and characteristics of the animals used (i.e., species/strain, sex, and age/weight). 
Most of the papers surveyed did not report using randomisation (87%) or blinding (86%) to reduce bias in animal selection and outcome assessment. Only 70% of the publications that used statistical methods fully described them and presented the results with a measure of precision or variability [5]. These findings are a cause for concern and are consistent with reviews of many research areas, including clinical studies, published in recent years [2]–[22]. Good Reporting Is Essential for Peer Review and to Inform Future Research Scrutiny by scientific peers has long been the mainstay of “quality control” for the publication process. The way that experiments are reported, in terms of the level of detail of methods and the presentation of key results, is crucial to the peer review process and, indeed, the subsequent utility and validity of the knowledge base that is used to inform future research. The onus is therefore on the research community to ensure that their research articles include all relevant information to allow in-depth critique, and to avoiding duplicating studies and performing redundant experiments. Ideally scientific publications should present sufficient information to allow a knowledgeable reader to understand what was done, why, and how, and to assess the biological relevance of the study and the reliability and validity of the findings. There should also be enough information to allow the experiment to be repeated [23]. The problem therefore is how to ensure that all relevant information is included in research publications. Using Reporting Guidelines Measurably Improves the Quality of Reporting Evidence provided by reviews of published research suggests that many researchers and peer reviewers would benefit from guidance about what information should be provided in a research article. The CONSORT Statement for randomised controlled clinical trials was one of the first guidelines developed in response to this need [24],[25]. 
Since publication, an increasing number of leading journals have supported CONSORT as part of their instructions to authors [26],[27]. As a result, convincing evidence is emerging that CONSORT improves the quality and transparency of reports of clinical trials [28],[29]. Following CONSORT, many other guidelines have been developed: there are currently more than 90 available for reporting different types of health research, most of which have been published in the last ten years (see http://www.equator-network.org and references [30],[31]). Guidelines have also been developed to improve the reporting of other specific bioscience research areas including metabolomics and gene expression studies [32]–[37]. Several organisations support the case for improved reporting and recommend the use of reporting guidelines, including the International Committee of Medical Journal Editors, the Council of Science Editors, the Committee on Publication Ethics, and the Nuffield Council on Bioethics [38]–[41].

Improving the Reporting of Animal Experiments: The ARRIVE Guidelines

Most bioscience journals currently provide little or no guidance on what information to report when describing animal research [42]–[50]. Our review found that 4% of the 271 journal articles assessed did not report the number of animals used anywhere in the methods or the results sections [5]. Reporting animal numbers is essential so that the biological and statistical significance of the experimental results can be assessed or the data reanalysed, and is also necessary if the experimental methods are to be repeated. Improved reporting of these and other details will maximise the availability and utility of the information gained from every animal and every experiment, preventing unnecessary animal use in the future. To address this, we led an initiative to produce guidelines for reporting animal research.
The guidelines, referred to as ARRIVE (Animals in Research: Reporting In Vivo Experiments), have been developed using the CONSORT Statement as their foundation [24],[25]. The ARRIVE guidelines consist of a checklist of 20 items describing the minimum information that all scientific publications reporting research using animals should include, such as the number and specific characteristics of animals used (including species, strain, sex, and genetic background); details of housing and husbandry; and the experimental, statistical, and analytical methods (including details of methods used to reduce bias such as randomisation and blinding). All the items in the checklist have been included to promote high-quality, comprehensive reporting to allow an accurate critical review of what was done and what was found.

Consensus and consultation are the cornerstones of the guideline development process [51]. To maximise their utility, the ARRIVE guidelines have been prepared in consultation with scientists, statisticians, journal editors, and research funders. We convened an expert working group, comprising researchers and statisticians from a range of disciplines, and journal editors from Nature Cell Biology, Science, Laboratory Animals, and the British Journal of Pharmacology (see Acknowledgments). At a one-day meeting in June 2009, the working group agreed the scope and broad content of a draft set of guidelines. The draft was then used as the basis for a wider consultation with the scientific community, involving researchers, grant holders, and representatives of the major bioscience funding bodies including the Medical Research Council, Wellcome Trust, Biotechnology and Biological Sciences Research Council, and The Royal Society (see Table 1). Feedback on the content and wording of the items was incorporated into the final version of the checklist. Further feedback on the content and utility of the guidelines is encouraged and sought.
Table 1. Funding bodies consulted (doi:10.1371/journal.pbio.1000412.t001).

Name of Bioscience Research Funding Body
  Medical Research Council
  Biotechnology and Biological Sciences Research Council
  Wellcome Trust
  The Royal Society
  Association of Medical Research Charities
  British Heart Foundation
  Parkinson's Disease Society

The ARRIVE guidelines (see Table 2) can be applied to any area of bioscience research using laboratory animals, and the inherent principles apply not only to reporting comparative experiments but also to other study designs. Laboratory animal refers to any species of animal undergoing an experimental procedure in a research laboratory or formal test setting. The guidelines are not intended to be mandatory or absolutely prescriptive, nor to standardise or formalise the structure of reporting. Rather they provide a checklist that can be used to guide authors preparing manuscripts for publication, and by those involved in peer review for quality assurance, to ensure completeness and transparency.

Table 2. Animal Research: Reporting In Vivo experiments: The ARRIVE guidelines (doi:10.1371/journal.pbio.1000412.t002).

ITEM and RECOMMENDATION

TITLE
  1. Provide as accurate and concise a description of the content of the article as possible.

ABSTRACT
  2. Provide an accurate summary of the background, research objectives (including details of the species or strain of animal used), key methods, principal findings, and conclusions of the study.

INTRODUCTION
  3. Background
     a. Include sufficient scientific background (including relevant references to previous work) to understand the motivation and context for the study, and explain the experimental approach and rationale.
     b. Explain how and why the animal species and model being used can address the scientific objectives and, where appropriate, the study's relevance to human biology.
  4. Objectives
     Clearly describe the primary and any secondary objectives of the study, or specific hypotheses being tested.

METHODS
  5. Ethical statement
     Indicate the nature of the ethical review permissions, relevant licences (e.g. Animal [Scientific Procedures] Act 1986), and national or institutional guidelines for the care and use of animals, that cover the research.
  6. Study design
     For each experiment, give brief details of the study design, including:
     a. The number of experimental and control groups.
     b. Any steps taken to minimise the effects of subjective bias when allocating animals to treatment (e.g., randomisation procedure) and when assessing results (e.g., if done, describe who was blinded and when).
     c. The experimental unit (e.g. a single animal, group, or cage of animals).
     A time-line diagram or flow chart can be useful to illustrate how complex study designs were carried out.
  7. Experimental procedures
     For each experiment and each experimental group, including controls, provide precise details of all procedures carried out. For example:
     a. How (e.g., drug formulation and dose, site and route of administration, anaesthesia and analgesia used [including monitoring], surgical procedure, method of euthanasia). Provide details of any specialist equipment used, including supplier(s).
     b. When (e.g., time of day).
     c. Where (e.g., home cage, laboratory, water maze).
     d. Why (e.g., rationale for choice of specific anaesthetic, route of administration, drug dose used).
  8. Experimental animals
     a. Provide details of the animals used, including species, strain, sex, developmental stage (e.g., mean or median age plus age range), and weight (e.g., mean or median weight plus weight range).
     b. Provide further relevant information such as the source of animals, international strain nomenclature, genetic modification status (e.g. knock-out or transgenic), genotype, health/immune status, drug- or test-naïve, previous procedures, etc.
  9. Housing and husbandry
     Provide details of:
     a. Housing (e.g., type of facility, e.g., specific pathogen free (SPF); type of cage or housing; bedding material; number of cage companions; tank shape and material etc. for fish).
     b. Husbandry conditions (e.g., breeding programme, light/dark cycle, temperature, quality of water etc. for fish, type of food, access to food and water, environmental enrichment).
     c. Welfare-related assessments and interventions that were carried out before, during, or after the experiment.
  10. Sample size
     a. Specify the total number of animals used in each experiment and the number of animals in each experimental group.
     b. Explain how the number of animals was decided. Provide details of any sample size calculation used.
     c. Indicate the number of independent replications of each experiment, if relevant.
  11. Allocating animals to experimental groups
     a. Give full details of how animals were allocated to experimental groups, including randomisation or matching if done.
     b. Describe the order in which the animals in the different experimental groups were treated and assessed.
  12. Experimental outcomes
     Clearly define the primary and secondary experimental outcomes assessed (e.g., cell death, molecular markers, behavioural changes).
  13. Statistical methods
     a. Provide details of the statistical methods used for each analysis.
     b. Specify the unit of analysis for each dataset (e.g. single animal, group of animals, single neuron).
     c. Describe any methods used to assess whether the data met the assumptions of the statistical approach.

RESULTS
  14. Baseline data
     For each experimental group, report relevant characteristics and health status of animals (e.g., weight, microbiological status, and drug- or test-naïve) before treatment or testing (this information can often be tabulated).
  15. Numbers analysed
     a. Report the number of animals in each group included in each analysis. Report absolute numbers (e.g. 10/20, not 50%) (note a).
     b. If any animals or data were not included in the analysis, explain why.
  16. Outcomes and estimation
     Report the results for each analysis carried out, with a measure of precision (e.g., standard error or confidence interval).
  17. Adverse events
     a. Give details of all important adverse events in each experimental group.
     b. Describe any modifications to the experimental protocols made to reduce adverse events.

DISCUSSION
  18. Interpretation/scientific implications
     a. Interpret the results, taking into account the study objectives and hypotheses, current theory, and other relevant studies in the literature.
     b. Comment on the study limitations including any potential sources of bias, any limitations of the animal model, and the imprecision associated with the results (note a).
     c. Describe any implications of your experimental methods or findings for the replacement, refinement, or reduction (the 3Rs) of the use of animals in research.
  19. Generalisability/translation
     Comment on whether, and how, the findings of this study are likely to translate to other species or systems, including any relevance to human biology.
  20. Funding
     List all funding sources (including grant number) and the role of the funder(s) in the study.

Note a: Schulz, et al. (2010) [24].

Improved Reporting Will Maximise the Output of Published Research

These guidelines were developed to maximise the output from research using animals by optimising the information that is provided in publications on the design, conduct, and analysis of the experiments. The need for such guidelines is further illustrated by the systematic reviews of animal research that have been carried out to assess the efficacy of various drugs and interventions in animal models [8],[9],[13],[52]–[55]. Well-designed and -reported animal studies are the essential building blocks from which such a systematic review is constructed.
The reviews have found that, in many cases, reporting omissions, in addition to the limitations of the animal models used in the individual studies assessed in the review, are a barrier to reaching any useful conclusion about the efficacy of the drugs and interventions being compared [2],[3]. Driving improvements in reporting research using animals will require the collective efforts of authors, journal editors, peer reviewers, and funding bodies. There is no single simple or rapid solution, but the ARRIVE guidelines provide a practical resource to aid these improvements. The guidelines will be published in several leading bioscience research journals simultaneously [56]–[60], and publishers have already endorsed the guidelines by including them in their journal Instructions to Authors subsequent to publication. The NC3Rs will continue to work with journal editors to extend the range of journals adopting the guidelines, and with the scientific community to disseminate the guidelines as widely as possible (http://www.nc3rs.org.uk/ARRIVE).
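
As a purely illustrative sketch, and not part of the ARRIVE guidelines themselves, the Python fragment below shows one way an author might support three of the checklist items in practice: randomised allocation that can be described in full (item 11a), absolute numbers reported alongside percentages (item 15a), and results reported with a measure of precision (item 16). The function names, the seed, the group labels, and the body weights are all hypothetical and chosen only for the example.

```python
import random
import statistics
from math import sqrt


def allocate(animal_ids, groups, seed):
    """Randomly allocate animals to groups of near-equal size (cf. item 11a)."""
    rng = random.Random(seed)  # recording the seed lets the allocation be reproduced
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled animals round-robin so group sizes differ by at most one.
    return {g: sorted(shuffled[i::len(groups)]) for i, g in enumerate(groups)}


def summarise(values):
    """Return n, the mean, and the standard error of the mean (cf. item 16)."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / sqrt(n)  # sample SD uses the n - 1 denominator
    return n, mean, sem


def as_fraction(events, total):
    """Format absolute numbers with the percentage alongside (cf. item 15a)."""
    return f"{events}/{total} ({100 * events / total:.0f}%)"


# Hypothetical example: 20 animals, two groups, made-up weights in grams.
allocation = allocate(range(1, 21), ["control", "treated"], seed=2010)
for group, ids in allocation.items():
    print(group, ids)

n, mean, sem = summarise([24.1, 25.3, 23.8, 26.0, 24.7])
print(f"n = {n}, mean = {mean:.2f} g, SEM = {sem:.2f} g")
print(as_fraction(10, 20))  # prints "10/20 (50%)"
```

Fixing and reporting the seed is only one possible way of documenting a randomisation procedure. The same information could equally be captured in a written allocation log, provided the method is described in enough detail to be repeated.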

                Author and article information

                Journal
                Nature Reviews Neuroscience
                Nat Rev Neurosci
                Springer Science and Business Media LLC
                1471-003X
                1471-0048
                May 2013
                April 10 2013
                May 2013
                Volume: 14
                Issue: 5
                Pages: 365-376
                10.1038/nrn3475
                23571845
                © 2013

                http://www.springer.com/tdm
