      Descriptive vs. inferential cheating


          Given the recent and highly publicized scandals involving psychology researchers who cheated, the proliferation of articles on related topics is unsurprising. As an example, Simmons et al. (2011) pointed out subtle ways in which researchers can increase their false positive rate above the nominal level of p < 0.05. From my perspective, a major limitation of the literature on cheating has been a failure to distinguish between two kinds of cheating (bias might be a kinder word), which I term descriptive and inferential cheating. I intend to demonstrate that inferential cheating is not as destructive as descriptive cheating.

So what are descriptive and inferential cheating? Descriptive cheating involves the false reporting of descriptive data, such as sample means, proportions, standard deviations, and so on. The harm of descriptive cheating is obvious, has been demonstrated by previous scandals, and needs no further elaboration here. In contrast, when a researcher cheats inferentially, the descriptive data are true but the reported p-values (and associated t-tests, F-tests, and so on) are not. My conclusion that inferential cheating causes only limited harm is based on demonstrations that the null hypothesis significance testing procedure (NHSTP) is invalid. Put simply, although providing false information that matters a lot, such as wrong descriptive statistics, can do much harm, providing false information that matters hardly at all, such as false p-values, does not do much harm.

So what is wrong with the NHSTP? The basic idea is that if we are to reject the null hypothesis, it should be shown to have a low probability of being true, given the finding. But a p-value does not provide this; rather, a p-value only shows that a finding is rare given the null hypothesis (Nickerson, 2000).
As Kass and Raftery (1995) pointed out, knowing that a finding is rare given a hypothesis is not useful unless one knows how rare the finding is given a competing hypothesis. Also, Trafimow (2003) demonstrated that (1) the null hypothesis can have a very high probability (including a probability of 1) of being true even when p < 0.05, (2) p-values generally are inaccurate estimators of probabilities of null hypotheses, and (3) the conditions needed to make p-values valid indicators of probabilities of null hypotheses preclude the researcher from gaining much information from the NHSTP. Furthermore, Trafimow and Rice (2009) demonstrated that the correlation between p-values and probabilities of null hypotheses is low to begin with, and decreases to triviality when dichotomous “accept” or “reject” decisions are made based on cutoff numbers such as 0.05 or 0.01.

The famous theorem by Bayes provides examples whereby the null hypothesis will be rejected even when it has a strong likelihood of being true. Suppose that the prior probability of the null hypothesis is 0.95, the probability of the finding given the null hypothesis is the traditional value of 0.05 (so the null hypothesis is rejected), and the prior probability of the finding given that the null hypothesis is not true is 0.06. In that case, the posterior probability of the rejected null hypothesis is

$$\frac{(0.95)(0.05)}{(0.95)(0.05) + (0.06)(1 - 0.95)} = 0.94.$$

In the foregoing example, I tacitly allowed the null hypothesis to represent a range of values. Worse yet, however, in most empirical psychology articles, the null hypothesis refers to a single value (e.g., that the difference between two conditions is zero). But when the null hypothesis refers to a specific value, it is a practical certainty that the value is not exactly true.
With an infinite number of possible values, the probability that the single value specified by the null hypothesis is exactly true approaches zero (e.g., Meehl, 1967; Loftus, 1996; Trafimow, 2006), and so it should be rejected.

The NHSTP has been demonstrated to be invalid and it results in p-values that have little correlation with actual probabilities of null hypotheses. We also have seen that when the null hypothesis specifies a point, as opposed to a range, it is almost certainly false regardless of the obtained p-value. Thus, whether the null hypothesis specifies a range or a point, the NHSTP is invalid. Arguably, because of its invalidity, the NHSTP should not be performed, and so inferential cheating bypasses a procedure that should not be used anyway. Thus, where is the harm in avoiding the use of a procedure that is blatantly invalid and only trivially correlated with what we really need to know (the probabilities of null hypotheses)?

Let me be clear about what I am not saying. First, I am not disagreeing with various prescriptions for avoiding inferential cheating, particularly because many of them would reduce descriptive cheating too, and the latter is much more important. Second, I am not arguing that all inferential cheating is harmless; for example, harm can result when one makes improper estimates of population parameters based on poor inferential procedures even with accurate sample statistics. Third, it is quite possible that in attempting heroic measures to obtain p < 0.05, descriptive statistics also might be influenced, and this would be harmful to psychology. Fourth, from a deontological point of view, cheating is unethical in its own right, even apart from specific demonstrable consequences, and so the present argument should not be taken as a justification for any cheating whatsoever.

With the foregoing caveats in place, my main point is as follows.
Although descriptive cheating is harmful in specific and demonstrable ways, this is not true of the most common type of inferential cheating, which results in the rejection of null hypotheses in ways that deviate from ostensibly proper practice. Clearly such inferential cheating is undesirable in a general deontological sense, but it is difficult to enumerate specific consequential harm to the field of psychology. That specific consequential harm from inferential cheating is so difficult to enumerate perhaps constitutes a further argument that the NHSTP should not be required for publication.
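
As a rough check on the Bayes' theorem example given earlier in the article (prior probability of the null hypothesis 0.95, probability of the finding given the null hypothesis 0.05, probability of the finding given that the null hypothesis is not true 0.06), the following Python sketch recomputes the posterior probability of the null hypothesis. The function name `posterior_null` and its parameter names are illustrative assumptions and do not come from the original article. The extra values tried in the loop are likewise hypothetical and only show how the posterior depends on the probability of the finding under the alternative.

```python
def posterior_null(prior_h0, p_finding_given_h0, p_finding_given_alt):
    """Posterior probability of the null hypothesis via Bayes' theorem.

    prior_h0            -- prior probability that the null hypothesis is true
    p_finding_given_h0  -- probability of the finding if the null is true
    p_finding_given_alt -- probability of the finding if the null is not true
    """
    numerator = prior_h0 * p_finding_given_h0
    denominator = numerator + p_finding_given_alt * (1.0 - prior_h0)
    return numerator / denominator


if __name__ == "__main__":
    # Values from the worked example in the text.
    print(round(posterior_null(0.95, 0.05, 0.06), 2))  # 0.94

    # Hypothetical alternatives (not from the article): the same p = 0.05
    # finding leads to different posteriors for the null hypothesis.
    for p_alt in (0.06, 0.25, 0.50, 0.95):
        print(p_alt, posterior_null(0.95, 0.05, p_alt))
```

With the article's numbers, the null hypothesis would be rejected at the 0.05 level even though its posterior probability is about 0.94.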

          Most cited references 6


          Null hypothesis significance testing: a review of an old and continuing controversy.

          Null hypothesis significance testing (NHST) is arguably the most widely used approach to hypothesis evaluation among behavioral and social scientists. It is also very controversial. A major concern expressed by critics is that such testing is misunderstood by many of those who use it. Several other objections to its use have also been raised. In this article the author reviews and comments on the claimed misunderstandings as well as on other criticisms of the approach, and he notes arguments that have been advanced in support of NHST. Alternatives and supplements to NHST are considered, as are several related recommendations regarding the interpretation of experimental data. The concluding opinion is that NHST is easily misunderstood and misused but that when applied with good judgment it can be an effective aid to the interpretation of experimental data.

            Psychology Will Be a Much Better Science When We Change the Way We Analyze Data.


              Hypothesis testing and theory evaluation at the boundaries: surprising insights from Bayes's theorem.

              Because the probability of obtaining an experimental finding given that the null hypothesis is true [p(F\H0)] is not the same as the probability that the null hypothesis is true given a finding [p(H0\F)], calculating the former probability does not justify conclusions about the latter one. As the standard null-hypothesis significance-testing procedure does just that, it is logically invalid (J. Cohen, 1994). Theoretically, Bayes's theorem yields p(H0\F), but in practice, researchers rarely know the correct values for 2 of the variables in the theorem. Nevertheless, by considering a wide range of possible values for the unknown variables, it is possible to calculate a range of theoretical values for p(H0\F) and to draw conclusions about both hypothesis testing and theory evaluation.

                Author and article information

                Front Psychol
                Front. Psychol.
                Frontiers in Psychology
                Frontiers Media S.A.
                11 September 2013
                Volume: 4
                Department of Psychology, New Mexico State University Las Cruces, NM, USA
                Author notes
                *Correspondence: dtrafimo@nmsu.edu

                This article was submitted to Theoretical and Philosophical Psychology, a section of the journal Frontiers in Psychology.

                Edited by: Dan Lloyd, Trinity College, USA

                Copyright © 2013 Trafimow.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                Page count
                Figures: 0, Tables: 0, Equations: 1, References: 8, Pages: 2, Words: 1303
                General Commentary Article

