Statistical analysis of numerical preclinical radiobiological data


Abstract

Background

Scientific fraud is an increasingly vexing problem. Many current programs for fraud detection focus on image manipulation, while techniques for detection based on anomalous patterns that may be discoverable in the underlying numerical data get much less attention, even though these techniques are often easy to apply.

Methods

We applied statistical techniques to examine and compare data sets from 10 researchers in one laboratory and three outside investigators, in order to determine whether anomalous patterns in data from a research teaching specialist (RTS) were likely to have occurred by chance. Rightmost digits of values in the RTS data sets were not uniformly distributed, as would be expected. Equal pairs of terminal digits occurred at a higher than expected frequency (>10%), and an unexpectedly large number of the data triples commonly produced in such research included a value at or near the triple's own mean as one of its elements. We applied standard statistical tests (chi-square goodness of fit, binomial probabilities) to determine the likelihood of the first two anomalous patterns and developed a new statistical model to test the third.
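To make the first two screens concrete, here is a minimal sketch in Python using scipy. It illustrates the general technique only; it is not the authors' code, the sample values are invented, and in practice such tests are applied to many more values than shown here.

```python
# Illustrative sketch of two screening tests: a chi-square test of
# terminal-digit uniformity and a binomial test for equal terminal-digit pairs.
# The data below are made up for demonstration purposes.
from scipy.stats import chisquare, binomtest

values = ["127", "342", "455", "233", "611", "588", "977", "100", "244", "366"]

# Test 1: rightmost digits of measured values should be roughly uniform on 0-9.
last_digits = [int(v[-1]) for v in values]
observed = [last_digits.count(d) for d in range(10)]
chi2, p_uniform = chisquare(observed)  # expected frequencies default to uniform
print(f"terminal-digit uniformity: chi2={chi2:.2f}, p={p_uniform:.3f}")

# Test 2: the last two digits should be equal about 10% of the time by chance;
# test whether equal pairs occur more often than that.
equal_pairs = sum(1 for v in values if len(v) >= 2 and v[-1] == v[-2])
result = binomtest(equal_pairs, n=len(values), p=0.10, alternative="greater")
print(f"equal terminal-digit pairs: {equal_pairs}/{len(values)}, p={result.pvalue:.3f}")
```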

Results

Application of the three tests to various data sets reported by RTS resulted in repeated rejection of the hypotheses (often at p-levels well below 0.001) that the anomalous patterns in those data occurred by chance. Similar application to data sets from the other investigators produced results entirely consistent with chance occurrence.

Conclusions

This analysis emphasizes two points: the importance of access to the raw data that form the basis of publications, reports, and grant applications, so that the correctness of the conclusions can be evaluated, and the value of applying statistical methods to detect anomalous, and especially potentially fabricated, numerical results.

Most cited references


Science publishing: The trouble with retractions.


The analysis of 168 randomised controlled trials to test data integrity.

The purpose of this study was to use some statistical methods to assess if randomised controlled trials (RCTs) published by one particular author (Fujii) contained data of unusual consistency. I searched seven electronic databases, retrieving 168 RCTs published by this author between 1991 and July 2011. I extracted rates for categorical variables and means (SDs) for continuous variables, and compared these published distributions with distributions that would be expected by chance. The published distributions of 28/33 variables (85%) were inconsistent with the expected distributions, such that the likelihood of their occurring ranged from 1 in 25 to less than 1 in 10^33, equivalent to p values of 0.04 to < 1 × 10^-33, respectively.

In 141 human studies, 13/13 published continuous variable distributions were inconsistent with the expected, their likelihoods being: weight < 1 in 10^33; age < 1 in 10^33; height < 1 in 10^33; last menstrual period 1 in 4.5 × 10^15; baseline blood pressure 1 in 4.2 × 10^5; gestational age 1 in 28; operation time < 1 in 10^33; anaesthetic time < 1 in 10^33; fentanyl dose 1 in 6.3 × 10^8; operative blood loss 1 in 5.6 × 10^9; propofol dose 1 in 7.7 × 10^7; paracetamol dose 1 in 4.4 × 10^2; uterus extrusion time 1 in 33. The published distributions of 7/11 categorical variables in these 141 studies were inconsistent with the expected, their likelihoods being: previous postoperative nausea and vomiting 1 in 2.5 × 10^6; motion sickness 1 in 1.0 × 10^4; male or female 1 in 140; antihypertensive drug 1 in 25; postoperative headache 1 in 7.1 × 10^10; postoperative dizziness 1 in 1.6 × 10^6; postoperative drowsiness 1 in 3.8 × 10^4. Distributions for individual RCTs were inconsistent with the expected in 96/134 human studies by Fujii et al. that reported more than two continuous variables, their likelihood ranging from 1 in 22 to 1 in 1.4 × 10^11, compared with 12/139 RCTs by other authors.

In 26 canine studies, the distributions of 8/9 continuous variables were inconsistent with the expected, their likelihoods being: right atrial pressure < 1 in 10^33; diaphragmatic stimulation (100 Hz) < 1 in 10^33; pulmonary artery occlusion pressure < 1 in 10^33; diaphragmatic stimulation (20 Hz) < 1 in 10^33; heart rate 1 in 6.3 × 10^10; mean pulmonary artery pressure 1 in 2.2 × 10^14; mean arterial pressure 1 in 6.3 × 10^7; cardiac output 1 in 110. Distributions were inconsistent with the expected in 21/24 individual canine studies that reported more than two continuous variables, their likelihood ranging from 1 in 345 to 1 in 5.1 × 10^13. Anaesthesia © 2012 The Association of Anaesthetists of Great Britain and Ireland.

Just post it: the lesson from two cases of fabricated data detected by statistics alone.

I argue that requiring authors to post the raw data supporting their published results has the benefit, among many others, of making fraud much less likely to go undetected. I illustrate this point by describing two cases of suspected fraud I identified exclusively through statistical analysis of reported means and standard deviations. Analyses of the raw data behind these published results provided invaluable confirmation of the initial suspicions, ruling out benign explanations (e.g., reporting errors, unusual distributions), identifying additional signs of fabrication, and also ruling out one of the suspected fraudster's explanations for his anomalous results. If journals, granting agencies, universities, or other entities overseeing research promoted or required data posting, it seems inevitable that fraud would be reduced.

Author and article information

Affiliations
[1]Renaissance Associates, Princeton, NJ, USA
[2]NJ Medical School, Rutgers University, Newark, NJ 07101-1709, USA
Author notes
[*]Corresponding author’s e-mail address: hill@njms.rutgers.edu
Journal
ScienceOpen Research (SOR-STAT), ScienceOpen
ISSN: 2199-1006
Published: 22 January 2016
Pages: 1-22
© 2016 Pitt et al.

This work has been published open access under Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

Counts
Figures: 2, Tables: 3, References: 28, Pages: 22
Categories
Original article

Comments


Let me begin by commending the authors for a thought-provoking and persuasive paper on an important topic.  Since my background is in political science, not biology, and since Chris Hartgerink’s review aptly discussed a number of technical issues, I will focus my remarks on some of the big picture issues that arise in statistical exercises designed to detect data fraud, responding to the authors’ comment about “routine application” of this type of investigation:

“We believe that routine application of statistical tools to identify potential fabrication could help to avoid the pitfalls of undetected fabricated data just as tools, for example, CrossCheck and TurnItIn, are currently used to detect plagiarism.” (p.1)

Like the authors, I see no reason that statistical methods for detecting fabrication should be “all but ignored…by the larger world.” Clearly, these methods are useful and have the potential to be quite convincing, and the current application is a case in point.

Still, if applied in a “routine” manner, the statistical detection of irregularities raises the question of how the presumption of innocence on the part of the accused should be built into the statistical analysis. The authors say little about this, for understandable reasons. This manuscript presents such overwhelming statistical evidence that almost any priors about innocence would be swept aside by the tsunami of evidence showing that RTS’s data are incompatible with basic probability models, with data generated by other workers, and with data generated by other labs.

However, because the authors ultimately wish to speak to the policy question of whether such data checks should become routine, they should step back and consider the consequences of routine checks performed on a vast scale across a large number of labs. Even in a world where no data fraud occurs, suspicious patterns will arise by chance. For example, if 100,000 innocent data producers were subjected to routine checks, about 1,000 of them would become objects of suspicion based on hypothesis tests with a 0.01 Type I error rate. Perhaps more poignantly, 10 of them would become objects of quite intense suspicion based on hypothesis tests with a 0.0001 Type I error rate. The authors should consider the systemic implications of mandating routine checks and how one might develop statistical procedures and investigative guidelines that balance the aim of detecting data fraud against the downside risk of false inculpation.
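The false-alarm arithmetic above can be made explicit with a short calculation; the figures are the hypothetical ones from the preceding paragraph, not estimates of actual screening practice:

```python
# Expected number of innocent researchers flagged purely by chance under
# routine screening, using the hypothetical figures from the comment above.
n_innocent = 100_000  # assumed number of innocent data producers screened

for alpha in (0.01, 0.0001):
    expected_false_flags = n_innocent * alpha
    print(f"Type I error rate {alpha}: about {expected_false_flags:.0f} false alarms expected")

# Roughly 1,000 researchers flagged at alpha = 0.01 and 10 at alpha = 0.0001,
# even in a world with no fraud at all.
```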

More generally, I would recommend that the authors place their analysis (and policy prescriptions) within a Bayesian framework.  Prior to conducting the statistical investigation, the investigator who suspects fraud starts with some priors about whether the lab worker fabricated the data.  These priors might be informed by a number of circumstantial facts – for example, was the lab worker unsupervised when recording the data?  The authors of this paper start with a null hypothesis of no fabrication and try to reject it at some high level of significance, but another approach is to begin with a strong presumption of innocence (e.g., a prior probability of data fabrication of 0.001 or less) as an input into Bayes’ Rule.  Next, the researcher assesses (theoretically or intuitively through experience) the likelihood of observing statistical evidence suggesting fabrication given that fabrication occurred as well as the likelihood of observing statistical evidence suggesting fabrication given that fabrication did not occur.  These likelihoods depend on the characteristics of the statistical tests (such as the Poisson model the authors propose) and on intuitions about the fabrication process.  In this regard, the authors do a nice job of suggesting that the suspicious triplet pattern at the center of their analysis would be consistent with data fabrication because it would be especially convenient for the data fabricator. 

More subtly, these likelihoods also depend on how many such tests were conducted and which ones were presented.  Again, in this particular application, the results that the authors present are overwhelming, and there is no reason to think that tests other than the ones presented were conducted or would be relevant.  In the general case, however, an impartial reader might wonder whether tests other than the ones presented were conducted and, if so, how they would affect the posteriors that would emerge from Bayes’ Rule. Just as the authors call for open access to replication data, they should also call for transparency and comprehensiveness in reporting of investigative analyses.  Forensic exercises such as this one should report the full set of tests that were conducted (including code and data) so that the reader is assured that tests were not presented selectively or inaccurately.  The concluding section of the paper might fruitfully include a checklist that lays out what investigations of this sort should present to readers.

Suppose that the evidence is presented in a comprehensive and accurate manner.  The final step would be to generate a posterior probability of data fabrication given the evidence using Bayes’ Rule.  For example, one could plug in the inputs at one of the many on-line sites like this one: http://psych.fullerton.edu/mbirnbaum/bayes/BayesCalc.htm
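As a sketch of that calculation, the snippet below applies Bayes’ Rule for a few different priors; the prior probabilities, detection rate, and false-positive rate are purely illustrative assumptions, not quantities estimated in the paper or by the reviewer:

```python
# Posterior probability of fabrication via Bayes' Rule.
# All numbers are illustrative assumptions for the routine-screening scenario.
def posterior(prior, p_evidence_given_fraud, p_evidence_given_no_fraud):
    """P(fraud | suspicious evidence) by Bayes' Rule."""
    numerator = prior * p_evidence_given_fraud
    denominator = numerator + (1 - prior) * p_evidence_given_no_fraud
    return numerator / denominator

p_detect = 0.80   # assumed chance the tests flag a genuine fabricator
p_false = 0.0001  # assumed chance the tests flag an innocent researcher

for prior in (0.001, 0.01, 0.1):
    print(f"prior {prior}: posterior {posterior(prior, p_detect, p_false):.3f}")

# With a strong presumption of innocence (prior 0.001) and these rates, the
# posterior is roughly 0.89; the conclusion is clearly sensitive to both the
# assumed prior and the assumed false-positive rate of the tests.
```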

In some cases, the results of this exercise may be sensitive to the prior probability of guilt.   If the procedure is being applied to all researchers as a matter of “routine,” then the prior probability may be fairly low (if one believes that data-faking tends to be rare).  Given the severity of the accusation of data fraud, it may make sense as a matter of policy to keep the prior fairly low even when a specific person comes under investigation in the wake of some suspicious behavior.

The results from Bayes’ Rule may also be sensitive to the specified probability of inculpatory evidence given data fabrication.  Although this paper presented an intuitive theory about why data fraud in this domain would take the form that it did, in other situations we may not have a clear sense of how fabrication would occur, and so it may be hard to pin this quantity down.  I would be curious to hear the authors’ thoughts on how this quantity should be handled as part of routine surveillance.

In conclusion, the authors do a good job of developing statistical tests tailored to the application at hand and presenting overwhelming evidence of guilt. The final section of their paper summarizes other cases of fraud in which statistical irregularities resulted in a fuller investigation that left no doubt about guilt. Less is said about instances in which suspicions were found to be groundless (or ambiguous) upon further statistical investigation. In sum, I would invite the authors to say more about the potential for, and systemic implications of, false alarms.

2016-07-01 10:50 UTC
