Scientific fraud is an increasingly vexing problem. Most current programs for fraud detection focus on image manipulation, while detection techniques based on anomalous patterns discoverable in the underlying numerical data receive much less attention, even though such techniques are often easy to apply.
We applied statistical techniques to compare data sets from 10 researchers in one laboratory and three outside investigators, to determine whether anomalous patterns in data from a research teaching specialist (RTS) were likely to have occurred by chance. Rightmost digits of values in RTS data sets were not uniformly distributed, contrary to expectation. Equal pairs of terminal digits occurred at higher than expected frequency (>10%), and an unexpectedly large number of data triples commonly produced in such research included a value near their mean as an element. We applied standard statistical tests (chi-square goodness of fit, binomial probabilities) to assess the likelihood of the first two anomalous patterns and developed a new statistical model to test the third.
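The first two checks can be illustrated with a minimal sketch (a hypothetical implementation, not the authors' code; the function names and the assumption that values are recorded to a fixed number of decimal places are ours):

```python
import math
from collections import Counter

def terminal_digit_chisq(values, decimals=2):
    """Chi-square goodness-of-fit statistic for the rightmost digits
    against a uniform distribution over 0-9. Under the null hypothesis
    the statistic follows a chi-square distribution with 9 degrees of
    freedom; large values indicate non-uniform terminal digits."""
    digits = [int(round(v * 10**decimals)) % 10 for v in values]
    n = len(digits)
    counts = Counter(digits)
    expected = n / 10  # each digit 0-9 expected equally often
    return sum((counts.get(d, 0) - expected) ** 2 / expected
               for d in range(10))

def equal_pair_binomial_p(values, decimals=2):
    """Upper-tail binomial probability of seeing at least the observed
    number of equal terminal-digit pairs (00, 11, ..., 99), under the
    null rate of 10% for independent uniform digits."""
    last_two = [int(round(v * 10**decimals)) % 100 for v in values]
    k = sum(1 for x in last_two if x % 11 == 0)  # equal pairs divide by 11
    n = len(last_two)
    p = 0.10
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))
```

The third test (triples containing a value near their mean) is omitted here because it relies on the authors' new statistical model rather than a standard procedure.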
Applying the three tests to various data sets reported by RTS repeatedly rejected the hypothesis that the anomalous patterns in those data occurred by chance, often at p-values well below 0.001. The same tests applied to data sets from the other investigators gave results entirely consistent with chance occurrence.
This analysis underscores the importance both of access to the raw data underlying publications, reports, and grant applications, so that the correctness of their conclusions can be evaluated, and of applying statistical methods to detect anomalous, and especially potentially fabricated, numerical results.