I originally started this blog arguing why we need to be skeptical scientists. First, we need to be skeptical of our assumptions, our way of thinking, and the studies we do. Second, we need to be skeptical of the scientific literature we read. We cannot simply read a study and take it at face value; we have to carefully scrutinize scientific studies to judge their evidential value.
In other words, we should not suddenly stop thinking scientifically when we are reading scientific papers. The same principles which guide us when interpreting a dataset should also guide us when interpreting a published study.
No science is perfect, and even the most carefully checked manuscript may still contain a (minor) issue. Nevertheless, the scientific literature should be as flawless as possible, and we cannot simply assume that the papers we read are error-free or have high evidential value.
Four papers, 150+ errors
In an attempt to make it very clear why we must be skeptical scientists, I would like to present the results of a thorough investigation by Nick Brown, Jordan Anaya, and me into four papers which contain not just one, but over 150 inconsistencies and impossibilities. It concerns the following four papers by Siğirci and Wansink et al., which appear to be based on a single dataset from one field experiment:
- Just, D. R., Sığırcı, Ö., & Wansink, B. (2014). Lower Buffet Prices Lead to Less Taste Satisfaction. Journal of Sensory Studies, 29(5), 362-370.
- Just, D. R., Sigirci, O., & Wansink, B. (2015). Peak-end pizza: prices delay evaluations of quality. Journal of Product & Brand Management, 24(7), 770-778.
- Kniffin, K. M., Sigirci, O., & Wansink, B. (2016). Eating Heavily: Men Eat More in the Company of Women. Evolutionary Psychological Science, 2(1), 38-46.
- Siğirci, Ö., & Wansink, B. (2015). Low prices and high regret: how pricing influences regret at all-you-can-eat buffets. BMC Nutrition, 1(36), 1-5.
The full report can be found as a pre-print at PeerJ. For your convenience, here is the abstract:
We present the initial results of a reanalysis of four articles from the Cornell Food and Brand Lab based on data collected from diners at an Italian restaurant buffet. On a first glance at these articles, we immediately noticed a number of apparent inconsistencies in the summary statistics. A thorough reading of the articles and careful reanalysis of the results revealed additional problems. The sample sizes for the number of diners in each condition are incongruous both within and between the four articles. In some cases, the degrees of freedom of between-participant test statistics are larger than the sample size, which is impossible. Many of the computed F and t statistics are inconsistent with the reported means and standard deviations. In some cases, the number of possible inconsistencies for a single statistic was such that we were unable to determine which of the components of that statistic were incorrect. We contacted the authors of the four articles, but they have thus far not agreed to share their data. The attached Appendix reports approximately 150 inconsistencies in these four articles, which we were able to identify from the reported statistics alone. We hope that our analysis will encourage readers, using and extending the simple methods that we describe, to undertake their own efforts to verify published results, and that such initiatives will improve the accuracy and reproducibility of the scientific literature.
I will leave it to you to read our report, interpret it carefully, and draw your own conclusions regarding the veracity of the four papers. The main points I want to make now are as follows:
First of all, we absolutely have to be skeptical of what we read and carefully consider the evidential value of published papers. The four papers discussed here serve as a case study showing how published research can be riddled with inconsistencies and impossibilities. These papers have been peer-reviewed, but that is certainly not something you can trust blindly.
Secondly, a wide variety of tools can help skeptical scientists do this. You can use online tools like StatCheck and QuickCalcs to recalculate p values for common tests (t tests, ANOVAs, etc.). What is more, you can check descriptive statistics of integer data with GRIMMER; in other words, you can test whether means and standard deviations derived from whole-number data (such as Likert scale responses) are mathematically possible. See this list for even more tools. This means that you can and should check what you read.
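To give a flavor of how simple these checks can be, here is a minimal sketch of a GRIM-style consistency test (the idea underlying GRIM and GRIMMER, not their actual implementations): with n participants answering in whole numbers, the mean must equal some integer sum divided by n, so many reported means are simply impossible. The function name and the ±1 tolerance for the rounded sum are my own choices for illustration.

```python
def grim_consistent(reported_mean, n, decimals=2):
    """GRIM-style check: could `reported_mean` arise from n integer responses?

    Integer data summed over n participants yields a mean of k/n for some
    integer k, so we test whether any plausible k reproduces the reported
    (rounded) mean.
    """
    target = round(reported_mean, decimals)
    k = round(reported_mean * n)  # most likely integer sum
    # the reported mean was itself rounded, so allow the sum to be off by one
    return any(round(c / n, decimals) == target for c in (k - 1, k, k + 1))


# A mean of 3.45 from 20 whole-number responses is possible (69 / 20),
# but 3.48 from 20 responses is not: no integer sum produces it.
print(grim_consistent(3.45, 20))  # True
print(grim_consistent(3.48, 20))  # False
```

Real-world use needs more care (rounding conventions differ between software packages, and GRIMMER extends this logic to standard deviations), but even this toy version is enough to flag some of the impossibilities we describe in the report.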
Thirdly, as scientists we need to collectively get our shit together and extensively reorganize the peer review process and fix quality incentives.