Flawed science: 4 papers with 150+ errors

I originally started this blog arguing why we need to be skeptical scientists. First, we need to be skeptical of our assumptions, our way of thinking, and the studies we do. Second, we need to be skeptical of the scientific literature we read. We cannot simply read a study and take it at face value; we have to carefully scrutinize scientific studies to judge their evidential value.

In other words, we should not suddenly stop thinking scientifically when we are reading scientific papers. The same principles which guide us when interpreting a dataset should also guide us when interpreting a published study.

No science is perfect, and even the most carefully checked manuscript may contain a (minor) issue. Nevertheless, the scientific literature should be as flawless as possible, and we cannot simply assume that the papers we read are error-free or have high evidential value.

Four papers, 150+ errors

In an attempt to make it very clear why we must be skeptical scientists, I would like to present the results of a thorough investigation by Nick Brown, Jordan Anaya, and me into four papers which contain not just one, but over 150 inconsistencies and impossibilities. It concerns the following four papers by Siğirci and Wansink et al., which appear to be based on a single dataset from one field experiment:

  1. Just, D. R., Sığırcı, Ö., & Wansink, B. (2014). Lower Buffet Prices Lead to Less Taste Satisfaction. Journal of Sensory Studies, 29(5), 362-370.
  2. Just, D. R., Sigirci, O., & Wansink, B. (2015). Peak-end pizza: prices delay evaluations of quality. Journal of Product & Brand Management, 24(7), 770-778.
  3. Kniffin, K. M., Sigirci, O., & Wansink, B. (2016). Eating Heavily: Men Eat More in the Company of Women. Evolutionary Psychological Science, 2(1), 38-46.
  4. Siğirci, Ö., & Wansink, B. (2015). Low prices and high regret: how pricing influences regret at all-you-can-eat buffets. BMC Nutrition, 1(36), 1-5.

The full report can be found as a pre-print at PeerJ. For your convenience, here is the abstract:

We present the initial results of a reanalysis of four articles from the Cornell Food and Brand Lab based on data collected from diners at an Italian restaurant buffet. On a first glance at these articles, we immediately noticed a number of apparent inconsistencies in the summary statistics. A thorough reading of the articles and careful reanalysis of the results revealed additional problems. The sample sizes for the number of diners in each condition are incongruous both within and between the four articles. In some cases, the degrees of freedom of between-participant test statistics are larger than the sample size, which is impossible. Many of the computed F and t statistics are inconsistent with the reported means and standard deviations. In some cases, the number of possible inconsistencies for a single statistic was such that we were unable to determine which of the components of that statistic were incorrect. We contacted the authors of the four articles, but they have thus far not agreed to share their data. The attached Appendix reports approximately 150 inconsistencies in these four articles, which we were able to identify from the reported statistics alone. We hope that our analysis will encourage readers, using and extending the simple methods that we describe, to undertake their own efforts to verify published results, and that such initiatives will improve the accuracy and reproducibility of the scientific literature.

(Van der Zee, Anaya, & Brown, 2017)

I will leave it to you to read our report, interpret it carefully, and draw your own conclusions regarding the veracity of the four papers. The main points I want to make now are as follows:

First of all, we absolutely have to be skeptical of what we read and carefully consider the evidential value of published papers. The four papers discussed here serve as a case study showing how published research can be riddled with inconsistencies and impossibilities. These papers have been peer-reviewed, but that is certainly not something you can blindly trust.

Secondly, a wide variety of tools can help skeptical scientists do this. You can use online tools like statcheck and QuickCalcs to recalculate p values from common test statistics (t tests, ANOVAs, etc.). What is more, you can check descriptive statistics of integer-valued data with GRIMMER; in other words, you can test whether reported means and standard deviations of whole-number data (such as Likert scale responses) are mathematically possible. See this list for even more tools. In short, you can and should check what you read.
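The simplest of these checks, the GRIM test for means, is easy to reproduce yourself. A mean of n integer responses must equal some whole-number total divided by n, so you can test whether a reported mean is even possible. Here is a minimal sketch in Python; the function name and rounding precision are my own choices, not part of any published tool:

```python
def grim_consistent(reported_mean, n, decimals=2):
    """GRIM-style check: can a mean of n integer responses,
    rounded to the reported precision, equal reported_mean?

    Any mean of integer data is (integer total) / n, so we try the
    integer totals closest to reported_mean * n and see whether one
    of them reproduces the reported value after rounding.
    """
    total = round(reported_mean * n)  # nearest candidate integer sum
    for candidate in (total - 1, total, total + 1):
        if round(candidate / n, decimals) == round(reported_mean, decimals):
            return True
    return False

# With n = 20 integer responses, a mean of 3.45 is possible (69/20),
# but a mean of 3.48 is not: 69/20 = 3.45 and 70/20 = 3.50.
print(grim_consistent(3.45, 20))  # True
print(grim_consistent(3.48, 20))  # False
```

GRIMMER extends the same idea to standard deviations, which is more involved; but even this one-function check is enough to flag some of the impossible values we describe in the report.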

Thirdly, as scientists we need to collectively get our shit together and extensively reorganize the peer review process and fix quality incentives.


2 thoughts on “Flawed science: 4 papers with 150+ errors”

  1. Thank you, Tim. This is a nice teaching example, great for “how to detect BS” classes. Of course, we need to bear in mind that misleading science does not always reveal itself so blatantly. Many papers appear correct on the surface, but have been carefully p-hacked behind the scenes. Or, even more subtly, they have been designed – either intentionally or unintentionally – to capitalise on spurious factors, such as response bias (http://www.psychologicalscience.org/news/were-only-human/why-psychotherapy-appears-to-work-even-when-it-doesnt.html#.WIpqypKQNXg). Or they simply represent the tip of the iceberg, and a whole host of null studies remains hidden and unpublished.

    Sure, there’s self interest in this – we do need to change the incentive system so that we reward quality and not the loudest, most sensational or most pleasing results. But a more subtle factor at play is ideology. Your example above is a nice illustration of that. There’s ideology behind studies that seek to show that eating behaviour can be shaped by environment. Obesity is a societal bad, and we are highly motivated to hunt down the cultural perpetrators. Researchers can be easily duped by factitious effects because they heartily believe in the truth of their claims.

    Strong ideologies can be found in many areas of Psychology. In Social and Forensic Psychology, we’re motivated to seek the “dysfunction” that leads to undesirable social behaviours. In Health Psychology, we try to make sense of the randomness of illness through ideologies about mind-body relationships. And of course, in Psychotherapy, we are highly motivated to seek the “proof” that our favoured approach “works”.

    So I’ll be giving my students your article. But I’ll also be giving them one further piece of advice: beware of the ideology underlying the study.

  2. Thanks, Carolyn. You identify a crucial point. We are wired to see patterns based on our prior experiences and expectations. We even see patterns in noise, although we rarely see noise in patterns.

    I think it is vital to realize that this holds not just for other people but also for ourselves. That is, we should not only keep this in mind when reading someone else’s study, but also be aware that it guides our own thinking and behavior. If we are skeptical of others but not of ourselves, we are lost as well.
