(Note: If you haven’t done so yet, be sure to read my earlier blog post as an introduction to the 4 papers with 150+ inconsistencies)
Many scientists will at some point in their academic career take part in a research ethics exercise built around the discussion of case descriptions. These cases typically start with a description of a tricky scenario, for example two scientists arguing about who should be first author, or a dispute about data management and scientific fraud. The interesting part of these discussions is the wide array of opinions on what you should do if you found yourself in one of those scenarios.
This blog post is such a case. It is not a fictional scenario but a real-life case that is currently unfolding:
Case Description: Three researchers – Jordan, Nick, and Tim – find over 150 reporting inconsistencies in 4 published papers. Although they have contacted all the corresponding authors of these papers, only the senior author, Dr. Brian Wansink, has replied. However, Dr. Wansink did not want to share the anonymized dataset underlying the 4 papers or offer any explanation for the inconsistencies, and eventually stopped replying.
Given this scenario, what would you do?
After our initial discovery we tried many different possible ‘solutions’ to this case of suspected research misconduct. First, we wanted to be absolutely sure about our calculations of the 150+ inconsistencies, so all three of us checked everything. As we found no errors in our calculations, we proceeded to write a thorough preprint and publish it at PeerJ Preprints, which we then shared with others.
We received a lot of responses, and within a few days thousands of people read and downloaded our preprint. The statistician Andrew Gelman blogged four times about it (1, 2, 3, 4); the Everything Hertz Podcast discussed it quite thoroughly; after interviewing one of us, Slate magazine published an article; and the investigation was featured at Retraction Watch. To our knowledge, two more media outlets are preparing articles about our investigation.
While these efforts helped to raise awareness, they did not by themselves change the situation. We therefore thought it would be best to find and contact an ethics board at the researchers’ university. We wrote the following long and detailed letter to the Office of Research Integrity and Assurance (ORIA) at Cornell University, with a CC to the Institutional Review Board (IRB) at Cornell:
We are contacting you regarding four published articles of which Dr. Brian Wansink, of the Cornell Food and Brand Lab, is the senior author. The four articles involved are:
- Just, D. R., Sığırcı, Ö., & Wansink, B. (2014). Lower buffet prices lead to less taste satisfaction. Journal of Sensory Studies, 29(5), 362–370.
- Just, D. R., Sigirci, O., & Wansink, B. (2015). Peak-end pizza: prices delay evaluations of quality. Journal of Product & Brand Management, 24(7), 770–778.
- Kniffin, K. M., Sigirci, O., & Wansink, B. (2016). Eating Heavily: Men eat more in the company of women. Evolutionary Psychological Science, 2(1), 38–46.
- Siğirci, Ö., & Wansink, B. (2015). Low prices and high regret: how pricing influences regret at all-you-can-eat buffets. BMC Nutrition, 1(36), 1–5.
On November 21, 2016, Dr. Wansink published a blog post in which he gives a frank description of the research practices in his lab. The reactions to this blog post by members of the research community, typified by a number of the comments written by readers of the blog, suggest that many people regard the post as describing what are sometimes referred to as “questionable research practices” (e.g., generating hypotheses after having looked at the data). Other reactions concerned the hiring and employment policies in the Food and Brand Lab that appeared to be suggested by the blog post. Although both of these issues are important, however, neither is the focus of our request here.
The blog post lists five articles that were co-authored by the graduate student whose experience at Cornell is the focus of the post. Four of these articles have a common theme, namely patterns of consumption among diners at an all-you-can-eat buffet, featuring pizza, salad, and other food items. My colleagues and I have read these four articles, which all appear to be based on a single dataset, and identified a total of over 150 statistical and other errors and inconsistencies in them.
The nature and severity of these errors and inconsistencies vary widely. For example, many of the reported means and standard deviations (SDs) are mathematically impossible given the reported sample sizes. Many of the reported test statistics are internally impossible, such as reported t or F statistics that are incompatible with the means, SDs, and sample sizes used to calculate them. There are also multiple inconsistencies between the articles, such as samples and means which should be identical but are reported differently in each article; inconsistent descriptions of what observations were or were not made; and inconsistent reporting of the numbers of participants in each condition, both within and across studies.
Naturally, it would be easier for my colleagues and me to determine exactly what might be causing these errors and inconsistencies if we had access to the data set. Between January 5 and January 10, 2017, I had a cordial exchange of e-mails with representatives of the Food and Brand Lab, in which I requested access to these data. They explained that it was indeed possible to share data, although they “would then need to revise the IRB request for the study by adding you to the project as a co-author who has access to the data that is agreed upon.” It appears that the lab personnel initially thought that my colleagues and I might want to conduct supplementary analyses of our own to investigate other aspects of the patterns of consumption in the data. However, when I made it clear that our aim was to identify the source of the errors and inconsistencies that we had found, I received (and have still received, as of this writing) no further reply.
We considered the number of errors and inconsistencies that we found so high that it was worth writing up the methodology and results of our investigation. A preprint detailing our findings can be found attached to this letter, as well as here: https://peerj.com/preprints/2748v1/. At this time, this manuscript is under consideration by a peer-reviewed journal for possible publication.
Our investigation of these articles has already gained a considerable amount of attention, with our preprint getting over 4,400 views and 2,900 downloads in the eight days that have elapsed since its appearance. The four articles in question have also been the subject of a number of blog posts, including no fewer than four by the widely respected statistician Dr. Andrew Gelman of Columbia University. We also understand that both Slate magazine and the Guardian newspaper are preparing stories about this matter.
Although, as mentioned above, neither I nor my colleagues have received any further direct communication from the Food and Brand Lab about our request for the data, we recently noted that Dr. Wansink has added an update to his blog in which he explains why these data were not made public. An analogous comment was also added to the PubPeer pages for one of the articles in question; although this comment is not signed, we presume that it was posted with the authorization of Dr. Wansink.
I must confess that my colleagues and I find Dr. Wansink’s apparent attitude to our request to be a little frustrating. First, as already mentioned, I have received no reply to my formal request for a copy of the data set, although more than three weeks have now passed; yet, Dr. Wansink and his team appear to be happy to communicate about this matter on social media. Second, these posts (i.e., Dr. Wansink’s blog, and the post on PubPeer that we presume originates from an authorized source within the Food and Brand Lab) only address the question of why the data from these studies has not been made generally available in a publicly-accessible repository; they do not explain why an anonymized version of the dataset could not be shared with bona fide researchers. (If Dr. Wansink does not consider us to be bona fide researchers, he has so far failed to explain why this might be the case; certainly, when the representative of the lab indicated that it would be necessary to add my name retroactively to the IRB request, there was no mention of any vetting procedure that might be required.) Third, it appears that the data set in question has recently been shared with what Dr. Wansink, in his blog post, refers to as “a non-coauthor Stats Pro [sic],” whose identity and institutional affiliation have not been revealed.
Given the severity and scope of the errors and inconsistencies that we have identified in these articles, we believe that the further investigation of these problems, based on the data set, ought not to be left solely to an unnamed person appointed by the Principal Investigator himself. To do so would, we feel, risk giving the impression that Cornell is not acting in a fully transparent, collegial, and scientific manner here. We therefore hope that you will add your institutional support to our request to the Food and Brand Lab to give us access to the full data set that was collected at Aiello’s restaurant and used as the basis of these four articles. This would be of immense help to us in our efforts to verify the errors and inconsistencies that we found, and to identify their origin.
Although we do not find all of the reasons given by Dr. Wansink for why the data set cannot be freely shared with the public to be especially compelling, we are conscious of the need to respect the confidentiality clauses that were included in the consent forms that were signed by participants. Hence, we would be happy to give any assurances that might be required regarding the handling of these data, including the signing of non-disclosure agreements or other documents concerning the confidentiality and ethical protection of the participants in these studies, in order for the data to be released to us so that we may continue and deepen our reanalysis. We would also be happy for the names of participants, as well as any other genuinely personally identifiable information (e.g., addresses or phone numbers), to be removed from the data set before it is sent to us.
Thank you for your time, and your consideration of this matter.
This is where it becomes more interesting. If you were the chair of the Office of Research Integrity and Assurance (ORIA) at Cornell University, what would you do? How would you reply to our letter?
Personally, I would deem it reasonable to agree to our request for anonymized data. In addition, I would argue that the sheer volume of errors constitutes sufficient reason to initiate an investigation into the veracity of the research performed at the Cornell Food and Brand Lab. This is strengthened by the fact that Jordan found even more errors in six other, highly cited papers by Dr. Brian Wansink. However, I am not an expert in these matters and am personally involved, so perhaps my judgment is not the best.
This is how the Office of Research Integrity and Assurance decided to respond:
Thank you for your inquiry. Cornell University supports open inquiry and vigorous scientific debate. In the absence of sponsor or publisher data sharing requirements, however, Cornell allows its investigators to determine if and when it is appropriate to release raw data, subject to any IRB imposed limitations.
To clarify: the IRB is the Institutional Review Board at Cornell University, which (so far) has not replied.
How should we respond?
What would you do?
Do you agree with the reply of the ORIA?
How do you hope the IRB will respond?
Share your thoughts!