In this video https://www.youtube.com/watch?v=8u6xdGCIq6o Wansink says that the eyes are looking down at 4 degrees, not the 9.67 degrees mentioned in the article. This kind of error makes very little sense to me unless the actual research is supremely unimportant to him. Compare the JAMA Pediatrics article (now retracted) in which, in the initial version, the conclusion was that a study conducted with 8-11 year olds (or, as Wansink now claims, he *believed* was conducted with 8-11 year olds) showed that the effect had been demonstrated in “preliterate” children. It is very tempting to believe that almost nothing is backed up by facts, and the key aim is to tell a good story each time.

The draft with the quote

“This non-response is towards my opinion the strongest argument that my view about partial behaviour by BMJ is founded. I have thus no proof that the unavailability of the form can indeed be attributed to partial behaviour by BMJ. BMJ has, on the other hand, not rebutted that partial behaviour is the real motive for the decision that the form is unavailable. BMJ states on its website that they ‘encourage open debate, comment and criticism’. This statement indicates that BMJ has no objections against the publication of this article.”

was published last week in the open access journal ‘Roars Transactions, a Journal on Research Policy and Evaluation’ at https://riviste.unimi.it/index.php/roars/article/view/9073

The publication of this paper (‘Is partial behaviour a plausible explanation for the unavailability of the ICMJE disclosure form of an author in a BMJ journal?’) shows, at least in my opinion, that one can indeed use circular arguments in a scientific debate / dialogue with parties / people who do not respond.

I think this is on the whole a good clarification of an interpretative problem lots of people have, or have had at some point, in their statistical education. A couple of perhaps minor issues keep me from adding this post to the recommendations for further reading for my students, which I thought I’d summarise here. (I appreciate that you probably purposefully glossed over these details for pedagogical reasons.)

– Much of what you discuss is also discussed in greater detail by Cumming & Finch (2005) (“Inference by eye: confidence intervals and how to read pictures of data”; American Psychologist).

– As other commenters pointed out, you can compute CIs without relying on the central limit theorem (e.g., using bootstrapping). In fact, the formula you present (I presume for pedagogical tractability) is only of limited use, as it assumes Gaussian data (fine-grained, practically unbounded data in which there is no relationship between the mean and the variance). This excludes Likert-type data, proportions, etc. Furthermore, it assumes that the population SD has been estimated with great certainty (i.e., that n is large enough) and that the data points are independent of one another. Often, then, one can’t be 100% confident that the formula will yield 95% coverage intervals; indeed, one can often be 100% confident that it *won’t* (e.g., due to clustering, blocking, heteroskedasticity, etc.). Other formulae are available for such cases, but they have their own assumptions, so one can rarely be 100% confident in an algorithm’s coverage properties.
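The contrast between the textbook formula and bootstrapping can be made concrete with a short stdlib-Python sketch; the sample here is made-up skewed data, chosen only to illustrate a case where the Gaussian assumptions are questionable:

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical skewed sample (exponential), where the Gaussian
# assumptions behind the textbook formula are questionable.
sample = [random.expovariate(1.0) for _ in range(50)]
n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)

# Textbook normal-approximation 95% CI: mean +/- 1.96 * SE.
normal_ci = (mean - 1.96 * se, mean + 1.96 * se)

# Percentile bootstrap 95% CI: resample with replacement many times,
# then take the 2.5th and 97.5th percentiles of the resampled means.
boot_means = sorted(
    statistics.mean(random.choices(sample, k=n)) for _ in range(5000)
)
boot_ci = (boot_means[124], boot_means[4874])  # ~2.5% and ~97.5% points

print(normal_ci)
print(boot_ci)
```

For well-behaved data the two intervals largely agree; with skewed or bounded data the bootstrap interval can be asymmetric around the sample mean, which the formula cannot be.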

– As CP pointed out, I don’t think it’s helpful to say that confidence intervals “assume” that the population mean is equal to the sample mean. Such an assumption would obviate the need for confidence intervals in the first place. What you mean is that the sample mean is taken as a point of departure in the construction of the CI around it, but that’s just because the sample mean is an unbiased estimator of the population mean. Similarly, the t-test doesn’t assume the null hypothesis is true; this assumption would again obviate the need for testing it. Rather, it takes the null hypothesis as, well, a hypothetical.

– “What does this mean? That the validity of calculating a CI relies on the assumption of normality. Importantly, this does not mean that the scores in the sample need to be normally distributed, but that the scores in the population of samples needs to be (approximately) normally distributed.” That’s an assumption of using t-distributions when constructing CIs, but the formula you provided only assumes that the sample means are normally distributed, not that the data points themselves were sampled from a normal distribution.
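The distinction between normality of the data and normality of the sample means can be checked by simulation. A stdlib-Python sketch with made-up parameters (exponential data with mean 1 and SD 1, samples of size 50):

```python
import math
import random
import statistics

random.seed(2)

n = 50
true_mean, true_sd = 1.0, 1.0   # exponential(1): clearly non-normal data
se = true_sd / math.sqrt(n)

# Simulate the "population of samples": many sample means.
means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(10_000)
]

# If the CLT kicks in, about 95% of sample means land within 1.96 SE
# of the true mean, even though the individual data points are
# nowhere near Gaussian.
inside = sum(true_mean - 1.96 * se < m < true_mean + 1.96 * se for m in means)
print(inside / len(means))  # typically close to 0.95
```

The individual observations are strongly right-skewed, yet the distribution of sample means is already close enough to normal at n = 50 for the 1.96-SE band to have roughly its nominal coverage.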

– Even after reading Morey et al.’s (2016) “The fallacy of placing confidence in confidence intervals”, I still didn’t really understand why one can’t say that any given CI contains µ with a 95% probability. This was often explained in terms of “the CI either contains µ or it doesn’t, but we don’t know which”, which I didn’t find too helpful. (I’ll either win the lottery or I won’t, but I can still quantify the probability of each outcome.) I’m still not fully sure how best to explain it, though.
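One way to make the “procedure, not interval” reading concrete is a quick simulation: the 95% describes the long-run behaviour of the interval-constructing recipe, not any single realised interval. A stdlib-Python sketch, with µ, σ, and n made up for illustration:

```python
import math
import random
import statistics

random.seed(3)

mu, sigma, n = 5.0, 2.0, 40
hits = 0
for _ in range(10_000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    lo, hi = m - 1.96 * se, m + 1.96 * se
    hits += lo < mu < hi  # does this particular interval cover mu?

# Roughly 95% of the intervals the procedure produces contain mu;
# any one realised interval simply does or does not.
print(hits / 10_000)
```

Each individual (lo, hi) pair is a fixed pair of numbers once computed; the 95% is a property of the loop, which is exactly what makes the single-interval probability statement slippery.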

– Related to the previous point: Confidence intervals are indeed easily misinterpreted. But I’ll go out on a limb and say that typical misconceptions about confidence intervals are less aggravating than typical misconceptions about p-values. That doesn’t mean one shouldn’t try to understand them or teach them correctly, but it’s some form of consolation. (Incidentally, I think that Bayesian credible intervals can be misinterpreted just as easily, namely by neglecting their dependence on the specification of the model or any bias in the data.)

In any context where there is external information about the effect size (i.e., not all values from -inf to +inf are equally plausible a priori), the frequentist interval and the (numerically equivalent) Bayesian interval under a flat prior will sometimes be plainly wrong as a statement about the value of the population parameter.

See

http://andrewgelman.com/2013/11/21/hidden-dangers-noninformative-priors/
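A textbook instance of an interval that is plainly wrong as a parameter statement is the Wald interval for a proportion near the boundary; with made-up counts (1 success in 20 trials), its lower bound is negative, even though a proportion cannot be:

```python
import math

# Hypothetical counts: 1 success in 20 trials. The standard Wald
# 95% interval is p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n).
successes, n = 1, 20
p_hat = successes / n
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - half_width, p_hat + half_width

print((lo, hi))  # the lower bound is negative: an impossible proportion
```

Any prior that encodes the obvious constraint 0 <= p <= 1 rules such values out, which is the sense in which the flat-prior / frequentist interval can conflict with external information.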

Also, looking back, I think I see the expository problem more clearly. You say “you basically just ran two one-sided hypotheses tests: 4.0 and all smaller values are significantly different from the null hypothesis of X=5, and 6.0 and all larger values are also significant.” Fair enough. These are two end-point-defining tests. But earlier you say that “The CIs around the two means are based on the assumption that each population mean is equal to each sample mean”. There are two tests being described in each sentence, but the nulls and tests of the first sentence are not the nulls and tests of the second sentence.
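The end-point reading of the first sentence can be checked numerically: a null value sitting exactly at a 95% CI endpoint has a two-sided p of about 0.05, and nulls outside the interval have smaller p. A stdlib-Python sketch using a normal approximation; the sample mean and SE are made up, chosen so the interval resembles the quoted (4.0, 6.0) example:

```python
import math

def normal_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Hypothetical summary statistics.
mean, se = 5.0, 0.5
lo, hi = mean - 1.96 * se, mean + 1.96 * se  # 95% CI: about (4.02, 5.98)

def two_sided_p(null_value):
    # z-test of the sample mean against a given null value.
    z = (mean - null_value) / se
    return 2 * (1 - normal_cdf(abs(z)))

# A null right at the interval's endpoint has p ~= 0.05; nulls outside
# the interval have p < 0.05 -- the "two one-sided tests" reading.
print(round(two_sided_p(lo), 3))  # ~0.05
print(two_sided_p(4.0) < 0.05)    # True
```

Note that here each tested null is a candidate value for µ, and the sample mean is only the centre of the interval; nothing is “assumed” equal to it, which is the mismatch with the second quoted sentence.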
