Statistics is (almost) useless

As scientists, what are we doing? Someone once told me that when you study education (like I do) your job is to separate that which is obvious and false from that which is obvious and true. In other words, we often think we know something, but with a scientific methodology we can figure out whether we are justified in thinking so.

Evidence is not about stating what is true, but about making accurate and justified claims. The key point here is justification: we may have had an accurate idea all along, but we can only truly build on ideas that are justified. Whenever I talk about ‘evidence’ or about ‘proving’ something, I am not claiming to know the truth, but rather that I am justified in making a claim with a certain level of confidence (which might be my main reason for preferring Bayesian uncertainty modeling over binary frequentist stats).

When it comes to evidence, using statistics is almost entirely useless. In the entire process from making observations to making accurate and justified claims, the application of statistics is one of the least important and least influential steps. I exaggerate a little, of course, but there is an important point here.

To make accurate and justified claims on the basis of a study, there is a wide range of criteria that need to be fulfilled, and the use of statistics certainly has its role here. However, it is one of the last links in a very long chain, and its value depends very much on the strength of the prior links. If a medical trial was not properly controlled, it does not matter if the experimental group outperformed the pseudo-control group, nor should we be moved by qualifiers such as statistical significance. Likewise, a generalization made with statistics is only as valid as the sampling plan is appropriate.
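
The point about sampling can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population, the two groups, and every number in it are hypothetical and chosen purely for illustration. It compares a simple random sample with a convenience sample drawn from just one subgroup, and computes the same approximate 95% confidence interval for both.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: two equally sized groups of students whose
# test scores differ by 20 points on average (all numbers are invented).
group_a = [random.gauss(50, 10) for _ in range(5000)]
group_b = [random.gauss(70, 10) for _ in range(5000)]
population = group_a + group_b
true_mean = statistics.mean(population)


def mean_with_interval(sample):
    """Return the sample mean and an approximate 95% confidence interval."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return m, (m - 1.96 * se, m + 1.96 * se)


# Sound sampling plan: a simple random sample from the whole population.
random_mean, random_ci = mean_with_interval(random.sample(population, 500))

# Flawed sampling plan: a convenience sample drawn from group B only.
biased_mean, biased_ci = mean_with_interval(random.sample(group_b, 500))

print(f"true mean:          {true_mean:.1f}")
print(f"random sample:      {random_mean:.1f}, "
      f"95% CI {random_ci[0]:.1f} to {random_ci[1]:.1f}")
print(f"convenience sample: {biased_mean:.1f}, "
      f"95% CI {biased_ci[0]:.1f} to {biased_ci[1]:.1f}")
```

In this setup the convenience sample produces the narrower interval, because group B is more homogeneous than the population as a whole, yet that interval sits far from the true mean. The statistics are computed correctly in both cases; only the sampling plan differs.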

Realizing this has (or should have) many consequences for how we set up our studies and read articles. I’ve always found the Method sections to be the most important, not because there might be numbers there, but because the validity of a study depends on how it has been executed. Similarly, knowing that a study was pre-registered (and maybe even peer-reviewed!) before any data was collected or analyzed dramatically increases the evidential value of a study. This is why I am saddened by the state of research on education, where there appear to be only 4 pre-registered studies.

Getting statistics right is important, but this only becomes relevant with a study design that can consistently give us reliable information. Small studies, improper sampling, or lack of multilevel methods can corrupt your research in a way that cannot be salvaged by statistics.

Exposing bad applications of statistics is also important, but what is paramount is to reduce the corrupting influence of publication bias and HARKing (hypothesizing after the results are known) on the veracity of the scientific literature.
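
How publication bias distorts a literature can also be sketched with a simulation. Everything below is an assumption made for illustration: the true effect of 0.1 standard deviations, the 20 participants per group, the 2000 simulated studies, and the crude rule that only positive results with z > 1.96 get published (a normal approximation rather than a proper t-test).

```python
import random
import statistics

random.seed(2)  # fixed seed so the illustration is reproducible

TRUE_EFFECT = 0.1   # assumed small true difference between groups, in SD units
N_PER_GROUP = 20    # assumed (small) number of participants per group
N_STUDIES = 2000    # assumed number of studies run on the same question


def run_study():
    """Simulate one small two-group study; return (effect estimate, z statistic)."""
    control = [random.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    treated = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_GROUP)]
    diff = statistics.mean(treated) - statistics.mean(control)
    pooled = statistics.variance(control) + statistics.variance(treated)
    se = (pooled / N_PER_GROUP) ** 0.5
    return diff, diff / se


studies = [run_study() for _ in range(N_STUDIES)]
all_effects = [diff for diff, _ in studies]

# Crude publication filter: only positive, "significant" results appear in print.
published_effects = [diff for diff, z in studies if z > 1.96]

mean_all = statistics.mean(all_effects)
mean_published = statistics.mean(published_effects)

print(f"true effect:                  {TRUE_EFFECT}")
print(f"mean effect, all studies:     {mean_all:.2f}")
print(f"mean effect, published only:  {mean_published:.2f}")
print(f"studies published:            {len(published_effects)} of {N_STUDIES}")
```

Each individual study here is analyzed without error, and the average over all studies lands close to the true effect. The average over the published subset is several times larger, because with samples this small only overestimates can clear the significance threshold. No amount of careful statistics within a single paper corrects for that filter.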
