Yes, my thought exactly!

I already posted a similar comment on Chris Chambers’s blog a few days ago, tying it to his criticism concerning “conceptual replications” and “Registered Reports”. It hasn’t been posted there (yet), though:

http://neurochambers.blogspot.nl/2012/03/you-cant-replicate-concept.html

Thank you for letting me share the dream, thank you for all your efforts in (improving) psychological science, and all the best!

I am also a big fan of doing more extensive testing of a finding before publishing. That would result in fewer papers, with each paper carrying more information/evidence. Of course, you could also extensively test a finding across multiple publications, which has the advantage of a more rapid research cycle and earlier knowledge dissemination, but the same could be achieved through pre-prints that you update along the way.

I have had a dream with several things in common with your dream. I hope it’s okay for me to share it, and I hope it makes sense; I am not smart enough to decide whether it does, so I’ll let the reader decide.

Thank you for all your efforts in trying to help improve psychological science!

1) Small groups of, say, 5 researchers, all working on the same theory/topic/construct, each perform a pilot/exploratory study and at some point signal to themselves and the other members of the group that their work is ready to be rigorously tested.

2) These 5 studies will then all be pre-registered and prospectively replicated in round-robin fashion.

3) You would thereby end up with 5 studies (which can perhaps often be seen as “conceptual” replications of one another, depending on how broadly you define “conceptual” replication), each of which will have been “directly” replicated 4 times (plus the original researcher’s version, for a total of 5 runs).

4) All results, no matter the outcome, will be published in a single paper: for instance, “Ego-depletion: Round 1”. This paper then includes 5 different “conceptual” studies (probably varying in how “conceptual” they are; see, e.g., LeBel et al.’s “falsifiability is not optional” paper), all of which will have been “directly” replicated.

5) All members of the team of 5 researchers would then come up with their own follow-up study, possibly (partly) related to the results of the “first round”. The process repeats itself as long as deemed fruitful.
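The round-robin structure of steps 1–3 can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation; the group size and researcher labels are placeholders.

```python
# Round-robin replication: each of 5 researchers proposes one study,
# and every study is replicated by the 4 other group members, giving
# 5 runs per study in total (1 original + 4 direct replications).
def round_robin(members):
    """Map each proposer to the members who will replicate their study."""
    return {m: [r for r in members if r != m] for m in members}

group = ["R1", "R2", "R3", "R4", "R5"]  # placeholder researcher labels
for proposer, replicators in round_robin(group).items():
    print(f"Study by {proposer}: replicated by {', '.join(replicators)}")
```

Each proposer appears once as an original author and four times as a replicator, so the workload is symmetric across the group.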

Additional thoughts related to this format which might be interesting regarding recent discussions and events in psychological science:

1) Possibly think how this format could influence the discussions about “creativity”, “science being messy” and the acceptance of “null-results”.

Researchers using this format could each come up with their own ideas for each “round” (creativity), there would be a clear demarcation between pilot/exploratory studies and confirmatory testing (“science is messy”), and the format could also contribute to publishing possible null-results and actually using them in inferences and conclusions (acceptance of “null-results”).

2) Possibly think about how this format could influence the discussion about there being too much information (in this case, Simonsohn’s “let’s publish fewer papers”).

Let’s say it’s reasonable for a researcher to run 5 studies per year (or per 2 years?) given time and resources (50–100 participants per study per individual researcher). A group using this format could then publish a single paper every 1 or 2 years (“let’s publish fewer papers”), but that paper would be highly informative: each study would be relatively highly powered (5 × 50–100 = 250–500 participants per study), and the paper would contain both “conceptual” and “direct” replications.
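To make the power arithmetic concrete, here is a rough sketch, assuming a two-sided two-sample test with a medium effect (d = 0.5); the effect size and alpha are illustrative assumptions, not taken from the comment.

```python
import math
from statistics import NormalDist

def power_two_sample(n_per_group, d, alpha=0.05):
    """Approximate power of a two-sided two-sample t-test
    (normal approximation), for standardized effect size d."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * math.sqrt(n_per_group / 2) - z)

# A single 50-participant-per-group study vs. the pooled 5-run version:
print(f"n=50  per group: power ≈ {power_two_sample(50, 0.5):.2f}")
print(f"n=250 per group: power ≈ {power_two_sample(250, 0.5):.2f}")
```

Pooling the five runs takes power from roughly adequate to near-certain for a medium effect, which is what makes the single combined paper so informative.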

3) Possibly think about how this format could influence the discussion about “expertise” and about “reverse p-hacking” (deliberately wanting to find a “null-result” when replicating).

Perhaps every member of these small groups would be inclined to a) “put forward” the “best” experiment they want rigorously tested via this format, and b) execute the replication part of the format (in this case, the replications of the other members’ studies) with great attention and effort, because they would be incentivized to do so: “optimally” gathered information from this format (e.g., both significant and non-significant findings) would directly help them come up with study proposals for the next round (e.g., see LeBel et al.’s “falsifiability is not optional” paper).

4) Possibly think about how this format could influence the discussion about how “a single study almost never provides definitive evidence for or against an effect”, and the problems of interpreting “single p-values”. Also see Fisher, 1926, p. 83: “A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.”
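Fisher’s criterion maps neatly onto the five-run structure: a straightforward binomial calculation (a sketch; the 80% power figure is an illustrative assumption) shows how strongly “rarely fails” separates a real effect from a null one.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k of n independent studies reach significance),
    given per-study probability p of a significant result."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# With a real effect studied at 80% power, most of the 5 runs succeed:
print(f"P(>=4 of 5 significant | power .80) = {prob_at_least(4, 5, 0.80):.3f}")
# Under the null at alpha = .05, such consistency is essentially impossible:
print(f"P(>=4 of 5 significant | null)      = {prob_at_least(4, 5, 0.05):.6f}")
```

A pattern of 4 or 5 significant results out of 5 pre-registered runs is therefore close to the “rarely fails” standard Fisher had in mind, in a way no single p-value can be.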

5) Possibly think about how this format could influence the discussion about the problematic grant culture in academia. Small groups of collaborating researchers could write grant proposals together, and funding agencies would give their money to multiple researchers who each contribute their own ideas. Both things would make psychological science less competitive and more collaborative.

6) The overall process of this format would entail a clear distinction between post-hoc theorizing and theory testing (cf. Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012), “rounds” of theory building, testing, and reformulation (cf. Wallander, 1992), and could be viewed as a systematic manner of data collection (cf. Chow, 2002).

7) Finally, it might also be interesting to note that this format could yield meta-scientific information as well. For instance, perhaps the findings of a later “round” will turn out to be more replicable due to more accurate knowledge about a specific theory or phenomenon. Or perhaps it will show that the typically devastating course of research into psychological phenomena and theories described by Meehl (1978) gets cut off sooner, or follows a different path.

“Importantly, the specific values of a calculated CI should not be directly interpreted.”

Strictly speaking, that’s not so. There is a way to interpret realized (calculated) CIs. The concept is “bet-proofness”. I learned about it from a recent paper by Mueller-Norets (Econometrica 2016).

Mueller-Norets (2016, published version, p. 2185):

“Following Buehler (1959) and Robinson (1977), we consider a formalization of “reasonableness” of a confidence set by a betting scheme: Suppose an inspector does not know the true value of θ either, but sees the data and the confidence set of level 1−α. For any realization, the inspector can choose to object to the confidence set by claiming that she does not believe that the true value of θ is contained in the set. Suppose a correct objection yields her a payoff of unity, while she loses α/(1−α) for a mistaken objection, so that the odds correspond to the level of the confidence interval. Is it possible for the inspector to be right on average with her objections no matter what the true parameter is, that is, can she generate positive expected payoffs uniformly over the parameter space? … The possibility of uniformly positive expected winnings may thus usefully serve as a formal indicator for the “reasonableness” of confidence sets.”

“The analysis of set estimators via betting schemes, and the closely related notion of a relevant or recognizable subset, goes back to Fisher (1956), Buehler (1959), Wallace (1959), Cornfield (1969), Pierce (1973), and Robinson (1977). The main result of this literature is that a set is “reasonable” or bet-proof (uniformly positive expected winnings are impossible) if and only if it is a superset of a Bayesian credible set with respect to some prior. In the standard problem of inference about an unrestricted mean of a normal variate with known variance, which arises as the limiting problem in well behaved parametric models, the usual [realized confidence] interval can hence be shown to be bet-proof.”
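The benchmark in that betting scheme can be checked numerically. For the usual normal-mean interval with known variance, an inspector who objects to every realized CI wins with probability α and loses α/(1−α) otherwise, so her expected payoff is exactly zero for every θ. Here is a Monte Carlo sketch of that break-even case (the model and the always-object strategy are illustrative, not taken from the paper):

```python
import random
from statistics import NormalDist

def expected_payoff(theta, alpha=0.05, n_sims=200_000, seed=1):
    """Expected payoff of an inspector who objects to EVERY realized
    (1 - alpha) CI for the mean of x ~ N(theta, 1): she wins 1 when the
    interval misses theta, and loses alpha/(1 - alpha) when it covers."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    loss = alpha / (1 - alpha)
    total = 0.0
    for _ in range(n_sims):
        x = rng.gauss(theta, 1)
        covers = (x - z) <= theta <= (x + z)
        total += -loss if covers else 1.0
    return total / n_sims

# Break-even at every theta; bet-proofness is the stronger claim that
# no data-dependent objection rule achieves uniformly positive payoff.
for theta in (-3.0, 0.0, 2.5):
    print(f"theta = {theta:+.1f}: expected payoff ≈ {expected_payoff(theta):+.4f}")
```

The Mueller–Norets result is about the harder question: whether some clever, data-dependent objection rule can beat this break-even point uniformly over θ; for the standard interval in this problem, it cannot.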

We had a good discussion about it over at Andrew Gelman’s blog some months ago.

http://andrewgelman.com/2017/03/04/interpret-confidence-intervals/

Some good contributions there, esp. by Carlos Ungil and Daniel Lakeland.

Full reference:

Credibility of Confidence Sets in Nonstandard Econometric Problems

Ulrich K. Mueller and Andriy Norets (2016)

https://www.princeton.edu/~umueller/cred.pdf

http://onlinelibrary.wiley.com/doi/10.3982/ECTA14023/abstract

The “reply” by the “redefine statistical significance” folks (or at least one of them)…

For NEW studies, indeed, lowering alpha to .005 is a bad idea, because scientists will adapt their research strategy: by including more variables, more analyses, etc., it will be just as easy to obtain p &lt; .005 as it now is to obtain p &lt; .05.
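The adaptation argument is the standard multiple-comparisons arithmetic: with k independent shots at significance under the null, the chance of at least one p below α is 1 − (1 − α)^k, so roughly ten extra variables or analyses at α = .005 reproduce today’s single-test false-positive rate at α = .05. A minimal sketch (the number of tests is illustrative):

```python
def prob_any_significant(k, alpha):
    """P(at least one of k independent null tests gives p < alpha)."""
    return 1 - (1 - alpha) ** k

print(f"1 test   at .05:  {prob_any_significant(1, 0.05):.3f}")   # 0.050
print(f"10 tests at .005: {prob_any_significant(10, 0.005):.3f}")  # ~0.049
```

In other words, a modest expansion of the garden of forking paths fully absorbs the stricter threshold, which is the commenter’s point about adapted research strategies.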
