Poor response to treatment is a defining characteristic of reading disorder. In the present systematic review and meta-analysis, we found that the overall average effect size for treatment efficacy was modest, with a mean standardized difference of 0.38. Small true effects, combined with the difficulty of recruiting large samples, seriously challenge researchers planning to test treatment efficacy in dyslexia, and potentially in other learning disorders. Nonetheless, most published studies claim effectiveness, generally on the basis of liberal multiple testing. This inflates the risk that most statistically significant results are associated with overestimated effect sizes. To enhance power, we propose the strategic use of repeated measurements with mixed-effects modelling, a novel approach that allows both individual parameters and population-level effects to be estimated more reliably. Specifically, we suggest assessing a reading outcome not once but three times at pre-treatment and three times at post-treatment. Such a design would require only modest additional effort compared with current practices. On this basis, we performed a priori design analyses via simulation. Results showed that the proposed design may reach adequate power even with small samples of 30–40 participants (i.e., 15–20 participants per group) for a typical effect size of d = 0.38. Nonetheless, more conservative assumptions are warranted for various reasons, including a high risk of publication bias in the extant literature. Our considerations can be extended to intervention studies of other neurodevelopmental disorders.
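
As a rough illustration of the kind of design analysis described above, the Python sketch below simulates the proposed three-pre, three-post repeated-measures design and estimates power for the group-by-phase interaction in a random-intercept mixed model. The variance components (between-subject SD, residual SD), the number of simulation replicates, and the use of statsmodels are our own illustrative assumptions, not the settings used in the study itself.

```python
# Hypothetical power simulation for a 3 pre + 3 post repeated-measures design.
# Assumed (illustrative) parameters: between-subject SD = 0.87, residual
# SD = 0.5 (total SD ~ 1, so the interaction coefficient is on the d scale).
import warnings

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

warnings.filterwarnings("ignore")  # silence MixedLM convergence chatter

rng = np.random.default_rng(2024)


def simulate_power(n_per_group=20, d=0.38, n_pre=3, n_post=3,
                   sd_subject=0.87, sd_resid=0.5, n_sims=200, alpha=0.05):
    """Estimate power to detect the group-by-phase interaction."""
    n_total = 2 * n_per_group
    hits = 0
    for _ in range(n_sims):
        rows = []
        for subj in range(n_total):
            group = int(subj >= n_per_group)   # 0 = control, 1 = treated
            u = rng.normal(0.0, sd_subject)    # subject random intercept
            for phase in (0, 1):               # 0 = pre, 1 = post
                for _ in range(n_pre if phase == 0 else n_post):
                    # Treated subjects gain d (in total-SD units) at post.
                    y = u + d * phase * group + rng.normal(0.0, sd_resid)
                    rows.append((subj, group, phase, y))
        data = pd.DataFrame(rows, columns=["subject", "group", "phase", "y"])
        fit = smf.mixedlm("y ~ phase * group", data,
                          groups=data["subject"]).fit(reml=False)
        if fit.pvalues["phase:group"] < alpha:
            hits += 1
    return hits / n_sims


print(simulate_power(n_per_group=20))  # estimated power under these assumptions
```

The mechanism this sketch makes visible is that averaging over repeated measurements shrinks the residual contribution to each subject's estimated gain, so increasing `n_pre` and `n_post` raises power without recruiting more participants; the exact power obtained depends heavily on the assumed variance components.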