The authors aim to replicate the findings of a previous study (Zihl et al., PLoS One. 2014;9:e84590) and to demonstrate that the Digit Symbol Substitution Test (DSST) is a valid and reliable measure of cognitive reserve (CR) when used in a testing-the-limits paradigm.
Their main hypotheses were that (1) participants with higher processing resources and higher executive capacities will show higher CR, and (2) CR is positively correlated with cognitive lifestyles, mood, well-being, and sleep quality.
To test these hypotheses, 136 healthy elderly participants (age range: 60–75 years) with a high level of education (≥13 years of schooling) performed 10 consecutive DSST trials. In addition, the authors assessed several cognitive and non-cognitive variables that might contribute to, or interact with, CR.
In my view, this is an interesting study. However, several methodological problems limit its scientific merit.
A major problem of the present study is that the performance of the elderly participants was not compared to that of a younger group. It is therefore not possible to replicate the previous findings or to clarify whether age is an important moderating factor. Nor is it possible to clarify whether the performance gain in the DSST is special or extraordinary: it is well known that repeated practice leads to an improvement in performance, regardless of whether elderly participants or brain-damaged patients are investigated.
Another problem with the present manuscript is that the statistical analyses were unambitious. As Satz et al. (2010) indicated, more complex statistical methods (e.g., structural equation modeling, SEM) are needed to achieve “greater clarity to the conceptualization and study of the reserve construct”. Using SEM or regression analyses, the moderating effects of the different variables (e.g., age, cognitive lifestyle, mood, well-being, and sleep quality) could be analyzed in a more sophisticated way.
A further problem is that no a priori model of the moderating or mediating variables is presented and tested. For example, sleep quality and reading performance are not included in the two models postulated by Satz et al. (2010).
For these reasons, the scientific merit of the present study is limited, and it is difficult for me to see how the findings “... support and extend the model proposed by Satz et al. ...”.
Here are further points that need clarification:
• The authors should indicate that the participants in this study did not take part in the first study (Zihl et al., PLoS One. 2014;9:e84590).
• Performance gain was calculated as the difference between the number of correctly assigned symbols in the first and in the best of the 10 consecutive trials. I wonder whether this calculation really indicates the pure performance gain. What if the best trial is just an outlier? Couldn't a mean score of the last two or three trials be a more stable and reliable indicator of pure performance gain?
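To illustrate the concern, here is a minimal sketch with entirely hypothetical trial scores (not data from the manuscript), contrasting the authors' best-trial definition of gain with a mean of the last three trials:

```python
# Hypothetical DSST scores (correctly assigned symbols) across 10 trials.
# Trial 7 is a single peak, illustrating how one outlier can inflate the gain.
scores = [38, 41, 44, 46, 45, 47, 52, 48, 47, 46]

gain_best = max(scores) - scores[0]            # best-trial definition (as in the manuscript)
gain_last3 = sum(scores[-3:]) / 3 - scores[0]  # mean of the last three trials instead

print(gain_best)   # 14 — driven by the single peak at trial 7
print(gain_last3)  # 9.0 — arguably a more stable estimate of asymptotic gain
```

In this constructed example the two definitions diverge noticeably, which is the point of the question above.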
• Please indicate when the more complex CR index is used and which score is presented in Table 1.
• In addition to the effect sizes, the respective confidence intervals should be reported.
• The Bonferroni correction is not clear to me. How many contrasts were calculated, and what was the alpha level actually used? Please keep in mind that the Bonferroni correction is very conservative and leads to an increase in beta error (i.e., a loss of statistical power).
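For clarity, the correction I have in mind works as follows (the number of contrasts and the p-values below are hypothetical):

```python
# Bonferroni correction: with m contrasts, each test is evaluated
# at the adjusted level alpha / m.
alpha = 0.05
m = 10                      # hypothetical number of contrasts
alpha_adjusted = alpha / m  # 0.005

p_values = [0.001, 0.004, 0.012, 0.030]          # hypothetical p-values
significant = [p < alpha_adjusted for p in p_values]
print(alpha_adjusted)  # 0.005
print(significant)     # [True, True, False, False]
```

As the example shows, results that would be significant at alpha = .05 can fail the adjusted threshold, which is why the exact number of contrasts matters for interpreting the findings.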
• Please indicate the number of participants in the extreme group comparisons.
• Table 1: The scoring of the Stroop Test is not clear to me. If I understand correctly, participants needed 100.45 seconds to read the word plate and 41.22 seconds for the interference plate. Shouldn't this be the other way round? Also, the reported response times are very precise; how were they measured?
• Obvious limitations of the study (e.g., the missing control group) are not mentioned in the Discussion section of the manuscript.