1. INTRODUCTION

We are grateful for the opportunity to discuss this new test, based on marginal screening, of a global null hypothesis in linear models. Marginal screening has become a very popular tool for reducing dimensionality in recent years, and a great deal of work has focused on its variable selection properties (e.g., Fan and Lv 2008; Fan, Samworth, and Wu 2009). Corresponding inference procedures are much less well developed, and one of the interesting contributions of this article is the observation that the limiting distribution (here and throughout, we use the same notation as in the article) of $n^{1/2}(\hat{\theta}_n - \theta_0)$ is discontinuous at $\theta_0 = 0$. Such nonregular limiting distributions are well known to cause difficulties for the bootstrap (e.g., Beran 1997; Samworth 2003). Although in some settings these issues are an artefact of the pointwise asymptotics usually invoked to justify the bootstrap (Samworth 2005), there are other settings where some modification of standard bootstrap procedures is required. Two such examples are bootstrapping Lasso estimators (Chatterjee and Lahiri 2011) and certain classification problems (Laber and Murphy 2011), where thresholded versions of the obvious estimators are bootstrapped, in a fashion analogous to the approach in this article.

2. STANDARDIZED OR UNSTANDARDIZED PREDICTORS?

Theorem 1 of the article reveals that the limiting distribution of $n^{1/2}(\hat{\theta}_n - \theta_0)$ may be quite complicated, even under the global null. To see this, consider a setting with $p = 2$, where $X_1$ and $X_2$ are highly correlated but $\operatorname{var}(X_1) \ll \operatorname{var}(X_2)$. In this case, it is essentially a coin toss as to which predictor has the greater sample correlation with $Y$, but if $\hat{k}_n = 1$ then $|\hat{\theta}_n|$ will tend to be large, while if $\hat{k}_n = 2$ then $|\hat{\theta}_n|$ will tend to be small.
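This selection effect is easy to see numerically. The short simulation below is our own sketch, not code from the article: it assumes $\hat{\theta}_n$ is the marginal least-squares slope of $Y$ on the selected predictor, and uses a design with $\operatorname{var}(X_2) \gg \operatorname{var}(X_1)$ matching the concrete example of this section. Which predictor wins the screening step is close to a coin toss, yet $|\hat{\theta}_n|$ differs by an order of magnitude depending on the winner.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_null(n=100):
    # X2 is nearly a rescaled copy of X1 but with far larger variance;
    # under the global null, Y is independent of both predictors.
    x1 = rng.standard_normal(n)
    x2 = 20.0 * x1 + rng.standard_normal(n)
    y = rng.standard_normal(n)
    return np.column_stack([x1, x2]), y

def marginal_screen(X, y):
    # k_hat: index of the predictor most correlated (in absolute value)
    # with y; theta_hat: its marginal least-squares slope (our assumed
    # form of the unstandardized estimator).
    corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    k = int(np.argmax(np.abs(corrs)))
    slope = np.cov(X[:, k], y)[0, 1] / np.var(X[:, k], ddof=1)
    return k, slope

results = [marginal_screen(*draw_null()) for _ in range(2000)]
ks = np.array([k for k, _ in results])
slopes = np.abs(np.array([s for _, s in results]))

# Which predictor wins is essentially a coin toss ...
print("fraction selecting X2:", ks.mean())
# ... but |theta_hat| is roughly 20 times larger when X1 wins, because the
# slope equals corr * sd(Y) / sd(X_k) and sd(X2) is about 20 sd(X1).
print("median |theta_hat| when X1 selected:", np.median(slopes[ks == 0]))
print("median |theta_hat| when X2 selected:", np.median(slopes[ks == 1]))
```

The factor-of-20 gap between the two conditional medians is exactly the bimodality described next.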
The unfortunate consequence for the power of the procedure is that even for large sample sizes, we will only have a reasonable chance of rejecting the global null if we select $X_1$ (in particular, the power will be not much greater than 50% even when the signal is relatively large). For instance, consider the situation where $n = 100$, $p = 2$, $X_1 \sim N(0, 1)$, $X_2 = 20X_1 + \eta$, where $\eta \sim N(0, 1)$ is independent of $X_1$, and
$$Y = X_1 + \varepsilon, \qquad (2.1)$$
where $\varepsilon \sim N(0, 1)$ is independent of $X_1$ (and $\eta$). Instead of using the adaptive resampling test (ART) to obtain the critical value for the test of size $\alpha = 0.05$, we simply simulated from the null model where $(X_1, X_2)$ are as above, but $Y = \varepsilon \sim N(0, 1)$. A density plot of the values of $\hat{\theta}_n$ computed over 10,000 repetitions is shown in the top-left panel of Figure 1; note that the spike around 0 is due mainly to the 5017 occasions on which $X_2$ happened to have the higher absolute correlation with $Y$ (i.e., $\hat{k}_n = 2$). The critical value for the test was taken to be the $100(1 - \alpha)$th percentile of the realizations of $|\hat{\theta}_n|$, namely 0.171. Under the alternative specified by (2.1), $\hat{\theta}_n$ has a highly bimodal distribution, as illustrated in the bottom-left panel of Figure 1. The only occasions on which we were able to reject the null were those when $X_1$ had the higher absolute correlation with $Y$, yielding a power of 59.8%.

Fortunately, it is straightforward to construct a slightly modified test statistic that can yield great improvements. Indeed, it is standard practice in variable selection contexts to standardize each predictor $X_k$ so that $\hat{E}(X_k) = 0$ and $\widehat{\operatorname{var}}(X_k) = n$, and likewise to standardize the response so that $\hat{E}(Y) = 0$ and $\widehat{\operatorname{var}}(Y) = n$. This amounts to using the test statistic $|\tilde{\theta}_n|$, where $\tilde{\theta}_n = \widehat{\operatorname{Corr}}(X_{\hat{k}_n}, Y)$. Note that the definition of $\tilde{\theta}_n$ does not depend on whether the predictors and the response have been standardized or not, and that we have the simple expression
$$|\tilde{\theta}_n| = \max_{j=1,\ldots,p} |\widehat{\operatorname{Corr}}(X_j, Y)|.$$
For the example above, the top-right panel of Figure 1 gives a density plot of $\tilde{\theta}_n$ under the null; the critical value for our modified test was 0.198. Under the alternative, $\tilde{\theta}_n$ tends to be inflated regardless of whether $\hat{k}_n = 1$ or $\hat{k}_n = 2$; in fact, we obtain an empirical power of 100%.

Figure 1. Top row: density plots of $\hat{\theta}_n$ (left) and $\tilde{\theta}_n$ (right) under the global null hypothesis for the example in Section 2. Bottom row: corresponding density plots of $\hat{\theta}_n$ (left) and $\tilde{\theta}_n$ (right) under the alternative specified in (2.1).

We emphasize that the problems described in this section are not observed in the simulation study of the article, because there all of the predictors have equal variance. In the next section, we take the predictors and response to be standardized as above, and consider alternative approaches to calibrating the test statistic $n^{1/2}|\tilde{\theta}_n|$, as well as another test statistic proposed by Goeman, van de Geer, and van Houwelingen (2006).

3. ALTERNATIVE APPROACHES

Although the nonregularities in the problem considered here make the construction of a confidence interval for $\theta_0$ a challenging task, the particularly simple form of the global null hypothesis makes the testing problem amenable to several other approaches. Under the global null, $X$ and $Y$ are independent, so by the central limit theorem,
$$n^{1/2} \begin{pmatrix} \widehat{\operatorname{Corr}}(X_1, Y) \\ \vdots \\ \widehat{\operatorname{Corr}}(X_p, Y) \end{pmatrix} \xrightarrow{d} N_p(0, \Theta)$$
as $n \to \infty$, where $\Theta_{jk} = \operatorname{Corr}(X_j, X_k)$. Then, by the continuous mapping theorem,
$$n^{1/2}|\tilde{\theta}_n| \xrightarrow{d} \max_{j=1,\ldots,p} |Z_j|,$$
where $(Z_1, \ldots, Z_p)^T \sim N_p(0, \Theta)$. Since the distribution on the right does not depend on the distribution of $Y$, we can calibrate the test statistic by simulating $n^{1/2}|\tilde{\theta}_n|$ with the distribution of $Y$ taken to be, for example, (a) the empirical measure of the data $Y_1, \ldots, Y_n$, or (b) $N(0, 1)$. Figures 2 and 3 display the results of using these approaches in the numerical experiments of Section 4.1 of the article.
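As a concrete illustration, the sketch below (ours; the function names are illustrative) calibrates $n^{1/2}|\tilde{\theta}_n|$ by methods (a) and (b), and also by the permutation scheme discussed next; the three differ only in how the surrogate response is drawn, with the design matrix held fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

def stat(X, y):
    # n^{1/2} |tilde theta_n| = n^{1/2} max_j |corr(X_j, y)|
    n = len(y)
    return np.sqrt(n) * max(
        abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])
    )

def critical_value(X, y, method="a", alpha=0.05, B=999):
    # Simulate the null distribution of the statistic, drawing the
    # surrogate response from:
    #   "a"    -- the empirical distribution of the observed responses,
    #   "b"    -- N(0, 1),
    #   "perm" -- a uniformly random permutation of the observed responses.
    draws = np.empty(B)
    for b in range(B):
        if method == "a":
            ystar = rng.choice(y, size=len(y))
        elif method == "b":
            ystar = rng.standard_normal(len(y))
        else:
            ystar = rng.permutation(y)
        draws[b] = stat(X, ystar)
    return np.quantile(draws, 1 - alpha)

# Example: the unequal-variance design of Section 2, with a genuine signal.
n = 100
x1 = rng.standard_normal(n)
X = np.column_stack([x1, 20 * x1 + rng.standard_normal(n)])
y = x1 + rng.standard_normal(n)

T = stat(X, y)
rejections = {m: T > critical_value(X, y, method=m) for m in ("a", "b", "perm")}
print(rejections)
```

Because the statistic is scale-free in the response, all three calibrations put the critical value on the same scale, and a signal of the strength in (2.1) is rejected comfortably by each.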
Method (a) appears to yield a test with size not exceeding its nominal level and with power similar to that of the ART procedure. When the error distribution is normal, the size of the test from method (b) is exactly equal to the nominal level, up to Monte Carlo error; again, the power is similar to that of ART.

Figure 2. The same graphs as in Figure 1 ($\rho = 0.5$) of the original article, but for globaltest (black circles), method (a) (green crosses), method (b) (red plus signs), and the permutation test (blue triangles). Note that model (i) is the null model. (For interpretation of the references to color in this caption and that of Figure 3, the reader is referred to the web version of the article.)

Figure 3. The same graphs as in Figure 2 ($\rho = 0.8$) of the original article, but for globaltest (black circles), method (a) (green crosses), method (b) (red plus signs), and the permutation test (blue triangles).

An alternative approach to calibration is via permutations. Making the dependence of $\tilde{\theta}_n$ on $Y_1, \ldots, Y_n$ explicit, we note that under the global null the law of $\tilde{\theta}_n(Y_1, \ldots, Y_n)$ is the same as that of $\tilde{\theta}_n(Y_{\pi(1)}, \ldots, Y_{\pi(n)})$ for any permutation $\pi$ of $\{1, \ldots, n\}$. The permutation test has the advantage over (a) and (b) of having size not exceeding the nominal level regardless of the distribution of $Y$. Its power performance also seems close to that of ART.

Although it may seem natural to base test statistics on $\tilde{\theta}_n$, there are other possibilities. For example, Goeman, van de Geer, and van Houwelingen (2006) constructed a locally most powerful test for high-dimensional alternatives under the global null. We compare the power of their globaltest procedure with the approaches discussed above in Figures 2 and 3. Overall, its performance is similar to that of ART, though in certain settings it seems to have a slight advantage and in others a slight disadvantage.

4. EXTENSIONS

In our view, the main attraction of ART is that it can be used to construct confidence intervals for $\theta_n$.
It would be interesting to study empirically the coverage properties and lengths of these intervals. Another interesting related question would be to provide some form of uncertainty quantification for the variable having the greatest absolute correlation with the response. The ideas of stability selection (Meinshausen and Bühlmann 2010; Shah and Samworth 2013) provide natural quantifications of variable importance through empirical selection probabilities over subsets of the data. However, it is not immediately clear how to use these to provide, say, a (nontrivial) confidence set of variable indices that, with probability at least $1 - \alpha$, contains all indices of variables having the largest absolute correlation with the response (in particular, this would be the full set $\{1, \ldots, p\}$ of indices under the global null).

Although understanding marginal relationships between variables and the response is useful in certain contexts, in other situations the coefficients from multivariate regression are of more interest. It would be interesting to see whether the ART methodology can be extended to provide confidence intervals for the largest regression coefficients in absolute value.