Two experiments for evaluating the agreement between bibliometrics and informed peer review - based on two large samples of journal articles - were performed by the Italian governmental agency for research evaluation. They were presented as successful and as warranting the combined use of peer review and bibliometrics in research assessment exercises. However, the results of both experiments were intended to rest on a stratified random sampling of articles with proportional allocation, whereas only subsets of the original samples within the strata were actually analysed, owing to the presence of missing articles. This kind of selection can bias the results of the experiments, since different proportions of articles may be missing in different strata. In order to assess the 'representativeness' of the sampling, we develop a novel statistical test for assessing the homogeneity of missing proportions between strata and we apply it to the data of both experiments. The outcomes of the testing procedure show that the null hypothesis of missing-proportion homogeneity should be rejected for both experiments. As a consequence, the resulting samples cannot be considered 'representative' of the population of articles submitted to the research assessments. It is therefore impossible to exclude that the combined use of peer review and bibliometrics introduced uncontrollable major biases in the final results of the Italian research assessment exercises. Moreover, the two experiments should not be considered valid pieces of knowledge to be used in the ongoing search for the Holy Grail of a definitive agreement between peer review and bibliometrics.
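To make the idea of testing missing-proportion homogeneity across strata concrete, the following is a minimal sketch using a standard chi-square test of homogeneity on a 2-by-k table (missing vs. non-missing counts per stratum). The counts are hypothetical, and the paper's novel test may differ from this textbook procedure:

```python
# Standard chi-square test of homogeneity for H0: the proportion of
# missing articles is the same in every stratum.
# All counts below are hypothetical illustrations, not the experiments' data.

def chi2_homogeneity(missing, totals):
    """Chi-square statistic for equality of missing proportions.

    missing[i] -- number of missing articles in stratum i
    totals[i]  -- number of sampled articles in stratum i
    """
    pooled = sum(missing) / sum(totals)        # pooled missing proportion under H0
    stat = 0.0
    for m, n in zip(missing, totals):
        exp_missing = n * pooled               # expected missing count under H0
        exp_present = n * (1 - pooled)         # expected non-missing count under H0
        stat += (m - exp_missing) ** 2 / exp_missing
        stat += ((n - m) - exp_present) ** 2 / exp_present
    return stat                                # compare with chi2 quantile, df = k - 1

# Hypothetical example with three strata of 100 sampled articles each:
missing = [30, 10, 5]
totals = [100, 100, 100]
stat = chi2_homogeneity(missing, totals)
# df = 2; the 0.05 critical value is about 5.99
print(stat > 5.99)  # True -> reject homogeneity of missing proportions
```

A large statistic indicates that articles went missing at markedly different rates in different strata, which is exactly the situation that undermines the 'representativeness' of the analysed subsamples.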