Abstract
<p class="first" id="d68086e69">Ten years ago we issued, in conjunction with the Journal
of Chemical Information and
Modeling, an open prediction challenge to the cheminformatics community. Would they
be able to predict the intrinsic solubilities of 32 druglike compounds using only
a high-precision set of 100 compounds as a training set? The "Solubility Challenge"
was a widely recognized success and spurred many discussions about the prediction
methods and quality of data. Regardless of the obvious limitations of the challenge,
the conclusions were somewhat unexpected. Although contestants employed the entire
spectrum of approaches then available for predicting aqueous solubility and had
an extremely tight data set at their disposal, it was not possible to identify the best methods for
predicting aqueous solubility: a variety of methods and combinations all performed
equally well (or badly). Several authors have since suggested that it is not
the poor quality of the solubility data which limits the accuracy of the predictions,
but the deficient methods used. Now, ten years after the original Solubility Challenge,
we revisit it and challenge the community to a new test with a much larger database
that includes estimates of interlaboratory reproducibility.
</p>