Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Background

To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally-derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. Therefore, the efficient use of PubChem3D resources requires an understanding of the statistical and biological meaning of computed 3-D molecular similarity scores between molecules.

Results

The present study investigated effects of employing multiple conformers per compound upon the 3-D similarity scores between ten thousand randomly selected biologically-tested compounds (10-K set) and between non-inactive compounds in a given biological assay (156-K set). When the “best-conformer-pair” approach, in which a 3-D similarity score between two compounds is represented by the greatest similarity score among all possible conformer pairs arising from a compound pair, was employed with ten diverse conformers per compound, the average 3-D similarity scores for the 10-K set increased by 0.11, 0.09, 0.15, 0.16, 0.07, and 0.18 for ST ^ST-opt, CT ^ST-opt, ComboT ^ST-opt, ST ^CT-opt, CT ^CT-opt, and ComboT ^CT-opt, respectively, relative to the corresponding averages computed using a single conformer per compound. Interestingly, the best-conformer-pair approach also increased the average 3-D similarity scores for the non-inactive–non-inactive (NN) pairs for a given assay, by comparable amounts to those for the random compound pairs, although some assays showed a pronounced increase in the per-assay NN-pair 3-D similarity scores, compared to the average increase for the random compound pairs.

Conclusion

These results suggest that the use of ten diverse conformers per compound in PubChem bioassay data analysis using 3-D molecular similarity is not expected to increase the separation of non-inactive from random and inactive spaces “on average”, although some assays show a noticeable separation between the non-inactive and random spaces when multiple conformers are used for each compound. The present study is a critical next step to understand effects of conformational diversity of the molecules upon the 3-D molecular similarity and its application to biological activity data analysis in PubChem. The results of this study may be helpful to build search and analysis tools that exploit 3-D molecular similarity between compounds archived in PubChem and other molecular libraries in a more efficient way.

Related collections

Most cited references 43

Record: found
Abstract: found
Article: not found

Conformer Generation with OMEGA: Algorithm and Validation Using High Quality Structures from the Protein Databank and Cambridge Structural Database

Paul Hawkins, A. Skillman, Gregory Warren … (2010)

Here, we present the algorithm and validation for OMEGA, a systematic, knowledge-based conformer generator. The algorithm consists of three phases: assembly of an initial 3D structure from a library of fragments; exhaustive enumeration of all rotatable torsions using values drawn from a knowledge-based list of angles, thereby generating a large set of conformations; and sampling of this set by geometric and energy criteria. Validation of conformer generators like OMEGA has often been undertaken by comparing computed conformer sets to experimental molecular conformations from crystallography, usually from the Protein Databank (PDB). Such an approach is fraught with difficulty due to the systematic problems with small molecule structures in the PDB. Methods are presented to identify a diverse set of small molecule structures from cocomplexes in the PDB that has maximal reliability. A challenging set of 197 high quality, carefully selected ligand structures from well-solved models was obtained using these methods. This set will provide a sound basis for comparison and validation of conformer generators in the future. Validation results from this set are compared to the results using structures of a set of druglike molecules extracted from the Cambridge Structural Database (CSD). OMEGA is found to perform very well in reproducing the crystallographic conformations from both these data sets using two complementary metrics of success.

0 comments Cited 401 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Is Open Access

Database resources of the National Center for Biotechnology Information

Eric Sayers, Tanya Barrett, Dennis A Benson … (2011)

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

0 comments Cited 236 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Is Open Access

PubChem's BioAssay Database

Yanli Wang, Jewen Xiao, Tugba Suzek … (2011)

PubChem (http://pubchem.ncbi.nlm.nih.gov) is a public repository for biological activity data of small molecules and RNAi reagents. The mission of PubChem is to deliver free and easy access to all deposited data, and to provide intuitive data analysis tools. The PubChem BioAssay database currently contains 500 000 descriptions of assay protocols, covering 5000 protein targets, 30 000 gene targets and providing over 130 million bioactivity outcomes. PubChem's bioassay data are integrated into the NCBI Entrez information retrieval system, thus making PubChem data searchable and accessible by Entrez queries. Also, as a repository, PubChem constantly optimizes and develops its deposition system answering many demands of both high- and low-volume depositors. The PubChem information platform allows users to search, review and download bioassay description and data. The PubChem platform also enables researchers to collect, compare and analyze biological test results through web-based and programmatic tools. In this work, we provide an update for the PubChem BioAssay resource, including information content growth, data model extension and new developments of data submission, retrieval, analysis and download tools.

0 comments Cited 177 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Contributors

Sunghwan Kim

Evan E Bolton

Stephen H Bryant

Journal

Journal ID (nlm-ta): J Cheminform

Journal ID (iso-abbrev): J Cheminform

Title: Journal of Cheminformatics

Publisher: BioMed Central

ISSN (Electronic): 1758-2946

Publication date Collection: 2012

Publication date (Electronic): 7 November 2012

Volume: 4

Page: 28

Affiliations

[1 ]National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, 20894, MD, USA

Article

Publisher ID: 1758-2946-4-28

DOI: 10.1186/1758-2946-4-28

PMC ID: 3537644

PubMed ID: 23134593

SO-VID: 4434e05d-5dd8-4cb4-919b-3608f24173d9

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received : 28 June 2012

Date accepted : 3 October 2012

Comments

Comment on this article

scite_

Cited by 10

See all cited by

Most referenced authors 570

See all reference authors

- Version 1

Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis

Read this article at

Abstract

Background

Results

Conclusion

Related collections

ChemSpider related publications

Most cited references 43

Conformer Generation with OMEGA: Algorithm and Validation Using High Quality Structures from the Protein Databank and Cambridge Structural Database

Database resources of the National Center for Biotechnology Information

PubChem's BioAssay Database

Author and article information

Contributors

Journal

Affiliations

Article

History

Categories

Comments

Comment on this article

Similar content 44

Cited by 10

Most referenced authors 570