
      RRegrs: an R package for computer-aided model selection with multiple regression models

      research-article


          Abstract

          Background

          Predictive regression models can be created with many different modelling approaches. Choices must be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria. All of these affect the accuracy and efficiency of the resulting predictive models and therefore raise issues of model reproducibility and comparability. Cheminformatics and bioinformatics make extensive use of predictive modelling and need standardized methodologies that assist model selection and speed up predictive model development. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tested several simple and complex regression models and validation schemes, produced unified reports, and could be integrated into more extensive studies. Additionally, such a methodology should be implemented as a free programming package so that others can continuously adapt and redistribute it.

          Results

          We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers a choice of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear Regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated, fully validated procedure that produces standardized reports, so users can quickly see the impact of their modelling choices and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending the caret package.
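RRegrs itself is R code built on caret, and none of its functions are reproduced here. To illustrate the validation scheme described above (repeated k-fold and leave-one-out cross-validation used to rank several regression methods by a common criterion), the following is a minimal Python/NumPy sketch. The candidate models (a mean-only baseline, ordinary least squares, and ridge regression as an L2-penalised stand-in for the penalised methods; Lasso's L1 penalty needs an iterative solver and is omitted), the RMSE criterion and all function names are assumptions for illustration, not RRegrs's API.

```python
import numpy as np


def _design(X):
    # Prepend an intercept column of ones to the descriptor matrix.
    return np.column_stack([np.ones(len(X)), X])


def ols_fit(X, y):
    # Ordinary least squares (multiple linear regression); returns a predictor.
    coef, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return lambda Xn: _design(Xn) @ coef


def ridge_fit(X, y, lam=1.0):
    # Ridge regression: L2 penalty on the slopes, intercept left unpenalised.
    A = _design(X)
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    coef = np.linalg.solve(A.T @ A + penalty, A.T @ y)
    return lambda Xn: _design(Xn) @ coef


def mean_fit(X, y):
    # Baseline model: always predict the training-set mean.
    m = float(np.mean(y))
    return lambda Xn: np.full(len(Xn), m)


def repeated_kfold_rmse(fit, X, y, k=10, repeats=10, seed=0):
    # Average held-out RMSE over `repeats` random k-fold partitions.
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(repeats):
        for test_idx in np.array_split(rng.permutation(n), k):
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            predict = fit(X[train_idx], y[train_idx])
            resid = y[test_idx] - predict(X[test_idx])
            scores.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(scores))


def loo_rmse(fit, X, y):
    # Leave-one-out: n folds of size one. Squared errors are pooled
    # before taking the root, because a one-point fold has no spread.
    n = len(y)
    sq_errors = []
    for i in range(n):
        keep = np.arange(n) != i
        predict = fit(X[keep], y[keep])
        sq_errors.append((y[i] - predict(X[i:i + 1])[0]) ** 2)
    return float(np.sqrt(np.mean(sq_errors)))


def select_best(models, X, y, k=10, repeats=10, seed=0):
    # The same seed for every model gives identical folds (a paired comparison).
    scores = {name: repeated_kfold_rmse(fit, X, y, k, repeats, seed)
              for name, fit in models.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic data: two descriptors with a known linear relationship.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(60, 2))
    y = 3.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=60)
    models = {
        "mean": mean_fit,
        "ols": ols_fit,
        "ridge": lambda X_, y_: ridge_fit(X_, y_, lam=1.0),
    }
    best, scores = select_best(models, X, y)
    print(best, scores)
    print("LOO RMSE (ols):", loo_rmse(ols_fit, X, y))
```

Fixing the seed gives every candidate the same folds, so differences in mean RMSE reflect the models rather than the partitioning. In caret, `trainControl(method = "repeatedcv", number = 10, repeats = 10)` and `trainControl(method = "LOOCV")` configure the corresponding resampling schemes.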

          Conclusion

          The universality of the new methodology is demonstrated using five standard data sets from different scientific fields. Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxides descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance for both training and test sets than those reported in the original publications. Its good performance as well as its adaptability in terms of parameter optimization could make RRegrs a popular framework to assist the initial exploration of predictive models, and with that, the design of more comprehensive in silico screening applications.
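The comparison above concerns performance on both training and test sets. A minimal sketch of such a two-set report follows, assuming a single random 75/25 hold-out split, an ordinary least squares model, and R² and RMSE as the reported statistics. The split ratio, the seed and the function names are hypothetical choices for illustration and do not describe how RRegrs or the original publications split their data.

```python
import numpy as np


def rmse(y_true, y_pred):
    # Root mean squared error.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    # Coefficient of determination, 1 - SS_res / SS_tot, using the mean
    # of the set being scored.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def train_test_report(X, y, test_fraction=0.25, seed=1):
    # Random hold-out split; the model is fitted on the training rows only,
    # then the same metrics are computed on both parts.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    A_train = np.column_stack([np.ones(len(train)), X[train]])
    coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

    def predict(Xn):
        return np.column_stack([np.ones(len(Xn)), Xn]) @ coef

    report = {}
    for name, rows in (("train", train), ("test", test)):
        y_hat = predict(X[rows])
        report[name + "_r2"] = r_squared(y[rows], y_hat)
        report[name + "_rmse"] = rmse(y[rows], y_hat)
    return report


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(60, 2))
    y = 3.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=60)
    print(train_test_report(X, y))
```

Here the test-set R² uses the test-set mean in SS_tot. Some QSAR external-validation statistics use the training-set mean instead, so a report should state which convention it follows.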

          Graphical abstract

          RRegrs is a computer-aided model selection framework for R multiple regression models; this is a fully validated procedure with application to QSAR modelling


                Author and article information

                Contributors
                gtsiliki@central.ntua.gr
                crm.publish@gmail.com
                seoane@stanford.edu
                carlos.fernandez@udc.es
                hsarimv@central.ntua.gr
                egon.willighagen@maastrichtuniversity.nl
                Journal
                J Cheminform
                Journal of Cheminformatics
                Springer International Publishing (Cham)
                ISSN: 1758-2946
                15 September 2015
                2015
                Volume: 7
                Article: 46
                Affiliations
                School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechneiou Street, Zografou Campus, 15780 Athens, Greece
                Computer Science Faculty, University of A Coruña, Campus Elviña, s/n, 15071 A Coruña, Spain
                Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, P.O. Box 616, UNS50 Box 19, 6200 MD Maastricht, The Netherlands
                Stanford Cancer Institute, Stanford University, C.J. Huang Building, 780 Welch Road, Palo Alto, CA 94304, USA
                Identifiers
                DOI: 10.1186/s13321-015-0094-2
                PMCID: PMC4570700
                PMID: 26379782
                © Tsiliki et al. 2015

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 7 April 2015
                Accepted: 24 August 2015
                Categories
                Research Article

                Chemoinformatics
                Keywords: multiple regression, QSAR, R package, caret-based tool
