
      Random Forests Are Able to Identify Differences in Clotting Dynamics from Kinetic Models of Thrombin Generation


          Abstract

          Current methods for distinguishing acute coronary syndromes, such as heart attack, from stable coronary artery disease on the basis of thrombin formation kinetics have been limited to evaluating the sensitivity of well-established chemical species (e.g., thrombin) using simple quantifiers of their concentration profiles (e.g., the maximum thrombin concentration or the area under the thrombin concentration versus time curve). To obtain an improved classifier, we use a 34-protein factor clotting cascade model and convert the simulation data into a high-dimensional representation (about 19,000 features) using a piecewise cubic polynomial fit. We then systematically search for plausible assays that can effectively gauge changes between acute coronary syndrome and coronary artery disease populations, using a statistical learning technique called Random Forests. We find that differences associated with acute coronary syndromes emerge in combinations of a handful of features. For instance, the concentrations of three chemical species, namely active alpha-thrombin, the tissue factor-factor VIIa-factor Xa ternary complex, and the intrinsic tenase complex with factor X, at specific time windows could be used to classify acute coronary syndromes with an accuracy of about 87.2%. Such a combination could be used to efficiently assay the coagulation system.
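The two simple quantifiers named in the abstract (maximum concentration and area under the concentration versus time curve) and the idea of a bagged tree ensemble can be illustrated with a small, standard-library-only sketch. Everything here is an assumption made for illustration: the curves come from a hypothetical `synthetic_curve` function rather than the 34-protein kinetic model, the features are the two simple quantifiers rather than the piecewise cubic polynomial representation, and the ensemble is a toy collection of depth-1 trees (stumps) trained on bootstrap samples with one randomly chosen feature each, in the spirit of Random Forests rather than the authors' implementation.

```python
import math
import random
from collections import Counter


def peak_and_auc(times, conc):
    """Two simple quantifiers: maximum level and trapezoidal area under the curve."""
    auc = sum((t1 - t0) * (c0 + c1) / 2.0
              for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]))
    return max(conc), auc


def synthetic_curve(amplitude, times, tau=4.0):
    """Hypothetical rise-then-decay transient; NOT the kinetic model from the paper."""
    return [amplitude * t * math.exp(-t / tau) for t in times]


def fit_stump(X, y, feature):
    """Depth-1 tree on one feature: predict `hi` above the threshold, else `lo`."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for a, b in zip(values, values[1:]):
        thr = (a + b) / 2.0
        for lo, hi in ((0, 1), (1, 0)):
            errors = sum((hi if row[feature] > thr else lo) != label
                         for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, thr, lo, hi)
    if best is None:  # degenerate sample with a single distinct value
        label = Counter(y).most_common(1)[0][0]
        return feature, 0.0, label, label
    return feature, best[1], best[2], best[3]


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly selected feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(d)                  # random feature selection
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Aggregate the stumps' predictions by majority vote."""
    votes = [hi if row[f] > thr else lo for f, thr, lo, hi in forest]
    return Counter(votes).most_common(1)[0][0]


# Two synthetic populations that differ only in curve amplitude (an assumption).
TIMES = [0.5 * i for i in range(41)]  # arbitrary time units
rng = random.Random(1)
X, y = [], []
for label, (low, high) in enumerate([(0.8, 1.2), (1.8, 2.2)]):
    for _ in range(10):
        curve = synthetic_curve(rng.uniform(low, high), TIMES)
        X.append(list(peak_and_auc(TIMES, curve)))
        y.append(label)

forest = fit_forest(X, y)
new_curve = synthetic_curve(2.0, TIMES)
print("predicted class:", predict(forest, list(peak_and_auc(TIMES, new_curve))))
```

In this toy setting either quantifier alone separates the two groups, so the ensemble adds nothing. The abstract's point is that in the simulated clotting data the differences appear only in combinations of several features, which is where a full Random Forest, with deeper trees and a feature subset drawn at every split, would be expected to help.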


                Author and article information

                Contributors
                Role: Editor
                Journal: PLoS ONE
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 12 May 2016
                Volume: 11
                Issue: 5
                Article number: e0153776
                Affiliations
                [1] Department of Mechanical Engineering, Texas A&M University, College Station, Texas, United States of America
                [2] Department of Industrial and Systems Engineering, Texas A&M University, College Station, Texas, United States of America
                [3] Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, United States of America
                Kurume University School of Medicine, JAPAN
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Analyzed the data: JA SB KN ARS. Wrote the paper: JA SB ARS.

                Article
                Publisher ID: PONE-D-15-35489
                DOI: 10.1371/journal.pone.0153776
                PMC ID: 4865224
                PubMed ID: 27171403
                1ba4aff0-d044-4266-a3ab-124b49dd7943
                © 2016 Arumugam et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 12 August 2015
                Accepted: 4 April 2016
                Page count
                Figures: 4, Tables: 3, Pages: 12
                Funding
                Funded by: National Science Foundation (funder ID: http://dx.doi.org/10.13039/100000001)
                Award ID: 1028894
                This work was supported by the National Science Foundation, Grant #1028894 (www.nsf.gov).
                Categories
                Research Article
                Biology and Life Sciences > Biochemistry > Proteins > Thrombin
                Engineering and Technology > Management Engineering > Decision Analysis > Decision Trees
                Research and Analysis Methods > Decision Analysis > Decision Trees
                Physical Sciences > Mathematics > Algebra > Polynomials
                Medicine and Health Sciences > Vascular Medicine > Coronary Artery Disease
                Research and Analysis Methods > Simulation and Modeling
                Biology and Life Sciences > Neuroscience > Cognitive Science > Artificial Intelligence > Machine Learning
                Computer and Information Sciences > Artificial Intelligence > Machine Learning
                Medicine and Health Sciences > Hematology > Blood Coagulation
                Biology and Life Sciences > Anatomy > Body Fluids > Blood
                Medicine and Health Sciences > Anatomy > Body Fluids > Blood
                Biology and Life Sciences > Physiology > Body Fluids > Blood
                Medicine and Health Sciences > Physiology > Body Fluids > Blood
                Medicine and Health Sciences > Hematology > Blood
                Custom metadata
                All relevant data are within the paper and its Supporting Information files.
