      Computational Phenotype Discovery Using Unsupervised Feature Learning over Noisy, Sparse, and Irregular Clinical Data

      research-article
      PLoS ONE
      Public Library of Science


          Abstract

          Inferring precise phenotypic patterns from population-scale clinical data is a core computational task in the development of precision, personalized medicine. The traditional approach uses supervised learning, in which an expert designates which patterns to look for (by specifying the learning task and the class labels), and where to look for them (by specifying the input variables). While appropriate for individual tasks, this approach scales poorly and misses the patterns that we don’t think to look for. Unsupervised feature learning overcomes these limitations by identifying patterns (or features) that collectively form a compact and expressive representation of the source data, with no need for expert input or labeled examples. Its rising popularity is driven by new deep learning methods, which have produced high-profile successes on difficult standardized problems of object recognition in images. Here we introduce its use for phenotype discovery in clinical data. This use is challenging because the largest source of clinical data – Electronic Medical Records – typically contains noisy, sparse, and irregularly timed observations, rendering them poor substrates for deep learning methods. Our approach couples dirty clinical data to deep learning architecture via longitudinal probability densities inferred using Gaussian process regression. From episodic, longitudinal sequences of serum uric acid measurements in 4368 individuals we produced continuous phenotypic features that suggest multiple population subtypes, and that accurately distinguished (0.97 AUC) the uric-acid signatures of gout vs. acute leukemia despite not being optimized for the task. The unsupervised features were as accurate as gold-standard features engineered by an expert with complete knowledge of the domain, the classification task, and the class labels. Our findings demonstrate the potential for achieving computational phenotype discovery at population scale. 
We expect such data-driven phenotypes to expose unknown disease variants and subtypes and to provide rich targets for genetic association studies.
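As an illustration of the Gaussian process regression step mentioned in the abstract, the following is a minimal sketch of how a handful of irregularly timed lab values can be turned into a continuous longitudinal density (a posterior mean and variance at any query time). It is not the authors' implementation: the squared-exponential kernel, the fixed hyperparameters (`length_scale`, `signal_var`, `noise_var`), and the uric acid values are all assumptions chosen for the example, and a real analysis would fit the hyperparameters to the data.

```python
import math

def rbf_kernel(t1, t2, length_scale, signal_var):
    """Squared-exponential covariance between two time points (days). Assumed kernel."""
    return signal_var * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_lower(low, b):
    """Forward substitution: solve low @ y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y

def solve_lower_transposed(low, y):
    """Back substitution: solve low.T @ x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x

def gp_posterior(times, values, query_times,
                 length_scale=30.0, signal_var=1.0, noise_var=0.25):
    """Return (mean, variance) of the latent function at each query time.

    Hyperparameters are fixed, illustrative assumptions rather than fitted values.
    """
    prior_mean = sum(values) / len(values)  # constant prior mean: the sample mean
    centered = [v - prior_mean for v in values]
    n = len(times)
    # Covariance of the noisy observations: kernel plus measurement noise on the diagonal.
    cov = [[rbf_kernel(times[i], times[j], length_scale, signal_var)
            + (noise_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    alpha = solve_lower_transposed(low, solve_lower(low, centered))
    posterior = []
    for tq in query_times:
        k_star = [rbf_kernel(tq, t, length_scale, signal_var) for t in times]
        mean = prior_mean + sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(low, k_star)
        var = signal_var - sum(x * x for x in v)
        posterior.append((mean, max(var, 0.0)))
    return posterior

# Hypothetical patient: days since first measurement and serum uric acid (mg/dL).
times = [0.0, 3.0, 10.0, 45.0, 47.0, 120.0]
values = [7.1, 6.8, 7.4, 9.0, 8.7, 6.2]
posterior = gp_posterior(times, values, [0.0, 1000.0])
for (mean, var), tq in zip(posterior, [0.0, 1000.0]):
    print(f"day {tq:6.0f}: mean={mean:.2f} mg/dL, variance={var:.3f}")
```

The posterior variance is small near clusters of observations and returns to the prior variance far from any measurement, which is how a representation of this kind can carry the sparsity and irregular timing of the record forward to later learning stages instead of hiding it.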


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, USA)
                ISSN: 1932-6203
                Published: 24 June 2013
                Volume 8, Issue 6, e66341
                Affiliations
                [1] Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
                [2] Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
                [3] Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
                Children’s National Medical Center, United States of America
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Conceived and designed the experiments: TAL JCD MAL. Performed the experiments: TAL. Analyzed the data: TAL. Contributed reagents/materials/analysis tools: TAL JCD. Wrote the paper: TAL JCD MAL. Developed the software used in analysis: TAL.

                Article
                Manuscript ID: PONE-D-12-40629
                DOI: 10.1371/journal.pone.0066341
                PMCID: PMC3691199
                PMID: 23826094
                Copyright © 2013

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 18 December 2012
                Accepted: 7 May 2013
                Page count
                Pages: 13
                Funding
                This work was funded in part by a grant from the Edward Mallinckrodt, Jr. Foundation, and Vanderbilt University Medical Center. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology
                Computational Biology
                Biological Data Management
                Genetics
                Heredity
                Phenotypes
                Human Genetics
                Personalized Medicine
                Computer Science
                Computing Methods
                Computer Inferencing
                Engineering
                Signal Processing
                Data Mining
                Medicine
                Clinical Genetics
                Personalized Medicine
