
      Imbalanced target prediction with pattern discovery on clinical data repositories


          Abstract

          Background

          Clinical data repositories (CDRs) have great potential to improve outcome prediction and risk modeling. However, most clinical studies require careful study design, dedicated data collection efforts, and sophisticated modeling techniques before a hypothesis can be tested. We aim to bridge this gap so that clinical domain users can run first-hand predictions on existing repository data without complicated handling, and obtain insightful patterns for imbalanced targets before a formal study is conducted. We specifically target interpretability, so that domain users can conveniently explain the model and apply it in clinical practice.

          Methods

          We propose an interpretable pattern model that tolerates noise (missing values) in practice data. To address the challenge of imbalanced targets of interest in clinical research (e.g., deaths accounting for less than a few percent of cases), we adopt the geometric mean of sensitivity and specificity (G-mean) as the optimization criterion and develop a simple but effective heuristic algorithm for it.
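To illustrate why G-mean suits imbalanced targets, the following minimal sketch computes it from binary labels. It is not the authors' heuristic algorithm; the function name `g_mean` and the toy labels are hypothetical and chosen only for illustration.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (1 = target class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def accuracy(y_true, y_pred):
    """Fraction of cases predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 2 target cases among 20 (10% prevalence).
y_true = [1, 1] + [0] * 18
always_negative = [0] * 20                 # never predicts the target
detects_one = [1, 0] + [1, 1] + [0] * 16   # 1 TP, 1 FN, 2 FP, 16 TN

print(accuracy(y_true, always_negative), g_mean(y_true, always_negative))
print(accuracy(y_true, detects_one), g_mean(y_true, detects_one))
```

In this toy case the predictor that never flags the target has the higher accuracy but a G-mean of zero, whereas the predictor that finds one of the two targets scores a G-mean of about 0.67. A criterion based on G-mean therefore cannot be satisfied by ignoring the minority class.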

          Results

          We compared pattern discovery with clinically interpretable methods on two retrospective clinical datasets: a thoracic dataset with 14.9% deaths within 1 year and a cardiac dataset with 9.1% deaths. Despite the imbalance challenge that affects the other methods, pattern discovery consistently shows competitive cross-validated prediction performance. Compared with logistic regression, Naïve Bayes, and decision tree, pattern discovery achieves more favorable averaged testing G-means and F1-scores (harmonic mean of precision and sensitivity), and the differences are statistically significant (p-values < 0.01, Wilcoxon signed-rank test). Without requiring sophisticated technical processing of data or tweaking, the prediction performance of pattern discovery is consistently comparable to the best achievable performance.
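The F1-score mentioned above can be sketched as follows, under the usual definition as the harmonic mean of precision and sensitivity. The helper name `f1_score` and the toy labels are hypothetical and are not taken from the study's data.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and sensitivity (1 = target class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical toy data: 3 target cases among 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]   # 2 TP, 1 FN, 1 FP

print(f1_score(y_true, y_pred))
```

Here precision and sensitivity are both 2/3, so the F1-score is also 2/3. Unlike G-mean, F1 does not use true negatives, which is one reason the two metrics are often reported together on imbalanced data.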

          Conclusions

          Pattern discovery has been demonstrated to be robust and valuable for target prediction on existing clinical data repositories with imbalance and noise. The prediction results and interpretable patterns can provide insights for potential formal studies in an agile and inexpensive way.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12911-017-0443-3) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                cyrus.chan@philips.com
                liyuxi@pku.edu.cn
                choo.chiap.chiau@philips.com
                Jane.ZHU@philips.com
                jiangjie417@vip.163.com
                huoyong@263.net.cn
                Journal
                BMC Med Inform Decis Mak
                BMC Medical Informatics and Decision Making
                BioMed Central (London)
                1472-6947
                20 April 2017
                2017; 17: 47
                Affiliations
                [1] Philips Research China - Health Systems, Philips Innovation Campus Shanghai, No. 1 Building, 10, Lane 888, Tian Lin Road, Shanghai, 200233 China
                [2] Peking University First Hospital (ISNI 0000 0004 1764 1621, GRID grid.411472.5), Beijing, China
                Article
                443
                10.1186/s12911-017-0443-3
                5399417
                28427384
                783fa6f8-eb3e-4ca6-8db7-bea0ad8d1e65
                © The Author(s). 2017

                Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 20 October 2016
                Accepted: 11 April 2017
                Categories
                Research Article

                Bioinformatics & Computational biology
                pattern discovery, data mining, prediction, imbalanced data, clinical data repository
