      Open Access research article
      BMC Medical Informatics and Decision Making
      BioMed Central
      The 5th Translational Bioinformatics Conference (TBC 2015)
      7-9 November 2015

          Abstract

          Background

          Nearest neighbor (NN) imputation algorithms are efficient methods for filling in missing data: each missing value in a record is replaced by a value obtained from related cases in the whole set of records. Besides substituting the missing data with plausible values that are as close as possible to the true values, imputation algorithms should preserve the original data structure and avoid distorting the distribution of the imputed variable. Despite the efficiency of NN algorithms, little is known about their effect on data structure.
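To make the Background concrete, here is a minimal Python sketch of NN imputation covering the three variants compared in the study: 1NN (k = 1), kNN (unweighted mean of k donors) and wkNN (inverse-distance-weighted mean). It is not the authors' implementation. The distance (plain Euclidean over attributes observed in both records), the inverse-distance weights, and the helper names `distance`, `knn_impute_value` and `knn_impute` are assumptions, and attributes are assumed to be on comparable scales.

```python
import math
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # None marks a missing value

# Hypothetical helpers for illustration; not the authors' implementation.


def distance(a: Row, b: Row, target: int) -> Optional[float]:
    """Euclidean distance over attributes observed in both rows, excluding the target column."""
    shared = [(x, y) for j, (x, y) in enumerate(zip(a, b))
              if j != target and x is not None and y is not None]
    if not shared:
        return None  # no common attribute: rows are not comparable
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute_value(data: List[Row], i: int, target: int,
                     k: int = 1, weighted: bool = False) -> float:
    """Estimate data[i][target] from the k nearest donors that observe `target`."""
    candidates: List[Tuple[float, float]] = []
    for r, row in enumerate(data):
        if r == i or row[target] is None:
            continue
        d = distance(data[i], row, target)
        if d is not None:
            candidates.append((d, row[target]))
    candidates.sort(key=lambda t: t[0])
    neighbors = candidates[:k]
    if not neighbors:
        raise ValueError("no donor available")
    if not weighted:
        # 1NN (k=1) copies one donor value; kNN takes the plain mean.
        return sum(v for _, v in neighbors) / len(neighbors)
    # wkNN: inverse-distance weights; an exact match is returned directly.
    for d, v in neighbors:
        if d == 0:
            return v
    weights = [1.0 / d for d, _ in neighbors]
    return sum(w * v for w, (_, v) in zip(weights, neighbors)) / sum(weights)


def knn_impute(data: List[Row], k: int = 1, weighted: bool = False) -> List[Row]:
    """Fill every missing cell; donors always come from the original, unimputed data."""
    out = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is None:
                out[i][j] = knn_impute_value(data, i, j, k, weighted)
    return out
```

For example, with donors at distances 1, 3 and 8 carrying target values 10, 20 and 90, k = 1 returns 10.0, unweighted k = 2 returns 15.0, and weighted k = 2 returns 12.5 (up to floating-point rounding). Because donors are read from the original records, an imputed cell never serves as a donor for another one.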

          Methods

          Simulations on synthetic datasets with different patterns and degrees of missingness were conducted to evaluate the performance of NN imputation with a single neighbor (1NN) and with k neighbors, either unweighted (kNN) or weighted (wkNN), within different learning frameworks: the plain set, a reduced set after ReliefF filtering, bagging, random choice of attributes, and bagging combined with random choice of attributes (a Random-Forest-like method).
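The resampling frameworks listed in the Methods can be wrapped around a neighbor search. The sketch below is one assumed way to do so: each round draws donor records with replacement (bagging), a random subset of attributes, or both (the Random-Forest-like setting), and the per-round kNN estimates are averaged. The round count, subset size, averaging rule and function names are hypothetical rather than the study's settings, and ReliefF filtering is not shown.

```python
import math
import random
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value

# Hypothetical helpers for illustration; not the authors' implementation.


def knn_estimate(donors: List[Row], query: Row, target: int,
                 attrs: List[int], k: int) -> Optional[float]:
    """Unweighted mean of `target` over the k donors closest to `query` on `attrs`."""
    scored = []
    for row in donors:
        if row[target] is None:
            continue  # a donor must observe the value being imputed
        pairs = [(query[j], row[j]) for j in attrs
                 if query[j] is not None and row[j] is not None]
        if pairs:
            d = math.sqrt(sum((x - y) ** 2 for x, y in pairs))
            scored.append((d, row[target]))
    scored.sort(key=lambda t: t[0])
    top = scored[:k]
    return sum(v for _, v in top) / len(top) if top else None


def ensemble_impute_value(data: List[Row], i: int, target: int, k: int = 1,
                          rounds: int = 25, bootstrap: bool = True,
                          n_attrs: Optional[int] = None, seed: int = 0) -> float:
    """Average kNN estimates over rounds of donor and/or attribute resampling.

    bootstrap=True,  n_attrs=None -> bagging
    bootstrap=False, n_attrs=m    -> random choice of m attributes
    bootstrap=True,  n_attrs=m    -> Random-Forest-like combination
    """
    rng = random.Random(seed)
    others = [r for r in range(len(data)) if r != i]
    features = [j for j in range(len(data[i])) if j != target]
    estimates = []
    for _ in range(rounds):
        # Bagging: draw donor records with replacement.
        idx = [rng.choice(others) for _ in others] if bootstrap else others
        # Random choice of attributes: a fresh subset each round.
        attrs = rng.sample(features, n_attrs) if n_attrs else features
        est = knn_estimate([data[r] for r in idx], data[i], target, attrs, k)
        if est is not None:
            estimates.append(est)
    if not estimates:
        raise ValueError("no donor available in any round")
    return sum(estimates) / len(estimates)
```

With `bootstrap=False` and `n_attrs=None` every round sees the same donors and attributes, so the result reduces to plain kNN; the resampling settings add variability across rounds, which averaging then smooths.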

          Results

          Whatever the framework, kNN usually outperformed 1NN in terms of imputation precision and reduced errors in inferential statistics. However, 1NN was the only method capable of preserving the data structure: data were distorted even when small values of k were considered, and distortion was more severe for resampling schemes.
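The Results contrast imputation accuracy with preservation of the data structure. The abstract does not say how distortion was measured, so the sketch below uses two common stand-ins chosen for illustration: the ratio of the imputed variable's variance to its complete-data variance, and the two-sample Kolmogorov-Smirnov (KS) statistic between the two empirical distributions. Averaging k donors pulls imputed values toward a local mean, which tends to push the variance ratio below 1. The function names are hypothetical.

```python
from bisect import bisect_right
from statistics import pvariance
from typing import Dict, Sequence

# Hypothetical distortion measures for illustration; not the study's metrics.


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    points = sorted(set(sa) | set(sb))
    return max(abs(bisect_right(sa, p) / len(sa) - bisect_right(sb, p) / len(sb))
               for p in points)


def distortion_report(complete: Sequence[float],
                      imputed: Sequence[float]) -> Dict[str, float]:
    """Compare one variable before masking (complete) and after imputation."""
    return {
        # Below 1.0 means imputation shrank the spread of the variable.
        "variance_ratio": pvariance(imputed) / pvariance(complete),
        # 0.0 means identical empirical distributions.
        "ks": ks_statistic(complete, imputed),
    }
```

For instance, if the values 2.0 and 3.0 in [1.0, 2.0, 3.0, 4.0] are masked and both are filled with a local average of 2.5, `distortion_report` gives a variance ratio of 0.9 and a KS statistic of 0.25.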

          Conclusions

          The use of three neighbors in conjunction with ReliefF seems to provide the best trade-off between imputation error and preservation of the data structure. The same conclusions were drawn when imputation experiments were conducted on the single photon emission computed tomography (SPECTF) heart dataset after introducing data missing completely at random.
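Missing completely at random (MCAR) means the chance that a cell is missing does not depend on any observed or unobserved value. A minimal way to inject it, assumed here for illustration, is to mask a fixed fraction of cells chosen uniformly at random. The abstract does not report the rates used on SPECTF, so `rate`, `seed` and `inject_mcar` are hypothetical. This simple scheme can, by chance, empty an entire record.

```python
import random
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def inject_mcar(data: List[List[float]], rate: float,
                seed: Optional[int] = None) -> List[Row]:
    """Return a copy of `data` with round(rate * n_cells) cells set to None.

    Cells are drawn uniformly without replacement, so whether a cell is
    masked is independent of its value (MCAR). Hypothetical helper.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(data) for j in range(len(row))]
    masked = set(rng.sample(cells, round(rate * len(cells))))
    return [[None if (i, j) in masked else v for j, v in enumerate(row)]
            for i, row in enumerate(data)]
```

Fixing `seed` makes the masked pattern reproducible, so different imputation methods can be compared on exactly the same missing cells.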

                Author and article information

                Contributors: lorberimm@hotmail.com
                Type: Conference
                Journal: BMC Medical Informatics and Decision Making (BMC Med Inform Decis Mak)
                Publisher: BioMed Central (London)
                ISSN: 1472-6947
                Published: 25 July 2016
                Volume: 16
                Issue: Suppl 3. Issue sponsor: Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. Articles have undergone the journal’s standard peer review process for supplements. The Supplement Editors declare that they have no competing interests.
                Article number: 74
                Affiliations: Referral Center for Systemic Autoimmune Diseases, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milan, Italy
                DOI: 10.1186/s12911-016-0318-z
                PMCID: 4959387
                PMID: 27454392
                © The Author(s). 2016

                Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                Conference: The 5th Translational Bioinformatics Conference (TBC 2015), Tokyo, Japan, 7-9 November 2015
                Categories: Research; Bioinformatics & Computational biology