
      Outcome-sensitive multiple imputation: a simulation study

      research-article


          Abstract

          Background

          Multiple imputation is frequently used to deal with missing data in healthcare research. Although it is known that the outcome should be included in the imputation model when imputing missing covariate values, it is not known whether missing values of the outcome itself should also be imputed. Similarly, no clear recommendations exist on: the utility of incorporating a secondary outcome, if available, in the imputation model; the level of protection offered when data are missing not at random; and the implications of dataset size and level of missingness.

          Methods

          We used realistic assumptions to generate thousands of datasets across a broad spectrum of contexts: three mechanisms of missingness (completely at random; at random; not at random); varying extents of missingness (20–80% missing data); and different sample sizes (1,000 or 10,000 cases). For each context we quantified the performance of a complete case analysis and of seven multiple imputation methods that deleted cases with a missing outcome before imputation, after imputation or not at all; that did or did not include the outcome in the imputation models; and that did or did not include a secondary outcome in the imputation models. Methods were compared on mean absolute error, bias, coverage and power over 1,000 datasets for each scenario.
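The three missingness mechanisms in this design can be illustrated with a small generator. This is a minimal sketch under our own assumptions, not the authors' simulation code: a logistic selection model, a single incomplete covariate `x`, a fully observed variable `z` (for example the outcome) that drives MAR missingness, and an intercept tuned by bisection so the expected proportion missing matches a target such as 0.2 or 0.8. The names `make_missing`, `calibrate_intercept` and the `strength` parameter are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def logistic(t: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def calibrate_intercept(scores: Sequence[float], target: float,
                        lo: float = -50.0, hi: float = 50.0,
                        iters: int = 60) -> float:
    """Bisection for an intercept a with mean(logistic(a + s)) == target."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        mean_p = sum(logistic(mid + s) for s in scores) / len(scores)
        if mean_p < target:
            lo = mid  # too few values would go missing: raise the intercept
        else:
            hi = mid
    return (lo + hi) / 2


def make_missing(x: Sequence[float], z: Sequence[float], mechanism: str,
                 prop: float, strength: float = 1.0,
                 rng: Optional[random.Random] = None) -> List[Optional[float]]:
    """Return a copy of x with roughly `prop` of its entries set to None.

    MCAR: missingness is independent of all data.
    MAR:  probability of missingness depends on the observed variable z.
    MNAR: probability of missingness depends on the value of x itself.
    """
    rng = rng or random.Random()
    if mechanism == "MCAR":
        scores = [0.0] * len(x)
    elif mechanism == "MAR":
        scores = [strength * zi for zi in z]
    elif mechanism == "MNAR":
        scores = [strength * xi for xi in x]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    a = calibrate_intercept(scores, prop)
    # Each value is dropped independently with probability logistic(a + score).
    return [None if rng.random() < logistic(a + s) else xi
            for xi, s in zip(x, scores)]
```

Under MCAR every value has the same chance of being lost; under MAR the chance depends only on observed data (`z`); under MNAR it depends on the value that goes missing, which is why an imputation model built on observed data cannot fully correct for it.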

          Results

          Overall, there was very little to separate multiple imputation methods which included the outcome in the imputation model. Even when missingness was quite extensive, all multiple imputation approaches performed well. Incorporating a secondary outcome, moderately correlated with the outcome of interest, made very little difference. The dataset size and the extent of missingness affected performance, as expected. Multiple imputation methods protected less well against missingness not at random, but did offer some protection.
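The four comparison criteria can be computed from the per-replicate results of one scenario (for example its 1,000 datasets) roughly as follows. This sketch assumes each replicate yields a point estimate and a standard error for the coefficient of interest, whose true value is known from the data-generating model. The helper name `summarise`, the normal-approximation confidence interval (a pooled multiple imputation analysis would usually use a t reference distribution), and the definition of power as the share of replicates rejecting a zero coefficient at the 5% level are our assumptions, not details taken from the article.

```python
from dataclasses import dataclass
from statistics import NormalDist, fmean
from typing import Sequence


@dataclass
class Performance:
    bias: float      # mean of (estimate - truth)
    mae: float       # mean absolute error: mean of |estimate - truth|
    coverage: float  # share of confidence intervals containing the truth
    power: float     # share of replicates rejecting H0: coefficient == 0


def summarise(estimates: Sequence[float], std_errors: Sequence[float],
              true_value: float, level: float = 0.95) -> Performance:
    """Summarise one simulation scenario over its replicates."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    errors = [e - true_value for e in estimates]
    # Interval estimate +/- z*se contains the truth iff |error| <= z*se.
    covered = [abs(d) <= z * s for d, s in zip(errors, std_errors)]
    # Wald test of a zero coefficient: reject iff |estimate| > z*se.
    rejected = [abs(e) > z * s for e, s in zip(estimates, std_errors)]
    return Performance(
        bias=fmean(errors),
        mae=fmean([abs(d) for d in errors]),
        coverage=fmean(covered),
        power=fmean(rejected),
    )
```

Power is only meaningful when the true coefficient is non-zero; for a null coefficient the same quantity is the type I error rate.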

          Conclusions

          As long as the outcome is included in the imputation model, there are very small performance differences between the possible multiple imputation approaches: no outcome imputation, outcome imputation, or outcome imputation followed by deletion of the cases whose outcome was imputed. All informative covariates, even those with very high levels of missingness, should be included in the multiple imputation model. Multiple imputation offers some protection against a simple missing not at random mechanism.
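The three ways of handling a missing outcome differ only in which rows enter the imputation step and which rows enter the analysis. The sketch below shows that flow for one completed dataset; in practice it would be repeated for each of the m imputations and the results pooled. `prepare_analysis_rows` and the strategy labels are hypothetical names, and `mean_fill` is a deliberately trivial stand-in so the example runs: it is not multiple imputation, and a real imputer would draw values from a model that includes the outcome as a predictor.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]
# An imputer takes rows plus the column names to fill and returns completed
# copies in the same order as its input.
Imputer = Callable[[List[Row], List[str]], List[Row]]


def prepare_analysis_rows(rows: List[Row], strategy: str, impute: Imputer,
                          covariates: Sequence[str] = ("x",),
                          outcome: str = "y") -> List[Row]:
    """Return one completed dataset for analysis under a strategy.

    'delete_before'      - no outcome imputation: drop rows with a missing
                           outcome, then impute covariates only.
    'impute_then_delete' - imputation and deletion: impute covariates and
                           outcome, then drop rows whose outcome was missing.
    'no_deletion'        - imputation: impute covariates and outcome and
                           analyse every row.
    """
    missing_y = [r[outcome] is None for r in rows]
    if strategy == "delete_before":
        kept = [r for r, m in zip(rows, missing_y) if not m]
        return impute(kept, list(covariates))
    if strategy not in ("impute_then_delete", "no_deletion"):
        raise ValueError(f"unknown strategy: {strategy}")
    completed = impute(rows, list(covariates) + [outcome])
    if strategy == "impute_then_delete":
        return [r for r, m in zip(completed, missing_y) if not m]
    return completed


def mean_fill(rows: List[Row], columns: List[str]) -> List[Row]:
    """Placeholder imputer for the demo only; this is NOT multiple imputation."""
    out = [dict(r) for r in rows]  # copy so the caller's rows are untouched
    for c in columns:
        observed = [r[c] for r in rows if r[c] is not None]
        fill = sum(observed) / len(observed)
        for r in out:
            if r[c] is None:
                r[c] = fill
    return out
```

With `delete_before`, rows with a missing outcome never reach the imputer, so only covariates are filled; the other two strategies impute the outcome too and differ only in whether those rows are analysed.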

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12874-016-0281-5) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                e.kontopantelis@manchester.ac.uk
                ian.white@mrc-bsu.cam.ac.uk
                matthew.sperrin@manchester.ac.uk
                buchan@manchester.ac.uk
                Journal
                BMC Medical Research Methodology (BMC Med Res Methodol)
                BioMed Central (London)
                ISSN 1471-2288
                Published: 9 January 2017
                Volume 17, article 2 (2017)
                Affiliations
                [1] The Farr Institute for Health Informatics Research, University of Manchester, Vaughan House, Manchester, M13 9GB, UK
                [2] NIHR School for Primary Care Research, Centre for Primary Care, Institute of Population Health, University of Manchester, Manchester, UK
                [3] MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, UK
                Article
                281
                10.1186/s12874-016-0281-5
                5220613
                28068910
                862c5a04-91d0-45c2-ac0f-6fcdb4db72e5
                © The Author(s) 2017

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 18 August 2016
                Accepted: 19 December 2016
                Funding
                Funded by: Medical Research Council (FundRef http://dx.doi.org/10.13039/501100000265)
                Award ID: MR/K006665/1
                Categories
                Research Article

                Medicine
                multiple imputation, imputed outcome, missing data, missingness
