      Is Open Access

      Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model

      research-article


          Abstract

          Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available.
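The keywords for this article list rejection sampling as the device for drawing covariates compatibly with the substantive model. The following is a minimal sketch of that idea, not the authors' Stata implementation. It assumes a linear substantive model with a quadratic term, Y = b0 + b1*X + b2*X^2 + e with e ~ N(0, sigma^2), and a normal covariate model X ~ N(mu, tau^2). The function name `impute_x_compatible` and all parameter values are hypothetical, and the parameters are treated as known, whereas a full fully conditional specification algorithm would redraw them at each iteration.

```python
import math
import random


def impute_x_compatible(y, beta, sigma, mu, tau, rng, max_tries=10_000):
    """Draw one X from f(x | y), proportional to f(y | x) * f(x), by rejection sampling.

    Assumed (hypothetical) models:
      substantive model: Y | X ~ N(b0 + b1*X + b2*X^2, sigma^2)
      covariate model:   X ~ N(mu, tau^2), used as the proposal
    """
    b0, b1, b2 = beta
    for _ in range(max_tries):
        x_star = rng.gauss(mu, tau)  # propose from the covariate model
        resid = y - (b0 + b1 * x_star + b2 * x_star ** 2)
        # f(y | x) divided by its upper bound 1 / (sqrt(2*pi) * sigma)
        accept_prob = math.exp(-resid ** 2 / (2 * sigma ** 2))
        if rng.random() <= accept_prob:
            return x_star
    raise RuntimeError("no proposal accepted; check the model parameters")


# Illustrative values only: impute X repeatedly for one record with Y = 3.
rng = random.Random(2015)
beta, sigma, mu, tau = (0.0, 1.0, 1.0), 1.0, 0.0, 1.0
y_obs = 3.0
draws = [impute_x_compatible(y_obs, beta, sigma, mu, tau, rng) for _ in range(2000)]
```

Because each accepted draw comes from a density proportional to the substantive model's likelihood times the covariate model, the imputed X stays consistent with the quadratic term. A plain linear regression of X on Y would not guarantee this.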

          Most cited references (5)

          A comparison of inclusive and restrictive strategies in modern missing data procedures.

          Two classes of modern missing data procedures, maximum likelihood (ML) and multiple imputation (MI), tend to yield similar results when implemented in comparable ways. In either approach, it is possible to include auxiliary variables solely for the purpose of improving the missing data procedure. A simulation was presented to assess the potential costs and benefits of a restrictive strategy, which makes minimal use of auxiliary variables, versus an inclusive strategy, which makes liberal use of such variables. The simulation showed that the inclusive strategy is to be greatly preferred. With an inclusive strategy not only is there a reduced chance of inadvertently omitting an important cause of missingness, there is also the possibility of noticeable gains in terms of increased efficiency and reduced bias, with only minor costs. As implemented in currently available software, the ML approach tends to encourage the use of a restrictive strategy, whereas the MI approach makes it relatively simple to use an inclusive strategy.
            Multiple imputation: current perspectives.

            This paper provides an overview of multiple imputation and current perspectives on its use in medical research. We begin with a brief review of the problem of handling missing data in general and place multiple imputation in this context, emphasizing its relevance for longitudinal clinical trials and observational studies with missing covariates. We outline how multiple imputation proceeds in practice and then sketch its rationale. We explore the problem of obtaining proper imputations in some detail and distinguish two main classes of approach: methods based on fully multivariate models, and those that iterate conditional univariate models. We show how the use of so-called uncongenial imputation models is particularly valuable for sensitivity analyses and also for certain analyses in clinical trial settings. We also touch upon other forms of sensitivity analysis that use multiple imputation. Finally, we give some open questions that the increasing use of multiple imputation has thrown up, which we believe are useful directions for future research.
              Is Open Access

              Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods

              Background: Multiple imputation is often used for missing data. When a model contains as covariates more than one function of a variable, it is not obvious how best to impute missing values in these covariates. Consider a regression with outcome Y and covariates X and X². In 'passive imputation' a value X* is imputed for X and then X² is imputed as (X*)². A recent proposal is to treat X² as 'just another variable' (JAV) and impute X and X² under multivariate normality. Methods: We use simulation to investigate the performance of three methods that can easily be implemented in standard software: 1) linear regression of X on Y to impute X then passive imputation of X²; 2) the same regression but with predictive mean matching (PMM); and 3) JAV. We also investigate the performance of analogous methods when the analysis involves an interaction, and study the theoretical properties of JAV. The application of the methods when complete or incomplete confounders are also present is illustrated using data from the EPIC Study. Results: JAV gives consistent estimation when the analysis is linear regression with a quadratic or interaction term and X is missing completely at random. When X is missing at random, JAV may be biased, but this bias is generally less than for passive imputation and PMM. Coverage for JAV was usually good when bias was small. However, in some scenarios with a more pronounced quadratic effect, bias was large and coverage poor. When the analysis was logistic regression, JAV's performance was sometimes very poor. PMM generally improved on passive imputation, in terms of bias and coverage, but did not eliminate the bias. Conclusions: Given the current state of available software, JAV is the best of a set of imperfect imputation methods for linear regression with a quadratic or interaction effect, but should not be used for logistic regression.
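The difference between passive imputation and JAV described in this abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure evaluated in the cited study. Regression parameters are estimated once from complete cases and then held fixed, so the imputations are not proper. The helper names `fit_complete_cases`, `impute_passive` and `impute_jav`, and the simulated data, are hypothetical.

```python
import math
import random


def ols(xs, ys):
    """Simple linear regression of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def fit_complete_cases(y_cc, x_cc):
    """Regress X and X^2 on Y in complete cases and estimate the residual covariance."""
    x2_cc = [x * x for x in x_cc]
    a1, b1 = ols(y_cc, x_cc)
    a2, b2 = ols(y_cc, x2_cc)
    r1 = [x - (a1 + b1 * y) for x, y in zip(x_cc, y_cc)]
    r2 = [x2 - (a2 + b2 * y) for x2, y in zip(x2_cc, y_cc)]
    dof = len(y_cc) - 2
    s11 = sum(u * u for u in r1) / dof
    s22 = sum(v * v for v in r2) / dof
    s12 = sum(u * v for u, v in zip(r1, r2)) / dof
    return (a1, b1), (a2, b2), (s11, s12, s22)


def impute_passive(y, fitted, rng):
    """Impute X from its regression on Y, then set X^2 to the square of the imputed X."""
    (a1, b1), _, (s11, _, _) = fitted
    x_star = a1 + b1 * y + rng.gauss(0.0, math.sqrt(s11))
    return x_star, x_star ** 2


def impute_jav(y, fitted, rng):
    """Impute (X, X^2) jointly as bivariate normal given Y, ignoring their functional link."""
    (a1, b1), (a2, b2), (s11, s12, s22) = fitted
    # Cholesky factor of the 2x2 residual covariance matrix
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(max(s22 - l21 ** 2, 0.0))
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return a1 + b1 * y + l11 * z1, a2 + b2 * y + l21 * z1 + l22 * z2


# Simulated complete cases (illustrative only): Y = X + X^2 + noise.
rng_demo = random.Random(1)
x_cc = [rng_demo.gauss(0.0, 1.0) for _ in range(500)]
y_cc = [x + x * x + rng_demo.gauss(0.0, 1.0) for x in x_cc]
fitted = fit_complete_cases(y_cc, x_cc)

px, px2 = impute_passive(2.0, fitted, rng_demo)
jx, jx2 = impute_jav(2.0, fitted, rng_demo)
```

Passive imputation always returns a pair in which the second value is the square of the first. JAV returns two separately drawn values, so the imputed X² generally differs from the square of the imputed X.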

                Author and article information

                Journal
                Statistical Methods in Medical Research (Stat Methods Med Res)
                SAGE Publications (Sage UK: London, England)
                ISSN: 0962-2802, 1477-0334
                August 2015
                Volume 24, Issue 4 (Special Issue dedicated to James Roger), pages 462-487
                Affiliations
                [1] Department of Medical Statistics, London School of Hygiene & Tropical Medicine, UK
                [2] MRC Biostatistics Unit, Cambridge, UK
                [3] MRC Clinical Trials Unit, London, UK
                Author notes

                *See Acknowledgement.

                [*] Jonathan W Bartlett, Department of Medical Statistics, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK. Email: Jonathan.Bartlett@lshtm.ac.uk
                Article
                DOI: 10.1177/0962280214521348
                PMCID: PMC4513015
                PMID: 24525487
                © The Author(s) 2014

                This article is distributed under the terms of the Creative Commons Attribution 3.0 License (http://www.creativecommons.org/licenses/by/3.0/), which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (http://www.uk.sagepub.com/aboutus/openaccess.htm).


                Keywords: multiple imputation, compatibility, non-linearities, interactions, rejection sampling, fully conditional specification
