      Missing data and multiple imputation in clinical epidemiological research

      research-article

          Abstract

          Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data.
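The pooling step that lets multiple imputation account for the uncertainty of the missing values can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the simulated data, the linear imputation model, the bootstrap used to approximate parameter uncertainty, the choice of 20 imputations, and the function names `impute_once` and `rubin_pool` are all assumptions made for illustration. It compares a complete-case mean with a multiply imputed mean when the outcome is MAR given a fully observed covariate, and pools the imputed estimates with Rubin's rules.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool M estimates and their variances with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()              # pooled point estimate
    within = variances.mean()             # average within-imputation variance
    between = estimates.var(ddof=1)       # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, total

def impute_once(x, y, rng):
    """One stochastic regression imputation of y from x (hypothetical helper).

    The imputation model is refitted on a bootstrap sample of the complete
    cases so that parameter uncertainty is reflected across imputations.
    """
    obs = ~np.isnan(y)
    idx = rng.choice(np.flatnonzero(obs), size=obs.sum(), replace=True)
    slope, intercept = np.polyfit(x[idx], y[idx], 1)
    resid = y[idx] - (intercept + slope * x[idx])
    sigma = resid.std(ddof=2)
    y_imp = y.copy()
    miss = ~obs
    y_imp[miss] = intercept + slope * x[miss] + rng.normal(0, sigma, miss.sum())
    return y_imp

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + rng.normal(size=n)

# MAR: the probability that y is missing depends only on the observed x.
p_miss = 1 / (1 + np.exp(-1.5 * x))
y = np.where(rng.random(n) < p_miss, np.nan, y_full)

cc_mean = np.nanmean(y)                   # complete-case estimate of the mean

ests, vars_ = [], []
for _ in range(20):                       # M = 20 imputed data sets
    y_imp = impute_once(x, y, rng)
    ests.append(y_imp.mean())
    vars_.append(y_imp.var(ddof=1) / n)
mi_mean, mi_var = rubin_pool(ests, vars_)

print("full-data mean:", y_full.mean())
print("complete-case mean:", cc_mean)
print("MI mean:", mi_mean, "pooled SE:", mi_var ** 0.5)
```

In this setup the complete-case mean is pulled away from the full-data mean because individuals with high `x` are more likely to have missing `y`, whereas the pooled estimate uses the observed `x` to recover it. The total variance adds the between-imputation component to the average within-imputation variance, which is how the extra uncertainty due to missingness enters the standard error.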

          Most cited references (18)


          Missing data: our view of the state of the art.

          Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art.

            Multiple Imputation after 18+ Years


              Using the outcome for imputation of missing predictor values was preferred.

              Epidemiologic studies commonly estimate associations between predictors (risk factors) and outcome. Most software automatically exclude subjects with missing values. This commonly causes bias because missing values seldom occur completely at random (MCAR) but rather selectively based on other (observed) variables, missing at random (MAR). Multiple imputation (MI) of missing predictor values using all observed information including outcome is advocated to deal with selective missing values. This seems a self-fulfilling prophecy. We tested this hypothesis using data from a study on diagnosis of pulmonary embolism. We selected five predictors of pulmonary embolism without missing values. Their regression coefficients and standard errors (SEs) estimated from the original sample were considered as "true" values. We assigned missing values to these predictors--both MCAR and MAR--and repeated this 1,000 times using simulations. Per simulation we multiple imputed the missing values without and with the outcome, and compared the regression coefficients and SEs to the truth. Regression coefficients based on MI including outcome were close to the truth. MI without outcome yielded very biased--underestimated--coefficients. SEs and coverage of the 90% confidence intervals were not different between MI with and without outcome. Results were the same for MCAR and MAR. For all types of missing values, imputation of missing predictor values using the outcome is preferred over imputation without outcome and is no self-fulfilling prophecy.
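The attenuation described in this abstract can be reproduced in a toy setting. The sketch below is an illustration under stated assumptions, not the cited study's code: it uses a continuous outcome and linear regression rather than the diagnostic data of the study, a single auxiliary covariate `z`, 50% MCAR missingness in the predictor `x`, and simple stochastic regression imputation without parameter draws. The helper names `ols`, `impute_x`, and `mean_slope` are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ [1, X]; the intercept comes first."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def impute_x(x, predictors, rng):
    """Stochastic regression imputation of x from the given predictors."""
    obs = ~np.isnan(x)
    beta = ols(predictors[obs], x[obs])
    design = np.column_stack([np.ones(len(x)), predictors])
    fitted = design @ beta
    sigma = np.std(x[obs] - fitted[obs], ddof=design.shape[1])
    x_imp = x.copy()
    x_imp[~obs] = fitted[~obs] + rng.normal(0, sigma, (~obs).sum())
    return x_imp

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                         # fully observed covariate
x_full = 0.5 * z + rng.normal(size=n)          # predictor of interest
y = 1.0 * x_full + rng.normal(size=n)          # outcome; true slope is 1.0
x = np.where(rng.random(n) < 0.5, np.nan, x_full)  # 50% of x missing, MCAR

def mean_slope(predictors, m=10):
    """Average slope of y on x over m imputed data sets."""
    slopes = [ols(impute_x(x, predictors, rng), y)[1] for _ in range(m)]
    return float(np.mean(slopes))

slope_without = mean_slope(z)                          # imputation model omits y
slope_with = mean_slope(np.column_stack([z, y]))       # imputation model includes y

print("slope, imputation without outcome:", slope_without)
print("slope, imputation with outcome:", slope_with)
```

When the outcome is left out of the imputation model, the imputed values of `x` are related to `y` only through `z`, so the estimated slope is biased toward zero; including `y` preserves the association in the imputed values.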

                Author and article information

                Journal
                Clin Epidemiol
                Clinical Epidemiology
                Dove Medical Press
                ISSN: 1179-1349
                Published: 15 March 2017
                Volume: 9
                Pages: 157-166
                Affiliations
                [1] Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus N, Denmark
                [2] Department of Primary Care and Population Health, University College London, London, UK
                Author notes
                Correspondence: Alma B Pedersen, Department of Clinical Epidemiology, Aarhus University Hospital, Olof Palmes Alle 43-45, 8200 Aarhus N, Denmark. Email: abp@clin.au.dk
                Article
                Publisher ID: clep-9-157
                DOI: 10.2147/CLEP.S129785
                PMCID: PMC5358992
                PMID: 28352203
                © 2017 Pedersen et al. This work is published and licensed by Dove Medical Press Limited

                The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution – Non Commercial (unported, v3.0) License (http://creativecommons.org/licenses/by-nc/3.0/). By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed.

                Categories
                Methodology

                Public health
                Keywords: missing data, observational study, multiple imputation, MAR, MCAR, MNAR
