
      Data Cleaning: Detecting, Diagnosing, and Editing Data Abnormalities



          Abstract

          In this policy forum the authors argue that data cleaning is an essential part of the research process, and should be incorporated into study design.

          Most cited references (35)


          Post-randomisation exclusions: the intention to treat principle and excluding patients from analysis.

            Neural Networks: A Comprehensive Foundation

              Attrition in longitudinal studies. How to deal with missing data.

              The purpose of this paper was to illustrate the influence of missing data on the results of longitudinal statistical analyses [i.e., MANOVA for repeated measurements and Generalised Estimating Equations (GEE)] and to illustrate the influence of using different imputation methods to replace missing data. Besides a complete dataset, four incomplete datasets were considered: two datasets with 10% missing data and two datasets with 25% missing data. In both situations, missingness was considered either independent of or dependent on observed data. Imputation methods were divided into cross-sectional methods (i.e., mean of series, hot deck, and cross-sectional regression) and longitudinal methods (i.e., last value carried forward, longitudinal interpolation, and longitudinal regression). In addition, the multiple imputation method was applied and discussed. The analyses were performed on a particular (observational) longitudinal dataset, with particular missing data patterns and imputation methods. The results of this illustration show that when MANOVA for repeated measurements is used, imputation methods are highly recommendable (because MANOVA, as implemented in the software used, uses listwise deletion of cases with a missing value). When GEE analysis was applied, imputation methods were not necessary. When imputation methods were used, longitudinal imputation methods were often preferable to cross-sectional imputation methods, in that the point estimates and standard errors were closer to the estimates derived from the complete dataset. Furthermore, this study showed that the theoretically more valid multiple imputation method did not lead to different point estimates than the simpler (longitudinal) imputation methods. However, the estimated standard errors appeared to be theoretically more adequate, because they reflect the uncertainty in estimation caused by missing values.

                Author and article information

                Journal
                PLoS Medicine (PLoS Med)
                Public Library of Science (San Francisco, USA)
                ISSN: 1549-1277, 1549-1676
                October 2005
                6 September 2005
                Volume 2, Issue 10, e267
                Author notes

                Jan Van den Broeck is an epidemiologist, and Kobus Herbst is a public-health physician at the Africa Centre for Health and Population Studies, Mtubatuba, South Africa. Solveig Argeseanu Cunningham is a demographer at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America. Roger Eeckels is Professor Emeritus of Pediatrics at the Catholic University of Leuven, Leuven, Belgium.

                Competing Interests: The authors have declared that no competing interests exist.

                *To whom correspondence should be addressed. E-mail: jan.broeck@africacentre.ac.za
                Article
                DOI: 10.1371/journal.pmed.0020267
                PMCID: PMC1198040
                PMID: 16138788
                Copyright: © 2005 Van den Broeck et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
                Categories
                Policy Forum
                Statistics
                Research Methods

                Medicine