      Open Access

      Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples

      research-article


          Abstract

          The propensity score is a subject's probability of treatment, conditional on observed baseline covariates. Conditional on the true propensity score, treated and untreated subjects have similar distributions of observed baseline covariates. Propensity-score matching is a popular way of using the propensity score in the medical literature. In this approach, one forms matched sets of treated and untreated subjects with similar values of the propensity score. Inferences about treatment effects made using propensity-score matching are valid only if, in the matched sample, treated and untreated subjects have similar distributions of measured baseline covariates. In this paper we discuss the following methods for assessing whether the propensity score model has been correctly specified: comparing means and prevalences of baseline characteristics using standardized differences; ratios comparing the variance of continuous covariates between treated and untreated subjects; comparison of higher-order moments and interactions; five-number summaries; and graphical methods such as quantile–quantile plots, side-by-side boxplots, and non-parametric density plots for comparing the distribution of baseline covariates between treatment groups. We describe methods to determine the sampling distribution of the standardized difference when the true standardized difference is zero, thereby allowing one to determine the range of standardized differences that are plausible when the propensity score model has been correctly specified. We highlight the limitations of some previously used methods for assessing the adequacy of the specification of the propensity-score model. In particular, methods based on comparing the distribution of the estimated propensity score between treated and untreated subjects are uninformative. Copyright © 2009 John Wiley & Sons, Ltd.
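          The abstract names standardized differences and variance ratios as core balance diagnostics. The sketch below is a minimal, standard-library-only Python illustration, assuming the commonly used pooled-SD definitions: the difference in means (or prevalences) divided by the square root of the average of the two group variances. The helper `null_std_diffs` and its within-pair swapping scheme are our own illustrative assumption for approximating the sampling distribution of the standardized difference when the true difference is zero; the paper describes its own methods, which may differ. The example data are made up.

```python
import math
import random
import statistics


def std_diff_continuous(x_treated, x_control):
    """Standardized difference of means for a continuous covariate."""
    mean_t = statistics.fmean(x_treated)
    mean_c = statistics.fmean(x_control)
    var_t = statistics.variance(x_treated)  # sample variance, n - 1 denominator
    var_c = statistics.variance(x_control)
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)


def std_diff_binary(x_treated, x_control):
    """Standardized difference of prevalences for a 0/1 covariate."""
    p_t = sum(x_treated) / len(x_treated)
    p_c = sum(x_control) / len(x_control)
    pooled = (p_t * (1 - p_t) + p_c * (1 - p_c)) / 2
    if pooled == 0:
        # Both groups are all-0 or all-1: equal prevalences count as balanced,
        # otherwise the difference is unbounded in SD units.
        return 0.0 if p_t == p_c else math.copysign(math.inf, p_t - p_c)
    return (p_t - p_c) / math.sqrt(pooled)


def variance_ratio(x_treated, x_control):
    """Ratio of the treated-group variance to the control-group variance."""
    return statistics.variance(x_treated) / statistics.variance(x_control)


def null_std_diffs(pairs, n_perm=1000, seed=0):
    """Hypothetical helper: standardized differences obtained by randomly
    swapping the two members of each 1:1 matched pair, which mimics a
    covariate with no systematic treated-control difference (an illustrative
    permutation scheme, not necessarily the paper's procedure)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_perm):
        xt, xc = [], []
        for treated_val, control_val in pairs:
            if rng.random() < 0.5:
                treated_val, control_val = control_val, treated_val
            xt.append(treated_val)
            xc.append(control_val)
        draws.append(std_diff_continuous(xt, xc))
    return draws


if __name__ == "__main__":
    # Made-up ages for five matched pairs (treated, control); illustrative only.
    pairs = [(52.0, 50.0), (61.0, 60.0), (58.0, 59.0), (70.0, 66.0), (65.0, 63.0)]
    treated = [t for t, _ in pairs]
    control = [c for _, c in pairs]
    print("standardized difference:", round(std_diff_continuous(treated, control), 3))
    print("variance ratio:", round(variance_ratio(treated, control), 3))
    null = null_std_diffs(pairs, n_perm=2000, seed=42)
    cuts = statistics.quantiles(null, n=40)  # 39 cut points in 2.5% steps
    print("central 95% of null draws:", round(cuts[0], 3), "to", round(cuts[-1], 3))
```

          Comparing the observed standardized difference with the central range of the null draws indicates whether the residual imbalance exceeds what chance alone would plausibly produce in a matched sample of that size. With very few pairs, as in this toy example, the null draws take relatively few distinct values, so the comparison is coarse.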


                Author and article information

                Journal: Statistics in Medicine (Stat Med)
                Publisher: John Wiley & Sons, Ltd. (Chichester, UK)
                ISSN: 0277-6715 (print); 1097-0258 (online)
                Issue date: 10 November 2009; first published online: 15 September 2009
                Volume 28, Issue 25, pages 3083-3107
                Affiliations
                [1] Institute for Clinical Evaluative Sciences, G1 06, 2075 Bayview Avenue, Toronto, Ontario, Canada
                [2] Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
                [3] Department of Health Policy, Management and Evaluation, University of Toronto, Canada
                Author notes
                *Correspondence to: Peter C. Austin, Institute for Clinical Evaluative Sciences, G1 06, 2075 Bayview Avenue, Toronto, Ontario, Canada M4N 3M5.

                Contract/grant sponsor: Canadian Institutes of Health Research (CIHR); contract/grant number: MOP 86508

                Article identifiers
                DOI: 10.1002/sim.3697
                PMCID: PMC3472075
                PMID: 19757444
                Copyright © 2009 John Wiley & Sons, Ltd.

                Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.

                History
                Received: 10 July 2008
                Accepted: 16 July 2009

                Categories
                Research Articles
                Biostatistics

                Keywords: propensity-score matching, bias, matching, goodness-of-fit, observational study, balance, standardized difference, propensity score
