Optimal Caliper Width for Propensity Score Matching of Three Treatment Groups: A Monte Carlo Study

Abstract

Propensity score matching is a method to reduce bias in non-randomized and observational studies. Propensity score matching is mainly applied to two treatment groups rather than multiple treatment groups, because some key issues affecting its application to multiple treatment groups remain unsolved, such as the matching distance, the assessment of balance in baseline variables, and the choice of optimal caliper width. The primary objective of this study was to compare propensity score matching methods using different calipers and to choose the optimal caliper width for use with three treatment groups. The authors used caliper widths from 0.1 to 0.8 of the pooled standard deviation of the logit of the propensity score, in increments of 0.1. The balance in baseline variables was assessed by standardized difference. The matching ratio, relative bias, and mean squared error (MSE) of the estimate between groups in different propensity score-matched samples were also reported. The results of Monte Carlo simulations indicate that matching using a caliper width of 0.2 of the pooled standard deviation of the logit of the propensity score affords superior performance in the estimation of treatment effects. This study provides practical solutions for the application of propensity score matching of three treatment groups.
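
As a rough illustration of the quantities named in the abstract, the Python sketch below computes a caliper as a fraction of the pooled standard deviation of the logit of the propensity score, the standardized difference of a baseline variable, and the relative bias and MSE of a set of simulated estimates. The function names, the toy inputs, and the rule used to pool over three groups (square root of the mean of the within-group variances) are assumptions made for illustration; they are not taken from the article.

```python
import math
from statistics import mean, variance


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def pooled_sd(groups):
    """Assumed pooling rule: sqrt of the mean of the per-group sample variances."""
    return math.sqrt(mean([variance(g) for g in groups]))


def caliper_width(ps_by_group, fraction=0.2):
    """Caliper = fraction * pooled SD of the logit of the propensity score."""
    logits = [[logit(p) for p in group] for group in ps_by_group]
    return fraction * pooled_sd(logits)


def standardized_difference(x_a, x_b):
    """Absolute standardized difference of a continuous baseline variable."""
    pooled = math.sqrt((variance(x_a) + variance(x_b)) / 2.0)
    return abs(mean(x_a) - mean(x_b)) / pooled


def relative_bias(estimates, true_value):
    """(mean estimate - true value) / true value; true_value must be nonzero."""
    return (mean(estimates) - true_value) / true_value


def mse(estimates, true_value):
    """Mean squared error of the estimates around the true value."""
    return mean([(e - true_value) ** 2 for e in estimates])


# Hypothetical propensity scores for three treatment groups (toy data).
toy_ps = [[0.2, 0.3, 0.4], [0.4, 0.5, 0.6], [0.6, 0.7, 0.8]]

# Candidate calipers from 0.1 to 0.8 of the pooled SD, in increments of 0.1.
fractions = [k / 10 for k in range(1, 9)]
calipers = {f: caliper_width(toy_ps, f) for f in fractions}
```

In a simulation study of this kind, each candidate caliper would be applied to every simulated data set, and the balance and error summaries above would then be compared across calipers.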

Most cited references 9

A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003.

Peter C Austin (2008)

Propensity-score methods are increasingly being used to reduce the impact of treatment-selection bias in the estimation of treatment effects using observational data. Commonly used propensity-score methods include covariate adjustment using the propensity score, stratification on the propensity score, and propensity-score matching. Empirical and theoretical research has demonstrated that matching on the propensity score eliminates a greater proportion of baseline differences between treated and untreated subjects than does stratification on the propensity score. However, the analysis of propensity-score-matched samples requires statistical methods appropriate for matched-pairs data. We critically evaluated 47 articles that were published between 1996 and 2003 in the medical literature and that employed propensity-score matching. We found that only two of the articles reported the balance of baseline characteristics between treated and untreated subjects in the matched sample and used correct statistical methods to assess the degree of imbalance. Thirteen (28 per cent) of the articles explicitly used statistical methods appropriate for the analysis of matched data when estimating the treatment effect and its statistical significance. Common errors included using the log-rank test to compare Kaplan-Meier survival curves in the matched sample, using Cox regression, logistic regression, chi-squared tests, t-tests, and Wilcoxon rank sum tests in the matched sample, thereby failing to account for the matched nature of the data. We provide guidelines for the analysis and reporting of studies that employ propensity-score matching. Copyright (c) 2007 John Wiley & Sons, Ltd.

A comparison of the ability of different propensity score models to balance measured variables between treated and untreated subjects: a Monte Carlo study.

Peter C Austin, Paul Grootendorst, Geoffrey Anderson (2007)

The propensity score, the probability of exposure to a specific treatment conditional on observed variables, is increasingly being used in observational studies. Creating strata in which subjects are matched on the propensity score allows one to balance measured variables between treated and untreated subjects. There is an ongoing controversy in the literature as to which variables to include in the propensity score model. Some advocate including those variables that predict treatment assignment, while others suggest including all variables potentially related to the outcome, and still others advocate including only variables that are associated with both treatment and outcome. We provide a case study of the association between drug exposure and mortality to show that including a variable that is related to treatment, but not outcome, does not improve balance and reduces the number of matched pairs available for analysis. In order to investigate this issue more comprehensively, we conducted a series of Monte Carlo simulations of the performance of propensity score models that contained variables related to treatment allocation, or variables that were confounders for the treatment-outcome pair, or variables related to outcome, or all variables related to either outcome or treatment or neither. We compared the use of these different propensity score models in matching and stratification in terms of the extent to which they balanced variables. We demonstrated that all propensity score models balanced measured confounders between treated and untreated subjects in a propensity-score matched sample. However, including only the true confounders or the variables predictive of the outcome in the propensity score model resulted in a substantially larger number of matched pairs than did using the treatment-allocation model.
Stratifying on the quintiles of any propensity score model resulted in residual imbalance between treated and untreated subjects in the upper and lower quintiles. Greater balance between treated and untreated subjects was obtained after matching on the propensity score than after stratifying on the quintiles of the propensity score. When a confounding variable was omitted from any of the propensity score models, then matching or stratifying on the propensity score resulted in residual imbalance in prognostically important variables between treated and untreated subjects. We considered four propensity score models for estimating treatment effects: the model that included only true confounders; the model that included all variables associated with the outcome; the model that included all measured variables; and the model that included all variables associated with treatment selection. Reduction in bias when estimating a null treatment effect was equivalent for all four propensity score models when propensity score matching was used. Reduction in bias was marginally greater for the first two propensity score models than for the last two propensity score models when stratification on the quintiles of the propensity score model was employed. Furthermore, omitting a confounding variable from the propensity score model resulted in biased estimation of the treatment effect. Finally, the mean squared error for estimating a null treatment effect was lower when either of the first two propensity score models was used compared to when either of the last two propensity score models was used. Copyright 2006 John Wiley & Sons, Ltd.

Some methods of propensity-score matching had superior performance to others: results of an empirical investigation and Monte Carlo simulations.

Peter C Austin (2009)

Propensity-score matching is increasingly being used to reduce the impact of treatment-selection bias when estimating causal treatment effects using observational data. Several propensity-score matching methods are currently employed in the medical literature: matching on the logit of the propensity score using calipers of width either 0.2 or 0.6 of the standard deviation of the logit of the propensity score; matching on the propensity score using calipers of 0.005, 0.01, 0.02, 0.03, and 0.1; and 5 → 1 digit matching on the propensity score. We conducted empirical investigations and Monte Carlo simulations to investigate the relative performance of these competing methods. Using a large sample of patients hospitalized with a heart attack and with exposure being receipt of a statin prescription at hospital discharge, we found that the 8 different methods produced propensity-score matched samples in which qualitatively equivalent balance in measured baseline variables was achieved between treated and untreated subjects. Seven of the 8 propensity-score matched samples resulted in qualitatively similar estimates of the reduction in mortality due to statin exposure. 5 → 1 digit matching resulted in a qualitatively different estimate of relative risk reduction compared to the other 7 methods. Using Monte Carlo simulations, we found that matching using calipers of width of 0.2 of the standard deviation of the logit of the propensity score and the use of calipers of width 0.02 and 0.03 tended to have superior performance for estimating treatment effects. 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Author and article information

Contributors

Robert K. Hills: Role: Editor

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (iso-abbrev): PLoS ONE

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Electronic): 1932-6203

Publication date (Collection): 2013

Publication date (Electronic): 11 December 2013

Volume: 8

Issue: 12

Electronic Location Identifier: e81045

Affiliations

[1] Medical Department, The 309th Hospital of Chinese People's Liberation Army, Beijing, China

[2] Department of Health Statistics, The Fourth Military Medical University, Xi'an, Shaanxi, China

[3] Information Center, School of Stomatology, The Fourth Military Medical University, Xi'an, Shaanxi, China

[4] Department of Gastroenterology, The 309th Hospital of Chinese People's Liberation Army, Beijing, China

Cardiff University, United Kingdom

Author notes

* E-mail: hwcai@fmmu.edu.cn (HC); jielaixia@yahoo.com (JX)

Competing Interests: The authors have declared that no competing interests exist.

Conceived and designed the experiments: JX. Performed the experiments: HC. Analyzed the data: YW. Contributed reagents/materials/analysis tools: LW. Wrote the paper: YW. Conducted the literature review: CL ZJ. Contributed the worked example: JS.

Article

Publisher ID: PONE-D-13-30930

DOI: 10.1371/journal.pone.0081045

PMC ID: 3859481

PubMed ID: 24349029

SO-VID: a904126a-bbdd-48ad-9793-3fe4acc47899

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 19 June 2013

Date accepted: 17 October 2013

Page count

Pages: 7

Funding

This study was supported by research grants 30800952 and 81001290 from the National Natural Science Foundation of China. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
