
      Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size

      research-article


          Abstract

The parameters of logistic regression models are usually obtained by the method of maximum likelihood (ML). However, in analyses of small data sets or data sets with unbalanced outcomes or exposures, ML parameter estimates may not exist. This situation has been termed ‘separation’, as the two outcome groups are separated by the values of a covariate or a linear combination of covariates. To overcome the problem of non-existent ML parameter estimates, Firth’s correction (FC) has been proposed. In practice, however, a principal investigator might be advised to ‘bring more data’ in order to solve a separation issue. We illustrate the problem by means of examples from colorectal cancer screening and ornithology. It is unclear whether such an increasing sample size (ISS) strategy, which keeps sampling new observations until separation is removed, improves estimation compared to applying FC to the original data set. We performed an extensive simulation study whose main focus was to estimate the cost-adjusted relative efficiency of ML combined with ISS compared to FC. FC yielded reasonably small root mean squared errors and proved to be the more efficient estimator. Given our findings, we propose not to adapt the sample size when separation is encountered but to use FC as the default method of analysis whenever the number of observations or outcome events is critically low.
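
To make the separation problem and Firth's correction concrete, here is a minimal sketch (an illustration only, not code from the paper) that fits Firth-penalized logistic regression by Newton-Raphson with the Jeffreys-prior score modification on a made-up, completely separated data set. Plain ML has no finite maximum on these data, while the penalized fit returns finite estimates.

```python
# Minimal sketch, not the authors' code: Firth-penalized logistic regression
# via Newton-Raphson with the Jeffreys-prior score correction, applied to a
# made-up data set with complete separation.
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Fit logistic regression with Firth's correction; returns coefficients."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))     # fitted probabilities
        W = mu * (1.0 - mu)                        # IRLS weights
        XtWX = X.T @ (W[:, None] * X)              # Fisher information
        XtWX_inv = np.linalg.inv(XtWX)
        # Diagonal of the hat matrix H = W^{1/2} X (X'WX)^{-1} X' W^{1/2}
        h = np.einsum('ij,jk,ik->i', X, XtWX_inv, X) * W
        # Firth-modified score: X' (y - mu + h (1/2 - mu))
        step = XtWX_inv @ (X.T @ (y - mu + h * (0.5 - mu)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data with complete separation: x <= 3 gives y = 0, x >= 4 gives y = 1.
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([0., 0., 0., 1., 1., 1.])
X = np.column_stack([np.ones_like(x), x])          # intercept + covariate

print("Firth estimates (finite despite separation):", firth_logistic(X, y))
# Ordinary ML has no finite maximum here: the slope estimate grows without
# bound, which is exactly the situation the abstract calls 'separation'.
```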

Most cited references (11)


          On the existence of maximum likelihood estimates in logistic regression models


            Firth's logistic regression with rare events: accurate effect estimates and predictions?

Firth's logistic regression has become a standard approach for the analysis of binary outcomes with small samples. Whereas it reduces the bias in maximum likelihood estimates of coefficients, bias towards one-half is introduced in the predicted probabilities. The stronger the imbalance of the outcome, the more severe is the bias in the predicted probabilities. We propose two simple modifications of Firth's logistic regression resulting in unbiased predicted probabilities. The first corrects the predicted probabilities by a post hoc adjustment of the intercept. The other is based on an alternative formulation of Firth's penalization as an iterative data augmentation procedure. Our suggested modification consists in introducing an indicator variable that distinguishes between original and pseudo-observations in the augmented data. In a comprehensive simulation study, these approaches are compared with other attempts to improve predictions based on Firth's penalization and with other published penalization strategies intended for routine use. For instance, we consider a recently suggested compromise between maximum likelihood and Firth's logistic regression. Simulation results are scrutinized with regard to prediction and effect estimation. We find that both of our suggested methods not only give unbiased predicted probabilities but also improve accuracy conditional on explanatory variables compared with Firth's penalization. While one method results in effect estimates identical to those of Firth's penalization, the other introduces some bias, but this is compensated by a decrease in the mean squared error. Finally, all methods considered are illustrated and compared for a study on arterial closure devices in minimally invasive cardiac surgery. Copyright © 2017 John Wiley & Sons, Ltd.
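
As a rough illustration of the post hoc intercept adjustment described above (a sketch of the idea only, not the published method's code; the helper name flic_intercept_shift and the toy numbers are hypothetical), one can keep the penalized slope estimates fixed and shift the intercept so that the mean predicted probability matches the observed event rate.

```python
# Minimal sketch of the post hoc intercept correction described above; the
# function name and toy numbers are hypothetical, not the published method's code.
import numpy as np
from scipy.optimize import brentq

def flic_intercept_shift(eta, y):
    """Find the shift delta such that the mean predicted probability equals the event rate."""
    def excess(delta):
        p = 1.0 / (1.0 + np.exp(-(eta + delta)))
        return p.sum() - y.sum()                   # zero when calibrated in the mean
    return brentq(excess, -30.0, 30.0)             # generous bracket for the root

# eta: linear predictors from some penalized fit; y: observed binary outcomes.
eta = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1])
delta = flic_intercept_shift(eta, y)
p_corrected = 1.0 / (1.0 + np.exp(-(eta + delta)))
print("intercept shift:", round(delta, 3),
      "| mean corrected probability:", round(p_corrected.mean(), 3),
      "| event rate:", y.mean())
```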

              Separation in Logistic Regression: Causes, Consequences, and Control.

              Separation is encountered in regression models with a discrete outcome (such as logistic regression) where the covariates perfectly predict the outcome. It is most frequent under the same conditions that lead to small-sample and sparse-data bias, such as presence of a rare outcome, rare exposures, highly correlated covariates, or covariates with strong effects. In theory, separation will produce infinite estimates for some coefficients. In practice, however, separation may be unnoticed or mishandled because of software limits in recognizing and handling the problem and in notifying the user. We discuss causes of separation in logistic regression and describe how common software packages deal with it. We then describe methods that remove separation, focusing on the same penalized-likelihood techniques used to address more general sparse-data problems. These methods improve accuracy, avoid software problems, and allow interpretation as Bayesian analyses with weakly informative priors. We discuss likelihood penalties, including some that can be implemented easily with any software package, and their relative advantages and disadvantages. We provide an illustration of ideas and methods using data from a case-control study of contraceptive practices and urinary tract infection.
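
The 'infinite estimates' consequence of separation is easy to reproduce numerically. The sketch below (an illustration only, with a made-up separated data set, not material from the paper) runs plain maximum-likelihood Newton/IRLS iterations and prints the slope estimate and its standard error, both of which keep growing.

```python
# Illustration only (not from the paper): plain maximum-likelihood Newton/IRLS
# iterations on completely separated data. Slope and standard error both diverge.
import numpy as np

x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([0., 0., 0., 1., 1., 1.])             # x perfectly predicts y
X = np.column_stack([np.ones_like(x), x])

beta = np.zeros(2)
for it in range(1, 26):
    eta = np.clip(X @ beta, -30, 30)               # clip to avoid overflow in exp
    mu = 1.0 / (1.0 + np.exp(-eta))
    W = mu * (1.0 - mu)
    XtWX = X.T @ (W[:, None] * X)
    beta = beta + np.linalg.solve(XtWX, X.T @ (y - mu))
    if it % 5 == 0:
        se_slope = np.sqrt(np.linalg.inv(XtWX)[1, 1])
        print(f"iteration {it:2d}: slope = {beta[1]:7.2f}, SE(slope) = {se_slope:12.2f}")
# The slope keeps increasing and its standard error explodes -- the symptom
# that penalized-likelihood methods such as Firth's correction are meant to fix.
```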

                Author and article information

Journal: International Journal of Environmental Research and Public Health (Int J Environ Res Public Health; ijerph)
Publisher: MDPI
ISSN: 1661-7827 (print); 1660-4601 (electronic)
Published: 22 November 2019 (December 2019 issue)
Volume: 16, Issue: 23, Article: 4658
Affiliations: Institute of Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems (CEMSIIS), Spitalgasse 23, 1090 Vienna, Austria; hana.sinkovec@meduniwien.ac.at (H.Š.); angelika.geroldinger@meduniwien.ac.at (A.G.)
Author information: https://orcid.org/0000-0002-4659-4911; https://orcid.org/0000-0003-1147-8491
Article ID: ijerph-16-04658
DOI: 10.3390/ijerph16234658
PMCID: PMC6926877
PMID: 31766753
Record ID: 6d757ed0-2d11-4222-961f-6f7eed3a356b
                © 2019 by the authors.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

History
Received: 25 October 2019
Accepted: 20 November 2019
Categories: Article
Subject: Public health
Keywords: maximum likelihood estimation, logistic regression, Firth's correction, separation, penalized likelihood, bias
