Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls

      Most studies have some missing data. Jonathan Sterne and colleagues describe the appropriate use and reporting of the multiple imputation approach to dealing with them.

      Most cited references 14

      Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies.

        Multiple imputation of missing blood pressure covariates in survival analysis.

        This paper studies a non-response problem in survival analysis where the occurrence of missing data in the risk factor is related to mortality. In a study to determine the influence of blood pressure on survival in the very old (85+ years), blood pressure measurements are missing in about 12.5 per cent of the sample. The available data suggest that the process that created the missing data depends jointly on survival and the unknown blood pressure, thereby distorting the relation of interest. Multiple imputation is used to impute missing blood pressure and then analyse the data under a variety of non-response models. One special modelling problem is treated in detail: the construction of a predictive model for drawing imputations if the number of variables is large. Risk estimates for these data appear robust to even large departures from the simplest non-response model, and are similar to those derived under deletion of the incomplete records.

          Derivation and validation of QRISK, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study.

          Objective: To derive a new cardiovascular disease risk score (QRISK) for the United Kingdom and to validate its performance against the established Framingham cardiovascular disease algorithm and a newly developed Scottish score (ASSIGN).
          Design: Prospective open cohort study using routinely collected data from general practice.
          Setting: UK practices contributing to the QRESEARCH database.
          Participants: The derivation cohort consisted of 1.28 million patients, aged 35-74 years, registered at 318 practices between 1 January 1995 and 1 April 2007 and who were free of diabetes and existing cardiovascular disease. The validation cohort consisted of 0.61 million patients from 160 practices.
          Main outcome measures: First recorded diagnosis of cardiovascular disease (incident diagnosis between 1 January 1995 and 1 April 2007): myocardial infarction, coronary heart disease, stroke, and transient ischaemic attacks. Risk factors were age, sex, smoking status, systolic blood pressure, ratio of total serum cholesterol to high density lipoprotein, body mass index, family history of coronary heart disease in first degree relative aged less than 60, area measure of deprivation, and existing treatment with antihypertensive agent.
          Results: A cardiovascular disease risk algorithm (QRISK) was developed in the derivation cohort. In the validation cohort the observed 10 year risk of a cardiovascular event was 6.60% (95% confidence interval 6.48% to 6.72%) in women and 9.28% (9.14% to 9.43%) in men. Overall the Framingham algorithm over-predicted cardiovascular disease risk at 10 years by 35%, ASSIGN by 36%, and QRISK by 0.4%. Measures of discrimination tended to be higher for QRISK than for the Framingham algorithm and it was better calibrated to the UK population than either the Framingham or ASSIGN models. Using QRISK 8.5% of patients aged 35-74 are at high risk (20% risk or higher over 10 years) compared with 13% when using the Framingham algorithm and 14% when using ASSIGN. Using QRISK 34% of women and 73% of men aged 64-75 would be at high risk compared with 24% and 86% according to the Framingham algorithm. UK estimates for 2005 based on QRISK give 3.2 million patients aged 35-74 at high risk, with the Framingham algorithm predicting 4.7 million and ASSIGN 5.1 million. Overall, 53 668 patients in the validation dataset (9% of the total) would be reclassified from high to low risk or vice versa using QRISK compared with the Framingham algorithm.
          Conclusions: QRISK performed at least as well as the Framingham model for discrimination and was better calibrated to the UK population than either the Framingham model or ASSIGN. QRISK is likely to provide more appropriate risk estimates to help identify high risk patients on the basis of age, sex, and social deprivation. It is therefore likely to be a more equitable tool to inform management decisions and help ensure treatments are directed towards those most likely to benefit. It includes additional variables which improve risk estimates for patients with a positive family history or those on antihypertensive treatment. However, since the validation was performed in a similar population to the population from which the algorithm was derived, it potentially has a "home advantage." Further validation in other populations is therefore required.

            Author and article information

            [1] Department of Social Medicine, University of Bristol, Bristol BS8 2PR
            [2] MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 0SR
            [3] Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, and University of Melbourne, Parkville, Victoria 3052, Australia
            [4] Cancer and Statistical Methodology Groups, MRC Clinical Trials Unit, London NW1 2DA
            [5] Medical Statistics Unit, London School of Hygiene and Tropical Medicine, London WC1E 7HT
            [6] Department of Public Health and Primary Care, Institute of Public Health, Cambridge
            Author notes
            Correspondence to: J A C Sterne jonathan.sterne@
            Role: professor of medical statistics and epidemiology
            Role: senior scientist
            Role: director of clinical epidemiology and biostatistics unit
            Role: research associate
            Role: senior scientist
            Role: professor of biostatistics
            Role: lecturer in biostatistics
            Role: reader in medical and social statistics
            BMJ: British Medical Journal
            BMJ Publishing Group Ltd.
            29 June 2009
            Volume 338
            © Sterne et al 2009

            This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

            Research Methods & Reporting
            Headache (including migraine)
            Pain (neurology)


