To investigate the range of methods used to validate diagnoses in the General Practice Research Database (GPRD), to summarize findings and to assess the quality of these validations.
A systematic literature review was performed by searching PubMed and Embase for publications using GPRD data published between 1987 and April 2008. Additional publications were identified from conference proceedings, back issues of relevant journals, bibliographies of retrieved publications and relevant websites. Publications that reported attempts to validate disease diagnoses recorded in the GPRD were included.
We identified 212 publications, often validating more than one diagnosis. In total, 357 validations investigating 183 different diagnoses met our inclusion criteria. Of these, 303 (85%) utilized data from outside the GPRD to validate diagnoses. The remainder utilized only data recorded in the database. The median proportion of cases with a confirmed diagnosis was 89% (range 24–100%). Details of validation methods and results were often incomplete.
A number of methods have been used to assess validity. Overall, estimates of validity were high. However, the quality of reporting of the validations was often inadequate to permit a clear interpretation. Not all methods provided a quantitative estimate of validity and most methods considered only the positive predictive value of a set of diagnostic codes in a highly selected group of cases. We make recommendations for methodology and reporting to strengthen further the use of the GPRD in research.