
      An efficient record linkage scheme using graphical analysis for identifier error detection


          Abstract

          Background

          Integration of information on individuals (record linkage) is a key problem in healthcare delivery, epidemiology, and "business intelligence" applications. It is now common to need to link very large numbers of records that often contain various combinations of theoretically unique identifiers, such as NHS numbers, which are both incomplete and error-prone.

          Methods

          We describe a two-step record linkage algorithm. First, identifiers with high cardinality are identified or generated and used to perform an initial linkage based on exact matching. Second, the resulting clusters are examined and, where appropriate, partitioned using a graph-based algorithm that detects erroneous identifiers.
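The two steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Each record is assumed to be a dict mapping an identifier type (hypothetical names such as `nhs`, `mrn`, `hosp2`) to a value. Step one joins records that share any exact (type, value) pair using union-find. Step two substitutes a simple, hypothetical heuristic for the paper's graph-based error detection: a single identifier value is treated as suspect if removing it splits a cluster into parts that each contain at least `min_side` records. The helpers `components`, `split_cluster` and `link` and the `min_side` threshold are illustrative names, not taken from the paper.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def components(rec_ids, records, excluded=frozenset()):
    """Step 1: group records that share any exact (type, value) identifier,
    ignoring identifier keys listed in `excluded`."""
    uf = UnionFind(rec_ids)
    first_seen = {}  # (type, value) -> first record carrying it
    for rid in rec_ids:
        for key in records[rid].items():
            if key in excluded:
                continue
            if key in first_seen:
                uf.union(first_seen[key], rid)
            else:
                first_seen[key] = rid
    groups = defaultdict(list)
    for rid in rec_ids:
        groups[uf.find(rid)].append(rid)
    return list(groups.values())


def split_cluster(cluster, records, min_side=2):
    """Step 2 (hypothetical heuristic): if removing one identifier value
    splits the cluster into parts of at least `min_side` records each,
    treat that value as erroneous and recurse into the parts."""
    keys = {key for rid in cluster for key in records[rid].items()}
    for key in sorted(keys):
        parts = components(cluster, records, excluded=frozenset({key}))
        if len(parts) > 1 and all(len(p) >= min_side for p in parts):
            result = []
            for part in parts:
                result.extend(split_cluster(part, records, min_side))
            return result
    return [cluster]


def link(records, min_side=2):
    """Run both steps over a dict of record id -> {identifier type: value}."""
    final = []
    for cluster in components(list(records), records):
        final.extend(split_cluster(cluster, records, min_side))
    return final


# Toy data with hypothetical identifier types; r5 is assumed to carry a
# mistyped NHS number ("111") that collides with the patient behind r1 and r2.
records = {
    "r1": {"nhs": "111", "mrn": "A1"},
    "r2": {"nhs": "111", "mrn": "A1"},
    "r3": {"nhs": "222", "mrn": "B7", "hosp2": "Z9"},
    "r4": {"nhs": "222", "mrn": "B7"},
    "r5": {"nhs": "111", "mrn": "B7", "hosp2": "Z9"},
}

print(link(records))  # [['r1', 'r2'], ['r3', 'r4', 'r5']]
```

In this toy data, exact matching first places all five records in one cluster; the NHS number `111` on `r5` is then the only link between two otherwise well-connected groups, so the sketch splits there. Requiring both sides to hold at least two records stops a single weakly linked record from triggering a split; a production system would need a better-founded rule. As the abstract notes, records that share no exact identifier value are never joined by this kind of scheme.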

          Results

          The system was used to cluster over 250 million health records from five data sources within a large UK hospital group. Linkage was completed in about 30 minutes and yielded 3.6 million clusters, of which about 99.8% contain, with high likelihood, records from a single patient. Although the algorithm is computationally efficient, it forms clusters only when at least one identifier of each record exactly matches that of another record; this may be a limitation in databases whose records have low identifier quality.
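As a rough back-of-envelope reading of these figures (using the rounded values quoted above, so the results are only approximate), the average cluster size and the number of clusters not judged, with high likelihood, to come from a single patient are:

```latex
\begin{aligned}
\bar{n} &\gtrsim \frac{250 \times 10^{6}\ \text{records}}{3.6 \times 10^{6}\ \text{clusters}} \approx 69\ \text{records per cluster},\\
N_{\text{uncertain}} &\approx (1 - 0.998) \times 3.6 \times 10^{6} = 7.2 \times 10^{3}\ \text{clusters}.
\end{aligned}
```

So even a 0.2% residual corresponds to several thousand clusters that may contain records from more than one patient.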

          Conclusions

          The technique described offers a simple, fast and highly efficient two-step method for the large-scale initial linkage of records of the kind commonly found in the UK's National Health Service.

                Author and article information

                Journal
                BMC Medical Informatics and Decision Making (BMC Med Inform Decis Mak)
                BioMed Central
                ISSN: 1472-6947
                Published: 1 February 2011
                Volume 11, Article 7
                Affiliations
                [1] NIHR Biomedical Research Centre, John Radcliffe Hospital, Oxford, UK
                [2] MRC Clinical Trials Unit, London, UK
                Article
                Article ID: 1472-6947-11-7
                DOI: 10.1186/1472-6947-11-7
                PMCID: PMC3039555
                PMID: 21284874
                Copyright ©2011 Finney et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History: 14 June 2010; 1 February 2011
                Categories
                Research Article

                Bioinformatics & Computational biology
