
      Harmonisation of variables names prior to conducting statistical analyses with multiple datasets: an automated approach

      product-review
      BMC Medical Informatics and Decision Making
      BioMed Central


          Abstract

          Background

          Data requirements by governments, donors and the international community to measure health and development achievements have increased in the last decade. Datasets produced in surveys conducted in several countries and years are often combined to analyse time trends and geographical patterns of demographic and health-related indicators. However, since not all datasets have the same structure, variable definitions and codes, they have to be harmonised before they are submitted to statistical analysis. Manually searching for, renaming and recoding variables is extremely tedious and error-prone, especially when the numbers of datasets and variables are large. This article presents an automated approach to harmonising variable names across several datasets, which optimises the search for variables, minimises manual input and reduces the risk of error.

          Results

          Three consecutive algorithms are applied iteratively to search for each variable of interest for the analyses in all datasets. The first search (A) captures particular cases that could not be solved in an automated way in the search iterations; the second search (B) is run if search A produced no hits and identifies variables whose labels contain certain key terms defined by the user. If this search produces no hits, a third one (C) is run to retrieve variables that have been identified in other surveys, as an illustration. For each variable of interest, the output of these engines can be that (O1) a single best-matching variable is found, (O2) more than one matching variable is found or (O3) no matching variables are found. Output O2 is resolved by user judgement. Examples using four variables are presented, showing that the searches reach 100% sensitivity and specificity after a second iteration.
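The cascade described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's implementation: the abstract does not say how the searches are coded, so the `Variable` class, the function names, the rule that a label must contain all key terms, and the sample variable names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str   # variable name as stored in the dataset (hypothetical)
    label: str  # descriptive label attached to the variable


def search_a(dataset_id, target, special_cases):
    """Search A: look up particular cases recorded by hand."""
    hit = special_cases.get((dataset_id, target))
    return [hit] if hit is not None else []


def search_b(variables, key_terms):
    """Search B: variables whose label contains the user's key terms.

    Assumption: a label must contain every term (case-insensitive).
    """
    terms = [t.lower() for t in key_terms]
    return [v.name for v in variables
            if terms and all(t in v.label.lower() for t in terms)]


def search_c(variables, known_names):
    """Search C: variable names already identified in other surveys."""
    return [v.name for v in variables if v.name in known_names]


def find_variable(dataset_id, variables, target,
                  special_cases, key_terms, known_names):
    """Run A, then B, then C, stopping at the first search with hits."""
    engine, hits = "A", search_a(dataset_id, target, special_cases)
    if not hits:
        engine, hits = "B", search_b(variables, key_terms)
    if not hits:
        engine, hits = "C", search_c(variables, known_names)

    if len(hits) == 1:
        outcome = "O1"  # a single matching variable
    elif hits:
        outcome = "O2"  # several candidates, left to user judgement
    else:
        outcome = "O3"  # nothing found
    return engine, outcome, hits


# Made-up survey variables, for illustration only.
survey = [
    Variable("vac_m", "Received measles vaccine"),
    Variable("vac_m_date", "Date of measles vaccine"),
    Variable("age_m", "Age of child in months"),
]

print(find_variable("survey_1", survey, "measles",
                    {}, ["measles", "received"], set()))
```

In this sketch an O2 or O3 outcome would be handled by the user, for instance by narrowing the key terms or by adding an entry to `special_cases`, and then running the search again, which is one way to read the "second iteration" mentioned in the abstract.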

          Conclusion

          Efficient and tested automated algorithms should be used to support the harmonisation process needed to analyse multiple datasets. This is especially relevant when the numbers of datasets or variables to be included are large.


              Author and article information

              Journal
              Journal title (abbreviated): BMC Med Inform Decis Mak
              Journal title: BMC Medical Informatics and Decision Making
              Publisher: BioMed Central
              ISSN: 1472-6947
              Published: 19 May 2011
              Volume: 11
              Article number: 33
              Affiliations
              [1] Swiss Tropical and Public Health Institute, Socinstrasse 57, Basel 4051, Switzerland
              [2] University of Basel, Basel, Switzerland
              Article
              Publisher ID: 1472-6947-11-33
              DOI: 10.1186/1472-6947-11-33
              PMC ID: 3123542
              PubMed ID: 21595905
              Copyright ©2011 Bosch-Capblanch; licensee BioMed Central Ltd.

              This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

              History
              Received: 10 January 2011
              Accepted: 19 May 2011
              Categories
              Software

              Bioinformatics & Computational biology
