Harmonisation of variables names prior to conducting statistical analyses with multiple datasets: an automated approach

Abstract

Background

Data requirements by governments, donors and the international community to measure health and development achievements have increased in the last decade. Datasets produced in surveys conducted in several countries and years are often combined to analyse time trends and geographical patterns of demographic and health-related indicators. However, since not all datasets share the same structure, variable definitions and codes, they have to be harmonised before statistical analysis. Manually searching, renaming and recoding variables is extremely tedious and error-prone, especially when the number of datasets and variables is large. This article presents an automated approach to harmonising variable names across several datasets, which optimises the search for variables, minimises manual input and reduces the risk of error.

Results

Three consecutive algorithms are applied iteratively to search all datasets for each variable of interest. The first search (A) captures particular cases that could not be solved automatically in the search iterations. The second search (B) is run if search A produces no hits and identifies variables whose labels contain certain key terms defined by the user. If this search produces no hits, a third search (C) is run to retrieve variables that have been identified in other surveys, as an illustration. For each variable of interest, the output of these engines can be (O1) a single best-matching variable is found, (O2) more than one matching variable is found or (O3) no matching variable is found. Output O2 is resolved by user judgement. Examples using four variables show that the searches reach 100% sensitivity and specificity after a second iteration.
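The cascade of searches A, B and C and the three possible outputs can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Dataset` class, the `search_variable` function and all parameter names are hypothetical, and it rests on three assumptions that the abstract does not confirm. Search A is modelled as a user-maintained lookup of manually resolved cases, search B requires every key term to appear in the variable label (case-insensitive), and search C matches variable names already identified in other surveys.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Dataset:
    """Hypothetical container: a survey name plus its variable labels."""
    name: str
    labels: Dict[str, str]  # variable name -> descriptive label


def search_variable(
    dataset: Dataset,
    special_cases: Dict[str, str],  # assumption: dataset name -> variable chosen by hand
    key_terms: List[str],           # user-defined terms for search B
    known_names: Set[str],          # names already identified in other surveys
) -> Tuple[str, List[str]]:
    """Run searches A, B and C in order and classify the result as O1, O2 or O3."""
    # Search A: particular cases resolved manually by the user.
    hits: List[str] = []
    manual = special_cases.get(dataset.name)
    if manual is not None and manual in dataset.labels:
        hits = [manual]

    # Search B: only if A produced no hits; assumption: all key terms must match.
    if not hits:
        hits = [
            var
            for var, label in dataset.labels.items()
            if all(term.lower() in label.lower() for term in key_terms)
        ]

    # Search C: only if B produced no hits; reuse names found in other surveys.
    if not hits:
        hits = [var for var in dataset.labels if var in known_names]

    if len(hits) == 1:
        return "O1", hits  # a single best-matching variable
    if len(hits) > 1:
        return "O2", hits  # several candidates, left to user judgement
    return "O3", hits      # no matching variable
```

In a second iteration, names returned with output O1 could be added to `known_names`, and cases that the user resolved by hand could be added to `special_cases`. This is one plausible reading of how the iterations feed each other, not a detail stated in the abstract.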

Conclusion

Efficient and tested automated algorithms should be used to support the harmonisation process needed to analyse multiple datasets. This is especially relevant when the numbers of datasets or variables to be included are large.

Author and article information

Journal

Journal ID (nlm-ta): BMC Med Inform Decis Mak

Title: BMC Medical Informatics and Decision Making

Publisher: BioMed Central

ISSN (Electronic): 1472-6947

Publication date Collection: 2011

Publication date (Electronic): 19 May 2011

Volume: 11

Page: 33

Affiliations

[1] Swiss Tropical and Public Health Institute, Socinstrasse 57, Basel 4051, Switzerland

[2] University of Basel, Basel, Switzerland

Article

Publisher ID: 1472-6947-11-33

DOI: 10.1186/1472-6947-11-33

PMC ID: 3123542

PubMed ID: 21595905

SO-VID: 9c2d3286-2f03-43c2-bd82-b52ee18d2584

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 10 January 2011

Date accepted: 19 May 2011
