What is the INDEPTH Network?
The International Network for the Demographic Evaluation of Populations and their
Health (INDEPTH) Network is an umbrella organization for a group of independent health
research centres operating health and demographic surveillance system (HDSS) sites
in low- and middle-income countries (LMICs). Founded in 1998, it brought together
a number of existing HDSS sites, and since then has encouraged newer HDSS sites to
join.
1
The purpose of this Editorial is to set the scene for a series of profiles from INDEPTH
HDSS member sites, the first examples of which are published in this edition of IJE.
2–5
All these profiles will follow a set pattern, to facilitate a systematic understanding
of the multiplicity of HDSS sites involved in the Network and the various ways in
which they are operated by their parent institutions. This Editorial therefore follows
the same general pattern as the individual profiles, but seeks to explore the epidemiological
basis on which the HDSSs operate in general, and the role of the Network, rather than
dealing with site-specific issues.
At the central level, the INDEPTH Network operates from its base in Accra, Ghana,
as an international NGO and is also registered as a not-for-profit entity in the USA.
The emphasis on the Network’s position as a Southern-led and -based organization was
an important founding tenet, and this is very welcome in a world where vestiges of
colonialism still occasionally surface in relation to health data and policy. Day-to-day
operations are led by the Executive Director (O.S.), and governance and oversight
are provided by an international Board of Trustees and a Scientific Advisory Committee
(chaired by P.B.).
Why was the INDEPTH Network set up and what does it cover now?
The raison d’être behind the emergence of the Network was the apparently intractable
lack of reliable population-based data on health across many LMICs in Africa, Asia
and Oceania. Recognizing that there are no quick fixes in terms of achieving universal
individual registration of populations in LMICs,
6
the Network represents a medium-term attempt to break the link between material and
data poverty.
7
Epidemiology in many LMICs suffers from a dual lack of reliable population data and
human capacity to make use of them. The immediate consequence is that health policy
making often lacks its essential evidence base, with the possible effect of failing
to use scarce resources effectively in some of the world’s poorest countries.
There are considerable global disparities in terms of epidemiological research output
per population. Figure 1 shows the countries of the world shaded by a crude measure
of this, namely the number of PubMed hits for a search on (‘epidemiology’ and <country>)
per 1000 population. Much of Africa and Asia falls under the level of 0.05 per 1000,
which is less than one-twentieth of the rate in some of the world’s
leading countries in terms of epidemiological output. Superimposed on the map in Figure
1 are the current 43 HDSS sites run by 36 member centres of the INDEPTH Network. Although
the locations of these sites are somewhat serendipitous, rather than being strategically
planned, it is evident that there is considerable coverage across the areas of the
world that lack substantial epidemiological output. Thus, it is clear that the INDEPTH
Network, through these 43 sites in 20 countries, collectively following a population
of 3.2 million people, does indeed offer possibilities for filling some of the global
gaps in epidemiology.
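The crude measure used for Figure 1 is a simple ratio, and a minimal sketch of the arithmetic is given here. The function name and the example figures are hypothetical and are not taken from the data behind the figure; only the 0.05 per 1000 threshold comes from the text.

```python
def hits_per_1000(pubmed_hits: int, population: int) -> float:
    """PubMed hits for ('epidemiology' and <country>) per 1000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * pubmed_hits / population


# Hypothetical country: 500 hits, 20 million inhabitants.
rate = hits_per_1000(500, 20_000_000)

# The text uses 0.05 per 1000 as the level under which much of Africa and Asia falls.
below_threshold = rate < 0.05
```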
Figure 1
Countries of the world classified by PubMed citations for (‘epidemiology’ and <country>)
per 1000 population, also showing the location of 43 HDSS site members of the INDEPTH
Network (white dots)
Where are the INDEPTH HDSSs?
From the outset, the INDEPTH Network has operated by accepting as members already
functioning independent health research centres that run HDSSs. Therefore, the Network
has little influence over the locations or geographical distribution of member HDSS
sites. However, since the concept of an HDSS would be somewhat irrelevant in countries
with universal population registration, in practice there is self-selection of site
locations in places where the lack of other reliable population-based data justifies
the considerable effort involved in launching an HDSS. As is evident from Figure 1,
this means that HDSS sites are located across Africa, Asia and Oceania, but by no
means randomly. Several countries contain multiple HDSS sites, whereas many epidemiologically
poor countries contain none.
What populations are covered by the HDSSs and how are they followed up?
HDSSs set out to collect epidemiological data (risks, exposures and outcomes) within
a defined population on a longitudinal basis. In terms of Pearce’s classification
scheme for epidemiological study designs,
8
this places HDSSs as representing ‘the most comprehensive approach since they use
all of the available information on the source population over the risk period’.
Unlike many epidemiological study designs, in which study participants are somehow
selected to represent particular population subgroups, HDSSs generally set out to
cover a real-life population and see what happens epidemiologically over a period
of years and even decades. Issues of representativity and sampling are nevertheless
critical considerations for all HDSSs, and need to be considered at the outset, when
often little is known about potential target populations. Many HDSSs have started
from intentions of covering an area that is at least subjectively thought to be typical
of wider areas, maybe up to national levels. A chicken-and-egg situation arises, however,
in that the motivation for having an HDSS is driven by a recognized lack of population-based
health data, so that at the outset, very little may be known about candidate areas
and maybe even less about the wider situation. There are no simple solutions to this
conundrum.
Even after identifying a target area for an HDSS, there are a number of possible design
considerations. A range of different sampling strategies can be used within the target
area, each with both epidemiological and practical implications.
9
In practical terms, one important consideration is whether the final population is
defined as being within a contiguous area or in a collection of small areas (e.g.
discrete villages) within a wider area. This has important logistic implications in
terms of organizing and maintaining on-going surveillance, as well as affecting the
definition of migration events (see below). The independent INDEPTH HDSSs naturally
include a mixture of approaches to initially identifying target areas, within-area
sampling and population contiguity.
The overall size of the population within an HDSS is a further important factor, as
is the case in any epidemiological study. However, an HDSS is not a classic sample
survey, and so determining the size of the target population is not straightforward.
Size is of course driven by considerations of the rarest event(s) of interest, which
for most HDSSs are mortality-related outcomes. If specific causes of mortality are
of particular concern, then the overall population size needs to be based on numbers
relating to the nth ranked cause of interest.
10
Current INDEPTH member HDSS sites range in population size from tens of thousands
up to around a quarter of a million. In most HDSSs the overall numbers are driven
by mortality outcomes, with the result that surveillance of particular more common
outcomes (such as morbidity and social measures) may in some situations be more effectively
undertaken using a sample drawn from within the overall HDSS population.
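As a rough illustration of how the rarest outcome of interest drives population size, the sketch below estimates the population needed to expect a given number of deaths from one cause. It is a back-of-envelope calculation under assumed inputs, not the method of the cited reference; the function name, parameters and example figures are all hypothetical.

```python
import math


def required_population(target_deaths: int, crude_death_rate: float,
                        cause_fraction: float, years: float = 1.0) -> int:
    """Population needed to expect `target_deaths` from a single cause.

    crude_death_rate: all-cause deaths per person per year (assumed known).
    cause_fraction:   share of all deaths due to the cause of interest.
    years:            length of the surveillance period considered.
    """
    expected_per_person = crude_death_rate * cause_fraction * years
    if expected_per_person <= 0:
        raise ValueError("rates, fraction and years must all be positive")
    return math.ceil(target_deaths / expected_per_person)


# Hypothetical setting: 10 deaths per 1000 per year, a cause responsible for
# 2% of deaths, and a wish to observe about 50 such deaths within 3 years.
n = required_population(50, 0.010, 0.02, 3)
```

The rarer the cause (a smaller `cause_fraction`), the larger the population or the longer the follow-up needed, which is why common outcomes can often be studied in a subsample.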
During the life of the INDEPTH Network, the technological and methodological possibilities
for obtaining and using geographical data have advanced considerably, to the point
where recording the latitude and longitude of every residential unit, and other salient
features, in an HDSS using global positioning system (GPS) technology has become
commonplace.
Once an HDSS population is defined, an initial detailed census is usually undertaken
to capture details of all residents and the social units in which they live. This
usually involves assigning unique identifiers to all the residents and social units
encountered in the census, using a numbering system that has sufficient capacity for
expansion to reflect the addition of future residents and social units. It is not
simple to arrive at generic definitions of social units across cultures and traditions,
and individual HDSSs have to handle these issues in ways that make sense for their
own context, both for physical structures (housing) and groups of inhabitants (families).
INDEPTH has tried to standardize definitions as far as possible by publishing a resource
kit for HDSS design on its website. This initial census then forms the basis of a
database system that is updated on a regular basis to reflect the dynamic cohort of
people living within the HDSS, as conceptualized in Figure 2. An important consideration
is to determine the modality of the regular update rounds. Since HDSSs operate by
definition in populations that are not otherwise enumerated, and generally have weak
infrastructures, the norm is that local staff have to be recruited to undertake regular
update visits to all the social units in the defined area. This forms a major component
of the ongoing effort of running an HDSS, and consequently issues such as the frequency
of update rounds need to be considered very carefully. Different INDEPTH HDSSs use
various update frequencies, from one to four rounds per year. Certain types of events,
e.g. neonatal mortality, are likely to be particularly sensitive to recall bias, which
in turn is related to update frequency. Thus, it tends to be the case that more frequent
updates are needed in high mortality or high migration settings, whereas in societies
that are more stable, or at later stages of demographic transition, less frequent
updates may prove adequate.
Figure 2
Conceptual structure of the dynamic cohort model used by INDEPTH Health and Demographic
Surveillance System (HDSS) sites
What is being measured and how are the HDSS databases constructed?
Having set up an HDSS, the next challenge is to track the progress of the dynamic
cohort shown in Figure 2 by regularly updating a series of core parameters, detailed
below. Naturally, the operation of an HDSS is not confined only to these core activities,
and most HDSSs will have specific agendas defining what other parameters they may
need to handle, e.g. in relation to the epidemiology of specific diseases, the execution
of clinical trials, monitoring the effectiveness of health systems and other important
issues that can be built onto the basic HDSS platform.
Social units
Keeping track of social units is a challenging issue, since it involves both physical
structures (that can be newly built, in existence or be demolished) and the family
groups associated with physical structures (that can migrate in or out as complete
groups, or particular individuals can migrate to join or leave a group). In some cultures
the physical structures may be large and complex compounds, perhaps housing up to
100 people and possibly containing subunits based on a polygamous social structure.
At the other end of the spectrum, nuclear families may occupy small, discrete dwellings.
Many HDSSs also aim to gather data on socio-economic status, often reflected by a
basket of parameters including details of the physical structure, as well as ownership
of traditional and modern assets.
Births
Capturing details of new births is a critical function of any HDSS, since births form
a major part of new entrants to the cohort and are critical to any analyses of fertility.
In some settings, traditional behaviours around childbirth (e.g. going to stay at
the maternal grandmother’s residence for the birth and neonatal period) may make births
more difficult to record accurately. There is a particular difficulty around detecting
early neonatal deaths, and separating these reliably from intra-partum stillbirths,
and this becomes more difficult with less frequent update rounds.
Migrations
Tracking details of migration patterns is one of the most complex areas in HDSSs,
fundamentally comprising people moving into the surveillance area, within the area
and out of the area. Many of these complexities are reflected in INDEPTH’s monograph
on migration.
11
Every type of migration needs to be defined by rules (involving duration, intent,
destination, etc.) which are appropriate to the population concerned. Some communities
experience regular patterns of seasonal migration, related to employment or agricultural
production. The possibility of multiple moves per individual over a period of time
must be incorporated, and a further challenge can be the reliable re-identification
of an individual on in-migration as being the same person who previously moved out.
The design of an HDSS site in terms of the contiguity of the surveyed population is
also important, since local moves in a non-contiguous population may be classified
as in- and out-migrations, whereas similar moves in a contiguous area would amount
to within-site migrations.
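The effect of contiguity on migration classification can be sketched as follows. This is a simplified illustration that looks only at whether the origin and destination are under surveillance; real site rules also involve duration and intent, which are omitted here. The function and the village labels are hypothetical.

```python
from typing import Set


def classify_move(origin: str, destination: str, surveilled: Set[str]) -> str:
    """Classify a change of residence relative to the surveilled locations."""
    origin_in = origin in surveilled
    destination_in = destination in surveilled
    if origin_in and destination_in:
        return "within-site"
    if origin_in:
        return "out-migration"
    if destination_in:
        return "in-migration"
    return "outside"


# Hypothetical villages A, B and C lying next to each other.
contiguous_site = {"A", "B", "C"}   # whole area under surveillance
non_contiguous_site = {"A", "C"}    # only selected villages under surveillance

same_move_contiguous = classify_move("A", "B", contiguous_site)
same_move_non_contiguous = classify_move("A", "B", non_contiguous_site)
```

The same physical move from A to B is a within-site move in the first design and an out-migration in the second.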
Deaths
Deaths, documented by age and sex, are a critical outcome measure for every HDSS and,
in addition to reporting basic mortality rates, are an essential component in formulating
life tables and other demographic measures for HDSS populations. As noted above, one
of the most difficult issues involves reliably identifying early neonatal deaths.
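A minimal sketch of the two basic calculations mentioned here follows: a mortality rate per 1000 person-years, and the conversion of an age-specific rate into the probability of dying that a life table needs. The conversion is a standard demographic approximation chosen for illustration, not one prescribed by INDEPTH, and the function names and figures are hypothetical.

```python
from typing import Optional


def death_rate_per_1000(deaths: int, person_years: float) -> float:
    """Deaths per 1000 person-years of observation."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000 * deaths / person_years


def probability_of_dying(rate: float, interval_years: float,
                         avg_years_lived: Optional[float] = None) -> float:
    """Convert an age-specific death rate into a life-table probability.

    Uses q = n*m / (1 + (n - a)*m), where a is the average time lived in
    the interval by those who die in it. If a is not given, deaths are
    assumed to occur on average at mid-interval (a = n / 2).
    """
    if avg_years_lived is None:
        avg_years_lived = interval_years / 2
    return interval_years * rate / (1 + (interval_years - avg_years_lived) * rate)


# Hypothetical age group: 120 deaths over 10 000 person-years, 5-year interval.
rate = death_rate_per_1000(120, 10_000)
q = probability_of_dying(rate / 1000, 5)
```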
Causes of death
Identifying the causes of death is a much more difficult issue in populations where
most deaths do not occur in health facilities. The only realistic approach to attributing
the cause of death is by carrying out verbal autopsy (VA) interviews with relatives
or caretakers of deceased individuals, and then using those data to arrive at a likely
cause of death. The INDEPTH Network was closely associated with developing a WHO standard
instrument for VA interviews.
12
In many HDSSs, VA data have been interpreted by local physicians, often more than
one per case, in order to arrive at a consensus cause. However, this is an expensive
and time-consuming process that is gradually being superseded for most purposes by
the application of computer-based probabilistic models.
13
INDEPTH is currently part of a new round of VA tool development in conjunction with
WHO, which aims to simplify and shorten the VA process, as well as moving the scope
of VA beyond research settings into non-enumerated populations.
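To give a flavour of what a probabilistic approach to VA interpretation involves, the sketch below applies Bayes' rule to a set of reported indicators. It is a generic illustration that treats indicators as conditionally independent; it does not reproduce any published VA model, and the causes, indicators and probabilities are invented for the example.

```python
from typing import Dict, Iterable


def update_cause_probabilities(prior: Dict[str, float],
                               likelihood: Dict[str, Dict[str, float]],
                               indicators: Iterable[str]) -> Dict[str, float]:
    """Return posterior cause probabilities given reported VA indicators.

    prior:      P(cause) before the interview is considered.
    likelihood: likelihood[indicator][cause] = P(indicator reported | cause).
    """
    posterior = dict(prior)
    for indicator in indicators:
        for cause in posterior:
            posterior[cause] *= likelihood[indicator][cause]
        total = sum(posterior.values())
        if total == 0:
            raise ValueError("indicators are impossible under every cause")
        # Renormalize so that the probabilities again sum to 1.
        posterior = {cause: p / total for cause, p in posterior.items()}
    return posterior


# Invented figures for two causes and one indicator.
prior = {"malaria": 0.5, "injury": 0.5}
likelihood = {"fever": {"malaria": 0.9, "injury": 0.1}}
posterior = update_cause_probabilities(prior, likelihood, ["fever"])
```

Ranking the posterior values gives a set of likely causes with associated likelihoods, rather than a single physician-assigned cause.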
Databases
Maintaining a database that reflects all the details of the population in a dynamic
cohort is one of the most demanding tasks for most HDSSs, and a range of different
approaches are used. The longitudinal nature of the HDSS data demands the use of relational
database management systems (RDBMS) to handle the considerable volume of data involved
over long periods of time. The basic principles of implementing an RDBMS for an HDSS
have not changed fundamentally since the 1980s, when one of the longest-standing INDEPTH
member HDSS sites made the transition to an RDBMS.
14
However, appropriate hardware and software resources have progressed through several
generations of development in the meantime, and that is reflected in the current range
of implementations across the INDEPTH Network. These include implementations built
on proprietary RDBMS systems such as Microsoft FoxPro™, Microsoft Access™ and Structured
Query Language (SQL), as well as generic systems made available for the use of HDSS
sites, such as the Household Registration System from the Population Council,
15
subsequently re-engineered as the paperless SQL-based ‘Open-HDS’. As commercial hardware
and software specifications move on (e.g. Microsoft’s decision to cease supporting
FoxPro™), long-term HDSS operations are sometimes forced to migrate their database
operations onto new platforms, which is not a trivial matter for long-term databases
linked to live surveillance.
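The relational structure implied by a dynamic cohort can be sketched with a few linked tables. The example below uses SQLite from the Python standard library purely for illustration; the table and column names are hypothetical and do not describe the schema of the Household Registration System, Open-HDS or any member site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (
    location_id INTEGER PRIMARY KEY,
    label       TEXT
);
CREATE TABLE individual (
    individual_id INTEGER PRIMARY KEY,
    sex           TEXT,
    dob           TEXT            -- ISO 8601 date
);
CREATE TABLE residency (
    individual_id INTEGER REFERENCES individual(individual_id),
    location_id   INTEGER REFERENCES location(location_id),
    start_date    TEXT NOT NULL,  -- ISO 8601 date
    end_date      TEXT            -- NULL while the episode is still open
);
""")

conn.execute("INSERT INTO location VALUES (1, 'compound A')")
conn.execute("INSERT INTO individual VALUES (1, 'F', '1980-05-01')")
conn.execute("INSERT INTO individual VALUES (2, 'M', '2011-02-10')")
conn.execute("INSERT INTO residency VALUES (1, 1, '2010-01-01', NULL)")
conn.execute("INSERT INTO residency VALUES (2, 1, '2011-02-10', '2011-09-30')")


def residents_on(connection: sqlite3.Connection, day: str) -> list:
    """Return the ids of individuals resident on a given ISO date."""
    rows = connection.execute(
        "SELECT individual_id FROM residency "
        "WHERE start_date <= ? AND (end_date IS NULL OR end_date >= ?) "
        "ORDER BY individual_id",
        (day, day),
    )
    return [row[0] for row in rows]
```

Storing residence as dated episodes, rather than as a current address, is what allows the population at risk to be reconstructed for any past date.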
Ethical issues
Running an HDSS over a long period raises a range of ethical issues that are different
in some respects from those pertaining to many epidemiological studies. In the first
place, the core HDSS data on vital events that are routinely collected in an HDSS
population tend to be considered as research data, and subject to research ethics
approval and informed consent, even though in countries that implement universal vital
registration, it is regarded as a civic duty or even a legal obligation to provide
such data. But, however population data are viewed, there are essential standards
of confidentiality and anonymity that must be safeguarded. In HDSS data, there are
three particularly critical types of data in this respect. Individual identities (whether
by name or some other identifier) have to be protected at all stages of the process—from
field interviewers observing adequate standards of confidentiality through database
systems (and their backups) being held securely, to not revealing identifiers in any
data sharing or outputs. Closely coupled with this, since HDSSs now commonly collect
the GPS locations of households, it is important to also regard these data as confidential,
since in principle they can be used to identify and locate households, and thereby
their residents. Anonymizing GPS data is a much more difficult issue than simply removing
names from a database.
16
Third, HDSS databases typically accumulate a large volume of personal, often medical,
data (such as HIV status) that are sensitive and must be kept confidential.
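One common way of keeping internal identifiers out of shared data is to replace them with unrelated identifiers and to hold the mapping privately at the contributing centre. The sketch below illustrates that idea only; the function name, the identifier format and the size of the identifier space are assumptions, and it does not address the harder problem of anonymizing GPS locations.

```python
import secrets
from typing import Dict, Iterable


def pseudonymize(internal_ids: Iterable[str]) -> Dict[str, int]:
    """Map each internal identifier to a random, unique shared identifier.

    The returned mapping must stay with the contributing centre; only the
    random identifiers should appear in shared data sets.
    """
    mapping: Dict[str, int] = {}
    used = set()
    for internal in internal_ids:
        if internal in mapping:
            continue  # the same person always gets the same shared identifier
        candidate = secrets.randbelow(10**9)
        while candidate in used:
            candidate = secrets.randbelow(10**9)
        used.add(candidate)
        mapping[internal] = candidate
    return mapping


# Hypothetical internal identifiers; the repeated one is mapped only once.
mapping = pseudonymize(["HH01-003", "HH01-004", "HH01-003"])
```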
Key findings and publications
Outputs from the INDEPTH Network mentioned here comprise those that are based on data
from more than one HDSS site, or which make external comparisons. The individual HDSS
site profile papers will provide further details of site-specific outputs. The INDEPTH
Network website (www.indepth-network.org) provides information about the Network,
its organization and current activities.
One of the clear strengths of a network such as INDEPTH is its potential to collate
data from member HDSS sites into outputs that enable systematic comparisons to be
made. The first major INDEPTH output was a monograph published in 2002 that outlined
basic HDSS concepts and gave details of 22 HDSS site members at that time.
17
Two further monographs relating to health equity in small areas
18
and migration
11
followed in 2005 and 2009, respectively. In a different format, using a supplement
in an open-access journal, three sets of multi-site papers were published in 2009–10.
The first related to cross-site findings on non-communicable disease risk factors
from a group of INDEPTH member HDSS sites in Asia.
19–27
The second related to mortality clustering across a range of INDEPTH member HDSS sites
28–36
and the third to results from eight INDEPTH member HDSS sites, which participated
in the WHO–SAGE programme on ageing.
37–46
This last Supplement represented an innovation for the INDEPTH Network, with the combined
dataset used for the analyses also being published online together with the papers.
Publications based on these public-domain data are now emerging.
47
A number of other papers have considered particular issues at the Network level.
48–53
In addition, there have been some outputs that have involved inter-site collaborations
but not included wide representation across the Network.
54–59
In some cases, multiple INDEPTH members are also members of other research networks
such as the RTS,S Clinical Trials Partnership
60
and the Alpha Network.
61
Several other studies have made comparisons between HDSS data from single INDEPTH
HDSS sites and other sources.
62–65
Future analysis plans
As well as the substantial and continuing volume of outputs from individual HDSS sites,
the INDEPTH Network will continue to produce multi-site outputs in particular topic
areas. Current priorities include comparative assessments of fertility and cause-specific
mortality patterns, as well as retrospective analyses of HDSS data against correspondingly
timed weather data, which offer insights into the possible future population effects
of changes in climatic conditions.
Strengths and weaknesses
HDSS sites represent an inherently strong epidemiological design, giving considerably
greater analytical scope than can be achieved from e.g. cross-sectional approaches.
However, the resources required to run an HDSS effectively are very considerable,
particularly since the greatest gaps in health data are generally found in more logistically
challenged environments. Not least this makes it very difficult for many HDSS sites
to recruit and retain highly competent personnel, particularly those with experience
in database management and epidemiological analysis, with the result that HDSS sites
sometimes find it difficult to maximize their outputs.
A recurrent issue that arises in considering HDSS data is how the site populations
are, or are not, representative of the wider surrounding populations. Although this
does not pose any technical issues in terms of analysing data within an HDSS site,
it is of concern when it comes to extrapolating HDSS data to wider epidemiological
and policy arenas. There are no simple solutions to this issue, since HDSSs are always
located in places where little is known about the surrounding population. It is possible
to make comparisons with other data sources, such as national censuses and cluster
sample surveys,
62–65
but these sources come with their own disadvantages such as greater recall bias, and
hence it is very difficult to attribute causes to observed differences. An empirical
investigation into this issue used Swedish national data from 1925, a time when Sweden
shared many characteristics with contemporary LMICs.
66
This showed that the majority of individual counties could have been taken as adequately
representative of the national population, and the less representative counties were
self-evidently so (including the capital city and the most remote regions). Although
this does not offer any absolute evidence about the representativity of INDEPTH member
HDSS sites, it suggests that it is not reasonable to assume by default that HDSS populations
are unrepresentative.
The diversity observed across the INDEPTH member HDSS sites is a further source of
both strength and weakness. As discussed earlier, there has never been any master
plan for establishing HDSS sites in particular locations, and there are also significant
(but often locally appropriate) detailed methodological differences between HDSS sites.
This brings strength in terms of having highly functional and locally supported HDSS
sites in many locations, something that might not have happened so effectively in
trying to locate HDSS sites more systematically. However, it also brings some weaknesses
when it comes to making comparisons across HDSS sites and between the countries that
they represent. In contrast, the much stricter uniformity enforced across the Demographic
and Health Surveys (DHS) series of cross-sectional surveys makes comparisons simpler,
67
but that stems from a completely different organizational paradigm. Nevertheless,
the common core activities of all INDEPTH member HDSS sites, in following vital events
longitudinally in a defined population, mean that the pooled INDEPTH data represent
a major unified source of data on otherwise undocumented populations.
An interesting development in some situations, e.g. in China,
68
is the concept of a distributed national network of HDSS-type surveillance, which
perhaps represents a further intermediate step for the future. This has the advantage
of being more widely representative, while at the same time retaining the advantages
of a longitudinal approach. This may become a more common model as countries move
towards universal individual registration.
Data sharing and collaboration
Data sharing issues have become increasingly important for all health researchers
in recent years, and also continue to generate much debate.
69
There is also a continuing dialogue between researchers and funders on these issues.
70
The INDEPTH Network is firmly committed to the principles and practice of sharing
data, as expressed in the INDEPTH Data Access and Sharing Policy document, available
as Supplementary data at IJE online.
The issues involved in sharing HDSS data are complex. By the nature of the dynamic
cohort, there is never any point in time when data collection is ‘complete’, and talking
about sharing data at pre-determined intervals after completion is therefore not entirely
helpful. Ways to work around these conceptual difficulties therefore have to be found,
involving declaring particular periods of data from an HDSS as being ready for sharing
at appropriate times. INDEPTH has already launched the iSHARE portal for making data
from HDSS member sites publicly available (www.idepth-ishare.org) to bona fide users,
not unlike the arrangements for access to DHS data sets.
In the existing version of iSHARE, data files from the participating HDSS sites are
arranged in separate event files (births, deaths, migrations), but plans are underway
to standardize iSHARE data into a common event-based data format. The common event
attributes involved are shown in Table 1, and the range of possible events
is listed in Table 2. This structure will allow all participating sites to present
HDSS core data in a straightforward and standardized format, which will facilitate
a wide range of possible analytical approaches.
Table 1
Common event attributes for the INDEPTH data specification
| Attribute | Variable name | Description |
| --- | --- | --- |
| Record number | RecNr | A sequential number uniquely identifying each record in the data file |
| Centre identifier | CentreId | An identifier issued by INDEPTH to each member centre of the format CCCSS, where CCC is a sequential centre identifier and SS is a sequential identifier of the site within the centre in the case of multiple site centres |
| Individual identifier | IndividualId | A number uniquely identifying all the records belonging to a specific individual in the data file. For data anonymization purposes, this number should not be the same as the identifier used by a contributing centre to identify the individual, but the contributing centre should retain a mapping from this identifier to their identifier |
| Country identifier | CountryId | ISO 3166-1 numeric code of the country in which the surveillance site is situated |
| Location identifier | LocationId | Unique identifier associated with a residential unit within the site and is the location where the individual was or became resident when the event occurred. For data anonymization purposes, this identifier should not be the same as the identifier used internally by the contributing centre, but the contributing centre should retain a mapping of this identifier to their internal location identifier |
| Date of birth | DoB | The date of birth of the individual |
| Event | EventCode | A code identifying the type of event that has occurred (Table 2) |
| Event date | EventDate | The date on which the event occurred |
| Observation date | ObservationDate | Date on which the event was observed (recorded), also known as surveillance visit date |
| Event count | EventCount | The total number of events associated with this individual in this data set |
| Event number | EventNr | A number increasing from 1 to EventCount for each event record in order of event occurrence |
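The attributes in Table 1 map naturally onto a single record type. The sketch below shows one possible in-memory representation, together with a helper that fills in EventNr and EventCount per individual. The field names follow Table 1, but the Python types, the helper and the example values are assumptions rather than part of the INDEPTH specification.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby
from typing import List


@dataclass
class Event:
    RecNr: int
    CentreId: str            # format CCCSS, kept as text to preserve leading zeros
    IndividualId: int
    CountryId: int           # ISO 3166-1 numeric code
    LocationId: int
    DoB: date
    EventCode: str
    EventDate: date
    ObservationDate: date
    EventCount: int = 0
    EventNr: int = 0


def number_events(events: List[Event]) -> List[Event]:
    """Set EventNr (1..EventCount) and EventCount for each individual.

    Events are ordered by individual and event date; the sort is stable, so
    events sharing a date keep the order in which they were supplied.
    """
    ordered = sorted(events, key=lambda e: (e.IndividualId, e.EventDate))
    for _, group in groupby(ordered, key=lambda e: e.IndividualId):
        history = list(group)
        for position, event in enumerate(history, start=1):
            event.EventNr = position
            event.EventCount = len(history)
    return ordered


# Hypothetical records for one individual, supplied out of order.
born = date(1990, 1, 1)
records = number_events([
    Event(1, "00101", 7, 288, 10, born, "OMG", date(2012, 3, 1), date(2012, 6, 1)),
    Event(2, "00101", 7, 288, 10, born, "ENU", date(2010, 1, 1), date(2010, 1, 1)),
])
```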
Table 2
Event types for the INDEPTH data specification
| Event | Code | Definition | Attributes | Attribute description |
| --- | --- | --- | --- | --- |
| Birth | BTH | The birth of an individual to a resident female | MotherId | The IndividualId of the mother |
| | | | DeliveryEventId | The RecNr of the delivery event associated with this birth |
| Enumeration | ENU | Starting event for all individuals present at the baseline census of the surveillance area. It is the date on which the individual was first observed to be present in the surveillance area during the baseline census | | |
| In-migration | IMG | The event of migrating into the surveillance area | Origin | Classification scheme to be developed |
| Out-migration | OMG | The event of migrating out of the surveillance area | Destination | Classification scheme to be developed |
| Location exit | EXT | The event of leaving a residential location within the surveillance area to take up residence in another residential location within the surveillance area | Destination | The LocationId of the location within the surveillance area to which the individual relocated |
| Location entry | ENT | The event of taking up residence in a residential location within the surveillance area following a location exit event. Note that location exit and entry are actually two parts of the same action of changing residential location and as such happen on the same event date | Origin | The LocationId of the residential location from which the individual moved |
| Death | DTH | The death of the individual under surveillance. The date of death is the event date | Cause1, Cause2, Cause3 | Up to three causes of death coded using the WHO list of verbal autopsy death causes |
| | | | Likelihood1, Likelihood2, Likelihood3 | Likelihood values associated with each possible cause of death |
| Delivery | DLV | The event of a pregnancy end after 28 weeks of gestation, which may or may not result in the birth of one or more individuals (represented in this dataset by a BTH event linked to this delivery event) | LBCnt | Live birth count |
| | | | SBCnt | Stillbirth count |
| | | | Parity | The number of live births to this woman prior to this delivery |
| Observation end | OBE | An event inserted when a data set is right censored at an arbitrary date and this individual remained under surveillance beyond this date. The right censor date is the date of this event | | |
| Last observation | OBL | An event indicating the last point in time on which this individual was observed to be present and under surveillance. Event date equals observation date in this instance. Normally there should be no individuals with this event as their last event if the right censoring date is prior to the start of the last complete census round | | |
| Observation | OBS | Used to record characteristics of individuals under surveillance valid at the time of the observation. Could be used to record aspects such as educational attainment, employment status or anthropometry measures. Specific examples of this event are not part of the minimum core individual dataset, but are specified to allow for site or working group needs | | |
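One analytical use of an event-based format is deriving person-time under surveillance from the sequence of events in Table 2. The sketch below is an illustration under stated assumptions: it treats ENU, BTH and IMG as starting a residence episode and OMG, DTH, OBE and OBL as ending one, and it ignores EXT and ENT because they are moves within the surveillance area. That grouping, the function name and the example history are assumptions, not part of the specification.

```python
from datetime import date
from typing import List, Tuple

START_CODES = {"ENU", "BTH", "IMG"}       # assumed to open a residence episode
END_CODES = {"OMG", "DTH", "OBE", "OBL"}  # assumed to close a residence episode


def person_years(history: List[Tuple[str, date]]) -> float:
    """Person-years under surveillance for one individual.

    history: (EventCode, EventDate) pairs in order of occurrence.
    """
    total_days = 0
    entered = None
    for code, when in history:
        if code in START_CODES and entered is None:
            entered = when
        elif code in END_CODES and entered is not None:
            total_days += (when - entered).days
            entered = None
    return total_days / 365.25


# Hypothetical individual: enumerated, moves out, returns, later dies.
history = [
    ("ENU", date(2010, 1, 1)),
    ("OMG", date(2011, 1, 1)),
    ("IMG", date(2012, 1, 1)),
    ("DTH", date(2012, 7, 1)),
]
exposure = person_years(history)
```

Summing such exposure across individuals gives the denominator for event rates, with counts of DTH or BTH events as numerators.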
Conclusion
Our aim here is to describe the essential nature of the INDEPTH Network as a background
to detailed profiles of constituent member HDSS sites. Although all those sites have
important differences, the huge volume of detailed individual data generated across
Africa, Asia and Oceania by the Network constitutes a unique resource of great value
to demographers, epidemiologists and health planners.
Supplementary Data
Supplementary Data are available at IJE online.
Funding
Osman Sankoh is funded by core support grants to INDEPTH from the Hewlett Foundation,
Gates Foundation, Sida/GLOBFORSK and Wellcome Trust.