Symposium 2001/11 23 July 2001 English only
Symposium on Global Review of 2000 Round of
Population and Housing Censuses:
Mid-Decade Assessment and Future Prospects
Statistics Division
Department of Economic and Social Affairs
United Nations Secretariat
New York, 7-10 August 2001
Evaluation of population census data through demographic analysis *
Gabriel B. Fosu **
CONTENTS
1. Methods for overall assessment of census quality
Evaluation of Population Census Data through Demographic Analysis
Demographic analysis is an
important tool for evaluating census data, particularly in countries where
independent sources of data, such as vital registration and sample surveys, are
lacking or where a post-enumeration survey (PES) is not conducted. A weakness
of demographic analysis is that it generally does not provide enough
information to separate errors of coverage from errors in content. Moreover, demographic
methods require reliable data on the components of population—fertility,
mortality and migration—which are often unavailable. A number of methods are
available, and they differ with regard to data requirements, the quality of the
results and the technical sophistication required to use them.
For overall assessment of
census quality, methods of evaluating coverage error use data on age-sex groups
or age cohorts. The age-sex pyramid is a standard method, as are such summary
indices as Whipple’s Index and Myers’ Index. Stable population analysis can
also be undertaken, as long as certain assumptions are met, such as constant
fertility and mortality rates and no migration into or out of the population.
In countries where mortality has been declining, a quasi-stable model may be
appropriate. This method has been widely used in developing countries. One
disadvantage is that the estimates tend to be sensitive to changes in
fertility, so countries with recent declines in fertility may not achieve
satisfactory results.
Results of a census may be
compared with data from other demographic systems, such as vital registration
of births and deaths and net migration, if such data are available. The cohort
component method of demographic analysis uses data from two successive censuses
as well as life-table survival rates, age-specific fertility rates and
estimated levels of international migration between censuses. The population
enumerated in the first census is projected forward to the reference date of
the second census, based on estimated levels and age schedules of fertility,
mortality and migration, and the “expected” population is compared with the
enumerated population in the second census. In some developing countries where
this method has been used, indirect estimates of fertility and mortality had to
be derived.
Another method of analysis
involves comparing age distributions of successive censuses. In a population
closed to migration, variations in the size of a birth cohort between censuses are due to mortality.
This method is widely used because it requires little data other than
information from two censuses. Its usefulness increases significantly when data
from more than two censuses are available. A final method, the cohort survival
regression method, uses population counts by age from two censuses and deaths
by age during the intercensal period to assess the relative completeness of
coverage. This method has not seen wide application in recent censuses.
Many census organizations
employ qualified demographers and statisticians, who have the expertise to
undertake demographic analysis. Others, however, need outside assistance and
additional training for their own staff. Cooperation between subject-matter
specialists (data processing experts, cartographers, GIS experts and
demographic statisticians) is essential throughout the census operation. Census
questions must be designed to collect adequate information for demographic
analysis. For example, the sex of persons who died during the year preceding
the census is critical if gender-disaggregated analysis is planned. Moreover,
the integrity of actual census responses should not be compromised in the
processing stage. Finally, in some countries, lack of funds has led to delays
in processing or to tabulating only a sample of census returns. This could have
adverse effects on the use of demographic methods in the evaluation of census
data.
1. Over 90 per cent of all countries carry out censuses to count their population and
to collect information about the people living in various geographic regions.
The uses of census data are varied. According to the United Nations,
Information on the size, distribution and characteristics of a
country’s population is essential for describing and assessing its economic,
social and demographic circumstances and for developing sound policies and
programmes [in such fields as education and literacy, employment and manpower,
family planning, housing, maternal and child health, rural development,
transportation and highway planning, urbanization and welfare] aimed at
fostering the welfare of a country and its population (United Nations, 1998).
2. Also, the uses of census data in business, industry, labour and research institutions
have multiplied. Information technology has radically extended possible uses of
population census data beyond the traditional models. As population census data
have become more and more pervasive in our lives, so have the calls for
increasing their scope, completeness, accuracy and validity, and for improving
their national value and international comparability.
3. The conduct of a census is a massive operation. Consequently, despite all the
meticulous preparations, there is always some degree of error. Two main types
of errors usually occur—coverage errors and content errors. A large number of
methods have been developed to evaluate census data. Among these are
demographic analysis, the post-enumeration survey (PES) and comparison of
census data with administrative statistics and household surveys. The methods
differ widely with regard to data requirements, the level of technical
sophistication and the quality of the results. This paper examines the use of
demographic analysis in the evaluation of population census data. Specifically,
the following will be discussed: the data requirements for better use of
demographic analysis; whether demographic analysis can be a stand-alone method
or whether it is most effective when used in conjunction with other methods,
such as the PES; and, in a limited sense, how widely demographic analysis is
used in developing countries.
4. In many countries, especially where reliable registration systems are lacking,
demographic analysis is the basic methodological option for the evaluation of
census data, and it is used whether a PES is conducted or not. However, a basic
weakness in using demographic analysis for census evaluation is that
demographic methods “generally do not provide sufficient information to
separate errors of coverage from errors in content” (United States Bureau of the
Census, 1985).
5. Furthermore, demographic methods for estimating coverage errors are not reliable except in
situations where reasonably accurate data on fertility, mortality and migration
are available. Due to the general lack of reliable data on such population
components, we should be cautious in using demographic analysis as the only
method for evaluating census coverage.
6. Nevertheless, even when data from the census being evaluated are the only available data,
some demographic techniques can still be used to provide information on the
magnitude of error in the data based on internal consistency checks.
7. On the other hand, if additional sources of data are available (e.g., vital
registration or demographic surveys), a much broader range of demographic
techniques can be used. Such methods, based on comparison of two or more
sources of data, tend to be more powerful in their ability to assess the
relative contributions of types of errors and their possible causes.
8. In recent years, an increasing number of countries have undertaken demographic
surveys as a part of the worldwide Demographic and Health Surveys (DHS)
project. Additionally, by the end of the 2000 round of censuses, most countries
will have at least two sets of census data—one from the 2000 round and another
from either the 1980 or 1990 round of censuses. The availability of such data
will undoubtedly make demographic analysis more useful as a method of census
evaluation.
9. The need for accurate data for evaluation of census data through demographic
analysis has been aptly summarized by the United States Bureau of the Census:
Where at least two censuses and reasonably
accurate information on levels of fertility, mortality, and migration are
available, demographic analysis can provide defensible and consistent estimates
of census coverage (at least at the national level) and substantial evidence on
the overall quality of census age data. However, since estimates of census
error are derived as “residual” differences between the actual and expected
census counts, it is important to have fairly accurate information on levels of
fertility, mortality, and migration. The accuracy of estimates of census error
derived from demographic analysis of successive censuses depends entirely upon
the accuracy of the information from the previous census and on the components
of population change. Where this information is of uncertain quality, it is
often difficult to determine what portion of the estimated census error to
ascribe to errors made during the census being evaluated as opposed to errors
attributable to the data used in the calculation of the expected population (US
Bureau of the Census, 1985).
10. Detailed methodologies for applying demographic analysis in census evaluation can be
found in several demography-related texts (for example, Shryock et al., 1976;
United Nations, 1983, 1998; US Bureau of the Census, 1985; Arriaga et al.,
1994). The methods differ with regard to data requirements, the quality of the
results and the technical sophistication required in using them. Here, only a
brief synopsis of selected methods will be discussed.
11. The most intensive methods of evaluating coverage error work with
age-sex groups or age cohorts.
· Graphical analysis of age-sex distribution (age-sex pyramid). This technique has become a standard method in the evaluation of all population
censuses (Shryock et al., 1976; US Bureau of the Census, 1985).
· Summary indices of age-sex data, including age-sex ratios and
age-sex accuracy indices, such as Whipple’s Index, Myers’ Index and the UN Age-Sex
Accuracy Index, as well as smoothing techniques (Arriaga et al., 1994).
· Stable population analysis. Comparison of reported age-sex
distribution with a stable or quasi-stable population model.
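As an illustration of the summary indices listed above, Whipple's Index can be computed directly from single-year age counts. The sketch below assumes the conventional definition (ages 23 to 62, preference for terminal digits 0 and 5). The function name and the input layout are illustrative assumptions and are not taken from the cited manuals.

```python
def whipple_index(pop_by_age):
    """Whipple's Index for preference of ages ending in 0 or 5.

    pop_by_age: sequence of counts indexed by single year of age,
    so pop_by_age[25] is the population reported at age 25
    (hypothetical input layout).
    """
    # Population reported at ages 25, 30, ..., 60
    heaped = sum(pop_by_age[age] for age in range(25, 61, 5))
    # Total population reported at ages 23 to 62 inclusive
    total = sum(pop_by_age[age] for age in range(23, 63))
    # 100 indicates no preference; 500 indicates that every
    # reported age in the range ends in 0 or 5
    return 100.0 * 5.0 * heaped / total
```

A value close to 100 suggests little digit preference in the reported ages. Higher values point to age heaping, which is a content error and does not by itself measure coverage.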
12. Inherent in the application of some demographic methods is the requirement that certain
“theoretical assumptions” be met. For instance, the
use of stable population models requires that both fertility and mortality have
been constant in the past, while quasi-stable models are applicable when
mortality decline has been under way for a known duration.
Specifically, the characteristics of a stable population include:
· constant crude birth and death rates,
· fixed age structure,
· total population size that changes at a constant growth rate (r), known as the intrinsic
growth rate, and
· a closed population (no migration).
Data requirements:
· Census count of population to be evaluated by age and sex;
· Estimates of two of the following parameters:
1. Growth rate (r) of the population,
2. Birth rate (b), and
3. Probability of surviving from birth to age “x”.
13. This method has been widely used in developing countries, especially since the
conditions assumed under the model are satisfied in many countries. However,
the recent decline in fertility in a number of countries limits its usefulness,
since the estimates tend to be sensitive to changes in fertility.
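The comparison with a stable model can be sketched as follows. In a stable population the proportion at age x is b·exp(-r·x)·p(x), where p(x) is the probability of surviving from birth to age x. The code below is a discrete approximation that takes r and p(x) as the two known parameters and lets the birth rate follow from normalisation. The function names, and the use of one representative age per group, are assumptions made for illustration.

```python
import math

def stable_age_distribution(r, survival_to_age, ages):
    """Expected age proportions under a stable model.

    r: intrinsic growth rate per year.
    survival_to_age[i]: probability of surviving from birth to ages[i].
    ages[i]: representative age (for example the midpoint) of group i.
    Equal-width age groups are assumed.
    """
    weights = [math.exp(-r * x) * p for x, p in zip(ages, survival_to_age)]
    total = sum(weights)
    # Dividing by the total plays the role of the birth rate b
    return [w / total for w in weights]

def ratio_reported_to_stable(reported_counts, stable_props):
    """Ratio of the reported age proportion to the stable proportion."""
    total = sum(reported_counts)
    return [(c / total) / s for c, s in zip(reported_counts, stable_props)]
```

Ratios far from 1 at particular ages flag possible age misreporting or differential coverage, provided the stable assumptions hold reasonably well for the population being evaluated.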
14. Coverage errors refer to under- or overenumeration, whereas content errors refer to
the quality of responses to specific questions. Several methods of evaluation involve
comparative analyses of data from successive censuses.
a. Comparison of census with other sources of data
15. The estimate of the population based on the most
recent census can be compared with data from other demographic systems, such as
vital statistics of births and deaths and net migration between censuses.
16. Very often, one or more population censuses are used as benchmarks for post-censal
and intercensal population estimates and projections. Estimates based on
recorded data on components of change tend to be the most satisfactory.
However, estimates are also made based on mathematical assumptions (arithmetic
or geometric increase). Post-censal estimates may be used to evaluate a
subsequent census, while subsequent censuses can also be employed to evaluate
the methods used in estimations or projections.
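The two mathematical assumptions mentioned above can be written as a short sketch. It assumes two census totals taken n years apart and returns an estimate t years after the first census. The function names are illustrative, and values of t greater than n give a post-censal extrapolation.

```python
def arithmetic_estimate(pop_first, pop_second, t, n):
    """Estimate t years after the first census, assuming a constant
    absolute change per year between censuses taken n years apart."""
    return pop_first + (pop_second - pop_first) * t / n

def geometric_estimate(pop_first, pop_second, t, n):
    """Estimate t years after the first census, assuming a constant
    annual rate of change between censuses taken n years apart."""
    return pop_first * (pop_second / pop_first) ** (t / n)
```

For example, with totals of 1,000 and 1,200 ten years apart, the arithmetic estimate at year 5 is 1,100. Both functions ignore the components of change, which is why estimates based on recorded births, deaths and migration are generally preferred when such data exist.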
b. Cohort component method
17. Population projections derived from the previous census data and fertility, mortality and
migration statistics can be compared with new census results.
18. The population enumerated in the first census is “projected” to the reference date
of the second census based on estimated levels and age schedules of fertility,
mortality and migration in the intercensal period. The “expected” population is
compared with the enumerated population in the second census.
Data requirements:
· The population enumerated by age and sex in two successive censuses;
· Life-table survival rates for males and females assumed to be representative of mortality
conditions in the intercensal period;
· Age-specific fertility rates for women aged 15 to 49 assumed to be representative of the
level and age structure of fertility during the intercensal period;
· Estimated sex ratio at birth; and
· Where there is a substantial level of net migration, estimated levels and age pattern
of international migration during the intercensal period.
19. Several developing countries have used this method. Since registration data are usually
deficient and satisfactory adjustment is usually not feasible, indirect
estimates of fertility and mortality levels are usually derived from two
censuses. Where reliable estimates can be derived from sufficient information,
this method can provide age- and sex-specific estimates of net census error.
Hence, this method is the most powerful of the available methods for
evaluating a census.
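A minimal sketch of one projection step of the cohort component method follows. It assumes five-year age groups, a population closed to migration, and an open-ended last age group. All function and parameter names are hypothetical, and a full application would also add net migration by age and sex where it is substantial.

```python
def survive(pop, survival):
    """Move each age group forward one group.

    survival[i] is the proportion of group i surviving five years.
    The last entry applies to the open-ended group, whose survivors
    stay in that group.
    """
    n = len(pop)
    new = [0.0] * n
    for i in range(n - 1):
        new[i + 1] = pop[i] * survival[i]
    new[n - 1] += pop[n - 1] * survival[n - 1]
    return new

def project_five_years(females, males, surv_f, surv_m, asfr, srb,
                       birth_surv_f, birth_surv_m):
    """Project both sexes forward five years.

    asfr: dict mapping age-group index to the annual fertility rate.
    srb: male births per female birth.
    birth_surv_f, birth_surv_m: proportion of births surviving to the
    end of the period, by sex.
    """
    new_f = survive(females, surv_f)
    new_m = survive(males, surv_m)
    # Births over five years, using the average number of women
    # at the start and end of the period in each fertile age group
    births = 5.0 * sum(rate * (females[i] + new_f[i]) / 2.0
                       for i, rate in asfr.items())
    female_births = births / (1.0 + srb)
    male_births = births - female_births
    new_f[0] = female_births * birth_surv_f
    new_m[0] = male_births * birth_surv_m
    return new_f, new_m

def net_census_error(enumerated, expected):
    """Residual by age group: enumerated minus expected population."""
    return [e - x for e, x in zip(enumerated, expected)]
```

For a ten-year intercensal period the step would be applied twice. The residual from `net_census_error` mixes errors in the second census with errors in the first census and in the assumed fertility and mortality, so it should be read with the caution expressed in the text.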
c. Comparison of age distributions of two censuses based on intercensal cohort survival rates
20. The size of birth cohorts enumerated in successive censuses is compared. In a
population closed to migration, the variations in the number of persons in a
birth cohort between two successive censuses will be due to mortality. Hence,
the ratio of the size of the birth cohort in the second census to its size in the first
census should approximate the expected survival rate based on prevailing
conditions of mortality.
Data requirements:
Relatively little information other
than enumeration from two censuses is needed to apply the method.
· Population by age and sex from two successive censuses;
· Life table assumed to be representative of mortality conditions in the intercensal
period; and
· Volume of net migration by age and sex during the intercensal period.
21. This method is widely used in part because only information from two census counts
is needed in its application. The usefulness of this method increases
significantly when data from three or more censuses are available. In several
studies, this method has been useful in assessing the extent to which distorted
distributions are due to historical factors and demographic shifts rather than
census errors.
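The comparison described in this subsection can be sketched as below. The code assumes age groups whose width divides the intercensal interval evenly, so that a cohort moves a whole number of groups between censuses, and it ignores migration. The names are illustrative only.

```python
def cohort_survival_ratios(pop_first, pop_second, groups_elapsed):
    """Ratio of each cohort's size in the second census to its size
    in the first census.

    groups_elapsed: number of age groups a cohort advances between
    censuses (2 for five-year groups and a ten-year interval).
    """
    return [pop_second[i + groups_elapsed] / pop_first[i]
            for i in range(len(pop_first) - groups_elapsed)]

def ratio_to_expected(observed_ratios, life_table_ratios):
    """Observed cohort ratio divided by the life-table survival ratio.
    Values well above or below 1 suggest relative coverage or age
    reporting problems, or unrecorded migration."""
    return [o / e for o, e in zip(observed_ratios, life_table_ratios)]
```

An observed ratio above 1 is impossible in a closed population, so it is an immediate sign of a relative undercount of that cohort in the first census, an overcount in the second, age misstatement, or in-migration.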
d. Cohort survival regression method
22. This method uses regression to derive estimates of coverage correction factors to
make age data from two censuses mutually consistent. It is an extension of the
census survival method for assessing the relative level of coverage in the two
censuses. It determines coverage error correction factors for the population
enumerated in two successive censuses. When this is combined with data on the
number of deaths during the intercensal period from vital registration or a
life table, it makes the population in each cohort in the second census
consistent with the size of the cohort in the first census and the implied
mortality in the intercensal period. The ratio of the implied census coverage
correction factors in the two censuses is used as an estimate of relative
coverage in the censuses. Ordinary least squares (OLS) regression is used to
derive the estimated coverage.
Data requirements:
· Population counts by age from two censuses; and
· Deaths by age during the intercensal period, or the use of an appropriate life table
for deriving age-specific survival probabilities over a period of time equal to
the intercensal period.
23. Although this seems to be a useful method in assessing the relative completeness of
coverage in successive censuses, there are few known applications in recent
censuses.
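The paper does not state the regression equation, so the sketch below rests on an assumed formulation. If the first census has completeness C1, the second has C2 and deaths are recorded with completeness k, then the identity "cohort at first census = cohort at second census + intercensal cohort deaths" implies N1/N2 = C1/C2 + (C1/k)·(D/N2) for every cohort. Fitting a straight line across cohorts gives an intercept that estimates the relative coverage C1/C2. The function name and input layout are hypothetical.

```python
def relative_coverage(pop_first, pop_second, cohort_deaths):
    """OLS fit of N1/N2 on D/N2 across cohorts.

    pop_first[i]: cohort i as enumerated in the first census.
    pop_second[i]: the same cohort as enumerated in the second census.
    cohort_deaths[i]: recorded intercensal deaths to that cohort.
    Returns (intercept, slope). Under the assumed model the intercept
    estimates C1/C2 and the slope estimates C1/k.
    """
    ys = [n1 / n2 for n1, n2 in zip(pop_first, pop_second)]
    xs = [d / n2 for d, n2 in zip(cohort_deaths, pop_second)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

An intercept above 1 would suggest that the first census was more complete than the second, and an intercept below 1 the reverse. The estimate is only as good as the assumptions of a closed population and of coverage that does not vary by age.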
· Availability of expertise. Many national population census and
statistical offices in developing countries have qualified demographers and
statisticians on their staffs. Hence, there is often no need for additional
staff in this area or for technical assistance in undertaking demographic
analysis of census data. However, the high turnover of staff may deplete the
stock of available expertise, which may make developing countries dependent on
outside assistance in demographic analyses of censuses. In addition, there may
be a need for further training of staff in the application of new statistical
packages for demographic analyses.
· Integrity of census data. The steady advances made in census enumeration
have generally led to improved accuracy. The need to produce accurate results
may sometimes compromise the integrity of the data. For instance, overediting
could, if rejection rates were high, lead to imposition of artificial values on
much of the data. However, with improved evaluation methods (e.g., demographic
analysis), we can publish data that reflect actual census responses and draw a
clearer distinction between (1) primary data with an acceptable range of
derived variables that stem from the actual census, and (2) secondary data
based largely on sets of assumptions and indirect estimations.
· Cooperation. There is a need for continued close
coordination among subject-matter specialists (data processing experts,
cartographers, GIS experts, and demographic statisticians) throughout the
census operation.
· Inadequate information. When inadequate information is collected in a census, some
census questions become potentially unusable. A case in point is the collection of
data on deaths occurring in the year preceding the census but lack of critical
information on sex of the deceased, which would enable complete
gender-disaggregated analysis. Also, the omission of a crucial variable such as
place of birth can affect the use of census data for migration analysis.
Various difficulties may be posed by the data for estimating both internal and
international migration.
· Funds. Sometimes only a sample of the census returns
is ever tabulated because funds run out or there are so many delays in the
compilation and processing. This undoubtedly can affect the use of demographic
methods in the evaluation of census data.
References
Arriaga, E. A., P. D. Johnson and E. Jamison (1994). Population Analysis with Micro-Computers, Volumes I and II. Washington, D.C.
Shryock, H. S., J. S. Siegel and Associates (1976). The Methods and Materials of Demography. New York: Academic Press.
United Nations (1983). Indirect Techniques for Demographic Estimation, Manual X. Population Studies, No. 81. Sales No. E.83.XIII.2.
United Nations (1998). Principles and Recommendations for Population and Housing Censuses, Revision 1. Statistical Papers, Series M, No. 67/Rev. 1. Sales No. E.98.XVII.8.
United States Bureau of the Census (1985). Evaluating Censuses of Population and Housing, Statistical Training Document, ISP-TR-5. Washington, D.C.: US Bureau of the Census.