23 July 2001
Symposium on Global Review of 2000 Round of
Population and Housing Censuses:
Mid-Decade Assessment and Future Prospects
Department of Economic and Social Affairs
United Nations Secretariat
New York, 7-10 August 2001
Evaluation of population census data through demographic analysis *
Gabriel B. Fosu **
Demographic analysis is an important tool for evaluating census data, particularly in countries where independent sources of data, such as vital registration and sample surveys, are lacking or where a post-enumeration survey (PES) is not conducted. A weakness of demographic analysis is that it generally does not provide enough information to separate errors of coverage from errors in content. Moreover, demographic methods require reliable data on the components of population change (fertility, mortality and migration), which are often unavailable. A number of methods are available, and they differ with regard to data requirements, the quality of the results and the technical sophistication required to use them.
For overall assessment of census quality, methods of evaluating coverage error use data on age-sex groups or age cohorts. The age-sex pyramid is a standard method, as are such summary indices as Whipple’s Index and Myers’ Index. Stable population analysis can also be undertaken, provided that certain assumptions are met, such as constant fertility and mortality rates and no migration into or out of the population. In countries where mortality has been declining, a quasi-stable model may be appropriate. Stable population analysis has been widely used in developing countries. One disadvantage is that its estimates tend to be sensitive to changes in fertility, so countries with recent declines in fertility may not achieve satisfactory results.
Results of a census may be compared with data from other demographic systems, such as vital registration of births and deaths and net migration, if such data are available. The cohort component method of demographic analysis uses data from two successive censuses as well as life-table survival rates, age-specific fertility rates and estimated levels of international migration between censuses. The population enumerated in the first census is projected forward to the reference date of the second census, based on estimated levels and age schedules of fertility, mortality and migration, and the “expected” population is compared with the enumerated population in the second census. In some developing countries where this method has been used, indirect estimates of fertility and mortality must be derived.
Another method of analysis involves comparing age distributions of successive censuses. In a population closed to migration, changes in the size of a birth cohort between censuses are due to mortality. This method is widely used because it requires little data other than information from two censuses. Its usefulness increases significantly when data from more than two censuses are available. A final method, the cohort survival regression method, uses population counts by age from two censuses and deaths by age during the intercensal period to assess the relative completeness of coverage. This method has not seen wide application in recent censuses.
Many census organizations employ qualified demographers and statisticians, who have the expertise to undertake demographic analysis. Others, however, need outside assistance and additional training for their own staff. Cooperation between subject-matter specialists (data processing experts, cartographers, GIS experts and demographic statisticians) is essential throughout the census operation. Census questions must be designed to collect adequate information for demographic analysis. For example, the sex of persons who died during the year preceding the census is critical if gender-disaggregated analysis is planned. Moreover, the integrity of actual census responses should not be compromised in the processing stage. Finally, in some countries, lack of funds has led to delays in processing or to tabulating only a sample of census returns. This could have adverse effects on the use of demographic methods in the evaluation of census data.
1. Over 90 per cent of all countries carry out censuses to count their population and to collect information about the people living in various geographic regions. The uses of census data are varied. According to the United Nations,
Information on the size, distribution and characteristics of a country’s population is essential for describing and assessing its economic, social and demographic circumstances and for developing sound policies and programmes [in such fields as education and literacy, employment and manpower, family planning, housing, maternal and child health, rural development, transportation and highway planning, urbanization and welfare] aimed at fostering the welfare of a country and its population (United Nations, 1998).
2. Also, the uses of census data in business, industry, labour and research institutions have multiplied. Information technology has radically extended possible uses of population census data beyond the traditional models. As population census data have become more and more pervasive in our lives, so have the calls for increasing their scope, completeness, accuracy and validity, and for improving their national value and international comparability.
3. The conduct of a census is a massive operation. Consequently, despite all the meticulous preparations, there is always some degree of error. Two main types of errors usually occur—coverage errors and content errors. A large number of methods have been developed to evaluate census data. Among these are demographic analysis, the post-enumeration survey (PES) and comparison of census data with administrative statistics and household surveys. The methods differ widely with regard to data requirements, the level of technical sophistication and the quality of the results. This paper examines the use of demographic analysis in the evaluation of population census data. Specifically, the following will be discussed: the data requirements for better use of demographic analysis; whether demographic analysis can be a stand-alone method or whether it is most effective when used in conjunction with other methods, such as the PES; and, in a limited sense, how widely demographic analysis is used in developing countries.
4. In many countries, especially where reliable registration systems are lacking, demographic analysis is the basic methodological option for the evaluation of census data, and it is used whether a PES is conducted or not. However, a basic weakness in using demographic analysis for census evaluation is that demographic methods “generally do not provide sufficient information to separate errors of coverage from errors in content” (United States Bureau of the Census, 1985).
5. Furthermore, demographic methods for estimating coverage errors are not reliable except in situations where reasonably accurate data on fertility, mortality and migration are available. Due to the general lack of reliable data on such population components, we should be cautious in using demographic analysis as the only method for evaluating census coverage.
6. Nevertheless, even when data from the census being evaluated are the only available data, some demographic techniques can still be used to provide information on the magnitude of error in the data based on internal consistency checks.
7. On the other hand, if additional sources of data are available (e.g., vital registration or demographic surveys), a much broader range of demographic techniques can be used. Such methods, based on comparison of two or more sources of data, tend to be more powerful in their ability to assess the relative contributions of types of errors and their possible causes.
8. In recent years, an increasing number of countries have undertaken demographic surveys as a part of the worldwide Demographic and Health Surveys (DHS) project. Additionally, by the end of the 2000 round of censuses, most countries will have at least two sets of census data—one from the 2000 round and another from either the 1980 or 1990 round of censuses. The availability of such data will undoubtedly make demographic analysis more useful as a method of census evaluation.
9. The need for accurate data for evaluation of census data through demographic analysis has been aptly summarized by the United States Bureau of the Census:
Where at least two censuses and reasonably accurate information on levels of fertility, mortality, and migration are available, demographic analysis can provide defensible and consistent estimates of census coverage (at least at the national level) and substantial evidence on the overall quality of census age data. However, since estimates of census error are derived as “residual” differences between the actual and expected census counts, it is important to have fairly accurate information on levels of fertility, mortality, and migration. The accuracy of estimates of census error derived from demographic analysis of successive censuses depends entirely upon the accuracy of the information from the previous census and on the components of population change. Where this information is of uncertain quality, it is often difficult to determine what portion of the estimated census error to ascribe to errors made during the census being evaluated as opposed to errors attributable to the data used in the calculation of the expected population (US Bureau of the Census, 1985).
10. Detailed methodologies for applying demographic analysis in census evaluation can be found in several demography-related texts (for example, Shryock et al., 1976; United Nations, 1983, 1998; US Bureau of the Census, 1985; Arriaga et al., 1994). The methods differ with regard to data requirements, the quality of the results and the technical sophistication required in using them. Here, only a brief synopsis of selected methods is given.
11. The most intensive methods of evaluating coverage error examine the data by age-sex group or age cohort:
· Graphical analysis of age-sex distribution (age-sex pyramid). This technique has become a standard method in the evaluation of all population censuses (Shryock et al., 1976; US Bureau of the Census, 1985).
· Summary indices of age-sex data, including age-sex ratios and age-sex accuracy indices, such as Whipple’s Index, Myers’ Index, UN Age-Sex Accuracy Index, and other smoothing techniques (Arriaga et al., 1994).
· Stable population analysis. Comparison of reported age-sex distribution with a stable or quasi-stable population model.
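As an illustration of the summary indices listed above, the following minimal sketch computes Whipple’s Index in its conventional form: the population reported at ages ending in 0 or 5 within the range 23 to 62, divided by one fifth of the total population aged 23 to 62, multiplied by 100. The function name and the sample figures are hypothetical and are not taken from any census. A value near 100 suggests no preference for those terminal digits; a value of 500 would mean that all ages were reported as ending in 0 or 5.

```python
def whipples_index(pop_by_single_age):
    """Whipple's Index for preference for ages ending in 0 or 5.

    pop_by_single_age maps single year of age -> enumerated count.
    The conventional age range 23-62 is used.
    """
    total = sum(pop_by_single_age.get(age, 0) for age in range(23, 63))
    if total == 0:
        raise ValueError("no population enumerated at ages 23-62")
    # Ages 25, 30, ..., 60 are the ages ending in 0 or 5 within 23-62.
    at_0_or_5 = sum(pop_by_single_age.get(age, 0) for age in range(25, 61, 5))
    # One age in five ends in 0 or 5, so the index is 100 without heaping.
    return 100.0 * at_0_or_5 / (total / 5.0)


# Hypothetical single-year counts: a flat distribution, then the same
# population with 250 persons moved to each heaped age from each neighbour.
flat = {age: 1000 for age in range(100)}
heaped = dict(flat)
for age in range(25, 61, 5):
    heaped[age] += 500
    heaped[age - 1] -= 250
    heaped[age + 1] -= 250

print(whipples_index(flat))
print(whipples_index(heaped))
```

The index summarizes heaping on two terminal digits only; Myers’ Index, which covers all ten digits, would need a separate calculation.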
12. Inherent in the application of some demographic methods is the requirement that certain “theoretical assumptions” must be met in their application. For instance, the use of stable population models requires that both fertility and mortality have been constant in the past, while quasi-stable models are applicable when mortality decline has been under way for a known duration.
Specifically, the characteristics of a stable population include:
· constant crude birth and death rates,
· fixed age structure,
· total population size changes at a constant growth rate (r), known as the intrinsic growth rate, and
· the population is a closed one (no migration).
To apply the method, the following data are required:
· Census count of population to be evaluated by age and sex;
· Estimates of two of the following parameters:
1. Growth rate (r) of the population,
2. Birth rate (b), and
3. Probability of surviving from birth to age “x”.
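The comparison with a stable model can be sketched as follows, assuming that the growth rate (r) and a life table are the two parameters available. In a stable population the proportion at age a is proportional to exp(-r·a) multiplied by the probability of surviving to age a. The sketch uses five-year age groups and life-table person-years lived with a radix of 1. All names and figures are hypothetical, and the discrete approximation is an assumption of this example rather than a procedure prescribed in the sources cited.

```python
import math


def stable_age_distribution(r, person_years, width=5):
    """Return (proportions by age group, implied birth rate b).

    person_years[i] is the life-table person-years lived in age group i
    per person born (radix 1); r is the annual growth rate.
    """
    weights = [
        math.exp(-r * (i * width + width / 2.0)) * lived
        for i, lived in enumerate(person_years)
    ]
    total = sum(weights)
    proportions = [w / total for w in weights]
    birth_rate = 1.0 / total  # b is the reciprocal of the summed weights
    return proportions, birth_rate


# Hypothetical inputs for four age groups only (0-4, 5-9, 10-14, 15-19).
person_years = [4.6, 4.4, 4.3, 4.2]
reported = [0.30, 0.24, 0.25, 0.21]  # reported census proportions

stable, b = stable_age_distribution(0.025, person_years)
for group, (rep, exp_) in enumerate(zip(reported, stable)):
    # Ratios far from 1 point to possible coverage or age-reporting errors.
    print(group * 5, round(rep / exp_, 3))
```

Large departures of the reported from the stable proportions may reflect census error, but they may also reflect a violation of the stability assumptions, such as a recent fertility decline.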
13. This method has been widely used in developing countries, especially since the conditions assumed under the model are satisfied in many countries. However, the recent decline in fertility in a number of countries limits its usefulness, since the estimates tend to be sensitive to changes in fertility.
14. Coverage errors refer to under- or overenumeration, whereas content errors refer to the quality of responses to specific questions. Several methods of evaluation involve comparative analyses of data from successive censuses.
a. Comparison of census with other sources of data
15. The estimate of the population based on the most recent census can be compared with data from other demographic systems, such as vital statistics of births and deaths and net migration between censuses.
16. Very often, one or more population censuses are used as benchmarks for post-censal and intercensal population estimates and projections. Estimates based on recorded data on components of change tend to be the most satisfactory. However, estimates are also made based on mathematical assumptions (arithmetic or geometric increase). Post-censal estimates may be used to evaluate a subsequent census, while subsequent censuses can also be employed to evaluate the methods used in estimations or projections.
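The two mathematical assumptions mentioned above can be illustrated with a minimal sketch that extrapolates from two census totals to a later date. The function names and the figures are hypothetical. The arithmetic version assumes a constant annual increment, and the geometric version assumes a constant annual rate of increase.

```python
def arithmetic_estimate(first, second, years_between, years_after):
    """Post-censal estimate assuming a constant annual increment."""
    annual_increment = (second - first) / years_between
    return second + annual_increment * years_after


def geometric_estimate(first, second, years_between, years_after):
    """Post-censal estimate assuming a constant annual rate of increase."""
    annual_rate = (second / first) ** (1.0 / years_between) - 1.0
    return second * (1.0 + annual_rate) ** years_after


# Hypothetical census totals ten years apart, extrapolated five years on.
first_census, second_census = 8_000_000, 10_400_000
print(arithmetic_estimate(first_census, second_census, 10, 5))
print(geometric_estimate(first_census, second_census, 10, 5))
```

Either estimate can be set against the count from a subsequent census. A large difference may point to an error in the census or to a failure of the growth assumption.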
b. Cohort component method
17. Population projections derived from the previous census data and fertility, mortality and migration statistics can be compared with new census results.
18. The population enumerated in the first census is “projected” to the reference date of the second census based on estimated levels and age schedules of fertility, mortality and migration in the intercensal period. The “expected” population is compared with the enumerated population in the second census.
The data required to apply the method are:
· The population enumerated by age and sex in two successive censuses;
· Life-table survival rates for males and females assumed to be representative of mortality conditions in the intercensal period;
· Age-specific fertility rates for women aged 15 to 49 assumed to be representative of the level and age structure of fertility during the intercensal period;
· Estimated sex ratio at birth; and
· Where there is a substantial level of net migration, estimated levels and age pattern of international migration during the intercensal period.
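A single five-year projection step of the cohort component method can be sketched as follows, using the inputs listed above. The sketch assumes a population closed to migration, five-year age groups with an open-ended last group, and births estimated from the average female population at the start and end of the step. These simplifications, together with all names and figures, are assumptions of this example. For a ten-year intercensal period the step would be applied twice.

```python
def survive(pop, surv):
    """Age a population forward by five years.

    pop[i] is the count in five-year age group i (last group open-ended);
    surv[i] is the assumed proportion of group i alive five years later.
    """
    n = len(pop)
    aged = [0.0] * n
    for i in range(n - 1):
        aged[i + 1] = pop[i] * surv[i]
    # Survivors of the open-ended group remain in it.
    aged[n - 1] += pop[n - 1] * surv[n - 1]
    return aged


def project_five_years(females, males, surv_f, surv_m, asfr, srb=1.05,
                       birth_surv_f=1.0, birth_surv_m=1.0):
    """Return the expected (female, male) populations five years later.

    asfr maps age-group index -> annual births per woman; srb is the
    sex ratio at birth (male births per female birth).
    """
    new_f = survive(females, surv_f)
    new_m = survive(males, surv_m)
    births = 5.0 * sum(
        rate * (females[i] + new_f[i]) / 2.0 for i, rate in asfr.items()
    )
    female_births = births / (1.0 + srb)
    new_f[0] = female_births * birth_surv_f
    new_m[0] = (births - female_births) * birth_surv_m
    return new_f, new_m


def net_error_percent(enumerated, expected):
    """Residual net census error as a percentage of the expected count."""
    return [100.0 * (e - x) / x for e, x in zip(enumerated, expected)]


# Deliberately tiny, unrealistic figures chosen so the arithmetic is easy
# to follow: three age groups, no mortality, fertility in group 1 only.
females, males = [100.0, 100.0, 100.0], [100.0, 100.0, 100.0]
no_deaths = [1.0, 1.0, 1.0]
expected_f, expected_m = project_five_years(
    females, males, no_deaths, no_deaths, asfr={1: 0.1}, srb=1.0
)
print(expected_f)
print(net_error_percent([24.0, 95.0, 200.0], expected_f))
```

Because the error is obtained as a residual, any inaccuracy in the survival, fertility or migration inputs is carried directly into the estimated net census error.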
19. Several developing countries have used this method. Since registration data are usually deficient and cannot be adjusted satisfactorily, fertility and mortality levels are generally estimated indirectly from the two censuses. Where reliable estimates can be derived from sufficient information, this method can provide age- and sex-specific estimates of net census error. Hence, it is the most powerful of the available methods for evaluating a census.
c. Comparison of age distributions of two censuses based on intercensal cohort survival rates
20. The size of birth cohorts enumerated in successive censuses is compared. In a population closed to migration, the variations in the number of persons in a birth cohort between two successive censuses will be due to mortality. Hence, the ratio of the size of the birth cohort in the second census to its size in the first census should approximate the expected survival rate based on prevailing conditions of mortality.
Relatively little information other than the enumerations from two censuses is needed to apply the method:
· Population by age and sex from two successive censuses;
· Life table assumed to be representative of mortality conditions in the intercensal period; and
· Volume of net migration by age and sex during the intercensal period.
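The comparison can be sketched as follows for five-year age groups and a population closed to migration. Each cohort's census survival ratio, its count in the second census divided by its count in the first, is set against the ratio implied by a life table. The function names and all figures are hypothetical, and open-ended age groups are left out for simplicity.

```python
def census_survival_ratios(first, second, groups_elapsed):
    """Ratio of each cohort's size in the second census to the first.

    first[i], second[i] are counts in five-year age group i;
    groups_elapsed is the intercensal interval in years divided by 5.
    """
    return [
        second[i + groups_elapsed] / first[i]
        for i in range(len(first) - groups_elapsed)
    ]


def life_table_survival_ratios(person_years, groups_elapsed):
    """Expected survival ratios from life-table person-years lived."""
    return [
        person_years[i + groups_elapsed] / person_years[i]
        for i in range(len(person_years) - groups_elapsed)
    ]


# Hypothetical counts from two censuses ten years apart (two age groups).
first = [1000.0, 900.0, 800.0, 700.0]
second = [1200.0, 1000.0, 980.0, 790.0]
person_years = [5.0, 4.8, 4.75, 4.32]

observed = census_survival_ratios(first, second, 2)
expected = life_table_survival_ratios(person_years, 2)
for obs, exp_ in zip(observed, expected):
    # A ratio above 1 means the cohort appears larger in the second census
    # than mortality alone would allow.
    print(round(obs / exp_, 3))
```

A ratio of observed to expected survival that departs from 1 may reflect a difference in coverage between the two censuses, age misreporting, migration or an unsuitable life table. The method alone does not distinguish among these.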
21. This method is widely used in part because only information from two census counts is needed in its application. The usefulness of this method increases significantly when data from three or more censuses are available. In several studies, this method has been useful in assessing the extent to which distorted distributions are due to historical factors and demographic shifts rather than census errors.
d. Cohort survival regression method
22. This method uses regression to derive coverage correction factors that make the age data from two successive censuses mutually consistent. It is an extension of the census survival method for assessing the relative level of coverage in the two censuses. When the census counts are combined with data on the number of deaths during the intercensal period from vital registration or a life table, the correction factors make the population in each cohort in the second census consistent with the size of the cohort in the first census and the implied mortality in the intercensal period. The ratio of the implied census coverage correction factors in the two censuses is used as an estimate of relative coverage in the censuses. Ordinary least squares (OLS) regression is used to derive the estimated coverage.
The data required are:
· Population counts by age from two censuses; and
· Deaths by age during the intercensal period, or the use of an appropriate life table for deriving age-specific survival probabilities over a period of time equal to the intercensal period.
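One possible formulation of the regression, assumed here for illustration and not necessarily the exact specification in the manuals cited, starts from the accounting identity for a closed cohort: the true size at the first census equals the true size at the second census plus the true intercensal deaths. If the first census captures a proportion k1 of the population, the second a proportion k2, and deaths are recorded with completeness c, then N1/N2 = k1/k2 + (k1/c)·(D/N2) for each cohort. Regressing N1/N2 on D/N2 across cohorts gives k1/k2, the coverage of the first census relative to the second, as the intercept. All names and figures below are hypothetical.

```python
def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def relative_coverage(first, second, deaths):
    """Return (k1/k2, k1/c) from cohort counts and intercensal deaths.

    first[i] and second[i] are the enumerated sizes of cohort i at the
    two censuses; deaths[i] are the recorded deaths to that cohort.
    """
    y = [n1 / n2 for n1, n2 in zip(first, second)]
    x = [d / n2 for d, n2 in zip(deaths, second)]
    return ols(x, y)


# Hypothetical true cohorts, distorted by assumed coverage levels.
true_first = [1000.0, 900.0, 800.0, 700.0]
true_deaths = [50.0, 90.0, 160.0, 280.0]
k1, k2, c = 0.95, 0.98, 0.80

first = [k1 * n for n in true_first]
second = [k2 * (n - d) for n, d in zip(true_first, true_deaths)]
deaths = [c * d for d in true_deaths]

intercept, slope = relative_coverage(first, second, deaths)
print(round(intercept, 4), round(k1 / k2, 4))
```

The sketch assumes that coverage is constant across ages in each census and that there is no migration. With real data the points will not lie exactly on a line, and the fit should be examined before the intercept is interpreted.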
23. Although this seems to be a useful method in assessing the relative completeness of coverage in successive censuses, there are few known applications in recent censuses.
Several practical issues affect the use of demographic analysis in census evaluation:
· Availability of expertise. Many national population census and statistical offices in developing countries have qualified demographers and statisticians on their staffs. Hence, there is often no need for additional staff in this area or for technical assistance in undertaking demographic analysis of census data. However, high staff turnover may deplete the stock of available expertise and make developing countries dependent on outside assistance in demographic analyses of censuses. In addition, staff may need further training in the application of new statistical packages for demographic analysis.
· Integrity of census data. The steady advances made in census enumeration have generally led to improved accuracy. The need to produce accurate results may sometimes compromise the integrity of the data. For instance, overediting could, if rejection rates were high, lead to imposition of artificial values on much of the data. However, with improved evaluation methods (e.g., demographic analysis), we can publish data that reflect actual census responses and draw a clearer distinction between (1) primary data with an acceptable range of derived variables that stem from the actual census, and (2) secondary data based largely on sets of assumptions and indirect estimations.
· Cooperation. There is a need for continued close coordination among subject-matter specialists (data processing experts, cartographers, GIS experts and demographic statisticians) throughout the census operation.
· Inadequate information. When inadequate information is collected in a census, some census questions may become unusable. A case in point is the collection of data on deaths occurring in the year preceding the census without the critical information on the sex of the deceased that would enable complete gender-disaggregated analysis. Also, the omission of a crucial variable such as place of birth can affect the use of census data for migration analysis. Such gaps make it difficult to estimate both internal and international migration.
· Funds. Sometimes only a sample of the census returns is ever tabulated because funds run out or because of long delays in compilation and processing. This can undoubtedly affect the use of demographic methods in the evaluation of census data.
Arriaga, E. A., P. D. Johnson and E. Jamison (1994). Population Analysis with Micro-Computers, Volumes I and II. Washington, D.C.
Shryock, H. S., J. S. Siegel and Associates (1976). The Methods and Materials of Demography. New York: Academic Press.
United Nations (1983). Indirect Techniques for Demographic Estimation, Manual X. Population Studies, No. 81. Sales No. E.83.XIII.2.
United Nations (1998). Principles and Recommendations for Population and Housing Censuses, Revision 1. Statistical Papers, Series M, No. 67/Rev. 1. Sales No. E.98.XVII.8.
United States Bureau of the Census (1985). Evaluating Censuses of Population and Housing, Statistical Training Document, ISP-TR-5. Washington, D.C.: US Bureau of the Census.