Symposium 2001/06

6 July 2001

 

                                                                                                           English only

 

Symposium on Global Review of 2000 Round of

Population and Housing Censuses: 

Mid-Decade Assessment and Future Prospects

Statistics Division

Department of Economic and Social Affairs

United Nations Secretariat

New York, 7-10 August 2001


Adapting new technologies to census operations *

Arij Dekker**



CONTENTS

 

Summary
A. Introduction
B. Management, communication, logistics and quality assurance
C. Data capture
1. Intelligent character recognition (ICR)
2. Automatic coding
3. Outsourcing and decentralization
D. GIS, Remote Sensing and GPS
E. Data processing and storage
1. Census-processing software
2. Data storage
F. Use of the Internet
1. The Internet for data collection
2. The Internet for data dissemination
G. Data dissemination: other issues
1. Statistical disclosure control
2. High-capacity physical media
3. Structured archives: the statistical data warehouse
H. How to choose appropriate technology
I. Future technologies
J. Conclusions
K. Discussion
References
Glossary

 



Summary

Adapting New Technologies to Census Operations

 

Even small improvements in census technology can result in important gains in the quality and cost-effectiveness of the whole census operation. At present a number of organizations are attempting to help bring innovation to census and statistical operations. Among the concerns regarding new technology are these: how to choose appropriate technology; how to maintain the integrity of existing statistical systems; how to deal with outsourcing certain tasks; and how to maintain confidentiality of data. Some technologies, such as mobile telephony, have made person-to-person communication in the field easier, as have fax and e-mail capabilities. Bar-code technology has made management of materials more efficient.

 

In the 2000 round of censuses, intelligent character recognition (ICR) made a breakthrough in many countries, although illegible handwritten characters and badly printed questionnaires still led to problems. In general, countries that planned carefully for the new technology and conducted pre-tests were more successful in their operations. The next step, automatic or computer-assisted coding, is also being explored, and some data, such as geographic names, may lend themselves to such coding. For some census operations, especially one-time, high-volume tasks such as data entry, outsourcing may be a good solution. Contractors with the necessary equipment and skills can supplement the census staff, but outsourcing also raises questions of overcoming bureaucratic obstacles, managing the contractor and enforcing confidentiality rules.

 

Census mapping has made great strides in the last few decades, from an activity requiring extensive fieldwork and manual drawing to one using remote sensing and computer-assisted map production. Geographic information system (GIS) technology is increasingly being used in population and housing censuses to generate maps for enumeration and for data presentation purposes. Global positioning systems (GPS) are cheap and available, and they can be used by cartographic field staff to annotate topographical maps and satellite photographs to produce excellent maps for enumerators. 

 

Data-processing software for censuses, which was previously developed and provided by non-profit agencies, is being supplanted by commercially available software. However, customizing general-purpose software for census purposes requires considerable programming skills, which may not always be available in a census organization.

 

The Internet as a tool for census data collection is still in its infancy, although several countries did allow some Internet enumeration in their most recent censuses. Generally, such data were collected from a small portion of the population on an experimental basis. Problems with this method include the need for authentication from each household; lack of coverage of households in many countries at this stage; and the fear that hackers could compromise the integrity of the census. Moreover, data collected via the Internet would have to be integrated into other data streams, including mail-back questionnaires and telephone responses. As a tool for data dissemination, however, the Internet is quickly becoming the principal medium, and statistical offices are responding with more electronic publications and effective web sites. Technology is also under development for the storage of census data, including data “warehouses”, which would contain all the data and metadata from a census.

 

It is impossible to create a single set of guidelines to help census planners choose the best new technology. Choices depend on the magnitude of the project, the availability of local skills, the funding situation, prior experience, time for preparation, and other factors. Census planners need to be conservative, because their solutions must be right the first time. New technology should never endanger the continuity of existing reporting systems and if possible should reinforce it.


A. Introduction

1.                  It is commonly known that the art of population census taking goes back many centuries. Ever since the end of the nineteenth century, there have been efforts to take advantage of a succession of newly available technologies to make such large and costly statistical enquiries more efficient and effective. A census is labour-intensive, requiring large numbers of temporary staff. Personnel costs usually are the principal component of census budgets, with expenditure for information and communication technology coming second.

 

2.                  Even small improvements in the methodologies used, or in the effectiveness of the equipment, can result in important gains in the quality and/or cost-effectiveness of the whole operation. Census budgets depend on national cost levels and the depth of the enquiry, but generally vary from a few dollars per capita in low-cost countries to as much as 30 dollars per capita in highly developed environments. A rough estimate would put the total expense of the current round of censuses between 30 and 50 billion dollars. This is certainly an enticing target for those trying to improve value for money.

 

3.                  The name of Herman Hollerith stands out as an early adopter of modern technology for census work. He borrowed from the ideas of Joseph-Marie Jacquard, who had invented punched cards to control looms. Hollerith saw a way to use such cards in sorting and tabulation. By doing so he not only expedited the release of the results of the 1890 United States census; he started an entire industry.

 

4.                  There have been many lesser-known census innovators who have put newly discovered methods and technology to good use. Information technology has usually been at the forefront of these efforts. Census data-processing equipment has graduated from machines merely assisting tabulation work to indispensable tools in virtually all phases of census work. Computers are used for planning, to support mapping, in project management, in all stages of data capture, cleaning, coding and reporting, and in demographic analysis (Dekker, 1997). Many of the recent improvements in census taking have been possible thanks to the ever-growing capabilities of data-processing equipment and of communication networks operating at the local, national and worldwide levels. For the sake of continuity it is important that the use of newer technology be embedded into, and build upon, existing sound methodology (United Nations, 1998).

 

5.                  There are presently several important efforts to bring coordination and focus to the innovation process in official statistics and census taking. One is the Paris 21 initiative: Partnership in Statistics for Development in the 21st Century. The members of Paris 21—there are several hundred of them—are drawn from leading national and international statistical agencies, academic institutions, etc. One of the several issues currently being reviewed by the experts combining their efforts under the Paris 21 initiative is how census work can be made more cost-effective (See the web site at http://www.paris21.org for details).

 

6.                  The United Nations Statistics Division (UNSD) has a long history of furthering sound statistical principles and the sharing of know-how. A web site giving access to information on good statistical practices has recently been opened (http://www.esa.un.org/unsd/goodprac). On a regional scale, Eurostat has conducted a series of technical seminars by the names of NTTS (New Techniques and Technologies for Statistics) and ETK (Exchange of Technology and Know-how). The 2001 meetings on these issues were conducted in June in a combined form on Crete, Greece.

 

7.                  Noteworthy also is the Eurostat web site by the name of VIROS (Virtual Institute for Research in Official Statistics, web site http://www.europa.eu.int/en/comm/eurostat/research/viros). VIROS identifies and classifies areas of research where participating organizations may place the results of their studies and experiences, while remaining entirely responsible for them. Eurostat acts as a central coordinator, attempting to integrate the individual elements into a coherent set. The ultimate goal is to facilitate access to information on research activities and results. Eurostat is naturally interested in such issues, facing, as it does, the need to combine many statistical traditions, and overlaying them where possible with state-of-the-art integration technology.

 

8.                  When considering the technological options before them, census offices face a number of questions. Some of these are:

·        How to make an informed choice in selecting appropriate technology;

·        How to maintain the integrity of the existing statistical and census systems;

·        How to deal with the option of outsourcing[1], and management of outsourced tasks; and

·        How to address confidentiality concerns relating to the preferred solutions.

 

9.                  This paper will look briefly at various areas where census work has recently benefited from new technology and will discuss the issues referred to above. Definite answers to the questions raised can be formulated only by individual census organizations themselves.

B. Management, communication, logistics and quality assurance

10.              A nationwide census differs in many respects from day-to-day statistical work. It lacks the repetitive nature that allows collections with a greater periodicity to gradually be improved. The level of expenditure and number of staff are much higher than statistical managers are used to. Some governments therefore establish census offices separate from the national statistical agency. It may be necessary to recruit professional management, experienced in dealing with large but temporary organizations. Since a census can be seen as a large time-critical project, with many interlocking operations, the use of modern project management software is of vital importance.

 

11.              A census operation requires efficient communication between thousands of persons, as well as procurement and storage of a large variety of items, most of which have to be distributed to all corners of the country and then recollected.

 

12.              Recent developments in mobile telephony (cell phones) have made person-to-person communication easier, even in countries with extensive and reliable fixed-line networks. But complete mobile coverage has not been accomplished in most developing countries. Census communication with remote areas continues to be problematic in some cases. It is still possible that satellite telephone systems, which function everywhere on earth, will fill this void. Some ambitious projects in this domain, such as that known as “Iridium,” have not drawn enough initial subscribers. But with most of the enormous investment costs now written off, user prices are coming down. The ground stations including antennas are still rather voluminous but completely portable. Operations planners need to be cognizant of all communications options open to them, including regional differences, and make arrangements accordingly.

 

13.              Where printed or printable communication is required, fax technology is rapidly giving way to electronic mail. This is true for census operations as well, but relying on e-mail entails vulnerability to Internet service interruptions, computer illiteracy and virus attacks. It is important always to keep a fax capability as backup.

 

14.              Improved computer software and the wide availability of personal computers (PCs) have made managing the movement of goods much easier. Bar-code technology can be a key element in this. Using bar codes instead of printed numbers has the advantages of avoiding transcription errors and speeding up processing. A combination of the two can be used where easy human recognition of the codes may also be required. Census managers, who are usually not logistics professionals, tend to overlook this established technology.

 

15.              A typical application of bar-code technology is to label all items specific for a particular enumeration area (maps, enumerator identification, summary sheets, transport box) with a specific bar code. At the point where the materials are sent out, the codes will be scanned, allowing automatic update of a database of items forwarded. The same process can be used to maintain a database of items retrieved from the field.
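The dispatch-and-return database described above can be sketched in a few lines of Python. This is an illustration only: the enumeration-area codes are invented, and a real operation would record scans in a persistent database rather than an in-memory dictionary.

```python
from datetime import datetime

class MaterialsTracker:
    """Track census materials sent to and retrieved from the field.

    Scanning an enumeration area's bar code at dispatch and again at
    return updates its status, so unreturned batches are easy to list.
    """

    def __init__(self):
        self.status = {}  # EA code -> (state, timestamp)

    def scan_dispatch(self, ea_code):
        self.status[ea_code] = ("dispatched", datetime.now())

    def scan_return(self, ea_code):
        if ea_code not in self.status:
            raise ValueError(f"EA {ea_code} was never dispatched")
        self.status[ea_code] = ("returned", datetime.now())

    def outstanding(self):
        """Enumeration areas dispatched but not yet returned."""
        return [ea for ea, (state, _) in self.status.items()
                if state == "dispatched"]

tracker = MaterialsTracker()
tracker.scan_dispatch("EA-001-042")   # scanned at the point of dispatch
tracker.scan_dispatch("EA-001-043")
tracker.scan_return("EA-001-042")     # scanned when materials come back
print(tracker.outstanding())          # ['EA-001-043']
```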

 

16.              Labeling individual questionnaires with unique codes can also be helpful, although the resulting administrative overhead is considerable. Such identifiers can protect against the fairly common problem that entire batches of questionnaires arrive back erroneously geocoded. Standard retail scanners, but also most intelligent character recognition systems (see Section C.1), will read bar codes without difficulty.

 

17.             Quality assurance, including the use of scientifically sound sampling methods, should be an integral part of all census operations. Many of the methods in this field depend on statistical principles and have been developed by statistical innovators (Deming, 1986). The census office must strive for a consistent level of assured quality throughout its operations, and cannot afford to disregard the techniques that help to achieve and verify it (Statistics Sweden, 2001).
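One widely used sampling method in this context is single-sampling acceptance inspection of, for example, batches of coded questionnaires: draw a random sample of n documents from each batch and accept the batch only if at most c defects are found. A minimal Python sketch of the underlying arithmetic (the plan parameters below are illustrative, not recommendations):

```python
from math import comb

def acceptance_probability(defect_rate, n, c):
    """Probability that a batch passes a single-sampling plan:
    inspect n randomly chosen documents, accept if at most c are
    defective. Uses the binomial distribution, a good approximation
    when the batch is much larger than the sample."""
    return sum(comb(n, k) * defect_rate**k * (1 - defect_rate)**(n - k)
               for k in range(c + 1))

# Operating characteristic of an illustrative plan: n = 50, c = 2
for rate in (0.01, 0.05, 0.10):
    p = acceptance_probability(rate, 50, 2)
    print(f"defect rate {rate:.0%}: P(accept) = {p:.3f}")
```

Plotting such points for a range of defect rates gives the operating characteristic curve of the plan, which is how inspection plans are usually chosen.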

C. Data capture

1. Intelligent character recognition (ICR)

18.              It is probably true to say that the current round of censuses has seen the breakthrough of ICR technology. In the 1985-1994 round only about 20 per cent of countries undertaking censuses used some form of character or mark recognition (Dekker, 1994). The large majority still relied on keyboard data capture. In the current round nearly all census offices of industrial market economies—and numerous others—apply imaging through scanners, recognition software and the other tools required to partially do away with manual data entry.

 

19.              There is no doubt that recognition technology has made great strides in the last decade, but it seems equally true that the example provided by census “pioneers” has made switching course easier for those organizations that might otherwise have hesitated. ICR offers a promise of greater efficiency, but it is inherently riskier than keyboard data entry. For example, poorly designed or badly printed questionnaires are a nuisance in manual data entry, but may sink an ICR data-capturing operation entirely. The need for elaborate pre-tests, already so obvious in traditional census taking, is even more apparent when scanning technology is to be used.

 

20.              The main remaining fundamental problem is that handwritten characters are often poorly recognized when the writer is not already known to the recognition system. In censuses that use self-enumeration or a large number of enumerators, this is obviously the case. To avoid the problem, it is possible to limit automatic recognition to marks or numeric digits only. But even digits cannot always be reliably interpreted, so quite a few manual data-entry personnel will still be required to fill the gaps.

 

21.              Scattered information suggests that the ICR process does not always proceed as smoothly as anticipated. Experiences obtained during the final operations tests induced the United States Bureau of the Census to move from a one-pass to a two-pass processing system, whereby sample data from the long forms are computer-stored only during a second capturing operation (Prewitt, 2000). This change of approach has had no effect on processing deadlines. Some European countries (for example, Estonia) have reported difficulties in recognizing handwritten alphabetic characters, requiring them to hire additional staff to assist the automatic recognition process. A recent meeting in Bangkok (United Nations, 2001) heard about problems of varying severity in China, Indonesia, Macao Special Administrative Region of China, the Philippines and Thailand[2]. (For details of the problems experienced, retrieve the country papers from the web site at http://www.unescap.org/stat/pop-it/pop-wdt.htm.)

 

22.              In Thailand, earlier plans to establish 15 regional ICR centers for the April 2000 census were cancelled after more sophisticated (and expensive) scanners and software turned out to be required. A single ICR complex now operates in Bangkok (Fujitsu 4099 scanners, TeleForm software). Some problems were reported with poorly written characters and scanner maintenance.

 

23.              The 1 May 2000 census of the Philippines works with four decentralized capturing centers, using Kodak 3590 scanners and Eyes and Hands software. One of the biggest problems here is that the print quality of some questionnaires is not in accordance with specifications, which causes the ICR software to tag them as unidentifiable. Another difficulty is illegible handwritten entries. The number of verification licenses, required to manually correct such rejects, had been underestimated. This has been a learning process; experiences are sufficiently positive to use ICR again for the upcoming census of agriculture and fisheries.

 

24.              The Macao Special Administrative Region of China reports good results for its pilot operation for the 2001 Census. The paper contains an interesting table, obtained from a sample of 150,000 images of digits. The table does not immediately confirm the effectiveness of ICR as implemented. It would seem useful to train enumerators in how best to write certain numerals.

 

Digit                     0      1      2      3      4      5      6      7      8      9    All
Recognition rate (%)  94.83  96.83  94.92  91.11  96.00  94.95  97.29  97.72  90.43  81.74  95.64
Reject rate (%)        5.17   3.17   5.08   8.89   4.00   5.05   2.71   2.28   9.57  18.26   4.36
Accuracy rate (%)     99.38  99.89  99.78  99.73  99.89  99.41  99.79  99.59  99.12 100.00  99.72
Error rate (%)         0.62   0.11   0.28   0.27   0.11   0.59   0.21   0.41   0.88   0.00   0.28
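The reported rates can be cross-checked for internal consistency. Assuming the usual definitions (the reject rate complements the recognition rate over all images, and the error rate complements the accuracy rate over recognized images; the paper does not spell this out, so this is an interpretation), the figures add up to 100 per cent within rounding:

```python
# Rates from the Macao pilot table, per digit 0-9 plus the "All" column
recognition = [94.83, 96.83, 94.92, 91.11, 96.00, 94.95, 97.29, 97.72, 90.43, 81.74, 95.64]
reject      = [ 5.17,  3.17,  5.08,  8.89,  4.00,  5.05,  2.71,  2.28,  9.57, 18.26,  4.36]
accuracy    = [99.38, 99.89, 99.78, 99.73, 99.89, 99.41, 99.79, 99.59, 99.12, 100.00, 99.72]
error       = [ 0.62,  0.11,  0.28,  0.27,  0.11,  0.59,  0.21,  0.41,  0.88,  0.00,  0.28]

for r, j, a, e in zip(recognition, reject, accuracy, error):
    assert abs(r + j - 100.0) <= 0.01  # each image is either recognized or rejected
    assert abs(a + e - 100.0) <= 0.1   # recognized images are either correct or wrong
                                       # (digit 2 is off by 0.06, presumably rounding)

# Net share of all written digits left wrong if rejects are keyed manually
# and the manual keying is assumed correct: recognized share times its error rate.
overall_error = recognition[-1] * error[-1] / 10000
print(f"net error per digit written: {overall_error:.4%}")
```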

 

25.              ICR for the 1 July 2000 Census of Indonesia is handled by 29 processing centers throughout the country, using Kodak DS 3500 scanners and NCS NestorReader recognition software embedded in the bureau's own Visual Basic programs. The country paper reports many troubles that hamper the census ICR operation. These include sub-standard questionnaire printing (despite elaborate quality controls), poor writing by enumerators, inadequate document handling in the field resulting in unusable forms, scanner maintenance problems, and complex file management. The authors deserve the highest praise for sharing these experiences for others to learn from. The massive nature of the operation in Indonesia, scattered civil unrest, financial constraints, and various logistics problems have obviously all been factors here. Despite the difficulties, the Central Bureau of Statistics (CBS) of Indonesia is confident that the data-capture operation will be completed successfully.

 

26.              The October 2000 Census of Aruba (not reported in Bangkok) used Fujitsu M3079DG scanners and Eyes and Hands software. All data for this small country of about 100,000 people were captured by April 2001. The operation was quite carefully prepared, and proceeded smoothly, including the integrated computer-assisted coding work. There were no cost advantages compared to keyboard data entry.

 

27.              The problems reported can be divided into those that have to do with the recognition process itself, and all others. If the recognition rate is unacceptably low, this can usually be remedied by reducing the pre-set security level, but there is a price to pay: error rates will go up. Other problems may include unreliable paper transport in the scanners, which can have many causes, including dirt, the use of correction fluid on sheets, and damaged forms, possibly as a result of bad weather conditions. It is not unheard of for such difficulties to require large numbers of questionnaires to be transcribed, again increasing error rates.

 

28.              As a general rule, success is most often reported by census offices that went through a long and careful preparation process, including several pre-tests. Those that had to cut the groundwork short may become the source of less fortunate stories. Complete quality assurance management—for example, in the printing process of the questionnaires—is of the essence here.

 

29.              If recognition of handwritten text is now becoming a more reliable tool, it would be logical to think of speech recognition as the next step. After all, this is a more direct method of data collection. Speech recognition has broad economic potential and is a topic of much research. Some commercial applications of this technology are appearing, especially in processing verbal instructions received by telephone, and in the automotive industry. But progress in this area has been slower than expected. Statistical applications are still rare.

2. Automatic coding

30.               Recognizing verbal texts usually serves the purpose of enabling associated automatic coding. That is, the computer reads a text—for example, the name of a geographic area—and then selects the applicable code from an associated file or database.

 

31.              Such solutions, which ideally would allow completely automatic data capture and coding, depend on two prerequisites: (1) the recognition process must be sufficiently reliable; and (2) the search algorithms must indeed lead from the recognized term(s) to the appropriate code. A 100-per-cent character-recognition rate is not required, since the algorithm may still be successful with incomplete or partially mangled terms.

 

32.              However, there are indeed problems with this process. First there is the recognition reject rate, as referred to above, which might require an unexpected level of human intervention. Next comes the difficulty of automatically determining the applicable codes, the severity of which depends on the nature of the variable concerned. Geographic terms are usually not too difficult to code automatically, except perhaps at the lowest level (e.g., village), where spelling may not be standardized and homonyms occur. Occupation and industry tend to be more problematic. Despite the efforts of census field staff to extract full information from respondents, these variables will often be reported in terms that cannot easily be linked to ISCO, ISIC or NACE codebooks (see Glossary for terms).
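For geographic terms, even a very simple fuzzy match against a gazetteer illustrates why a perfect recognition rate is not a prerequisite. The gazetteer entries and codes below are invented for the example; production systems use considerably more sophisticated matching:

```python
import difflib

# Hypothetical gazetteer fragment: place name -> geographic code
gazetteer = {
    "san isidro": "0371",
    "san ildefonso": "0372",
    "santa rosa": "0415",
}

def code_place(raw, cutoff=0.8):
    """Return (code, matched name), or None when no match is close enough.

    Fuzzy matching tolerates minor ICR misreadings; genuinely ambiguous
    or unmatched entries are left for human coders."""
    match = difflib.get_close_matches(raw.strip().lower(), list(gazetteer),
                                      n=1, cutoff=cutoff)
    return (gazetteer[match[0]], match[0]) if match else None

print(code_place("Sen Isidro"))    # one misread character still codes correctly
print(code_place("Springfield"))   # not in the gazetteer, left to a human coder
```

Note that the cutoff plays the same role as the ICR security level: lowering it raises the automatic-coding rate but also the risk of wrong codes.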

 

33.              The issues of automatic and computer-assisted coding have been the subject of considerable research (Meyer and Rivière, 1997; Dopita, 1999; Blum, 1997). The tasks are a challenge to those applying modern methods of artificial intelligence, neural networks, and fuzzy logic[3]. But however elegant and advanced the matching algorithms are, once reporting from the field is multi-interpretable, too general, or otherwise inadequate, there is no easy way out. Many specialists feel that in those situations it is difficult to conceive automatic solutions that approach in quality the judgement of an experienced human coder. By letting the computer take care of the simpler cases, and relaying the remainder to human coders, an efficiency gain can nevertheless be obtained.

 

34.              As to the coding of industry, it may be noted that this can be improved by using a register of establishments or enterprises, and their known ISIC or NACE codes. Respondents may find it easier to report the name of their employer than to describe the principal economic activity of the company. This approach obviously requires the existence of a comprehensive national business register.
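A sketch of this register-based approach: normalize the reported employer name and look it up in a business register that already carries the industry code. The register extract and codes below are invented for the example; names that fail the lookup would be routed to human coders:

```python
import re

# Hypothetical extract of a national business register: normalized
# employer name -> industry code (entries invented for this example)
business_register = {
    "acme textiles ltd": "1711",
    "national fisheries corp": "0500",
}

def normalize(name):
    """Lower-case and strip punctuation so minor reporting variations match."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()

def code_industry(employer):
    """Exact lookup after normalization; unresolved cases are referred
    to human coders (here simply reported as None)."""
    return business_register.get(normalize(employer))

print(code_industry("ACME Textiles Ltd."))   # 1711
print(code_industry("Corner Bakery"))        # None: refer to a human coder
```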

 

35.              In conclusion: ICR in censuses has certainly not become an off-the-shelf technology. It requires careful design and extensive testing of questionnaires. The integration of ICR with associated operations, such as coding, needs ample prior thought and a clear strategy, again to be tested for effectiveness.

 

3. Outsourcing and decentralization

36.              Census data entry, through ICR or otherwise, is a potential candidate for outsourcing. Since it is a one-time high-volume application, there might be contractors that possess equipment and skills allowing them to offer the census office conditions that it could not match in an in-house operation. Meanwhile, it should be noted that outsourcing brings responsibilities of contracting and monitoring that require resources too. Confidentiality concerns multiply where outside contractors dealing with individual data are concerned. Quality assurance, already a major consideration in any event, becomes even more crucial if outside contractors are involved (see, for example, Whitford and Reichert, 2001). It would be attractive if the contractor could work within the census premises. In any event, contractor staff should be subject to confidentiality rules at least as severe as the ones imposed on temporary census staff.

 

37.              It should be noted that managers with an excellent in-house management record may still have difficulty controlling outsourced work, which requires different skills. These include knowledge of the service market, awareness of legal issues, negotiating skills, and more. In a census situation one easily ends up in circumstances where the supplier is in control, since the census organization, even while unhappy with the services provided, cannot afford to turn away.

 

38.              Sometimes government regulations put barriers in the way of outsourcing tasks that could better be assigned to specialized providers outside the census office. That situation obviously should be changed, but most likely the required reforms need to be implemented at a government level different from the one supervising national statistical services.

 

39.              Decentralized data capture would allow the census organization to keep matters in its own hands, while obtaining advantages by spreading the work over its regional centers. The problems are somewhat comparable to those of outsourcing, although more easily managed. Much depends on the local situation: the magnitude of the task at hand, conditions of the labour market, the efficiency of communication and transport, and so forth. Assigning more work outside the capital may also have a social and public-relations benefit. General guidelines in this domain are impossible to formulate.

D. GIS, Remote Sensing and GPS

40.              A more comprehensive discussion of these issues can be found in the paper on “Identifying and resolving problems of census mapping,” also presented in this Census Symposium. Since new mapping technology is an essential part of census innovation, brief remarks are included here.

 

41.              Mapping technology has made great strides over the past decades. It has moved from an activity depending on field exploration and manual drawing, to one using remote sensing and computer-assisted map management.

 

42.              While aerial photography from airplanes was used for census mapping (mostly for dense urban areas) before the era of satellite technology, the latter offers a much more cost-effective solution for remote sensing. Commercially available satellite pictures provide resolutions well beyond those required to identify individual buildings. Availability of such photographs greatly reduces—but certainly does not remove—the need for on-the-ground inspection.

 

43.              The fieldwork itself benefits from the now common availability of cheap hand-held global positioning systems (GPS), which again depend on satellite technology. Topographical maps and satellite pictures establish the starting platform for census field mapping. Cartographic staff armed with maps, pictures and GPS systems can now complete and annotate the maps to produce excellent orientation material for enumerators.
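Hand-held GPS receivers report latitude and longitude, from which field staff (or their mapping software) can derive distances, for instance to check the length of an enumeration-area boundary segment. The standard great-circle (haversine) computation, with illustrative waypoints:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes,
    using a spherical-earth approximation (mean radius 6,371 km)."""
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(a))

# Two illustrative waypoints along an enumeration-area boundary
d = haversine_m(12.5180, -70.0360, 12.5270, -70.0310)
print(f"segment length: {d:.0f} m")
```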

 

44.              Maps are now usually produced, stored, and updated using specialized computer systems and commercial software. The essential elements of satellite photographs or paper maps can be digitized by hand-tracing on digitizing tablets. Once the maps are finished, they can be printed and reprinted at will. The vector images are stored in computer files without the risk of degradation over time.

 

45.              It is useful in this context to point to a growing tendency for national statistical agencies to establish basic statistical reporting areas independent of the administrative territorial organization (Jacob and Royer, 1999), sometimes in the form of a grid of squares[4]. The reporting areas should be large enough to maintain the confidentiality of individual responses, yet small enough to allow regrouping of these statistical areas into the lowest level of administrative territorial units. This approach removes some of the problems of maintaining time-series in the context of ever-changing administrative borders.
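Tabulating against such a grid only requires snapping each geo-referenced record to its cell. A minimal sketch, where the cell-identifier format is an illustrative assumption and not any national standard:

```python
def grid_cell_id(easting_m, northing_m, cell_m=1000):
    """Identifier of the grid square containing a projected coordinate.

    Coordinates snap to the lower-left corner of their cell; the ID
    format below is an assumption for illustration only."""
    e = int(easting_m // cell_m) * cell_m
    n = int(northing_m // cell_m) * cell_m
    return f"{cell_m}mN{n}E{e}"

# Two dwellings a few hundred metres apart fall in the same 1 km cell
print(grid_cell_id(4334250.0, 2689880.0))  # 1000mN2689000E4334000
print(grid_cell_id(4334900.0, 2689100.0))  # 1000mN2689000E4334000
```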

 

46.              The value of census information is enhanced if it is combined with underlying base maps that permit users to generate thematic maps of their choice. Several census offices now market integrated products—usually on CD-ROM—that provide this capability. Other offices hold the opinion that such a service goes beyond the task of national statistical agencies, and limit themselves to providing aggregated census data to commercial publishers. This should not be confused with outsourcing, since responsibility for the final product lies with those publishers. The census office is accountable only for supplying reliable data that respect the requirements of statistical disclosure control.

 

47.              Many statistical agencies maintain one or more geographic information systems (GIS) for their own use. At the same time it is widely accepted that the role of statisticians is to provide data of the best possible quality to users. In many cases the task of integrating information from various sources into complex GIS systems is best left to others. This is especially true if such GIS systems serve a specialized user community, such as urban planners or environmentalists.

 

48.              Electronic maps have become indispensable and cost-effective tools for a wide range of operations in censuses and statistics.

E. Data processing and storage

1. Census-processing software

49.              Many countries, especially those in the developing world, have long relied on public-domain software for their census-processing requirements. Such software was built and maintained by non-profit agencies, usually supported by subsidies from national or international donors.

 

50.              It would appear that overall there has been less effort in this respect recently than at the time of previous census rounds. This can be explained partially by the growing capabilities of commercially available software. There may also be a case of donor fatigue. Donors tend to prefer to think in terms of projects with a clear beginning and end. Developing and maintaining a software system is a never-ending task, since changing hardware and software environments require ongoing support and re-development efforts, which can be considerable.

 

51.              Due to the relative scarcity of new (re-)development, some public-domain census or survey processing systems are starting to look a little obsolete. They may, for example, be completely or partially DOS-based. Even though that software might be as effective as ever, and perfectly able to do the job, the DOS (Disk Operating System) interface is unfamiliar to a new generation of users. They may also find it difficult to convince their supervisors and peers that it is preferable to work in an apparently dated environment. Using modern tools is better for a data-processing person's professional reputation. A consequence of these developments appears to be increasing use of alternative software, such as commercial statistical software systems (SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences) and others) and database application generators (MS Access).

 

52.              Some recent announcements have improved the picture for non-profit software. The United States Bureau of the Census, through its International Programs Center, is now offering additional modules of its CSPro Census and Survey Processing system, which is being developed in cooperation with Macro International and Serpro S.A. (See the CSPro web site at http://www.census.gov/ipc/www/cspro). CELADE, the Population Division of the United Nations Economic Commission for Latin America and the Caribbean (ECLAC), continues work on more advanced versions of the statistical database system Redatam (See web site at http://www.cepal.cl/celade-eng.).

 

53.              Developing census-processing applications in software not specifically intended for that purpose can be described as customizing that software for census purposes. It requires programming skills that are not always readily available. Some use of modern object-oriented programming languages is nearly unavoidable. There is also no particular place where scheduled training in such a specialized subject (developing census applications in general-purpose software) can be obtained. As a result, census organizations have relied on outside contractors, who did not necessarily fully understand the statistical issues involved. In this sense the current situation as regards census processing is more complex than that of the preceding census round.

 

54.              On the other hand, where initially enough basic computer skills were available to the census office, census data-processing staff may have received additional exposure to modern general-purpose software. This will benefit other statistical development work, or their careers, or perhaps both.

 

55.              The difficulty of customizing general-purpose software for census applications should not be underestimated. It can be considerably more complex than applying specialized census software. Outsourcing the assignment might only compound the problems. Contractors to be entrusted with the duty of developing census-processing systems should have a proven record in such work. The census office will still need specific expertise to undertake the task of contracting and supervising the activities.

 

56.              The broader issue of statistical disclosure control, including cell-suppressing software, required by all census offices to protect the confidentiality of individual responses, will be briefly discussed in Section G.1 below.

2. Data storage

57.              Census data were often stored simply as flat files. A principal concern was to ensure that the data and meta-information were properly preserved over time, so that additional computer analysis would remain possible at some later stage—for example, on the occasion of the next census. Statistical agencies are now increasingly aware that data from various collections can have much added value if preserved, with associated metadata, in a common storage structure, sometimes called a “data warehouse”. While this fashionable term may go as quickly as it came, the underlying principle is unchallenged. The relational storage model has been explored for depositing statistical information, but not always to complete satisfaction (see Section G.3 on structured archives).
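
The dependence of such flat files on preserved meta-information can be illustrated with a small sketch. The data dictionary below is hypothetical (field names and positions are invented for illustration); real layouts would come from the census documentation itself:

```python
# Hypothetical data dictionary for a fixed-width census record:
# field name -> (start position, length).  In practice this metadata
# must be preserved alongside the flat file, or the data are unreadable.
DICTIONARY = {
    "province": (0, 2),
    "age":      (2, 3),
    "sex":      (5, 1),
}

def parse_record(line, dictionary=DICTIONARY):
    """Split one flat-file line into named fields using the dictionary."""
    return {name: line[start:start + length]
            for name, (start, length) in dictionary.items()}
```

The sketch makes the point of the paragraph concrete: without the dictionary, a line such as "07034F" is just a string of characters; with it, the record can still be interpreted decades later.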

F. Use of the internet

1. The Internet for data collection

58.              While electronic mail has been fairly common since the late 1980s, wide access to the contents of the Internet at reasonable transmission speeds was still unusual in the previous round of censuses. Problems with the use of paper-based questionnaires had become apparent long before. In many countries response rates on mailed questionnaires are declining, a consequence of respondent fatigue and, perhaps, a diminished sense of civic responsibility. Where enumerators still personally visit dwellings, the chances of finding respondents at home during working hours have become smaller due to modern lifestyles and smaller household sizes.

 

59.              Census offices have proposed and/or used various measures to remedy these problems. These include more elaborate information campaigns and efforts to mobilize the cooperation of civil society, having enumerators work weekends and evening hours, approaching respondents by telephone, and sampling the initial set of non-respondents (thus, in a sense, giving up on complete coverage). While some successes have been reported, the efforts and costs required to obtain an acceptable response rate are now considerably greater than before.

 

60.              Thus it is only logical that attention has focused on the Internet as a gateway into an increasing number of homes. Using “push” technology, it would be possible to deliver to each Internet-connected household a uniquely identified electronic questionnaire, possibly already pre-filled with basic data obtained from the civil registry. Respondents would correct and complete the information, and then return it by data transmission to the census office, which would receive an electronic record, thus avoiding most of the data entry work.

 

61.              Electronic data collection from establishments (including enterprises, government agencies and public-sector entities) has already become fairly common. If households and individuals are approached in the same way, one could use the methods of CASI (Computer-Assisted Self-Interviewing) (Figueiredo and Lucas, 1999; Keller, 1999), which can render valuable assistance to respondents and prevent mistakes.

 

62.              Unfortunately, there are as yet several problems that hold back electronic data collection from households:

·        Incomplete coverage: While the number of households having access to the Internet grows rapidly nearly everywhere, there are only a few countries where the connection rate has surpassed 50 per cent;

·        Bias: Internet access is more common among affluent and younger households; therefore, wide use of this data-collection channel might result in a biased response pattern;

·        Unstructured address system: As compared to the postal system or the telephone network, the Internet addressing system is much less regulated, which reflects the origins of the Internet. Subscribers largely invent their own addresses and may change them at any time. They could have one or several addresses. It would be a major effort to assemble current e-mail addresses of households at any given point in time, and nearly impossible to maintain such a register with a degree of reliability. This essentially precludes the use of push technology for censuses at the present moment.

·        Attraction to hackers: There is little doubt that allowing respondents to use the Internet would attract hackers, who would consider it a challenge to be enumerated twice, use someone else’s identification, or worse. Census offices understandably are not looking forward to such challenges.

 


 


Figure 1.  Swiss census data collection via the Internet (demo version, partial screen)

 

63.              Notwithstanding these difficulties, several census offices, including those of Switzerland (Figure 1; see also the web site of the Swiss Federal Statistical Office at http://statistik.admin.ch), the United States and Singapore, have allowed electronic response during the current round of censuses (Haug and Buscher, 2000; Prewitt, 2000; United Nations, 2001). This did not involve “push” technology. Rather, respondents were required to take the initiative themselves by downloading census forms or completing them while online with the census office. To avoid misuse, it is essential that each household can authenticate its response. This might involve certification with a unique identification code, unknown to others. That code then has to be delivered to the household, possibly by hand. Safe and reliable electronic delivery of authentication codes remains difficult within the current state of technology. Electronic response also requires encryption on the browser side, since unprotected responses could be intercepted.
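
The principle of such household authentication codes can be sketched as follows. This is a hypothetical Python sketch using standard library routines; the code length, alphabet and hashing scheme are illustrative assumptions, not a description of any census office's actual system:

```python
import hashlib
import secrets

def issue_code(n_chars=12):
    """Generate an unguessable access code for one household."""
    # Alphabet avoids easily confused characters such as O/0 and I/1.
    alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
    return "".join(secrets.choice(alphabet) for _ in range(n_chars))

def store(code):
    """Keep only a hash of the code, so a leaked register reveals nothing."""
    return hashlib.sha256(code.encode()).hexdigest()

def verify(submitted, stored_hash):
    """Check a submitted code against the stored hash, in constant time."""
    return secrets.compare_digest(store(submitted), stored_hash)
```

The sketch shows why physical delivery of the code remains the weak point: generating and verifying codes is straightforward, but the code itself must reach the right household through a trusted channel.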

 

64.              The United States census limited Internet response to the short form only and did not undertake major efforts to publicize or recommend this method. It appears that in all three countries, demonstrating that the census organization is in tune with modern technology was a factor in opening up the Internet channel.

 

65.               Since it is unlikely that the printed questionnaire could be abandoned at short notice, census data collection via the Internet requires seamless integration of the two data streams. There were three streams in the case of Singapore, where response via telephone (CATI) was another alternative.

 

66.              Several census offices—for example, the Office for National Statistics of the United Kingdom—have reported that they have decided not to use the Internet for data collection at this time, having studied the dangers still present in those uncharted waters. Statistics Canada has been conducting an “Internet Test” in two distinct geographical areas for its census of 15 May 2001.

 

67.              In conclusion, there remain a number of problems, of a varied nature, that so far have prevented the wide use of electronic questionnaires for census purposes. The Internet needs to grow, and methods suitable for census data collection by means of it have to be developed and tested. Expectations are that the situation surrounding electronic response will have evolved significantly as the next round of censuses approaches.

2. The Internet for data dissemination

68.              The technology of disseminating statistical information is undergoing a fundamental shift. The printed publication has certainly not disappeared and remains important, for example, to provide a permanent and continuously accessible record and for ease of browsing. But online consultation of statistical sites—with or without payment for the information obtained—is becoming the principal avenue of information dissemination. This takes place via the Internet, since independently managed bulletin boards, reached through point-to-point communication with the information provider, cannot offer comparable user comfort.

 

69.              The challenge to statistical offices is considerable. Long used to the relative peace of carefully preparing a publication and then waiting for it to come into print, they must now adhere to a strict calendar of electronic release. Users always want the data sooner, but will complain when the data have to be revised later or—worse—turn out to contain errors.

 

70.              Under these conditions, designing a dissemination strategy has not become any simpler. The user community rightfully expects statistical agencies to make full use of new media, yet there continues to be substantial demand for paper publications. These demands must often be met under restricted funding and a shortage of technical skills. Statistical offices must not only formulate a strategy but also revisit it periodically. Where costs dictate it, the use of dissemination outlets needs to be adjusted on the basis of reports on their use. Cost recovery may help to improve the situation.

 

71.              Just like printed publications, electronic publications can be of varying cognitive quality, perhaps even more so. Furthermore, rapid technological developments make providing the best possible interface a moving target. Eurostat in its NORIS (Nomenclature on Research in Statistics) identifies the following examples of research in this area (see also the web site at http://europa.eu.int/en/comm/eurostat/research/viros):

 

·        Contributing to Internet-related standardization activities so that statistical requirements can be taken into account;

·        Bandwidth-intensive applications: statistical queries, audio- and video-broadcasting;

·        Use of intelligent agents (knowbots) for information interchange;

·        Improving man-machine interfaces, including the use of virtual reality;

·        Application of GIS technologies to improve the visualization of geographically oriented statistical information.

 

72.              The capability of a statistical organization’s web site is becoming ever more important. Statistical and census organizations nowadays are assessed not only on the quality and timeliness of their printed information, but also, and perhaps more importantly, on the effectiveness of their web presence.

 

73.              Web sites accordingly deserve careful attention. They must be built and maintained by professionals. There should be, if at all possible, continuous monitoring of user satisfaction and visitors’ browsing behavior, in order to ease access to popular items, note signs of apparent user confusion, and continuously improve the site. If dynamic access to databases is offered, such applications should be reasonably bug-free and have reached sufficient maturity (United Nations, 2001). Launching a high-technology service that results in numerous disappointed users benefits no one.

 

74.              The need to maintain a full range of up-to-date information and communication technology (ICT) capabilities, including web skills, in an environment where such qualities are in high demand, is a burden to many national statistical agencies. Outsourcing can be a solution, but since information dissemination is a core activity of official statistics offices, it is not an obvious alternative.

 

75.              As an aside it may be mentioned here that the Internet offers excellent possibilities to disseminate and retrieve international standards and guidelines for statistical work. An example is the classifications server RAMON developed by Eurostat (See web site at http://europa.eu.int/comm/eurostat/ramon.).

G. Data dissemination: other issues

1. Statistical disclosure control

76.              As the mass of readily accessible statistical information increases, there is an urgent need to improve the protection of individual information provided by persons or establishments, using techniques known as statistical disclosure control. The odds here could be shifting in an unfavorable direction, since statisticians need to provide more information faster, while ill-intentioned users attempting to filter out sensitive information have access to ever more powerful analytical computer tools, and they have time on their side. It has become impractical to visually inspect each table or data cube (see below) for potential risks, but automatic screening tools are coming to the rescue (Willenborg and de Waal, 2001; Giessing, 1999). They will suppress, combine, or otherwise obscure potentially risky cell values.
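
The simplest building block of such automatic screening, primary cell suppression by a frequency threshold, can be sketched as follows. This is a minimal illustration only; production systems such as those discussed by Giessing (1999) must also perform secondary suppression, which is omitted here:

```python
def suppress(table, threshold=3):
    """Primary cell suppression: blank out counts below the threshold.

    `table` maps category combinations to counts.  Note that real
    disclosure control additionally requires *secondary* suppression,
    so that blanked cells cannot be recovered from row or column totals;
    that much harder step is not shown in this sketch.
    """
    return {key: (n if n >= threshold else None)
            for key, n in table.items()}
```

Suppressed cells are returned as None, which a publication system would render as a blank or a symbol rather than a number.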

2. High-capacity physical media

77.              Information dissemination on non-rewritable high-capacity media also remains an important delivery channel, especially for massive data that are not highly time-sensitive, such as most census information. Censuses nowadays routinely result in the production of many CDs, and the first DVD products of much higher capacity have appeared (US Bureau of the Census, 2001). Data structures on CD-ROM and those underlying a web site can have much in common, including browsing through hyperlinks; parallel development of the two applications is an efficient way to benefit from this.

3. Structured archives: the statistical data warehouse

78.              As already mentioned above, storage of census data in a “warehouse” structure favours its use in conjunction with other statistical information kept there. Strictly speaking this is not a census issue, since it addresses the broader subject of statistical information management. A warehouse might consist of a number of data cubes: n-dimensional spaces in which one dimension consists of observations and the others are selection dimensions. In a simple census cube, the observations could be the total numbers of males and females, and the selection dimensions age group, place of residence, ethnicity, occupation, and so on. See the diagram in Figure 2 for an example in four dimensions (including three 3-dimensional sub-structures) from the area of business statistics.

 

Figure 2. Data cube in four dimensions (Basset and Stoyka, 1996)

 

79.              Cubes require a superstructure that allows them to be approached via hierarchical menus (“drill-down”) and logical combinations of keywords. Metadata need to be available too, preferably stored without redundancies. Storage formats other than the data cube should also be accommodated by the data warehouse.
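
The cube and drill-down concepts described above can be sketched in a few lines. This is an illustrative Python sketch; the dimension names are invented for the example and do not correspond to any particular census classification:

```python
from collections import Counter

# Illustrative selection dimensions for a small census cube.
DIMS = ("age_group", "sex", "region")

def build_cube(records, dims=DIMS):
    """Aggregate microdata into a cube: person counts for every
    combination of values on the selection dimensions."""
    return Counter(tuple(r[d] for d in dims) for r in records)

def drill_down(cube, dims=DIMS, **fixed):
    """Total over all cells consistent with the fixed dimension values,
    e.g. drill_down(cube, region="North") or drill_down(cube, sex="F")."""
    idx = {d: i for i, d in enumerate(dims)}
    return sum(n for cell, n in cube.items()
               if all(cell[idx[d]] == v for d, v in fixed.items()))
```

Fixing no dimension returns the grand total; fixing more dimensions descends the hierarchy, which is the essence of drill-down navigation.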

 

80.              While practical applications exist and can be accessed through the web sites of several national statistical agencies, the subject remains a work in progress. Data warehouses are by no means restricted to statistics, and the topic is much broader than can be described here. (For more information, see, for example, Kambayashi et al., 2000, or explore the Internet.)

 

Figure 3.  Organization Chart of Divisions for Business and Social Statistics, Statistics Netherlands

 

81.              Taking this concept to a further level, one could impose the requirement that after completion of a census or survey the information gathered is to be stored in the data warehouse first, and that periodic or one-off publications are generated only by retrieving data from this central storage system.

 

82.              This concept is illustrated by the example in Figure 3 (Keller and Willeboordse, 2000). BaseLine is the final product of Registers and Surveys. It holds all data as supplied by primary and secondary data sources. MicroBase contains the data as they result from editing, imputation, translation and micro-integration. The output-aggregate database StatBase holds the results after estimation for (sub-)populations of statistical units. StatBase claims to contain all publishable data produced by Statistics Netherlands. The publication-data warehouse StatLine can be seen as a set of views on StatBase. It presents the total output of the Bureau as a structured set of multi-dimensional tables. StatLine is disseminated both on CD-ROM and on the Internet.

 

83.              Several national statistical agencies have done important work on these issues, such as Statistics Canada (CANSIM II), Statistics Sweden (PC-Axis) and Statistics Netherlands. A trial version of the Stat-series of programs (the full package is named StatSuite) is downloadable from http://neon.vb.cbs.nl; the principal web site of Statistics Netherlands is http://www.cbs.nl. PC-Axis retrieval software can be obtained from the web site of Statistics Sweden (http://www.scb.se). Other developers may also be willing to provide test versions of their software if requested.

 

84.              As regards the applicability of data cubes, the principal problem seems to be not so much their storage and retrieval as the logical design of these information containers. Formulating a comprehensive set of cubes that (1) easily accommodates the results of statistical collections; (2) satisfies the requirements of a wide variety of users; and (3) fully respects confidentiality concerns is not a simple assignment.

 

85.              Whatever new or revised data dissemination product is being envisaged, the importance of extensive prototyping and launching “beta” versions—among real and critically minded users—cannot be overemphasized. This point was also made, and convincingly, by the recent ESCAP Workshop on Population Data Analysis, Storage and Dissemination Technologies. The report of this Workshop and several of its other papers constitute highly recommended reading (United Nations, 2001).

H. How to choose appropriate technology

86.              At this point it would have been useful to provide clear guidelines to census planners about how to make an informed choice of technology and about approaches such as outsourcing and its associated risks. Unfortunately, this is impossible, as conditions and considerations vary widely, not only from country to country, but also—and with increasing speed—over time.

 

87.              There is not one preferable set of technologies for census operations. The best choice depends on the magnitude of the project, the availability of local skills, the funding situation, existing prior experience, available time for preparation, and many other factors. The current round of censuses shows a surprisingly wide spectrum of methods and techniques being used.

 

88.              Informed choices are never possible without the information being available to the decision makers. Census planners need to acquaint themselves with the state of the art, both nationally and internationally. Preferably they should travel to comparable countries that have recently used methods and technology that may be of interest. Their superiors must recognize this need for exploration and allocate the resources for it.

 

89.              In deciding the parameters for a new census, one might want to look first at the preceding census. What worked well and what could use improvement? If an approach was satisfactory the last time, the arguments to replace it with something else need to be twice as strong.

 

90.              Every decision has a financial angle. If census costs can be reduced significantly while maintaining or even improving quality, that certainly should be worthy of serious consideration.

 

91.              Outsourcing is by no means the panacea that some would have it to be. Stories of success and failure are equally present. Here, as well as elsewhere, there is no substitute for solid fact-finding, careful negotiating, making sure that the chances of misunderstanding are minimal, and a continuous quality assurance programme (Whitford and Reichert, 2001).

 

92.              The final and most important consideration should be: What effect do the available alternatives have on the quality of service provided to information users? Statistical offices and census organizations live by the grace of the service they render to others. They need to strive incessantly to provide better information, in terms of timeliness, data quality, ease of access, completeness and pertinence. Any potential improvement in these areas merits review.

I. Future technologies

93.              There is little doubt that the ever-evolving technological environment in the future will have an even more profound effect on census-taking methods—perhaps moderated by legal requirements and confidentiality concerns.

 

94.              Already it has become possible to uniquely identify individuals through certain physiological characteristics, such as fingerprint or iris patterns, facial identifiers, or vocal frequency sequences (voice prints) (Jain, 1999). This technology has a bright future, due to applications too numerous to sum up here, such as ATM (automated teller machine) transactions, building access control and payment authorizations. It can remove the burden placed on people by the requirement that they constantly carry identification and credit cards and remember perhaps a series of personal identification codes.

 

95.              Biometric identification in combination with access to a database (perhaps wireless access) can remove the need for statisticians to ask respondents the same questions repeatedly about unchanging characteristics (e.g., date of birth, sex, ethnic origin or place of residence at prior census). In a broader sense, it would make it technically easier to establish and maintain electronic civil registers that would be more complete and current than those in existence today.

 

96.              As an intermediate step one could think of a personal multi-purpose chipcard (“smartcard”), from which information could be copied without manual transcription. Applications at this level are already widespread in banking, library management, medical services and more. In a rather far-reaching concept, such data could also be stored in personal “digital data vaults” on the Internet. Once authorized by the owner, information users such as census organizations would be able to retrieve from there the data items they require.

 

97.              The use of biometrics or personal chipcards in civil registration or censuses so far has been experimental at most, and digital data vaults are an idea that has just recently been launched. But it is easy to foresee that these and similar techniques, with all their various implications, will become the subject of increasing debate in the not-too-distant future. It is part of what has been termed "pervasive computing," an ever-growing presence of computer power and associated sensors and controls in daily life. Statistical organizations need to involve themselves in this debate to make sure that new developments and standards take their requirements into consideration.

 

98.              A few countries have dropped the door-to-door census in favour of what is called an “administrative” or “virtual” census (Laan and Everaers, 2001). This may involve a comprehensive inspection and merging of various registers to arrive at the national universe of dwellings and persons. Again, it is an advantage if the statistical office is a partner in the definition and maintenance of the principal registers. In other countries the “short form”, which contains the questions to be asked of everyone, has been reduced to a bare minimum.

 

99.              In both cases—administrative census and minimal short form—additional information is usually gathered through sample surveys. These methods share methodological ground: good statistical practice prescribes the use of sampling wherever the underlying universe is sufficiently known. In future census rounds sampling techniques will become increasingly important, and with them the need for statisticians to explain their ways to the world.

J. Conclusions

100.          New technologies make their way into census work, but not always as quickly and broadly as might have been expected. Census planners need to be conservative, since they know that their solutions must be right the first time. Nevertheless, one sees innovation turn into standard practice. This includes digital mapping, ICR, and electronic publishing. The Internet has become an essential medium for information dissemination and will grow in importance for data collection.

 

101.          The conditions under which censuses are conducted differ greatly between countries. Even for most sub-tasks there is no single best technological approach. Technical awareness, a sense of the realistic, a methodical approach, and plenty of preparation time, are the principal requirements for census planners.

 

102.          Census technology changes much more rapidly than the underlying statistical methods and principles. New technology should never endanger the continuity of existing reporting systems, and, if possible, should reinforce continuity.

 

103.          Outsourcing raises many problems, including confidentiality concerns, but it can deliver economies and resolve bottlenecks. Again, the local situation—including the management ability of the census office—determines whether it is a valid alternative. The solution is more evident for one-off special operations (e.g., census data entry), than for ongoing tasks, such as web-site management. Where outsourcing could offer advantages, but bureaucratic obstacles stand in the way, the obstacles should be removed.

 

104.          The current census round shows substantial technological evolution from the preceding cycle; in the next round the difference will only be greater.

K. Discussion

105.          Here are some slightly provocative questions that symposium participants may wish to discuss:

 

·        As compared to data capture by keyboard, ICR has advantages as well as drawbacks. Given technical problems experienced by some countries, is the move towards ICR justified by experience gained so far?

 

·        There is no doubt that ICR equipment interprets characters less accurately than human operators. Can we call that progress?

 

·        Do we still need technical assistance projects producing census software? Or can commercial software systems now fully cover census requirements?

 

·        Why, with CD-ROMs and the Internet, continue to print costly census reports?

 

·        Do “data cubes” present a valuable concept? Or is this just a solution looking for a problem? Are there more suitable storage formats for statistics?

 

·        Will the census of population and housing as we know it disappear because of ever-advancing technology?


References

Basset, P., and A. Stoyka (1996). Statistics Canada’s aggregate output database – CANSIM II. Proceedings of the Conference on Output Databases, Voorburg, the Netherlands.

Blum, Olivia (1997). Editing and Coding Module. In New Census Technologies: The Israeli Experience. Proceedings of the Euro-Med Workshop, March 1997.

Dekker, Arij (1994). Computer methods in population census data processing. International Statistical Review, vol. 62, No. 1, pp. 55-70.

_____ (1997). Data Processing for Demographic Censuses and Surveys, with Special Emphasis on Methods Applicable to Developing Country Environments. UNFPA/NIDI, The Hague, ISBN 90-70990-67-9.

Deming, W. Edwards (1986). Out of the Crisis. Center for Advanced Engineering Study. Cambridge, MA: MIT.

Dopita, Patricia (1999). Population Census Evaluation, 1996 Census Data Quality: Occupation. Canberra: Australian Bureau of Statistics.

Figueiredo, José, and Ana Lucas (1999). Potentials and Pitfalls of INE-P IS/IT Strategy on the Past Ten Years. Proceedings of the Strategic Reflection Colloquium on IT Issues for Statistics, Eurostat, Luxemburg, September 1999.

Giessing, Sarah (1999). Transferable software for automated secondary cell suppression. Seminar on the Exchange of Technology and Know-how (ETK), sponsored by Eurostat, Prague.

Haug, Werner, and Marco Buscher (2000). E-census, the Swiss Census 2000 on the Internet. INSEE/Eurostat Workshop, “Census beyond 2001”, Paris, 20-21 November.

Jacob, Michel, and Jean-François Royer (1999). Le recensement de la population de 1999. In Les actualités du Conseil national de l’information statistique.

Jain, Anil, et al., eds. (1999). Biometrics: Personal Identification in Networked Society. Kluwer International Series in Engineering and Computer Science, Volume 479. Kluwer Academic Publishers, Dordrecht, the Netherlands, ISBN 0-7923-8345-1.

Kambayashi, Yahiko, Mukesh Mohania, and A. Min Tjoa, eds. (2000). Data Warehousing and Knowledge Discovery, Second International Conference, DaWaK 2000, London, UK, 4-6 September 2000, ISBN 3-540-67980-4.

Keller, Wouter (1999). Preparing for a new era in statistical processing: how new technologies and methodologies will affect statistical processes and their organization. Proceedings of the Strategic Reflection Colloquium on IT Issues for Statistics, Eurostat, Luxemburg, September 1999.

Keller, Wouter, and Ad Willeboordse (2000). Statistical Processing in the Internet Era: the Dutch View. Conference on Network of Statistics for Better European Compliance and Quality of Operation, Radenci, Slovenia, 13-15 November 2000. (This paper can be retrieved from the web site of the Statistical Office of Slovenia at http://www.sigov.si/zrs)

Laan, Paul van der, and Peter Everaers (2001). The Dutch Virtual Census. Meeting 66, ISI 53rd Session, Seoul, 2001.

Meyer, Eric, and Pascal Rivière (1997). SICORE, un outil et une méthode pour le chiffrement automatique à l’INSEE. International Blaise Users Group, Paris.

Prewitt, Kenneth (2000). Prepared Statement before the Subcommittee on the Census, Committee on Government Reform, U.S. House of Representatives (8 March 2000).

Statistics Sweden (2001). Q2001 – International Conference on Quality in Official Statistics. Organized by Statistics Sweden and Eurostat, Stockholm, Sweden, 14-15 May 2001. (Web site at http://www.q2001.scb.se)

United Nations, Economic and Social Commission for Asia and the Pacific (2001). Report on the Workshop on Population Data Analysis, Storage and Dissemination Technologies, Bangkok, 27-30 March 2001. (This report and other workshop papers can be retrieved from the ESCAP web site at http://www.unescap.org/stat/pop-it/pop-wdt/pop-wdt.htm)

United Nations, Population Fund (UNFPA) (2000). Report of Joint Interagency Coordinating Committee on Censuses for sub-Saharan Africa and PARIS 21 Census Task Force Meetings. Eurostat, Luxemburg, October 2000.

United Nations, Statistics Division (1998). Principles and Recommendations for Population and Housing Censuses, Revision 1. Statistical Papers Series M, No. 67/Rev. 1.

United States Bureau of the Census, Public Information Office (2001). Census Bureau breaks new ground with release of DVD products. News release dated 6 February 2001.

Whitford, David, and Jennifer Reichert (2001). Quality Assurance Challenges in the United States’ Census 2000. Q2001 – International Conference on Quality in Official Statistics, organized by Statistics Sweden and Eurostat, Stockholm, Sweden, 14-15 May 2001.

Willenborg, L., and T. de Waal (2001). Elements of Statistical Disclosure Control. Springer Verlag, Berlin/Hamburg, ISBN 0-387-95121-0.


Glossary

Artificial intelligence, neural networks, fuzzy logic: various forms of innovative software techniques that often depend on non-deterministic (heuristic) methods

ATM transaction: a transaction through an Automated Teller Machine, or money dispenser

Automatic coding: the conversion, by unassisted computer, of verbal texts into applicable codes

Biometric identification: identification of individuals through one or more of their physical characteristics

Bulletin board: digital information service, often operated independently from the Internet

CASI (Computer-Assisted Self-Interviewing): the technique whereby respondents independently complete electronic questionnaires, assisted only by specially-designed computer programs

CATI (Computer-Assisted Telephone Interviewing): respondents answer questions by telephone and interviewers key the responses directly into computers

Computer-assisted coding: coding activity whereby human coders decide and computer systems provide assistance

Data cube: multi-dimensional structure for storing statistical information

Data warehouse: the assembled data capital of enterprises or institutions, stored and managed in a way that favours access and analysis

Digital data vault: a space on the Internet where citizens can safely store, and eventually provide access to, personal data

DVD: Digital Video Disk, the more capacious successor of the CD-ROM

GIS (Geographic Information System): an information system designed to capture, store, update, manipulate, analyze and display all forms of geographically referenced information

GPS (Global Positioning Systems): by now common instruments that show the geographic location of the carrier

ICR (Intelligent Character Recognition): the art of interpreting written or printed characters through image scanning and computer analysis. Formerly called Optical Character Recognition (OCR) when the role of recognition engines was less crucial

ICT: Information and Communication Technology

ISCO: International Standard Classification of Occupations

ISIC: International Standard Industrial Classification of All Economic Activities

Knowbot: (from Knowledge Robot) intelligent agent gathering information on the Internet; more specific than search engines

Meta-information: ancillary information clarifying statistical figures (definitions, standards, units, collection method and so forth)

NACE: Nomenclature générale des activités économiques dans les Communautés européennes – the statistical classification of economic activities used by the European Union

Object-oriented languages: languages for computer programming that attach code to objects and classes. Different from more monolithic procedural languages.

Outsourcing: delegating (part of) activities to an outside contractor

Pervasive computing: the omnipresence of computer power and associated sensors and controls in daily life

Point-to-point communication: (as used in the present text) electronic data communication by direct connection, not using the Internet

Push technology: using the Internet to deliver specific but unrequested information to selected e-mail addresses

Quality Assurance: a planned and systematic pattern of all the actions necessary to provide adequate confidence that a product will conform to established requirements.

Relational storage model: currently the most popular data model for general-purpose database systems; theoretical foundation formulated by E.F. Codd 

Remote sensing: monitoring from a distance, as from aeroplanes or earth satellites

Satellite telephony: telephone communication relayed by satellites rather than by land-based relay stations

Smartcard: electronic card carrying a computer chip, and providing (much) more than memory functionality

Statistical disclosure control: the complex of measures preventing unauthorized access to sensitive statistical information

Voice print: A stored digital model of an individual’s voice, used for identification purposes



*       This document was reproduced without formal editing.

**     Specialist in Census Technology, The Netherlands.  The views expressed in the paper are those of the author and do not imply the expression of any opinion on the part of the United Nations Secretariat.

[1] For a definition of this and many other terms used in this paper, refer to the Glossary.

[2] For example, recognition rates of handwritten characters might drop below 90 per cent. This value should always be considered in connection with the security level, a pre-set parameter that determines how “confident” the recognition engine(s) must be before accepting a character as representing a particular symbol. Among the accepted characters there are usually mistakes (the “errors”). The rejected characters, on the other hand, include “confirms”: characters that would have been correctly recognized at a lower security level, and which the operator need only confirm. The remaining rejects are “corrects”, which always require the operator to key the true value.
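The classification described in this footnote can be sketched in a few lines of code. The characters, confidence scores and security level below are invented for illustration only:

```python
# Sketch of the accept/reject logic behind an ICR "security level".
# All characters, confidence scores and the threshold are hypothetical.

SECURITY_LEVEL = 0.85  # pre-set parameter: minimum confidence to accept

# (character guessed by the engine, its confidence, character actually written)
recognitions = [
    ("7", 0.97, "7"),  # accepted and right
    ("1", 0.91, "7"),  # accepted but wrong: an "error"
    ("4", 0.80, "4"),  # rejected though right: a "confirm" for the operator
    ("5", 0.40, "6"),  # rejected and wrong: a "correct" the operator must key
]

def classify(items, level):
    """Tally recognition results into the four categories of the footnote."""
    tallies = {"accepted_right": 0, "errors": 0, "confirms": 0, "corrects": 0}
    for guess, confidence, truth in items:
        if confidence >= level:
            tallies["accepted_right" if guess == truth else "errors"] += 1
        else:
            tallies["confirms" if guess == truth else "corrects"] += 1
    return tallies

tallies = classify(recognitions, SECURITY_LEVEL)
```

Raising the security level moves characters from the accepted to the rejected side, trading fewer undetected errors for more operator work.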

[3] Automatic coding can be seen as a form of translation; it uses methods similar to those applied in the popular but even more difficult research area of machine translation of natural languages.
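At its simplest, such a system normalizes the verbal response and looks it up in a coding dictionary, falling back to a human coder when no match is found. The occupation titles and codes below are invented for illustration and are not taken from ISCO:

```python
# Minimal sketch of dictionary-based automatic coding.
# Titles and codes are hypothetical.

CODING_DICTIONARY = {
    "primary school teacher": "2331",
    "taxi driver": "8322",
    "subsistence farmer": "6210",
}

def code_response(text):
    """Return the code for a verbal response, or None when the case
    must be passed on to computer-assisted (human) coding."""
    normalized = " ".join(text.lower().split())  # fold case and whitespace
    return CODING_DICTIONARY.get(normalized)
```

Production systems such as SICORE add approximate matching and synonym handling on top of this exact look-up, which is where the analogy with machine translation comes in.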

[4] A particularly fine square grid of 100 by 100 meters has been used since 1968 (!) by the Federal Statistical Office of Switzerland, principally for environmental and agricultural statistics.