Symposium 2001/06
6 July 2001
English only

Symposium on Global Review of 2000 Round of Population and Housing Censuses: Mid-Decade Assessment and Future Prospects
Statistics Division
Department of Economic and Social Affairs
United Nations Secretariat
New York, 7-10 August 2001

Adapting new technologies to census operations *

Arij Dekker**
B. Management, communication, logistics and quality assurance
1. Intelligent character recognition (ICR)
3. Outsourcing and decentralization
D. GIS, Remote Sensing and GPS
E. Data processing and storage
1. The Internet for data collection
2. The Internet for data dissemination
G. Data dissemination: other issues
1. Statistical disclosure control
2. High-capacity physical media
3. Structured archives: the statistical data warehouse
H. How to choose appropriate technology
Adapting New
Technologies to Census Operations
Even small improvements in census technology can
result in important gains in the quality and cost-effectiveness of the whole
census operation. At present a number of organizations are attempting to help
bring innovation to census and statistical operations. Among the concerns
regarding new technology are these: how to choose appropriate technology; how
to maintain the integrity of existing statistical systems; how to deal with
outsourcing certain tasks; and how to maintain confidentiality of data. Some
technologies, such as mobile telephony, have made person-to-person
communication in the field easier, as have fax and e-mail capabilities.
Bar-code technology has made management of materials more efficient.
In the 2000 round of censuses, intelligent character
recognition (ICR) made a breakthrough in many countries, although illegible
handwritten characters and badly printed questionnaires still led to problems.
In general, countries that planned carefully for the new technology and
conducted pre-tests were more successful in their operations. The next step,
automatic or computer-assisted coding, is also being explored, and some data,
such as geographic names, may lend themselves to such coding. For some census
operations, especially one-time, high-volume tasks such as data entry,
outsourcing may be a good solution. Contractors with the necessary equipment
and skills can supplement the census staff, but outsourcing also raises
questions of overcoming bureaucratic obstacles, managing the contractor and
enforcing confidentiality rules.
Census mapping has made great strides in the last
few decades, from an activity requiring extensive fieldwork and manual drawing to
one using remote sensing and computer-assisted map production. Geographic
information system (GIS) technology is increasingly being used in population
and housing censuses to generate maps for enumeration and for data presentation
purposes. Global positioning systems (GPS) are cheap and available, and they
can be used by cartographic field staff to annotate topographical maps and
satellite photographs to produce excellent maps for enumerators.
Data-processing software for censuses, which was
previously developed and provided by non-profit agencies, is being supplanted
by commercially available software. However, customizing general-purpose
software for census purposes requires considerable programming skills, which
may not always be available in a census organization.
The Internet as a tool for census data collection is
still in its infancy, although several countries did allow some Internet
enumeration in their most recent censuses. Generally, such data were collected
from a small portion of the population on an experimental basis. Problems with
this method include the need for authentication from each household; lack of
coverage of households in many countries at this stage; and the fear that
hackers could compromise the integrity of the census. Moreover, data collected
via the Internet would have to be integrated into other data streams, including
mail-back questionnaires and telephone responses. As a tool for data
dissemination, however, the Internet is quickly becoming the principal medium,
and statistical offices are responding with more electronic publications and
effective web sites. Technology is also under development for the storage of
census data, including data “warehouses”, which would contain all the data and
metadata from a census.
It is impossible to create a single set of
guidelines to help census planners choose the best new technology. Choices
depend on the magnitude of the project, the availability of local skills, the
funding situation, prior experience, time for preparation, and other factors.
Census planners need to be conservative, because their solutions must be right
the first time. New technology should never endanger the continuity of existing
reporting systems and if possible should reinforce it.
1.
It
is commonly known that the art of population census taking goes back many
centuries. Ever since the end of the nineteenth century, there have been
efforts to take advantage of a succession of newly available technologies to
make such large and costly statistical enquiries more efficient and effective.
A census is labour-intensive, requiring large numbers of temporary staff.
Personnel costs usually are the principal component of census budgets, with
expenditure for information and communication technology coming second.
2.
Even
small improvements in the methodologies used, or in the effectiveness of the
equipment, can result in important gains in quality and/or cost-effectiveness
of the whole operation. Census budgets depend on national cost levels and the
depth of the enquiry, but generally vary between a few dollars per capita in
low-cost countries to as much as 30 dollars per capita in highly developed
environments. A rough estimate of the total expense of the current round of
censuses would put it between 30 and 50 billion dollars. This is certainly an
enticing target for those trying to improve the rate of value-for-money.
3.
The
name of Herman Hollerith stands out as an early adopter of modern technology to
census work. He borrowed from the ideas of Joseph-Marie Jacquard, who had
invented punched cards to control looms. Hollerith saw a way to use such cards
in sorting and tabulation. By doing this he not only expedited the release of
the results of the 1890 US census; he started an entire industry.
4.
There
have been many lesser-known census innovators who have put newly discovered
methods and technology to good use. Information technology has usually been at
the forefront of these efforts. Census data-processing equipment has graduated
from machines just assisting tabulation work, to indispensable tools in
virtually all phases of census work. Computers are used for planning, to
support mapping, in project management, in all stages of data capture,
cleaning, coding, and reporting, and in demographic analysis (Dekker, 1997).
Many of the recent improvements in census taking have been possible thanks to
the ever-growing capabilities of data-processing equipment and communication
networks operating on local, national, and worldwide levels. For the sake of
continuity it is important that the use of newer technology is embedded into,
and builds upon, existing sound methodology (United Nations, 1998).
5.
There
are presently several important efforts to bring coordination and focus to the
innovation process in official statistics and census taking. One is the Paris
21 initiative: Partnership in Statistics for Development in the 21st
Century. The members of Paris 21—there are several hundred of them—are drawn
from leading national and international statistical agencies, academic
institutions, etc. One of the several issues currently being reviewed by the
experts combining their efforts under the Paris 21 initiative is how census
work can be made more cost-effective (See the web site at http://www.paris21.org
for details).
6.
The
United Nations Statistics Division (UNSD) has a long history of furthering
sound statistical principles and the sharing of know-how. A web site giving
access to information on good statistical practices has recently been opened
(http://www.esa.un.org/unsd/goodprac). On a regional scale, Eurostat has
conducted a series of technical seminars by the names of NTTS (New Techniques
and Technologies for Statistics) and ETK (Exchange of Technology and Know-how).
The 2001 meetings on these issues were conducted in June in a combined form on
Crete, Greece.
7.
Noteworthy
also is the Eurostat web site by the name of VIROS (Virtual Institute for Research in Official Statistics, web site
http://www.europa.eu.int/en/comm/eurostat/research/viros). VIROS identifies
and classifies areas of research where participating organizations may place
the results of their studies and experiences, while remaining entirely
responsible for them. Eurostat acts as a central coordinator, attempting to integrate
the individual elements into a coherent set. The ultimate goal is to facilitate access to information on research
activities and results. Eurostat is naturally interested in such issues,
facing, as it does, the need to combine many statistical traditions, and
overlaying them where possible with state-of-the-art integration technology.
8.
When
considering the technological options before them, census offices face a number
of questions. Some of these are:
· How to make an informed choice in selecting appropriate technology;
· How to maintain the integrity of the existing statistical and census systems;
· How to deal with the option of outsourcing[1], and management of outsourced tasks; and
· Confidentiality concerns relating to the preferred solutions.
9.
This
paper will look briefly at various areas where census work has recently
benefited from new technology and will discuss the issues referred to above.
Definite answers on the questions raised can be formulated only by individual
census organizations themselves.
10.
A
nationwide census differs in many respects from day-to-day statistical work. It
lacks the repetitive nature that allows more frequently conducted collections
to be improved gradually. The level of expenditure and number of staff are much
higher than statistical managers are used to. Some governments therefore
establish census offices separate from the national statistical agency. It may
be necessary to recruit professional management, experienced in dealing with
large but temporary organizations. Since a census can be seen as a large
time-critical project, with many interlocking operations, the use of modern
project management software is of vital importance.
11.
A
census operation requires efficient communication between thousands of persons,
as well as procurement and storage of a large variety of items, most of which
have to be distributed to all corners of the country and then recollected.
12.
Recent
developments in mobile telephony (cell phones) have made person-to-person
communication easier, even in countries with extensive and reliable fixed-line
networks. But complete mobile coverage has not been accomplished in most
developing countries. Census communication with remote areas continues to be
problematic in some cases. It is still possible that satellite telephone
systems, which function everywhere on earth, will fill this void. Some
ambitious projects in this domain, such as that known as “Iridium,” have not
drawn enough initial subscribers. But with most of the enormous investment
costs now written off, user prices are coming down. The ground stations
including antennas are still rather voluminous but completely portable.
Operations planners need to be cognizant of all communications options open to
them, including regional differences, and make arrangements accordingly.
13.
Where
printed or printable communication is required, fax technology is rapidly
giving way to electronic mail. This is true for census operations, but relying
on e-mail entails vulnerability to Internet service interrupts, computer
illiteracy and virus attacks. It is important always to keep a fax capability
for backup.
14.
Improved
computer software and wide availability of personal computers (PCs) have made
managing the movement of goods much
easier. Bar-code technology can be a key element in this. Using bar codes
instead of printed numbers has advantages in avoiding transcription errors and
speeding up processing. Printing both the bar code and the human-readable number
is useful where staff must occasionally read the codes themselves. Census managers,
who are usually not logistics professionals, tend to overlook this established technology.
15.
A
typical application of bar-code technology is to label all items specific for a
particular enumeration area (maps, enumerator identification, summary sheets,
transport box) with a specific bar code. At the point where the materials are
sent out, the codes will be scanned, allowing automatic update of a database of
items forwarded. The same process can be used to maintain a database of items
retrieved from the field.
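The dispatch-and-return bookkeeping described above can be sketched as a small in-memory register keyed on the enumeration-area bar code. This is a hypothetical illustration: the area codes (`EA-0001` and so on) and field names are invented, and a real operation would use a persistent database behind the scanners.

```python
# Illustrative sketch of bar-code based logistics tracking for census
# materials. Area codes and record layout are hypothetical examples.
from datetime import datetime

class DispatchRegister:
    """Tracks which enumeration-area kits have been sent out and returned."""

    def __init__(self):
        self.records = {}  # bar code -> {"sent": ..., "returned": ...}

    def scan_out(self, barcode):
        # Scanning at dispatch records the time the kit left the office.
        self.records[barcode] = {"sent": datetime.now(), "returned": None}

    def scan_in(self, barcode):
        # Scanning at return closes the record for that kit.
        if barcode not in self.records:
            raise KeyError(f"Kit {barcode} was never dispatched")
        self.records[barcode]["returned"] = datetime.now()

    def outstanding(self):
        # Kits dispatched but not yet back from the field.
        return [b for b, r in self.records.items() if r["returned"] is None]

reg = DispatchRegister()
for area in ["EA-0001", "EA-0002", "EA-0003"]:
    reg.scan_out(area)
reg.scan_in("EA-0002")
print(reg.outstanding())  # → ['EA-0001', 'EA-0003']
```

The same register, queried at intervals, yields the list of enumeration areas whose materials are overdue.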
16.
Labeling
individual questionnaires with unique codes can also be helpful, although the
resulting administrative overhead is considerable. Such identifiers can protect
against the fairly common problem that entire batches of questionnaires arrive
back erroneously geocoded. Standard retail scanners, but also most intelligent
character recognition systems (see Section C.1), will read bar codes without
difficulty.
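A check digit appended to each questionnaire serial number lets scanners catch most mis-reads immediately. As an illustration, here is the widely used mod-10 (Luhn) scheme; this is one common choice, not necessarily the scheme any particular census office applied.

```python
# Sketch of a mod-10 (Luhn) check digit for questionnaire serial numbers.
def luhn_check_digit(number: str) -> int:
    """Compute the Luhn check digit for a numeric string."""
    total = 0
    # Walk the payload right to left; double every second digit, counting
    # from the position next to where the check digit will be appended.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10

serial = "1234567"
print(f"{serial}{luhn_check_digit(serial)}")  # → 12345674
```

A scanner (or data-entry program) recomputes the digit on reading; a single mis-read digit makes the check fail, so the questionnaire can be flagged rather than silently geocoded to the wrong batch.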
17.
Quality
assurance, including the use of scientifically sound sampling methods, should be an integral part of all census
operations. Many of the methods in this field depend on statistical principles and
have been developed by statistical innovators (Deming, 1986). The census office
must strive for a consistent level of assured quality throughout its
operations, and cannot afford to disregard the techniques that help to achieve
and verify it (Statistics Sweden, 2001).
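One sampling technique from this field is acceptance sampling of data-capture batches: verify a random sample of records against the source documents and reject the whole batch if the sampled errors exceed a preset limit. The sketch below is illustrative only; the sample size and acceptance number are arbitrary examples, not a recommended sampling plan.

```python
# Illustrative acceptance-sampling check for data-capture batches.
import random

def inspect_batch(batch, sample_size=50, acceptance_number=2, seed=0):
    """Pass the batch if a random sample shows at most `acceptance_number` errors."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    sample = rng.sample(batch, min(sample_size, len(batch)))
    errors = sum(1 for record in sample
                 if record["keyed"] != record["verified"])
    return errors <= acceptance_number

# A batch keyed without errors passes; a thoroughly mis-keyed one fails.
clean = [{"keyed": v, "verified": v} for v in range(1000)]
mis_keyed = [{"keyed": v, "verified": v + 1} for v in range(1000)]
print(inspect_batch(clean), inspect_batch(mis_keyed))  # → True False
```

Rejected batches go back for full re-keying or re-scanning, which keeps the residual error rate of accepted work within known bounds.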
18.
It
is probably true to say that the current round of censuses has seen the
breakthrough of ICR technology. In the 1985-1994 round only about 20 per cent
of countries undertaking censuses used some form of character or mark
recognition (Dekker, 1994). The large majority still relied on keyboard data
capturing. In the current round nearly all census offices of industrial market
economies—and numerous other ones—apply imaging through scanners, recognition
software and other tools required to partially do away with manual data entry.
19.
There
is no doubt that recognition technology has made great strides in the last
decade, but it seems true also that the example provided by census “pioneers”
has made switching course easier for those organizations that otherwise might
have hesitated. ICR offers a promise of greater efficiency, but it is
inherently riskier than keyboard data entry. For example: poorly designed or
badly printed questionnaires are a nuisance in manual data entry, but may sink
an anticipated ICR data-capturing operation. The need for elaborate pre-tests,
already so obvious in traditional census taking, is even more apparent when
scanning technology is to be used.
20.
The
main fundamental problem still existing is that handwritten characters are
often poorly recognized when the writer is not already known to the
recognition system. In censuses which use auto-response or a large number of
enumerators, this obviously is the case. To avoid the problem, it is possible
to limit the automatic recognition to marks or numeric digits only. But even
digits cannot always be reliably interpreted, so quite a few manual data-entry
personnel will still be required to fill the gaps.
21.
Scattered
information suggests that the ICR process does not always proceed as smoothly as
anticipated. Experiences obtained during the final operations tests induced the
United States Bureau of the Census to move from a one-pass to a two-pass
processing system, where sample data from the long forms will be
computer-stored only during a second capturing operation (Prewitt, 2000). This
change of approach has had no effect on processing deadlines. Some European
countries (for example, Estonia) have reported difficulties in recognizing
handwritten alphabetic characters, requiring them to hire additional staff to
assist the automatic recognition process. A recent meeting in Bangkok (United
Nations, 2001) heard about problems of varying severity in China, Indonesia,
Macao Special Administrative Region of China, the Philippines and Thailand[2].
(For information on the details of the problems experienced, retrieve the
country papers from the web site at
http://www.unescap.org/stat/pop-it/pop-wdt.htm.)
22.
In
Thailand, earlier plans to establish 15 regional ICR centers for the April 2000
census were cancelled after more sophisticated (and expensive) scanners and
software turned out to be required. A single ICR complex now operates in
Bangkok (Fujitsu 4099 scanners, TeleForm software). Some problems were reported
with poorly written characters and scanner maintenance.
23.
The
census of the Philippines on 1 May 2000 works with four decentralized capturing
centers, using Kodak 3590 scanners and Eyes and Hands software. One of the
biggest problems here is that the print quality of some questionnaires is not
in accordance with specifications, which causes the ICR software to tag them as
unidentifiable. Another difficulty is illegible handwritten entries. The number
of verification licenses, required to manually correct such rejects, had been
underestimated. This has been a learning process. Experiences are sufficiently
positive to use ICR again for the upcoming census of agriculture and fisheries.
24.
The
Macao Special Administrative Region of China reports good results for its pilot
operation for the 2001 Census. The paper contains an interesting table,
obtained from a sample of 150,000 images of digits. The table does not
immediately confirm the effectiveness of ICR as implemented. It would seem useful
to train enumerators in how best to write certain numerals.
Digit                 0      1      2      3      4      5      6      7      8      9      All
Recognition rate (%)  94.83  96.83  94.92  91.11  96.00  94.95  97.29  97.72  90.43  81.74  95.64
Reject rate (%)       5.17   3.17   5.08   8.89   4.00   5.05   2.71   2.28   9.57   18.26  4.36
Accuracy rate (%)     99.38  99.89  99.78  99.73  99.89  99.41  99.79  99.59  99.12  100.00 99.72
Error rate (%)        0.62   0.11   0.28   0.27   0.11   0.59   0.21   0.41   0.88   0.00   0.28
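The four rates in the table are linked: recognition and reject rates sum to 100 per cent of the scanned images, while accuracy and error rates sum to 100 per cent of the recognized images. A short sketch makes this concrete; the counts below are invented, chosen only to be consistent with the overall rates reported.

```python
# How the four ICR rates relate, computed from raw counts for one digit
# class. The counts are fabricated for illustration.
def icr_rates(total_images, recognized, correct_among_recognized):
    rejected = total_images - recognized
    recognition_rate = 100 * recognized / total_images
    reject_rate = 100 * rejected / total_images
    # Accuracy and error are measured on the recognized images only.
    accuracy_rate = 100 * correct_among_recognized / recognized
    error_rate = 100 * (recognized - correct_among_recognized) / recognized
    return recognition_rate, reject_rate, accuracy_rate, error_rate

rec, rej, acc, err = icr_rates(total_images=10000, recognized=9564,
                               correct_among_recognized=9537)
print(round(rec, 2), round(rej, 2), round(acc, 2), round(err, 2))
# → 95.64 4.36 99.72 0.28
```

Read this way, the table shows that rejects (which cost manual labour) are concentrated in digits 8 and 9, while the errors that slip through undetected remain below one per cent for every digit.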
25.
ICR
for the 1 July 2000 Census of Indonesia is handled by 29 processing centers
throughout the country, using Kodak DS 3500 scanners and NCS NestorReader
recognition software embedded in the bureau's own Visual Basic programs. The country
paper reports many troubles that hamper the census ICR operation. These include
sub-standard questionnaire printing (despite elaborate quality controls), poor
writing by enumerators, inadequate document handling in the field resulting in
unusable forms, scanner maintenance problems, and complex file management. The
authors deserve the highest praise for sharing these experiences for others to
learn from. The massive nature of the operation in Indonesia, scattered civil
unrest, financial constraints, and various logistics problems have obviously
all been a factor here. Despite the difficulties, the Central Bureau of
Statistics (CBS) of Indonesia is confident that the data-capture operation will
be completed successfully.
26.
The
October 2000 Census of Aruba (not reported in Bangkok) used Fujitsu M3079DG
scanners and Eyes and Hands software. All data for this small country of about 100,000
people were captured by April 2001. The operation was quite carefully prepared,
and proceeded smoothly, including the integrated computer-assisted coding work.
There were no cost advantages compared to keyboard data entry.
27.
Such
problems as are reported can be divided into those concerning the
recognition process itself, and all others. If the recognition rate is
unacceptably low, this can usually be remedied by reducing the pre-set security
level. But there is a price to pay: error rates will go up. Other problems may
include unreliable paper transport in the scanners, which can have plenty of
causes, including dirt, the use of correction fluid on sheets, and damaged
forms, possibly as a result of bad weather conditions. It is not unheard of
that such difficulties require large numbers of questionnaires to be
transcribed, again increasing error rates.
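The trade-off just described can be made concrete: each recognized character carries a confidence score, and lowering the acceptance threshold (the "security level") reduces the reject rate while raising the error rate among the characters accepted. A minimal sketch, with fabricated scores:

```python
# Illustrative reject/error trade-off for ICR confidence thresholds.
# The (confidence, correct?) pairs are fabricated; in practice they come
# from the recognition engine and a verification sample.
def rates_at_threshold(results, threshold):
    accepted = [(conf, ok) for conf, ok in results if conf >= threshold]
    rejected = len(results) - len(accepted)
    errors = sum(1 for _, ok in accepted if not ok)
    reject_rate = rejected / len(results)
    error_rate = errors / len(accepted) if accepted else 0.0
    return reject_rate, error_rate

# Low-confidence reads are more often wrong, as one would expect.
results = [(0.99, True), (0.97, True), (0.95, True), (0.90, True),
           (0.80, True), (0.75, False), (0.60, True), (0.50, False)]
for thr in (0.95, 0.70, 0.40):
    print(thr, rates_at_threshold(results, thr))
```

Running the sketch shows the reject rate falling and the error rate rising as the threshold drops, which is exactly the price referred to above.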
28.
As
a general rule, success is often reported by census offices that went through a
long and careful preparation process, including several pre-tests. Those that
have to cut short on the groundwork may become the source of less fortunate
stories. Complete quality assurance management—for example, in the printing
process of the questionnaires—is of the essence here.
29.
If
recognition of handwritten text is now becoming a more reliable tool, it would
be logical to think of speech recognition as the next step. After all, this is
a more direct method of data collection. Speech recognition has broad economic
potential and is a topic of much research. Some commercial applications of this
technology are appearing, especially in processing verbal instructions received
by telephone, and in the automotive industry. But progress in this area has
been slower than expected. Statistical applications are still rare.
30.
Recognizing verbal (free-text) responses usually serves the
purpose of enabling subsequent automatic coding. That is, the computer
reads a text—for example, the name of a geographic area—and then selects the
applicable code from an associated file or database.
31.
Such
solutions, which ideally would allow completely automatic data capture and
coding, depend on two prerequisites: (1) the recognition process must be
sufficiently reliable and (2) the search algorithms must indeed lead from the
recognized term(s) to the appropriate code. A 100-per-cent
character-recognition rate is not required, since the algorithm may still be
successful with incomplete or partially mangled terms.
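A minimal sketch of such a search algorithm, using simple fuzzy matching from the Python standard library. The gazetteer names and codes are invented examples; production coding systems use far more elaborate matching, but the principle of tolerating mangled input is the same.

```python
# Sketch: coding a (possibly mangled) recognized place name to its
# geographic code via fuzzy matching. Gazetteer entries are invented.
import difflib

gazetteer = {"amsterdam": "0363", "rotterdam": "0599", "utrecht": "0344"}

def code_place(recognized_name, cutoff=0.7):
    """Return the geographic code, or None when no match is close enough."""
    matches = difflib.get_close_matches(recognized_name.lower(),
                                        gazetteer, n=1, cutoff=cutoff)
    return gazetteer[matches[0]] if matches else None

print(code_place("Amsterdm"))  # tolerates a dropped character → 0363
print(code_place("Xyzzy"))     # no plausible match → None
```

The `cutoff` parameter plays the same role as the recognition threshold discussed earlier: too strict, and many legible names go unresolved; too loose, and wrong codes are assigned silently.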
32.
However,
there are indeed problems with this process. First there is the recognition
reject rate, as referred to above, which might require an unexpected level of
human interference. Next comes the difficulty of automatically determining the
applicable codes, the severity of which depends on the nature of the variable
concerned. Geographic terms are usually not too difficult to code
automatically, except perhaps for the lowest level (e.g., village), where
spelling may not be standardized and homonyms occur. Occupation and industry
tend to be more problematic. Despite the efforts by census field staff to
extract full information from respondents, these variables will often be
reported in terms that cannot be easily linked to ISCO, ISIC or NACE codebooks
(see Glossary for terms).
33.
The
issues of automatic and computer-assisted coding have been the subject of
considerable research (Meyer and Rivière, 1997; Dopita, 1999; Blum, 1997). The
tasks are a challenge to those applying modern methods of artificial
intelligence, neural networks, and fuzzy logic[3].
But however elegant and advanced the matching algorithms are, once reporting
from the field is ambiguous, too general, or otherwise inadequate,
there is no easy way out. Many specialists feel that in those situations it is
difficult to conceive automatic solutions that approach in quality the
judgement of an experienced human coder. By letting the computer take care of
the simpler cases, and relaying the remainder to human coders, an efficiency
gain can nevertheless be obtained.
34.
As
to the coding of industry, it may be noted that this can be improved by using a
register of establishments or enterprises, and their known ISIC or NACE codes.
Respondents may find it easier to report the name of their employer than to
describe the principal economic activity of the company. This approach
obviously requires the existence of a comprehensive national business register.
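A minimal sketch of this register-based approach, with fabricated establishment names and ISIC codes; a real register lookup would also need fuzzy matching of employer names, but even exact matching after light normalization illustrates the idea.

```python
# Sketch of industry coding via a business register: the respondent reports
# an employer name, and the register supplies the establishment's ISIC code.
# Names and codes are fabricated examples.
def normalize(name):
    # Crude normalization: lower-case and drop common legal-form suffixes.
    name = name.lower().strip()
    for suffix in (" ltd", " inc", " n.v.", " b.v."):
        name = name.removesuffix(suffix)
    return name

business_register = {normalize(n): isic for n, isic in
                     [("Acme Textiles Ltd", "1392"),
                      ("Harbour Fisheries N.V.", "0311")]}

def industry_code(employer_name):
    return business_register.get(normalize(employer_name))

print(industry_code("acme textiles"))  # matches despite the missing suffix
```

The respondent never has to describe the principal economic activity; the register carries that classification once, centrally maintained.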
35.
In
conclusion: ICR in censuses has certainly not become an off-the-shelf
technology. It requires careful design and extensive testing of questionnaires.
The integration of ICR with associated operations, such as coding, needs ample
prior thought and a clear strategy, again to be tested for effectiveness.
36.
Census
data entry, through ICR or otherwise, is a potential candidate for outsourcing.
Since it is a one-time high-volume application, there might be contractors that
possess equipment and skills allowing them to offer the census office
conditions that it could not match in an in-house operation. Meanwhile, it
should be noted that outsourcing brings responsibilities of contracting and
monitoring that require resources too. Confidentiality concerns multiply where
outside contractors dealing with individual data are concerned. Quality
assurance, already a major consideration in any event, becomes even more
crucial if outside contractors are involved (see, for example, Whitford and
Reichert, 2001). It would be attractive if the contractor could work within the
census premises. In any event, contractor staff should be subject to confidentiality
rules at least as severe as the ones imposed on temporary census staff.
37.
It
should be noted that managers with an excellent in-house management record may
still have difficulty controlling outsourced work, which requires different
skills. These include knowledge of the service market, awareness of legal
issues, negotiating skills, and more. In a census situation one easily ends up
in circumstances where the supplier is in control, since the census
organization, even while unhappy with the services provided, cannot afford to
turn away.
38.
Sometimes
government regulations put barriers in the way of outsourcing tasks that could
better be assigned to specialized providers outside the census office. That situation
obviously should be changed, but most likely the required reforms need to be
implemented at a government level different from the one supervising national
statistical services.
39.
Decentralized
data capture would allow the census organization to keep matters in its own
hands, but obtain advantages by spreading the work to its regional centers. The
problems are somewhat comparable to outsourcing, although more easily managed. Much
depends on the local situation: magnitude of the task at hand, conditions of
the labour market, efficiency of communication and transport and so forth.
Assigning more work outside the capital may also have a social and public
relations benefit. General guidelines in this domain are impossible to
formulate.
40.
A
more comprehensive discussion of these issues can be found in the paper on
“Identifying and resolving problems of census mapping,” also presented in this
Census Symposium. Since new mapping technology is an essential part of census
innovation, brief remarks are included here.
41.
Mapping
technology has made great strides over the past decades. It has moved from an
activity depending on field exploration and manual drawing, to one using remote
sensing and computer-assisted map management.
42.
While
aerial photography from airplanes was used for census mapping (mostly for dense
urban areas) before the era of satellite technology, the latter offers a much
more cost-effective solution for remote sensing. Commercially available
satellite pictures provide resolutions well beyond those required to identify
individual buildings. Availability of such photographs greatly reduces—but
certainly does not remove—the need for on-the-ground inspection.
43.
The
fieldwork itself benefits from the now common availability of cheap hand-held
global positioning systems (GPS), which again depend on satellite technology.
Topographical maps and satellite pictures establish the starting platform for
census field mapping. Cartographic staff armed with maps, pictures and GPS
systems can now complete and annotate the maps to produce excellent orientation
material for enumerators.
44.
Maps
are now usually produced, stored, and updated using specialized computer
systems and commercial software. The essential elements of satellite photographs
or paper maps can be digitized by hand-tracing on digitizing tablets. Once the
maps are finished, they can be printed and reprinted at will. The vector images
are stored in computer files without the risk of degradation over time.
45.
It
is in this context useful to point to a growing tendency for national
statistical agencies to establish basic statistical reporting areas independent
from the administrative territorial organization (Jacob and Royer, 1999),
sometimes in the form of a grid of squares[4].
The reporting areas should be large enough to maintain individual response
confidentiality, yet small enough to allow regrouping of these statistical
areas into the lowest level of administrative territorial units. The approach
removes some of the problems of maintaining time-series in the context of
ever-changing administrative borders.
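Mapping a surveyed location to its grid square is straightforward once coordinates are expressed in a projected (metric) system. The 1 km cell size and the coordinates below are illustrative assumptions, not any country's actual grid.

```python
# Sketch of grid-square reporting areas: a projected coordinate is mapped
# to the cell of a fixed grid, independent of administrative boundaries.
def grid_cell(easting_m, northing_m, cell_size_m=1000):
    """Return the (column, row) of the grid square containing the point."""
    return (int(easting_m // cell_size_m), int(northing_m // cell_size_m))

# Two dwellings 100 m apart fall in the same square; aggregates over these
# cells remain comparable even when administrative borders change.
print(grid_cell(155250.0, 463800.0))  # → (155, 463)
print(grid_cell(155350.0, 463800.0))  # → (155, 463)
```

Regrouping such cells into administrative units is then a matter of tabulating cell identifiers against a correspondence table, refreshed whenever borders move, while the underlying cell-level time-series stays intact.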
46.
The
value of census information is enhanced if combined with underlying base maps
which permit users to generate thematic maps of their choice. Several census
offices now market integrated products—usually on CD-ROM—that provide this
capability. Other offices adhere to the opinion that such a service goes beyond
the task of national statistical agencies, and limit themselves to providing
aggregated census data to commercial publishers. This should not be confused
with outsourcing, since responsibility for the final product lies with those
commercial counterparts. The census office is accountable only for supplying reliable data
that respect the requirements of statistical disclosure control.
47.
Many
statistical agencies maintain one or more geographic information systems (GIS)
for their own use. At the same time it is widely accepted that the role of
statisticians is to provide data of the best possible quality to users. In many
cases the task of integrating information from various sources into complex GIS
systems is best left to others. This is especially true if such GIS systems
serve a specialized user community, such as urban planners or
environmentalists.
48.
Electronic
maps have become indispensable and cost-effective tools for a wide range of
operations in censuses and statistics.
49.
Many
countries, especially those in the developing world, have long relied on public-domain
software for their census-processing requirements. Such software was built and
maintained by non-profit agencies, usually supported by subsidies from national
or international donors.
50.
It
would appear that overall there has been less effort in this respect recently
than at the time of previous census rounds. This can be explained partially by
the growing capabilities of commercially available software. There may also be
a case of donor fatigue. Donors tend to prefer to think in terms of projects
with a clear beginning and end. Developing and maintaining a software system is
a never-ending task, since changing hardware and software environments require
ongoing support and re-development efforts, which can be considerable.
51.
Due
to the relative scarcity of new (re-)development, some public-domain census or
survey processing systems are starting to look a little obsolete. They may, for
example, be completely or partially DOS-based. Even while that software might
be as effective as ever, and perfectly able to do the job, the DOS (Disk
Operating System) interface is unfamiliar to a new generation of users. They
also may find it difficult to convince their supervisors and peers that it is
preferable to work in an apparently dated environment. Using modern tools is
better for a data-processing person's professional reputation. A consequence of
these developments appears to be increasing use of alternative software, such
as commercial statistical software systems (SAS (Statistical Analysis System),
SPSS (Statistical Package for the Social Sciences) and others) and database
application generators (MS Access).
52. Some
recent announcements have improved the picture for non-profit software. The
United States Bureau of the Census, through its International Programs Center,
is now offering additional modules of its CSPro Census and Survey Processing
system, which is being developed in cooperation with Macro International and
Serpro S.A. (See the CSPro web site at http://www.census.gov/ipc/www/cspro).
CELADE, the Population Division of the United Nations Economic Commission for
Latin America and the Caribbean (ECLAC), continues work on more advanced
versions of the statistical database system Redatam (See web site at
http://www.cepal.cl/celade-eng.).
53. Developing
census-processing applications in software not specifically intended for that
purpose can be described as customizing that software for census purposes. It
requires programming skills that are not always readily available. Some
associated use of modern object-oriented programming languages is nearly
unavoidable. There is also no particular place where scheduled training in such
a specialized subject (developing census applications in general-purpose
software) can be obtained. As a result, census organizations have relied on
outside contractors, which did not necessarily fully understand the statistical
issues involved. In this sense the current situation as regards census
processing is more complex than that of the preceding census round.
54. On
the other hand, where initially enough basic computer skills were available to
the census office, census data-processing staff may have received additional
exposure to modern general-purpose software. This will benefit other
statistical development work, or their careers, or perhaps both.
55. The
difficulty of customizing general-purpose software for census applications
should not be underestimated. It can be considerably more complex than applying
specialized census software. Outsourcing the assignment might only compound the
problems. Contractors to be entrusted with the duty of developing
census-processing systems should have a proven record in such work. The census
office will still need specific expertise to undertake the task of contracting
and supervising the activities.
56. The
broader issue of statistical disclosure control, including cell-suppressing
software, required by all census offices to protect the confidentiality of
individual responses, will be briefly discussed in Section G.1 below.
57. Census
data used to be stored often simply as flat files. A principal concern was to
make sure that the data and meta-information were properly preserved over time.
This was done in order to guarantee that additional computer analysis would be
possible at some later stage—for example, on the occasion of the next census.
Statistical agencies are now increasingly aware of the fact that data from
various collections can have much added value if preserved, with associated
metadata, in a common storage structure, sometimes called a “data warehouse”.
While this fashionable term may go as quickly as it came, the underlying
principle is unchallenged. The relational storage model has been explored for
depositing statistical information, but not always to complete satisfaction (See
Section G.3 on structured archives.).
58. While
electronic mail has been fairly common since the late 1980s, wide access to the
contents of the Internet at reasonable transmission speeds was still unusual in
the previous round of censuses. Problems with the use of paper-based
questionnaires had become apparent long before. In many countries response
rates on mailed questionnaires are declining, a consequence of respondent
fatigue and, perhaps, a diminished sense of civic responsibility. Where
enumerators still personally visit dwellings, the chances of finding
respondents at home during working hours have become smaller due to modern
lifestyles and smaller household sizes.
59. Census
offices have proposed and/or used various measures to remedy these problems.
These include more elaborate information campaigns and efforts to mobilize the
cooperation of civil societies, having enumerators work weekends and evening
hours, approaching respondents by telephone, and sampling the initial set of
non-respondents (thus, in a sense, giving up on complete coverage). While some
successes have been reported, the efforts and costs required to obtain an
acceptable response rate are now considerably greater than before.
60. Thus
it is only logical that attention has focused on the Internet as a gateway into
an increasing number of homes. Using “push” technology, it would be possible to
deliver to each Internet-connected household a uniquely identified electronic
questionnaire, possibly already pre-filled with basic data obtained from the
civil registry. Respondents would correct and complete the information, and
then return it by data transmission to the census office, which would receive
an electronic record, thus avoiding most of the data entry work.
61. Electronic
data collection from establishments (including enterprises, government agencies
and public-sector entities) has already become fairly common. If households and
individuals are approached in the same way, one could use the methods of CASI
(Computer-Assisted Self-Interviewing) (Figueiredo and Lucas, 1999; Keller,
1999), which can render valuable assistance to respondents and prevent
mistakes.
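As a purely hypothetical illustration of the kind of assistance a CASI instrument can give, the Python sketch below checks an electronic census record for internal consistency before it is accepted; all field names, codes and thresholds are invented for the example and do not reflect any actual census instrument:

```python
# Illustrative CASI-style consistency checks for an electronic census record.
# All field names and rules below are hypothetical examples.

def validate_record(record):
    """Return a list of problems found; an empty list means the record passes."""
    problems = []

    age = record.get("age")
    if age is None or not (0 <= age <= 120):
        problems.append("age missing or out of range")

    # A respondent reported as married should be at least some minimum age
    # (the threshold here is arbitrary, chosen only for the example).
    if record.get("marital_status") == "married" and age is not None and age < 15:
        problems.append("marital status inconsistent with age")

    # Employment questions apply only above a working-age threshold.
    if record.get("employed") and age is not None and age < 10:
        problems.append("employment reported for a young child")

    return problems

# An inconsistent record is flagged immediately, so the respondent can
# correct it while still online rather than weeks later during editing.
rec = {"age": 8, "marital_status": "married", "employed": True}
issues = validate_record(rec)
```

The value of such checks lies in their timing: errors are caught while the respondent is still present to resolve them, which a paper questionnaire cannot do.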
62. Unfortunately, several problems still hold back electronic data collection from households:
· Incomplete coverage: While the number of
households having access to the Internet grows rapidly nearly everywhere, there
are only a few countries where the connection rate has surpassed 50 per cent;
· Bias: Internet access is more
common to affluent and younger households; therefore, wide use of this
data-collection channel might result in a biased response pattern;
· Unstructured address system: As compared
to the postal system or the telephone network, the Internet addressing system
is much less regulated, which reflects the origins of the Internet. Subscribers
largely invent their own addresses and may change them at any time. They could
have one or several addresses. It would be a major effort to assemble current
e-mail addresses of households at any given point in time, and nearly
impossible to maintain such a register with any degree of reliability. This
essentially precludes the use of push technology for censuses at the present
moment;
· Attraction to hackers: There is little doubt that
allowing respondents to use the Internet would attract hackers, who would
consider it a challenge to be enumerated twice, use someone else’s
identification, or worse. Census offices understandably are not looking forward
to such challenges.
Figure 1. Swiss census data collection via the Internet (demo version, partial screen)
63. Notwithstanding
these difficulties, several census offices, including those of Switzerland
(Figure 1; see also the web site of the Swiss Federal Statistical Office at http://statistik.admin.ch),
the United States and Singapore, have allowed electronic response during the
current round of censuses (Haug and Buscher, 2000; Prewitt, 2000; United
Nations, 2001). This did not involve “push” technology; rather, respondents had
to take the initiative themselves by downloading census forms or
completing them while online with the census office. To avoid misuse, it is
essential that each household can authenticate its response. This might involve
certification with a unique identification code, unknown to others. That code,
then, has to be delivered to the household, which may have to be done by hand
delivery. Safe and reliable electronic delivery of authentication codes is
again a problem difficult to tackle within the current state of technology.
Electronic response requires encryption on the browser side, since unprotected
responses could be intercepted.
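A hypothetical sketch of the authentication-code step described above is given below in Python; the alphabet, code length and delivery model are invented for illustration and are not the scheme used by any of the census offices mentioned. Each household receives a unique, hard-to-guess code (for example by hand delivery), which it then quotes when responding online:

```python
# Sketch of issuing unique, hard-to-guess authentication codes to households.
# The alphabet and code length are arbitrary choices for this example.
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # avoids case-sensitivity issues

def issue_codes(n_households, length=10):
    """Return a dict mapping household number -> unique random code."""
    codes = {}
    used = set()
    for household in range(n_households):
        while True:
            # secrets (not random) gives cryptographically strong choices,
            # so codes cannot be predicted by an attacker.
            code = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if code not in used:  # guarantee uniqueness across households
                used.add(code)
                break
        codes[household] = code
    return codes

codes = issue_codes(1000)
```

With 36 symbols and 10 positions there are 36^10 possible codes, so guessing a valid code at random is impractical; the hard problem, as the text notes, remains delivering the code safely to the right household.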
64. The
United States census limited Internet response to the short form only and did
not undertake major efforts to publicize or recommend this method. It appears
that in all three countries, demonstrating that the census organization is in
tune with modern technology was a factor in opening up the Internet channel.
65. Since it is unlikely that the printed
questionnaire could be abandoned at short notice, census data collection via
the Internet requires seamless integration of the two data streams. There were
three streams in the case of Singapore, where response via telephone (CATI) was
another alternative.
66. Several
census offices—for example, the Office for National Statistics of the United
Kingdom—have reported that they have decided not to use the Internet for data
collection at this time, having studied the dangers still present in those
uncharted waters. Statistics Canada has been conducting an “Internet Test” in
two distinct geographical areas for its census of 15 May 2001.
67. In
conclusion, there remain a number of problems, of a varied nature, that so far
have prevented the wide use of electronic questionnaires for census purposes.
The Internet needs to grow, and methods suitable for census data collection by
means of it have to be developed and tested. Expectations are that the
situation surrounding electronic response will have evolved significantly as
the next round of censuses approaches.
68. The
technology of dissemination of statistical information is undergoing a
fundamental shift. The printed publication has certainly not disappeared and remains
important, for example, to provide a permanent and continuously accessible
record, and for ease of browsing. But online consultation of statistical
sites—with or without payment for the information obtained—is becoming the
principal avenue of information dissemination. This takes place via the
Internet, since the independently managed bulletin boards, to be reached
through point-to-point communication with the information provider, cannot
offer comparable user comfort.
69. The
challenge to statistical offices is considerable. Long used to the relative
peace of carefully preparing a publication and then waiting for it to come into
print, they now must adhere to a strict calendar of electronic release. Users
always want the data sooner but will complain when the data have to be revised
later or—worse—turn out to have contained any error.
70. Under
these conditions, designing a dissemination strategy has not become any
simpler. The user community rightfully expects statistics to make full use of
new media, yet there continues to be substantial demand for paper publications.
All this may have to be managed under restricted funding and a shortage of technical
skills. Statistical offices must not only formulate a strategy but also revisit
it periodically. Where costs dictate it, the use of dissemination outlets needs
to be adjusted on the basis of reports on their use. Cost recovery may help to
improve the situation.
71. Just like printed publications, electronic publications can vary in cognitive quality, and perhaps even more widely.
Furthermore, the rapid technological developments make providing the best
possible interface a moving target. Eurostat in its NORIS (Nomenclature on
Research in Statistics) identifies the following examples of research in this
area (see also web site at http://europa.eu.int/en/comm/eurostat/research/viros):
· Contributing
to Internet-related standardization activities so that statistical requirements
can be taken into account;
· Bandwidth-intensive
applications: statistical queries, audio- and video-broadcasting;
· Use
of intelligent agents (knowbots) for information interchange;
· Improving
man-machine interfaces, including the use of virtual reality;
· Application
of GIS technologies to improve the visualization of geographically oriented
statistical information.
72. The
capability of a statistical organization’s web site is becoming of ever greater
importance. Statistical and census organizations nowadays are assessed not only
on the quality and timeliness of their printed information, but also, and
perhaps more important, on the effectiveness of their web presence.
73. Appropriate
measures should be taken with regard to web sites. They must be built and
maintained by professionals. There should be, if at all possible, a continuous
monitoring system of user satisfaction and visitors’ browsing behavior, in
order to ease access to popular items, detect signs of user confusion, and
continuously improve the site. If dynamic
access to databases is offered, such applications should be reasonably bug-free
and have reached sufficient maturity (United Nations, 2001). Launching a
high-technology service that results in numerous disappointed users brings
benefits to no one.
74. The
need to maintain a full range of up-to-date information and computer technology
(ICT) capabilities, including web skills, in an environment where such
qualities are in high demand, is a burden to many national statistical agencies.
Outsourcing can be a solution, but since information dissemination is a core
activity of official statistics offices, it is not an obvious alternative.
75. As
an aside it may be mentioned here that the Internet offers excellent
possibilities to disseminate and retrieve international standards and
guidelines for statistical work. An example is the classifications server RAMON
developed by Eurostat (See web site at http://europa.eu.int/comm/eurostat/ramon.).
76. As
the mass of readily accessible statistical information increases, there is an
urgent need to improve the protection of individual information provided by
persons or establishments, using techniques known as statistical disclosure
control. The odds here could be shifting in an unfavorable direction, since
statisticians need to provide more information faster, while ill-intentioned
users attempting to filter out sensitive information have access to ever more
powerful analytical computer tools, and they have time on their side. It has
become impractical to visually inspect each table or data cube (see below) for
potential risks, but automatic screening tools are coming to the rescue
(Willenborg and de Waal, 2001; Giessing, 1999). They will suppress, combine, or
otherwise obscure potentially risky cell values.
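The simplest of these operations, primary cell suppression, can be sketched as below (Python; the threshold, table layout and cell labels are invented for illustration). Real systems, such as those described by Willenborg and de Waal (2001), also perform secondary suppression so that suppressed values cannot be recovered from row and column totals; that step is omitted here for brevity:

```python
# Minimal sketch of primary cell suppression: any cell in a frequency table
# whose count falls below a disclosure threshold is blanked out before
# publication. Threshold and example data are arbitrary.

THRESHOLD = 3  # minimum publishable cell frequency (illustrative value)

def suppress(table):
    """Replace risky small counts with None (published as, say, '..')."""
    return {cell: (count if count >= THRESHOLD else None)
            for cell, count in table.items()}

# Example: counts of persons by (district, occupation). The single notary
# in district A could be identified, so that cell must be withheld.
table = {("A", "farmer"): 120, ("A", "notary"): 1, ("B", "farmer"): 95}
safe = suppress(table)
```

Even this toy version shows why automation matters: the rule is trivial per cell, but applying it by eye across thousands of tables and cubes is no longer feasible.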
77. Information
dissemination on non-rewritable high-capacity media also remains an important
delivery channel, especially for massive data that are not highly
time-sensitive, such as most census information. Censuses nowadays routinely
result in the production of many CDs, and the first DVD products of much higher
capacity have appeared (US Bureau of the Census, 2001). Data structures on
CD-ROM and those underlying a web site can have much in common, including the
use of browsing through hyperlinks. Parallel development of the applications is
an efficient way to benefit from that.
78. As
already mentioned above, storage of census data in a “warehouse” structure
favours its use in conjunction with other statistical information kept there.
This is strictly speaking not a census issue, since it addresses the broader
subject of statistical information management. A warehouse might consist of a
number of data cubes, n-dimensional spaces, where one dimension consists of
observations, and the others are selection dimensions. In a simple example
dealing with a census cube, observations could be the total numbers of males
and females, and selecting dimensions age group, place of residence, ethnicity,
occupation, and so on. Alternatively, see the diagram in Figure 2 for an
example in four dimensions (including three 3-dimensional sub-structures) from
the area of business statistics.
Figure 2. Data cube in four dimensions (Basset and Stoyka, 1996)
79. Cubes
require the existence of a superstructure that allows them to be approached via
hierarchical menus (“drill-down”) and logical combinations of keywords.
Metadata need to be available too, preferably stored while avoiding
redundancies. Storage formats other than the data cube should also be
accommodated by the data warehouse.
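The toy sketch below (Python; all dimension values and counts are invented) illustrates the cube idea from the preceding paragraphs: counts are keyed by tuples of selection-dimension values, and summing over one dimension "rolls up" the cube, the inverse of the drill-down operation just mentioned:

```python
# A toy data cube for the simple census example from the text: observations
# are person counts; selection dimensions are sex, age group and region.
# All values are invented illustration data.
from collections import defaultdict

cube = {
    ("male",   "0-14",  "North"): 510,
    ("female", "0-14",  "North"): 480,
    ("male",   "15-64", "North"): 1200,
    ("female", "15-64", "North"): 1250,
    ("male",   "0-14",  "South"): 430,
    ("female", "0-14",  "South"): 410,
}

def roll_up(cube, drop_axis):
    """Sum counts over one dimension (0 = sex, 1 = age group, 2 = region)."""
    out = defaultdict(int)
    for key, count in cube.items():
        reduced = key[:drop_axis] + key[drop_axis + 1:]  # drop one dimension
        out[reduced] += count
    return dict(out)

by_age_region = roll_up(cube, 0)  # totals regardless of sex
```

Production warehouses use dedicated OLAP storage rather than in-memory dictionaries, but the logical operations, fixing dimension values to drill down and aggregating over dimensions to roll up, are exactly these.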
80. While practical applications exist and can be accessed through the web
sites of several national statistical agencies, the subject remains a work in
progress. Data warehouses are by no means restricted to statistics, and the
topic is much broader than can be described here. (For more information, see,
for example, Kambayashi et al., 2000, or explore the Internet.)
Figure 3. Organization Chart of Divisions for Business
and Social Statistics, Statistics Netherlands
81. Taking
this concept to a further level, one could impose the requirement that after
completion of a census or survey the information gathered is to be stored in
the data warehouse first, and that periodic or one-off publications are
generated only by retrieving data from this central storage system.
82. This
concept is illustrated by the example in Figure 3 (Keller and Willeboordse,
2000). BaseLine is the final product of Registers and Surveys. It holds all
data as supplied by primary and secondary data sources. MicroBase contains the
data as they result from editing, imputation, translation and
micro-integration. The output-aggregate database StatBase holds the results
after estimation for (sub-)populations of statistical units. StatBase claims to
contain all publishable data produced
by Statistics Netherlands. The publication-data warehouse StatLine can be seen
as a set of views on StatBase. It presents the total output of the Bureau as a
structured set of multi-dimensional tables. StatLine is disseminated both on
CD-ROM and on the Internet.
83. Several
national statistical agencies have done important work on these issues, such as
Statistics Canada (CANSIM II), Statistics Sweden (PC-Axis) and Statistics
Netherlands. A trial version of the Stat-series of programs (the name of the
full package is StatSuite) is downloadable (from http://neon.vb.cbs.nl; the principal web
site of Statistics Netherlands is http://www.cbs.nl). PC-Axis retrieval software can be obtained
from the web site of Statistics Sweden (http://www.scb.se). Other developers may
also be willing to provide test versions of their software if requested.
84. As
regards the applicability of data cubes, it would seem that the principal
problem is not so much their storage and retrieval, but logical design of these
information containers. Formulating a comprehensive set of cubes that fulfill
the various requirements of (1) easily accommodating the results of statistical
collections; (2) satisfying the requirements of a wide variety of users and (3)
fully respecting confidentiality concerns, is not a simple assignment.
85. Whatever
new or revised data dissemination product is being envisaged, the importance of
extensive prototyping and launching “beta” versions—among real and critically
minded users—cannot be overemphasized. This point was also made, and
convincingly, by the recent ESCAP Workshop on Population Data Analysis, Storage
and Dissemination Technologies. The report of this Workshop and several of its
other papers constitute highly recommended reading (United Nations, 2001).
86. At
this point it would have been useful to provide clear guidelines to census
planners about how to make an informed choice of technology and about
approaches such as outsourcing and its associated risks. Unfortunately, this is
impossible, as conditions and considerations vary widely, not only from country
to country, but also—and with increasing speed—over time.
87. There
is not one preferable set of technologies for census operations. The best
choice depends on the magnitude of the project, the availability of local
skills, the funding situation, existing prior experience, available time for
preparation, and many other factors. The current round of censuses shows a
surprisingly wide spectrum of methods and techniques being used.
88. Informed
choices are never possible without the information being available to the
decision makers. Census planners need to acquaint themselves with the state of
the art, both at a national level and internationally. Preferably they should
travel to comparable countries that have recently used methods and technology
that may be of interest. The superiors of these planners need to recognize this
need for exploration and allocate the resources for the task to take place.
89. In
deciding the parameters for a new census, one might want to look first at the
preceding census. What worked well and what could use improvement? If an
approach was satisfactory the last time, the arguments to replace it with
something else need to be twice as strong.
90. Every
decision has a financial angle. If census costs can be reduced significantly
while maintaining or even improving quality, that certainly should be worthy of
serious consideration.
91. Outsourcing
is by no means the panacea that some would have it to be; stories of success
and failure are equally common. Here, as elsewhere, there is no
substitute for solid fact-finding, careful negotiating, making sure that the
chances of misunderstanding are minimal, and a continuous quality assurance
programme (Whitford and Reichert, 2001).
92. The
final and most important consideration should be: What effect do the available
alternatives have on the quality of service provided to information users?
Statistical offices and census organizations live by the grace of the service
they render to others. They need to strive incessantly to provide better
information, in terms of timeliness, data quality, ease of access, completeness
and pertinence. Any potential improvement in these areas merits review.
93. There
is little doubt that the ever-evolving technological environment in the future
will have an even more profound effect on census-taking methods—perhaps
moderated by legal requirements and confidentiality concerns.
94. Already
it has become possible to uniquely identify individuals through certain
physiological characteristics, such as fingerprint or iris patterns, facial
identifiers, or vocal frequency sequences (voice prints) (Jain, 1999). This
technology has a bright future, due to applications too numerous to sum up
here, such as ATM (automated teller machine) transactions, building access
control and payment authorizations. It can remove the burden placed on people
by the requirement that they constantly carry identification and credit cards
and remember perhaps a series of personal identification codes.
95. Biometric
identification in combination with access to a database (perhaps wireless
access) can remove the need for statisticians to ask respondents the same
questions repeatedly about unchanging characteristics (e.g., date of birth,
sex, ethnic origin or place of residence at prior census). In a broader sense,
it would make it technically easier to establish and maintain electronic civil
registers that would be more complete and current than those in existence
today.
96. As
an intermediate step one could think of a personal multi-purpose chipcard
(“smartcard”), from which information could be copied without manual
transcription. Applications at this level are already widespread in banking,
library management, medical services and more. In a rather far-reaching
concept, such data could also be stored in personal “digital data vaults” on
the Internet. Once authorized by the owner, information users such as census
organizations would be able to retrieve from there the data items they require.
97. The
use of biometrics or personal chipcards in civil registration or censuses so
far has been experimental at most, and digital data vaults are an idea that has
just recently been launched. But it is easy to foresee that these and similar
techniques, with all their various implications, will become the subject of
increasing debate in the not-too-distant future. It is part of what has been
termed "pervasive computing," an ever-growing presence of computer
power and associated sensors and controls in daily life. Statistical
organizations need to involve themselves in this debate to make sure that new
developments and standards take their requirements into consideration.
98. A
few countries have dropped the door-to-door census for what is called an
“administrative” or “virtual” census (Laan and Everaers, 2001). This may
involve a comprehensive inspection and merging of various registers to arrive
at the national universe of dwellings and persons. Again, it will be an
advantage if the statistical office is a partner in the definition and
maintenance of the principal registers. In other countries the “short form”,
which contains the questions to be asked from everyone, has been reduced to a
bare minimum.
99. In
both cases—administrative census and minimal short form—additional information
is usually gathered through sample surveys. These methods share methodological
ground. Good statistical practice prescribes the use of sampling methods
wherever the underlying universe is sufficiently known. In future census rounds
sampling technique will become increasingly important, and with it the need for
statisticians to explain their ways to the world.
100. New
technologies make their way into census work, but not always as quickly and broadly
as might have been expected. Census planners need to be conservative, since
they know that their solutions must be right the first time. Nevertheless, one
sees innovation turn into standard practice. This includes digital mapping,
ICR, and electronic publishing. The Internet has become an essential medium for
information dissemination and will grow in importance for data collection.
101. The
conditions under which censuses are conducted differ greatly between countries.
Even for most sub-tasks there is no single best technological approach.
Technical awareness, a sense of the realistic, a methodical approach, and
plenty of preparation time, are the principal requirements for census planners.
102. Census
technology changes much more rapidly than the underlying statistical methods
and principles. New technology should never endanger the continuity of existing
reporting systems, and, if possible, should reinforce continuity.
103. Outsourcing
raises many problems, including confidentiality concerns, but it can deliver
economies and resolve bottlenecks. Again, the local situation—including the
management ability of the census office—determines whether it is a valid
alternative. The solution is more evident for one-off special operations (e.g.,
census data entry), than for ongoing tasks, such as web-site management. Where
outsourcing could offer advantages, but bureaucratic obstacles stand in the
way, the obstacles should be removed.
104. The
current census round shows substantial technological evolution from the
preceding cycle; in the next round the difference will only be greater.
105. Here
are some slightly provocative questions that symposium participants may wish to
discuss:
· As compared to data capture
by keyboard, ICR has advantages as well as drawbacks. Given technical problems
experienced by some countries, is the move towards ICR justified by experience
gained so far?
· There is no doubt that ICR
equipment interprets characters less accurately than human operators. Can we call
that progress?
· Do we still need technical
assistance projects producing census software? Or can commercial software
systems now fully cover census requirements?
· Why, with CD-ROMs and the
Internet, continue to print costly census reports?
· Do “data cubes” present a
valuable concept? Or is this just a solution looking for a problem? Are there
more suitable storage formats for statistics?
· Will the census of
population and housing as we know it disappear because of ever-advancing
technology?
Basset P., and A. Stoyka (1996). Statistics Canada’s aggregate output database – CANSIM II. Proceedings of the Conference on Output Databases, Voorburg, the Netherlands.
Blum, Olivia (1997). Editing and Coding Module. In New Census Technologies: The Israeli Experience. Proceedings of the Euro-Med Workshop, March 1997.
Dekker, Arij (1994). Computer methods in population census data processing. International Statistical Review, vol. 62, No. 1., pp. 55-70.
_____ (1997). Data Processing for Demographic Censuses and Surveys, with Special Emphasis on Methods Applicable to Developing Country Environments. UNFPA/NIDI, The Hague, ISBN 90-70990-67-9.
Deming, W. Edwards (1986). Out of the Crisis. Center for Advanced Engineering Study. Cambridge, MA: MIT.
Dopita, Patricia (1999). Population Census Evaluation, 1996 Census Data Quality: Occupation. Canberra: Australian Bureau of Statistics.
Figueiredo, José and Ana Lucas (1999). Potentials and Pitfalls of INE-P IS/IT
Strategy on the Past Ten Years. Proceedings of the strategic reflection colloquium on IT issues for
statistics. Eurostat, Luxemburg, September 1999.
Giessing, Sarah (1999). Transferable software for automated secondary cell suppression. Seminar
on the Exchange of Technology and Know-how (ETK), sponsored by Eurostat,
Prague.
Haug, Werner, and Marco Buscher (2000). E-census, the Swiss Census 2000 on the
Internet. INSEE/Eurostat Workshop, “Census beyond 2001”, Paris, 20-21
November.
Jacob, Michel, and Jean-François Royer (1999).
Le recensement de la population de 1999. In Les
actualités du Conseil national de l’information statistique.
Jain, Anil, et al., eds. (1999). Biometrics: Personal Identification in Networked Society. Kluwer International Series in Engineering and Computer Science, Volume 479. Kluwer Academic Publishers, Dordrecht, the Netherlands, ISBN 0-7923-8345-1.
Kambayashi,
Yahiko, Mukesh Mohania and A. Min Tjoa, eds. (2000). Data Warehousing and Knowledge Discovery, Second International
Conference, DaWaK 2000, London, UK, 4-6 September 2000, ISBN 3-540-67980-4.
Keller,
Wouter (1999). Preparing for a new era in
statistical processing: how new technologies and methodologies will affect
statistical processes and their organization. Proceedings of the Strategic Reflection
Colloquium on IT Issues for Statistics, Eurostat,
Luxemburg, September 1999.
Keller, Wouter, and Ad Willeboordse (2000). Statistical Processing in the Internet Era:
the Dutch View. Conference on Network of Statistics for Better European
Compliance and Quality of Operation, Radenci, Slovenia, 13-15 November 2000.
(This paper can be retrieved from the web site of the Statistical Office of
Slovenia at http://www.sigov.si/zrs)
Laan, Paul van der, and Peter Everaers
(2001). The Dutch Virtual Census. Meeting
66, ISI 53rd Session, Seoul, 2001.
Meyer, Eric,
and Pascal Rivière (1997). SICORE, un
outil et une méthode pour le chiffrement automatique à l’INSEE.
International Blaise Users Group, Paris.
Prewitt,
Kenneth (2000). Prepared Statement before the Subcommittee on the Census,
Committee on Government Reform, U.S. House of Representatives (8 March 2000).
Statistics
Sweden (2001). Q2001 – International
Conference on Quality in Official Statistics. Organized by Statistics
Sweden and Eurostat, Stockholm, Sweden, 14-15 May 2001. (Web site at
http://www.q2001.scb.se)
United Nations, Economic and Social Commission for
Asia and the Pacific (2001). Report on
the Workshop on Population Data Analysis, Storage and Dissemination
Technologies, Bangkok, 27-30 March 2001. (This report and other workshop
papers can be retrieved from the ESCAP web site at
http://www.unescap.org/stat/pop-it/pop-wdt/pop-wdt.htm)
United Nations, Population Fund (UNFPA) (2000). Report of Joint Interagency Coordinating
Committee on Censuses for sub-Saharan Africa and PARIS 21 Census Task Force
Meetings. Eurostat, Luxemburg, October
2000.
United Nations, Statistics Division (1998). Principles and Recommendations for Population and Housing Censuses, Revision 1. Statistical Papers Series M, No. 67/Rev. 1.
United States Bureau of the Census, Public Information Office (2001). Census Bureau breaks new ground with release of DVD products. News release dated 6 February 2001.
Whitford, David, and Jennifer Reichert (2001). Quality Assurance Challenges in the United States’ Census 2000. Q2001 - International Conference on Quality in Official Statistics, Organized by Statistics Sweden and Eurostat, Stockholm, Sweden, 14-15 May 2001.
Willenborg, L., and T. de Waal (2001). Elements of Statistical Disclosure Control. Springer Verlag, Berlin/Hamburg, ISBN 0-387-95121-0.
Artificial intelligence, neural networks, fuzzy logic: various forms of innovative software techniques that often depend on non-deterministic (heuristic) methods
ATM transaction: a transaction through an Automated Teller Machine, or cash dispenser
Automatic coding: the conversion, by unassisted computer, of verbal texts into applicable codes
Biometric identification: identification of individuals through one or more of their physical characteristics
Bulletin board: digital information service, often operated independently from the Internet
CASI (Computer-Assisted Self-Interviewing): the technique whereby respondents independently complete electronic questionnaires, assisted only by specially-designed computer programs
CATI (Computer-Assisted Telephone Interviewing): Respondents answer questions by telephone, interviewers key the responses directly into computers
Computer-assisted coding: coding activity whereby human coders decide and computer systems provide assistance
Data cube: multi-dimensional structure for storing statistical information
Data warehouse: the assembled data capital of enterprises or institutions, stored and managed in a way that favours access and analysis
Digital data vault: a space on the Internet where citizens can safely store, and eventually provide access to, personal data
DVD: Digital Versatile Disc (originally Digital Video Disk), the more capacious successor of the CD-ROM
GIS (Geographic Information System): an information system designed to capture, store, update, manipulate, analyze and display all forms of geographically referenced information
GPS (Global Positioning System): satellite-based navigation system whose now-common receivers show the geographic location of the carrier
ICR (Intelligent Character Recognition): the art of interpreting written or printed characters through image scanning and computer analysis; formerly called Optical Character Recognition when the role of recognition engines was less crucial
ICT: Information and Communication Technology
ISCO: International Standard Classification of Occupations
ISIC: International Standard Industrial Classification of All Economic Activities
Knowbot: (from Knowledge Robot) intelligent agent gathering information on the Internet; more specific than search engines
Meta-information: ancillary information clarifying statistical figures (definitions, standards, units, collection method and so forth)
NACE: Nomenclature Générale des Activités Economiques – statistical classification of economic activities used by the European Union
Object-oriented languages: languages for computer programming that attach code to objects and classes. Different from more monolithic procedural languages.
Outsourcing: delegating (part of) activities to an outside contractor
Pervasive computing: the omnipresence of computer power and associated sensors and controls in daily life
Point-to-point communication: (as used in the present text) electronic data communication by direct connection, not using the Internet
Push technology: using the Internet to deliver specific but unrequested information to selected e-mail addresses
Quality Assurance: a planned and systematic pattern of all the actions necessary to provide adequate confidence that a product will conform to established requirements.
Relational storage model: currently the most popular data model for general-purpose database systems; theoretical foundation formulated by E.F. Codd
Remote sensing: monitoring from a distance, as from aeroplanes or earth satellites
Satellite telephony: telephone communication relayed through communication satellites, requiring no land-based relay stations
Smartcard: electronic card carrying a computer chip, and providing (much) more than memory functionality
Statistical disclosure control: the complex of measures preventing unauthorized access to sensitive statistical information
Voice print: A stored digital model of an individual’s voice, used for identification purposes
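The "data cube" entry above can be made concrete with a minimal sketch. All dimension names, categories and figures below are invented for illustration; production systems would use a dedicated OLAP or database engine rather than plain dictionaries.

```python
from itertools import product

# Illustrative dimensions of a tiny census "data cube".
dimensions = {
    "sex": ["male", "female"],
    "region": ["north", "south"],
    "age_group": ["0-14", "15-64", "65+"],
}

# One cell per combination of categories, keyed by a (sex, region, age_group) tuple.
cube = {cell: 0 for cell in product(*dimensions.values())}
cube[("female", "north", "15-64")] = 1250  # invented count
cube[("male", "south", "65+")] = 430       # invented count

def marginal(cube, axis_index, value):
    """Sum all cells where the dimension at axis_index equals value."""
    return sum(v for k, v in cube.items() if k[axis_index] == value)

total_north = marginal(cube, 1, "north")  # aggregates over sex and age group
```

The point of the structure is that any marginal table (totals by region, by age group, and so on) can be derived from the same set of cells.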
* This document was reproduced without formal editing.
** Specialist in Census Technology, The Netherlands. The views expressed in the paper are those of the author and do not imply the expression of any opinion on the part of the United Nations Secretariat.
[1] For a definition of this and many other terms used in this paper, refer to the Glossary.
[2] For example, recognition rates of handwritten characters might drop below 90 per cent. This value should always be considered in connection with the security level, a pre-set parameter that determines how “confident” the recognition engine(s) must be before accepting a character as representing a particular symbol. The accepted characters usually include some mistakes (the “errors”). The rejected characters, on the other hand, contain “confirms”, which are characters that would have been correctly recognized at a lower security level. The remaining rejects are “corrects”, which always require operator action.
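The accept/reject mechanics described in this footnote can be sketched as follows. The function name, the 0.90 threshold and the category labels mirror the footnote's terminology; they are illustrative and not taken from any particular recognition engine.

```python
def classify(confidence, top_guess_is_right, security_level=0.90):
    """Classify one recognized character in the footnote's terms.

    confidence         -- the engine's confidence in its top guess (0..1)
    top_guess_is_right -- whether that guess matches the truth
                          (known only in controlled tests)
    security_level     -- pre-set acceptance threshold
    """
    if confidence >= security_level:
        # Accepted automatically; wrong accepts are the "errors".
        return "accept" if top_guess_is_right else "error"
    # Rejected, hence routed to an operator. "Confirms" would have been
    # right at a lower security level; "corrects" must be keyed anew.
    return "confirm" if top_guess_is_right else "correct"
```

Raising the security level trades errors for extra operator workload, which is the tuning decision the footnote alludes to.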
[3] Automatic coding can be seen as a form of translation; it uses methods similar to those applied in the popular but even more difficult research area of machine translation of natural languages.
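A minimal sketch of the dictionary-matching approach that underlies many automatic coding systems is shown below. The verbal descriptions and four-digit codes are invented for illustration; a real application would match against an official classification such as ISCO.

```python
import difflib

# Toy coding index: verbal descriptions mapped to invented occupation codes.
CODING_INDEX = {
    "primary school teacher": "2331",
    "secondary school teacher": "2320",
    "taxi driver": "8322",
}

def auto_code(text, cutoff=0.8):
    """Return the code of the closest index entry, or None.

    A None result is a residual case: it would be passed on to
    computer-assisted coding, where a human coder decides.
    """
    matches = difflib.get_close_matches(
        text.lower().strip(), CODING_INDEX, n=1, cutoff=cutoff)
    return CODING_INDEX[matches[0]] if matches else None
```

The fuzzy match lets the system absorb minor spelling variants, while genuinely ambiguous answers fall through to human coders.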
[4] A particularly narrow square grid of 100 by 100 meters has been used since 1968 (!) by the Federal Statistical Office of Switzerland, principally for environmental and agricultural statistics.
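The 100-by-100-metre grid mentioned in this footnote can be illustrated with a small sketch that assigns a point to its cell. It assumes planar coordinates measured in metres (as in the Swiss national grid); the function name and sample coordinates are illustrative.

```python
def grid_cell(easting_m, northing_m, cell_size_m=100):
    """Return the south-west corner of the grid cell containing a point.

    Coordinates are assumed to be in a planar projection measured in
    metres; the point is simply snapped down to the nearest multiple
    of the cell size.
    """
    return (easting_m // cell_size_m * cell_size_m,
            northing_m // cell_size_m * cell_size_m)

# Example: an illustrative point at (600537, 199812) falls in the
# cell whose south-west corner is (600500, 199800).
corner = grid_cell(600537, 199812)
```

Every census record geocoded this way can then be aggregated per cell, which is what makes such a fine grid usable for environmental and agricultural statistics.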