Symposium 2001/28

10 July 2001

 

                                                                                               English only

 

Symposium on Global Review of 2000 Round of

Population and Housing Censuses: 

Mid-Decade Assessment and Future Prospects

Statistics Division

Department of Economic and Social Affairs

United Nations Secretariat

New York, 7-10 August 2001

 

 

 

 

 

 

 

 

Report on the Workshop on Population Data Analysis, Storage and Dissemination Technologies *

ESCAP**

 

 


            FOR PARTICIPANTS ONLY

                                                                                              STAT/WDT/Rep.
                                                                                              23 May 2001

                                                                                              ENGLISH ONLY

 

ECONOMIC AND SOCIAL COMMISSION FOR ASIA AND THE PACIFIC

 

 

 

 

 

 

Report on the Workshop on Population Data Analysis, Storage and Dissemination Technologies

Bangkok, 27-30 March 2001

http://www.unescap.org/stat/pop-it/pop-wdt/pop-wdt.htm

 


Contents

Chapter    Page

Report on the Workshop on Population Data Analysis, Storage and Dissemination Technologies    2
I.       Organization of the Workshop    1
A.      Attendance    1
B.      Opening of the Workshop    1
C.      Workshop arrangements    1
D.      Documentation    2
E.      Participants' evaluation of the Workshop    2
II.      Introduction to project RAS/96/P12    2
III.     Technological lessons from the 2000 round census data collection    2
IV.    Converging data storage and data analysis    4
V.      Translation of data users' needs into dissemination strategies    5
VI.    Innovative technologies for data dissemination    6
VII.   Conclusions and recommendations    8
General, IT management    8
Data collection and capture    9
Data storage and analysis    9
GIS    10
General aspects of data dissemination    10
Data dissemination via the Internet    11
Annex I.      List of participants    13
Annex II.     Tentative time schedule    20
Annex III.   List of documents and presentations    22

 

 






Abbreviations and descriptions

 

 

API         Application Program Interface
CATI        Computer Assisted Telephone Interviewing
CSPro       Census and Survey Processing System
ESCAP       Economic and Social Commission for Asia and the Pacific
GIS         Geographic Information System
HTML        HyperText Markup Language
ICR         Intelligent Character Recognition
IMPS        Integrated Microcomputer Processing System
ISSA        Integrated System for Survey Analysis
OCR         Optical Character Recognition
OMR         Optical Mark Recognition/Reader
PopMap      Integrated geographical software providing maps and a graphics database
REDATAM     Retrieval of DATa for Small Areas by Microcomputer
SIAP        Statistical Institute for Asia and the Pacific
TIGER       Topologically Integrated Geographic Encoding and Referencing
UNFPA       United Nations Population Fund
UNSD        United Nations Statistics Division
URL         Uniform Resource Locator
XML         Extensible Markup Language


I.                   Organization of the Workshop

A.            Attendance

1.                  The Workshop on Population Data Analysis, Storage and Dissemination Technologies, funded by the United Nations Population Fund (UNFPA) under project RAS/96/P12, was held in Bangkok from 27 to 30 March 2001.  It was organized by the secretariat of the United Nations Economic and Social Commission for Asia and the Pacific (ESCAP) with the active support of the Working Party on the Application of New Technology to Population Data.

2.                  The Workshop was attended by 38 participants from 20 selected countries in the Asian and Pacific region: Armenia, Bangladesh, Brunei Darussalam, Cambodia, China, India, Indonesia, Japan, Kiribati, Malaysia, Maldives, Mongolia, Nepal, Pakistan, Papua New Guinea, Philippines, Republic of Korea, Samoa, Sri Lanka, Thailand and Viet Nam.

3.                  The nine members of the Working Party, experts from Australia; Bangladesh; Indonesia; Japan; Macao, China; New Zealand; Philippines; Singapore; and Thailand, participated as resource persons, as did representatives of the Statistical Institute for Asia and the Pacific (SIAP), the UNFPA Country Technical Services Team in Bangkok, and the United Nations Statistics Division (UNSD).  Invited private sector companies participated as observers and made presentations.

4.                  The list of participants is attached as Annex I.

B.            Opening of the Workshop

5.                  The Workshop was inaugurated by Mr Kim Hak-Su, the Executive Secretary of ESCAP.  He welcomed the participants and thanked UNFPA for funding the project, under which the Workshop was organized.  He noted with appreciation that the collaboration between UNFPA and ESCAP would continue with a number of multi-year projects scheduled to start later in the year.

6.                  Thanking the resource persons, the Executive Secretary commended the role of the members of the ESCAP Working Party on the Application of New Technology to Population Data in putting together the programme for the Workshop, and in delivering presentations and moderating discussions.  Mr Kim also expressed his appreciation to the resource persons from the Australian Bureau of Statistics, the United States Census Bureau, the UNFPA Country Support Teams in Bangkok and Kathmandu, the United Nations Statistics Division and the Statistical Institute for Asia and the Pacific.  Finally, he thanked the representatives from the private sector for demonstrating state-of-the-art software applications for analysing and disseminating census data; their participation was in accordance with the Secretary-General’s Guidelines on Collaboration between the United Nations and the Business Community.

7.                  Noting the general importance of population censuses and surveys as a foundation for socio-economic statistics and for timely and targeted policy action by governments, the Executive Secretary encouraged the participants to make census data as easily available as possible to their clients, a goal that could not be achieved without applying modern information technology.

8.                  Outlining some changes that the evolving information technology had caused at ESCAP, the Executive Secretary indicated that the main challenge for the secretariat was to mainstream the response to the development challenge created by information technology.  The first step taken was to ensure that ESCAP’s programme planning always incorporated IT considerations when the technology could add value to the projects.  The Executive Secretary informed the Workshop that larger programmes addressing sectoral and national IT development goals had been initiated.  He also indicated that after the upcoming fifty-seventh session of the Commission, the secretariat intended to analyse whether any organizational adjustments might be warranted in order to respond more effectively to the challenges and opportunities that IT created in the region, particularly at the policy level.

C.            Workshop arrangements

9.                  The Workshop adopted the following agenda:

                            1.               Opening of the Workshop.

                            2.               Adoption of the agenda.

                            3.               Technological lessons from the 2000 round census data collection.

                            4.               Data storage - from inactive to dynamic.

                            5.               Latest innovations in methods and tools for census data analysis.

                            6.               Translation of data users' needs into dissemination strategies.

                            7.               Innovative technologies for data dissemination.

                            8.               Other matters.

                            9.               Adoption of recommendations.

10.              The Workshop noted that the tentative time schedule (see Annex II) prepared by the secretariat was based on the provisional agenda, and agreed to proceed accordingly in five sessions as follows:


Sessions                                                                        Chair

1.  Introduction to the project RAS/96/P12 (Item 2)                             ESCAP secretariat
2.  Technological lessons from the 2000 round census data collection (Item 3)   Mr David Archer and Ms Carmelita N Ericta
3.  Converging data storage and data analysis (Items 4 and 5)                   Ms Rosemary Crocker
4.  Translation of data users' needs into dissemination strategies (Item 6)     Mr Edward Lim
5.  Innovative technologies for data dissemination (Item 7)                     Mr Sihar Lumtantobing and Mr David Archer

11.              The Workshop acknowledged with thanks the following presentations and support by private sector companies:

        Topic                                Presenter

5.3     Demonstration of PC-Axis             Mr Lars Nordbäck, Statistics Sweden
5.4     Demonstration of Beyond 20/20        Mr Jean E Carr, Beyond 20/20 Inc.
5.5     Demonstration of SuperSTAR System    Ms Ursula Hoult, Space-Time Research

D.           Documentation

12.              The documents presented and presentations made at the Workshop are listed in Annex III to the report.

E.            Participants' evaluation of the Workshop

13.              The evaluation questionnaire of the Workshop was completed by 37 participants.

II.                Introduction to project RAS/96/P12

14.              On the basis of document STAT/WDT/3, the secretariat briefly introduced project RAS/96/P12, “Application of New Technology in Population Data Collection, Processing, Dissemination and Presentation”, and its outputs.  The document highlighted the significant role that the ESCAP Working Party on the Application of New Technology to Population Data had played in the implementation of the project activities, including the organization of two workshops and four technical expert meetings.  The Workshop noted that the members had not only provided strategic and pragmatic guidance, but had also themselves produced a large number of high-quality technical documents and guidelines on using selected new technologies in population census and survey operations.

15.              The Workshop noted that the presentations, demonstrations, hands-on sessions with computers and the participants’ interaction during the sessions were expected to generate a set of recommendations, to be adopted at the end of the Workshop.  The adopted recommendations are included in section VII of this report, starting from page 8.

III.             Technological lessons from the 2000 round census data collection

16.              The first day of the Workshop was dedicated to the sharing of recent experiences in the application of new information technology to the collection of population census data.  PowerPoint presentations were made by the Working Party members from the National Statistical Office of the Philippines, the Statistics and Census Service of Macao, China, Statistics Indonesia and the Singapore Department of Statistics.  In addition, there was a moderated session based on country papers that had been prepared by the participants.

17.              The data capture strategy of the 2000 population and housing census in the Philippines was based on optical numeric recognition in four regional data capture centres, each having the following hardware: Windows NT network with five mid-volume scanners (Kodak 3510), fifteen Pentium III workstations, three magneto-optical disk drives, three CD-writers, a network printer and a 500 MHz Pentium III server with 90 GB hard disk capacity.  The software components were Kodak MVCS for scanning, Eyes and Hands for Forms for ICR, and a tailor-made Census Progress Monitoring System. The four data capture centres were operated by a total of 146 persons, in two shifts, six days a week.  A work shift was staffed by a shift supervisor, four data controllers (preparing forms for scanning and checking the validity of geographic codes), five scanner operators, four verifier operators and an operator for file preparation and transfer.  In comparison, the staff required for capturing the data for the 1995 population and housing census had been more than 600 persons.

18.              The Workshop noted that in the Philippines the recognition rate for OCR fields had been nearly perfect, whereas handwritten fields had achieved a much lower rate, giving an average recognition rate of 90-95 per cent.  Altogether, over 15 million forms had been scanned; the average speeds for interpretation and verification were 3,400-3,500 and 270-320 forms per hour, respectively.

19.              The Workshop heard that the Philippine configuration had too few (only four) software licences for data verification; 8-10 verification licences would have been optimal.  Other problems included the uneven quality of the printed forms and illegible or too faint handwritten entries, which increased the work needed before scanning and at the verification stage.  Some forms had to be enhanced or rewritten before scanning.

20.              Encouraged by the overall success, the National Statistical Office of the Philippines had decided to use the ICR equipment and software again in the census on agriculture and fisheries; it was also considering using them in the processing of foreign trade documents.

21.              The pilot project for the 2001 census in Macao, China, had also given very promising results.  The client-server software for the OCR system had been developed in-house from the following components: Microsoft SQL Server 6.5, MS Access 97, Delphi 4.0 Enterprise Edition, ImageEN 1.6, and a dual recognition module consisting of a commercial API recognizer and an in-house developed neural network recognizer.  The hardware included an image server, six document scanners and 24 Pentium workstations.  The pilot runs had given an average recognition rate of 95.6 per cent, with an error rate of 0.28 per cent.  Of the 4.4 per cent that were rejected, 75.6 per cent were confirmed and 24.4 per cent were corrected.  Compared to manual data entry, the Statistics and Census Service estimated that the system would save 50 per cent in cost and that the time needed for data capture, validation and correction would be cut from six months to one month.
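The rates reported for the Macao pilot can be related to one another with simple arithmetic.  The following Python sketch is illustrative only: it assumes that the confirmation and correction rates are shares of the rejected 4.4 per cent (so that they sum to 100 per cent of the rejects), and the function name reject_breakdown is hypothetical rather than part of the system described.

```python
def reject_breakdown(reject_rate, confirmed_share, corrected_share):
    """Express the outcomes of rejected items as percentages of ALL items.

    reject_rate      -- percentage of all items rejected by the recognizer
    confirmed_share  -- percentage of rejects confirmed during verification
    corrected_share  -- percentage of rejects corrected during verification
    """
    confirmed = reject_rate * confirmed_share / 100.0
    corrected = reject_rate * corrected_share / 100.0
    return confirmed, corrected


# Figures quoted in the report for the pilot runs.
REJECT_RATE = 4.4
confirmed, corrected = reject_breakdown(REJECT_RATE, 75.6, 24.4)

print(f"confirmed: {confirmed:.2f} per cent of all items")
print(f"corrected: {corrected:.2f} per cent of all items")
```

Under that assumption, roughly 3.33 per cent of all items were rejected and then confirmed, and roughly 1.07 per cent were rejected and then corrected.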

22.              The data capture of the Indonesian census, enumerated in June 2000, was decentralized to 41 centres, having a total of 79 scanners at their disposal. The 55 million double-sided household forms (representing the number of households in Indonesia) created a huge number of individual files during the capture process, which required robust file management features of the recognition software.

23.              The Workshop noted that despite careful advance preparation, Statistics Indonesia encountered a number of recognition problems.  The quality of the drop-out colour varied too much in the printed forms, and sometimes the guiding colour marks did not drop out as expected, requiring manual entry of the data.  As in the Philippines, optical mark recognition was nearly perfect, while the recognition of numbers was hampered by illegible or too faint handwriting and by the use of unapproved or dull pencils.  The problems were partly caused by training the enumerators too far in advance; inflexible regulations prevented the allocated training budget from being used at the optimal time.  The three-month gap between training and enumeration had led to understandable memory lapses and to instructions being overlooked; in addition, a number of recruited enumerators had become unavailable in the meantime and had to be replaced by inadequately trained enumerators.

24.              The Workshop heard that in Indonesia’s experience, human intervention by enhancing the quality of numbers did not markedly improve the recognition results. Besides, the manual editing process left rubber particles and other dirt on the forms, which increased the frequency with which the scanners needed cleaning.

25.              The Workshop noted that in all three census offices (Philippines; Macao, China; and Indonesia), data were validated by using software to run predetermined logical tests. Tests were also run to detect systematic wrong recognition results, such as number 2 being recognized as 5, or number 0 incorrectly becoming 6 or 8.
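A test for systematic misrecognition of the kind described above could be sketched as follows.  This Python example is an assumption-laden illustration, not the software used by any of the three census offices: it presumes that a sample of digits has been both machine-recognized and manually verified, and the function name systematic_confusions and both thresholds are hypothetical.

```python
from collections import Counter


def systematic_confusions(pairs, min_count=3, min_share=0.5):
    """Flag digit confusions that recur often enough to look systematic.

    pairs     -- iterable of (recognized, verified) digit strings
    min_count -- minimum number of times a confusion must occur
    min_share -- minimum share of all errors for that verified digit
    Returns a list of (verified, recognized, count), most frequent first.
    """
    pairs = list(pairs)
    # Count each (verified -> recognized) error.
    errors = Counter(
        (verified, recognized)
        for recognized, verified in pairs
        if recognized != verified
    )
    # Total number of errors for each verified digit.
    errors_per_digit = Counter()
    for (verified, _), count in errors.items():
        errors_per_digit[verified] += count

    flagged = []
    for (verified, recognized), count in errors.items():
        share = count / errors_per_digit[verified]
        if count >= min_count and share >= min_share:
            flagged.append((verified, recognized, count))
    return sorted(flagged, key=lambda item: -item[2])


# Invented sample: 2 read as 5, 0 read as 6 or 8, plus one isolated error.
sample = (
    [("5", "2")] * 4 + [("2", "2")] * 20
    + [("6", "0")] * 3 + [("8", "0")] * 3
    + [("7", "1")]
)
for verified, recognized, count in systematic_confusions(sample):
    print(f"{verified} recognized as {recognized}: {count} times")
```

The isolated error (1 read as 7) falls below the count threshold and is not flagged, whereas the repeated confusions are reported for follow-up.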

26.              The fourth presentation, by the Singapore Department of Statistics, was on its ground-breaking system for submitting census information over the Internet, which was one of the three ways of collecting data in the Singapore Census 2000.  The Workshop noted that the system, which was in the English language only, represented the second generation of Internet data collection systems in Singapore.  (The first one, for the Business Expectation Survey, was launched in March 1998.)  Of all Census respondents, 15 per cent chose to submit their information through the Internet, while the others responded either to computer-assisted telephone interviews (CATI) or to person-to-person interviews.

27.              The Workshop noted that the Internet data collection system had been designed with nine target features in mind, namely (i) fast performance, (ii) user-friendliness, (iii) security, (iv) stability, (v) compatibility with a large number of browser platforms, (vi) the possibility of continuing form completion in another user session, (vii) integration with other data collection modes, (viii) intelligent branching of questions, and (ix) verification during and after completion of the form.  It further noted that, given the existing technology, many of those requirements obviously conflicted with one another.

28.              Based on prototyping and intensive user-acceptance testing, the front page of the Singapore Census site was kept small in size (in kilobytes) and the form was split into many parts in order to achieve satisfactory performance for users.  For the same reason, the automated checks that had initially been built into the form had to be reduced in number and moved to the server side.  Special attention was paid to the clarity of the form layout, questions and definitions.  During the enumeration period, hotline telephone support was available, and frequent system upgrades were made in response to feedback.  High-level security was maintained at all times, with escalation procedures and contingency plans in place.
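Question branching combined with server-side checking, as mentioned in the two preceding paragraphs, might look like the following Python sketch.  It is not a description of the Singapore system: the question names, the age threshold of 15 and the functions applicable_questions and validate are all hypothetical and serve only to show the general idea of checking a submitted form part on the server against the same branching rules that drive the form.

```python
def applicable_questions(answers):
    """Return the questions that apply, given the answers so far (branching)."""
    questions = ["age", "sex"]
    age = answers.get("age")
    # Hypothetical rule: employment questions apply from age 15 upwards.
    if isinstance(age, int) and age >= 15:
        questions.append("employment_status")
        # Hypothetical rule: occupation is asked of employed persons only.
        if answers.get("employment_status") == "employed":
            questions.append("occupation")
    return questions


def validate(answers):
    """Server-side check of a submitted form; returns a list of error messages."""
    errors = []
    age = answers.get("age")
    if "age" in answers and (not isinstance(age, int) or not 0 <= age <= 120):
        errors.append("age must be a whole number between 0 and 120")
    expected = applicable_questions(answers)
    for name in expected:
        if name not in answers:
            errors.append(f"missing answer: {name}")
    for name in answers:
        if name not in expected:
            errors.append(f"unexpected answer: {name}")
    return errors


complete = {"age": 30, "sex": "F", "employment_status": "employed",
            "occupation": "teacher"}
child_with_job = {"age": 10, "sex": "M", "occupation": "farmer"}

print(validate(complete))
print(validate(child_with_job))
```

Keeping such checks on the server keeps the downloaded form small, at the price of reporting errors only after a form part has been submitted.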

29.              Reviewing data capture technologies in the 2000 round of censuses in participating countries and areas, the Workshop noted that twelve of them relied on keyboard entry (Brunei Darussalam, Cambodia, Indonesia, Kiribati, Malaysia, Mongolia, Nepal, Papua New Guinea, Republic of Korea, Samoa, Sri Lanka, Viet Nam), two on OMR (Bangladesh and Pakistan), nine on OCR/ICR (Australia; Bangladesh; China; India; Indonesia; Macao, China; New Zealand; Philippines; and Thailand) and one on tri-modal capture (Singapore).  Although technological challenges and problems still existed in using modern OCR/ICR, such as minimizing and detecting false positive recognition, the Workshop was pleased to note that the maturing technology had significantly lowered the total cost of census taking and improved the timeliness of the release of the results compared to the previous round of censuses.

30.              The Workshop then made several recommendations regarding census data collection and capture (see page 9), including on:

-             the use of cell phones and emails to support data collection (paragraph 79)

-             the establishment of a web site to provide information to census respondents (paragraph 80)

-             the importance of careful questionnaire form design in successful character recognition (paragraph 81)

-             just-in-time training of enumerators in filling out OCR/ICR forms (paragraph 82)

-             the use of proper pencils or pens in filling in or marking OCR/ICR forms (paragraphs 83 and 84)

-             the maintenance of scanners (paragraph 85)

-             the robustness of the file management component of the data capture chain (paragraph 86)

-             the testing of the proposed data capture configurations in real situations and making necessary modifications to them (paragraph 87)

-             bandwidth, security and other considerations in Internet data collection systems (paragraph 88)

-             the testing of Internet data collection forms in different bandwidths and improving the real and perceived performance (paragraph 89)

-             data collection control when Internet collection was accompanied by other collection methods (paragraph 90)

IV.              Converging data storage and data analysis

31.              Noting that data storage and data analysis had become increasingly closely related because of technological innovations (such as data warehousing, data mining and the Internet), the Workshop decided to discuss them under one agenda item.