Symposium 2001/27

10 July 2001

 

                                                                                                           English only

Symposium on Global Review of 2000 Round of

Population and Housing Censuses: 

Mid-Decade Assessment and Future Prospects

Statistics Division

Department of Economic and Social Affairs

United Nations Secretariat

New York, 7-10 August 2001

 

 

 

 

 

 

 

 

The role of information technology (IT) in disseminating statistics:

Focusing user needs and expectations*

Sten Bäcklund**

 

 


Table of contents

 

Introduction

Dissemination on the user’s terms – the user in focus

1.1       Identifying users and stakeholders

1.2       Meeting the users’ needs and expectations

1.2.1       Provide quality data

1.2.2       Present your information in a clear and easy-to-use way

1.2.3       Give your users a choice

1.2.4       Recognize and adapt to new technologies on the user side

Agency concerns

1.3       Outlining a dissemination strategy

1.3.1       Conventional methods

1.3.2       Using electronic media

1.3.3       Internet: a mix of both?

1.4       Choosing the Internet as the main platform for disseminating statistics

1.4.1       Deciding on an Internet policy

1.4.2       Technology concerns

1.4.3       The web site and services offered

1.4.4       Analysing information

1.4.5       Testing and monitoring

1.5       Sharing information

1.5.1       Connecting offices and security

1.5.2       What about XML?

Additional remarks

1.6       Legislation

1.7       Home pages of selected NSOs

Notes

References

Annex 1. Some indicators on how IT relates to quality

Annex 2. Conceptual firewalling

Annex 3. The process of publishing

Annex 4. Importance of web design features

 


Introduction

In this paper we will take a non-technical approach to the present and future use of information technology (IT) for disseminating statistics. We will focus on user needs and expectations and on how these shape the way dissemination strategies are formed and implemented. We will discuss recent IT practices for dissemination but also point at trends that will have an impact on how a statistical organization must adjust in the future. We will also concentrate on how the use of Internet techniques and methods can help meet public demands and on what is needed to do so.

 

Identifying ways of marketing and disseminating information has always been important to statistical agencies. With the advent of the Internet it became possible to reach the public within a totally new framework, and the technologically most advanced national statistics offices launched their first web sites in the mid-1990s. Since then the Internet has grown beyond imagination and you can hardly attend any conference, seminar or workshop where Internet solutions are not discussed or promoted.

 

In this context we will build on the findings of two United Nations events that have a bearing on our further discussion of the combination of dissemination and IT.

Referring to the conclusions from the UNSD Seminar on User Relations, Marketing and Dissemination of Official Statistics (UNSD, 2000) directed to CIS (Commonwealth of Independent States) countries, we can identify some salient points:

·       The importance of quality output in terms of relevance, accuracy and timeliness;

·       Good media relations;

·       Identification of important user groups;

·       Pricing policy; and

·       Information sharing on good practices and examples.

The seminar focused on reaching and servicing customers, addressing the needs of the private sector and on ways of approaching media and the public at large.

The second event, the ESCAP (United Nations Economic and Social Commission for Asia and the Pacific) Workshop on Population Data Analysis, Storage and Dissemination Technologies (UNESCAP, 2001), was held in Bangkok in March 2001[1]. Of special interest here are item 4 of the agenda, “Translation of data users’ needs into dissemination strategies”, and item 5, “Innovative technologies for data dissemination”. Papers were written and introduced by representatives of the Australian Bureau of Statistics (Hardy, 2001a and Hardy, 2001b) and Statistics New Zealand (Archer, 2001), while two country papers on census dissemination were presented (Viet Nam and Cambodia).

From the vendor side Beyond 20/20[2], Space-Time Research[3] and Statistics Sweden[4] were invited to show their products.

The workshop recommendations were numerous and, even if quite a few are relatively detailed, the sheer number shows how important the area is considered. The following list gives a snapshot of major topics on IT/dissemination and refers to the numbered paragraph in the final report:

·       Use public-domain software, mainstream solutions, off-the-shelf packages [76];

·       Start small, think big in Data Warehousing and On-Line Analytical Processing (OLAP) [94];

·       Provide analytical flexibility, client customization [95];

·       Metadata management [97];

·       Low-cost GIS solutions [101];

·       User contacts/consultations [102];

·       Monitoring changes in user population [104];

·       Prototyping for user acceptance testing [105];

·       Attractive packaging, advertising, public relations and other promotion [106];

·       Security concerns [107];

·       Link to other agencies in the region [108];

·       Adapt disseminating media to user/community profiles [109];

·       Internet is strategic [111];

·       House policy and guidelines [112];

·       Improve customers’ ability to self-service [118];

·       Monitor web site traffic to identify key user groups [119]; and

·       Explore XML [125].

We will refer later on to the findings of the two meetings in their context[5].

Dissemination on the user’s terms – the user in focus

1.1         Identifying users and stakeholders

Traditionally, the main users of official statistics are considered to be different governmental bodies at the central and local levels, large businesses, regional or international agencies and research institutions. To some extent, this reflects how resources have been allocated and channelled. But things are changing. The fact that statistics are now regarded as a public good is partly an outcome of the change in how statistics are disseminated. The presence of the Internet makes the difference.

 

In this seminar in Vientiane, we have seen how marketing strategies can be employed in order to identify current and potential customers and then reach them (Spar, 2001). The role and influence of media and the public at large have also been discussed (Østergaard, 2001). The government has been identified as a key user and stakeholder, while the importance of the private sector has been stressed. Recent practices of dissemination in Asian countries have been presented (Marzan, 2001).

Reading through the country reports to the ESCAP workshop, five major user categories stand out by area of activity:

·       All levels of government;

·       International agencies;

·       The private sector;

·       Research institutions; and

·       The public.

 

It is worthwhile noting that the media were never recognized as a user group.

A different way of categorizing is suggested by Archer (2001). He divides the users into:

1.      General data users

2.      Analysis users

General data users are, for example, students, teachers, libraries and small businesses who have simple data requirements but draw on a great range of information. Their needs are normally not known in advance. Analysis users, on the other hand, are identified by their complex data requirements for detailed variable and regional breakdowns, often based on many datasets. Such users are, for example, governmental departments, local authorities, researchers and VIP clients.

 

Another grouping is suggested by Blanc et al. (2001) in a recent paper to the Quality Conference in Stockholm in May where the authors classify users by type of demand according to table 1:

 

Table 1. Public demand vs. private demand in official statistics

Public/social demand                                      Individual demand

“citizen”, “society”                                      “customer”
long-term demand                                          short-term contract
expressed by the political representatives                free bargaining (market)
socio-political dialogue; government setting priorities   contract between NSI and individual partner
terms are not (explicitly) specified                      terms are explicitly specified

 

At the same conference Linacre (2001) makes a distinction between

A.     Members of the public, students;

B.     Individual organizations who seek a tailored service, e.g., on a commercial basis, or as members of the media, etc.;

C.     Sophisticated users, e.g., researchers, financial analysts, analysts in policy departments; and

D.     Key users: central banks, governmental bodies, international agencies.

(This is similar to the Archer classification.)

The main clustering characteristics are, then, the level of statistical capability, level of interest, ways of access and partnership willingness.

But in whatever way we choose to describe and classify our beneficiaries, the objective is to meet their needs and expectations in the best possible manner, not only once but repeatedly. This is also the IT challenge: how we set about using IT in order to reach our goals.

1.2         Meeting the users’ needs and expectations

1.2.1        Provide quality data

As has been emphasized earlier in the seminar, users expect quality information. If this cannot be provided the user will certainly stop asking for your data and try to find it elsewhere. A well-functioning IT environment will without question make things easier. Quality is normally[6] defined in terms of accuracy, relevance, timeliness, coherence and availability, in no particular order.

Every now and then there will be conflicts between factors. It is not always easy to provide for accuracy and timeliness at the same time. Another example is coherence and relevance: it would be convenient if corresponding statistics from different surveys were the same or if aggregated monthly estimates and annual statistics coincided, but this is rarely the case due to variations in definitions and methodology.

In Annex 1 some underlying factors related to IT are laid out. These factors are often encountered in less developed or emerging countries.

1.2.2        Present your information in a clear and easy-to-use way

 

Certain important aspects of disseminating information hold regardless of the medium of choice, be it hard copy, CD or the Internet, if the user’s expectations are to be met. One of the tools for doing so is a good navigation system.

If we use a statistics web site as an example, this could be accomplished through a standard navigation for the general public and a set of alternative navigations catering to various target audiences from the user’s perspective. Most advanced are enterprise portals, which are database-driven and often expensive to implement.

 

Likewise, a retrieval application should preferably accompany data disseminated on CD-ROM. Statistics Sweden distributed its 1990 census tables on CD, at the same time providing the first PC AXIS version as an exploratory tool. Often a GIS (geographic information system) application for mapping statistics will be included. And even a paperbound report or a statistical yearbook should at least come with a table of contents, a search index, and table and diagram references.

 

There are other matters relating to how information is packaged. When considering a statistics web site, you might compare a first-time user with a person visiting a store. You would certainly want him[7] to look at what products you have to offer, walk through the shop, buy and in the end, most important, come back a second time!

Thus, try to locate what you want to trade as high in the navigation structure as possible and highlight news and press releases. Keep your web site alive! Place related information close together or on the same navigation level and provide a site map.

 

Users want links to related information and you should provide them, but at a reasonable level. Sometimes you may find (scientific) web sites overloaded with links to other sources of information, and the risk is high that the user will wander away in cyberspace and not return to the site where she started.

1.2.3        Give your users a choice

Taking all potential customers of statistical information into consideration, we must remember that they constitute a heterogeneous group from, for example, governmental bodies to enterprises, researchers and the public. This means that any national statistics office must be prepared to offer optional ways of retrieving information.

Preferred media

Even if the web already is the major dissemination alternative for many agencies, there are still users who will prefer hard copies, CD-ROM or even old-time floppies. To date national libraries and archives often have had specific requirements on the delivered media—e.g., microfiche, tapes or MO disks. Information exchange over computer networks could also call for special handling in accordance with agreed-on protocols.

 

With no access to the World Wide Web, the user may still benefit from other Internet services, e.g., SMTP for receiving information via electronic mail. In the end this is one of the factors that the national statistics office must keep in mind when designing, evaluating or changing its dissemination strategy.

Print-on-demand

Many users do not want or need to have all the information that comes with a product. Some just want to receive a summary or a set of basic tables while others have more specific demands. Instead of having to buy or download, e.g., a complete report or statistical yearbook, the user may prefer to specify what he wants and then have it printed either by the agency and then mailed, or downloaded and printed by the user himself.

Statistics for further analysis

Researchers or planners are often interested not only in macrodata but also in the underlying micro-information. Provided that measures for ensuring integrity are taken, especially when dealing with sensitive data, these data could be disseminated through electronic networks such as the Internet. Additional security may then also be needed—for example, SSL encryption to prevent unauthorized access.

The user should also be given the opportunity to specify preferred data formats[8] at the time of delivery to facilitate further analysis.

1.2.4        Recognize and adapt to new technologies on the user side

New technologies are emerging all the time. One of the latest trends in disseminating data is by Instant Messaging systems based on, e.g., SMS[9] and WAP[10]. Many statistical organizations are already using these methods to disseminate critical business information to the private sector and the media—for example, price statistics or financial markets indicators. So if the user community wants these services (which you will know from your focus groups or user contacts) you should be prepared to provide them.  And in the future, solutions based on advanced and high-speed connections (GPRS[11], HSCSD[12], Bluetooth[13]) will grow in importance.
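To make the idea concrete, the sketch below (in Python) composes a message from a hypothetical latest price-index release and truncates it to the 160-character SMS limit; the send_sms function, the subscriber numbers and the figures are placeholders for whatever gateway interface and data the operator and agency actually have.

```python
# Sketch: push a key indicator to subscribers as an SMS-sized message.
# The gateway call is hypothetical; real operators expose their own interfaces.

SMS_LIMIT = 160  # maximum characters in a single SMS

def compose_release(indicator, period, value, change):
    """Build a short text message announcing a new figure."""
    text = f"{indicator} {period}: {value} ({change:+.1f}% on previous period)"
    return text[:SMS_LIMIT]  # never exceed one SMS

def send_sms(recipient, text):
    """Placeholder for the operator's message-centre interface."""
    print(f"to {recipient}: {text}")

if __name__ == "__main__":
    message = compose_release("CPI", "June 2001", "258.3", 0.4)
    for subscriber in ["+856201234567", "+856207654321"]:  # subscriber list from the user register
        send_sms(subscriber, message)
```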

Agency concerns

1.3         Outlining a dissemination strategy

On the assumption that current and potential stakeholders and users are identified, in outlining a dissemination strategy for a national statistics office, the following questions must be asked:

A.     What type of information should be provided?
Examples: data, metadata, administrative information, methodology, statistical activities, research results

B.     What are the possible means and ways of dissemination?
Examples: paper, hard copy, diskettes and the like, CD, web pages, traditional mail, electronic mail, discussions, workshops, seminars, conferences, bi- or multilateral projects, networks

C.     What will be the interventions and what are the assumptions?
Examples: web-site creation, equipment and application costs, maintenance, coordination of activities, security and legislation

Chapter 1.2 on users’ needs and expectations stresses the importance of quality information and of presenting data in a clear way and in a medium of the user’s choice.

 

Based on the country reports on census activities to the ESCAP workshop the following simplified “translation” of users’ needs into possible dissemination strategies was made:


Figure 1. Outlining dissemination based on country reports on census activities

 

 

User groups are categorized in the traditional way, user needs relate to census information, primary data are obtained through censuses and surveys and, finally, contents and media are linked together. Any dissemination strategy can then be considered a realization of a subset of users, needs, contents and media, with the time dimension added.

If the national statistics office considers its web site the primary channel for dissemination activities, as is the case in many countries, users will naturally be categorized in accordance with what was discussed earlier in chapter 1.1.

1.3.1        Conventional methods

We will not elaborate on conventional methods for disseminating statistical information since this is outside the scope of this paper. It should be mentioned, though, that most national statistics offices have in the past installed printing offices to provide for printed output, e.g., reports, press releases, yearbooks and so forth, often with high costs for acquisition and maintenance of equipment.

Today the need for in-house publishing capacity is diminishing, since dissemination alternatives are now available that did not exist in the past. Outsourcing printing when called for has also often proved to be cost-effective.

Other ways of disseminating information have been and still are workshops, seminars and conferences as well as through regional and interregional projects.

1.3.2        Using electronic media

Electronic media have been a first choice for dissemination for a number of years. We could say that this was becoming a true alternative with the introduction of personal computers and networking, even if data had been transferred via magnetic tape between mainframe installations since the 1950s.

Exchanging data on floppies (5¼", 3½") and in the most popular PC formats became common during the 1980s. In 1989 a process was developed with which a CD could be written directly by means of a laser beam, opening the way for self-created CDs. The CD is an ideal medium for distributing information of any kind (text, images, sound, data and programs). Equipment for burning CD-ROMs is becoming less and less expensive, and this method is therefore within reach even for national statistics offices on a very low budget, as well as for their customers. Along with relatively low production costs, it provides high memory capacity with direct access and a long lifespan, and is still without competition even at low production volumes. (Moreover, a majority of all PCs today already come equipped with a CD-ROM drive.)

Thus CDs are frequently used for dissemination of statistics since not only can (compressed) data be stored but also retrieval applications, e.g., for tabulation, drill-down analysis and GIS.

1.3.3        Internet: a mix of both?

In the mid-1990s national statistical agencies started to create their first web sites. In the beginning, and with the still undeveloped software available, the ambition was often limited to “being noticed”, and this was accomplished through static HTML pages organized in strict hierarchical navigation structures. Pages were often arbitrarily updated except for statistics that were published on a regular basis, such as price indices or short-term business indicators.

 

One common characteristic was that the information provided on the web site was primarily what had been decided by the producers. The web site was not user-driven in the first place. The reasons behind this were often scarce staff resources and a “wait-and-see” attitude towards the new technology.

With the unprecedented growth of the Internet during the last half-decade it has become obvious that it will be the main channel for dissemination for any statistical agency. The Internet will encompass both the conventional methods and the new ones and will change the strategies of dissemination in a substantial way[14].

1.4         Choosing the Internet as the main platform for disseminating statistics

1.4.1        Deciding on an Internet policy

1.4.1.1       What should be part of an Internet Policy?

It must be remembered that establishing and maintaining a web site as the main dissemination platform for a statistical agency will be a long-term undertaking. It is therefore important that an Internet Policy be outlined well in advance of implementing any kind of web infrastructure.

 

First, we refer to chapter 1.3 where the dissemination strategy is discussed. The answers to questions A (on contents) and C (on interventions and assumptions) will form the cornerstone of the agency’s Internet policy.

 

In the next step, security issues should be addressed. You should not go for web-site hosting prior to a thorough security assessment, since this will determine, e.g., whether hosting should be outsourced to an ISP (Internet service provider) or done in-house and what security layers should be implemented in order to secure your data.

Following these steps a plan for implementing the web site should be designed. It will now be decided what resources will be needed, how funds should be provided and so forth.

 

Furthermore, you must decide on how the web-site support should be organized and how staff training should be done.

Finally, production rules must be set wherein updating, upgrading and maintenance are covered.

1.4.1.2       A dissemination strategy: the case of Statistics Sweden in brief

At the end of 1999 a board decision was taken, stating that from 1 April 2000

·       All official statistics produced under the responsibility of Statistics Sweden should be stored at the macrolevel in Sweden’s Statistical Databases. These statistics should then be made available to the public. One of the information channels for this purpose should be the Statistics Sweden web site;

·       All official statistics under the responsibility of Statistics Sweden and printed in Statistiska Meddelanden (a series of reports) should be available on the web site. They should be standardized by the use of templates and print-on-demand should be offered (through download over the web);

·       All other press releases and a set of the most wanted statistics should also be published on the web site by subject matter area/product in a standardized form; and

·       Under conditions outlined above, official statistics may also be disseminated in publications and yearbooks.

 

Statistics Sweden’s Internet Policy sets out to achieve these goals by maintaining a dynamic web site and engaging staff at all organizational levels in keeping the site up to date. Strong measures have been taken in order to secure the in-house environment and the statistical databases. Extensive training schemes have been introduced and the agency applies mainstream technology and methods. See also chapter 1.4.3.4.

1.4.1.3       Contents

What should be published on the web site is first and foremost an issue for the top-level management of the national statistics office, even if, e.g., the subject-matter departments will be the executing units. It must also rest with management to decide on data integrity, security matters related to data transfer and the rules to be followed when authorizing material for the site.

1.4.1.4       Infrastructure

The infrastructure for the web site depends to a great extent on what your ambitions are. In a modest situation you might well be satisfied with outsourcing everything to an ISP, retaining control only over updating the web site remotely. This is not uncommon for small agencies that cannot afford either the equipment (server, firewall) or the backup and 24-hour availability. In any case, before starting to establish a complete in-house environment for your web site you should always make a cost-benefit analysis, not excluding such important factors as the need for redundancy (e.g., backup servers) and staff competence.

1.4.1.5       Security

 

Security is always an important concern. It covers the extent to which you are prepared to retain data integrity, e.g., by way of implementing redundancy and firewalls. A risk analysis should always be done in order to identify steps needed to prevent unauthorized or unintentional data damage.

1.4.1.6       Training

A staff development programme should be established covering how to implement and use mainstream software for creating web pages and multi-layered web applications. Even with capable staff, you cannot leave it to them to learn things on their own. Commonly, those involved are occupied with day-to-day work, and you need to provide the time needed to attend courses, workshops or seminars.

1.4.1.7       Sustaining a quality web site

Without strict rules on how the web site should be designed and maintained—for example, what software should be used or what standards should be applied—there is a great risk that it will deteriorate quickly. It is therefore important to have tight web-site management with restrictions on what is allowed and by whom. In most cases the webmaster or web coordinator will be solely responsible for what is published on the site and how it is done. To assist the webmaster, you might want to attach experts in different areas, e.g., language, design and layout, multimedia and application development. Steering groups or committees will also be needed to decide on contents and to guide the webmaster in her work.

1.4.2        Technology concerns

1.4.2.1       Hosting the site

The Internet Policy will point out in which way the web site should be hosted. As already mentioned the site may well be outsourced with an Internet Service Provider (ISP) if this meets the dissemination strategy needs, the level of security and possible cost restrictions. There are also a number of free web servers available on the net that provide limited disk space and some facilities to create dynamic web pages with extracts from databases.

 

In any case you will need an ISP, even if you intend to host your web site on your own equipment in-house. There are, of course, a number of initial steps to be taken. For example, you will have to implement the Internet Protocol (IP) in your LAN (local area network), and you will have to register an IP network and a DNS domain name in order to be present on the web.

 

Virtual hosting can be an important add-on alternative when launching web-driven multi-layered applications where, e.g., electronic forms are used for collecting statistical data from respondents like administrative units, schools or businesses or where results/analyses from surveys, which are not part of the officially published statistics, are disseminated to designated users, for example, government bodies.

1.4.2.2       Established products

As general advice, you should always choose mainstream products when you implement your web site. There are no good reasons to go for cutting-edge technology or for inexpensive but unstable solutions.

For hardware this implies brand-name servers and workstations for developing, testing and launching the site. You should choose well-known firewall and proxy techniques. You should also study what other national statistics offices have done, especially in your region.

The same principle holds when you are deciding on web-enabling products. If you are already on the Windows platform, the natural choice today would be Windows 2000 Advanced Server, which contains the components needed. If you are on UNIX or Linux platforms, other products will normally be first in line. Likewise you must make sure that your database solutions will fit into your concept.

You will also need software for web design and creating web pages. There are a number available, from simple HTML authoring tools to Macromedia Dreamweaver or Microsoft FrontPage. Still, the ambitions of your dissemination strategy, as expressed in your Internet Policy, should be able to guide your way.

1.4.2.3       Securing the environment

In Annex 2 the principle behind securing Statistics Sweden’s web site is shown as an example. A firewall is implemented to protect the LAN from unauthorized access, resulting in three physically separated security layers:

1.      The unsafe network, which here is the same as the Internet;

2.      The services network, located behind the (hardware) firewall; and

3.      The safe network, which is the corporate LAN.

All incoming traffic from the unsafe network has to pass through the firewall. All Internet services offered, e.g., HTTP and SMTP, are located on the services network. There is no way for a visitor to reach the safe network. By the same token, in-house staff cannot compromise resources on the services network, since these are only accessed through dedicated applications that are often automated.

In reality the situation is more complex. So-called virtual private networking is implemented in order to allow staff to access the intranet from the unsafe network. All traffic to the official website must also go via the firewall.

1.4.2.4       Site testing

Put your site under stress! Establish a schedule for testing your site and deal with latent risks before unwanted problems arise. There are third-party providers who can be contracted if you want to and can afford outsourcing these services.

1.4.2.5       Monitoring activities

Keep track of the events occurring on your web site! Monitor activities and obtain data on hits from your ISP. Your own access log file can be very useful when identifying your clients and the pages they visit. Subject-matter departments in particular show great interest in how their statistics are used: not only the number of successful hits but also who visits and how often they return. Try to find out whether you are really providing what customers want or whether, in some cases, automatic e-mail or other solutions would suffice. To analyse the log files you do not need any fancy programs, since log data come as plain text and you can use your favourite statistical software to process them. But remember that log files occupy disk space. If you want to archive log data you should first compress them using WinZip or any other convenient software for this purpose.
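Since log data come as plain text, a few lines of code are enough for a first summary. The sketch below, a minimal example in Python assuming the common NCSA log format and a file named access.log, counts requests per visiting host and per requested page; field positions would need adjusting for other log formats.

```python
# Sketch: summarize a web server access log (assumed NCSA common log format).
# Each line starts with: host ident user [date] "METHOD /path HTTP/1.x" status bytes
from collections import Counter

hosts, pages = Counter(), Counter()

with open("access.log") as log:          # file name is an assumption
    for line in log:
        parts = line.split()
        if len(parts) < 7:
            continue                     # skip malformed lines
        hosts[parts[0]] += 1             # the visiting host
        pages[parts[6]] += 1             # the requested page inside the quoted request

print("Most frequent visitors:")
for host, hits in hosts.most_common(10):
    print(f"  {host:30s} {hits:6d}")

print("Most requested pages:")
for path, hits in pages.most_common(10):
    print(f"  {path:40s} {hits:6d}")
```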

1.4.3        The web site and services offered

1.4.3.1       Web site design

In designing your web site you will have to consider who your intended users are and adapt to their needs. This will hopefully result in providing your visitors with a useful portal to your organization. Virtual web sites as described earlier may be one possible solution for meeting special demands. Statistics Sweden, as an example, provides virtual hosting of web sites to serve specific target groups.

Figure 3 is borrowed from the Yale Style Manual (Lynch and Horton, 1999)[15]. Here users’ needs are classified in terms of “complexity” and “linearity” and then mapped to navigation structures. 

According to this you will have to design your structure(-s) in a way that best corresponds to the characteristics of your user groups, the ultimate goal being to give everyone what she wants.

It is, of course, important that you also try to standardize your web pages to conform to layout decisions. There is a learning curve with any web site: the user will appreciate it if your navigation structure remains unchanged and if he can expect related items to be found in similar locations throughout the site and over time.

 

Figure 3. Mapping users by “complexity” and “linearity” to navigation structures

 

Annex 4 gives the results from a recent survey on visitors’ ratings of welcome page features by Knowledge Systems and Research Inc., and published in the June 2001 issue of Internet World. Strikingly, ease of use and navigation is considered most important by nearly 80 per cent of the respondents.

1.4.3.2       Static or dynamic HTML

Dynamic web pages are often server-script based, using CGI (Common Gateway Interface), ASP (Active Server Pages), JSP (Java Server Pages) or other methods. The problem is that, e.g., ASP-generated HTML (HyperText Markup Language) code, contrary to static HTML, is not cached. This means that whenever a client returns to a previous ASP page it will be re-created instead of being returned from the client’s cache or the proxy. This takes time, and if too many users address your site concurrently, it may occasionally break down.

The right balance between static and dynamic pages should be established. It is also important to strip generated HTML code from all unnecessary formatting in order to speed up the traffic.
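One common way of striking that balance is to pre-render frequently requested pages as static HTML on a schedule so that browsers and proxies can cache them like any other file. The sketch below illustrates the idea in Python, using an in-memory SQLite table as a stand-in for the statistical output database; the table, figures and file name are invented for the example.

```python
# Sketch: pre-render a frequently requested table as a static HTML page
# so that browsers and proxies can cache it like any ordinary file.
import sqlite3

# Stand-in for the statistical output database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cpi (period TEXT, value REAL)")
con.executemany("INSERT INTO cpi VALUES (?, ?)",
                [("2001-04", 257.3), ("2001-05", 257.9), ("2001-06", 258.3)])

rows = con.execute("SELECT period, value FROM cpi ORDER BY period").fetchall()

# Keep the generated markup lean: no needless formatting, just the table.
html = ["<html><head><title>Consumer price index</title></head><body>",
        "<h1>Consumer price index</h1>", "<table>",
        "<tr><th>Period</th><th>Index</th></tr>"]
html += [f"<tr><td>{period}</td><td>{value:.1f}</td></tr>" for period, value in rows]
html += ["</table></body></html>"]

with open("cpi.html", "w") as out:       # published to the static part of the site
    out.write("\n".join(html))
```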

1.4.3.3       Downloads

A vital part of your services should provide optional downloads of text information or pre-processed data—for example, tables. The best way is to use PDF (Portable Document Format) for text, which will retain all formatting within the document. Other popular choices are Word documents and Excel worksheets. If you use software developed in-house for data retrieval or tabulation, you should always make sure that output formats can be read or converted. Also remember to give file sizes explicitly, next to the download buttons, so the user can estimate download time and decide whether it is worthwhile to go through this procedure.
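A simple way to keep the stated file sizes in step with the files themselves is to generate the list of download links from the files on disk. The following Python sketch scans a download directory (the directory name is an assumption) and emits an HTML fragment giving the size next to each link.

```python
# Sketch: build a download list that shows each file's size next to its link.
import os

DOWNLOAD_DIR = "downloads"               # assumed location of PDF/Excel files

def human_size(n_bytes):
    """Render a byte count in a form the user can read at a glance."""
    for unit in ("bytes", "KB", "MB"):
        if n_bytes < 1024:
            return f"{n_bytes:.0f} {unit}"
        n_bytes /= 1024
    return f"{n_bytes:.1f} GB"

lines = ["<ul>"]
for name in sorted(os.listdir(DOWNLOAD_DIR)):
    path = os.path.join(DOWNLOAD_DIR, name)
    if os.path.isfile(path):
        size = human_size(os.path.getsize(path))
        lines.append(f'<li><a href="{DOWNLOAD_DIR}/{name}">{name}</a> ({size})</li>')
lines.append("</ul>")

print("\n".join(lines))                  # paste or include the fragment in the page
```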

1.4.3.4       The process of publishing

Annex 3 is taken from an internal unpublished paper from Statistics Sweden and shows how the agency organized its publishing based on the decision made in late 1999. To begin with, all statistics produced are stored in databases. Thereafter documents are created in various formats using different tools (e.g., dynamic database retrieval), converted to suitable formats depending on usage, disseminated and archived.

1.4.3.5       Instant messaging

Instant messaging (IM) is becoming an essential form of communication, for both social and professional exchange, and is used, according to recent statistics, by over 140 million Internet users. In chapter 1.2.4 the notion was introduced and some implications described.

With IM you can

·       Target information to specific users, or to groups of users, based on their preferences;

·       Promote your products; and

·       Alert users when updates of specific web pages are available.

In societies with an established infrastructure for electronic communication, IM may well be considered a strategic issue for any national statistics office. It should be kept in mind that IM services may be outsourced, thus eliminating the need for an in-house IM-application development capacity.

1.4.3.6       The importance of metadata

Establishing an intermediate layer of metadata is an essential precondition for dynamic retrieval of macrodata (or microdata) over the web. Users or visitors to a statistics web site will search for information by choosing from lists of indicators and information about those indicators. They will request tailor-made tables and graphs based on metadata.

Metadata will also make it easier to shield the underlying (primary) data from damage, since they will be accessible only through applications. Another feature of metadata is that they will be needed when creating data warehouses.

As an example, PC-AXIS, which has been developed jointly by the Nordic countries and used by many statistical agencies for disseminating statistics, is metadata-driven.
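To make the role of metadata concrete, the sketch below shows a minimal metadata layer in Python: each indicator is described by a small record (label, unit, periodicity, allowed breakdowns), and a user request is validated against that description before any data are touched. The indicator codes and fields are invented for illustration and do not follow any particular standard.

```python
# Sketch: a minimal metadata layer driving dynamic retrieval.
# Indicator codes, units and breakdowns below are illustrative only.

METADATA = {
    "POP": {"label": "Population", "unit": "persons",
            "periodicity": "annual", "breakdowns": {"sex", "age_group", "region"}},
    "CPI": {"label": "Consumer price index", "unit": "index 1995=100",
            "periodicity": "monthly", "breakdowns": {"commodity_group"}},
}

def validate_request(indicator, breakdowns):
    """Check a user request against the metadata before building a query."""
    meta = METADATA.get(indicator)
    if meta is None:
        raise ValueError(f"Unknown indicator: {indicator}")
    unsupported = set(breakdowns) - meta["breakdowns"]
    if unsupported:
        raise ValueError(f"{indicator} cannot be broken down by {sorted(unsupported)}")
    return meta

if __name__ == "__main__":
    meta = validate_request("POP", ["sex", "region"])
    print(f"Building a {meta['periodicity']} table of {meta['label']} ({meta['unit']})")
```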

1.4.4        Analysing information

1.4.4.1       Establishing a Data Warehouse for OLAP (On-line Analytical Processing)

The concept of data warehousing (DW) has been around for quite some time. There is a well-known and often-quoted statement by R. D. Rogers, “We’re drowning in information and starving for knowledge” (Rogers, 1985), that justifies DW and at the same time puts it in a nutshell. DW is founded on relational database theory and is normally delivered as a “black box” containing tools for creation, administration, maintenance and output. High-quality metadata is one of the prerequisites. Unfortunately it has been, and still is, expensive to set up a DW for exploratory analysis and data mining, and this may be the reason that only the best-situated national statistics offices have been able to do it.

A DW is by definition structured data optimized for storage, while the corresponding term, data mart, refers to data optimized for output. Often they are treated as one concept, and we will use DW only in that sense. Hence the main purpose of a DW is to

·       Join or merge data in advance from a large number of underlying sources instead of doing this at run-time;

·       Provide an environment for data mining, i.e., searching for and identifying patterns that are not otherwise readily observed, through easy-to-use statistical analysis and reporting techniques in a client/server or tiered environment; and

·       Facilitate responding to ad hoc requests from end-users and customers, preferably over the Internet.

A web-enabled DW for OLAP could be described visually in accordance with figure 5. You could also include the underlying sources in the DW.

 

Figure 5. Implementing a web enabled Data Warehouse

 

In the ESCAP workshop a paper relating to census data was presented by Lim of Statistics Singapore (Lim, 2001). In the paper he distinguishes between a conventional DW and a statistical DW. In particular, a statistical DW is used by professionals, since its key purpose is exploratory data analysis, and it should thus provide maximum flexibility.

An interesting trend in data warehousing is that the latest versions of RDBMS systems—e.g., Oracle, Sybase or MS SQL Server—have built-in facilities for web-driven OLAP (On-Line Analytical Processing) applications, where users can utilize the Internet to retrieve and download key information on-the-fly from large databases in common formats for further analysis. The statistics software vendors SAS and SPSS have long offered DW/OLAP solutions.
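The pre-aggregation idea can be illustrated with a few lines of SQL. In the Python sketch below, unit records are loaded into SQLite, an aggregate “cube” is materialized once, and an ad hoc question is then answered from the aggregate rather than from the microdata; all table and column names are invented for the example.

```python
# Sketch: materialize a small aggregate "cube" from unit records (illustrative names).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE persons (region TEXT, sex TEXT, age INTEGER)")
con.executemany("INSERT INTO persons VALUES (?, ?, ?)",
                [("North", "F", 34), ("North", "M", 7), ("South", "F", 61),
                 ("South", "M", 45), ("South", "F", 12), ("North", "M", 29)])

# Join/aggregate once, in advance, instead of at every request.
con.execute("""
    CREATE TABLE cube_population AS
    SELECT region,
           sex,
           CASE WHEN age < 15 THEN '0-14' ELSE '15+' END AS age_group,
           COUNT(*) AS persons
    FROM persons
    GROUP BY region, sex, age_group
""")

# An ad hoc request is then served from the aggregate table only.
for row in con.execute(
        "SELECT region, SUM(persons) FROM cube_population GROUP BY region"):
    print(row)
```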

There are also in-house benefits to consider:

·       Common data definitions are encouraged;

·       A single point of control is established;

·       Metadata layers are established; and

·       Rapid OLAP application development is possible.

Therefore at least an assessment of advantages and disadvantages of implementing a DW should be part of a statistical agency’s dissemination strategy.

1.4.4.2       Geographic information system (GIS)

GIS software and applications are often key tools in communicating information. Many vendors today target specific user groups, e.g., interactive map creation/display over the web for monitoring traffic or for locating shops and stores in the cities.

Many national statistics offices have developed their own GIS software in order to meet their specific demands, while others rely on off-the-shelf products like MapInfo[16].

1.4.5        Testing and monitoring

1.4.5.1       Acceptance testing

Whether you use “old-fashioned” media for dissemination or the Internet, you must still make sure that you actually provide what the user wants and in a way that is suitable for him. It is often a good idea to test your product with a small group of intended users or stakeholders to determine what adjustments need to be made before launching on a larger scale. This may be easier said than done, since end users tend to materialize only when there are complaints to be made.

But if a good user/producer situation is established, acceptance testing should be encouraged and experiences from this activity will be of great future value. 

1.4.5.2       Monitoring user community

In chapter 1.4.2.5 the importance of logging the web site was stressed. In this way you will also be able to identify user segments to some extent by the way they move around your site. But the web site itself can also be used to collect valuable information. It is very easy to design a user questionnaire for real-time response[17]. It may even at times be made mandatory to fill out a questionnaire for the right to access and use a statistics product.
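As an illustration of how little is needed for such a real-time questionnaire, the sketch below is a CGI script in Python that receives the submitted form fields and appends them to a plain-text file for later analysis; the field names and the storage file are assumptions, not a description of any particular agency's set-up.

```python
#!/usr/bin/env python3
# Sketch: a CGI script receiving a short user questionnaire (illustrative field names).
import cgi
import csv
import datetime

form = cgi.FieldStorage()
answers = [datetime.date.today().isoformat(),
           form.getfirst("user_group", ""),            # e.g. student, business, researcher
           form.getfirst("found_what_you_needed", ""),
           form.getfirst("comments", "")]

with open("feedback.csv", "a", newline="") as out:     # assumed storage location
    csv.writer(out).writerow(answers)

print("Content-Type: text/html\n")
print("<html><body><p>Thank you for your feedback.</p></body></html>")
```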

You should always try to have ongoing communication with your key users since at the end of the day good user relations are what counts. 

1.5         Sharing information

1.5.1        Connecting offices and security

An important part of your dissemination strategy will be how (internal) information is communicated to close partners. There are a number of ways in which branch offices can be connected to the head office—e.g., provincial statistical units to the national statistics office—or the national statistics office itself to others, such as governmental bodies or the central bank. We will here discuss the most common ones at present.

Using Dial-Up-Networking (DUN)

Remote access of different kinds to branch offices has long been used for transferring data. The requirements are modest: a stand-alone modem-linked PC on one side and a receiving/transmitting server (using a modem pool) on the other are sufficient. DUN will normally provide RAS validation and access to and browsing of network resources through some login procedure. There is a security problem, since most communication is performed over ordinary phone lines with limited or no authentication and encryption. (In Windows NT/Windows 2000 this facility comes with the operating system.)

Using Internet Services

Transport protocols – HTTP and FTP

The HyperText Transfer Protocol (HTTP) is used for exchanging data over what we in everyday terms call the Internet or the World Wide Web. It is via HTTP that web pages are requested and retrieved. The File Transfer Protocol (FTP) is preferred if you are interested only in downloading or uploading files, since it is normally faster and more reliable. An FTP server is typically installed at the same time as the web server. With SSL[18] or IPSec[19] enabled, data can be encrypted prior to transmission. It is also possible to enforce authentication/authorization through different types of certificates.
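The sketch below shows both directions with standard Python libraries: fetching a published file over HTTP and uploading a data file over FTP. Host names, credentials and file names are placeholders only.

```python
# Sketch: simple file exchange over HTTP (download) and FTP (upload).
# Host names, credentials and file names below are placeholders.
from ftplib import FTP
from urllib.request import urlretrieve

# Download a published table over HTTP.
urlretrieve("http://www.example-nso.gov/tables/pop2000.csv", "pop2000.csv")

# Upload a data file to a partner's FTP server.
with FTP("ftp.example-partner.org") as ftp:
    ftp.login(user="exchange", passwd="secret")        # authentication as agreed
    with open("pop2000.csv", "rb") as f:
        ftp.storbinary("STOR pop2000.csv", f)
```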

Electronic mail – SMTP

Simple Mail Transfer Protocol (SMTP) is another Internet service, provided for moving mail from one host to another. In order to connect offices by using SMTP you will need hardware on which mail services can be installed, e.g., Microsoft Exchange Server. Mail clients, e.g., workstations, will also need software to administer incoming and outgoing mail, such as Microsoft Outlook.


Figure 6. Using SMTP for exchanging mail over the Internet

 

Note: An application proxy should always be implemented to protect the internal network even if it does not appear in this figure.
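A minimal sketch of moving a statistical release over SMTP, using standard Python libraries, is shown below; the mail server address, sender and recipient are placeholders for the agency's actual configuration.

```python
# Sketch: send a press release by e-mail over SMTP (server and addresses are placeholders).
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Consumer price index, June 2001"
msg["From"] = "statistics@example-nso.gov"
msg["To"] = "subscriber@example.org"
msg.set_content("The consumer price index rose 0.4 per cent from May to June 2001.")

with smtplib.SMTP("mail.example-nso.gov") as server:   # the agency's mail host
    server.send_message(msg)
```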

 

Establishing secure WANs: Virtual Private networking

A Wide Area Network or WAN is logically (and often technically) a concept for connecting different Local Area Networks (LANs). A WAN based on TCP/IP can be considered a corporate intranet and Internet techniques and methods can be applied.

Virtual Private Networking (VPN) technology allows an agency to connect to branch offices or to other agencies or organizations over a public IP network like the Internet, while maintaining secure communications. To the user, VPN is a “point-to-point” connection, and how it works behind the scenes is to most users irrelevant. The main advantage is that you only have to connect to local ISPs to establish VPN, thus reducing the need for, e.g., modem pools and remote dialing. VPN should therefore be considered the major alternative for data exchange over long distances.

1.5.2        What about XML?

The interest in XML, or eXtensible Markup Language, is constantly growing. Since it presents a standardized way of storing and delivering highly structured information over the web, the implications for data interchange are apparent.

XML’s structured syntax lets you describe virtually any type of information—from a simple recipe to a complex business database—and sort, filter, find and manipulate that information in flexible ways. It separates data and metadata and is excellent for archiving information during long periods of time without losing the possibility of recovering the data at any specific point. The latest versions of databases include XML as an option, the latest generation of browsers provide XML parsing, and statistical software giants like SAS and SPSS support it.

As an example, among SAS programmers, the number one reason to use XML is importing XML-formatted information into SAS datasets. With systems receiving more and more information in XML format, SAS programmers will use this facility to access and analyse it.
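The sketch below illustrates the separation of data and metadata in XML using standard Python libraries: a small table is written out as an XML document and read back again. The element and attribute names are invented for the example and do not follow any particular exchange standard.

```python
# Sketch: write and read a small statistical table as XML (element names are illustrative).
import xml.etree.ElementTree as ET

# Build the document: metadata as attributes, observations as elements.
root = ET.Element("dataset", id="POP_BY_REGION", unit="persons", reference_period="2000")
for region, value in [("North", 125300), ("South", 248900)]:
    ET.SubElement(root, "observation", region=region).text = str(value)

ET.ElementTree(root).write("pop_by_region.xml", encoding="utf-8", xml_declaration=True)

# A receiving application parses the same file back into usable values.
doc = ET.parse("pop_by_region.xml").getroot()
print(doc.get("id"), doc.get("reference_period"))
for obs in doc.findall("observation"):
    print(obs.get("region"), int(obs.text))
```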

The Meta Group[20], one of the major analysts, predicts a continuous increase in the worldwide use of XML. It will become the “conventional syntax for application development within a year and the primary business-to-business Internet interaction within two years”.


Figure 7 sketches how XML is used as an intermediary in exchanging data between a national statistics office and different users over a network connection.

 

Figure 7. XML for exchanging data

 

 

There is ongoing work within EUROSTAT to replace the current use of GESMES[21] with XML in order to facilitate the exchange of statistical data.

Additional remarks

1.6         Legislation

In a number of countries, lack of adequate legislation may prevent or at least hamper web enabling of official statistics. This situation was recognized in the UNSD Vienna seminar but no recommendations could be formulated to address and possibly solve the problem.

1.7         Home pages of selected NSOs

For this section, the home pages of four statistics offices are examined, and we point out features that serve many of the purposes discussed above. The sites have been chosen for no purpose other than demonstration. As a comparison, the UNSD home page is also included.

The welcome page

 

http://www.singstat.gov.sg 		Statistics Singapore
http://www.stats.govt.nz 		Statistics New Zealand
http://www.bps.go.id 			Statistics Indonesia
http://www.scb.se/ 			Statistics Sweden

http://www.un.org/Depts/unsd/ 	UNSD

 

 

 

 

 

 

 

Some statistics

Table 2 contains some facts about the welcome pages:

·       Number of “Page Down” keystrokes needed to see the whole page with different screen resolutions;

·       Approximate number of objects that can be selected/pushed; and

·       The size of the page in bytes.

The plus (+) sign indicates that an extra PgDn is needed to read (a few) lines at the end of the page.

Table 2. Basic welcome page characteristics for selected NSOs

 

Agency               No. of PgDn   No. of PgDn   No. of objects that can be pushed   Page size
                     (800x600)     (1024x800)    (buttons, pictures, text etc.)      (bytes)

UNSD                 2+            1+            19                                  15725
Indonesia            1             0+            22                                  13273
Singapore            2+            2             48                                  25602
New Zealand          0+            0             30                                  18766
Statistics Sweden    1             0             24                                  12079

 

What can be noted is that all pages are relatively small and therefore fast to download even via a slow modem and that the designs differ to a great extent in how much is stuffed into the page and how much you then have to scroll to see the whole page. There are studies (for example, Nielsen, 1996)[22] that show that only 10 per cent of first-time visitors ever scroll beyond the top of web pages.

There are many articles and books written on web design that cover most of the topics that relate to user habits and expectations. Here we will comment on just a few that could be of interest for statistical agencies.

Features and utilities

Some of the most important features are a site map, a feedback option and a search facility which allows searching the site by keywords and/or text strings or searching the layers of metadata. If you provide file downloads using the PDF format, you should also give the user a chance to download the free Acrobat Reader if he wants to.

What’s new?

Most probably your visitor comes to get the most recent information on your statistics. This means that you should place news high in your navigation structure and locate news links at the top of the welcome page.

Contacts

Your visitor will also need to know how to contact you. Therefore you would locate primary contact information on the welcome page, e.g., an e-mail link to your webmaster and a link to a page with more comprehensive contact and feedback possibilities. It is also a good practice to give contact information on your product pages.

Job opportunities

Why not give your occasional visitor a chance to get a job within your agency? She may actually have gone to your site in order to look for exactly that kind of information. It will also save you money since you won’t have to rely on advertising or other channels. Just provide your link where you might think that potential future employees would look.

Courses, seminars

Your welcome page gives a golden opportunity to promote your activities and your statistics to key users and stakeholders by inviting them to courses and seminars.

Links to other websites

Links to other websites are useful, especially if they identify your partners or point to other statistical sources within the country. Don’t put too many links on the page. The user’s temptation to follow a link may actually conflict with your more basic interests—to keep him as what is often called “a stayer”.

Trivia?

Why not lighten up your welcome page with something that encourages your occasional visitor to return? As an example, Statistics Sweden (in the Swedish Edition) offers a database of all first or last names in the population by year and sex (lots of parents come) and a calculator that discounts prices by index (find out what you really paid for your old Toyota in current prices).

NOTES

 

This paper was prepared for a seminar entitled The Uses and Users of Official Statistics: Seminar on User Relations, Marketing and Dissemination of Official Statistics, Vientiane, Laos, 25-27 June 2001.

 

There are many hyperlinks referring to external web pages in this document. In order to use this facility you must be connected to the Internet. There are also internal hyperlinks referring to chapters in the current paper. On the screen hyperlinks are marked in blue.

 

Including links has been done intentionally to show how the use of advanced navigation may facilitate comprehension of disseminated information. You will need the latest Word versions to be able to follow the links.

 

Normal reading and printing of the document will not be affected. The printout format is set to US Letter. Changing the output format may, of course, result in unwanted text or diagram locations on the page.


References

 

Archer, D. (2001). Responding to changes in users’ expectations. Paper presented at the ESCAP Workshop on Population Data Analysis, Storage and Dissemination Technologies, Bangkok, 27-30 March.

 

Bäcklund, S. (2001). Going on the net with your national statistics: What is there to consider? Paper presented at the ESCAP Workshop on Population Data Analysis, Storage and Dissemination Technologies, Bangkok, 27-30 March.

 

Blanc, M., A. Desrosières, W. Radermacher and T. Körner (2001). Quality and users. Paper presented at International Conference on Quality in Official Statistics, organized by Statistics Sweden and Eurostat, Stockholm, 14-15 May.

 

Hardy, S. (2001a). Maintaining relevance in an environment of change. Paper presented at the ESCAP Workshop on Population Data Analysis, Storage and Dissemination Technologies, Bangkok, 27-30 March.

 

Hardy, S. (2001b). 2001 census dissemination: a world wide web transition. Paper presented at the ESCAP Workshop on Population Data Analysis, Storage and Dissemination Technologies, Bangkok, 27-30 March.

 

 

Lim, E. (2001). Setting up a data warehouse—salient points for consideration. Paper presented at the ESCAP Workshop on Population Data Analysis, Storage and Dissemination Technologies, Bangkok, 27-30 March.

 

Linacre, S. (2001). Understanding users and managing quality in a statistical agency. Paper presented at International Conference on Quality in Official Statistics, organized by Statistics Sweden and Eurostat, Stockholm, 14-15 May.

 

Lynch, P., and S. Horton (1999). Web Style Guide: Basic Design Principles for Creating Web Sites. Yale University.

 

Marzan, C.S. (2001). Discussion paper on assessment of the Asian practices on dissemination. Paper presented at seminar on The Uses and Users of Official Statistics: Seminar on User Relations, Marketing and Dissemination of Official Statistics, Vientiane, Laos, 25-27 June.

 

Nielsen, J. (1996). Top ten mistakes in web design. Sun Microsystems.

 

Østergaard, L. (2001). Development and implementation of a media policy. Paper presented at seminar on The Uses and Users of Official Statistics: Seminar on User Relations, Marketing and Dissemination of Official Statistics, Vientiane, Laos, 25-27 June.

 

Rogers, R.D. (1985). New York Times, 25 February.

 

Spar, E.J. (2001). Customers. Paper presented at seminar on The Uses and Users of Official Statistics: Seminar on User Relations, Marketing and Dissemination of Official Statistics, Vientiane, Laos, 25-27 June.

 

United Nations, Economic and Social Commission for Asia and the Pacific (2001). Final Report. ESCAP Workshop on Population Data Analysis, Storage and Dissemination Technologies, Bangkok, 27-30 March.

 

United Nations, Statistics Division (2000). Summary report. Seminar on User Relations, Marketing and Dissemination of Official Statistics, Vienna, 11-14 July.

 

 

VENDORS

 

Beyond 20/20. http://www.beyond2020.com

 

SuperStar. Space-Time Research. http://www.str.com.au/index1.html  

 

PC AXIS. Statistics Sweden. http://www.scb.se/eng/databaser/ssd.asp

 

 


 

Annex 1. Some indicators on how IT relates to quality

 

QUALITY FACTOR: Accuracy / Completeness / Reliability
(relates to IT mainly in how data are collected, edited and stored)

INDICATORS

·         Variables included in survey questionnaires are left unattended and only major indicators are computed. This results in loss of information that could be used for, e.g., within-record comparisons or later macro editing or analysis.

·         A short planning perspective regarding designing or changing surveys does not give enough time for testing, e.g., the questionnaire through pilot studies.

·         Since statistical information is often collected from administrative aggregate sources compiled by, e.g., line ministries, NSOs have limited ways of controlling the quality of the underlying primary data, which in turn will affect final reporting. This makes it almost impossible to do in-depth statistical analysis of the data. The same might be true for in-house reporting systems that rely on aggregate reporting from underlying bodies, e.g., provinces and districts.

·         Unintentional modification/destruction of data may occur when there is a lack of good data management, e.g., when different versions of the same data are used for editing without coordination.

QUALITY FACTOR: Timeliness
(relates to IT in how data are captured, edited and stored into ready-to-go statistical databases and in what way these data are later processed and disseminated)

INDICATORS

·         Staff shortages in certain sectors mean a heavy workload on a few employees, which in turn will affect timely reporting.

·         Financial constraints limit staff numbers in different departments, units, divisions or sections. Other budget considerations hamper computer equipment acquisitions and training in IT areas.

·         Statistical data are normally stored in computer data files residing on different media instead of in central databases.

·         Output databases are not created since no deadline for data editing is given.

·         Lack of application developers and suitable software makes it difficult to process and disseminate data in a timely manner.

QUALITY FACTOR: Relevance / Contents
(a good IT environment in itself cannot provide for relevant statistics, but it can make life easier for statisticians)

INDICATORS

·         NSOs are not using channels for exchanging information to the extent that they could, e.g., the Internet for sharing experiences and best practices.

 

 


QUALITY FACTOR: Availability / Interpretability / Accessibility
(relates to a great extent to the existing IT infrastructure and how it has been implemented)

INDICATORS

·         Output databases are not available; instead, input databases are used for tabulation and analysis.

·         Dissemination plans cannot be met due to lack of competence.

·         Ad hoc self-service, i.e., doing your own analysis on the data, is not available.

 

 

QUALITY FACTOR: Coherence / Comparability / Consistency
(relates to definitions, to what data are collected, how they are stored and whether time series can be created)

INDICATORS

·         Data from different sources are often used to describe and analyse a statistical problem. Basic definitions, e.g., object, population, variables, statistics and time references, should then be the same in all sources. If not, the outgoing statistical quality will suffer and in the worst cases future analysis will be impossible.

·         Time is an important factor. When definitions change over time, e.g., to accommodate conflicting end-user needs, consistency will be affected.

·         Databases containing similar information are not comparable owing to the use of different data models or RDBMS systems.

 


 

Annex 2. Conceptual firewalling


 

Annex 3. The process of publishing

 


 

Annex 4. Importance of web design features

 



* This document was reproduced without formal editing

** United Nations Statistics Division, New York. The views expressed in the paper are those of the author and do not imply the expression of any opinion on the part of the United Nations Secretariat

[1] The final report from the workshop is available at

http://www.unescap.org/stat/pop-it/pop-wdt/wdt-rep.htm

[2] Go to http://www.beyond2020.com/ for more information.

[3] Space-Time Research, developers of SuperStar; visit http://www.str.com.au/index1.html

[4] Information about PC AXIS can be found at http://www.scb.se/eng/databaser/ssd.asp (password on request).

[5] The reader is also referred to a contributed paper to the workshop (Bäcklund, 2001), also available at the ESCAP web site http://www.unescap.org/stat/pop-it/pop-wdt/pop-wdt.htm) that discusses what must be considered when designing and maintaining a web site and how to deal with obstacles on the way.

[6] The European Statistical System ESS lists relevance, accuracy, timeliness, accessibility, comparability, coherence and completeness.

[7]  In order to facilitate writing, he and she, his, him and her will be alternately used instead of the semantic construction he/she, his/her to avoid making the text a gender issue.

[8] This includes, e.g., common database formats (dbf, mdb), worksheet formats (xls, wk*), formats of statistics software like SAS (sd2) or SPSS (sav) or ASCII.

[9] SMS (Short Message Service) is available on digital networks allowing messages of up to 160 characters to be sent and received via the network operator's message center to a mobile phone.

[10] WAP (Wireless Application Protocol) is a free, unlicensed protocol for wireless communications that makes it possible to create advanced telecommunications services and to access Internet pages from a mobile telephone. WAP is a de facto standard that is supported by a large number of suppliers. See also http://www.wapforum.org

[11] GPRS (General Packet Radio Service) is a packet-linked technology that enables high-speed (115 kilobit per second) wireless Internet and other data communications.

[12] HSCSD (High Speed Circuit Switched Data) is a circuit-linked technology for higher transmission speeds, primarily in GSM systems.

[13] Find out about Bluetooth at http://www.bluetooth.com

[14] It should be noted that electronic networks have been available before the World Wide Web emerged.

[15] Chapters are also available on http://info.med.yale.edu/caim/manual/  

[16] Visit http://www.mapinfo.com/ for more information or go to http://www.spss.com/map/ to see how it is liaised with SPSS.

[17] Go to the Statistics New Zealand website to see how it works: http://www.stats.govt.nz/domino/external/web/feedback.nsf/feedback?OpenForm

[18] Secure Sockets Layer, protocol developed by Netscape for encrypting TCP/IP transmissions, used in e.g., Netscape Navigator and Internet Explorer.

[19] Internet Protocol Security, developed by the Internet Engineering Task Force (IETF) for encrypting messages transported over the Internet, often used in Virtual Private Networks (VPN).

[20] http://www.metagroup.com/cgi-bin/inetcgi/index.html

[21] A United Nations EDIFACT  (ISO 9735) message for exchanging statistical multidimensional arrays in a generic but standardized way.

[22] Nielsen’s article may be accessed at  http://www.sun.com/columns/alertbox/9605.html