Statistics and science: some of the issues
1. Statistical offices collect a lot of data that are a potentially rich source for social and economic research by universities and other research institutions. In particular, researchers are keen on re-using microdata for their own research purposes. Arrangements allowing statisticians and external researchers to share microdata have considerable advantages, from various points of view:
*) It is an efficient use of taxpayers’ money: statistical data collection is costly, and making maximum use of such data is cost-effective.
*) It is good for the data providers, because researchers don’t have to do their own, additional data collection: data have to be collected just once and the respondent load is in principle reduced by multiple data use.
*) It is good for the researchers that they don’t have to bother about data collection, but can instead use high-quality data that have already been collected.
*) It is good from the point of view of comparability of outcomes: when statisticians and researchers use the same data sets, it is easier to compare products than when products are created on the basis of different collections (discussions about methodological issues, such as definitions, samples etc., may largely be avoided).
*) It is good for statistical offices that not just aggregated statistical data, but also microdata are used intensively: feedback from researchers on the basis of their analysis of microdata may enhance the quality of statistical data collection.
*) More intense contacts between statistics and the scientific community stimulate research activities within statistical offices as well, and consequently raise the scientific status of statistical offices, making them a more attractive workplace for university graduates (which in the long run may enhance the quality of statistical operations).
2. However, arrangements for data sharing as indicated in para. 1 generate quite a few problems. The main problem areas are the following:
*) Which categories of researchers should have access to statistical microdata?
*) Which data should be made available and in what form? What are the confidentiality risks and how can they be kept under control?
*) What costs have to be incurred to arrange data sharing and who should pay these costs?
*) Who should take care of the organisational arrangements around data sharing (the national statistical institute (NSI), an established data archive, another institution)?
*) What legal provisions are needed to institutionalise data sharing?
3. Categories of researchers. In Holland, the Statistics Law defines the categories of researchers that have, in principle, access to statistical microdata: universities, some specific government research institutions, Eurostat and other non-profit research institutions that are approved, on a case-by-case basis, by the Central Commission for Statistics (a body that oversees Statistics Netherlands).
4. Categories of data. There is a fundamental difference here between data on households and persons and data on businesses.
*) Household data are relatively easy to protect against disclosure: anonymising, light forms of aggregation, distorting etc. may be sufficient to reduce disclosure risks quite considerably. It should be noted here that there is well-tested software available to check disclosure risks, e.g. the Argus software that has been developed by Statistics Netherlands and some other parties (universities in various European countries), in the framework of the 4th European Research Programme. The most intensively used household data sets in Holland are: the Labour Force Survey, the Household Budget Survey, the Mobility Survey, the Housing Survey, as well as security, elections and health surveys.
*) Business data are much more difficult to protect against disclosure. Moreover, they are generally considered to be more ‘sensitive’. In addition, in Holland there is a law on Business Statistics that stipulates quite clearly that data of individual companies may not be divulged. Therefore, the arrangements that had to be made for business microdata are very different from those for household data.
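The kind of disclosure check mentioned for household data in para. 4 can be illustrated with a very small sketch. It is not how the Argus software works internally; it only shows the general idea of counting how often each combination of identifying variables occurs, and how a light form of aggregation (here, recoding exact age into ten-year bands) reduces the number of rare combinations. The variable names, the threshold k and the toy records are all invented for illustration.

```python
from collections import Counter

def risky_records(records, key_vars, k=3):
    """Return the indices of records whose combination of key
    variables occurs fewer than k times in the data set."""
    keys = [tuple(rec[v] for v in key_vars) for rec in records]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] < k]

def recode_age(rec, width=10):
    """Light aggregation: replace exact age by an age band."""
    low = (rec["age"] // width) * width
    out = dict(rec)  # copy, so the original record is left untouched
    out["age"] = f"{low}-{low + width - 1}"
    return out

# Toy records, invented for illustration only.
records = [
    {"age": 34, "region": "North", "occupation": "teacher"},
    {"age": 36, "region": "North", "occupation": "teacher"},
    {"age": 38, "region": "North", "occupation": "teacher"},
    {"age": 71, "region": "South", "occupation": "surgeon"},
]
key_vars = ["age", "region", "occupation"]

# With exact ages every record is unique on the key variables.
before = risky_records(records, key_vars, k=3)

# After recoding age into bands, only the fourth record remains rare.
after = risky_records([recode_age(r) for r in records], key_vars, k=3)
```

In this toy case the recoding protects the three similar records, while the remaining rare record would need further treatment (coarser categories, suppression of a value, or removal) before release.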
5. When the first discussions about sharing microdata between university researchers and Statistics Netherlands (SN) were initiated, the research community felt that they should get the microdata practically free of charge (or at marginal cost), because these data had been collected with public money: the taxpayers had already financed them, so why should institutions that were also public have to pay once more? SN, on the other hand, thought it was unfair to hand out the data for free. In addition, it was under an obligation from the Ministry of Economic Affairs to earn money. In short, this was a clash between two ministries, and the deadlock was only broken when the Ministry of Education and Science agreed to pay SN an annual lump sum (1 million Dutch guilders) on behalf of all Dutch universities collectively.
6. The best organisational arrangements are of course very dependent on the national situation. In the past, SN made some ‘public data sets’, very well secured against disclosure, available to the Dutch national data archive (Steinmetz Archive of the Royal Dutch Academy of Sciences). Later on, SN itself began to handle more sensitive data sets, less well secured against disclosure, on the basis of individual contracts between SN and universities. However, this was not a good experience: distributing microdata is not the core business of a statistical office. There is insufficient routine, and it is costly and burdensome. Therefore, we were very happy when the present solution was set up: microdata are distributed by a specialised agency that acts as an intermediary between SN and the researchers. (The reasons why this agency is a different one from the Steinmetz Archive are too typically Dutch to discuss here.)
7. The legal provisions for data sharing in Holland are in a way rather simple: in the new (1996) Statistics Law the categories of entitled researchers and the basic procedures are clearly set out. Nevertheless, there was still a lot of legal nitty-gritty to be worked out. It should be mentioned that we were assisted in this regard by the so-called Registration Chamber (the Dutch data protection authority; Datenschutzbeauftragter). As indicated before, this applies to household data only. For business data, we have very different arrangements: a so-called micro-lab inside Statistics Netherlands, where researchers can work with business microdata under very strict conditions. To achieve this, we have done a lot of work to get the support of the Dutch unions of businesses. And again, the Registration Chamber gave very valuable advice and support.
(Willem de Vries, 121099)