Confidentiality of individual data is considered to be one of the essential policy elements in official statistics. In the Fundamental Principles for Official Statistics, endorsed by the United Nations Statistical Commission, this is explicitly mentioned. Principle 6 of the Fundamental Principles reads: ‘Individual data collected by statistical agencies for statistical compilation, whether they refer to natural or legal persons, are to be strictly confidential and used exclusively for statistical purposes’. Apart from general philosophical issues and specific legal provisions (which may vary from country to country), there is one overriding, practical reason for this strict adherence to confidentiality: the idea is that only if providers of data for official statistics can trust that the information they give is used for statistical purposes only will they be prepared to supply full and truthful information.
The Dutch Statistics Law (1996) says: ‘The data that the Central Bureau of Statistics receives in the framework of its legal task shall only be used for statistical purposes. These data shall not be given to others than those who are charged with carrying out CBS tasks’. A much older draft legal text about the confidentiality of statistical data in The Netherlands puts it in simpler terms: ‘In the statistical results based on the data that the CBS has processed, no names shall be mentioned’ (1901).
Clearly, there is a long and strong tradition in The Netherlands that information collected by the CBS shall only be used for statistical purposes and that no individual data whatsoever may be given to anyone (including other government services) outside the CBS, without explicit written permission from the data provider. However, statistical confidentiality philosophies and practices, as well as public and political sentiments about what is confidential or secret in statistics and what is not, have developed and changed a lot over time, as will be discussed in some detail in this paper.
After the Introduction (paragraph 1), paragraph 2 of the paper is about ‘the general philosophy of confidentiality’. In paragraph 3, confidentiality issues for business data will be discussed. Paragraph 4 goes into confidentiality aspects regarding personal and household data. Paragraph 5, finally, is about developments over the last two decades, which have led to certain arrangements between the CBS and the research community (as well as legal provisions), aiming on the one hand at safeguarding confidentiality, but on the other hand at making statistical micro-data available for academic research purposes.
2 The general philosophy of statistical confidentiality
2.1 Why confidentiality?
Statistical confidentiality is the principle that no identifiable individual information is disclosed during the process of compilation of statistics, or in the dissemination of statistical results. As mentioned before, this is one of the foundations of official statistics, for reasons of principle and practice. The principal reason is the general legal principle of ‘goal binding’ of government activities or in other words: the means the government uses must be compatible with the objectives it wants to achieve. Specifically: if data are supplied to a government institution for statistical purposes, so for aggregation into tables etc., it is not right to use them for other purposes. (2)
The practical reason for confidentiality is: cooperation of data providers is essential for statistical work and the providers of basic data will only cooperate if they feel they can trust the statistical agency completely. They must therefore be sure that their individual data are not disclosed in publications, nor handed over or revealed to third parties, unless they have given the statistical agency their explicit or implicit permission to do so. (3)
In a more general legal context: apart from the issue of the ‘secondary’ use of individual statistical data for research purposes, conflicts of interest between the statistical confidentiality principle and other public interests may occasionally arise. In the Netherlands, for example, there have been a couple of incidents in which a public prosecutor, in the course of a criminal prosecution, believing that the CBS could provide certain documents which were considered to be relevant for the case, requested their delivery. The CBS has always refused to do this. So far, it has successfully argued that the general interest of official statistics outweighed the interests involved in that particular criminal case, and that there ought to be sufficient other possibilities to produce incriminating material.
2.2 Administrative immunity
As mentioned before, the CBS, in principle, never publishes figures that may reveal information about individual persons, firms or institutions, without their consent. There are some cases of publications containing this kind of individual information, e.g. when a certain activity is exclusively carried out by one or a few large enterprises and these enterprises have agreed to the publication.
However, there are also cases where the problem is of a more subtle nature. In particular, this happens when state-owned or state-financed entities or activities are involved. It could be argued that what these entities do cannot be confidential, because they are ‘public’. In this respect, the ministries that supervise local or regional authorities tend to maintain that the CBS does not have to protect the confidentiality of certain details of the financial dealings of cities or provinces. Yet, the CBS has usually done so, arguing that if ministries need such details, they must ask the involved parties themselves. The principle underlying this is really an extension of the principle of confidentiality. It could be called the ‘principle of administrative immunity’. It is based on the reasoning that information supplied for statistical purposes must not have any direct consequences for the respondents. The aim here is to ensure that the respondent need have no worry about supplying the information. The guarantee of ‘administrative immunity’ may result in getting more accurate information at the level of the individual than would be the case were it to be collected as administrative information.
2.3 Potentially vulnerable entities
There is another concept that is more difficult to define and apply. It concerns the principle of protection of ‘potentially vulnerable entities’. It is an extension of the principle of administrative immunity. Sooner or later, statistical information may have consequences either for all or a subgroup of the respondents, or, possibly, for a third party. Since statistical information generally plays a part in governmental policy, many examples can be given of indirect consequences which can be linked to information supplied by respondents. In itself, this is not a problem. However, respondents may be wary in some cases. Resistance to population censuses can to some extent be seen as resistance to a government whose activities permeate the daily life of citizens. The situation is rather different where certain small groups (e.g. members of a certain profession or inhabitants of a distinct small area) oppose statistical surveys, because they feel that the statistical information may give rise to measures aimed particularly against them. Although no direct consequences result from the statistical information supplied, a certain group nevertheless could become the ‘target’ of government officials. A comparable situation would exist if statistical research were to reveal that certain (small) subgroups (e.g. ethnic minorities) were especially liable to be involved in particular (for example criminal) actions. A real danger then exists that the statistical information concerned may result in discriminatory treatment based on the mere fact that someone belongs to such a subgroup (e.g. gypsies). In this case, too, statistics may ‘generate’ a potentially censurable entity.
In some cases, a third party, itself not being a respondent, may nevertheless become vulnerable. As an example, a statistical survey of examination results of students may be given. The Ministry of Education will be inclined to think that the CBS is fully entitled to publish details about, for instance, study results of students in an individual faculty of one of the universities. The CBS, on the other hand, feels that this would make the faculty a vulnerable entity and as a rule would not lightly do so. Statistics on causes of deaths may be taken as another example. If such statistics are split up into small regions, differences between these regions can become apparent. Let us now assume that in one particular respect a certain region compares very badly and that only one doctor works in that region. In this case, this doctor is the ‘third party’ and a potentially vulnerable entity.
2.5 Factors determining disclosure risks
Disclosure risks depend on a variety of factors. Two of the most important aspects are: the nature of the statistical units that the data relate to and secondly the aggregation level of the statistical results.
As to the matter of statistical units, there is a distinction to be made between business data and data about persons or households.(4) From a confidentiality point of view, there are some important differences between these two types of statistical units. First of all, the population of people is many times bigger than the population of businesses. Moreover, the variables that ‘indirectly identify’ people (sex, age, municipality of residence) are usually less detailed and more evenly distributed than the equivalent business variables (activity code, size category, location of operations). Consequently, breakdown of variables in the case of enterprises more often leads to ‘uniqueness’ and therefore disclosure risk. Disclosure protection is even more difficult because larger enterprises are usually included in all statistical surveys and because general information about these companies is fairly widely known.
Another important aspect is, of course, the level of aggregation of data. In simple terms, there is tabular information on the one hand, and individual records on the other. Tables have always been the most important product of statistical offices. In the past, it was necessary to define the tables to be produced before the statistical process was started and there was relatively little flexibility to change tabulations during the process. To prevent disclosure of individual information, tables were normally checked for the occurrence of recognisable data. Rules differ from country to country, but the standard rule is often that no information is released relating to three or fewer enterprises, or information about an industry group in which one company has a very dominant position. To prevent disclosure, information was suppressed (CBS used and still uses an ‘x’ in cells to indicate that information is ‘secret’) or combined.
However, modern computer technology has completely changed this situation. It is now much easier for statistical offices to produce ‘ad hoc’ tabulations. At the same time, it has become much easier for the users of statistics to further process and manipulate such tables, to combine different data sets etc. Obviously, this new situation has increased the disclosure risks considerably. Another and even more far-reaching aspect (from the confidentiality point of view) of modern computerisation has been that it has generated an increased demand from the users’ side to have access to individual records, in order to do their own tabulations and analysis. To meet such a demand of course creates serious problems. Even if individual records are fully anonymised (i.e. if all direct identifiers such as names, addresses, telephone numbers and registration numbers have been removed), it is often relatively simple to trace rare or unique elements in a data set, particularly if the user of the file happens to have ‘response knowledge’ (i.e. knows that a certain individual element is present in the data set).
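The point about rare or unique elements can be illustrated with a small sketch. It is not a description of any CBS procedure: the function name, the field names and the example records are all hypothetical, and the indirect identifiers are simply those mentioned earlier (sex, age, municipality of residence). The sketch counts how often each combination of indirect identifiers occurs in an anonymised file and flags the records whose combination occurs fewer than a chosen number of times.

```python
from collections import Counter

def find_rare_records(records, indirect_identifiers, min_count=2):
    """Return the positions of records whose combination of indirect
    identifiers occurs fewer than `min_count` times in the file.

    All names here are illustrative assumptions, not an existing API.
    """
    # Build the combination of indirect identifiers for every record.
    keys = [tuple(rec[name] for name in indirect_identifiers) for rec in records]
    # Count how often each combination occurs in the whole file.
    counts = Counter(keys)
    # A record is 'rare' if its combination occurs fewer than min_count times.
    return [i for i, key in enumerate(keys) if counts[key] < min_count]

# Hypothetical anonymised records: no names or addresses, yet one is unique.
records = [
    {"sex": "F", "age_group": "30-39", "municipality": "A", "income": 41000},
    {"sex": "F", "age_group": "30-39", "municipality": "A", "income": 38000},
    {"sex": "M", "age_group": "80-89", "municipality": "B", "income": 19000},
]

rare = find_rare_records(records, ["sex", "age_group", "municipality"])
```

In this made-up file the third record is the only one with its combination of sex, age group and municipality, so a user with ‘response knowledge’ about that person could read off the income, even though all direct identifiers have been removed.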
2.6 Registers; a case in their own right?
From a confidentiality point of view, registers containing names and addresses are a special case. Such registers are an indispensable tool for statistical work.
In the case of The Netherlands, the General Business Register is one of the most important instruments for economic statistics. For social statistics, the most important registers are the so-called Basic Geographic Register (postal addresses) and the Municipal Basic Personal Data Register. The General Business Register (to be discussed in more detail elsewhere in this paper) is maintained by CBS itself and is used for statistical purposes only. The other two registers are maintained by third parties and serve primarily administrative, but also statistical purposes.
Over time, the ideas about confidentiality of register information have changed. In the beginning of the century, the CBS published various lists of names and addresses, including labour unions and organisations who gave financial help to the poor. The aim was to keep the underlying registers complete and up to date and consequently to enhance the quality of the statistics that were based on them. It was thought that the mere existence of enterprises and institutions and their activities were public information.
Later the CBS came to regard all such information as confidential and secret, including information that was extracted from public annual accounts of enterprises, municipalities etc. The idea was that the information was processed according to statistical criteria and that this processed information was not by definition exactly the same as the original information; therefore the data providers were entitled to confidentiality. Moreover, it was argued that anyone who was interested in the original information could get it elsewhere, either from the providers themselves, or from public archives.
2.7 The European dimension
A short note about the European dimension of the confidentiality issue may be useful here. In the past, National Statistical Institutes of the Member States of the European Union have often used confidentiality rules as an argument not to supply certain detailed statistics to Eurostat. This situation, which sometimes made it impossible for Eurostat to compile aggregate data at the European level, was in principle ended with the adoption of a Regulation (known as 1588/90) of the European Council. The aim of the Regulation was to enable countries to supply data to Eurostat regardless of national confidentiality rules. Of course there were some strings attached: first of all the supply of the data had to be compulsory on the basis of a specific statistical Regulation, and secondly, Eurostat had to make sure it did have the necessary security measures in place. An official committee of representatives from Member States (Committee for Statistical Confidentiality) was established to oversee the implementation of these measures. At present, it may be said that security at Eurostat has reached an advanced level. In addition, it should be mentioned here that Eurostat has long played a leading role in the methodological aspects of disclosure control. It organised various international conferences on the subject and the European Commission (Framework programs for Research and Development) has also subsidised a number of research projects in this area. In this framework Statistics Netherlands has developed its so-called Argus software for automated disclosure control.
Meanwhile a new problem at the European level has cropped up: there is an increasing demand from researchers for access to micro-data that are available at Eurostat and a legal framework to make this possible has to be developed.
3 Confidentiality and business data
3.1 The first business census
Confidentiality became a really important issue when the CBS began collecting business data through business censuses. While the Business Census Act of 1930 was discussed in parliament, the Minister assured parliamentarians that: ‘It is not our intention to publish information about any individual enterprise, but to publish aggregated data for groups of enterprises only. Therefore, there is no need to fear that business secrets will in any way be revealed’.
It is interesting to note that for once the statisticians, at least in those days, were less strict than the politicians. When the statistics were finally ready for publication (in 1933), the then CBS Director proposed to also publish information pertaining to just one or a small number of enterprises, because, he wrote: ‘These data could not be regarded as industrial secrets, even more so because they relate to the situation of nearly three years ago’. The responsible Minister, on the other hand, requested the CBS to strictly adhere to what he had once promised in parliament.
Remarkably enough, it seems that the CBS was reluctant to adhere to this commitment. In an internal memorandum (1932) it was even suggested to start selling names and addresses of manufacturers of and traders in certain commodities. The reasoning was that this type of information was not confidential. On the contrary, it was believed that the business community would be grateful for this kind of free publicity. Later on, when the Business Census of 1940 (which did not take place because of World War II) was being prepared, these arguments were repeated. It was mentioned that some other government institutions also published information about individual enterprises. In other words, the notion that a statistical office is different from other government agencies was not specifically recognised.
3.2 Later business censuses
When the Business Census Act of 1930 was being revised (1939), an article was included saying that the collected information could only be published with the consent of the responsible Minister (i.e. the Minister of Trade and Industry, later renamed Economic Affairs). Again, this was meant to prevent the publication of business secrets. It is interesting to note that obviously this responsibility had to be given to a Minister, not the CBS Director-General. When the Business Census Act was once more revised (1959), the then Director-General of the CBS tried to change the rules: his proposal was that the Minister would only have a say in the general format of tables that were to be published. This was accepted.
When later on the tabulation programmes for the 1963 and 1978 Business Censuses were developed, it became clear that it would be impossible to prevent publication of data about single enterprises, without seriously damaging the usefulness of the statistical information, in particular as to regional detail. Considering the nature of the information to be published (regional distribution of number of establishments by type of activity and size-class), it was felt that this was acceptable. Publication of turnover indications for individual establishments, however, would be prevented.
3.3 Building up a General Business Register
In the sixties, the CBS began to build up a General Business Register (GBR). From the start it was underlined that this Register would be used for statistical purposes only. It would contain the names, addresses, activity code and size-class (number of staff) of all enterprises and institutions (excluding agriculture and the public sector). Oddly, confidentiality issues were not even mentioned when the GBR plans were officially proposed to the Central Commission for Statistics.(5)
Soon outsiders (e.g. ministries, the postal authorities, regional administrations etc.) began to show a keen interest in obtaining information from the GBR. The CBS, however, was very restrictive in responding to such questions: only if enterprises explicitly gave their permission were data about individual units released to third parties. However, when ‘competing’ business registers began to crop up and business surveys by others (resulting in an increased response burden and un-coordinated statistics and consequently adverse reactions of the business community) were planned to be held, the CBS had to reconsider this point of view. When in 1974, for example, the Dutch Central Bank wanted to obtain activity codes from the GBR in order to supply these to commercial banks, ultimately mainly for statistical purposes, the CBS reluctantly agreed. It was, however, recognised that this was not really consistent with confidentiality rules and that there was a need for a sound legal basis for the external release of individual business data from the GBR.
3.4 External release of business register data
The external release of GBR data turned out to be a complex issue. In 1978, the Ministry of Economic Affairs submitted a draft Bill about ‘Rules on the supply of data about enterprises, institutions and individual professionals by the Central Bureau of Statistics’ for advice to the Central Commission for Statistics. An important role in the new structure was envisaged for the recently established so-called Databank of the Union of Chambers of Commerce. (6) The CBS would on the one hand be entitled to supply some basic data from its GBR to government institutions, for statistical purposes only, and would on the other hand give data to the Databank, which would be responsible for the supply of data to the business community (regardless of the purpose for which the information would be used). In the Explanatory Memorandum to the Bill it was stated explicitly that ‘the Bill would lift the confidentiality rules of the CBS for a restricted number of enterprise data’, but that individual enterprises would be in a position to forbid the CBS to release data about them.
Ultimately, it was not until 1988 that the ‘Law on the release of CBS data for statistical purposes’ was adopted by parliament. However, for various reasons the Law never became effective. One reason was logistics: in order to be sure that enterprises agreed with the release of their individual data, the CBS had to ask them explicitly (a very heavy and costly procedure for more than half a million units). Another reason was lack of effective demand for data needed for statistical purposes. The third reason was developments in computerisation and in registration initiatives and activities elsewhere.
In early 1995 the CBS established a steering committee, jointly with three other parties that are responsible for the maintenance of registers of enterprises: the Tax Administration, the National Institute for Social Security and the Union of Chambers of Commerce. The mandate of this steering committee was to investigate the possibilities for closer cooperation and data exchange, in order to reduce the respondent load for the business community. However, one of the starting points of the exercise was that the statistical confidentiality of the CBS would not be touched. In other words, the CBS would only be allowed to give information from its GBR to others if the registered enterprise gave its explicit consent.
3.5 Production statistics
For the statistics about production and utilisation (make and use) of enterprises, as well as for trade statistics, the CBS needed a lot more detailed information from businesses than was necessary for the Business Census. The need for such statistics became first manifest during World War I. The ‘Law of 1917’ enabled the CBS to collect information on the production and utilisation of commodities, including information on previous years. Enterprises were obliged to supply the information and on the other hand the CBS had a strict obligation to keep the individual data confidential: ‘CBS officials shall use the information they collect for their work only’. And the penalties for infringement were stiff: ‘He who willingly violates his confidentiality duties, shall be punished by imprisonment of a maximum of six months, or a financial penalty of a maximum of 600 guilders’, for those days a lot of money.
Nevertheless, quite a few politicians were not convinced that things would be handled properly. In particular, the right of the CBS to look into the accounts of enterprises appeared to be a delicate issue. How can we be sure, they wondered, that CBS officials would not use ‘delicate and detailed information’ for their own profit, for example by selling it to the competition? Or just be negligent and tell their relatives or friends? Who, they wondered, is ‘so unimpeachable and close-lipped’? The Minister, however, considered these objections as theoretical and overdone. Finally, the Law was adopted by a small margin. Now it was a matter of implementation. To that end, comprehensive and complex confidentiality measures were taken to ensure that CBS staff who were processing the data could not know to which enterprises they related. A commission representing the Union of Manufacturers was invited to inspect the measures and they were impressed with the efforts the CBS had made to protect ‘business secrets’. A sceptic in the Central Commission for Statistics who dared ask what would happen if some ‘State Commission’ were to visit the CBS and ask for information about particular enterprises and who believed it would not be possible to ‘set fire to the papers’ in such a case, was rebutted by the then Director-General: ‘Of course that is possible!’. In very exceptional cases entrepreneurs were allowed not to supply information on certain products on the questionnaires, but to give this information either orally or in a sealed envelope, after which the CBS could decide to publish the information under the heading ‘other materials’ or ‘other products’, or to put an * in cells that would have contained such highly sensitive information.
3.6 Law of 1936
After about twenty years, the Law of 1917 was ripe for replacement by something better. A need was felt to extend the obligation to supply data to the CBS to all economic statistics (in principle). Although there had never been real difficulties with respect to confidentiality, the Central Commission for Statistics deemed it necessary, taking into account that the CBS would have a broader mandate, to also more explicitly lay down the confidentiality rules. Therefore, it proposed wording saying that ‘statistics compiled by force of this law shall not be published in such form that information pertaining to individual persons, enterprises or institutions may be revealed, unless these persons, the head of an enterprise or the board of an institution has stated that he does not object against such publication’. In addition, to protect bank secrecy, the collection of data from banks became the prerogative of the Dutch Central Bank, which had to forward these to the CBS, in aggregated form.
After World War II, most economic statistics were brought under the umbrella of the Law of 1936. As to confidentiality policies, the Law of 1936 has long been the model for all CBS statistics, including statistics that were not ruled by this law.
For industrial statistics, the interpretation of the rule of the Law was as follows: it was assumed that the Law was respected if
*) published data or data that were supplied to others in different ways related to groups of at least 4 enterprises (for data on employees at least 3)
*) in such groups of at least 4 (or 3) enterprises there was not one enterprise with a dominant position (which was interpreted as: representing 75-80% of the total).
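The two conditions above can be written down as a simple check on a single table cell. This is only a sketch under stated assumptions: the function name and parameters are hypothetical, the dominance threshold is set at 75% (the lower end of the 75-80% range mentioned above), and a share exactly equal to the threshold is treated as dominant, which the text does not specify.

```python
def cell_is_publishable(contributions, min_units=4, dominance_share=0.75):
    """Check one table cell against the two conditions listed above.

    `contributions` holds one non-negative value per enterprise in the cell.
    Function name, parameters and default values are illustrative assumptions.
    """
    # Condition 1: the cell must relate to at least `min_units` enterprises
    # (4 in general; 3 could be passed for data on employees).
    if len(contributions) < min_units:
        return False
    total = sum(contributions)
    # Condition 2: no single enterprise may have a dominant position.
    # If the total is zero no share can be computed, so only condition 1
    # is applied in that case (an assumption of this sketch).
    if total > 0 and max(contributions) / total >= dominance_share:
        return False
    return True

# Hypothetical cells: turnover per enterprise within one industry group.
balanced_cell = cell_is_publishable([40, 30, 20, 10])   # four units, largest share 40%
too_few_units = cell_is_publishable([10, 20, 30])       # only three units
dominated_cell = cell_is_publishable([90, 5, 3, 2])     # largest share 90%
```

A cell that fails either condition would be suppressed or combined with other cells; for data on employees the same function could be called with `min_units=3`.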
About some sectors, no statistics were published, or only with the permission of the enterprises involved. This had to do with the very dominant (or even monopolistic) position of some enterprises in a number of industries (e.g. the National Railroad company and the State Post, Telephone and Telegraph company). What was also taken into account were existing forms of cooperation in certain sectors and industries. For example, there was a union of producers of a certain type of construction materials. It was known that most, but not all, of the enterprises in this sector were members of this union and that the union also collected statistics of its own. At a certain point in time, the publication of statistics about this industry was stopped in order to prevent information about the non-members from being revealed. Only when it became known that all enterprises had joined the union was the publication of statistics resumed.
3.7 Trade statistics and ‘passive confidentiality’
In 1916, Statistics Netherlands took over external trade statistics from the Ministry of Finance. These statistics were based on copies of customs documents.(7) Both the nature and the enormous volume of these documents created special confidentiality problems: it was in fact impossible for the statistical agencies to apply an active confidentiality policy. The point was that hundreds of thousands of documents were processed each month, and that the people who completed the customs documents (and whose name was mentioned on them) were not necessarily the owners of the goods and therefore the subjects of protection against disclosure. On the other hand, the external trade statistics were published in great detail: thousands of commodity groups and a large number of countries. Therefore, a confidentiality approach was followed in which the initiative to ask for confidentiality (=non-publication) was left to the interested parties. This system is sometimes called ‘passive confidentiality’.
The users of statistics were often not happy with this, because it could happen that the entire export of a certain commodity, which everyone knew or suspected had to be there, was not mentioned. A problem was that in the early years the number of parties who asked for non-publication of certain data was increasing steadily and that this led to serious deficiencies in the external trade statistics. In this context, it was felt that there was a need for objective rules about when to publish and when not. This was difficult. One of the solutions proposed was therefore to set up a committee to advise on individual cases of this kind, but this was rejected. In 1935, however, a new proposal in the same vein was made and accepted by the Minister of Economic Affairs.
After World War II the rules were slightly changed. Uniqueness was no longer accepted as a ground for suppression of data: the interested party that asked for secrecy had to make clear that publication of information on his trade transactions was harmful. In the sixties, another criterion appeared: if the information to be suppressed could also be found in statistical publications of another country (in the case of exports from The Netherlands as imports in country X), it was thought to be ineffective to suppress the data in the Dutch publication only. However, in 1983 the rules were again made more strict and uniqueness once more became the dominant criterion. Suppression of data, on the other hand, was always only granted for a restricted period, mostly for one year, after which the interested party had to ask for renewal.(8)
4 Confidentiality and data about persons and households
4.1 Population censuses
Of course population censuses are the richest source of individual data on persons and households. In old Dutch census legislation, confidentiality of data was not mentioned at all. The population censuses in those days were not only held for statistical purposes, but also to check and improve population registers. In fact, in the early 19th century the census was used to build up municipal population registers. The protection of privacy and confidentiality of personal data were not an issue in those days. In old census publications many data were published that must have been easy to relate to individuals.
Later on, the census data were used for checking and correction of these registers. The Central Commission for Statistics and the Director-General of the CBS have long been active in promoting better population registration methods, and this contributed to improved quality of population registers. Consequently, the need to use population census data to check and correct these registers diminished. This in turn enabled the CBS to gradually emphasize confidentiality of individual census data. Among other things, it prohibited municipalities (who played an important role in the data collection process) from using ‘individual census data for the completion of population registers, in particular with regard to religious affiliation’. It should be noted that this was done with the full consent of the Government Chief Inspector of Population Registers.
Especially after World War II confidentiality of census data became an important issue. The first post-war census was also a census of dwellings and there was fear that the government might use the individual data on dwellings for the redistribution of dwellings, which were very scarce. However, it was not until the 1960 population census that formal provisions about confidentialty were introduced and even then it was not entirely clear whether violation would constitute a criminal offence. In practice, however, the CBS was very active in trying to secure confidentiality. E.g. in 1960 large numbers of questionnaires were confiscated by the CBS when it became clear that regional sociographic services were using census questionnaires to quickly process and tabulate census data for their own purposes.
4.2 Is Big Brother watching us?
Around 1970 the preparations for the population census to be held at the end of that year created some public commotion. In that same period Parliament was discussing a plan to introduce a central, computerised population register, in combination with a uniform personal administration number. This fuelled fear of abuse of information. A ‘Population Census Vigilance Committee’ was leading the extra-parliamentary opposition, which was at times fierce. The original ideas about the census, which the Central Commission for Statistics and the CBS had been discussing for years, had to be revised on a number of points. In Parliament several discussions about points of principle were held, such as penalties for infringements of confidentiality rules, the necessity of certain questions (such as disability and other private issues), the destruction rules for individual data etc. In a special meeting of the Central Commission for Statistics (17 December 1970) it was concluded that there were stronger adverse feelings towards the census than had been anticipated and that this had much to do with plans to create a computerised central population register. It was also noted that the independence of the Central Bureau of Statistics and its complete separation from all registers was not generally recognised. It was also felt that action committees not only displayed a violent anti-establishment attitude, but that there was also genuine popular discomfort because there was no general privacy law in the Netherlands.
Meanwhile, and pulling in the opposite direction, representatives of the research community were actively promoting the idea of using the population census data more intensively for social research purposes, including linkage with future census data at the individual level for longitudinal analysis. This meant, of course, that among other things census data would have to be archived for longer periods. CBS, CCS and the government proposed a compromise: a 10% sample of the census data would be kept on an individual basis. Ultimately, two important concessions were made to take away privacy fears. First of all, it was agreed that names and addresses would be removed from the files as soon as the data had been checked for completeness. In addition, there was a firm government promise that corrections in population registers made on the basis of census data would not have any negative consequences for the people involved (e.g. illegal immigrants). Moreover, in practice it was decided that no legal steps would be taken against those who had explicitly refused to cooperate (about 30.000, mostly concentrated in big cities).
4.3 The last Population Census
The 1971 Population Census experience led to some important changes of the rules for the Population Census of 1980. The new Law said that the target population for the census was no longer the people who should have been registered in population registers, but the population that was actually registered. In addition, individual data were no longer given to the municipalities. Clearly, the intention was to make sure that population census data were only meant for statistical purposes, and not for checking administrative systems, in this case the population registers. In parliament, the population census was described as a ‘closed statistical information system’. Finally, the obligation to respond was retained in the law, but the penalties for non-response were taken out. The general idea behind this was that cooperation and ‘propaganda’ would be more effective than coercion in making the population comply. And to remove any remaining fears of Big Brother, there was a provision in the law that matching population census data with any other data was forbidden.
However, all these good intentions were overtaken by the realities of life. During trials for the 1980 census it became clear that the Dutch population no longer accepted the idea of having a census: there was an overall non-response of 26% (and considerably higher in certain urban areas) and consequently the then Director-General proposed to cancel the census. The costs were high and the risks of failure were big. Meanwhile, the necessary data (including those for Eurostat) were compiled by a combination of data from population registers and some sample surveys. Obviously, this did not produce all the detail that a population census would have produced, but for most purposes it appeared to be sufficient. After a while, the population census was also officially abolished (Law of 29 May 1991 to abolish the Law on Population Censuses).
4.4 General Privacy Law
In the political arena, including pressure groups, one of the arguments used against holding a population census (and various other data collection initiatives) had long been that there was no ‘General Privacy Law’ in the Netherlands. It took quite some debate and time before such a Law was finally adopted. The General Privacy Law (Law on Registrations of Persons) did not become effective until 1988. An independent institution called Registration Chamber (‘Privacy Commissioner’) was set up to oversee the implementation of the law. It is beyond the scope of this paper to discuss the law in any detail, but some of its characteristics should be mentioned here. First of all, the scope of the law is fairly wide. For example, and this is very relevant in the area of statistics, registrations of businesses (e.g. the General Business Register of Statistics Netherlands, but also subsets of this register, as well as data-files about specific industries) which include a substantial number of ‘one-person-businesses’ are considered to be ‘registrations of persons’. Secondly, the Law stipulates that there has to be a ‘regulation’ for each individual registration, which must be submitted to the Registration Chamber and has to be accessible to the public. In the case of Statistics Netherlands, this means that a hundred or so such regulations are available in the (public) library. Regulations must describe the purpose of the registration, its content, who has access to the information, whether any matching with other data takes place etc.
Fortunately, the law contains a few exceptions from the general rules for those registrations that are kept for statistical and research purposes. It should be noted here that it has taken considerable fights to get this done. In particular, the general right of registered persons to have access to their personal files and to correct data that they consider to be incorrect, does not apply to ‘statistical’ registrations of persons.
5 Confidentiality and research
5.1 General considerations
It has been obvious for a long time that users of statistics and in particular researchers had serious problems with the confidentiality rules that Statistics Netherlands used. From time to time there have been intensive internal and external discussions about the subject. In the sixties, in CBS discussion papers on the subject, parts of a doctoral thesis were quoted extensively. A brief summary of what the author of the thesis said may be illustrative. ‘The way CBS publishes regional employment data, suppressing many details, is unuseful and therefore regrettable. It is also quite unnecessary, because it is relatively easy to find out the number of employees for individual companies. Often, only meaningless aggregates are published and this will get even worse considering concentration trends’. And the CBS author of the discussion paper adds to this: ‘There is a lot of truth in what the author writes. A sensible policy in this regard should try to take both the interests of the data providers and the users into account’.
There are various kinds of needs for detailed information that cannot be published under rigid confidentiality rules. Regional bodies obviously want a lot of regional detail. Business analysts want detailed information about industries or commodities. Academic researchers like to do their own kinds of analysis on sets of individual data. Holders of registrations (e.g. the Tax Authorities and the Chambers of Commerce) would like to use CBS information to improve the quality of their own registers.
In conclusion: the CBS has been under some pressure for many years now to provide sets of individual survey records (about private persons and households) for research purposes. Universities and research institutes have argued persistently that it is a waste of public money not to make full use of the rich data-sources the CBS has.
5.2 Microdata for research purposes
Ever since the sixties, the demand to make use of statistical microdata (individual records) for research purposes has been growing steadily. Of course this has a lot to do with the development of computing facilities. For some time the Royal Dutch Academy of Sciences (in particular the so-called Steinmetz Data Archive) has been acting as intermediary for the archiving of and providing access to CBS-microsets. In the eighties, however, methodological and other difficulties began to arise. Advanced analysis techniques had shown that making individual records anonymous was in itself insufficient to prevent disclosure. Consequently, while demand was growing, the CBS became more and more reluctant to give access to its microdata. However, it was impossible to stop supplying microdata completely. Therefore, different techniques were developed to reduce disclosure risks. Some of these techniques were data-intrinsic, in the sense that data were suppressed, scrambled, randomised or stripped. This resulted in the creation of so-called standard research files. They were supposed to contain sufficient information for relevant social research. However, researchers labelled these CBS policies with terms such as ‘mutilation’ and ‘patronising’.
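Why removing names is not enough can be illustrated with a small sketch. It is not the method the CBS used: the records, the variable names and the choice of identifying variables below are all invented for illustration. The idea is that a record which is the only one with a given combination of identifying characteristics can be recognised even without a name, and that blanking one of those characteristics (local suppression) is one simple data-intrinsic remedy.

```python
from collections import Counter

# Hypothetical anonymised microdata: no names or addresses, yet some
# combinations of characteristics occur only once.
records = [
    {"municipality": "A", "occupation": "baker", "age_group": "30-39", "income": 41000},
    {"municipality": "A", "occupation": "baker", "age_group": "30-39", "income": 38000},
    {"municipality": "A", "occupation": "notary", "age_group": "60-69", "income": 90000},
    {"municipality": "B", "occupation": "teacher", "age_group": "40-49", "income": 52000},
    {"municipality": "B", "occupation": "teacher", "age_group": "40-49", "income": 50000},
]

# Assumed identifying variables (an illustrative choice, not a CBS rule).
KEY_VARS = ("municipality", "occupation", "age_group")


def unique_records(rows, key_vars):
    """Return the indices of rows whose key combination occurs exactly once."""
    counts = Counter(tuple(row[v] for v in key_vars) for row in rows)
    return [i for i, row in enumerate(rows)
            if counts[tuple(row[v] for v in key_vars)] == 1]


def suppress_locally(rows, key_vars, var_to_blank):
    """Return a copy of rows in which var_to_blank is set to None for unique rows."""
    risky = set(unique_records(rows, key_vars))
    return [dict(row, **{var_to_blank: None}) if i in risky else dict(row)
            for i, row in enumerate(rows)]


risky_rows = unique_records(records, KEY_VARS)
protected = suppress_locally(records, KEY_VARS, "occupation")
```

In this invented file only the notary stands out, so only that record loses its occupation. The loss of detail is exactly what researchers objected to, which shows the trade-off between protection and usefulness described above.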
In addition to technical solutions, legal ways to circumvent the problems were sought and found. What it all boiled down to was that the CBS opened the possibility to obtain data-sets under protection of a contract between the CBS and the employer of the individual researcher, usually a university. These data sets were somewhat richer in detail than the ones mentioned before, but their structure nevertheless made the disclosure of individual data unlikely. The contract between the CBS and the research institution contained several safeguards for data protection, as well as serious penalties in case of infringements of the contract.
Confidentiality, incidentally, was not the only issue in the conflicts between the CBS and the research community when it came to the supply of micro-data sets. Another point was that the government had imposed the obligation on the CBS to apply a kind of cost recovery; it had to generate one million guilders of annual income from this activity, which was meant to cover part of the data collection cost, but also to make the real cost of research more visible. Consequently, the price of a complete micro-set for the Labour Force Survey surged from 200 guilders in 1973 to 160.000 guilders in 1985. Not surprisingly, this was not appreciated by the research community, and it also created higher demands as to quality, timeliness and ‘after-sales service’. Various organisations and lobbyists’ groups tried to make the CBS change its (pricing) policy.
5.3 Steps towards a solution
To find a way out, a committee was formed by the Central Commission for Statistics. In 1989 this committee produced a preliminary advice. It is interesting to note here that the committee and the CBS disagreed on one point of principle. While the committee thought (in agreement with the research community) that it was part of the ‘mission’ of the CBS to supply microdata for research purposes, the CBS believed that this was not the case. It argued that its core task was the production of statistics, i.e. aggregates, that microdata-sets were only an intermediary product, and that it could, but was not obliged to, make them available to outsiders. This argument may sound theoretical, but ultimately parliament, when it adopted the new Statistics Law (1996), sided with the CBS on the issue, which had certain consequences for the relevant legal provisions.
Before the matter was finally settled, the preliminary advice of the CCS committee led to new contacts between the CBS and the research community, aiming at finding a practical solution. After fairly long discussions, a memorandum was drafted entitled: ‘Towards an agency for statistical microdata-sets’. The idea was that the Netherlands Organisation for Scientific Research (NWO) would provide the necessary infrastructure to set up this ‘agency’. Among other things, the advice of the Registration Chamber was sought. Ultimately, an agreement (1994) between the CBS and NWO about a ‘Scientific Statistical Agency’ (SSA) was concluded for a period of four years. The main points of this agreement were:
*) CBS committed itself to supply (anonymised) microsets of its surveys among households and persons to the SSA.
*) NWO (which is financed by the Ministry of Education and Science) supplied the one million guilders income that the CBS had to generate.
*) SSA would be the broker between the CBS and researchers.
*) CBS would concentrate on the production and distribution of microdata-sets.
When the SSA construction was evaluated in 1998, all parties agreed that it had been successful and that the agreement was to be continued.
5.4 And the final (?) steps
As mentioned above, the SSA deals with microsets about persons and households exclusively. Data sets about business surveys are excluded. In this regard, stricter confidentiality rules apply, also from a legal point of view (Law of 1936). Nevertheless, economic researchers have developed a growing interest in micro-data as well. In recent years, several government advisory committees have been pointing out how unsatisfactory it was that CBS microdata could not be used for research purposes.
Therefore, as a kind of extension of the SSA construction, and after long consultations with the Central Commission for Statistics, the Ministry of Economic Affairs and business associations, the CBS has decided to create CeReM (Centre for Economic Microdata Research), for a trial period of three years. CeReM is a facility for ‘on site’ analysis of microdata relating to business surveys. Some of the most important CeReM rules are the following:
*) The Central Commission for Statistics defines the general criteria for who (which categories of researchers) is allowed to work ‘on site’.
*) Users must sign a confidentiality statement.
*) Output can only be exported after a ‘disclosure protector’ of the CBS has checked it.
*) Users work on stand-alone machines only, so they cannot e-mail from their work station.
So far, CeReM has functioned satisfactorily and there is a growing interest in working ‘on site’.
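The criteria applied by the ‘disclosure protector’ are not described here. Purely as an illustration of what an output check could look like, the sketch below flags table cells that are based on very few businesses. The threshold of three contributors, the table layout and all names are assumptions, not CeReM rules.

```python
# Hypothetical output table: number of contributing businesses per cell,
# keyed by (industry, region). All figures are invented.
cell_counts = {
    ("chemicals", "North"): 12,
    ("chemicals", "South"): 2,
    ("textiles", "North"): 7,
    ("textiles", "South"): 0,
}


def risky_cells(counts, min_contributors=3):
    """Return the non-empty cells with fewer contributors than the assumed minimum."""
    return [cell for cell, n in counts.items() if 0 < n < min_contributors]


flagged = risky_cells(cell_counts)
# In this sketch, output may only leave the site if no cell is flagged.
may_export = not flagged
```

Here the cell for chemicals in the South rests on only two businesses, so the table would be held back until that cell is suppressed or merged with another.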