2270 JM Voorburg
Data in the Polder
1 Data in the polder
Statistical information is part of the infrastructure of modern society. One cannot imagine an economy and a democracy performing up to modern standards without basic statistical figures. Every citizen and every business is part of the public that enjoys the benefits of official statistics, often without being aware of it and without having a choice. Political responsibility for this type of public good dates back to the nineteenth century. In the slipstream of the industrial revolution and the social question, national statistical institutes were established throughout the western world.
It is interesting to look at some of the analogies between official statistics and other components of the societal infrastructure. In the Dutch context it is only natural to look at what is called water management. The struggle against the rivers and the sea has moulded the country. This struggle has been a responsibility for all social groupings and political parties, and every single citizen has profited from the polders, the dikes, the Afsluitdijk, and the Delta Works. Proactive maintenance of this infrastructure is a recurrent task for politics. We need to know when the water levels will be rising, in the short run (rivers) and in the long run (oceans). The importance of this physical infrastructure for the Netherlands, the Low Countries by the sea, can hardly be overstated.
The same goes for official statistics. Government cannot do without them. In fact, statistics have been part of the glue that held together the social and religious pillars constituting the nation. We may disagree about policies, but we cannot disagree about the facts underlying them. Employees and employers alike have to trust the quality of, say, the figures on inflation and unemployment.
The analogy between water management and official statistics can be stretched a bit further. We must be careful with the inflow, but also with the throughflow and the outflow of water. If a dike is breached, the area of land flooded and the resulting damage may be disproportionate. We must cherish the dikes, even if we do not want to spend too much money on them. The very same goes for our statistical data. We need data like we need water. We want to use data for all kinds of appropriate purposes. But we must think, even worry, about the continuity of our infrastructure in the long run.
Data are dear. Data are dear, first, because a large part of our budgets goes into collecting and compiling them. And data are dear because we love them. But they are dear in a more profound sense, too. It is impossible to produce and publish statistical information without basic data. For this reason we need to foster and cherish our data and our systems. Above all, we need to reassure ourselves, as well as our masters and our clients, that the long-term continuity of the data and the systems is safeguarded. It is my position that the proper treatment of statistical confidentiality is at the heart of this continuity.
Now, if the water system changes because the rivers are polluted, because the sea level is rising, or because more rain falls in a shorter period, its management has to adapt to these changed circumstances. The same goes for statistics. Our data architectures are changing, and our political and social environment is changing. Our users are growing in number, fortunately, and in the diversity of their demands. We have to construct new devices to deal with these changes in our environment. And these new devices will still have to preserve the essence of statistical confidentiality as the cornerstone of official statistics.
In this paper I review the new data architecture of Statistics Netherlands: the evolution away from a stovepipe approach towards a focus on micro-level databases. The recent reorganisation of Statistics Netherlands is intended, to a large extent, to reflect this evolution. A government paper on the position of Statistics Netherlands has accompanied this reorganisation. Implementing this government paper requires new legislation, as explained in section three. Of course, the new law adapts the principle of statistical confidentiality to modern circumstances. I review both the principle itself and two important formal exemptions in sections four through six. Finally, two important dilemmas have to be dealt with in the near future: our relations with other organisations, especially government departments that intend to copy (parts of) our data architecture, and our relations with public opinion at large.
2 New architecture
A number of converging reasons have led Statistics Netherlands to revise its data architecture quite dramatically. In the past, each statistical project, from the Population Census to the annual prison statistics, was devised as a separate entity, beginning with its own questionnaire and ending with its own publication. Later, the results of these projects were integrated at the macro- or even meso-level in the system of the National Accounts. This integration was facilitated by shared definitions, classifications, and the like.
The extension of data integration from the macro- to the micro-level has been pushed forward by respondents and users alike. Administrative burden has been a major political and social issue since the seventies. It is only fair to add that response rates for social sample surveys have fallen to exceptionally low levels in the Netherlands. Sharing data has been one of the logical measures to tackle this issue. Users, in turn, have put forward a host of new questions that often require extensive microdata: questions of social cohesion, of longitudinal change, et cetera. Computing power has increased to such an extent that a moderately sized Population Census could now be processed at home, whereas only three decades ago mainframe computers were just entering statistical offices and universities. Integrated data systems have advantages for various societal groups, namely respondents, users, and taxpayers:
· Results are more consistent across statistical domains.
· Results are more consistent across users.
· Administrative burden can be reduced.
· Costly internal data processing can be reduced.
· Researchers can concentrate on analysing rather than collecting data.
The new architecture is best described as the combination of administrative files with each other and with additional sample surveys. The advantages and disadvantages of both types of files for statistical production and research (e.g., coverage, quality, timeliness, continuity, and cost) are well known. Combining both types of files into one grand design offers new advantages and challenges. The design itself may be conceived as in the figure below: a huge data matrix that may be sparse at certain coordinates.
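As a minimal sketch of the "one grand design", the Python fragment below left-joins a hypothetical administrative file and a hypothetical sample survey onto a population register that acts as the backbone, then counts the empty cells of the resulting matrix. The file names, variables, and the shared person identifier `pid` are all illustrative assumptions, not the actual layout of the Statistics Netherlands files.

```python
# Hypothetical files, each keyed by a hypothetical person identifier ("pid").
# The population register is the backbone: every unit appears exactly once.
population_register = {
    "p1": {"age": 34, "sex": "F", "municipality": "Voorburg"},
    "p2": {"age": 51, "sex": "M", "municipality": "Heerlen"},
    "p3": {"age": 19, "sex": "F", "municipality": "Delft"},
}
tax_file = {"p1": {"income": 41000}, "p2": {"income": 38500}}  # administrative file
labour_survey = {"p3": {"hours_worked": 12}}  # small sample survey

def build_matrix(backbone, *sources):
    """Left-join each source onto the backbone; uncovered cells become None.

    Units present in a source but absent from the backbone are dropped.
    Assumes all backbone records carry the same variables.
    """
    columns = list(next(iter(backbone.values())))
    for source in sources:
        for record in source.values():
            columns += [var for var in record if var not in columns]
    matrix = {}
    for pid, base in backbone.items():
        row = dict.fromkeys(columns)  # every variable starts as None
        row.update(base)
        for source in sources:
            row.update(source.get(pid, {}))
        matrix[pid] = row
    return columns, matrix

columns, matrix = build_matrix(population_register, tax_file, labour_survey)
empty = sum(value is None for row in matrix.values() for value in row.values())
total = len(matrix) * len(columns)
print(columns)
print(f"{empty} of {total} cells empty")  # 3 of 15 cells empty
```

Keeping the register as the backbone means every unit appears exactly once; the sparsity simply records which sources happen to cover which units.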
The architecture itself applies to both economic and social statistics. Clearly, it depends heavily on matching and merging files. The existence of a high-quality population administration and a statistical business register facilitates this architecture considerably. The analogy is apparent from the names given to the two data matrices: the Economic Statistical File and the Social Statistical File. There are important differences, however, not least where statistical confidentiality is at stake. In a three-dimensional table of basic background variables, there will be more uniques among businesses than among persons, and these unique businesses will also be much more publicly known and visible.
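The point about uniques can be illustrated with a toy count. The Python sketch below counts records whose combination of three background variables occurs exactly once. The variables and records are invented for illustration only; they show the mechanism, not any measured rate for Dutch persons or businesses.

```python
from collections import Counter

def count_uniques(records, keys):
    """Count records whose combination of key variables occurs exactly once."""
    cells = Counter(tuple(record[k] for k in keys) for record in records)
    return sum(1 for count in cells.values() if count == 1)

# Invented toy data; the key variables are illustrative assumptions.
PERSON_KEYS = ("sex", "age_group", "region")
persons = [dict(zip(PERSON_KEYS, row)) for row in [
    ("F", "20-39", "West"), ("F", "20-39", "West"),
    ("M", "20-39", "West"), ("M", "20-39", "West"),
    ("F", "40-64", "North"), ("F", "40-64", "North"),
    ("M", "40-64", "North"),
]]

BUSINESS_KEYS = ("industry", "size_class", "region")
businesses = [dict(zip(BUSINESS_KEYS, row)) for row in [
    ("retail", "small", "West"), ("retail", "small", "West"),
    ("chemicals", "large", "West"), ("steel", "large", "North"),
    ("banking", "large", "West"),
    ("retail", "small", "North"), ("retail", "small", "North"),
]]

print("unique persons:", count_uniques(persons, PERSON_KEYS))          # 1
print("unique businesses:", count_uniques(businesses, BUSINESS_KEYS))  # 3
```

Large businesses are few per industry and region, so their cells tend to contain a single, often well-known, unit.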
3 New legislation
The data architecture sketched above was introduced internally for reasons of both quality and efficiency. Furthermore, to increase the effectiveness and efficiency of the office, its management processes were reviewed in isolation from the primary processes. This interdepartmental review led the government to conclude that Statistics Netherlands should have an even more independent position within the government apparatus. The introduction of this executive agency status is accompanied by a number of other measures, such as giving Statistics Netherlands its own legal personality, introducing accrual accounting, and making its head formally in charge of all personnel. These changes necessitate formal legislation, even though the substantive and methodological autonomy of Statistics Netherlands is already well founded in the current (1996) law. At any rate, Statistics Netherlands saw and took the opportunity to innovate on other issues as well. At the invitation of the government, three of these issues were addressed after the interdepartmental review mentioned above: free access to administrative files, the impact of the European Community, and the maintenance of the response obligation for businesses. The issue of data access for research purposes was addressed in their slipstream, as a more technical matter. The new law will also strengthen the relation between Statistics Netherlands and the Dutch National Bank.
Since Parliament has almost completed a general framework law on executive agencies, the new law largely follows the structure of that draft in formal terms. Note, furthermore, that the new CBS law (CBS being the Dutch abbreviation for Statistics Netherlands) is meant to supersede all earlier statistical legislation. Three formal laws will be withdrawn: the 1936 Law on Economic Statistics, the 1988 Law on the external use of the General Business Register, and the 1996 Law on the Central Bureau and the Central Commission for Statistics. On the other hand, once the new law has come into force, a number of new implementing measures will have to be adopted straightaway, if only to ensure continuity in the statistical processes and publications.
The new CBS law is being drafted jointly by the ("parent") Department of Economic Affairs and Statistics Netherlands. At present, the Council of Ministers has adopted the text on the proposal of the Minister of Economic Affairs. The Council of State, the highest advisory body, has been asked for its advice, according to standard procedure. Two kinds of problems will have to be tackled in the time to come: executive agencies are not as popular in political circles as they were a decade ago, and parliamentary elections will be held within half a year. Also, the timing of all implementing measures is not yet certain. To tackle both problems at the same time, the current Minister is determined to submit the draft to Parliament before the elections, once the Council of State’s advice has been received. Until such submission, however, the text of the draft is confidential.
4 Basic principle
"The data received by the Director-general under this law are used exclusively for statistical purposes. They will not be transmitted to anyone not charged with performing the task of Statistics Netherlands. They will be published only in such a way that no recognisable data on a separate person, household, business, or institution may be derived, unless there is a reason to assume that the business or institution involved will not object." Thus, the principle of statistical confidentiality is stated quite firmly in the law of 1996. The explanatory notes make clear that this obligation forbids legal, fiscal, and other authorities to access or use individual data collected by Statistics Netherlands.
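Publishing so that no recognisable data on a separate business may be derived is harder for business totals, where one large firm can dominate a cell. One widely used check is the (n, k) dominance rule: a cell is sensitive if its n largest contributors account for more than k per cent of the cell total. The Python sketch below is a minimal version of that rule; the choice of this rule and the values n = 2 and k = 85 are assumptions for illustration, not parameters prescribed by the law or applied by Statistics Netherlands.

```python
def dominance_sensitive(contributions, n=2, k=85.0):
    """(n, k) dominance rule: True if the n largest contributions exceed k% of the total.

    Assumes non-negative contributions (e.g. turnover of each business in a cell).
    """
    total = sum(contributions)
    if total == 0:
        return False  # treat an empty or all-zero cell as not sensitive (a simplification)
    largest = sorted(contributions, reverse=True)[:n]
    return sum(largest) > k / 100 * total

# Hypothetical turnover figures for the businesses in two table cells.
print(dominance_sensitive([900, 50, 30, 20]))          # True: top two hold 95%
print(dominance_sensitive([300, 250, 200, 150, 100]))  # False: top two hold 55%
```

A cell with fewer than n contributors is always flagged, since its largest contributions make up the whole total.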
The motives underlying the principle of statistical confidentiality are manifold. It is a matter of informational ethics and transparency that data shall not be used for purposes other than those for which they were collected. One can view data collection as a deal between interviewer and interviewee, one of the terms of the deal being that the data will not be used for other purposes. Moreover, in view of the low response rates in the Netherlands, statisticians have to go a long way to satisfy their prospective respondents. The same argument applies to the use of administrative data for statistical purposes. The administrations have to be convinced that the data on the units in their administration, and therewith their identities, will remain in safe hands once they are transmitted to the statistical office.
As far as the collection and processing of data are concerned, it is important to note that Statistics Netherlands is fully authorised to process individual identification numbers. Furthermore, Statistics Netherlands is legally authorised to process sensitive data as defined by the recent Dutch law implementing the Community Data Protection Directive.
Internal processing of data collected for statistical purposes has to comply with government standards. Technical and organisational measures have to be taken to safeguard data against loss or damage and against unauthorised disclosure, modification, and distribution. This important principle has been set out explicitly and at length in government instructions since 1994.
Apart from the usual consultation with other departments, a number of advisory bodies have been asked to comment on issues of statistical confidentiality and data access:
· The CBP, the Netherlands’ privacy authority, is familiar with the high standards applied by Statistics Netherlands and does not object to the draft. Its most serious doubts concern the services delivered by Statistics Netherlands to the government departments. This issue is dealt with below.
· ACTAL, the advisory committee on administrative burden, agrees with the main lines of the draft and asks for the possibility, in the long run, of transmitting individual business data to other government agencies. It endorses the proposed rule that data may be collected only if they are not available otherwise.
· VNO-NCW, by far the largest employers’ organisation, by contrast insists that data collected exclusively for statistical purposes shall not be used for any other purpose. It does support access to administrative data.
· NWO, the Netherlands Science Foundation, has thus far been enthusiastic about the current system of restricted access to microdata. However, demand for access to data beyond the standard package of social sample surveys is growing very rapidly.
5 Exception I
Although the principal obligation of statistical confidentiality is one of the cornerstones of official statistics, it must not be applied without further thought. To promote and preserve the safe and productive use of the data, the law provides for two important exceptions. The first concerns the use of microdata for research purposes. The configuration introduced in the Law of 1996 is retained and extended in two ways. The configuration entails the following:
· An explicit legal authorisation as an exception to the general principle
· To release microdata
· Under contract
· To specified users.
The microdata stem from social sample surveys (because the compilation of economic statistics is governed by another, stricter law) and are anonymised and otherwise protected against spontaneous recognition and disclosure.
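A minimal Python sketch of the kind of anonymisation meant here: direct identifiers are dropped, age is recoded into five-year bands, and income is top-coded. The field names, the band width, and the cap of 100,000 are hypothetical choices for illustration; the actual protection measures applied to released microdata are not specified in this paper.

```python
# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"pid", "name", "address"}

def anonymise(record, band=5, income_cap=100_000):
    """Drop direct identifiers, band the age, and top-code income (all illustrative)."""
    out = {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
    if "age" in out:
        lower = (out["age"] // band) * band
        out["age"] = f"{lower}-{lower + band - 1}"
    if "income" in out:
        out["income"] = min(out["income"], income_cap)  # top-coding
    return out

record = {"pid": "p1", "name": "X", "address": "Y", "age": 37, "income": 250_000, "region": "West"}
print(anonymise(record))
# {'age': '35-39', 'income': 100000, 'region': 'West'}
```

Coarsening and top-coding trade some analytical detail for a lower chance that a rare combination of values points to one respondent.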
The contract stipulates that the data concerned will not be matched to other data and will not be transmitted to other researchers. Researchers submit draft results (aggregates only) to Statistics Netherlands for a disclosure check, and they destroy the data once they no longer use them. In the case of a violation, employees of research institutes may be dismissed, and the institutes themselves will be excluded from future releases.
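The disclosure check on researchers' aggregates can take several forms. One simple and common form is a threshold rule on frequency tables: any non-empty cell based on fewer than a minimum number of respondents is flagged. The Python sketch below assumes a threshold of 10, an illustrative value rather than the rule Statistics Netherlands applies.

```python
def failing_cells(table, threshold=10):
    """Return, sorted, the cells with a positive count below the threshold.

    Zero cells are treated as safe here, which is a simplification.
    """
    return sorted(cell for cell, count in table.items() if 0 < count < threshold)

# Hypothetical draft output submitted by a researcher.
draft_table = {
    ("West", "employed"): 120,
    ("West", "unemployed"): 7,
    ("North", "employed"): 45,
    ("North", "unemployed"): 0,
}
print(failing_cells(draft_table))  # [('West', 'unemployed')]
```

Flagged cells would typically be suppressed or merged with neighbouring categories before the results are released.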
The law specifies four categories of users. They include the universities, the Dutch government planning offices (think tanks), and Eurostat. Research institutes not specified in the law may be authorised by the Central Commission for Statistics. The Commission has formulated its policy: applicants must have legal personality and continuity, must carry out research as a core activity, and must publish their results (rather than sell them to a single private party). In practice, however, these criteria are not always easy to apply.
The extensions to this policy concern the range of data covered and the methods of access to such data.
· The integration of two legal regimes, the law of 1936 for economic statistics and the law of 1996 for social sample survey microdata, into one new system makes an integrated policy with a full legal underpinning possible. Sample survey data and administrative files for persons and businesses will all be covered by the same system of legal provisions. The pilot project for on-site analysis of business microdata, started in 1998 for a three-year period, will thus receive a legally supported sequel.
· Only five years ago, the release of microdata on tape, disk, or CD-ROM was the dominant method of distributing microdata to researchers. For several reasons, new methods of accessing and analysing microdata are rapidly taking over, such as facilities for on-site and remote analysis. The new text does not favour any particular mode of access and therefore seems to offer sufficient flexibility for the years to come.
6 Exception II
The section above describes a technical improvement of the current situation. A more fundamental extension has been made to allow for the implications of European unification. If the exchange of microdata is required for Community statistics, the Director-general is given the authority to supply microdata to Community and national statistical authorities. This considerably increases the current leeway. It will become easier for Statistics Netherlands to participate in pilot projects even in the absence of Community legislation. It will become legitimate to exchange microdata for mirror trade statistics. And it will become easier to compile data on multinational firms. The Director-general nevertheless remains legally bound to verify with his partners, ex ante and ex post, that statistical confidentiality will remain intact. There is no general obligation to transmit microdata.
The Community exchange of microdata fits in with other parts of the draft. The Dutch component of the Community production of statistics is seen as an integral part of the task of Statistics Netherlands. This will make it easier for Statistics Netherlands to apply for additional budget if new Community tasks are to be performed. In addition, the explanatory memorandum to the law addresses the involvement of Statistics Netherlands in determining the position of the Dutch government on draft Community statistical legislation.
7 Dilemma I
The advantages of the new data architecture are convincing. In fact, they are so convincing that several Ministers want to use or even copy the approach. To some extent this is easily feasible, because many administrative data are available, for instance on health or social security, and data analysis is considered part of the policy cycle. Often, such data analysis consists of quick-and-dirty estimates, forecasts, and qualitative assessments that do not belong to the realm of official statistics. Also, empirical data are politically important in the new budgetary system, which includes an ex post evaluation on selected key indicators, along the lines of the Community structural performance indicators. The choice and definition of such indicators is considered to be a political prerogative. But government departments tend to want to be master of the measurement as well. One of the challenges for Statistics Netherlands is to strike the right balance between the principles of official statistics (independence, privacy, quality) and the interests, budgets, and data of the government departments.
The preservation of statistical confidentiality becomes complex when, on the one hand, Statistics Netherlands is involved in data processing, especially data matching, and, on the other hand, the Ministers are entitled in their own right to this kind of data. Two parts of the solution to this strategic dilemma are strategic partnerships and superior quality.
· The most important ally of Statistics Netherlands on issues of data use and confidentiality is the national privacy authority. Mutual relations are close, and there is professional respect on both sides. Statistics Netherlands will not engage in activities that the privacy authority would not approve.
· By superior quality I mean that, because Statistics Netherlands has fuller data and more methodological expertise and experience, we must be able to outperform other statisticians and researchers. If these statisticians and researchers want to access microdata for their own purposes, we should offer them excellent facilities and encourage them to use our microdata rather than try to duplicate our work.
8 Dilemma II
Another strategic dilemma concerns our relation with public opinion. Under an extreme interpretation of informational privacy, proxy interviews would become illegal and citizens would have the right to withhold their data from official statistics. Yet it is precisely the exchange of microdata that makes (costly and intrusive) personal interviewing superfluous. This is the statistical instance of a general problem of e-government, namely how to reconcile the interests of efficiency and privacy in the eyes of the population. Here as well, the solution has at least two components, and here as well, the strategic relation with the national privacy authority is one of them. The second component is maximum transparency. We want Statistics Netherlands to be a stronghold of data, but the public must never have the impression that we are doing something behind their back. And, in fact, we are not. Processing voluminous and sensitive data requires safe settings, but it also requires public knowledge of, and support for, these safe settings.
In this paper I have discussed how Statistics Netherlands uses legislation to adapt itself to changing circumstances, in order to keep playing a pivotal role in the informational infrastructure. I hope to come back next year and to be able to tell you that Parliament has adopted the new CBS law. In the meantime you have had a taste of the discussions and considerations involved. As the draft is currently with the Council of State, I have to ask you to keep the attached text and the quotations in this paper among statisticians only.