The Microlab: micro-data revived
The growing need for high-quality statistical micro-data, both for analysis and for better and more extensive publications, has led Statistics Netherlands to aim at a substantial improvement in the coherence and quality of its micro-data on industry.
The need for micro-data
The process of making statistics is sometimes rather introverted. Each survey tends to focus on one or a few particular objects of interest, and the connection with other surveys is usually considered less important. As a consequence, data which in principle could tell us something about similar populations of enterprises have to be retrieved from different statistical series, and are difficult to compare.
Many of the data collected within business statistics are interrelated in real life: staff and production, rate of automation and productivity, innovation and employment, and so on. In this sense we can only conclude that statistics can be improved. Also, the information on business dynamics is not used systematically, although it is essential for many kinds of analysis based on business statistics.
In order to meet the demand for such information, Statistics Netherlands is developing a Microlab, a unit in which analysts will be able to go beyond the individual surveys and relate micro-data from different surveys, and data from different statistical periods to each other.
As well as gradually becoming a tool for tailor-made statistics, statistical publications based on more than one survey, time series in new classifications and new indicators, the Microlab will give us more information on changes in enterprises. Research based on micro-data will be more valuable than ever before. All publication data will in future be derived from the Microlab and at aggregated levels be combined within the publication database StatLine.
Levels of recording micro-data
Most of our business statistics are based on the concept 'kind of activity unit', as defined within the General Business Register. This kind of activity unit is an enterprise or part of an enterprise which independently engages in one, or predominantly one, kind of economic activity, without being geographically restricted, and for which data are available or can be meaningfully compiled to calculate the operating surplus. This is the lowest level at which data are available which can be used in a statistical sense. As a consequence this will also be the level at which information is recorded within the Microlab. For the sake of convenience the 'kind of activity unit' will be referred to as 'enterprise' in the remainder of this article, and the data on this level will be called 'micro-data'.
At a lower level we have data on the observation units. These may differ between surveys, as they depend on the level of unit for which our respondents are able to supply the data in question. Data at this level have no specific meaning for economic research; their only use is for aggregation to the level of the enterprise. At Statistics Netherlands considerable care is taken to make sure that the data on observation units can always be added up to data on enterprises, so that they are fully comparable at that level.
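The aggregation from observation units to enterprises described above can be illustrated with a minimal sketch. The record layout, the identifiers and the variable `turnover` are assumptions made for the example only; they are not taken from the General Business Register or from any Microlab specification.

```python
from collections import defaultdict

# Hypothetical records: (observation_unit_id, enterprise_id, turnover).
# The identifiers and the turnover variable are illustrative assumptions.
observation_units = [
    ("ou-1", "ent-A", 120.0),
    ("ou-2", "ent-A", 80.0),
    ("ou-3", "ent-B", 50.0),
]


def aggregate_to_enterprise(records):
    """Sum observation-unit values per enterprise (kind of activity unit)."""
    totals = defaultdict(float)
    for _unit_id, enterprise_id, value in records:
        totals[enterprise_id] += value
    return dict(totals)


enterprise_turnover = aggregate_to_enterprise(observation_units)
```

Because every observation unit belongs to exactly one enterprise in this sketch, the enterprise totals are simple sums, which is what makes data from different surveys comparable at the enterprise level.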
Until now many of the micro-data were not inter-related, or only poorly so; the Microlab will pursue optimum linking. This applies to links between statistical periods as well as links between data from different surveys. In practice this means that links will be made between the data definitions used in different statistical periods, and likewise between the enterprises observed in them. As this method does not bring the data together on a permanent basis, the result is a set of unbalanced panels derived from various surveys, together with information on how the data-sets are connected.
This will give great flexibility for research purposes. The Microlab will be able to supply balanced or unbalanced panels as required, thus serving research objectives as well as the needs of the statistical departments.
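The difference between the two kinds of panel can be shown with a small sketch. It assumes that the micro-data have already been linked by enterprise identifier across years; the data, the identifiers and the function names are hypothetical and serve only as illustration.

```python
# Hypothetical linked micro-data: year -> {enterprise_id: number of employees}.
survey_by_year = {
    1994: {"ent-A": 10, "ent-B": 25, "ent-C": 7},
    1995: {"ent-A": 12, "ent-B": 24},
    1996: {"ent-A": 13, "ent-B": 26, "ent-D": 4},
}


def unbalanced_panel(data):
    """Every enterprise observed in at least one year; None marks a gap."""
    years = sorted(data)
    ids = sorted(set().union(*(data[y].keys() for y in years)))
    return {e: [data[y].get(e) for y in years] for e in ids}


def balanced_panel(data):
    """Only the enterprises observed in every year."""
    panel = unbalanced_panel(data)
    return {e: values for e, values in panel.items() if None not in values}


unbalanced = unbalanced_panel(survey_by_year)
balanced = balanced_panel(survey_by_year)
```

In this sketch the balanced panel is derived from the unbalanced one on request, which mirrors the idea of keeping the linkage information and building each panel only when it is needed.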
In order to use the Microlab to make publications, computer estimates will be made for missing data. Research is now being done to refine this imputation method and to improve the quality of micro-data for research purposes.
Depending on which survey the micro-data have been derived from, the database covers the years from 1972 up to the present. The recording of micro-data from the larger, mainly annual, surveys will be completed in the course of 1997, while quarterly and monthly surveys will follow within one year.
In the future, care will be taken to adjust the process so that micro-data can easily be transferred to the Microlab. This year and next, work will also be done to include a major part of our historical data, mostly data that have been stored electronically. Thus figures from as far back as 1972 for a few surveys, and from the 1980s for most others, will also be recorded in the Microlab. This work is carried out under the appropriate project name 'Save Our Data'. It is proving to be a somewhat laborious task, as not all data have been kept in a well-documented and well-structured way. In many cases it comes down to a painstaking process of bringing the micro-data back to life, in a manner of speaking. Documentation that has been lost or is no longer suitable will have to be restored where possible. Sometimes the same holds for the data-files themselves. Owing to the many conversions in the course of time, some files are not readable at all, and in others data have been lost or destroyed. There are also sometimes doubts about the completeness of data-sets, and about how links can be made to formerly published data.
Revision of the statistical process
If we are to achieve a smoothly operating, high-quality Microlab, some parts of the statistical process will have to be adapted or revised in the near future. This revision project will define which mutual agreements and measures are necessary, a process in which at least two major conditions have to be met:
1. a broad acceptance from the statisticians involved;
2. a relatively easy implementation of the revisions or adaptations in the statistical process.
Some relevant aspects in this respect are ownership and maintenance of the micro-data, data storage and documentation, relations between micro-data and publications, and the date and method of sampling.
Whereas rules concerning data storage and technical documentation (for example, record and file descriptions) can easily be made and agreed upon, information on the meaning of the micro-data will be a different matter.
The way in which meta-data have to be structured may differ per kind of usage. Within the Microlab this is clearly the case, as it acts as a publishing tool and a coordination instrument as well as a database for research purposes.
Other developments at Statistics Netherlands also have to be taken into consideration: meta-systems are under development both for StatLine and for EDI (Electronic Data Interchange). To make the most of the Microlab, the sampling methods have to be examined seriously to achieve better coordination of methods, as well as coordination of the enterprises to be sampled in a given statistical period. Coordination in respect of updating or revising data-files and publications is also required.
We could consider including all the methodological information, which is sometimes very complicated, in the Microlab. However, as this methodology differs greatly between surveys and variables, it would make matters rather complicated if we had to apply it every time we wanted to publish. Instead we shall make sure that in the future results at the level of the enterprise are also calculated within every survey; these results will then be recorded in the Microlab.
In addition we will have to make estimates for the non-sampled or non-responding enterprises, not only at an aggregate level, but also at the level of the enterprise. Once this has been done, many publication tables can be made simply by adding data together, without having to worry about the methods used within the particular survey. Imputing data at the enterprise level gives rise to discussions on the desired refinement of the method used. Some people might feel that there are possibilities for adding quality to the micro-datasets with regard to usability or data analysis, and might therefore want to make weighted imputations using the same variables that were used before to raise data to a higher level. We have not yet reached a decision in this respect.
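The idea that tables follow from simple addition once every enterprise has a value can be sketched as follows. The imputation rule used here, the mean of the observed enterprises in the same industry, is only a stand-in chosen for the example; as noted above, no method has yet been decided on, and the records and names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (enterprise_id, industry, turnover or None if missing).
records = [
    ("ent-A", "food", 100.0),
    ("ent-B", "food", 140.0),
    ("ent-C", "food", None),   # non-responding enterprise
    ("ent-D", "metal", 60.0),
    ("ent-E", "metal", None),  # non-sampled enterprise
]


def impute_by_industry_mean(rows):
    """Fill missing values with the mean of observed values in the industry.

    Assumes every industry has at least one observed value; this is an
    illustrative rule, not the method used by Statistics Netherlands.
    """
    observed = defaultdict(list)
    for _eid, industry, value in rows:
        if value is not None:
            observed[industry].append(value)
    means = {industry: mean(values) for industry, values in observed.items()}
    return [
        (eid, industry, value if value is not None else means[industry])
        for eid, industry, value in rows
    ]


def publication_table(rows):
    """Add enterprise-level values together per industry."""
    totals = defaultdict(float)
    for _eid, industry, value in rows:
        totals[industry] += value
    return dict(totals)


completed = impute_by_industry_mean(records)
table = publication_table(completed)
```

The table step contains no survey-specific logic at all: whatever refinement is eventually chosen for the imputation, the summation would stay the same.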
Statistics Netherlands has just started creating an automated system for meta-information and data retrieval output. With the help of future Microlab users, prototypes will be tested and evaluated. The tools will be developed in Delphi under Windows 95. The data themselves will be stored in an ORACLE database, together with all the necessary linkage information. Care will be taken to allow output data to be exported to all major statistical applications.