Notice that the statistical system has shifted slightly to the left; it is indeed not unusual that some concessions are made to accounting concepts. The figure illustrates in what respects the scope of edification projects tends to reach beyond information technology: the contents of the questionnaire are affected, as well as the statistical processing and the statistical system.
In summary, edification projects, while primarily oriented towards reduction of response burden, tend to imply a process of critical reconsideration with respect to the questions whether:
=> users really need what they are used to getting. Obviously, this question holds in particular for data which do not appear in business accounts. Raising this question is not a mere formality. The growing tendency to compile "reality-statistics" instead of artificial constructs has created a favorable climate for adaptation towards the perception of businesses as actors in the economic processes;
=> we should really collect all the data we need. Calculation, imputation or estimation, either on the basis of nearby or related business accounting concepts, or obtained by confrontation with data from other surveys or by statistical integration, can be a worthwhile alternative.
Next to this critical reflection on the contents of the questionnaire, a third question arises with respect to the organization of data collections. The boundaries between surveys - and consequently questionnaires - are traditionally delineated without taking into account the way accounting systems are organized within businesses. EDI more or less forces surveying statisticians to tune their surveying strategy to the latter, as we will see when discussing the project running at the moment. On the one hand, EDI logically demands integration of those different questionnaires, drawing on one and the same (sub-) accounting system. This leads to banishing one of the most irritating manifestations of response burden, i.e. redundancy, caused by overlapping questionnaires. On the other hand, EDI requires segregation of questionnaires for the completion of which data have to be taken from different (sub-) accounting systems.
2. Basic EDI modes
The nature and complexity of edification projects will vary among surveys. The most relevant discriminating factors in this respect are:
*) the distance between business accounting systems and information needs with respect to the technological and conceptual dissimilarities as mentioned above;
*) the degree of standardization of business accounting practices.
When applying these criteria, three main modes of EDI emerge:
A.) direct tapping. This mode refers to the situation as pictured in the first figure on page 6. There is no translation needed at the collection side. Edification consists of the selection of the relevant items of the business accounts and the installation of communication provisions. This mode applies when the demand of the statistical office - as reflected by the questionnaire - fully corresponds with the supply from business accounts, which logically implies that all businesses use the same accounting system, both technically and conceptually. These ideal conditions seldom occur. A Dutch example is the survey on expenses by municipalities; the gap between statistical and accounting systems is fully bridged by the statisticians themselves;
B.) standardized translation. One electronic questionnaire is designed, as well as one or a limited number of standard translation modules. This mode applies when there is a moderate distance between demand and supply and when accounting practices are highly standardized. Some of the Netherlands’ labor surveys are edified according to this mode;
C.) unique translation. In case of non-standardized business accounts, each respondent has to establish its own unique translation module.
Although, strictly speaking, it does not satisfy the definition of EDI, it is nevertheless useful to add a fourth category:
D.) semi-EDI. The (technological or conceptual) distance between accounting and questionnaire concepts remains too large to be bridged by automatic translation: the electronic questionnaire is filled in manually.
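The way the two discriminating factors lead to the four modes can be sketched as a small decision function. This is only an illustration: the labels "none", "moderate" and "unbridgeable" for the distance, and the names `EdiMode` and `choose_edi_mode`, are assumptions introduced here and are not part of any Statistics Netherlands system.

```python
from enum import Enum


class EdiMode(Enum):
    DIRECT_TAPPING = "A"
    STANDARDIZED_TRANSLATION = "B"
    UNIQUE_TRANSLATION = "C"
    SEMI_EDI = "D"


def choose_edi_mode(distance: str, standardized: bool) -> EdiMode:
    """Pick an EDI mode from the two discriminating factors.

    distance: hypothetical label for the gap between accounting concepts
        and questionnaire concepts: "none", "moderate" or "unbridgeable".
    standardized: True when accounting practices are highly standardized
        among the respondents of the survey.
    """
    if distance not in ("none", "moderate", "unbridgeable"):
        raise ValueError(f"unknown distance label: {distance!r}")
    if distance == "unbridgeable":
        # Automatic translation is not possible: manual entry (semi-EDI).
        return EdiMode.SEMI_EDI
    if distance == "none" and standardized:
        # Demand fully corresponds with supply: no translation needed.
        return EdiMode.DIRECT_TAPPING
    if standardized:
        # One or a few standard translation modules suffice.
        return EdiMode.STANDARDIZED_TRANSLATION
    # Non-standardized accounts: each respondent builds its own module.
    return EdiMode.UNIQUE_TRANSLATION
```

For example, `choose_edi_mode("moderate", True)` yields mode B, while the same distance without standardization yields mode C.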
With respect to the modes A and B, a further distinction applies by introducing the type of respondent as an additional criterion:
1. administrative bodies, keeping registers for other than statistical purposes;
2. book keeping offices, representing clusters of businesses;
3. individual businesses.
The sequence of this listing is not coincidental: the higher the ranking, the more favorable the impact on response burden. Therefore, edification projects should in principle aim at attaining the highest possible level in the list.
3. EDI and response burden
The EDI mode ranked as A will bring response burden practically back to zero. For mode B, it stands to reason that the Statistical Office supplies standard software packages to the respondents. If these are provided for free - which is indeed the policy of Statistics Netherlands - here too the burden will practically disappear. When mode C applies, the lack of standardization makes it unavoidable that the respondent himself does the job. This burden is, of course, of quite a different nature than the paper burden it replaces. Recurrent efforts for questionnaire completion are replaced by an initial investment to implement the translation rules, followed by - probably minor - recurrent maintenance cost. The complexity of the rules and the stability of the accounting practices ánd of the statistical system will determine the size of the remaining response burden. Although this is difficult to estimate in advance, Statistics Netherlands is confident that the remaining burden will amount to only a fraction of the paper burden.
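The trade-off under mode C, an initial investment plus recurrent maintenance against a recurrent paper burden, can be made explicit with a simple break-even calculation. The sketch below is illustrative only: the function name and the hour figures in the usage note are hypothetical, not measurements from the projects.

```python
import math
from typing import Optional


def break_even_year(paper_hours_per_year: float,
                    initial_hours: float,
                    maintenance_hours_per_year: float) -> Optional[int]:
    """Return the number of whole years after which the cumulative EDI
    burden (initial investment plus yearly maintenance) no longer exceeds
    the cumulative paper burden. Return None if that never happens."""
    yearly_saving = paper_hours_per_year - maintenance_hours_per_year
    if yearly_saving <= 0:
        # Maintenance costs as much as the paper forms: no break-even.
        return None
    return math.ceil(initial_hours / yearly_saving)
```

With hypothetical figures of 10 hours of paper burden per year, 25 hours of initial implementation and 2 hours of yearly maintenance, `break_even_year(10, 25, 2)` gives 4: after four years the respondent has spent 33 hours under EDI against 40 hours on paper.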
4. EDI and meta data
The figures show that after edification the data flow from accounting to statistical systems will become significantly more disciplined. In order to establish reliable translation rules, the respondent must be informed precisely about the meaning of the data items he has to report on. Therefore, definitions should leave no room for deviating interpretation; questionnaire items must be defined exhaustively, i.e. in terms of inclusions and exclusions of items from the book keeping records. Moreover, there may be a need for a variety of questionnaire types, each of which is designed to fit in with the language of a specific, homogeneous group of respondents.
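A definition "in terms of inclusions and exclusions" can be read as a translation rule over ledger accounts. The following minimal sketch assumes that an exclusion is a sub-account whose amount is contained in one of the included accounts and must therefore be subtracted. The account codes and the class name `TranslationRule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationRule:
    """Defines one questionnaire item in terms of book keeping accounts."""
    item: str
    include: tuple
    exclude: tuple = ()

    def apply(self, ledger: dict) -> float:
        """Compute the questionnaire item from a ledger {account: amount}."""
        missing = [a for a in self.include + self.exclude if a not in ledger]
        if missing:
            # An incomplete ledger would silently distort the item.
            raise KeyError(f"accounts missing from ledger: {missing}")
        included = sum(ledger[a] for a in self.include)
        excluded = sum(ledger[a] for a in self.exclude)
        return included - excluded


# Hypothetical ledger and rule: turnover excludes one sub-account that
# is assumed to be contained in account "8000".
ledger = {"8000": 1200.0, "8010": 300.0, "8090": 50.0}
turnover_rule = TranslationRule("turnover", ("8000", "8010"), ("8090",))
turnover = turnover_rule.apply(ledger)
```

Because the rule names every account it uses, a change in the respondent's chart of accounts shows up as an explicit error rather than as an unnoticed shift in the reported figure.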
These complex relationships on the one hand, and the growing need for coherent statistical data over the whole range of business statistics on the other hand, ask for a highly disciplined and coordinated processing of data. A central input meta data base, listing all relevant concepts and their definitions, is an indispensable tool to effectively manage and control these processes.
Actually, two meta data bases apply: next to the input meta data base there is a need for an output meta data base as well. While the former defines questionnaire concepts in terms of accounting language, the latter in turn defines publication concepts in terms of questionnaire items. Where the first supports the respondent in establishing "his" translation rules, the latter supports the statistician in setting the rules for translation of input data to output data.
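The roles of the two meta data bases can be illustrated as two successive translation steps: accounting items to questionnaire items (input side, defined by the respondent) and questionnaire items to publication items (output side, defined by the statistician). In this sketch each rule is simply a sum of source items; all item names and account codes are hypothetical.

```python
def translate(source: dict, rules: dict) -> dict:
    """Apply sum-type rules: each target item is the sum of the listed
    source items."""
    return {target: sum(source[name] for name in names)
            for target, names in rules.items()}


# Input side: questionnaire concepts defined in accounting language.
INPUT_RULES = {
    "net_turnover": ["8000", "8010"],
    "wages": ["4000"],
    "social_charges": ["4010"],
}

# Output side: publication concepts defined in questionnaire items.
OUTPUT_RULES = {
    "turnover": ["net_turnover"],
    "labour_costs": ["wages", "social_charges"],
}

ledger = {"8000": 900.0, "8010": 100.0, "4000": 250.0, "4010": 50.0}
questionnaire = translate(ledger, INPUT_RULES)          # respondent's step
publication = translate(questionnaire, OUTPUT_RULES)    # statistician's step
```

Keeping the two rule sets separate mirrors the division of responsibility described above: the respondent never needs to know the publication concepts, and the statistician never needs to know individual charts of accounts.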
At the moment, all of the surveys listed are managed by separate organizational units. Altogether the seven surveys account for half a million questionnaires annually, while generating a response burden of 400 man-years. The main purpose of the projects is to eventually reduce this burden to a negligible level.
From the beginning it was obvious that the strategy could not be identical for the whole business population. Three broad categories are distinguished:
*) small businesses (fewer than 20 employees). For this group, a separate project has been started to investigate the feasibility of using an administrative registration, i.e. the Company Information System of the Taxation Office, as the data source for the statistics mentioned. In terms of the listing of EDI-modes presented in the previous section, we aim at level A1, the highest attainable level in terms of response burden relief. The main reason for this choice is that information needs for this category are supposed to deviate only slightly from the supply from this source. Relying fully on accounting concepts will, however, inevitably imply some loss of statistically relevant information. This is considered acceptable; after all this category accounts for only a small share of the total economic performance;
*) large and complex businesses. Because of their tailor-made software systems and their complex unit structure, both with respect to legal and statistical units, these businesses require an individual approach. They will be dealt with in the context of the regular profiling activities by the Large Business Unit in order to keep statistical unit structures up to date;
*) middle sized businesses. This category, which makes use of more or less standardized software, is the core of the project. This core consists of two subsequent pilot surveys, the first of which was finished in 1995 and the second of which is presently running. In the following we will discuss the findings of the first pilot and the progress of the second.
2. First phase: a small scale pilot
Because Statistics Netherlands has a strong preference for an incremental approach to automation projects, the EDI project started fairly modestly with a small pilot among 13 middle sized enterprises, mainly in the rubber and plastics industry, each with automated financial accounts. It was clear that a direct link between business accounts and questionnaire items would be unattainable. Therefore, the main purpose of this pilot was to find out whether it would be possible at all to develop a translation module, connecting business financial accounts with one electronic questionnaire which satisfies the data needs of all surveys mentioned. This question had to be answered at the level of both the technological and the conceptual aspect of the translation. The results were positive enough to continue the project and to maintain the highly ambitious goals as stated above. All of the respondents welcomed the idea of edification, which they considered a promising way to substantially reduce their response burden.
Main findings of statistical interest were the following:
*) the integration of questionnaires from different surveys revealed a considerable amount of overlap or almost-overlap in concepts. It was felt that the latter would be difficult to justify when presented to the respondent in one combi-questionnaire. Standardization gave rise to some dispute. Discussions were complicated by deviating terminology, as well as the lack of precise and exhaustive definitions;
*) problems became apparent when linking the two systems with respect to product specifications. Firstly, coding and names appeared to differ, so that translation required additional efforts. Secondly, for a number of items there appeared to be a 1 : n or even m : n relation between business accounts and statistical nomenclatures;
*) it was, of course, not a surprise that not all of the data items showing in the paper questionnaires could be taken explicitly or implicitly (i.e. by translation) from business accounts. Examples were number of employees, and volume of energy use;
*) with the exception of pay roll accounts, book keeping systems proved to be poorly standardized, both technologically and conceptually;
*) moreover, pay roll accounts proved to be held rather independently from the other sub-accounts; both conceptually and physically there were only weak ties with the other sub-accounts as listed above.
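The 1 : n and m : n relations found for product specifications can be detected mechanically once the links between business product codes and nomenclature codes are listed. The sketch below labels each link by whether its account side and its nomenclature side take part in more than one link. The codes are hypothetical and the function is only an illustration of the idea.

```python
from collections import defaultdict


def classify_links(pairs):
    """Label each (account_code, nomenclature_code) link as
    '1:1', '1:n', 'n:1' or 'm:n'."""
    targets = defaultdict(set)   # account code -> nomenclature codes
    sources = defaultdict(set)   # nomenclature code -> account codes
    for account, nomenclature in pairs:
        targets[account].add(nomenclature)
        sources[nomenclature].add(account)

    labels = {}
    for account, nomenclature in pairs:
        splits = len(targets[account]) > 1       # account feeds several codes
        merges = len(sources[nomenclature]) > 1  # code fed by several accounts
        if splits and merges:
            labels[(account, nomenclature)] = "m:n"
        elif splits:
            labels[(account, nomenclature)] = "1:n"
        elif merges:
            labels[(account, nomenclature)] = "n:1"
        else:
            labels[(account, nomenclature)] = "1:1"
    return labels


links = [
    ("P1", "N10"),                                  # straightforward
    ("P2", "N20"), ("P2", "N21"),                   # one account, two codes
    ("P3", "N30"), ("P4", "N30"),                   # two accounts, one code
    ("P5", "N40"), ("P5", "N41"), ("P6", "N40"),    # entangled
]
labels = classify_links(links)
```

Links labeled 'n:1' can be translated by simple summation; '1:n' and 'm:n' links are the ones that need additional information, such as a breakdown key, before translation is possible.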
The two latter findings led to the decision to leave the labor surveys out of the list of surveys to be integrated. For labor surveys, a separate EDI project was designed. This project, which can be labeled as level B2/3 (a standard translation module, preferably directed to book keeping offices), is now in the stage of introduction with respondents.
Although the small pilot gave positive findings, it was of course too early to actually introduce the EDI-concept for the whole range of businesses. Apart from its small scale, there were two more important restrictions. In the first place, the translation module was fully implemented by staff of the statistical office, while eventually the job must be done by the respondent himself. Secondly, the combi-questionnaire was based on the data needs of only part of the surveys to be integrated. In particular the statistics on balance sheets and profits and foreign trade were missing. Therefore, more testing with a broader scope was considered necessary.
3. Second phase: a broader pilot survey
At the moment Statistics Netherlands is preparing for a second pilot, to be held among 350 businesses in manufacturing, trade and services industry. Willingness to cooperate and availability of automated financial accounts are obvious selection criteria. The pilot is preceded by a field test among 22 businesses, in order to test the usefulness of the questionnaire and of the software developed for the translation module. The translation rules are to be implemented by the respondents, with full assistance on the spot by field staff. The findings will be used to introduce improvements before the real pilot survey starts.
The main purpose of this pilot, which will be held next September, is to find out to what extent respondents are able and willing to implement their translation module on their own. This does not alter the fact that a help desk is needed, providing telephone assistance on request. A basic element in Statistics Netherlands’ EDI policy is that eventually respondents are assumed to take care of the initial implementation as well as the maintenance of the translation rules themselves. The reason for this policy is that accounting systems are not standardized, and statistical office staff cannot be expected to be familiar with individual accounting systems. This policy implies that, as long as accounting systems are not standardized, response burden for the business surveys involved in the project will not completely fall back to zero.
4. A set of tools
In order to prepare for the second pilot survey, a set of tools is presently being built, which comprises:
*) the software for the translation module;
*) a system of questionnaires, so as to tune the contents maximally to business profiles;
*) a meta data base of input data items;
*) a help desk to assist respondents with implementation and maintenance of the translation rules.
Each of these tools is briefly discussed below.
The software for the translation module
This tool is primarily of a technical nature. Therefore we confine ourselves to a few remarks. The user friendliness of the software module is of crucial importance when it comes to the acceptability of EDI by the respondent. We are well aware that he should not have any trouble at all in understanding and accessing this tool, when starting to implement the translation rules. A first version has recently been developed and tested, with positive results.
A system of questionnaires
The wish to comply as much as possible with the business accounting system ánd the fact that different (categories of) respondents use different systems led to the decision to develop a variety of questionnaire types. Indeed, the more tailor-made the questionnaire, the less complex the translation rules can be. In the present stage of the project, about 20 more or less homogeneous respondent groups are recognized, and consequently 20 different questionnaires are being developed. In delineating the categories, kind of activity (SIC) and size (employees) were major criteria. A further, rather specific criterion was the organization of the financial accounting within a business. For those businesses with physically deconcentrated sub-accounting systems, the combi-questionnaire must be split up accordingly. Thus, while on the one hand EDI forces the integration of questionnaires drawing on the financial accounting systems, on the other hand some deconcentration applies for businesses holding separate sub-accounts. In any case, the reshuffling will result in a better tuning of surveying strategy to the organization of respondents.
While terminology varies among the 20 questionnaires, it is very important that they refer to identical concepts. After all, coherence of statistics is one of Statistics Netherlands’ major strategic issues. To ensure consistency in this respect, all questionnaires derive their data items (names, definitions and explanatory notes) from one centrally maintained meta data base. It is not allowed to apply concepts, terms or definitions which are not specified as such in this data base.
A central meta data base of input concepts
The data base contains 700 data items, to be used in the 20 questionnaires. Because it relates to questionnaire concepts and not to publication concepts, it is in fact an input data base. Each concept is laid down in a term, an (operational) definition, a question text and explanatory notes. A particular concept is expressed in as many different terms, definitions, texts and notes as there are questionnaire types using it. Next to these characteristics, also relations between different concepts are recorded. In most cases these relations are of an arithmetic nature.
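The structure described here, one concept with a separate wording per questionnaire type plus arithmetic relations between concepts, could be represented along the following lines. This is a sketch under assumptions: the class and field names are hypothetical, and only sum-type relations are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Wording:
    """How one concept is presented in one questionnaire type."""
    term: str
    definition: str
    question_text: str
    notes: str = ""


@dataclass
class Concept:
    """One input concept, worded separately per questionnaire type."""
    concept_id: str
    wordings: dict = field(default_factory=dict)  # questionnaire type -> Wording


@dataclass
class SumRelation:
    """Arithmetic relation: the total equals the sum of its parts."""
    total: str
    parts: tuple

    def holds(self, values: dict, tolerance: float = 0.5) -> bool:
        """Check the relation on reported values, allowing for rounding."""
        expected = sum(values[p] for p in self.parts)
        return abs(values[self.total] - expected) <= tolerance


wages = Concept("wages")
wages.wordings["manufacturing_small"] = Wording(
    term="Gross wages",
    definition="Hypothetical definition in book keeping terms.",
    question_text="What were the gross wages in the book year?",
)
relation = SumRelation("labour_costs", ("wages", "social_charges"))
consistent = relation.holds(
    {"labour_costs": 300.0, "wages": 250.0, "social_charges": 50.0})
```

Recording the relations alongside the concepts means the same check can be applied to every questionnaire type that uses those concepts.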
In the database all data items must be listed, defined and explained in such a way that the respondent can fully understand the meaning of the questions asked. There is "full understanding" when the respondent can unambiguously connect the items questioned to his own financial account items, in order to successfully perform the translation. Therefore, definitions of questionnaire items should as much as possible be expressed in terms of book keeping items, so that no room is left for different interpretations.
Eventually, the meta data base of input concepts can be said to serve three goals, two of which refer in some way to consistency:
*) it is the central information source from which questions, definitions and explanatory notes can be selected; as such it supplies the respondents with the information needed to correctly define the translation rules;
*) by defining concepts consistently, the questionnaires selected are mutually consistent as well;
*) by linking the (questionnaire) concepts to those of a corresponding output meta data base, there is an unambiguous relation between questionnaire items and publication items.
A help-desk and a field service
As mentioned earlier, in The Netherlands the state of standardization of business accounting practices is (still) poor. For this reason alone, it will be difficult to achieve in all cases full understanding in the above mentioned sense. Therefore, a (telephone) help desk will be established to assist respondents when they have problems with implementation of the initial translation rules. For more serious problems, field service staff is available. As traditional field staff is primarily experienced in accounting and not so much in automation, special training courses have been set up.
The extent to which the help desk will be consulted can be considered a measure to determine the feasibility of data collection by EDI. As mentioned before, we hold the assumption that respondents should be able to implement and maintain the translation rules without substantial help. The second pilot will reveal to what extent this assumption is realistic.
5. First experiences
Filling the input meta data base
When filling the meta data base it would appear obvious to start from the various paper forms as a reference for discussion. However, in order to avoid laborious discussions between surveying staff and to suppress the temptation of clinging to traditional, often inconsistent and redundant concepts, a fresher approach was preferred. The discussions took place on the basis of new concepts and definitions, drafted by the Standards and Methods division. These proposals were decided on in reference groups, consisting of the surveying staff involved and of National Accounts specialists. The discussions in these groups also provided an excellent opportunity to test the applicability of the concepts of the new European standard for National Accounts (ESA ’95).
When developing questionnaire concepts and definitions to be stored in the meta data base, correspondence to accounting concepts was obviously the prevailing criterion. This does, of course, not mean that user needs were ignored. However, only when - after ample discussion - it became clear that these needs could not be reduced to the level of accounting concepts, or be calculated or estimated on the basis of such concepts, was it accepted to define a concept independently of accounting counterparts. The latter implies that such a data item either has to be filled in by manual data entry on the electronic questionnaire (in Section B indicated as semi-EDI), or must be removed from the EDI project. Up till now, the number of items not suitable for edification is fairly small when compared to the 700 fitting concepts. The "outcasts" mainly refer to non parametric variables, such as economic activity and book year. Energy use is a major example of a parametric variable which has been deleted from the project.
Impact of EDI on statistical concepts
The results of the discussions and negotiations in the reference group give a first indication with respect to the actual impact of edification on statistical concepts:
*) a number of (slightly) deviating concepts were standardized, in particular as a result of the confrontation and integration of data items from production statistics with data items from financial statistics. Major examples are turnover and breakdown of wages;
*) a number of duplications were eliminated, such as number of employees;
*) user needs were adapted to a certain extent, in particular with respect to national accounts concepts. An example is "wages in kind". It appeared impossible to collect or calculate data according to the ESA definition, so that National Accounts modified their concepts;
*) in particular with respect to product specifications, breakdowns in business accounting are often less detailed than or different from those in statistical nomenclatures. Traditionally, it was up to the respondent to estimate the missing detail. Edification of the questionnaire will shift this work to the statistician. For example, most bakeries cannot break down their turnover as required by PRODCOM. Before edification the respondents had to guess; after edification statisticians will calculate the data on the basis of input specifications;
*) the opposite also occurs: in such cases edification makes it possible to increase the detail. The rubber and plastics industry appeared to record its products at a far more detailed level than originally asked for on the paper questionnaire. We are, however, cautious in making use of such opportunities, in order not to suggest that we (mis-)use EDI to enlarge data collection;
*) in a number of cases it proved possible to bridge dissimilarities between accounting systems and statistical concepts by translation rules. Examples are the conversion of fob to cif basis of foreign trade transactions and the calculation of volume data from value data by means of unit values, to be determined by the respondent.
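The two translation rules mentioned in the last item are simple arithmetic and can be sketched as follows. The cif value is the fob value plus the freight and insurance costs of the transport concerned; the volume is the value divided by a unit value supplied by the respondent. The function names and the figures in the usage note are hypothetical.

```python
def fob_to_cif(fob_value: float, freight: float, insurance: float) -> float:
    """Convert a trade value from fob (free on board) to cif
    (cost, insurance and freight) basis."""
    return fob_value + freight + insurance


def volume_from_value(value: float, unit_value: float) -> float:
    """Derive a volume (quantity) from a value, using the unit value
    (value per unit of quantity) determined by the respondent."""
    if unit_value <= 0:
        raise ValueError("unit value must be positive")
    return value / unit_value
```

For instance, a shipment of 1000.0 fob with 80.0 freight and 20.0 insurance gives `fob_to_cif(1000.0, 80.0, 20.0) == 1100.0`, and a turnover of 5000.0 at a unit value of 2.5 per kilogram gives a volume of 2000.0 kilograms.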
Of course, the solutions and decisions still have to prove their worth in practice. At the time of writing the first electronic questionnaires for the field test have been returned. Both speed and quality are encouraging.
6. Prospects
As pointed out before, different edification strategies apply for different categories of the business population. For the smaller respondents, we expect to move from direct collection to administrative registers in 1997. For the large and complex businesses, the speed of transition eventually depends on the willingness of individual respondents. The introduction starts in 1997 as well.
If the pilot, to be held next autumn among 350 middle sized businesses, proves to be successful, the electronic data collection will be introduced from January 1997. Preparations have been started for the establishment of a new surveying unit within the Data Collection and Coordination Division, where the data collection for a number of different statistics can be concentrated. It is still under consideration whether the paper data collection for these surveys will also move to this unit. The total introduction period is planned to amount to five years. Of course, respondents will always remain free in their choice between paper and electronic reporting.
Summary and conclusions
Statistics Netherlands considers EDI for business statistics a must. Therefore, the question is not whether it will be introduced, but how and at which speed. Whereas drastic reduction of response burden is the first and foremost objective, the business world is not the only party to benefit. The direct or indirect electronic linking of the business accounting world and the statistical world will also have positive effects on the processing of data and on the quality of the output. The statistical office can increase efficiency and productivity, while users will be served better with more timely and more consistent statistical information.
In principle, every business survey should be considered with respect to its possibilities for edification. Initial incompatibility of questionnaire items with accounting concepts is not a sufficient reason to abstain from edification: user needs can be reconsidered and remaining dissimilarities can be bridged by translation rules, both to link accounting and questionnaire concepts, and to link questionnaire and publication concepts. Only if neither of these solutions appears to apply may ongoing paper surveying be considered.
Statistics Netherlands is presently running a cluster of projects, aiming at the simultaneous edification of seven business surveys. Experiences so far are positive enough to justify the expectation that from 1997 EDI can be introduced on a large scale.