B.  Imputation for filling data gaps and for data editing purposes

17.8.         Data gaps may arise in preliminary stages of compilation for various reasons, including non-responses from establishment surveys, a lack of timely reporting or missing entries in secondary data sources. Compilers should consider taking the following steps to identify data gaps, missing replies, and suspect outliers, and impute data:

(a) The first step is to check and load data from primary and secondary sources. The data should be standardized to fixed formats in advance, in separate processes. When data are loaded, automatic checks should be run on codes, the completeness of required fields and value ranges. For primary source data, that is, directly reported data, only the incorrect records should be rejected and the remaining records loaded. For secondary data sources, that is, data that are not submitted directly by the reporter (including administrative sources), the complete file should not be loaded until it has been reviewed and fully corrected;

(b) The second step in imputing data gaps and missing replies is the integration and processing of all data. After the data files have been checked, corrected and loaded, as described above, the data should be tabulated and processed. Data that are missing or otherwise not reported by survey respondents can be imputed using statistical procedures such as the mean value reported by respondents with similar characteristics or the average change for such variables from the prior period. Imputation procedures are especially important for universe (benchmark) surveys that are designed to provide results for the entire population and that form the basis of future annual or quarterly surveys.  In some cases, missing values, if not provided by respondents during regular follow-up procedures, can be obtained from financial statements and commercial databases;

(c) The third step is to check the data, which involves data analysis. As part of the analysis, significant increases and decreases in time series, e.g., of imports or exports, also as a share of net exports, of a particular service or with a particular country (or group of countries), are examined. The analysis should be done step by step, using a top-down approach, that is, starting from total services with the partner and moving to more detailed levels of imports, exports and net figures. The aim of such macroediting is to focus on suspicious values that influence publication totals, which also leads to gains in efficiency. The analysis is also used to trace a dubious enterprise or source whose data may need to be edited and possibly adjusted. A similar approach can be used for other statistics on the international supply of services;

(d) The fourth step is the editing of primary and secondary sources. After an enterprise or other source with dubious data has been traced, the unit or source should be assessed by the editing team and, if deemed necessary, contacted by phone or e-mail so that the data can be edited or adjusted. Whether that is done depends on the issue identified and the result of the investigation. Various methods can be used to adjust or edit the data, for example, using an average of previous periods or growth rates derived from available data.
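The two imputation procedures mentioned in step (b), the mean value reported by respondents with similar characteristics and the average change from the prior period, can be sketched as follows. This is a minimal illustration in Python and not a prescribed implementation: the function names, the record layout (`unit`, `industry`, `exports`) and the use of a ratio-of-totals growth factor to represent the "average change" are all assumptions made for the example.

```python
from statistics import mean


def impute_group_mean(records, group_key, value_key):
    """Fill missing values with the mean reported by respondents in the same group.

    `records` is a list of dicts; a missing value is represented by None.
    Groups with no reporting respondent are left unfilled.
    """
    reported = {}
    for rec in records:
        if rec[value_key] is not None:
            reported.setdefault(rec[group_key], []).append(rec[value_key])

    result = []
    for rec in records:
        filled = dict(rec)
        filled["imputed"] = False  # flag imputed values so they stay traceable
        if rec[value_key] is None and rec[group_key] in reported:
            filled[value_key] = mean(reported[rec[group_key]])
            filled["imputed"] = True
        result.append(filled)
    return result


def impute_average_change(previous, current):
    """Fill missing current-period values from the prior period.

    Assumption: the "average change" is the growth factor of the totals of
    units that reported in both periods. `previous` and `current` map a unit
    identifier to a value (None when missing).
    """
    matched = [u for u, v in current.items()
               if v is not None and previous.get(u) is not None]
    prev_total = sum(previous[u] for u in matched)
    if not matched or prev_total == 0:
        return dict(current)  # no basis for a growth factor; leave the gaps

    growth = sum(current[u] for u in matched) / prev_total
    filled = {}
    for unit, value in current.items():
        if value is None and previous.get(unit) is not None:
            filled[unit] = previous[unit] * growth
        else:
            filled[unit] = value
    return filled


if __name__ == "__main__":
    # Hypothetical survey records, for illustration only.
    survey = [
        {"unit": "A", "industry": "transport", "exports": 100.0},
        {"unit": "B", "industry": "transport", "exports": 140.0},
        {"unit": "C", "industry": "transport", "exports": None},
    ]
    print(impute_group_mean(survey, "industry", "exports"))
    print(impute_average_change({"A": 100.0, "B": 200.0, "C": 50.0},
                                {"A": 110.0, "B": 220.0, "C": None}))
```

Flagging each imputed value, as done here with the `imputed` field, keeps imputed data distinguishable from reported data in the later checking and editing steps.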

17.9.        Another issue related to filling data gaps is the application of a threshold for reporting/collecting data. In many ITRS, thresholds are established for reporting transactions to prevent undue reporting burdens and processing costs. As a result, a considerable number of transactions, especially small-value service transactions, may be absent from the data.

17.10.        Different approaches to dealing with transactions falling below the threshold could be applied, such as carrying out a small, ad hoc sample survey, or using such other sources as credit card data. Also, estimation can be carried out through an analysis of small transactions that occurred before the threshold was raised, or through empirical research.

17.11.        For example, as described in the country experiences of chapter 8, research conducted in Japan suggests that the frequency of transactions increases exponentially as the value of transactions decreases and that, statistically, a Pareto distribution fits the data well.[1] Compared to using information on transactions from before a threshold increase, that approach to estimating below-threshold transactions offers the advantage that the estimates can be updated periodically with recent data. Further, statisticians are able to choose another statistical approach if the implemented method does not fit the data well.
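To illustrate how a Pareto distribution could be used to estimate below-threshold transactions, the sketch below fits the Pareto shape parameter to transactions reported above the threshold and extrapolates the fitted distribution down to a lower bound. It is not a description of the method used in Japan, which is set out in chapter 8. The function names, the maximum-likelihood estimator for the shape parameter and the `lower_bound` parameter (the smallest transaction value to which the Pareto distribution is assumed to extend) are assumptions made for the example.

```python
import math


def fit_pareto_shape(values, threshold):
    """Maximum-likelihood estimate of the Pareto shape parameter (alpha),
    using only transactions at or above the reporting threshold."""
    logs = [math.log(v / threshold) for v in values if v >= threshold]
    total = sum(logs)
    if not logs or total <= 0:
        raise ValueError("need transactions strictly above the threshold")
    return len(logs) / total


def estimate_below_threshold(values, threshold, lower_bound):
    """Extrapolate the fitted Pareto distribution to the range
    [lower_bound, threshold) and return (alpha, count, total value).

    Assumption: the same Pareto distribution holds below the threshold,
    down to `lower_bound`.
    """
    if not 0 < lower_bound < threshold:
        raise ValueError("lower_bound must lie between 0 and the threshold")
    n_above = sum(1 for v in values if v >= threshold)
    alpha = fit_pareto_shape(values, threshold)

    # Number of transactions exceeding x scales as (x / threshold) ** -alpha.
    count = n_above * ((lower_bound / threshold) ** (-alpha) - 1.0)

    # Total value: integral of x times the fitted density over the range.
    if math.isclose(alpha, 1.0):
        value = n_above * threshold * math.log(threshold / lower_bound)
    else:
        value = (n_above * alpha * threshold ** alpha
                 * (threshold ** (1 - alpha) - lower_bound ** (1 - alpha))
                 / (1 - alpha))
    return alpha, count, value


if __name__ == "__main__":
    # Hypothetical reported transaction values, for illustration only.
    reported = [120.0, 150.0, 210.0, 340.0, 900.0, 105.0, 130.0, 480.0]
    alpha, count, value = estimate_below_threshold(reported, 100.0, 10.0)
    print(f"alpha={alpha:.2f}, count={count:.0f}, value={value:.0f}")
```

The result is sensitive to the chosen lower bound and to how well the Pareto assumption holds below the threshold, so a check of the fit against any available small-transaction data would be advisable before such estimates are used.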


Country experience: Netherlands (Chapter 17)



[1] Japan will employ statistical methods starting with the implementation of BPM6. See chap. 8, paras. 8.25-8.29, of the present Compiler’s Guide for the full description of Japan’s country experience with its ITRS.