Adding value to statistical data production through machine learning: potential use of machine learning for official statistics
Tuesday, 28 July 2020, 9:00-10:15 AM (EDT)
Digital techniques, such as machine learning (ML), have the potential to add value to statistical data production by not only increasing efficiency in statistical business processes but also allowing the use of new data sources.
In the production of official statistics, ML techniques could be used for imputation of missing data, reweighting, prediction, and/or calibration to standard classifications. For example, sources of sampling frames, such as administrative data, censuses and other surveys, may be combined through record linkage processes that capitalize on clustering algorithms to improve the quality of design information on the frame. ML methods, such as regression algorithms, can also be used to predict the probability of response for individual units using information available for the entire sample, so that data collection efforts can be managed efficiently. These techniques offer many opportunities to improve efficiency in producing official statistics.
National Statistics Offices are also expanding their use of non-traditional sources, such as social media data and mobility data. However, the use of these big data sources comes with challenges, as they may not meet the requirements of traditional quality frameworks. ML techniques, such as Deep Learning, can be used to tackle this problem, but should be used with caution since these methods can introduce biases due to their low sensitivity to outliers and erroneous data compared with classical statistical methods.
This webinar will:
- Showcase examples of ML methods to improve the efficiency of statistical data production;
- Highlight the use of ML techniques to expand the use of big data sources in official statistics; and
- Present challenges in using these techniques.
This webinar is part of the UN World Data Forum series and highlights the use of non-traditional methods and techniques to add value to official statistics.
Moderator: Steven Vale, Regional Adviser in Statistics, United Nations Economic Commission for Europe (UNECE)
Panelist: Jenny Pocknee, Principal Data Scientist, Methodology Transformation Branch, Methodology Division, Australian Bureau of Statistics
Panelist: Alejandro Ruiz, Researcher, National Institute of Statistics and Geography (INEGI)