Purpose

A cloud-based data system where Statistics Korea, leveraging its unique database assets such as the integrated statistical register, uses state-of-the-art cryptography to link data dispersed across government departments and public institutions.

Datasets

Various kinds of data. Examples include statistical registers held by Statistics Korea.

PETs used

Homomorphic Encryption, Secure Multi Party Computation, Differential Privacy

Application

Multiple applications envisioned

Details of computation

In the pilot project, two datasets are linked in their encrypted state, and two computations are performed on the linked data: the generation of descriptive aggregate statistics related to shop space and turnover, and a logistic regression to model the impact of the Covid-19 pandemic on small businesses.

Parties and trust relationship

Central and local governments, as well as private and public companies and academic institutions (input); individuals, private companies, public organisations (output). In general, no trust relationship among the parties is assumed. However, the same entity can play multiple roles, e.g. as both input and output party.

Implementation status

Pilot

Resources


Background

Statistics Korea is promoting the establishment of a public big data system that leverages cutting-edge privacy-preserving techniques to enable the safe linkage and use of scattered governmental data. Enabling safe access to linked, high-quality, large-scale datasets can drive innovation that enhances both economies of scope and scale. Accordingly, Statistics Korea aims to maximise the potential value of data by facilitating the linkage between governmental data through statistical registers on population, households, and establishments.

To achieve this, Statistics Korea is promoting the development of privacy-preserving techniques such as homomorphic encryption, differential privacy and synthetic data through national R&D projects. These techniques will be used to construct the Statistical Data Hub Platform between 2021 and 2024, in cooperation with the Ministry of Science and ICT. The development of the Statistical Data Hub Platform also aims to incentivise academia and industry to advance and commercialise these technologies.

Case Study description

As a pilot project, Statistics Korea linked its statistical business register with small business information from Gyung-Gi Province and analysed the linked data in its encrypted state to confirm the practicality of the Statistical Data Hub Platform. The two datasets share the following fields: name of establishment, corporation registration number, and administrative district code. By linking the data in its encrypted state, Statistics Korea confirmed that data can be combined and used without exposing sensitive information, thereby validating a critical premise of the Statistical Data Hub Platform.

Additionally, Statistics Korea tested the accuracy and efficiency of statistical analysis performed on homomorphically encrypted data. Firstly, a descriptive statistical analysis was conducted on turnover and shop space information from encrypted linked data. No difference in accuracy was found between the results of the plaintext and ciphertext analyses, and the ciphertext analysis was efficient in terms of operation time and storage space.
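
To illustrate how a descriptive statistic can be computed without decrypting individual records, the following is a minimal sketch using a toy Paillier cryptosystem, which is additively homomorphic. This is an assumption made for illustration only: the source does not state which homomorphic encryption scheme the pilot used, the key size here is far too small to be secure, and the turnover figures are made up.

```python
import math
import secrets

# Toy key material: two known Mersenne primes. Far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # decryption constant for generator g = N + 1 (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N under the toy public key."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # With g = N + 1, g**m mod N^2 simplifies to 1 + m*N.
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Recover the plaintext with the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying two ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2


# Made-up turnover figures for four shops (illustration only).
turnovers = [120, 340, 95, 410]
ciphertexts = [encrypt(t) for t in turnovers]

# The party holding only ciphertexts accumulates the encrypted total.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(encrypted_total, c)

# Only the key holder decrypts, and only the aggregate.
total = decrypt(encrypted_total)
mean = total / len(turnovers)
print(total, mean)
```

In this sketch the decrypted total matches the plaintext sum exactly, which is the kind of plaintext versus ciphertext comparison described above. Schemes that also support multiplication of ciphertexts would be needed for statistics such as variance.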

Secondly, a logistic regression was performed on the encrypted linked data. The independent variables related to the shop’s turnover, shop space, and type of business district, amongst others (see Table X for full list of variables). These were used to predict a dependent variable representing the survival of the establishment. An approximating function was derived for the logistic regression, because such an approximation is necessary to perform logistic regression analysis on homomorphically encrypted data.
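
One common way to obtain such an approximating function is to replace the sigmoid with a low-degree polynomial, since homomorphic encryption schemes natively support additions and multiplications but not exponentials or divisions. The sketch below fits a cubic by least squares. The degree, the interval [-8, 8] and the fitting method are assumptions chosen for illustration; the source does not say which approximation the pilot actually used.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_cubic_sigmoid(bound: float = 8.0, samples: int = 2001):
    """Least-squares fit of sigmoid(z) ~ 0.5 + a*z + b*z**3 on [-bound, bound]."""
    zs = [-bound + 2 * bound * i / (samples - 1) for i in range(samples)]
    # sigmoid(z) - 0.5 is an odd function, so only odd powers are needed.
    ys = [sigmoid(z) - 0.5 for z in zs]
    s2 = sum(z**2 for z in zs)
    s4 = sum(z**4 for z in zs)
    s6 = sum(z**6 for z in zs)
    t1 = sum(z * y for z, y in zip(zs, ys))
    t3 = sum(z**3 * y for z, y in zip(zs, ys))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s2 * s6 - s4 * s4
    a = (t1 * s6 - t3 * s4) / det
    b = (s2 * t3 - s4 * t1) / det
    return a, b


def approx_sigmoid(z: float, a: float, b: float) -> float:
    """Polynomial stand-in for the sigmoid: additions and multiplications only."""
    return 0.5 + a * z + b * z**3


a, b = fit_cubic_sigmoid()
grid = [i / 10 for i in range(-80, 81)]
max_error = max(abs(approx_sigmoid(z, a, b) - sigmoid(z)) for z in grid)
print(f"a={a:.5f}, b={b:.6f}, max abs error on [-8, 8]: {max_error:.3f}")
```

The approximation is only reasonable inside the fitted interval, so inputs would need to be scaled to stay within it. A higher degree reduces the error at the cost of more multiplications on encrypted data.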

A comparison of the logistic regression results obtained on encrypted data with the corresponding plaintext results verified the accuracy and efficiency of the analysis run on encrypted data.

Outcomes and lessons learned

Statistics Korea’s pilot project produced accurate results for descriptive statistics and logistic regression performed on homomorphically encrypted data. The project has also demonstrated the potential of homomorphic encryption to facilitate data cooperation between government agencies that do not necessarily have a trusted relationship.

We expect that if data linkage is encouraged through the Statistical Data Hub Platform, it will enable the production of high-quality statistics and analyses using pension, childcare, and employment data, amongst others. This could help enable timely, informed, and effective responses to important national issues. 

