Purpose | Enable multiple national statistical offices (NSOs) to perform reconciliation and joint analysis on independently collected trade datasets |
Datasets | The datasets involved were originally from the UN Comtrade Datasets and are now being extended to integrate third-party data sources |
PETs used | Differential Privacy, Secure Enclaves, Secure Multi-Party Computation |
Details of computation | Each NSO maintains an independent record of their international trade, such as imports, exports, re-imports, re-exports, and so on, at varying levels of granularity. The computations aimed to identify erroneous recordings of trade between pairs of countries and enable broader international trade analysis. |
Parties and trust relationship | Multiple input parties (each NSO involved, including Statistics Canada, US Census, UK ONS, Statistics Netherlands, ISTAT Italy) with shared outputs. There is no assumption of trust between parties. |
Implementation status | Proof of concept (ongoing) |
Resources |
Background
International trade information is an important data source used to better understand the flow of commodities in and out of each country, measure the level of competitiveness of a country, and track economic growth. However, these figures have been typically tracked and maintained at a national level, and as such ambiguities and errors can appear when comparing recorded levels of trade on an international level.
The United Nations has long helped address these challenges, in particular via the Comtrade portal, which publicly shares international trade statistics of each country on a monthly basis following the Harmonised System of commodity categorization (H2 through H6). Comtrade data provides each country’s imports, exports, re-imports, and re-exports. As such, each pair of countries has a matching pair of records in opposite directions. For example, the United States will have recorded its exports of maize to Canada, while Canada will have recorded its imports of maize from the United States. In theory, these numbers should match, although there are a number of reasons why they might not. Through-trade is one such reason: a country receives goods as an intermediate stop rather than as their final destination. However, even when through-trade is taken into account, disparities can still occur.
Ultimately, a better understanding of global trade can be immensely beneficial for understanding global economic development and globalization, enforcing trade restrictions, and reducing global money laundering, to name but a few examples.
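The mirror comparison described above can be illustrated with a minimal Python sketch. The record layout, the figures, the function name `mirror_discrepancies`, and the 5% tolerance are all hypothetical choices made for illustration; they are not taken from Comtrade or from the project.

```python
from typing import Dict, Tuple

# Key: (reporting country, partner country, commodity). Value: trade value.
TradeRecords = Dict[Tuple[str, str, str], float]

# Hypothetical figures, invented purely for illustration.
exports: TradeRecords = {("USA", "CAN", "maize"): 1_000_000.0}
imports: TradeRecords = {("CAN", "USA", "maize"): 940_000.0}


def mirror_discrepancies(
    exports: TradeRecords, imports: TradeRecords, tolerance: float = 0.05
) -> Dict[Tuple[str, str, str], float]:
    """Return the relative gap for each export/import pair exceeding `tolerance`."""
    flagged = {}
    for (exporter, importer, commodity), exported in exports.items():
        # The mirror record is the importer's report about the same flow.
        imported = imports.get((importer, exporter, commodity))
        if imported is None:
            continue  # no mirror record to compare against
        baseline = max(exported, imported)
        gap = abs(exported - imported) / baseline if baseline else 0.0
        if gap > tolerance:
            flagged[(exporter, importer, commodity)] = gap
    return flagged


flagged = mirror_discrepancies(exports, imports)
```

With these invented figures, the maize flow is flagged because the two records differ by more than the chosen tolerance. A real reconciliation would also need to account for valuation differences and through-trade before treating a gap as an error.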
Case Study description
The goal of the project was to use privacy-enhancing technologies to share more granular information between countries and to enable the linkage of additional heterogeneous data sources. Comtrade data and the Harmonised System of classification undergo classical data disclosure controls prior to publication, which limits the ability of analysts to understand where and why disparities occur.
The project started with safe, publicly available data from Comtrade, with the intention that more sensitive data could be included as systems to connect and analyze the data emerged. This balanced the goal of proving the usefulness of PETs for securely linking and analyzing trade data against the need to reduce project risk and minimize the time needed to begin experimentation.
A secondary, but important, goal of the project was to understand the pros and cons of different privacy-enhancing technologies that can be used in such a setting. The experiments used two very different architectural approaches to address the problem.
The first approach used secure multi-party computation (sMPC) and differential privacy via a peer-to-peer federated data network, provided by OpenMined. In this setting, each party involved set up a node that housed their sensitive data and made requests to calculate the total value of the goods traded (imported/exported) across all the parties involved, without any party having to disclose the amount of any particular good imported or exported. The data queries did not require manual approval from a data compliance officer from any of the parties involved.
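One common way to compute a joint total without revealing individual inputs is additive secret sharing. The sketch below illustrates that general idea only. It is not the PySyft API, and the source does not state which sMPC protocol was used; the names `share` and `secure_total`, the modulus, and the example values are assumptions made for illustration.

```python
import secrets

# Illustrative modulus (an assumption); it must exceed any possible total.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list:
    """Split `value` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_total(private_values: list) -> int:
    """Simulate all parties in one process to show the arithmetic."""
    n = len(private_values)
    # Row i holds the shares created by party i; column j is what party j receives.
    distributed = [share(v, n) for v in private_values]
    # Each party publishes only the sum of the shares it received.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Combining the partial sums reveals the total but no individual input.
    return sum(partial_sums) % MODULUS


total = secure_total([120, 75, 310])  # hypothetical trade values
```

In a real deployment each party would run on its own node and exchange shares over the network. The single-process simulation here only demonstrates why the partial sums reconstruct the total.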
The second approach used enclave technology in combination with differential privacy. Each party was able to connect to the enclave via a secure proxy on their local device. The proxy made an initial handshake with the enclave, authenticating the client and receiving the attestation document of the enclave, which in turn guarantees the software running inside the enclave. Through this handshake, a symmetric key was also shared between the client and the enclave, enabling secure two-way communication thereafter.
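The client-side check at the heart of such a handshake can be sketched in a heavily simplified form. This toy is not the Oblivious implementation or any vendor's attestation format: the document fields, the function names, and the use of SHA-384 are assumptions. A real attestation document is signed by the hardware vendor, and both that signature check and the key exchange are omitted here.

```python
import hashlib
import hmac
import secrets


def measure(code: bytes) -> str:
    """Hash of the enclave software, standing in for a real measurement."""
    return hashlib.sha384(code).hexdigest()


def enclave_attest(running_code: bytes, nonce: str) -> dict:
    """Toy stand-in for the enclave producing an attestation document."""
    # A real document would be signed by the hardware vendor (omitted here).
    return {"measurement": measure(running_code), "nonce": nonce}


def client_accepts(document: dict, approved_measurement: str, nonce: str) -> bool:
    """Accept only a fresh document that reports the pre-approved software."""
    fresh = hmac.compare_digest(document["nonce"], nonce)
    expected = hmac.compare_digest(document["measurement"], approved_measurement)
    return fresh and expected


approved = measure(b"agreed analysis software")  # hypothetical code image
nonce = secrets.token_hex(16)  # a fresh value prevents replay of old documents
accepted = client_accepts(
    enclave_attest(b"agreed analysis software", nonce), approved, nonce
)
```

The point of the sketch is that the client refuses to proceed unless the reported measurement matches software that the parties agreed on beforehand.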
While the enclave-based sMPC framework, provided by Oblivious, is reusable and highly generic, the software running inside the enclave was written especially for this case study. There were two core parts to the software. The first was the data science element which joined data from each party and applied various forms of aggregate queries. To ensure no low-level information was shared between parties, differential privacy was applied to outputs. Specifically, this was done in collaboration with the OpenDP (Harvard) and SmartNoise (Microsoft & Harvard) projects.
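As an illustration of applying differential privacy to an aggregate output, the sketch below adds Laplace noise to a clipped sum. It is a textbook version of the Laplace mechanism written with the Python standard library only. It does not use the OpenDP or SmartNoise APIs, and the function names, bounds, and epsilon values are assumptions. Naive floating-point sampling of this kind has known weaknesses, so production use should rely on a vetted library.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_sum(values, lower: float, upper: float, epsilon: float, rng=None) -> float:
    """Return a noisy sum of `values` after clipping each to [lower, upper]."""
    rng = rng or random.SystemRandom()
    # Clipping bounds how much any single record can change the sum.
    clipped = [min(max(v, lower), upper) for v in values]
    # Sensitivity when one record is added or removed.
    sensitivity = max(abs(lower), abs(upper))
    return sum(clipped) + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical trade values; a smaller epsilon gives more noise and more privacy.
noisy_total = dp_sum([10, 20, 30], lower=0, upper=100, epsilon=1.0)
```

The clipping bounds and epsilon would have to be agreed between the parties in advance, since together they determine the trade-off between accuracy and privacy.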
The second element of the software packaged the outputs into a formal PDF report for the purpose of upstream sharing. To ensure that the PDF would not be modified throughout its life cycle, it was digitally self-signed from within the enclave and the public key of the signature was embedded into an attestation document from the enclave and used to watermark the document. This encapsulation of the attestation document and the self-signed public key in the PDF allows upstream users to confirm who uploaded the data originally, the software used to process the data and that the document was not modified since its creation.
The parties engaged in this collaboration spanned the NSOs from the United States, Canada, UK, Netherlands, and Italy, with infrastructure and assistance from the United Nations Global Platform.
Outcomes and lessons learned
A second and important outcome, in the context of this document, is that multiple privacy-enhancing technologies can often solve the same challenge. The OpenMined PySyft framework and Oblivious enclaves offered different pros and cons, which may make each more suitable in different contexts. OpenMined’s PySyft gave users considerable flexibility in access control management: query requests were ad hoc and approved just-in-time, but only after the initial infrastructure had been created and approved. In contrast, the Oblivious sMPC framework placed the access management controls prior to the deployment of the enclave itself.
Such differences stem directly from the range of functionality each framework offers. OpenMined’s PySyft is limited to specific types of queries and arithmetic combinations thereof. Thus, when agreeing to use the federated network, users are in turn agreeing to the access control mechanism and to the differential privacy and sMPC mechanisms built into it.
In contrast, the enclave-based computation can run any software that would run on a typical server. For this reason, the enclave-based solution could provide advanced functionality such as generating PDF documents with corresponding signatures. However, because of this wide-ranging flexibility, it is the internal software itself that requires bilateral approval.