Over the years, countless systems that do not talk to one another have been created within and across organizations to collect, process and disseminate data for development. With the proliferation of technology platforms, data definitions and institutional arrangements for managing, sharing and using data, it has become increasingly necessary to dedicate resources to integrating the data needed to support policy design and decision-making.
Interoperability is the ability to join up and merge data without losing meaning (JUDS 2016). In practice, data is said to be interoperable when it can be easily re-used and processed in different applications, allowing different information systems to work together. Interoperability is a key enabler for the development sector to become more data-driven.
People today expect greater interconnectivity and seamless interoperability, so that different systems can deliver data to those who need it, in the form they need it. Data interoperability and integration are therefore crucial to the data management strategy of every organization. However, teams and organizations are often overloaded with day-to-day operations and have little time left to introduce and adopt standards, technologies, tools and practices for greater data interoperability. When devising such strategies, exploring and adopting conceptual frameworks can help practitioners to better organize ideas and set the scene for more tailored, detailed and interoperable approaches to data management.
Interoperability as a conceptual framework
Interoperability is a characteristic of good quality data, and it relates to broader concepts of value, knowledge creation, collaboration, and fitness-for-purpose. As one of the interviewees in The Frontiers of Data Interoperability for Sustainable Development put it, “the problem with interoperability is that it... means different things to different people.” (JUDS 2016, 5). Part of the reason for this is that interoperability exists in varying degrees and forms, and interoperability issues need to be broken down into their key components, so that they can be addressed with concrete, targeted actions.
Conceptual frameworks help us to consider interoperability in different contexts and from different perspectives. For instance:
- from a diversity of technological, semantic, or institutional viewpoints, recognizing that interoperability challenges are multi-faceted and manifest in different ways across scenarios and use cases; and
- within the context of the data value chain, as well as within the context of broader data ecosystems.
Figure 1: Data Commons Framework
Following the Data Commons Framework devised by Goldstein et al. (2018), we can split the concept of interoperability into a number of narrow and broad layers that relate to standardization and semantics, respectively. These layers can help in the development of projects, plans and roadmaps by clarifying interoperability needs at various points, and can be summarised as follows:
- Technology layer: This represents the most basic level of data interoperability and is exemplified by the requirement that data be published and made accessible through standardized interfaces on the web;
- Data and format layers: These capture the need to structure data and metadata according to agreed models and schemas, and to codify data using standard classifications and vocabularies;
- Human layer: This refers to the need for a common understanding among users and producers of data regarding the meaning of the terms used to describe its contents and its proper use (there is an overlap here with the technology and data layers, in that the development and use of common classifications, taxonomies, and ontologies to understand the semantic relationships between different data elements are crucial to machine-to-machine data interoperability);
- Institutional and organisational layers: These are about the effective allocation of responsibility (and accountability) for data collection, processing, analysis and dissemination both within and across organizations. They cover aspects such as data sharing agreements, licenses, and memoranda of understanding (see Annex B for more detail on legal frameworks).
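To make the data and format layers more concrete, the following Python sketch checks a single record against an agreed structure and a shared code list. It is an illustration under stated assumptions, not a method prescribed by the Data Commons Framework: the field names (`ref_area`, `time_period`, `obs_value`), the schema and the three-entry subset of ISO 3166-1 alpha-3 country codes are all hypothetical choices made for the example.

```python
# Hypothetical, heavily abbreviated code list: a real system would load the
# full ISO 3166-1 alpha-3 list instead of hard-coding three entries.
ISO3_CODES = {"KEN", "NGA", "UGA"}

# Hypothetical agreed schema: field name -> expected Python type.
SCHEMA = {"ref_area": str, "time_period": int, "obs_value": float}


def validate(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Format layer: every agreed field is present and has the agreed type.
    for field, expected_type in SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
    # Fields outside the agreed schema are flagged rather than passed through.
    for field in sorted(set(record) - set(SCHEMA)):
        problems.append(f"unexpected field: {field}")
    # Data layer: the area must be coded with the shared classification.
    area = record.get("ref_area")
    if isinstance(area, str) and area not in ISO3_CODES:
        problems.append(f"ref_area: {area!r} is not in the code list")
    return problems


print(validate({"ref_area": "KEN", "time_period": 2019, "obs_value": 12.5}))
print(validate({"ref_area": "Kenya", "time_period": "2019", "obs_value": 12.5}))
```

A record that names the area as "Kenya" rather than "KEN", or stores the year as text, fails the check even though a human reader would understand it. That is the gap between human-readable and machine-interoperable data that the data and format layers are meant to close.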
These various ‘layers’ of interoperability are explored throughout the Guide and manifest in various ways. They also provide a useful frame of reference when thinking about interoperability needs at a systemic scale, as the example in Figure 2 demonstrates.
Figure 2: A User-Centric Approach to Interoperability and Open Data
Many National Statistical Offices (NSOs) are now adopting open data policies that authorize and facilitate the reuse of their statistical products, sometimes including the datasets used to produce them. When thinking about how to openly publish data, it is crucial to identify the different needs of the various audiences that are likely to want to use that information.
For instance, analysts may want to access multiple datasets in machine-readable format, so they can be easily fed into statistical models to test hypotheses or make predictions. Similarly, application developers may want to use Application Programming Interfaces (APIs) that provide online access to data in standardized, open formats, so they can build interactive dashboards, maps and visualizations.
In contrast, journalists and policy makers are more likely to want to access the data in human readable formats such as tables, charts and maps, and to appreciate the ability to explore and discover related data and information using web search engines.
Each of these prospective use cases requires data interoperability at various junctures.
Journalists and researchers are also interested in the ability to analyze, group and compare datasets along meaningful categories. In other words, they are interested in the semantic coherence and comparability of data, which requires that published data conform to standard methods, definitions and classifications across countries and institutions.
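As a rough illustration of what semantic comparability involves in practice, the Python sketch below recodes category labels from two sources into one shared code list before comparing them side by side. The source names, the local labels and the shared codes `M` and `F` are hypothetical, and the counts are toy values rather than real data.

```python
from collections import defaultdict

# Hypothetical mappings from each source's local labels to one shared code list.
SOURCE_MAPPINGS = {
    "ministry_survey": {"Male": "M", "Female": "F"},
    "ngo_register": {"1": "M", "2": "F"},
}


def harmonize(source, rows):
    """Recode (label, count) rows from one source into shared codes.

    An unmapped label raises KeyError, so gaps in the mapping surface
    immediately instead of being silently dropped.
    """
    mapping = SOURCE_MAPPINGS[source]
    return [(mapping[label], count) for label, count in rows]


def compare(**harmonized):
    """Build a code -> {source: count} table from harmonized rows."""
    table = defaultdict(dict)
    for source, rows in harmonized.items():
        for code, count in rows:
            table[code][source] = table[code].get(source, 0) + count
    return dict(table)


# Toy counts (hypothetical values).
survey = harmonize("ministry_survey", [("Male", 120), ("Female", 130)])
register = harmonize("ngo_register", [("1", 40), ("2", 55)])
print(compare(ministry_survey=survey, ngo_register=register))
```

The comparison only becomes meaningful after both sources are expressed in the same codes; without the mapping step, "Male" and "1" would appear as unrelated categories.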
Underpinning these use cases is a need for clear and agreed rules for accessing, using and re-using data from different sources. In this context, ‘reuse’ requires data to be interoperable not only from a technical perspective, but also from a legal and institutional perspective (including so-called ‘legal interoperability’, which forms the basis for cross-jurisdictional data sharing and use).
Another important point to keep in mind is that maximum levels of interoperability are not always desirable and can in fact be harmful or even unlawful (e.g., if they result in the unintentional disclosure of personal data). Before deciding the degree to which a dataset should be made interoperable, careful consideration should be given to the intended and anticipated use cases of the dataset or IT system.
“One of the primary benefits of interoperability is that it can preserve key elements of diversity while ensuring that systems work together in ways that matter most. One of the tricks to the creation of interoperable systems is to determine what the optimal level of interoperability is: in what ways should the systems work together, and in what ways should they not?” (Palfrey et al. 2012, p. 11).
Interoperability across the data lifecycle and value chain
A useful framework for development practitioners seeking to better understand how interoperability can add value to data is the idea of the Data Value Chain (Open Data Watch 2018), which highlights the role that interoperability plays in binding together its various components.
Figure 3: The Data Value Chain
Within this model, interoperability is explicitly referenced as part of the processing stage of data collection; for example, by ensuring that the right classifications and standards are used to collect and record data from the outset, or that the individuals tasked with collecting data have liaised with counterparts in other organizations to agree how they will capture and store it. The message here is two-fold. On the one hand, planning for interoperability during the data collection stage of a dataset’s lifecycle is an important part of anticipating prospective use cases down the line. On the other hand, how datasets are used should also inform what steps are taken at the data collection stage, so that needs are anticipated and processes optimized. Interoperability should therefore be linked to both data collection and data use within programmatic cycles, and this should be reflected in organizational practices and data management plans that cover the full breadth of the value chain.
Data interoperability and the SDGs
In 2015, all UN member states adopted 17 Sustainable Development Goals (SDGs) as part of the 2030 Agenda for Sustainable Development (the 2030 Agenda), which spans socio-economic development, environmental protection, and tackling economic inequalities on a global scale. The unprecedented scope and ambition of the 2030 Agenda requires the design, implementation and monitoring of evidence-based policies using the best data and information available from multiple sources – including administrative data across the national statistical system, and data ecosystems more broadly. In this context, the Data Commons Framework introduced in the previous section can help us to understand the nature of the many data interoperability challenges that need to be addressed to support evidence-based decision making to achieve the SDGs.
Because the sustainable development field is global (the ‘indivisible’, ‘holistic’ and ‘universal’ dimensions of the 2030 Agenda are among its core attributes), it is not enough for development professionals to share a common understanding of the language of sustainable development. Government ministries, departments and agencies (MDAs), NSOs, intergovernmental organizations (IGOs) including UN agencies, non-governmental organizations (NGOs), and other interest groups all need to interpret and share data and information in a consistent way that each of them can understand.
Sharing data and information in the sustainable development field requires a common understanding of the semantics used by all groups of stakeholders involved. For example, to assess the impact of climate change on macro-economic trends, development economists must learn the meaning of a host of scientific terms describing how a changing climate affects economic indicators. Similarly, as the fields of statistics and data science edge closer together, statisticians are having to learn whole new vocabularies and concepts that will help them disseminate and share their products in new ways: for instance, presenting statistical data online on interactive maps, combined with data gleaned from satellites and other observational and sensor sources, and sometimes further reinforced by perceptions data generated by citizens themselves (so-called citizen-generated data, or CGD). Common ways of organizing data and information are needed to enable the exchange of knowledge between policy makers and development practitioners.
Another component of effective data sharing, and particularly of common semantics, is the use of industry standards. Many sectors have both information models and accepted terminologies or coding systems, which provide the semantic foundation for sharing information. Key to this sharing is the ability not only to share labels, but also to maintain consistency of meaning, particularly across organizations or national boundaries. For example, within the healthcare domain, terminologies such as the Systematized Nomenclature of Medicine (SNOMED) provide hundreds of thousands of well-defined concepts and their interrelationships, reducing ambiguity in the practice of clinical medicine and in the documentation of patient observations in medical records (U.S. National Library of Medicine 2018).
From a data perspective, the SDG data ecosystem is characterized by several tensions:
- between global and local data needs – for instance between globally comparable statistics and disaggregated data that is compiled for local decision-making;
- between top-down data producers (such as UN agencies or multilateral and bilateral development entities) and bottom-up ones such as small civil society organizations or local companies;
- between structured data exchange processes, such as those based on the Statistical Data and Metadata eXchange (SDMX) suite of standards, and more organic processes, such as informal in-country data sharing between development actors; and
- between data producers and users from sectoral (health, education, etc.) and cross-cutting (gender, human-rights, partnerships) domains.
Within this complex matrix of processes, and with different levels of capacity and resources available for investment in data, coordination is key. Where coordination is weak, information systems and data platforms often do not share common goals and miss opportunities to create synergies and coherence. For example, even within individual NSOs or government MDAs, IT solution providers contracted separately under different programmes of work or donor-sponsored projects may end up creating siloed information systems whose architectures and datasets are not interoperable and whose data outputs cannot be integrated with each other. This common challenge directly inhibits the efficient production and processing of the data needed to achieve and monitor the SDGs. Resolving it requires a coordinated approach and a set of common guidelines across governments that build interoperability into the procurement of IT solutions from the outset, which in turn requires data management and governance principles to become integral components of organizational strategies and business processes. At a more systemic level, it may also mean taking a leaf out of the book of international standards development organizations such as the World Wide Web Consortium (W3C), the International Organization for Standardization (ISO) and the Open Geospatial Consortium (OGC), which develop shared standards through consensus-based processes.
In sum, interoperability is an approach that can help the development sector leverage the potential of data to increase the socio-economic value of the outcomes it is working towards. A data governance framework is needed to ensure interoperability between the data we collectively need to collect, both to inform policies to achieve the SDGs and to measure our progress in doing so.
While the focus of this Guide is on data interoperability, it is also important to highlight its close connection to data ‘integration’, which is the act of incorporating two or more datasets into the same system in a consistent way. Data integration is one of the possible outcomes of data interoperability.
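Once datasets share identifiers and classifications, integration can be comparatively simple. The Python sketch below, a minimal illustration rather than a recommended pipeline, joins two lists of records on shared key fields. The key names `ref_area` and `time_period` and the toy records are hypothetical, and the sketch assumes each key combination appears at most once in the second dataset.

```python
def integrate(left, right, keys=("ref_area", "time_period")):
    """Inner-join two lists of dict records on shared key fields.

    Assumes each key combination appears at most once in `right`;
    if it appears more than once, the last occurrence wins.
    """
    # Index the right-hand dataset by its key values for constant-time lookup.
    index = {tuple(rec[k] for k in keys): rec for rec in right}
    merged = []
    for rec in left:
        match = index.get(tuple(rec[k] for k in keys))
        if match is not None:
            # Combine both records; on a field-name clash, `right` wins.
            merged.append({**rec, **match})
    return merged


# Toy records (hypothetical values) that already use the same codes for area and year.
population = [
    {"ref_area": "KEN", "time_period": 2019, "population": 100},
    {"ref_area": "UGA", "time_period": 2019, "population": 50},
]
schools = [{"ref_area": "KEN", "time_period": 2019, "schools": 7}]

print(integrate(population, schools))
```

If one source had coded the area as "Kenya" instead of "KEN", the join would silently drop that row. Integration of this kind is only as reliable as the interoperability of the identifiers and classifications it relies on.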