Big Data methods for SDG indicators

Concrete examples of methods for the use of Big Data for monitoring the indicators associated with the Sustainable Development Goals (SDGs).

SDG Indicators

The Task Team on Big Data for the SDGs has concentrated its efforts on identifying current practices for informing the SDG indicators with non-traditional data. The focus is on indicators and methods that are currently in use and can be applied in different countries.

Please find below the list of the indicators identified so far:

Indicator 6.6.1

Change in the extent of water-related ecosystems over time

Freshwater, in sufficient quantity and quality, is essential for all aspects of life and fundamental to sustainable development. Water-related ecosystems - including lakes, rivers, wetlands and groundwater - supply water and food to billions of people, provide unique habitats for many plants and animals and protect us from droughts and floods.

Indicator 6.6.1 tracks how different types of water-related ecosystems are changing in extent over time. The indicator is multifaceted, requiring data on specific types of ecosystems as well as information on the changing state (surface area, quantity, and quality of water) of each one.

Monitoring dynamic freshwater changes across the entire surface of the planet is made possible by the analysis of satellite imagery derived from Earth observations. An array of satellites continually observing the Earth captures measurements from which different types of land cover, such as freshwater, can be distinguished. Millions of images are processed to identify and classify areas of surface water and specific ecosystem types. The satellite imagery is represented numerically, and the statistical trends per ecosystem are aggregated to national boundaries and river basin areas.

Data on permanent water, seasonal water, reservoirs, wetlands and mangroves, as well as lake water quality, are available for countries to view and download at the SDG 6.6.1 data portal. On this site, the data are visualized on geospatial maps, with accompanying numerical statistics displayed through informational graphics.

The full indicator 6.6.1 metadata document is available here.

  1. Space agencies (ESA and NASA) acquire and publish global satellite data from the Sentinel and Landsat satellite missions.
  2. Google ingests the satellite data into Google Earth Engine and applies pre-processing to generate analysis-ready data (ARD).
  3. UNEP collaborating partners (JRC and DHI) use the ARD to generate SDG 6.6.1 sub-indicator time-series information on the spatial extent of inland surface waters (lakes, rivers and reservoirs), water quality and wetlands (Note: currently there is no time-series information available on inland vegetated wetlands).
  4. UNEP and collaborating partners generate summary statistics on changes in freshwater ecosystems over time at national, sub-national and hydro-basin levels.
  5. UNEP publishes the SDG 6.6.1 sub-indicator information on the SDG661.app and shares it with the SDG 6.6.1 national focal points for country approval every three years.
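The summary statistics in step 4 can be illustrated with a minimal sketch. It assumes that annual surface-water extents per basin have already been derived from the satellite data, and it expresses change as the percentage difference between the mean extent of a later period and the mean extent of a baseline period. The function name, the choice of periods, the sign convention (negative means loss) and the sample values are all assumptions made for illustration; the indicator metadata document remains the authoritative definition.

```python
from statistics import mean


def percent_change_in_extent(extent_by_year, baseline_years, period_years):
    """Percentage change of the mean extent in a period relative to a baseline mean.

    A negative value means the extent has shrunk (assumed sign convention).
    """
    baseline = mean(extent_by_year[y] for y in baseline_years)
    period = mean(extent_by_year[y] for y in period_years)
    if baseline == 0:
        raise ValueError("baseline extent is zero; percentage change is undefined")
    return (period - baseline) / baseline * 100.0


# Hypothetical annual permanent-water extent (km^2) for one river basin.
basin_extent_km2 = {
    2001: 100.0, 2002: 102.0, 2003: 98.0, 2004: 100.0, 2005: 100.0,
    2016: 90.0, 2017: 92.0, 2018: 88.0, 2019: 90.0, 2020: 90.0,
}

change = percent_change_in_extent(
    basin_extent_km2,
    baseline_years=range(2001, 2006),  # illustrative baseline period
    period_years=range(2016, 2021),    # illustrative reporting period
)
print(f"Change in extent: {change:.1f}%")
```

The same function could be applied per country, sub-national unit or hydro-basin once the extents have been aggregated to those boundaries.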

Indicator 9.1.1

Proportion of the rural population who live within 2 km of an all-season road

The Rural Access Index (RAI) is one of the most important global indicators in the transport sector. It measures the proportion of rural people who have access to an all-season road within an approximate walking distance of 2 kilometers (km). The 2 km threshold is commonly accepted as a reasonable distance for people’s normal economic and social purposes. The definition is also simple enough to understand and use not only in the transport sector, but also in the broader development context, such as poverty alleviation. The initial RAI study in 2006 was based on household surveys and other simplified methods. It estimated the global index at 68.3 percent, leaving a rural population of about one billion disconnected around the world.

To ensure regular updates of the index, a new method was developed that takes advantage of spatial techniques and data collected using innovative technologies; see World Bank (2016). Conceptually, the methodology is still focused on access to an all-weather road, but it emphasizes the sustainability, consistency and operational relevance of the index. In recent years, various new technologies and data sets have been developed, such as high-resolution global population distribution data. Digitized road alignment data, including road conditions, may also be available at road agencies or in the public domain. Smartphone applications have been developed to assess road roughness while driving. Some other technologies, such as high-resolution satellite imagery, also have the potential to evaluate road conditions at scale. By combining these spatial data, the RAI can be computed virtually, using spatial software.

See more details in the Annex of World Bank (2016).
  • Obtain spatial population distribution data. WorldPop is among the best available population distribution datasets. The WorldPop data are available at http://www.worldpop.org.uk/. For each country, there may be several population values for different years, primarily depending on the availability of census data. The data closest to the year of interest should be downloaded.
  • Obtain spatial urban-rural extent data. The GRUMP data set provides raster data on urban areas, which can be downloaded from here.
  • Compute the total rural population for each boundary (for example, country X or district Y). Overlaying the prepared polygon of rural areas on the population raster, the sum of the raster values (that is, the population estimates in individual pixels) is calculated by spatial software.
  • Define the road network used to calculate RAI. Different countries have different road classification systems. There may be some differences between the official and actual road networks. It is important to agree on the scope of the road network for which the RAI is calculated for consistency purposes.
  • Prepare road network data in a vector GIS format. The existing data may be fragmented or duplicated. Although it is not necessarily critical for calculating the RAI, it is recommended to make sure that the data set is free of topological errors, such as disconnected or duplicated features.
  • Attach road condition measurements to the road network data, if not included yet. Different types of measurements can be used, such as the International Roughness Index (IRI) and visual assessment by class category (for example, excellent, good, fair, poor and very poor). It is important to establish a system for converting the available data to the all-season definition that is used in the RAI calculation.
  • Generate 2 km buffer areas around the “all-season” road network, with urban areas erased.
  • Compute the rural population within the buffer areas by overlaying the above 2 km buffer area (rural only) and the population raster.
  • Calculate the RAI. Dividing the rural population within the buffer areas by the total rural population gives the RAI for each administrative boundary.
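The final steps of the list above (buffer, overlay, division) can be sketched on a small grid. This is only an approximation under stated assumptions: the population, urban extent and all-season road network are assumed to be already rasterized onto the same grid, and the 2 km buffer is approximated by measuring distances between cell centres rather than by a true vector buffer in GIS software. The function name, the grid and all values are hypothetical.

```python
import math


def rural_access_index(population, is_urban, is_road, cell_km, buffer_km=2.0):
    """Share of rural population living within buffer_km of an all-season road.

    All three grids are equally sized lists of rows. Distances are measured
    between cell centres, which only approximates a true vector buffer.
    """
    rows, cols = len(population), len(population[0])
    road_cells = [(r, c) for r in range(rows) for c in range(cols) if is_road[r][c]]

    total_rural = 0.0
    rural_near_road = 0.0
    for r in range(rows):
        for c in range(cols):
            if is_urban[r][c]:
                continue  # urban cells are excluded from numerator and denominator
            total_rural += population[r][c]
            near = any(
                math.hypot(r - rr, c - cc) * cell_km <= buffer_km
                for rr, cc in road_cells
            )
            if near:
                rural_near_road += population[r][c]

    if total_rural == 0:
        raise ValueError("no rural population in this boundary")
    return rural_near_road / total_rural


# Hypothetical 3 x 4 grid of 1 km cells: 10 people per cell, 100 in the urban cell.
population = [[100, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
is_urban = [[True, False, False, False],
            [False, False, False, False],
            [False, False, False, False]]
# An all-season road runs down the first column.
is_road = [[True, False, False, False],
           [True, False, False, False],
           [True, False, False, False]]

rai = rural_access_index(population, is_urban, is_road, cell_km=1.0)
print(f"RAI = {rai:.1%}")
```

In this toy grid only the cells in the last column lie more than 2 km from the road, so 80 of the 110 rural inhabitants count as having access. With real data, the same ratio would be computed per administrative boundary from the buffered road vectors and the population raster.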

Indicator 9.c.1

Proportion of population covered by a mobile network, by technology

Mobile phone data is used for calculating this indicator. Mobile phone data already generates a more pervasive data footprint in developing and least-developed countries than other well-known sources of big data. Its widespread availability makes it a desirable source of globally comparable statistics. However, for a number of reasons, obtaining access to mobile phone data for use in official statistics is seen as a major challenge. Building partnerships is crucial to closing the capacity and privacy gaps that hinder access to mobile phone data for use in official statistics.

A project led by ITU was conducted in 2020 to demonstrate how big data can be used to produce the internationally agreed ICT SDG indicators 9.c.1 (Proportion of population covered by a mobile network) and 17.8.1 (Individuals using the Internet). The feasibility of using mobile positioning data (MPD) for both indicators was tested in Brazil and Indonesia, with one case study presented for each. The collaboration has demonstrated that public and private sector organizations can work together in the interest of society to leave no one behind.

This is a summary of a more detailed method description that can be found here.
  1. Data access

    Before the project begins, national statistical offices (NSOs) should ensure the availability of the necessary data processing infrastructure, data science skills and access to the following data:

    • mobile positioning data, i.e. metadata on calls, messages and internet usage;
    • mobile network cell locations, e.g. from the mobile network operator, the telecom regulator or a crowdsourced database such as OpenCelliD;
    • GIS files of local administrative units, e.g. shapefiles;
    • population grid, e.g. from the WorldPop database;
    • any other necessary spatial data, e.g. a digital elevation model.
  2. Input quality assurance

    Check input data to ensure quality before calculations:

    • prepare and run quality assurance on local administrative unit layers;
    • run quality assurance of mobile positioning data to ensure it is reliable for the required task;
    • download and validate cell location data;
    • download and validate any external data (e.g. population grid, digital elevation model) and compare with other available data, if necessary.
  3. Processing

    Implement all processing steps:

    • calculate coverage areas;
    • calculate and validate home locations;
    • calculate indicators.
  4. Output quality assurance

    Check the calculated outputs:

    • NSO validates results of the calculated indicators;
    • calculated indicators are visualized to analyze regional differences.
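The processing step ("calculate coverage areas", "calculate indicators") can be illustrated with a simplified sketch. It assumes that each cell site covers a circle of fixed radius, which ignores terrain, antenna direction and signal propagation, and that the population grid is given as points with a population count. The function names, the radii and the sample data are hypothetical and are not taken from the ITU method description.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def coverage_share_by_technology(population_points, cell_sites):
    """Return {technology: share of population within range of at least one site}.

    population_points: list of (lat, lon, population) tuples.
    cell_sites: list of dicts with keys lat, lon, technology, radius_km.
    """
    total = sum(pop for _, _, pop in population_points)
    if total == 0:
        raise ValueError("total population is zero")

    shares = {}
    for tech in sorted({site["technology"] for site in cell_sites}):
        sites = [s for s in cell_sites if s["technology"] == tech]
        covered = sum(
            pop
            for lat, lon, pop in population_points
            if any(haversine_km(lat, lon, s["lat"], s["lon"]) <= s["radius_km"] for s in sites)
        )
        shares[tech] = covered / total
    return shares


# Hypothetical population grid points: (latitude, longitude, population).
population_points = [
    (0.0, 0.00, 1000),  # at the site
    (0.0, 0.05, 500),   # roughly 5.6 km east of the site
    (0.0, 1.00, 500),   # roughly 111 km east of the site
]
# Hypothetical cell sites with assumed circular coverage radii.
cell_sites = [
    {"lat": 0.0, "lon": 0.0, "technology": "3G", "radius_km": 10.0},
    {"lat": 0.0, "lon": 0.0, "technology": "4G", "radius_km": 3.0},
]

shares = coverage_share_by_technology(population_points, cell_sites)
for tech, share in shares.items():
    print(f"{tech}: {share:.0%} of population covered")
```

In practice, the coverage areas would come from the operator or be modelled with additional inputs such as a digital elevation model, and the resulting shares would then go through the output quality assurance described above.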

Task Team Contacts

Statistics Denmark
Maciej Truszczynski

UNSD
Karoly Kovacs