United Nations World Data Forum

Composite data for the SDGs – is this the answer?

Ms Ying Zhu, Data Analyst and M&E Specialist, GCERF

Ms Sarah Le Mesurier, Country Manager, GCERF

Ms Kristen O’Connell, Governance and Partnerships Coordinator, GCERF

December 7, 2021

In 2021, GCERF developed an interactive data platform to help it better identify potential areas for programming in Nigeria. To do this, we used machine learning to aggregate data from dozens of sources and then produce indicators of fragility, vulnerability, and susceptibility to violent extremism. These indicators cut across environmental, social, political, economic, and a range of other factors. Access to such expansive data coverage raised the question: how can this be used more widely? Is a composite data index the answer for cross-SDG programming and sustainable advancement?

We built the platform on a conceptual framework of three primary analytical angles, or “pillars”: environmental fragility, social structure instability, and information sources. Each pillar is constructed from corresponding indicators, with the collected geospatial data available at 1 km² level, the desired level of granularity for our analysis. We used five main types of data: satellite imagery (NASA/ESA), estimated raster data (WorldPop, SIDE, IHME), geolocated survey microdata (DHS, LSMS, Afrobarometer), social media (tweets), and crowdsourced GIS data (OSM). These were processed and modelled to generate a 1 km² resolution data layer, providing a vulnerability scale for each indicator. We aggregated the results at indicator level, using principal components analysis (PCA, https://www.oecd.org/els/soc/handbookonconstructingcompositeindicatorsmethodologyanduserguide.htm) to derive weights, and then used those weights for geometric aggregation into each main pillar (https://www.oecd.org/sdd/42495745.pdf). The final output is a normalized vulnerability score (1 – 100) at 1 km² level for each pillar, presented through a series of choropleth maps and an interactive dashboard.
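The aggregation steps described above (PCA-derived weights, weighted geometric aggregation, rescaling to 1 – 100) can be illustrated with a minimal sketch. This is not GCERF's actual pipeline: the random input data, the function names, and the choice to take weights from the squared loadings of the first principal component only are all assumptions made for illustration, and a full composite-indicator procedure would typically involve more steps than shown here.

```python
import numpy as np


def pca_weights(indicators):
    """Derive indicator weights from the first principal component.

    indicators: array of shape (n_cells, n_indicators).
    Assumption: weights are the squared loadings of the first component,
    rescaled to sum to 1. This is a simplification for illustration.
    """
    corr = np.corrcoef(indicators, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    first_component = eigvecs[:, np.argmax(eigvals)]
    weights = first_component ** 2
    return weights / weights.sum()


def geometric_aggregate(indicators, weights):
    """Weighted geometric mean per cell; indicator values must be > 0."""
    return np.exp(np.log(indicators) @ weights)


def rescale_1_100(scores):
    """Min-max rescale so the lowest cell scores 1 and the highest 100."""
    span = scores.max() - scores.min()
    return 1 + 99 * (scores - scores.min()) / span


# Hypothetical data: 500 grid cells, 4 indicators already scaled to 1-100.
rng = np.random.default_rng(0)
cells = rng.uniform(1, 100, size=(500, 4))

weights = pca_weights(cells)
pillar_score = rescale_1_100(geometric_aggregate(cells, weights))
```

Geometric aggregation limits how far a high value on one indicator can offset a low value on another, which is one reason it is sometimes preferred to a simple weighted average when building a composite index.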

GCERF envisioned several key uses for the platform:

Enable us to better understand key socio-economic indicators of community vulnerability in Nigeria on high-resolution (1 km²) digital maps, and showcase the dynamics at national, district and village level
Prioritize specific geographic locations for intervention, and create tailored programme design and implementation using a data-driven approach
Compare and validate indicated regions of vulnerability against expert and local advice
Regularly update data layers and incorporate new data from additional sources, creating time-series analysis for continuous monitoring of community vulnerability
Serve as an evaluation tool to monitor the effectiveness and impact of community-based programmes on the SDGs, by observing changes over time in pillar and indicator scores in specific areas
Examine the correlations between key socio-economic indicators that affect community vulnerability and susceptibility to violent extremist (VE) groups, for more tailored intervention design at local level
Estimate the potential number of people at risk of VE according to different analytical angles
Act as a data repository for global, national and local partners who work on the SDGs
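As an illustration of the population estimate mentioned in the list above, the sketch below overlays a pillar score grid on a gridded population layer and sums the population in cells at or above a cut-off. The tiny arrays, the function name, and the threshold of 60 are hypothetical, chosen only to show the mechanics. They are not values used by the platform.

```python
import numpy as np


def people_at_risk(score, population, threshold):
    """Sum the population in grid cells whose score is >= threshold.

    score and population are arrays of the same shape, one value per cell.
    """
    mask = score >= threshold
    return float(population[mask].sum())


# Hypothetical 2 x 2 grid: vulnerability scores and people per cell.
score = np.array([[10.0, 80.0],
                  [65.0, 30.0]])
population = np.array([[120.0, 300.0],
                       [50.0, 400.0]])

at_risk = people_at_risk(score, population, threshold=60.0)
```

Repeating the calculation with each pillar's score grid would give one estimate per analytical angle, and any such figure is sensitive to where the cut-off is placed.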

We are currently using the preliminary results of the platform to identify areas demonstrating increased vulnerability to VE groups in the north west of the country. It has allowed us to look in detail at four states where the risk is high but the operating environment is still permissive enough for preventative work to be of value. The mapping also allows us to identify those areas where our programming would now be of limited use given the extent of extremist group control.

While it can produce an unprecedented level of granular analysis and scoring of communities, the platform also presents us with a few traps that come with a fully “data-driven” approach. How do we know our conceptual framework accurately captures all the data needed to assess community vulnerability to extremist group recruitment? How do we interpret the results if they don’t match what we already know? When we collapse more than 40 indicators and use PCA to reduce the dimensionality of the data, the abundance of data is reduced to single scores that can no longer be interpreted or interrogated. Any index like this relies on a combination of primary and secondary data, which are often collected using different methodologies and over different timeframes, and all are subject to limitations inherent to the data environment. It is almost impossible to examine how much the actual risks and needs are misrepresented or undermined by the intrinsic data structure.

Our solution to these “traps” is to bring the human element back to the center. Data is and should be one of the driving forces behind programme planning, design and implementation in the 21st century. However, we need to combine the collective efforts of thematic and contextual experts, global and national stakeholders, local partners and organizations across the industry to reach consensus about the data used, the analysis done, the approach to interpretation and its practical application. To explore other digital applications using big data, we would like to see long-term cross-silo collaboration on technical consensus building, digital tool development and applications for SDG programming.

Our experience to date, including external consultations, has shown that many people remain suspicious of the use of big data in this type of analysis, both in terms of its accuracy and its utility. However, there is a consensus on the need for a mixed approach, using both quantitative and qualitative data, in determining intervention priorities and the needs of the population, and in understanding any complex research topic. More accessible graphics are likely to increase engagement, which can then build trust and result in increased ownership of this type of tool. While the dashboard is clearly useful at the donor and national government level, a well-designed, interactive tool should also be usable by local-level actors to better understand the challenges faced by local communities within the same administrative areas. These challenges can vary substantially within a relatively small geographic area, particularly where there is a range of different ethnic groups. Such a tool has the potential to substantially improve the targeting of interventions, as well as the ability to measure the change that they have achieved.

Global Community Engagement and Resilience Fund (GCERF)

The Global Community Engagement and Resilience Fund (GCERF) is the global fund dedicated to preventing violent extremism. We connect local communities to global resources, supporting grassroots initiatives that are typically out of reach for international donors and helping them thrive.

https://www.gcerf.org/
