Data for sector-wide change

Jake Porway, DataKind Founder & Executive Director

This year’s UN World Data Forum finds us at a time of great unrest and confusion. We are in the grip of a global pandemic, and it demands that we have reliable information at our fingertips and that we act with lightning-speed efficiency. We need data and algorithms on our side, wherever they can be helpful, to better understand the state of play and to react quickly. However, we know that no single party and no single data science project can make a difference alone. Government agencies, corporations, and NGOs need to work in concert, datasets need to be shared across multiple parties, and algorithms and models need to apply to many different contexts to be effective. For example, it is not enough to use data to track how one hospital’s ventilator supply is changing. We must instead look at how data can predict ventilator availability across many hospitals, based on data from many agencies, and influenced by the supply from many companies. The question before us then is, “how might we use data science and algorithms not just to solve individual needs, but to address entire sector-level problems?”

The UN World Data Forum is the perfect arena to tackle such challenges, as this conference focuses on multidisciplinary partnerships. At DataKind, a nonprofit dedicated to harnessing the power of data science and AI in the service of humanity, we’ve been developing a new collaborative model of data science called Impact Practices. The idea behind Impact Practices is that multiple coordinated data science projects within the same issue area can make dramatically more progress than a set of disconnected projects. For example, if we care about training more nurses to support COVID relief efforts and want to use data science to understand which training activities are most successful, we believe we could learn more by attempting to solve this problem with multiple clinics, hospitals, and NGOs than by trying it with just one. Moreover, running a cohort of projects does more than just improve the model or algorithm you get in the end. It also helps us understand what data is available at each organization and how business processes differ, and it bonds organizations working in the same space together in their data journey.

You can learn more about the results from our first Impact Practice in Frontline Health Systems here, but I want to share three things we’ve learned in building systems-level data solutions for social impact, in the hope that some UNWDF participants can benefit:

  • Map before you model: One of the most important phrases in data science is “start with the problem, not the data.” This line implores data science project owners to understand, together with the user, what problem they’re really solving before they get seduced by the datasets they have available. This design principle applies tenfold in building data science solutions for entire sectors, as now the goal is to find a solution that can apply to many organizations’ problems. At DataKind, we spend time interviewing multiple stakeholders in the space to create systems maps that detail how problems interlock and how activities interrelate. For example, in our work with frontline health systems, data scientists and machine learning engineers were first seduced by the data and suggested projects around using satellite imagery to identify homes in need of health services. After interviewing over 50 organizations to map their activities, we found that more mundane engineering solutions, like digitizing paper forms, would yield far more impact than any flashy satellite imagery work.

  • Sector change is a team sport: It’s no surprise to anyone following the systems change discussions over the last five years to hear that multidisciplinary teams are critical to success. However, we observe that many collectives still tend to defer to the technologists in the room when it comes to solving data problems, on account of the specialized knowledge they have. We work to alleviate that dynamic by ensuring that every engagement includes facilitators, sometimes dubbed “bilinguals” by groups like GovLab or Schmidt Futures, who create a welcoming space for technologists and social sector experts alike. Their key skills involve explaining jargon, giving equal floor time to all folks, and ensuring that everyone plays to their strengths.

  • Embrace the journey, not the destination: When building social sector data science solutions, one often focuses on the algorithm, model, or analysis at the end. While of course we hope that Impact Practices will result in moonshot solutions that entire sectors can make use of, the data capacity building and lessons learned are often just as valuable. For example, during our first Impact Practice on Frontline Health Systems, we attempted to build a model that would optimize nurse training programs. While the ultimate model had some success, what was far more helpful was learning about all of the data processing needed to get there. As a result, the organization we worked with was able to write very clear job descriptions for additional data scientists on their team, something that would have been difficult to do accurately before the project. Moreover, the tool we built to organize the data seems to have wide applicability to DHIS2 deployments, which could create even greater impact at scale than the original project. Many folks consider it a failure if they don’t reach the initial project goal, but we emphasize the value of all of the data capacity building that can happen along the way.

The world is facing incredible complexities, and we know we can only solve them together. During this 2020 Virtual UN World Data Forum, we are eager to explore all the ways that multidisciplinary teams of social sector actors and technologists can make progress on bigger problems than any of us can tackle alone.