Carl Shan

How Data Science Can Be Used For Social Good

08 Jan 2015 - Chicago


Give Directly

Credit: Google Images

In 2013, Kush Varshney, a researcher at IBM, signed up through a nonprofit called DataKind to volunteer his technical skills on pro bono projects. DataKind's flagship program, DataCorps, assembles teams of data scientists to partner with social organizations such as governments, foundations and NGOs for three- to six-month collaborations to clean, analyze, visualize and otherwise use data to make the world a better place.

Kush, who holds a PhD in electrical engineering and computer science from MIT, was promptly contacted by DataKind to work on a project. He was joined by another team member, Brian Abelson, now a data scientist at an open data search company. The two were brought together to tackle a challenging problem for a nonprofit called GiveDirectly.

GiveDirectly makes direct cash transfers to low-income families in Uganda and Kenya through mobile payments. These donations are given with no strings attached, trusting that the poor know best how to use the money. One of the top-rated charities on GiveWell, GiveDirectly has been evaluated in randomized controlled trials, with strong positive results.

GiveDirectly’s model is to make direct cash transfers in villages with a large number of residents in poverty. To identify those villages, however, the organization relied on staff members to visit villages in Uganda and Kenya individually and assess the relative poverty of the inhabitants.

When I spoke with Kush, he described some drawbacks of this method, saying, “This method could be costly in both time required to visit each site, and in using donations to help pay wages for inspections that could otherwise be going directly to the poor.”

Together with GiveDirectly, Kush and Brian sought a better way to accomplish this task.

Enter data science.

What Is Data Science?

Data Science Venn Diagram

Credit: Drew Conway - The Data Science Venn Diagram

Data science is an emerging discipline that combines techniques from computer science, statistics, mathematics, and other computational and quantitative disciplines to analyze large amounts of data for better decision making. The field arose in response to the fast-growing amount of information and the need for computational tools to help humans understand and use that data.

Rayid Ghani, Director of the Data Science for Social Good Fellowship and former Chief Scientist of the Obama 2012 re-election campaign, noted that “the power of data science is typically harnessed in a spectrum with the following two extremes: helping humans in discovering new knowledge that can be used to inform decision making, or through automated predictive models that are plugged into operational systems and operate autonomously.” Put plainly, these two ways of using data can be summarized as turning data into knowledge, or converting data into action.

Chiefly responsible for drawing out findings and building models from the data is an emerging profession: the data scientist. The “scientist” portion of the title conjures a vision of academia, partly because many data scientists hold advanced STEM degrees, but it also paints a false picture of a data scientist as someone holed up in an organization’s research lab, tinkering away on esoteric questions and peering into the depths of “Big Data” in pursuit of knowledge.

Rayid debunks this myth, saying that “frequently, however, the challenge in data science is not the science, but rather the understanding and formulation of the problem; the knowledge of how to acquire and use the right data; and once all that work is done, how to operationalize the results of the entire process.” Accordingly, the real role of a data scientist should be thought of as much more embedded in the core of a company or nonprofit, directly shaping the scope and direction of the organization’s products and services.

The handiwork of data scientists can be found in a plethora of products we interact with every day. Facebook uses data from each visit to tailor the posts you see in your News Feed. Amazon takes account of what you’ve purchased to recommend other items for purchase. PayPal roots out fraudulent behavior by analyzing the data from seller-buyer transactions.

So far, most uses of data science have served business objectives. The technology, financial services and advertising industries are rife with opportunities to convert data into profit. But now, more and more innovative social sector organizations like GiveDirectly are catching on to how technology and data science can be used to solve their problems.

Organizations like Rayid’s Data Science for Social Good Fellowship, Y Combinator-backed nonprofit Bayes Impact, and DataKind are popping up to fund, train and deploy excellent data scientists to tackle pressing social issues.

Data Science In Action

In the case of GiveDirectly, Kush and Brian were tasked with using their data science skills to help discover where the poorest villages were located, so that donations could be channeled to the households with the greatest need.

To do this, Kush and Brian drew on GiveDirectly’s knowledge that the type of roof on a home is an indicator of a household’s poverty. Kush told me that in Kenya, “poorer families tended to live in homes with thatched roofs. On the other hand, a home with a metal roof typically meant the family was well-to-do enough to purchase a more sturdy shelter.”

Thatched vs. Metal Roofs

Credit: GiveDirectly

Using this knowledge, Kush and Brian used Google Maps to extract satellite images of villages in Kenya and deployed an algorithm that used the color of each roof to determine whether it was made of metal or thatch. Doing this across all of the houses in a village could give an estimate of the level of poverty in that village.
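To make the idea concrete, here is a minimal sketch of color-based roof classification. It is not Kush and Brian's actual algorithm, whose details are not described here. It assumes each roof has already been located in a satellite image and reduced to an average RGB color, and it uses a hand-picked rule (bright, greyish roofs count as metal; everything else counts as thatch). The function names, thresholds and sample colors are all hypothetical.

```python
import colorsys

# Hypothetical thresholds, chosen only for illustration; a real system
# would tune them against labeled satellite imagery.
MIN_METAL_BRIGHTNESS = 0.6   # assumption: metal roofs reflect a lot of light
MAX_METAL_SATURATION = 0.25  # assumption: they look grey or silver, not brown


def classify_roof(rgb):
    """Label one roof 'metal' or 'thatch' from its average (R, G, B) color, 0-255."""
    r, g, b = (channel / 255.0 for channel in rgb)
    _hue, saturation, brightness = colorsys.rgb_to_hsv(r, g, b)
    if brightness >= MIN_METAL_BRIGHTNESS and saturation <= MAX_METAL_SATURATION:
        return "metal"
    return "thatch"


def thatch_share(roof_colors):
    """Fraction of roofs classified as thatch, used here as a rough poverty proxy."""
    if not roof_colors:
        raise ValueError("need at least one roof color")
    labels = [classify_roof(rgb) for rgb in roof_colors]
    return labels.count("thatch") / len(labels)


# Made-up average roof colors for one village: two greyish, two brownish.
village = [(200, 205, 210), (120, 90, 50), (110, 85, 60), (190, 190, 185)]
print(thatch_share(village))
```

Computing a share like this for every village would let an organization rank villages by estimated poverty before sending anyone into the field. In practice the hard parts are finding the roofs in the image and coping with lighting and image quality, which this sketch skips entirely.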

In early 2014, GiveDirectly piloted this algorithm to detect poverty levels in 50 villages in Kenya, as part of one of its largest campaigns, moving $4 million to households all over western Kenya.

By employing Kush and Brian’s algorithm, GiveDirectly eliminated over 100 days of manual village inspections. This saved over $4,000, allowing GiveDirectly to fund four more households.

Excited by the potential of data science to help families escape poverty more effectively, GiveDirectly is now in discussions with Kush, Brian and DataKind to see how their algorithm can be made more precise and scaled to additional villages.

Potential To Build The Future

As the world generates an increasing volume of information, there will be more opportunities to apply data science to socially meaningful causes. What if we could help guidance counselors predict which students are most likely to drop out, and then design successful interventions for them? What if we could improve parole decisions, reduce prison overcrowding and lower recidivism?

Examples of how data science can be applied to the social sector include:

It’s clear that we can be optimistic about how data scientists can use the data at their fingertips for social good. As an emerging technological frontier, data science is in a position of immense potential. As a result, there is much to explore about how we can use it to push the human race forward.


Targeting direct cash transfers to the extremely poor (2014), Kush Varshney and Brian Abelson
