Carl Shan

Week 4: Bringing Humans To The Data

30 June 2014 - Chicago

This is the fourth in a series of posts chronicling my reflections on participating in the 2014 Data Science for Social Good Fellowship at the University of Chicago. You can read the last post here:

To get automatically notified about new posts, you can subscribe by clicking here. You can also subscribe via RSS to this blog to get updates.

Last week, my team spent the bulk of our time visiting a series of different locations in Chicago: a social services community center, a homeless shelter and the headquarters of an anti-poverty nonprofit.

Although each visit was remarkably distinctive in atmosphere and people, they all shared the same underlying purpose: to bring human context to the data.

Each visit gave our team new insights into the problem we were helping these organizations tackle, and each helped me build greater empathy for the human beings behind the numbers.


Empathy is the ability to understand the feelings, beliefs, values, ideas and worldviews of other people.

I believe that many of the world’s problems stem first and foremost from a lack of empathy.

In my personal life, I notice that conflict and discord arise more often from muddled communication than from malicious intent. In most conflicts, neither party truly seeks to harm the other, yet neither can set aside enough pride to apologize or admit wrongdoing. Misunderstanding each other, and then refusing to understand, both parties cast themselves as each other’s enemy.

Similarly, in theology, each of the seven deadly sins is incubated by a self-idolatry that seduces us into an indulgent dismissiveness of others.

In politics, misunderstanding and imperialism run rampant when more effort goes into sharpening weapons to fight than into sharpening minds to understand.

Even in technology, where engineers are often caricatured as emotionless machines, empathy prevails as the superior strategy. Entrepreneur, hacker and investor Paul Graham famously wrote[1]:

Empathy is probably the single most important difference between a good hacker and a great one. Some hackers are quite smart, but when it comes to empathy are practically solipsists. It’s hard for such people to design great software, because they can’t see things from the user’s point of view.

I found that visiting these locations, speaking with social workers, understanding the context in which the data was collected, and seeing the constraints the organizations operated within brought concreteness to abstraction. What had been just a sequence of rows in a table became tiny stories, each a brief glimpse into a slice of someone’s life.

I found myself much more motivated to work on problems that I could now connect directly to someone else’s life.

Seeing In Higher Dimensions

Beyond building empathy, spending time with the people behind the data I’m working with also has a large impact on the quality of the work I produce as a data scientist.

In short, hearing the stories and details uncovered during my team’s visits added another dimension to our data set. Literally.

In linear algebra, there is the mathematical notion of a basis. Put simply, a basis is a minimal set of vectors from which every point in a space can be built: each point is a unique combination of the basis vectors, and the number of vectors needed is the dimension of the space. For example, the unit vectors lying along the X and Y axes form a basis for two dimensions: any point in 2D can be described by how far it is along the X and Y axes.
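To make the 2D example concrete, here is one point, (3, 2), written in two different bases. The particular point and the second, tilted basis are illustrative choices, not something from the visits; the second line is there to show that basis vectors do not have to lie on the axes, or even be perpendicular, to describe every point in the plane.

```latex
\begin{aligned}
(3, 2) &= 3\,(1, 0) + 2\,(0, 1) && \text{standard } X/Y \text{ basis} \\
(3, 2) &= 1\,(1, 0) + 2\,(1, 1) && \text{a tilted basis}
\end{aligned}
```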

For a set of vectors to encode as much information as possible, each vector needs to point in a direction the others cannot reach. Otherwise the vectors are so homogeneous that they create an “echo chamber” effect: they look enough alike that they merely repeat each other, bringing nothing unique to the table. To increase the amount of information a set of vectors can encode (and with it the number of ‘dimensions’ the set is said to span), you need a vector so different that it cannot be built from the others; the most vivid case is one that juts out perpendicular to all the rest.[2]
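The “echo chamber” idea can be checked numerically by computing the rank of a set of vectors, which is the number of dimensions they actually span. The sketch below is an illustration, not anything my team used: `rank` is a hypothetical helper written here with plain Gaussian elimination over exact fractions (to avoid floating-point round-off), and the example vectors are made up.

```python
from fractions import Fraction


def rank(vectors):
    """Return the number of linearly independent vectors, i.e. the dimension
    of the space they span, via Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return 0
    r = 0  # index of the next pivot row
    for col in range(len(rows[0])):
        # Find a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # this column adds no new direction
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Subtract multiples of the pivot row to zero out this column below it.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


x_axis = (1, 0, 0)
y_axis = (0, 1, 0)
echo = (2, 3, 0)    # just 2*x_axis + 3*y_axis: repeats what we already know
upward = (0, 0, 1)  # perpendicular to both axes
tilted = (1, 1, 1)  # not perpendicular, but still points somewhere new

print(rank([x_axis, y_axis]))          # 2
print(rank([x_axis, y_axis, echo]))    # 2
print(rank([x_axis, y_axis, upward]))  # 3
print(rank([x_axis, y_axis, tilted]))  # 3
```

Adding the echo vector leaves the rank at 2, while either the perpendicular or the tilted vector raises it to 3. Perpendicularity is the clearest way to add a dimension, but any vector that cannot be built from the others will do.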

I believe this idea extends, by analogy, to the real world.

Our visits brought a human context to the data that clarified why some outliers existed and why some rows had more nulls than others, and they also seeded initial hypotheses that our team could then test. The new information we received from the people we spoke with added complexity to the data, but it also contextualized it.

When I have only my own limited understanding of the world, I cannot conceive of ideas that lie outside the span of my worldview. The set of vectors that represents my knowledge and experiences simply isn’t enough to capture the full complexity of the world.

However, by adding other people’s perspectives, my basis grows. By talking with friends who vehemently maintain that technology is eroding human civilization, for example, I gain clarity on their viewpoints and see sides of arguments I was blind to before.

As a data scientist, I feel that I begin to see in higher dimensions when I add the human element.


[1] Paul Graham, “Hackers and Painters”

[2] A friend described to me how you could think of the acquisition of knowledge as trying to approximate an infinite-dimensional space with a finite number of vectors. No matter how many distinct, orthogonal vectors you have, you will never know everything.

Thanks to Vrushank Vora and Michael Lai for giving feedback and comments on this essay.

I write posts about data science applied to social causes. If you want to be notified when my next reflection is published, subscribe by clicking here.