What Will 'Data Science' Teach Us?
If the level of online discourse is a good indicator of whether a topic has penetrated the collective nerd consciousness, then the notion of a burgeoning "data science" discipline has taken hold. A few weeks ago I discussed where to draw the line on this idea, but recently I again began thinking about the idea and the term more critically. Yesterday, I had a wonderful discussion with a brilliant member of the data community here in New York, which focused on the delicate balance between keeping a human-friendly face on mass quantities of data (something data scientists are meant to do) and having this new discipline make formidable contributions to our general understanding of human behavior.
That is, up to this point, many of the great evangelists of data science have focused on telling stories with data. Science, however, is not about storytelling, but about discovery. Perhaps I am particularly cautious of the suffix "science" because of the awkward self-consciousness the word has instilled in my own discipline. At its roots, political science was a discipline that sought to construct narratives: equal parts history, philosophy and personal experience. The name "political science," therefore, drew the ire of the "hard science" community, as they felt (perhaps with reason) that the word had been appended to the title erroneously, as there were no identifiably scientific aspects to the endeavor. While my discipline has come a long way in its application of the scientific method, and today can much more accurately be referred to as a science, there continues to be a delicate balance between discovery and storytelling. What, then, can the data science community learn from this experience?
Broadly, all disciplines are measured by their contributions to our understanding of the universe. Data science—by design—is the product of measured human activity, and therefore should seek to provide new insight into human behavior. Unfortunately, the current focus of many of the community's members has been a self-congratulatory appraisal of the tools that have been developed to allow for this large-scale measurement and recording. To be a successful discipline, however, the focus must move away from tools and toward questions.
To paraphrase a famous nerd, with great data comes great responsibility; so to begin, the data science community must ask: what questions do mass quantities of measured human existence allow us to address that were never previously possible? Just the thought should be enough to inspire some to begin writing research proposals, but in an effort to contribute to this discussion, here are a few things I hope data science will teach us:
- How do online discourses manifest in offline behavior? - I study terrorism, and one lingering problem in this area is the threat from so-called online radicalization. That is, to what extent does information obtained online influence individuals to join radical organizations or commit acts of terror? This question, however, applies to many other areas, such as voting and purchasing decisions. As our ability to analyze these discourses increases, perhaps data science will provide some answers.
- How do we reach the "tipping point"? - Malcolm Gladwell did well to introduce the idea of the tipping point, but since then we have learned too little about how these culminations occur, and what, if any, are the consistent behavioral features that lead to them. Often, these events occur online, where data science may be able to analyze the tracks that lead to these phase shifts.
- What are the ethical limits of personal data analysis? - The rise of massive stores of personal data online has been a boon to the data science community, but it has not come without some trepidation. With intimate knowledge of the tools and processes used to capture and analyze this information, this community is uniquely positioned to contribute to a discussion of the ethical limits of its own work.
- Do we really consume things differently? - Every day, people make decisions about what they will consume, in terms of purchases, food, information, etc., and conventional wisdom states that these decisions are largely a function of birth cohort, geography, education, etc. Is this really the case? The vast amount of consumption data being generated online may be able to help us understand the most significant indicators of these differences.
- Can more/better data explain rational irrationality? - Today we learned some of the limits of behavioral economics, which has helped explain instances of seemingly irrational behavior. As the op-ed points out, however, there continue to be many questions that the discipline fails to answer. Perhaps, then, the explanations of these anomalies can be born out of data.
I welcome your own thoughts on what data science will teach us, and hope you will share them. Personally, I think this discipline has the potential to generate vast amounts of knowledge, but it must be careful not to lose sight of the question in the sea of information.