The Shades of TIME Project

[Cover image: 03_10_1938_0.jpg]

A couple of days ago someone posted a link to a data set of all TIME Magazine covers, from March 1923 to March 2012. Of course, I downloaded it and began thumbing through the images. As is often the case when presented with a new data set, I was left wondering, "What can I ask of the data?"

After thinking it over, and with the help of Trey Causey, I came up with, "Have the faces of those on the cover become more diverse over time?" To address this question I chose to answer something more specific: Have the color values of skin tones in faces on the covers changed over time?

I developed a data visualization tool, which I'm calling the Shades of TIME, to explore the answer to that question.

The process for generating the Shades of TIME required the following steps:

  1. Using OpenCV to detect and extract the faces appearing on the magazine covers
  2. Using the Python Imaging Library to implement the Peer et al. (2003) skin tone classifier to find the dominant skin tone in each face (a rough sketch of these first two steps follows the list)
  3. Designing a data visualization and exploration tool using d3.js
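
To make the first two steps concrete, here is a minimal sketch. It assumes the modern cv2 Python bindings and the frontal-face Haar cascade bundled with the opencv-python package, and it substitutes a simple Counter over OpenCV arrays for the Python Imaging Library pass used in the actual project, so the function names and exact calls are illustrative rather than the project's code:

```python
# Sketch of steps 1 and 2: Haar-cascade face detection, then the
# Peer et al. (2003) RGB rule to find the dominant skin tone per face.
from collections import Counter

import cv2

# Frontal-face model that ships with the opencv-python package.
CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_faces(cover_path):
    """Return a list of face crops (BGR arrays) detected in one cover."""
    image = cv2.imread(cover_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    rects = CASCADE.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [image[y:y + h, x:x + w] for (x, y, w, h) in rects]

def is_skin(r, g, b):
    """Peer et al. (2003) rule for skin pixels under uniform daylight."""
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and
            abs(r - g) > 15 and r > g and r > b)

def dominant_skin_tone(face):
    """Return the most frequent skin-classified (r, g, b), or None."""
    rgb = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)  # OpenCV stores BGR
    skin = Counter(
        (int(r), int(g), int(b))          # cast to int to avoid uint8 wraparound
        for r, g, b in rgb.reshape(-1, 3)
        if is_skin(int(r), int(g), int(b))
    )
    return skin.most_common(1)[0][0] if skin else None
```

Running dominant_skin_tone over every crop returned by extract_faces yields one (r, g, b) tone per detected face on each cover, which is essentially what the visualization plots.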

The code and data are all available on my GitHub. Instructions for using the tool to explore the data are available on the tool page itself. It is worth checking out just as a fun way to explore the TIME Magazine covers.

I have two primary observations from exploring the data. First, it does appear that the variance in skin tones has changed over time, and in fact the tones are getting darker. Most of the first quarter of the data is hard to interpret because TIME was still largely using black-and-white images, and when it did use color it was often for artists' renderings of portraits. The interpretation of skin tone in drawings is difficult. Around the mid-1970s, however, there appears to be an explosion of skin tone diversity. Of course, there can be many reasons for this, not the least of which may be improvements in photo and magazine printing technology.

Second, and much more certain, is that TIME has steadily increased the number of faces that appear on its covers over time. As you scroll through the visualization you will quickly notice the number of faces per cover increase from one, to a few, to many in the 1990s through the 2010s. Whether this is the result of a desire to show a more diverse set of faces, a bid to increase marketing appeal on newsstands, or both is completely unknown.

But, as with most data projects of this nature, the resulting tool raises more questions than it answers. Perhaps the most important observation is how brittle the out-of-the-box face detection algorithms are. As you click through the tone cells you will notice that many of them do not correspond to a face at all. As such, it may be difficult to interpret any of this as relevant to the motivating question. That said, in aggregate there are many more true faces than false positives, so the exercise still seems useful.
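
For what it is worth, some of that brittleness is tunable. In the cv2 API sketched above, raising minNeighbors and setting a minimum face size makes detectMultiScale stricter, trading missed faces for fewer false positives; the values below are illustrative, not what the project used:

```python
# Stricter settings for the detector sketched earlier: require more
# agreeing neighbor detections and ignore tiny candidate regions.
rects = CASCADE.detectMultiScale(
    gray,
    scaleFactor=1.05,   # finer steps through the image pyramid
    minNeighbors=8,     # more overlapping detections required per face
    minSize=(40, 40),   # discard candidates smaller than 40x40 pixels
)
```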

Code for Machine Learning for Hackers

With the release of the eBook version of Machine Learning for Hackers this week, many people have been asking for the code. And with good reason, as it turns out, because O'Reilly still (at the time of this writing) has not updated the book page to include a link to the code.

For those interested, my co-author John Myles White is hosting the code on his GitHub, which can be accessed at:

https://github.com/johnmyleswhite/ML_for_Hackers

Please feel free to clone, fork, and hack the repository as much as you like. As we mention in the README, some of the code will not appear exactly as it does in the text. This happens for two reasons: first, some minor formatting changes had to be made to fit the code into the book; and second, some of the code has been updated or edited to remove typos and minor errors.

We hope you find the code a useful supplement to the text!

Data use policies and social media: an appeal

After my post a few weeks ago lamenting Twitter's data use policies, many people reached out to support my position and ask what they could do to help. One of them was Mark Huberty, a fellow political scientist at UC Berkeley. Mark mentioned that many other social scientists had had similar experiences and were worried about the ramifications for research.

We decided the best way to proceed was to make an appeal to all researchers, not only social scientists, to gather examples of work, and stories, of how many disciplines are using this data to uncover new aspects of human behavior. This morning, Mark wrote just such an appeal to the POLMETH mailing list, and in an effort to bring it to a larger audience I have reproduced it below:

Greetings,
One of us (Drew Conway) recently found that, although Twitter makes its data open to almost anyone via well-documented interfaces, and although it appears to encourage experimentation with its data, that openness does not extend to redistributing the data for replication. This raises serious questions about the use of Twitter-based data for academic research. Twitter has been shown to be an accurate predictor of vote and polling outcomes, and a novel way to measure partisan polarization and communication. But without clear data use policies, research taking advantage of this data may not pass muster with journals requiring the release of replication files, and research progress will be hindered.
We think there is an opportunity here to show interdisciplinary academic interest in the Twitter data, and open a conversation about reasonable data retention and release policies on their part. At present, there appears to be a disconnect between Twitter's analysts, who seem to encourage data use, and its legal and business arm, who are very conservative with Twitter's intellectual property rights. Given this disconnect, Twitter has been inconsistent in its demands on researchers using this data. We're hoping that by pointing out the inconsistency and seeking a reasonable resolution, we can find a suitable outcome for both Twitter's business model and researchers' interests. If done correctly, this might have the potential to become a model for other social media sites of interest to social scientists.
We would like to engage participation from anyone in the PolMeth community who has an interest in this outcome. If you might be interested in participating, please let one of us know. We're only in the early stages of working on this, so we welcome all inquiries, ideas, and concerns.
Thanks for your interest. We will look forward to hearing from you.

We hope that those of you using Twitter data for research will help us in this effort. Please feel free to contact me directly, either by email or in the comments section below. We look forward to hearing from you!