A couple of days ago someone posted a link to a data set of all TIME Magazine covers, from March 1923 to March 2012. Of course, I downloaded it and began thumbing through the images. As is often the case when presented with a new data set, I was left wondering, "What can I ask of the data?"
After thinking it over, and with the help of Trey Causey, I came up with, "Have the faces of those on the cover become more diverse over time?" To address this question I chose to answer something more specific: Have the color values of skin tones in faces on the covers changed over time?
I developed a data visualization tool, which I'm calling the Shades of TIME, to explore the answer to that question.
The process for generating the Shades of TIME required the following steps:
- Using OpenCV to detect and extract the faces appearing in the magazine covers
- Using the Python Imaging Library to implement the Peer et al. (2003) skin tone classifier to find the dominant skin tone in each face
- Designing a data visualization and exploration tool using d3.js
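To make the second step a bit more concrete, here is a minimal sketch of the Peer et al. (2003) rule for skin under uniform daylight, applied to the pixels of an already-extracted face. The function names, the plain list-of-tuples input, and the idea of picking the "dominant" tone as the most common quantized skin color are all assumptions for illustration; the actual project code may do this differently.

```python
from collections import Counter


def is_skin(r, g, b):
    """Peer et al. (2003) RGB rule for skin under uniform daylight."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def dominant_skin_tone(pixels, bucket=16):
    """Return the most common skin color among `pixels`, or None.

    `pixels` is an iterable of (r, g, b) tuples with 0-255 channels.
    Colors are quantized into `bucket`-wide bins before counting so that
    near-identical shades are grouped together. Taking the mode of the
    quantized colors is an assumption, not necessarily the original method.
    """
    counts = Counter(
        (r // bucket * bucket, g // bucket * bucket, b // bucket * bucket)
        for r, g, b in pixels
        if is_skin(r, g, b)
    )
    if not counts:
        return None  # no pixel passed the skin rule
    return counts.most_common(1)[0][0]
```

A face region that returns `None` here contains no pixels the rule accepts as skin, which is one cheap signal that the detector may have fired on something other than a face.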
The code and data are all available at my GitHub. Instructions for how to use the tool to explore the data are available at the tool page itself. It is worth checking out just as a fun way to explore the TIME Magazine covers.
I have two primary observations from exploring the data. First, it does appear that the variance in skin tones has changed over time, and in fact the tones are getting darker. Most of the first quarter of the data is hard to interpret because TIME was still largely using black and white images, and when they did use color it was often for artists' renderings of portraits. The interpretation of skin tone in drawings is difficult. Around the mid-1970s, however, there appears to be an explosion of skin tone diversity. Of course, there can be many reasons for this, not the least of which may be improvements in photo and magazine printing technologies.
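One way to put a number on that first impression would be to group the extracted tones by decade and compare the mean and variance of their brightness. The sketch below assumes the data has been reduced to `(year, (r, g, b))` records and uses the common Rec. 601 luma weights as a stand-in for "how dark" a tone is; both the record shape and the function names are hypothetical, not taken from the project.

```python
from collections import defaultdict
from statistics import mean, pvariance


def luma(rgb):
    """Approximate perceived brightness (Rec. 601 weights), 0-255 scale."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def tone_stats_by_decade(records):
    """Map each decade to (mean luma, population variance of luma).

    `records` is an iterable of (year, (r, g, b)) pairs, one per face.
    """
    by_decade = defaultdict(list)
    for year, rgb in records:
        by_decade[year // 10 * 10].append(luma(rgb))
    return {
        decade: (mean(values), pvariance(values))
        for decade, values in sorted(by_decade.items())
    }
```

A falling mean across decades would correspond to tones getting darker, and a rising variance to a wider spread of tones, though neither number says anything about why the change happened.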
Second, and much more certain, is that TIME has steadily increased the number of faces that appear on its covers. As you scroll through the visualization you will quickly notice the number of faces per cover increase from one, to a few, to many in the 1990s through 2010s. Whether this is the result of a desire to show a more diverse set of faces, an attempt to increase marketing appeal on newsstands, or both is unknown.
But, as with most data projects of this nature, the resulting tool generates more observations than questions. Perhaps the most important is how brittle the out-of-the-box face detection algorithms were. As you click through the tone cells you will notice that many of them do not correspond to a face at all. As such, it may be difficult to interpret any of this as relevant to the motivating question. That said, in aggregate there are many more faces than there are false positives, so the exercise still seems useful.