Wikileaks Afghanistan Data

By now, you have almost certainly read about the publication of a massive number (72,000+) of classified documents related to coalition operations in Afghanistan by the whistleblower group Wikileaks. The data are available in several formats at the dedicated Wikileaks site.

Before proceeding, I want to point out that, given the manner in which this information was obtained and subsequently disseminated, I am unclear as to the legal protections provided to those in possession of the data (i.e., retaining copies on their hard drives) or performing analysis on it (i.e., citing data in research). As such, I am not recommending or condoning that anyone download the data until these questions are explicitly addressed.

I, however, have downloaded the data and begun examining it at a high level. I believe such an examination is critical for two reasons. First, this is the first time in history that the public has been given such a granular view of the day-to-day operation of contemporary warfare. With the proper analytical tools, this data may reveal insights into the predicates of conflict in ways that previous aggregate-level data could not. Second, because the data may have gone through some degree of filtering or selection by Wikileaks, a detailed analysis may provide insight into the nature of that selection and the process by which it occurred.

After the jump is an initial overall descriptive visualization of the data as it was provided by Wikileaks, with some brief interpretations. Over the next several days and weeks, I hope to examine the data in more detail and periodically present the results.


The above graph displays the volume of reports over the six-year period covered by the data set, broken down by reporting region (e.g., RC SOUTH, RC EAST) and by the target of attack noted in the incident report (e.g., ENEMY, FRIENDLY).
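
For readers who want to see the kind of aggregation that sits behind a chart like this, here is a minimal sketch in Python. It is not the code used to produce the graph. The field names (`date`, `region`, `target`), the timestamp format, and the sample rows are all assumptions made up for illustration; the actual files may be laid out differently.

```python
from collections import Counter
from datetime import datetime

def monthly_counts(reports):
    """Count reports per (year-month, region, target) group."""
    counts = Counter()
    for report in reports:
        # Hypothetical timestamp format; adjust to match the real files.
        when = datetime.strptime(report["date"], "%Y-%m-%d %H:%M:%S")
        key = (when.strftime("%Y-%m"), report["region"], report["target"])
        counts[key] += 1
    return counts

# Made-up rows for illustration only; not taken from the actual data.
sample = [
    {"date": "2006-07-01 10:00:00", "region": "RC EAST", "target": "ENEMY"},
    {"date": "2006-07-15 22:30:00", "region": "RC EAST", "target": "ENEMY"},
    {"date": "2006-08-02 06:45:00", "region": "RC SOUTH", "target": "NEUTRAL"},
]

counts = monthly_counts(sample)
for key in sorted(counts):
    print(key, counts[key])
```

Each resulting group would correspond to one point on a time series for a given region and target type.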

My motivation in creating this chart was to do a very quick assessment of the trends in the data. Given the nature of the reports, we would expect a noticeable degree of seasonality (peaks and valleys) reflecting the natural ebb and flow of war. Any drastic deviation from this expectation could indicate a strong degree of selection on the part of Wikileaks. As you can see, however, the data generally do fit this expectation. Note the dramatic upward-trending seasonality present in the heavy reporting areas of RC EAST and RC SOUTH. Perhaps more interesting, though, is the sudden increase in the number of NEUTRAL reports for RC EAST and RC CAPITAL in the period roughly between mid-2006 and mid-2008.

Perhaps a more detailed reading of the reports from those regions during that period would reveal information about the nature of the fighting, or about the selection process present in the data.
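
The seasonality expectation described above was judged by eye from the chart, but it could also be checked crudely in code. The sketch below is one possible approach, not something I have run against the actual data: it averages report counts for each calendar month across years, so that recurring peaks and valleys stand out. The input format (a mapping from "YYYY-MM" strings to totals) and the numbers are made up for illustration.

```python
from collections import defaultdict

def mean_by_calendar_month(monthly_totals):
    """Average the report count for each calendar month (1-12) across years."""
    buckets = defaultdict(list)
    for year_month, total in monthly_totals.items():
        # Keys are assumed to look like "2006-07".
        month = int(year_month.split("-")[1])
        buckets[month].append(total)
    return {m: sum(v) / len(v) for m, v in sorted(buckets.items())}

# Made-up totals for illustration only; not taken from the actual data.
totals = {"2006-01": 10, "2006-07": 40, "2007-01": 20, "2007-07": 60}

profile = mean_by_calendar_month(totals)
print(profile)
```

Because a strong upward trend weights later years more heavily in a raw average, a more careful version would remove the trend before comparing months.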