This project was inspired by various globally reported mass shootings in the US. In particular, I was intrigued that there was no fixed definition of what constitutes a mass shooting. The FBI defined a mass killing as four or more victims, Congress voted to define it as three or more killings, and various US media organisations tended to use four as the cut-off point, although some went higher. In addition to the cognitive dissonance arising from setting the bar for mass shootings at four victims, it was surprising that no one could agree on what a mass shooting was, given the reported scale of the problem. I also thought it was odd that the shooting or death of the perpetrator was not considered in any of the definitions.
For this project, I analysed a dataset of gun violence incidents between 2013 and 2018, collected by the Gun Violence Archive and available on Kaggle. I was interested to find out how the number of mass shootings changed depending on the definition of mass shooting.
The code, written in R, can be viewed at: USgunviolence.r
The full findings, with charts, are outlined in my blog post, The bar for mass shootings in the US is ridiculously low
The project can be broken down into the stages below. In the code, these stages are indicated by commented headings.
After the dataset was read in from CSV/Excel format, the number of gun violence incidents (the number of rows) and the numbers of deaths and non-fatal injuries (the sums of the relevant columns) were calculated.
The first task was to calculate the average number of deaths and injuries per shooting. The mean returned a value between 0 and 1, which did not appear to be particularly helpful at first. So the mode was calculated by counting the number of shootings in which there were x deaths and injuries, and a chart was plotted. From this, it could be seen that the most frequent outcome for a gun violence incident was 1 death or injury, followed by 0 deaths or injuries. This explained the value of the mean.
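The counting step can be sketched roughly as follows. This is a minimal illustration rather than the code in USgunviolence.r: the column names `n_killed` and `n_injured` are assumptions about the dataset's layout, and a small made-up data frame stands in for the real file, which would normally be loaded with `read.csv()`.

```r
# Toy stand-in for the real dataset; all values are made up.
# The column names n_killed and n_injured are assumptions.
incidents <- data.frame(
  n_killed  = c(0, 1, 0, 2, 0, 1),
  n_injured = c(1, 0, 0, 3, 1, 0)
)

# One row per incident, so the row count is the number of incidents.
n_incidents <- nrow(incidents)

# Totals are the sums of the relevant columns (ignoring missing values).
total_killed  <- sum(incidents$n_killed, na.rm = TRUE)
total_injured <- sum(incidents$n_injured, na.rm = TRUE)

cat("Incidents:", n_incidents, "\n")
cat("Deaths:", total_killed, " Injuries:", total_injured, "\n")
```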
In order to calculate the mode using R, the relevant column was converted into a table and then sorted by number of incidents in ascending order. The mode was the last entry in the sorted table.
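A minimal sketch of this approach is shown below, using a made-up vector of per-incident casualty totals (the name `casualties` is hypothetical and the values are illustrative only).

```r
# Hypothetical per-incident totals (deaths + injuries); values are made up.
casualties <- c(1, 0, 1, 1, 0, 0, 1, 1, 3, 0)

# Frequency table: how many incidents had 0, 1, 2, ... casualties.
freq <- table(casualties)

# sort() orders by frequency, ascending by default,
# so the most frequent value ends up last.
freq_sorted <- sort(freq)
mode_entry  <- tail(freq_sorted, 1)

# The table's names hold the casualty values as strings.
mode_value <- as.integer(names(mode_entry))

# The mean, for comparison with the mode.
mean_value <- mean(casualties)

print(freq_sorted)
cat("Mode:", mode_value, " Mean:", mean_value, "\n")
```

With many incidents at 0 or 1, the mean falls below 1 even though the single most common outcome is 1.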
Using the table of modes (the frequency table) created above, it was possible to calculate the number of mass shootings, using 3, 4 and 5 as the cut-off point, by summing the frequencies for the subset of the table at or above the cut-off. Then, using iteration, it was possible to create a list of the number of mass shootings for each cut-off point i, where i is the number of deaths and injuries, starting with 3.
The list was plotted on a chart to show graphically how the different definitions affected the number of mass shootings. It can be seen that there is a big difference in the number of mass shootings between using a cut-off point of 3 and a cut-off point of 4. There is also a smaller but still noticeable difference between using a cut-off point of 4 and a cut-off point of 5. The difference in definition starts to matter less and less once the cut-off point is larger than five, i.e. at least six.
The problem was that there was no obvious way, from a visual examination of the list of sums of modes, of picking a cut-off point that did not seem arbitrary. Statistically speaking, the most sensible approach was to find the average of the list of sums of modes. Depending on whether one used the mean or the median, the most reasonable cut-off point was either 15 (mean) or 20 (median). Given the variability in the number of deaths and injuries, I decided to use the median.
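The cut-off calculation might look something like the sketch below. It assumes that "cut-off point" means "at least that many deaths and injuries", uses made-up data, and replaces an explicit loop with `vapply()`. The helper `count_at_or_above` is a hypothetical name, not one taken from the project code.

```r
# Hypothetical per-incident totals (deaths + injuries); values are made up.
casualties <- c(0, 1, 1, 1, 2, 3, 3, 4, 4, 5, 6, 8)

freq   <- table(casualties)
values <- as.integer(names(freq))  # casualty counts present in the table

# Number of incidents with at least `cutoff` casualties:
# sum the frequencies for the matching subset of the table.
count_at_or_above <- function(cutoff) {
  sum(freq[values >= cutoff])
}

# Repeat for every cut-off from 3 up to the largest observed value.
cutoffs <- 3:max(values)
mass_shootings <- vapply(cutoffs, count_at_or_above, numeric(1))
names(mass_shootings) <- cutoffs

print(mass_shootings)
```

The resulting named vector can be passed straight to `plot()` or `barplot()` to chart how the count falls as the cut-off rises.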
In other words, I judged that the most reasonable definition for mass shooting was 20 deaths or injuries.
Using the adjusted definition above, I was then able to iterate through the original dataset to identify where and when the gun violence incidents that met this criterion took place. Under my definition, there were only 19 mass shootings between 2013 and 2018.
Whilst it is not possible to say that my method for defining what constitutes a mass shooting is more accurate, the data does show that how one defines a mass shooting affects the number of gun violence incidents that are categorised as mass shootings. There is an argument to be made that the threshold used by the US Congress and the FBI to define a mass shooting is too low.
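One way to pick out the qualifying incidents is sketched below. It uses a vectorised filter rather than an explicit loop. The column names (`date`, `state`, `n_killed`, `n_injured`) are assumptions, and the rows are placeholder values, not real incidents.

```r
# Placeholder rows only; none of these are real incidents.
# Column names are assumptions about the dataset's layout.
incidents <- data.frame(
  date      = as.Date(c("2015-01-10", "2016-06-12", "2017-10-01")),
  state     = c("State A", "State B", "State C"),
  n_killed  = c(1, 10, 5),
  n_injured = c(2, 15, 30)
)

threshold <- 20  # cut-off chosen in the text above

# Total deaths and injuries per incident.
incidents$casualties <- incidents$n_killed + incidents$n_injured

# which() drops rows where the comparison is NA instead of returning NA rows.
keep <- which(incidents$casualties >= threshold)
mass_shootings <- incidents[keep, c("date", "state", "casualties")]

print(mass_shootings)
cat("Incidents at or above the threshold:", nrow(mass_shootings), "\n")
```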