Collecting U.S. Bicycle Fatality Data from the FARS database

We were looking to investigate bicycle fatalities involving cars, first at a more recent and local level and then eventually at a national and historical level. This post will detail our data collection at the national and historical level.

The National Highway Traffic Safety Administration (NHTSA) maintains an online resource of “public yearly data regarding fatal injuries suffered in motor vehicle traffic crashes”. This is known as the Fatality Analysis Reporting System (FARS), and its extensive data is available to the public. The subset of interest for this project is the crash-level detail for all crashes resulting in the death of a bicyclist. The site also has a very nice query system, but we were hoping to mash up the data with some external data, so we wanted to store our own copy. Visit the FARS website to learn more.

Data Formats
As of March 2017, the FTP data from the NHTSA is available for 1975 through 2015. We wanted to store the bicycle crashes in a SQL database, so we were hoping to find the data in CSV or SQL format. Unfortunately, that was only the case for 2015. For earlier years, the data is available in either DBF or SAS format. We decided to use the DBF format and convert it to CSV files, which we would then load into the SQL database.
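As a rough illustration of the load step, here is a minimal sketch that uses only the Python standard library and SQLite. It assumes the DBF file has already been converted to CSV (reading DBF needs a third-party reader, not shown). The function name load_csv, the table name, and the sample rows are hypothetical, and the column names STATE, ST_CASE and YEAR are assumptions to check against the FARS manual for the year in question.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header and insert every row as text."""
    reader = csv.reader(csv_file)
    # Normalize header names so files from different years line up.
    header = [name.strip().upper() for name in next(reader)]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    # Skip blank or malformed lines rather than failing the whole load.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Hypothetical two-row sample standing in for a converted accident file.
sample = io.StringIO("STATE,ST_CASE,YEAR\n6,60001,2014\n36,360002,2014\n")
conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, "accident", sample)
```

Every column is loaded untyped here to keep the sketch short; a real load would likely declare column types and handle each year's differences explicitly.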

Data Necessary for Reporting Bike Fatalities from FARS
The information contained in the FARS database is really quite extensive. We were looking for a subset of this, namely crashes involving bicycle fatalities, so we would not need the entirety of the data. The database is very well documented; indeed there’s a 500+ page manual on the data elements available. After poring through (OK, skimming) this information, we determined we would need the following tables from the FARS system to start.

NOTE: The following generally holds for most of the years being analyzed; starting in 2015, the way bicyclists are identified in the data is slightly different. Additionally, not all years share exactly the same format, so there is a bit of work in importing and storing the data.

Accident table
This is the main table in the FARS system. It contains every fatal crash recorded, indexed by a state code and a case number within the state. Additionally, it contains crash-level data such as timing, location, weather and number of people involved.

Person table
The Person table contains one entry for each person involved in an accident, and is tied to the accident via the state and state case number. It is here that we can determine whether a person involved in a crash was a bicyclist and also whether the crash resulted in a fatality for the bicyclist.
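The relationship between the two tables can be sketched as a join on the state and case number, filtered to people who were bicyclists and were killed. This is only an illustration on hypothetical rows: the column names PER_TYP and INJ_SEV and the code values 6 (bicyclist) and 4 (fatal injury) are assumptions that should be verified against the FARS manual for each year, especially given the 2015 change noted above. YEAR is added to the join on the assumption that several years are stored in the same tables.

```python
import sqlite3

BICYCLIST = 6  # assumed PER_TYP code for a bicyclist; verify per year
FATAL = 4      # assumed INJ_SEV code for a fatal injury; verify per year

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accident (
    STATE INTEGER, ST_CASE INTEGER, YEAR INTEGER,
    PRIMARY KEY (STATE, ST_CASE, YEAR)
);
CREATE TABLE person (
    STATE INTEGER, ST_CASE INTEGER, YEAR INTEGER,
    PER_TYP INTEGER, INJ_SEV INTEGER
);
""")

# Hypothetical sample rows, not real FARS records.
conn.executemany(
    "INSERT INTO accident VALUES (?, ?, ?)",
    [(6, 60001, 2014), (36, 360002, 2014)],
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [
        (6, 60001, 2014, 1, 0),    # driver, uninjured
        (6, 60001, 2014, 6, 4),    # bicyclist, killed
        (36, 360002, 2014, 1, 4),  # driver, killed (not a bike fatality)
    ],
)

# DISTINCT keeps one row per crash even if several bicyclists died in it.
bike_fatal_crashes = conn.execute(
    """
    SELECT DISTINCT a.STATE, a.ST_CASE, a.YEAR
    FROM accident AS a
    JOIN person AS p
      ON p.STATE = a.STATE AND p.ST_CASE = a.ST_CASE AND p.YEAR = a.YEAR
    WHERE p.PER_TYP = ? AND p.INJ_SEV = ?
    """,
    (BICYCLIST, FATAL),
).fetchall()
```

With the sample rows above, only the first crash qualifies, since the second involved a driver fatality but no bicyclist.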

Additional Master Data
In addition to the individual crash data noted above, we would need lookup data to translate coded values into state, city and county names. This data itself was not available from the FARS website, but the codes are standardized and so were available elsewhere.
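One way to apply such master data is a lookup table joined to the crash rows, sketched below for state names only. The table and column names are hypothetical, the sample rows are made up, and code 99 stands in for any value missing from the lookup. A LEFT JOIN is used so that crashes with an unmapped code are kept rather than silently dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_lookup (STATE INTEGER PRIMARY KEY, STATE_NAME TEXT);
CREATE TABLE accident (STATE INTEGER, ST_CASE INTEGER);
""")

# Two entries from the standardized state code list.
conn.executemany(
    "INSERT INTO state_lookup VALUES (?, ?)",
    [(6, "California"), (36, "New York")],
)
# Hypothetical crash rows; 99 has no entry in the lookup table.
conn.executemany(
    "INSERT INTO accident VALUES (?, ?)",
    [(6, 60001), (36, 360002), (99, 990003)],
)

named = conn.execute(
    """
    SELECT a.ST_CASE,
           COALESCE(s.STATE_NAME, 'Unknown (' || a.STATE || ')')
    FROM accident AS a
    LEFT JOIN state_lookup AS s ON s.STATE = a.STATE
    ORDER BY a.ST_CASE
    """
).fetchall()
```

City and county names would follow the same pattern, with the caveat that those codes are presumably only unique within a state, so the join would need the state code as well.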

A quick snapshot

Using the data, here’s a snapshot of all auto crashes resulting in bike deaths between the years 2011 and 2014. Besides being taken aback by how many there are, there’s not much to be gleaned here because, as is often the case, the most incidents occur in the most populated areas. But with the data in SQL form, we are ready to start more in-depth analyses. Feel free to contact us if you have interest in this data. At some point we may stand up the SQL database on a website, but for now we are keeping it local.

Bike accidents 2011 thru 2014