Extract Chicago Crime Data from the Open Data Portal Using Python

As part of the investigation into crime around the 606 trail, we pulled the crime data from Chicago’s open data portal. The following is a straightforward sample Python 2 script for pulling a daily extract. Note that the available crime data on the portal lags by 7 days, so this script extracts one day’s worth of data: the 24 hours ending at midnight 8 days before the run date.

import urllib
import json
from datetime import datetime,timedelta
now = datetime.now()

sep  = ','

# Base url for Chicago Open Data Portal crime API; we'll add date and location filters
baseurl="https://data.cityofchicago.org/resource/6zsd-86xi.json"
datebetw = "?$where=date between "

# Crime data availability lags by 7 days; we run daily and grab the 24 hours ending at midnight 8 days ago
end_date = now - timedelta(days=8)
st_date = now - timedelta(days=9)

# Zero-pad month and day so the timestamps are valid ISO 8601 (e.g. 2016-03-05, not 2016-3-5)
dateocc_1 = "'" + st_date.strftime('%Y-%m-%d') + 'T00:00:00' + "'"
dateocc_2 = "'" + end_date.strftime('%Y-%m-%d') + 'T00:00:00' + "'"


# the syntax for this filter is  'within_box(location_col, NW_lat, NW_long, SE_lat, SE_long)'
boxurl = 'within_box(location, 41.924393, -87.736558, 41.902645, -87.6538)'

# Create the overall URL to interrogate the API with our date and location filters
ourl = baseurl + datebetw + dateocc_1  + ' and ' + dateocc_2  + ' AND ' + boxurl


jsonurl = urllib.urlopen(ourl)
text = json.loads(jsonurl.read())

# Create a file name for the daily extract with a timestamp
fname = 'extract_' + now.strftime('%Y%m%d_T%H%M%S') + '.txt'

tfile = open(fname, 'w')

# Loop through the crimes and write a comma-separated line for each one.
# A record may be missing a field, so fall back to an empty string instead of raising KeyError.
fields = ['id', 'block', 'date', 'description', 'primary_type', 'location_description']
for t in text:
	tfile.write(sep.join(t.get(f, '') for f in fields))
	tfile.write("\n")

tfile.close()
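
For readers on Python 3, the same extract can be sketched roughly as follows. This is a minimal sketch rather than the script we ran: the function names build_url, write_rows and run_extract are made up for illustration, and it assumes the portal accepts a percent-encoded $where parameter. It differs from the script above in two ways: the query string is encoded explicitly (Python 3's urlopen does not quote the URL for you), and rows are written with the csv module so that values containing commas are quoted.

```python
import csv
import json
from datetime import date, datetime, timedelta
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Same endpoint, bounding box and columns as the script above
BASE_URL = "https://data.cityofchicago.org/resource/6zsd-86xi.json"
BOX = "within_box(location, 41.924393, -87.736558, 41.902645, -87.6538)"
FIELDS = ["id", "block", "date", "description", "primary_type", "location_description"]


def build_url(run_date):
    """Build the query URL for the 24 hours ending at midnight 8 days before run_date."""
    start = run_date - timedelta(days=9)
    end = run_date - timedelta(days=8)
    where = "date between '{}T00:00:00' and '{}T00:00:00' AND {}".format(
        start.isoformat(), end.isoformat(), BOX)
    # Percent-encode spaces, quotes and the '$' so the URL is valid
    return BASE_URL + "?" + urlencode({"$where": where}, quote_via=quote)


def write_rows(records, out):
    """Write one CSV row per crime; missing fields become empty strings."""
    writer = csv.writer(out)
    for rec in records:
        writer.writerow([rec.get(f, "") for f in FIELDS])


def run_extract():
    """Download the daily window and save it to a timestamped file (hits the network)."""
    now = datetime.now()
    with urlopen(build_url(now.date())) as resp:
        records = json.loads(resp.read().decode("utf-8"))
    fname = now.strftime("extract_%Y%m%d_T%H%M%S.txt")
    with open(fname, "w", newline="") as tfile:
        write_rows(records, tfile)
```

Calling run_extract() performs the download and writes the file; build_url and write_rows can be tried on their own without touching the network.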