Filling in empty values in a time series in Python
I track my bike rides, runs and walks in an Android app called KeepTrack. It’s a great app for doing so, but I wanted to look at a cumulative mileage graph for my bike riding and the free app didn’t provide that (not sure if the paid one does either). In any case, the app does allow you to export data to a CSV file, so I first gave it a try in Python.
Here I’ll discuss a technique for getting the data into a ready-to-graph format.
One problem I encountered is that if I don’t ride on a day, I don’t enter zero for that day; I simply make no entry. So if I skipped the 19th and 22nd of July, a sorted data snippet might look like this:
07-18-2014,30
07-20-2014,20
07-21-2014,23
07-23-2014,20
When I want a cumulative mileage graph, I want to show the mileage as it grows for ALL days, not just the days I have ridden. That is, I want the above snippet to look like this:
07-18-2014,30
07-19-2014,0
07-20-2014,20
07-21-2014,23
07-22-2014,0
07-23-2014,20
It’s fairly straightforward to implement this in Python. Essentially, I used a dictionary comprehension to do what you might do with a left outer join in SQL.
Let’s say we’re at the point where we have the sorted dates (only those with mileage) in a list called xdtsfmt and the corresponding mileage for those days in another list called y.
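The post doesn’t show how xdtsfmt and y are built from the export. As a minimal sketch, assuming a two-column file with no header row (date as mm-dd-yyyy, then whole-number mileage, like the snippet above; KeepTrack’s actual export layout may differ), the standard csv module can produce both lists. The function name load_rides is hypothetical.

```python
import csv
import io
from datetime import datetime

def load_rides(lines):
    """Read 'mm-dd-yyyy,miles' rows and return (dates, mileages) sorted by date.

    Assumes no header row and whole-number mileage; adjust the column
    indexes (and use float() instead of int()) for the real export format.
    """
    rows = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        date_str, miles = row[0].strip(), row[1].strip()
        rows.append((datetime.strptime(date_str, "%m-%d-%Y"), date_str, int(miles)))
    # Sort on the parsed datetime, not the string, so order is correct across years
    rows.sort(key=lambda r: r[0])
    xdtsfmt = [r[1] for r in rows]
    y = [r[2] for r in rows]
    return xdtsfmt, y

# Hypothetical unsorted sample in the same shape as the snippet above
sample = io.StringIO("07-21-2014,23\n07-18-2014,30\n07-23-2014,20\n07-20-2014,20\n")
xdtsfmt, y = load_rides(sample)
print(xdtsfmt)  # ['07-18-2014', '07-20-2014', '07-21-2014', '07-23-2014']
print(y)        # [30, 20, 23, 20]
```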
First, let’s form a dictionary with the dates in xdtsfmt as keys and the mileages in y as values:
bdict = dict(zip(xdtsfmt, y))
Now, we’ll build a list of every date in the period of interest: 230 days starting March 1, 2014 (through October 16, 2014, a little under 8 months).
import datetime as dt
from datetime import timedelta
# Every date from 3/1/2014 onward, 230 days in total
alldt = [dt.datetime.strptime('03/01/2014', '%m/%d/%Y') + timedelta(days=x) for x in range(230)]
# Format each date as an mm-dd-yyyy string so it can match the keys in bdict
alldtfmt = [d.strftime('%m-%d-%Y') for d in alldt]
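The first statement builds a list of every day from 3/1/2014 onward, 230 days in total.
The second statement converts each date to an mm-dd-yyyy string. This is necessary because the keys in bdict are strings (the formatted dates in xdtsfmt), and a datetime object never compares equal to a string, so looking up a datetime would always miss.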
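To make both points concrete, here is a small sketch that computes the day count from an explicit end date instead of hard-coding 230, and shows why a datetime key misses a string key. The helper name date_keys is hypothetical.

```python
from datetime import date, datetime, timedelta

def date_keys(start, end, fmt="%m-%d-%Y"):
    """Return every date from start to end inclusive, formatted as strings."""
    ndays = (end - start).days + 1  # +1 so the end date itself is included
    return [(start + timedelta(days=i)).strftime(fmt) for i in range(ndays)]

alldtfmt = date_keys(date(2014, 3, 1), date(2014, 10, 16))
print(len(alldtfmt))              # 230
print(alldtfmt[0], alldtfmt[-1])  # 03-01-2014 10-16-2014

# The keys in bdict are strings, so a datetime never matches them
bdict = {"07-18-2014": 30}
print(bdict.get(datetime(2014, 7, 18), 0))  # 0 (no match)
print(bdict.get("07-18-2014", 0))           # 30
```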
Now we perform the outer-join-like operation with the following dictionary comprehension:
newdict = {k: bdict.get(k, 0) for k in alldtfmt}
Basically, wherever a date from alldtfmt is found in bdict, we use the corresponding mileage from bdict; when the date isn’t in bdict, we assign 0 miles in newdict.
I then iterated through a sorted version of the dictionary to create two new lists:
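Applied to the July snippet from the top of the post (with the date range narrowed to July 18 through 23 for brevity), the comprehension fills the missing days with zeros. This is a self-contained sketch of the same steps:

```python
from datetime import date, timedelta

# Sparse data: only the days with a ride (the snippet from the top of the post)
xdtsfmt = ["07-18-2014", "07-20-2014", "07-21-2014", "07-23-2014"]
y = [30, 20, 23, 20]
bdict = dict(zip(xdtsfmt, y))

# Every date in the shortened range, formatted to match bdict's keys
start = date(2014, 7, 18)
alldtfmt = [(start + timedelta(days=i)).strftime("%m-%d-%Y") for i in range(6)]

# Left-outer-join-like step: keep every date, use 0 where there was no ride
newdict = {k: bdict.get(k, 0) for k in alldtfmt}
for k, v in newdict.items():
    print(f"{k},{v}")
```

The printed lines match the zero-filled snippet shown earlier, with 07-19-2014 and 07-22-2014 set to 0.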
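riddt = []
ridcumsum = []
total = 0  # running mileage; named total to avoid shadowing the built-in sum()
# Note: mm-dd-yyyy strings sort correctly only within a single year
for key in sorted(newdict):
    riddt.append(key)
    total += int(newdict[key])
    ridcumsum.append(total)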
List riddt contains an ordered list of ALL dates in the range above, and ridcumsum contains the running total of mileage through each corresponding date.
(Note: before plotting with matplotlib, I had to convert the date strings in riddt back to datetime objects to use them as x-values.)
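As an alternative to the manual loop, itertools.accumulate from the standard library produces the same running total. A sketch, assuming newdict holds the zero-filled July 18 through 23 data from the snippet; sorting on the parsed date rather than the raw string keeps the order correct even if the range crosses a year boundary:

```python
from datetime import datetime
from itertools import accumulate

newdict = {"07-18-2014": 30, "07-19-2014": 0, "07-20-2014": 20,
           "07-21-2014": 23, "07-22-2014": 0, "07-23-2014": 20}

# Sort by the actual date, not lexicographically by the mm-dd-yyyy string
riddt = sorted(newdict, key=lambda s: datetime.strptime(s, "%m-%d-%Y"))
# Running total of mileage through each date
ridcumsum = list(accumulate(int(newdict[d]) for d in riddt))
print(ridcumsum)  # [30, 30, 50, 73, 73, 93]
```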
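A minimal sketch of that conversion, assuming riddt holds mm-dd-yyyy strings as above (the list here is a short hypothetical sample). matplotlib accepts datetime objects as x-values, so the result could then be plotted with something like plt.plot(ridx, ridcumsum) after import matplotlib.pyplot as plt.

```python
from datetime import datetime

riddt = ["07-18-2014", "07-19-2014", "07-20-2014"]

# Parse each mm-dd-yyyy string back into a datetime for use as an x-value
ridx = [datetime.strptime(s, "%m-%d-%Y") for s in riddt]
print(ridx[0])  # 2014-07-18 00:00:00
```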