Jittering a map location for privacy
As I mentioned in a previous post, I had a project that required mapping locations, but I wanted to slightly perturb the locations on the map so as not to reveal any of the actual locations, while still showing an approximate location. Throughout this post I will use the terms jitter and perturb interchangeably; in either case, I mean to slightly move a location. We are doing this to preserve the privacy of the original location while still maintaining the approximate location for illustrative purposes in the analysis.
Given the requirements of the project (this wasn’t exactly national security), I decided to take the latitude and longitude of the actual location and randomly move it a small amount. I wanted this movement to be random per location so that knowledge of one actual location and its perturbed location would not allow determination of all actual locations. Additionally, I wanted to be sure each perturbed location was at least a minimum distance from the actual location; I did not want to accidentally (randomly) generate an offset so small that it would reveal the actual location to someone looking at the map and not realizing the location had been perturbed.
I played around with a few approaches, and settled on the idea of moving each location to a random point on a circle drawn around the original point, 1/4 block away from the actual location. In this way, the new location would be close enough to the original to be worthwhile for illustrative purposes, but not so close that the original location could be figured out.
DETERMINING THE NEW LOCATION
My first passes at this included just randomly perturbing the lat and long separately. This wasn’t very satisfactory, as it resulted in the jittered locations being confined to four square areas around the original location. Here’s an example adding a random 1/4 to 1/2 block independently to the latitude and longitude from an original location of 2200 W Madison in Chicago, IL. This is a 3000 iteration simulation.
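To make the first approach concrete, here is a minimal sketch of what perturbing the latitude and longitude independently might look like. The function name jitter_independent and the random sign choice are my assumptions for illustration; the scale constants are the ones used in the implementation later in this post. Because each offset has a magnitude between 1/4 and 1/2 block and a random sign, the points can only land in four square patches, one in each diagonal direction from the original.

```python
import random

def jitter_independent(lat, lng, lat_scale=0.002438, lng_scale=0.001831):
    """Offset lat and lng separately by a random 1/4 to 1/2 block.

    Hypothetical helper: the sign of each offset is chosen at random,
    which is an assumption about how the first attempt worked.
    """
    # Magnitude in [0.25, 0.5] blocks, with a random direction on each axis
    dlat = random.choice((-1, 1)) * random.uniform(0.25, 0.5)
    dlng = random.choice((-1, 1)) * random.uniform(0.25, 0.5)
    # Scale the block offsets to degrees and apply them
    return lat + dlat * lat_scale, lng + dlng * lng_scale
```

Neither offset can ever be smaller than 1/4 block, so the band along each axis through the original point stays empty, which is what produces the four separate squares.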
I wanted something a little more evenly distributed in the area. My solution, picking a random point on a circle of a fixed radius (1/4 block) around the original location, satisfied my requirements. Here’s a simulation generating 3000 perturbed points from an original location of 2200 W Madison in Chicago, IL.
IMPLEMENTATION IN PYTHON
The Python code to implement the jitter of one point is very straightforward. Assuming the variables lat and lng contain the original latitude and longitude, we can just execute as follows. This uses polar coordinates, so for a fixed radius rrad, we generate a random angle rtheta to determine where on the circle we select from, then determine the amount of jitter in each direction with a little trigonometry.
Note that we scale the jitter in each direction by an amount lat_scale and lng_scale to scale to a Chicago city block as discussed here.
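import math
import random
# lat and lng are assumed to already hold the original latitude and longitude
# These constants scale our jitters down to Chicago city blocks; see previous post linked above
lat_scale = 0.002438
lng_scale = 0.001831
# Radius of the circle, in blocks
rrad = 0.25
# Random angle in radians, anywhere around the circle
rtheta = random.uniform(0, 2 * math.pi)
# Jitter in each direction, in blocks
rjlat = rrad * math.cos(rtheta)
rjlng = rrad * math.sin(rtheta)
# Scale the jitter to degrees and apply it to the original location
jit_lat = lat + rjlat * lat_scale
jit_lng = lng + rjlng * lng_scale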
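For jittering many locations, the same steps can be wrapped in a function. This is only a sketch: the name jitter_on_circle and its parameters are hypothetical, and the defaults simply reuse the scale constants and the 1/4 block radius from this post.

```python
import math
import random

def jitter_on_circle(lat, lng, radius_blocks=0.25,
                     lat_scale=0.002438, lng_scale=0.001831):
    """Move (lat, lng) to a random point on a circle of fixed radius.

    Hypothetical helper; radius_blocks is measured in city blocks and
    the scale constants convert blocks to degrees.
    """
    # Random angle in radians, anywhere around the circle
    theta = random.uniform(0, 2 * math.pi)
    # Offsets in blocks along each axis
    dlat = radius_blocks * math.cos(theta)
    dlng = radius_blocks * math.sin(theta)
    # Scale to degrees and apply to the original location
    return lat + dlat * lat_scale, lng + dlng * lng_scale

# Usage with placeholder coordinates (not a geocoded address)
jit_lat, jit_lng = jitter_on_circle(41.88, -87.68)
```

Each call draws a fresh angle, so every location gets its own independent perturbation, and every result is exactly radius_blocks away from its original when measured in blocks.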
COULD THE ACTUAL LOCATION BE DETERMINED FROM THE NEW LOCATION?
Given just the perturbed location presented in the analysis, would it be possible to determine the original location? I tried to think this through, and it seems hard to imagine. Certainly, in theory, there are an infinite number of original points which could generate the same perturbed point. In reality, though, we are not starting with an infinite number of points from which to generate the perturbed point. Rather, we have a very well defined, finite number of points, namely the list of addresses in an area. Again, this wasn’t an issue of national security, but this question still vexed me. I hope to address it in a future post. For now, though, I think a reasonable approach would be to not reveal the radius of the circle from which we are choosing random points, which makes reverse engineering harder. Additionally, we could randomly perturb the radius itself, so that no single distance links a perturbed point to its original.
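As a rough illustration of perturbing the radius as well, the sketch below draws the radius at random between a minimum and a maximum. The function name jitter_random_radius and the 1/4 to 1/2 block range are assumptions chosen for the example, not values from the project. Keeping a nonzero minimum preserves the requirement that a perturbed point never lands too close to the original.

```python
import math
import random

def jitter_random_radius(lat, lng, min_blocks=0.25, max_blocks=0.5,
                         lat_scale=0.002438, lng_scale=0.001831):
    """Move (lat, lng) by a random angle and a random radius.

    Hypothetical helper: the radius is drawn uniformly between
    min_blocks and max_blocks, so it never falls below min_blocks.
    """
    # Random radius in blocks and random angle in radians
    radius = random.uniform(min_blocks, max_blocks)
    theta = random.uniform(0, 2 * math.pi)
    # Offsets in blocks, then scaled to degrees
    dlat = radius * math.cos(theta)
    dlng = radius * math.sin(theta)
    return lat + dlat * lat_scale, lng + dlng * lng_scale
```

Note that drawing the radius uniformly concentrates points slightly toward the inner edge of the ring; whether that matters depends on how the map will be used.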