Determining the closest city that you’ve never heard of
I’ve spent the majority of my life in the Chicagoland area. So it came as somewhat of a surprise to me when I recently ran across the name of a city that was less than 30 miles away that I had never heard of.
It got me thinking if there were cities even closer to me that I never heard of. (Yes, I’m going to keep ending this sentence with a preposition throughout). How would one go about finding this answer definitively? Well, “definitively” in the sense of ranking the closest cities to a given location. I hold no hope I can determine whether someone has heard of a city before !
This is the exact sort of proximity analysis for which spatial databases are so useful. So how would we go about approaching this problem ? To start, let’s assume that we know the location from which we are trying to determine the closest cities. (At some point, I may put a little site together to allow entry of coordinates, but for now let’s just say we know the latitude and longitude of the location in which we are interested)
Step 1 – Determine our methodology
If we are going to determine “closest” cities, we first have to determine what we mean by closest. Two metrics for the distance between our point and a given city come to mind:
a. The distance between our location and the closest point in a city
b. The distance from our location to the center (centroid) of a city
For this analysis, let’s say we are going to use (b), the distance to the centroid of the city. So we will want to
1. determine all candidate cities nearby,
2. compute the centroids of all the candidate cities above,
3. compute the distance from our location to all of the centroids
4. rank the distances from closest to furthest centroid
5. determine the closest city which we’ve never heard of (alas, this must remain a manual step)
Step 2 – Determine our candidate cities
This represents sort of a rough first pass at the problem. We want to say “what are all the cities” around here, and then proceed to compute their distances. An easy first cut might be all cities in the same county I am in. But what if I’m near a county border ? Wouldn’t I want to investigate the adjoining counties too ? So maybe I can pick my county and all the counties touching my county. What if that’s not big enough to yield a city I’ve never heard of ? Maybe I grab all the cities in my state. But what if I live near a state border ? It makes sense the closest city may be in that neighboring state. You can see the sort of issues we face in the first cut here. To start, though, let’s do this for the state of our location.
Where do I get a listing (and some sort of GIS coordinates) for all the cities in my state?
The US government has shapefiles available that contain cities by state. I grabbed the Illinois file from here
https://catalog.data.gov/dataset/tiger-line-shapefile-2013-state-illinois-current-places
I want to do this analysis from PostgreSQL/PostGIS, so let’s import that shapefile into a PostgreSQL table. A helpful cheat sheet for such a process can be found here. We will use the shp2pgsql command to do this step.
> shp2pgsql “c:\folder\test.shp” public.tshp> test.sql
> psql -h localhost -d closest_city -U username -f test.sql
After successfully running this, we should have the data representing all the cities in Illinois in table tshp.
Step 3 – Compute centroids
In a simplified version of the problem, in Step 2 sometimes we might be able to retrieve the centroids of the candidate cities. If that’s not the case, we will have some polygon representation of a city and wish to compute the centroid of the polygon. We can normally use the features of a spatial database to compute this and this is true for PostGIS. Specifically, we can use the ST_Centroid function.
Step 4 – Compute the distance to the centroids
Again, this is where a spatial database really shines. To compute the distance between our location and the centroids, it is a fairly straightforward call to a function, in this case ST_Distance.
Step 5 – Rank the distances
We can just use the old trusted SQL keyword ORDER BY to achieve this.
Step 6 – Consolidate the above functions into one SQL statement
select name, to_char(ST_Distance(ST_SetSRID(ST_Centroid(geom),4326)::geography,ST_SetSRID(ST_MakePoint(-87.684,41.836),4326)::geography)/1609,'9999.00') as dist_mi from tshp order by dist_mi
Let’s break that calculated field down a little. It’s basically a formatted version of the statement ST_DISTANCE(A,B)
where A is the centroid of each city in our table (calculated from the geom column) and B is our location of interest from which we are calculating distance, in this case the centroid of the city of Chicago. In both cases of A and B we set the SRID to 4326 with the ST_SetSRID function so we are comparing equivalent projections. We also cast A and B from a geometry to a geography (via the ::geography operator) so our distance calculation will return meters and subsequently divide by 1609 to convert the distance to miles.
The final to_char conversion is just a simple way of rounding to 2 digits for our printing purposes.
Step 7 – Determine the closest city you’ve never heard of
I won’t list the distances to all 1,367 cities from Chicago , but here are the 10 closest Illinois cities to Chicago and the 10 farthest cities from Chicago still in Illinois. Note that these are ‘as the crow flies’ straight line distances (in miles), not driving distances. For the purposes of our exercise, the closest city I had never heard of was Fairmont,IL at 27.14 miles (this game will be hard to play twice, as now I’ve heard of it).
Closest:
Farthest: