Building the National Park road trip route

UPDATE: As mentioned in the previous post, after posting this I found that the folks at Isle Box had this idea before me and beat me to the punch. I’ll leave this up since I think the methodology is illustrative, though my final results don’t really add anything to theirs.

 

ORIGINAL POST:

I became interested in the idea of a road trip that would hit as many of the national parks in the United States as possible. I was inspired by the road trips created by Randal Olson (http://www.randalolson.com/2015/03/08/computing-the-optimal-road-trip-across-the-u-s/), and used some of the code from his GitHub site to calculate the trip itself.

In the previous post, I showed the road trip as computed. I’ll use this post to take you through the steps involved.

Get a list of the national parks

There are 59 officially designated national parks in the United States. I realized that I would need to eliminate any park that one could not actually drive to, but for now I figured I’d collect all of the parks and cull the herd later in the analysis. There is no shortage of sources for this data on the web. I used the list available from Wikipedia and scraped it using Python code.

Geocode the parks

In order to calculate distances and routes between parks, we need the latitude and longitude of each park. Usually when I do this sort of thing, I geocode programmatically through an API like Google Maps or Mapquest. In this case, however, the Wikipedia entries already contained the lat and long for each park, so I used those. I came to regret this decision: when I later tried to calculate routes using these values, the mapping services were not always able to find driving routes to the points indicated in the Wikipedia data. It appears the lats and longs for some parks in Wikipedia were within the respective parks, but not necessarily reachable by car, so calculating a route via car failed. Being a thousand feet off in the middle of Chicago might not affect the ability to drive to the nearby location, but in the middle of a park it becomes a problem. So I had to correct a number of these lats and longs later, usually to the location of the visitor center, which is likely reachable by car.
As my dad used to point out to me, “I guess anything worth doing is worth doing twice, huh?”
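Coordinates on Wikipedia are typically shown in degrees-and-minutes form, while mapping services want signed decimal degrees. Here is a minimal sketch of that conversion, assuming strings in the style of `44°21′N 68°13′W`. The function names are hypothetical and this is not the scraping code used for this post.

```python
import re

# Matches one coordinate such as "44°21′N" or "68°13′30″W".
# Minutes and seconds are optional; the hemisphere letter is required.
_DMS = re.compile(r"(\d+)°(?:(\d+)[′'])?(?:(\d+(?:\.\d+)?)[″\"])?([NSEW])")


def dms_to_decimal(text):
    """Convert one degrees/minutes/seconds string to signed decimal degrees."""
    match = _DMS.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes or 0) / 60 + float(seconds or 0) / 3600
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


def parse_lat_lon(text):
    """Split a 'LAT LON' pair on whitespace and convert both halves."""
    lat_text, lon_text = text.split()
    return dms_to_decimal(lat_text), dms_to_decimal(lon_text)


# Illustrative input in the assumed format.
lat, lon = parse_lat_lon("44°21′N 68°13′W")
```

A conversion like this only changes the notation. It does nothing about the underlying problem of points that are not reachable by road, which still needs a manual fix such as substituting the visitor center location.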

Culling the herd

I knew there were some parks I would naturally exclude from this little project. Parks in Hawaii make for a rough drive, as do other island-based parks like those in the US Virgin Islands and American Samoa. I also decided to exclude the 8 parks of Alaska, as their distance from the rest made for a discontinuity that I thought made them worthy of a trip by themselves.
I thought that would be it, but it turns out 3 more parks in the continental U.S. are not directly accessible via car, and I felt the time to get there via other methods made them infeasible as part of a road trip. These 3 parks were the Channel Islands, Isle Royale and the Dry Tortugas, all of which require a flight or boat ride to reach. This left us with 44 parks to include in the road trip.
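The culling step amounts to two filters: one on the state or territory, and one on a short list of named parks. A small sketch, assuming each park is a dict with hypothetical `name` and `state` keys (the sample list below is a handful of parks for illustration, not the full data set):

```python
# Regions dropped wholesale, and individual parks that need a boat or flight.
EXCLUDED_REGIONS = {"Alaska", "Hawaii", "American Samoa", "U.S. Virgin Islands"}
NOT_DRIVABLE = {"Channel Islands", "Isle Royale", "Dry Tortugas"}


def drivable_parks(parks):
    """Keep only parks outside the excluded regions and reachable by car."""
    return [
        park
        for park in parks
        if park["state"] not in EXCLUDED_REGIONS
        and park["name"] not in NOT_DRIVABLE
    ]


sample = [
    {"name": "Acadia", "state": "Maine"},
    {"name": "Denali", "state": "Alaska"},
    {"name": "Dry Tortugas", "state": "Florida"},
    {"name": "Everglades", "state": "Florida"},
    {"name": "Haleakalā", "state": "Hawaii"},
]
kept = [park["name"] for park in drivable_parks(sample)]
```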

Here are the 44 parks on a map, as shown in the very helpful tool at www.geojsonlint.com (note that this is a screenshot, not a navigable map).

Parks on map

Well, that’s good to know. Now how do we route an efficient trip through all those points?

Get the distances between all the points

In any project of this type, at some point you need to calculate the distances between the points being considered. This can be done with a number of mapping services on the web, the most popular being Google Maps and Mapquest. The distance between each point and every other point must be calculated (e.g. you need the distance from Acadia to each of the other 43 parks). You are essentially building a 44 x 44 distance matrix, but since it should be symmetric (the distance from Acadia to the Everglades is the same as that from the Everglades to Acadia), you only have to calculate half of those distances via calls to the mapping service.
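The symmetry shortcut can be sketched as follows. For 44 parks it means 44 × 43 / 2 = 946 lookups instead of 1,892. The `distance_fn` parameter is a hypothetical stand-in for a call to a mapping service; so that the example runs offline, it defaults to a great-circle (haversine) distance, which is not a driving distance.

```python
import math
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def build_distance_matrix(points, distance_fn=haversine_miles):
    """Fill an n x n matrix, calling distance_fn once per unordered pair."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    calls = 0
    for i, j in combinations(range(n), 2):
        d = distance_fn(points[i], points[j])
        matrix[i][j] = d
        matrix[j][i] = d  # mirror across the diagonal instead of a second call
        calls += 1
    return matrix, calls


# 44 placeholder points, just to count the lookups.
points = [(30 + 0.1 * k, -100 + 0.1 * k) for k in range(44)]
matrix, calls = build_distance_matrix(points)
```

With a real mapping service, each saved call is also a saved request against the API quota, which is where the halving pays off.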

Calculate the best route

Now armed with the list of parks and the associated distance matrix, we can calculate a near-optimal route using the code provided on Randal Olson’s site (http://www.randalolson.com/2015/03/08/computing-the-optimal-road-trip-across-the-u-s/). As I was doing this as a one-time output where run time wasn’t a major concern, I ran the overall algorithm a number of times and picked as the final route the one with the lowest mileage.
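The run-it-several-times-and-keep-the-best pattern can be illustrated with a short sketch. This is not the code from Randal Olson’s post; it swaps in a simpler heuristic (random starting orders improved by 2-opt segment reversals) purely to show the restart loop. All function names here are hypothetical, and the route is treated as a round trip.

```python
import math
import random


def tour_length(order, dist):
    """Total length of the closed loop that visits stops in the given order."""
    return sum(
        dist[order[i]][order[(i + 1) % len(order)]] for i in range(len(order))
    )


def two_opt(order, dist):
    """Reverse segments of the tour while doing so shortens it."""
    order = order[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if tour_length(candidate, dist) < tour_length(order, dist) - 1e-9:
                    order = candidate
                    improved = True
    return order


def best_of_restarts(dist, restarts=20, seed=0):
    """Run the heuristic from several random orders and keep the shortest."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        start = list(range(len(dist)))
        rng.shuffle(start)
        candidate = two_opt(start, dist)
        if best is None or tour_length(candidate, dist) < tour_length(best, dist):
            best = candidate
    return best, tour_length(best, dist)


# Toy example: four corners of a unit square.
corners = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(a, b) for b in corners] for a in corners]
route, length = best_of_restarts(dist, restarts=5)
```

Because each run starts from a different random order, individual runs can settle on different routes. Keeping the shortest of several runs is a cheap way to reduce the chance of ending up with a poor one.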

Map the result

I have set up a local app, developed in Django, that I use to map the parks and the route between them. The app relies on D3.js and Leaflet to produce the map shown in the previous post.
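I won’t reproduce the app here, but one way to hand a computed route to a Leaflet map is to serialize it as GeoJSON, which Leaflet can draw with `L.geoJSON`. The sketch below is an assumption about how that could look, not the app’s actual code. The function name and the `name`/`lat`/`lon` keys are hypothetical, and the coordinates are approximate and for illustration only.

```python
import json


def route_to_geojson(parks, order):
    """Build a GeoJSON FeatureCollection holding the route as one LineString."""
    # GeoJSON positions are [longitude, latitude], the reverse of the usual
    # "lat, long" order.
    coords = [[parks[i]["lon"], parks[i]["lat"]] for i in order]
    coords.append(coords[0])  # close the loop back to the starting park
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "LineString", "coordinates": coords},
                "properties": {"stops": [parks[i]["name"] for i in order]},
            }
        ],
    }


parks = [
    {"name": "Acadia", "lat": 44.35, "lon": -68.21},
    {"name": "Shenandoah", "lat": 38.53, "lon": -78.35},
    {"name": "Everglades", "lat": 25.32, "lon": -80.93},
]
geo = route_to_geojson(parks, [0, 1, 2])
geojson_text = json.dumps(geo, indent=2)
```

The resulting text can also be pasted into a validator such as geojsonlint.com to check it before wiring it into a map.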