Here’s the setup for this post:

  1. What specifically was I trying to do?
  2. How did I do it?
  3. What significance does this problem have in the world of natural resource economics?
  4. Does this problem have any additional significance?

Normally, I would put the why before the how, but this structure allows any R-philes with no interest in economics or public policy to bow out after the meaty stuff.

What was I trying to do?
I have a reasonably large (by no means huge) list of a couple thousand origin/destination pairs (zip codes) and I needed to get driving distances for each pair.

How did I do it?
I claim no credit for this, as I pretty much pulled the R code right off StackOverflow – see this thread – but it solves a problem that comes up in natural resource economics from time to time, so I’m sharing.

#read in the list of origin/destination zip codes
zips <- read.csv("ZipResFish.csv")

# remove any duplicates to cut down on the number of calls to the Google Maps API
zips <- zips[!duplicated(zips), ]

# now we are down to just the unique, nonzero combinations. Run these through the script
# below to get travel distances from Google Maps

# assign a distance of 0 to any case where ZipFish == ZipRes, and a placeholder of -1 everywhere else
zips$km <- ifelse(zips$ZipRes == zips$ZipFish, 0, -1)

# some zip codes do not return driving directions, so exclude them
bad.zips <- c(95364, 96701, 27526, 96010)
good.zips <- unique(c(zips$ZipRes, zips$ZipFish))
good.zips <- good.zips[!good.zips %in% bad.zips]

library(XML)
library(RCurl)

zip.dist <- function(origin, destination){
  xml.url <- paste0('http://maps.googleapis.com/maps/api/distancematrix/xml?origins=', origin,
                    '&destinations=', destination, '&mode=driving&sensor=false')
  xmlfile <- xmlParse(getURL(xml.url))
  # <distance><value> holds the driving distance in meters; <text> holds a rounded label like "1,096 km"
  dist <- xmlValue(xmlChildren(xpathApply(xmlfile, "//distance")[[1]])$value)
  distance <- as.numeric(dist)
  return(distance)
}

# test the function by getting the driving distance from West Linn, OR (where my parents live) to Santa Cruz, CA (my house)
zip.dist(origin = 97068, destination = 95065)/1000
[1] 1096.422

# now call the function for each pair in my list of zips
for(i in seq_len(nrow(zips))){
  origin <- zips$ZipRes[i]
  destination <- zips$ZipFish[i]

  if(origin != destination &&
     origin %in% good.zips &&
     destination %in% good.zips){
    d <- zip.dist(origin = origin, destination = destination)
  }else if(origin == destination){
    d <- 0
  }else{
    # a bad zip code: record a missing value rather than a misleading 0
    d <- NA
  }
  # distances are returned in meters, so convert to km
  zips$km[i] <- d/1000
}
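The loop above makes one HTTP request per origin/destination pair. The Distance Matrix API also accepts several origins or destinations in a single request, separated by a pipe (`|`, URL-encoded as `%7C`), so one possible refinement is to group the pairs by home zip code and send one request per origin. The sketch below only builds the request URLs; `build.dm.url` and the `YOUR_API_KEY` placeholder are hypothetical names of mine, and it assumes the current version of the API, which expects https and a `key` parameter. As far as I know, Google also caps the number of destinations per request and counts usage per origin/destination element, so batching cuts the number of calls but not necessarily the usage Google tallies; check the current limits before relying on it.

```r
# Hypothetical helper (not from the original post): build one Distance Matrix
# request URL for a single origin and several destinations.
# Assumes the current API, which expects https and an API key.
build.dm.url <- function(origin, destinations, key = "YOUR_API_KEY") {
  paste0("https://maps.googleapis.com/maps/api/distancematrix/xml",
         "?origins=", origin,
         # multiple destinations are separated by a URL-encoded pipe (%7C)
         "&destinations=", paste(destinations, collapse = "%7C"),
         "&mode=driving",
         "&key=", key)
}

# Example with made-up pairs: group destinations by home zip code
pairs <- data.frame(ZipRes  = c(97068, 97068, 95065),
                    ZipFish = c(95065, 95060, 95060))
by.origin <- split(pairs$ZipFish, pairs$ZipRes)

# one URL per origin zip code (here: 95065 and 97068)
urls <- mapply(build.dm.url, as.numeric(names(by.origin)), by.origin)
urls
```

When parsing a batched response, keep in mind that each `<element>` corresponds to one destination, in request order, and an element for which no route was found has no `<distance>` child.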

Note that there was a little trial and error involved here since some of the zip codes reported in my survey did not yield driving directions.
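Rather than finding the bad zip codes by trial and error and listing them by hand, one option is to let the loop record a missing value whenever a lookup fails. My understanding is that when Google finds no route, the response has no `<distance>` node, so the `[[1]]` in `zip.dist` throws a "subscript out of bounds" error; `tryCatch` can turn that error into `NA`. `safe.dist` below is a hypothetical wrapper, not part of the original routine, and its `f` argument exists only so it can be exercised with a stand-in function instead of a live API call.

```r
# Hypothetical wrapper: return NA instead of stopping when a lookup fails.
# f defaults to zip.dist from the post; the default is only evaluated
# if f is not supplied.
safe.dist <- function(origin, destination, f = zip.dist) {
  tryCatch(f(origin, destination),
           error = function(e) {
             message("No distance for ", origin, " -> ", destination, ": ",
                     conditionMessage(e))
             NA_real_
           })
}

# Stand-in for zip.dist, for illustration only: fails for one made-up zip code
fake.dist <- function(origin, destination) {
  if (destination == 99999) stop("no route found")
  1096422
}

safe.dist(97068, 95065, f = fake.dist)  # distance in meters from the stand-in
safe.dist(97068, 99999, f = fake.dist)  # NA, with a message
```

In the loop, `d <- safe.dist(origin, destination)` could then replace the `zip.dist` call, and the rows with `NA` can be inspected afterwards.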

It is also probably worth noting that I was a little sketched out about doing this, so I contacted Google just to make sure my little routine was consistent with their terms of service and, in general, not likely to piss them off. I talked to a very nice-sounding young man named Tony who assured me that as long as I could display the data on a Google Map somewhere in the public domain, I was golden.

How did this problem arise?
The origin and destination zip codes in my data are from a survey of recreational fishermen who visited sites in the Sacramento-San Joaquin Delta. A common flavor of analysis undertaken by resource economists in the public policy realm is to approximate the value of recreational fishing activity. This is potentially important if, for example, an agency is considering a restoration project that might result in improved fishing conditions. Changing the fishing conditions would likely change the number of trips taken by anglers. If one has an approximation of the monetary value that anglers attach to each trip, then one can calculate the expected economic benefits to anglers of engaging in the restoration project.
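The arithmetic in that last step is simple enough to sketch. Every number below is invented for illustration; in a real analysis the value per trip would come from an estimated demand model, and the change in trips from a forecast of how anglers respond to the project. Multiplying a fixed per-trip value by the change in trips is also a simplification; a fuller analysis would compute the change in consumer surplus from the demand model itself.

```r
# All figures are hypothetical, for illustration only.
value.per.trip <- 50     # assumed dollar value anglers attach to one trip
trips.before   <- 10000  # assumed annual trips without the restoration project
trips.after    <- 12000  # assumed annual trips with the restoration project

# expected annual benefit to anglers = change in trips x value per trip
annual.benefit <- (trips.after - trips.before) * value.per.trip
cat("Annual benefit to anglers: $",
    format(annual.benefit, big.mark = ",", scientific = FALSE), "\n")
```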

One of the ways economists attempt to value things like recreational fishing trips that are not priced in a formal market is to look at what anglers spent in order to take that trip. An important element of the cost of a recreational fishing trip is the travel cost incurred by the fisherman to access the fishing site.
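To turn the driving distances into a travel cost, a common approach is to price the round trip at a per-kilometer vehicle operating cost and, optionally, to add a value of travel time, often taken as a fraction of the wage rate (one-third is a frequently used convention in the recreation demand literature). The `travel.cost` function below is a hypothetical sketch of that idea; the default per-km cost, average speed, and wage share are placeholder assumptions, not values from my analysis.

```r
# Hypothetical travel cost function; all defaults are placeholder assumptions.
# km: one-way driving distance (e.g. zips$km from the loop above)
travel.cost <- function(km, cost.per.km = 0.35, speed.kmh = 80,
                        wage = 0, wage.share = 1/3) {
  round.trip.km <- 2 * km
  vehicle.cost  <- round.trip.km * cost.per.km                      # fuel, wear, etc.
  time.cost     <- (round.trip.km / speed.kmh) * wage * wage.share  # value of travel time
  vehicle.cost + time.cost
}

# Example: a 100 km trip at $0.50/km, 100 km/h, and a $30/hour wage
travel.cost(100, cost.per.km = 0.50, speed.kmh = 100, wage = 30)  # 120 (100 vehicle + 20 time)
```

Because the function is vectorized, `travel.cost(zips$km)` would price every pair at once, and any `NA` distances stay `NA`.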

Here are a few additional resources for readers interested in why public policy folks might care about the value anglers attach to their fishing experience:

Is there any greater significance to this solution?
At the end of the day, this solution probably seems trivial to any half-competent computer programmer. Hell, it took me less than 10 minutes to find some working code on StackOverflow. So why is it worth a blog post? First, as I’ve mentioned, it solves a real problem that real applied economists face frequently – getting batch travel distances for large lists of origin/destination pairs.

Second, and possibly more importantly, it saves money. The previous method of getting the driving distances necessary for many flavors of economic analysis (in my agency at least) was to use a piece of proprietary software called PC Miler. PC Miler is produced by a commercial entity and costs a few thousand dollars for an annual license. The Google Maps Distance Matrix API provides a free (or low cost if you need to get more than 2,500 pings/day) solution, satisfying my fiduciary responsibility to the taxpayers.
