Tuesday, January 24, 2012

Geocode your data using, R, JSON and Google Maps' Geocoding APIs



Over the last year and half, I have faced numerous challenges with geocoding the data that I have used to showcase my passion for location analytics.  In 2012, I decided to take thing in my control and turned to R.  Here, I am sharing a simple R script that I wrote to geo-code my data whenever I needed it, even BIG Data.


To geocode my data, I use Google's Geocoding service which returns the geocoded data in a JSON. I will recommend that you register with Google Maps API and get a key if you have large amount of data and would do repeated geo coding.

Here is function that can be called repeatedly by other functions:

getGeoCode <- function(gcStr)
{
  library("RJSONIO") #Load Library
  gcStr <- gsub(' ','%20',gcStr) #Encode URL Parameters
 #Open Connection
 connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=',gcStr, sep="") 
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
#Flatten the received JSON
  data.json <- unlist(data.json)
  lat <- data.json["results.geometry.location.lat"]
  lng <- data.json["results.geometry.location.lng"]
  gcodes <- c(lat, lng)
  names(gcodes) <- c("Lat", "Lng")
  return (gcodes)
}

Let's put this function to test:
geoCodes <- getGeoCode("Palo Alto,California")

> geoCodes
           Lat            Lng 
  "37.4418834" "-122.1430195" 


You can run this on the entire column of a data frame or a data table:

Here  is my sample data frame with three columns - Opposition, Ground.Country and Toss. Two of the columns, you guessed it right, need geocoding.

> head(shortDS,10)
     Opposition              Ground.Country Toss
1      Pakistan            Karachi,Pakistan  won
2      Pakistan         Faisalabad,Pakistan lost
3      Pakistan             Lahore,Pakistan  won
4      Pakistan            Sialkot,Pakistan lost
5   New Zealand    Christchurch,New Zealand lost
6   New Zealand          Napier,New Zealand  won
7   New Zealand        Auckland,New Zealand  won
8       England              Lord's,England  won
9       England          Manchester,England lost
10      England            The Oval,England  won

To geo code this, here is a simple one liner I execute:

shortDS <- with(shortDS, data.frame(Opposition, Ground.Country, Toss,
                  laply(Ground.Country, function(val){getGeoCode(val)})))



> head(shortDS, 10)
    Opposition           Ground.Country Toss  Ground.Lat  Ground.Lng
1     Pakistan         Karachi,Pakistan  won   24.893379   67.028061
2     Pakistan      Faisalabad,Pakistan lost   31.408951   73.083458
3     Pakistan          Lahore,Pakistan  won    31.54505   74.340683
4     Pakistan         Sialkot,Pakistan lost  32.4972222  74.5361111
5  New Zealand Christchurch,New Zealand lost -43.5320544 172.6362254
6  New Zealand       Napier,New Zealand  won -39.4928444 176.9120178
7  New Zealand     Auckland,New Zealand  won -36.8484597 174.7633315
8      England           Lord's,England  won     51.5294     -0.1727
9      England       Manchester,England lost   53.479251   -2.247926
10     England         The Oval,England  won   51.369037   -2.378269



Happy Coding!

4 comments:

  1. Thanks for this great tool. Any idea why it returns a bunch of (but not all) NAs?

    ReplyDelete
    Replies
    1. google only allows 10 requests per second.

      Delete
  2. Hi Dave, Where can I specify the API key?

    ReplyDelete
  3. Can anyone here please explain about geocode and the use of it?

    ReplyDelete