Thursday, March 22, 2012

Tracking SFO Airport's Performance Using R, HANA and D3

This is my first introduction to D3 and I am simply blown away.  Mike Bostock (@mbostock), you are genius and thanks for creating D3!  With HANA, R, D3, HTML5 and iPad, and you got yourself a KILLER combo!

I have been burning my midnight oil on piecing together my big data story using HANA, R, JSON and HTML5.  If you recall, I did a technical session on R and SAP HANA at DKOM, SAP's Development Kickoff Event last week where I showcased the supreme powers of R and HANA when analyzing 124 million records in real time.  R and SAP HANA: A Highly Potent Combo for Real Time Analytics on Big Data

Since last week, I have been looking for other creative ways to analyze and then visualize this airlines data. I am very fortunate to come across D3.  After spending couple of hours with D3, I decided to build the calendar view for the airlines data I have.  The calendar view is the first example Mike shows on his D3 page. Amazingly awesome!

I created this calendar view capturing the percent of delayed flight from SFO airports that departed daily between 2005-2008.  For this analysis, I used HANA to get the data out for SFO (out of 250 plus airports) over this 4 years period in seconds and then did all the aggregation in R including creating a JSON and .CSV file in seconds again.  Later, I moved to HTML5 and D3 to generate this beautiful calendar view showing SFO's performance.  Graphics is presented below:


As expected, December and January are two notorious months for flights delay.  Have fun with the live example hosted in the Amazon cloud.


Once again, my R code is very simple:

## Depature Delay for SF Airport
ba.hp.sfo <- ba.hp[Origin=="SFO",]

ba.hp.sfo.daily.flights <- ba.hp.sfo[,list(DailyFlights=length(DepDelay)), by=list(Year, Month, DayofMonth)][order(Year,Month,DayofMonth)]
ba.hp.sfo.daily.flights.delayed <- ba.hp.sfo[DepDelay>15,list(DelayedDailyFlights=length(DepDelay)), by=list(Year, Month, DayofMonth)][order(Year,Month,DayofMonth)]
setkey(ba.hp.sfo.daily.flights.delayed, Year, Month, DayofMonth)
response <- ba.hp.sfo.daily.flights.delayed[ba.hp.sfo.daily.flights]
response <- response[,list(Date=as.Date(paste(Year, Month, DayofMonth, sep="-"),"%Y-%m-%d"),
                           #DailyFlights,DelayedDailyFlights,
                           PercentDelayedFlights=round((DelayedDailyFlights/DailyFlights), digits=2))]
objs <- apply(response, 1, toJSON)
res <- paste('{"dailyFlightStats": [', paste(objs, collapse=', '), ']}')
writeLines(res, "dailyFlightStatsForSFO.json")                
write.csv(response, "dailyFlightStatsForSFO.csv", row.names=FALSE)

For D3 and HTML code, please take a look at this example from D3 website.

Happy Analyzing and Keep That Mid Night Oil Burning!

2 comments:

  1. This example was updated with a better tooltip to allow viewing from iPad - http://ata.blog.s3-website-us-east-1.amazonaws.com/d3SFO.html

    ReplyDelete
  2. seems like you need to update the D3 dependency. the examples on aws are broken

    ReplyDelete