Wednesday, April 11, 2012

Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps

Technologies: SAP HANA, R, HTML5, D3, Google Maps, jQuery and JSON
For this fun exercise, I analyzed more than 200 million data points using SAP HANA and R, and then brought the aggregated results into HTML5 using D3, JSON and the Google Maps APIs. The 2008 airline data comes from the Data Expo; I have been using the entire data set (123 million rows and 29 columns) for quite some time. See my other blog posts.

The results look beautiful:

Each airport icon is clickable and when clicked displays an info-window describing the key stats for the selected airport:
I then used D3 to display the aggregated result set in the modal window (light box):
D3 made it ridiculously simple to generate a table from a JSON file.
Unfortunately, I can't provide a live example because of the restrictions imposed by the Google Maps APIs, and I am approaching my free API limits.

Fun fact: The Atlanta airport was the largest airport in 2008 on many dimensions: total flights departed, total miles flown, and total destinations. It also experienced a lower average departure delay in 2008 than Chicago O'Hare. I always thought Chicago O'Hare was the largest US airport.

As always, I needed just six lines of R code, including two lines that write the data to JSON and CSV files:
################################################################################
library(data.table)

# Aggregate per origin airport, restricted to the airports listed in major.airports.
# The join requires airports.2008.hp to be keyed by Origin.
# Note: prettyNum() turns TotalMiles into a formatted character string.
airports.2008.hp.summary <- airports.2008.hp[major.airports,
    list(AvgDepDelay=round(mean(DepDelay, na.rm=TRUE), digits=2),
         TotalMiles=prettyNum(sum(Distance, na.rm=TRUE), big.mark=","),
         TotalFlights=length(Month),
         TotalDestinations=length(unique(Dest)),
         URL=paste("http://www.fly", Origin, ".com", sep="")),
    by=list(Origin)][order(-TotalFlights)]
setkey(airports.2008.hp.summary, Origin)

# Merge the two data tables to attach airport name, address and coordinates
# (requires major.airports to be keyed by its airport code column)
airports.2008.hp.summary <- major.airports[airports.2008.hp.summary,
    list(Airport=airport,
         AvgDepDelay, TotalMiles, TotalFlights, TotalDestinations,
         Address=paste(airport, city, state, sep=", "),
         Lat=lat, Lng=long, URL)][order(-TotalFlights)]

# getRowWiseJson() is not defined in this post; presumably a helper that
# serializes each row as one JSON object
airports.2008.hp.summary.json <- getRowWiseJson(airports.2008.hp.summary)
writeLines(airports.2008.hp.summary.json, "airports.2008.hp.summary.json")
write.csv(airports.2008.hp.summary, "airports.2008.hp.summary.csv", row.names=FALSE)
##############################################################################
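
The getRowWiseJson() call above is not defined in this post. As a rough sketch of what such a helper might do, assuming it emits one JSON object per row inside a JSON array, the base R code below builds the string by hand. The names to_json_value and get_row_wise_json are hypothetical stand-ins, not the original helper, and the demo values are made up purely for illustration.

```r
# Hypothetical stand-in for getRowWiseJson(); NOT the original helper from the post.

# Convert a single R value to its JSON text representation.
to_json_value <- function(x) {
  if (is.na(x)) return("null")                          # NA and NaN become null
  if (is.logical(x)) return(tolower(as.character(x)))   # TRUE -> true, FALSE -> false
  if (is.numeric(x)) return(if (is.finite(x)) as.character(x) else "null")
  # Everything else is treated as a string: escape backslashes, then double quotes
  s <- gsub("\\", "\\\\", as.character(x), fixed = TRUE)
  s <- gsub("\"", "\\\"", s, fixed = TRUE)
  paste0("\"", s, "\"")
}

# Serialize a data.frame (or data.table) as a JSON array with one object per row.
get_row_wise_json <- function(df) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  rows <- vapply(seq_len(nrow(df)), function(i) {
    fields <- vapply(names(df), function(col) {
      paste0("\"", col, "\":", to_json_value(df[[col]][i]))
    }, character(1))
    paste0("{", paste(fields, collapse = ","), "}")
  }, character(1))
  paste0("[", paste(rows, collapse = ","), "]")
}

# Toy input with made-up numbers (not results from the airline data)
demo <- data.frame(Airport = c("ATL", "ORD"),
                   TotalFlights = c(10L, 7L),
                   AvgDepDelay = c(10.5, NA),
                   stringsAsFactors = FALSE)
json <- get_row_wise_json(demo)
cat(json, "\n")
```

A row-wise layout like this is convenient on the D3 side because each array element maps directly to one table row or one map marker. For anything beyond a quick export, a dedicated JSON package is likely the safer choice, since this sketch only escapes backslashes and double quotes.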

Happy coding, and remember: the possibilities are endless!

22 comments:

  1. This post has a lot of importance to the people…I hope you can continue to inspire and post more of this…Thanks

  2. Thanks for your post. I've been learning R for the past few weeks and finding it wonderful, especially the data.table package (which you are using here).

    You are joining the major.airports and airports.2008.hp.summary tables, and then overwriting the result to airports.2008.hp.summary. I think if you keep the existing airports.2008.hp.summary table and just add the new columns of matching rows from major.airports using the ":=" operator, you'll see a speed improvement. I tried an example and found it to be roughly 50 times faster (less than 2 seconds for a table with 90M rows, vs about 85 seconds using the overwrite method). Here is my example code:

    # http://stackoverflow.com/questions/11308754/add-multiple-columns-to-r-data-table-in-one-function-call

    library(data.table)

    fDT1<-function(n) data.table(x=rep(rep(c("a","b","c"),each=3),n), y=rep(c(1L,3L,6L),n), v=rep(1L:9L,n), key="x")
    DT2<-data.table(x=letters, z1=sample(1L:26L), z2=sample(27L:52L),key="x")

    n<-1e7L
    DT1<-fDT1(n)
    res1<-system.time(DT1<-DT2[DT1])[3]

    DT1<-fDT1(n)
    res2<-system.time(DT1[DT2,c("z1","z2"):=list(z1,z2),nomatch=0])[3]

    list(method_1=res1,method_2=res2,improvement=paste0(round(res1/res2,1),"X"))
