Wednesday, April 25, 2012

Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part II


In my last blog post, Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps, I analyzed the historical airline performance data set using R and SAP HANA and put the aggregated analysis on Google Maps. A map is undoubtedly an exciting canvas for viewing and analyzing big data sets: one can draw shapes (circles, polygons) on the map under a marker pin to provide pinpoint information, and display aggregated information in the info window when a marker is clicked. I enjoyed doing all of that, but I was craving some old-fashioned bubble charts and other chart types to show comparative information on big data sets. Ultimately, all big data sets get aggregated into smaller analytical sets for viewing, sharing and reporting, and an old-fashioned chart is the best way to tell a visual story!

A bubble chart can display four-dimensional data for comparative analysis (x position, y position, bubble size and color). For this post, I used the same data set of 200M data points and went deeper, looking at finer slices of the information with D3, R and SAP HANA. Here is some of that work:

The first graphic compares the performance of the top airlines in 2008. As expected, Southwest, the largest airline (using total number of flights as a proxy for size), performed well for its size: 1.2M flights and 64 destinations, yet an average delay of only ~10 minutes. Among the other airlines, American and Continental, along with Skywest, were the worst performers. Note that I didn't remove outliers from this analysis. Click here to interact with this example (view the page source to get the D3 code).


The second analysis replaces the airline dimension with the airport dimension but keeps all the other dimensions the same. To my disbelief, Newark is the worst-performing airport when it comes to departure delays, followed by Chicago O'Hare, SFO and JFK. Atlanta is the largest airport, yet it has the best performance. What are they doing differently at ATL? Click here to interact with this example (view the page source to get the D3 code).
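Both charts pack four dimensions into each bubble. The D3 source behind them is not reproduced in this post, so the R sketch below simply assumes one plausible mapping: x = number of destinations, y = average departure delay, bubble area proportional to number of flights, and one color per airline or airport. The function `bubble_encoding()` and its `Label` input column are hypothetical names used only for illustration. The radius uses a square-root scale so that a bubble's area, rather than its radius, grows linearly with flight count; D3 offers a square-root scale for the same purpose.

```r
# Hypothetical helper (not from the original post): map a per-airline or
# per-airport summary onto the four channels of a bubble chart.
# Assumed mapping: x = TotalDestinations, y = AvgDepDelay,
# bubble area ~ TotalFlights, colour = one hue per Label.
bubble_encoding <- function(summary, max_radius = 40) {
  # Square-root scaling makes the bubble *area* proportional to the
  # flight count; the busiest group gets a radius of max_radius.
  radius <- max_radius * sqrt(summary$TotalFlights / max(summary$TotalFlights))
  data.frame(
    label  = summary$Label,
    x      = summary$TotalDestinations,
    y      = summary$AvgDepDelay,
    radius = radius,
    colour = grDevices::rainbow(nrow(summary)),
    stringsAsFactors = FALSE
  )
}

# Toy input for illustration only (made-up values, not from the 2008 data set)
toy <- data.frame(
  Label             = c("A", "B", "C"),
  TotalFlights      = c(400, 100, 25),
  TotalDestinations = c(60, 30, 10),
  AvgDepDelay       = c(10, 15, 20),
  stringsAsFactors  = FALSE
)
enc <- bubble_encoding(toy)
print(enc[, c("label", "x", "y", "radius")])
```

A quick static preview of the same encoding is possible in base R with `symbols(enc$x, enc$y, circles = enc$radius, inches = 0.5, bg = enc$colour)`.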


It was a hell of a lot of fun playing with D3, R and HANA; good intellectual stimulation if nothing else! Happy analyzing, and remember: the possibilities are endless!

As always, my R modules are fairly simple and straightforward:
###########################################################################################
# ETL - Read the airport information, extract the major-airport records and upload this
# transformed data set into HANA
###########################################################################################
library(data.table)

# Major airports reference table (columns used below: iata, airport, city, state, lat, long)
major.airports <- data.table(read.csv("MajorAirports.csv", header=TRUE, sep=",", stringsAsFactors=FALSE))
setkey(major.airports, iata)

# Full airport list (not used in the steps below)
all.airports <- data.table(read.csv("AllAirports.csv", header=TRUE, sep=",", stringsAsFactors=FALSE))
setkey(all.airports, iata)

# 2008 flight records (columns used below: Origin, Dest, DepDelay, Distance)
airports.2008.hp <- data.table(read.csv("2008.csv", header=TRUE, sep=",", stringsAsFactors=FALSE))
setkey(airports.2008.hp, Origin, UniqueCarrier)

# Merge two data sets: the inner join keeps only flights that depart from a major airport
# and, unlike major.airports[airports.2008.hp], keeps the join column named Origin
airports.2008.hp <- merge(airports.2008.hp, major.airports, by.x="Origin", by.y="iata")


###########################################################################################
# Get airport statistics for all major airports
###########################################################################################
airports.2008.hp.summary <- airports.2008.hp[,
    list(AvgDepDelay=round(mean(DepDelay, na.rm=TRUE), digits=2),
         TotalMiles=prettyNum(sum(Distance, na.rm=TRUE), big.mark=","),
         TotalFlights=.N,
         TotalDestinations=length(unique(Dest)),
         URL=paste("http://www.fly", Origin, ".com", sep="")),
    by=list(Origin)][order(-TotalFlights)]

# Merge the summary with the airport reference data (Origin == iata)
airports.2008.hp.summary <- major.airports[airports.2008.hp.summary,
                                           list(Airport=airport,
                                                AvgDepDelay, TotalMiles, TotalFlights, TotalDestinations,
                                                Address=paste(airport, city, state, sep=", "),
                                                Lat=lat, Lng=long, URL),
                                           on=c(iata="Origin")][order(-TotalFlights)]

# getRowWiseJson() is the author's own helper; it is not defined in this post
airports.2008.hp.summary.json <- getRowWiseJson(airports.2008.hp.summary)
writeLines(airports.2008.hp.summary.json, "airports.2008.hp.summary.json")
write.csv(airports.2008.hp.summary, "airports.2008.hp.summary.csv", row.names=FALSE)
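The module above calls `getRowWiseJson()`, which the post does not define. A minimal base-R stand-in, assuming "row-wise" means a JSON array with one object per row (the array-of-records shape that D3 charts typically iterate over), might look like the sketch below. `get_row_wise_json()` and `json_value()` are hypothetical names, and a package such as rjson or jsonlite is the more robust choice in practice.

```r
# Hypothetical stand-in for the post's getRowWiseJson() helper, whose real
# implementation is not shown: serialise a data frame (or data.table) as a
# JSON array containing one object per row.

# Encode a single cell as a JSON value.
json_value <- function(v) {
  if (is.factor(v)) v <- as.character(v)
  # NA, NaN and +/-Inf have no JSON representation, so emit null.
  if (is.na(v) || (is.numeric(v) && !is.finite(v))) return("null")
  if (is.logical(v)) return(if (v) "true" else "false")
  if (is.numeric(v)) return(as.character(v))
  # encodeString() adds the surrounding double quotes and escapes embedded
  # quotes, backslashes, newlines and tabs. Other control characters
  # (e.g. \a, \v) and locale-dependent non-ASCII output are not handled here.
  encodeString(as.character(v), quote = '"')
}

get_row_wise_json <- function(df) {
  rows <- vapply(seq_len(nrow(df)), function(i) {
    fields <- vapply(names(df), function(col) {
      paste0(encodeString(col, quote = '"'), ":", json_value(df[[col]][i]))
    }, character(1))
    paste0("{", paste(fields, collapse = ","), "}")
  }, character(1))
  paste0("[", paste(rows, collapse = ","), "]")
}

# Usage with a small made-up table (not real results):
toy <- data.frame(Airport = c("ATL", "EWR"), AvgDepDelay = c(10.5, NA),
                  TotalFlights = c(3L, 2L), stringsAsFactors = FALSE)
json <- get_row_wise_json(toy)
writeLines(json)
```

With this sketch, `writeLines(get_row_wise_json(airports.2008.hp.summary), "airports.2008.hp.summary.json")` would take the place of the `getRowWiseJson()` call.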

41 comments:

  1. What other R packages do you use aside from rjson?

    I think the rjson package is the secret weapon for visualizing with R and D3.

    http://cran.r-project.org/web/packages/rjson/index.html

    Thanks.

    Noli

  2. In one of your blog posts, you use D3 with Sencha Touch on the iPad to visualize SAP HANA data. Is SAP HANA live (i.e., online) when you run the queries in that case?

    You also have two D3 examples that use .csv and .json files produced from SAP HANA data-mining results in R. I guess those are also hosted on AWS, probably outside the SAP HANA server, right?

    The reason I am asking is that I am interested in hybrid apps using SAP HANA (e.g., data-mining results and/or live queries from SAP HANA). I have created a hybrid iOS app using CartoDB (cartodb.com), i.e., a GIS PostgreSQL/PostGIS database in the cloud.

    Here are screenshots of the CartoDB app:

    https://picasaweb.google.com/116847891529748214201/LeafletCartoDBProtectedPlanetIPhone

    Here are some screenshots of D3 and g.Raphael, which I intended for SAP HANA hybrid mobile apps using PhoneGap:

    https://picasaweb.google.com/116847891529748214201/D3AndGRaphaelForSAPHanaHybridMobile

    I am interested in seeing an example with D3, SAP HANA and PhoneGap. I think you could easily package your example (e.g., Sencha Touch with D3 JSON data). If I could get the exact URL of the D3 JSON data, I think I could compile your example as an iOS PhoneGap app.

    Hope to see some replies.

    Thanks.

    Noli

  3. Have you seen the relatively new fread function in the data.table package? It is easier to use than read.csv and loaded the 2008.csv data set about 6 times faster.

    > system.time(airports.2008.hp <- fread("./2008.csv"))
    user system elapsed
    17.44 0.29 17.81
    > system.time(airports.2008.hp <- data.table(read.csv("2008.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)))
    user system elapsed
    99.72 3.29 103.87
