Wednesday, May 2, 2012

Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part III


Mash-up Airlines Performance Data with Historical Weather Data to Pinpoint Weather Related Delays

For this exercise, I combined the work from four separate blogs that I did on Big Data, R and SAP HANA.  Historical airlines and weather data were used for the underlying analysis. The aggregated output of this analysis was written out as JSON, which was then visualized with HTML5, D3 and Google Maps.  The previous blogs in this series are:
  1. Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part II
  2. Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps
  3. Getting Historical Weather Data in R and SAP HANA 
  4. Tracking SFO Airport's Performance Using R, HANA and D3
In this blog, I wanted to mash up disparate data sources in R and HANA by combining airlines data with weather data to understand the reasons behind airport/airline delays.  Why weather? Because weather is one of the most commonly cited reasons for flight delays in the airline industry.  Fortunately, the airlines data breaks down each delay by cause (weather, security, late aircraft, etc.), so weather-related delays can be isolated and the actual weather data can then be mashed up against them to validate the airlines' claims.  However, I will not be doing that validation here; I will just be displaying the mashed-up data.
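To make the mash-up idea concrete, here is a minimal sketch of a keyed data.table join on made-up numbers. The table names (flights, weather), the Conditions column and all values are illustrative assumptions, not the real data sets; the point is only to show that every flight row survives the join and that days without weather data come back as NA.

```r
library(data.table)

# Toy daily flight summary (values are invented for illustration only)
flights <- data.table(
  Year = 2008L, Month = 1L, DayofMonth = c(1L, 1L, 2L),
  Airport = c("SFO", "OAK", "SFO"),
  TotalFlights = c(400L, 150L, 410L),
  DelayedFlights = c(120L, 15L, 60L)
)

# Toy daily weather; there is deliberately no row for OAK
weather <- data.table(
  Year = 2008L, Month = 1L, DayofMonth = c(1L, 2L),
  Airport = "SFO",
  Conditions = c("Rain", "Clear")
)

# Both tables are keyed on the same columns, in the same order
setkey(flights, Year, Month, DayofMonth, Airport)
setkey(weather, Year, Month, DayofMonth, Airport)

# X[Y]: look up every row of flights in weather.
# All flight rows are kept; unmatched ones get NA for the weather columns.
mashup <- weather[flights]
print(mashup)
```

Putting the weather table on the outside (weather[flights]) keeps one row per day and airport from the flight summary, which is what a calendar view needs.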

I have intentionally focused on the three Bay Area airports and have used four years of historical data (2005-2008) to visualize each airport's performance using an HTML5 calendar built from scratch with D3.js.  One can extend this example to all 20 years of data and to all the airports.  I had downloaded historical weather data for the same 2005-2008 period for SFO and SJC airports as shown in my previous blog (for some strange reason, there is no weather data for OAK).  Here is how the final result looks in HTML5:



Click here to interact with the live example.  Hover over any cell in the live example and a tooltip will show a breakdown of the delays for the selected day, including the weather data and the matching icons*, which are the result of the mash-up.  Choose a different airport from the drop-down to change the performance calendar.
* Weather icons are properties of Weather Underground.

As anticipated, SFO had more red on the calendar than SJC and OAK.  SJC is clearly the best-performing airport in the Bay Area.  Contrary to my expectation, weather didn't cause much havoc at SFO. Strange?

Creating a mash-up in R for these two data sets was super easy, and a CSV output was produced to work with HTML5/D3.  Here is the R code, and if it is not clear from all my previous blogs: I just love the data.table package.


###########################################################################################
# Percent delayed flights from three bay area airports, a breakdown of the flight delays by cause, mash-up with weather data
###########################################################################################

library(data.table)

# baa.hp (flights) and baa.weather (weather) are not defined in this post;
# they are assumed to be the data.tables built in the earlier posts of this series.

# Total and cancelled flights per day and origin airport
baa.hp.daily.flights <- baa.hp[, list(TotalFlights=length(DepDelay), CancelledFlights=sum(Cancelled, na.rm=TRUE)),
                               by=list(Year, Month, DayofMonth, Origin)]
setkey(baa.hp.daily.flights,Year, Month, DayofMonth, Origin)

baa.hp.daily.flights.delayed <- baa.hp[DepDelay>15,
                                     list(DelayedFlights=length(DepDelay), 
                                      WeatherDelayed=sum(WeatherDelay>0, na.rm=TRUE),  # NA weather delays are not counted
                                      AvgDelayMins=round(sum(DepDelay, na.rm=TRUE)/length(DepDelay), digits=2),
                                      CarrierCaused=round(sum(CarrierDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      WeatherCaused=round(sum(WeatherDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      NASCaused=round(sum(NASDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      SecurityCaused=round(sum(SecurityDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),
                                      LateAircraftCaused=round(sum(LateAircraftDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2)), by=list(Year, Month, DayofMonth, Origin)]
setkey(baa.hp.daily.flights.delayed, Year, Month, DayofMonth, Origin)

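# Merge the two data.tables, keeping a row for every day/airport in baa.hp.daily.flights
# Recent data.table versions return only the columns named in j, so the date keys are listed explicitly
baa.hp.daily.flights.summary <- baa.hp.daily.flights.delayed[baa.hp.daily.flights,list(Year, Month, DayofMonth, Airport=Origin,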
                           TotalFlights, CancelledFlights, DelayedFlights, WeatherDelayed, 
                           PercentDelayedFlights=round(DelayedFlights/(TotalFlights-CancelledFlights), digits=2),
                           AvgDelayMins, CarrierCaused, WeatherCaused, NASCaused, SecurityCaused, LateAircraftCaused)]
setkey(baa.hp.daily.flights.summary, Year, Month, DayofMonth, Airport)

# Merge with weather data (baa.weather is assumed to be keyed on the same four columns, in the same order)
baa.hp.daily.flights.summary.weather <- baa.weather[baa.hp.daily.flights.summary]
baa.hp.daily.flights.summary.weather$Date <- as.Date(paste(baa.hp.daily.flights.summary.weather$Year, 
                                                           baa.hp.daily.flights.summary.weather$Month, 
                                                           baa.hp.daily.flights.summary.weather$DayofMonth, 
                                                           sep="-"),"%Y-%m-%d")
# Remove a few columns that are no longer needed
baa.hp.daily.flights.summary.weather <- baa.hp.daily.flights.summary.weather[, 
            which(!(colnames(baa.hp.daily.flights.summary.weather) %in% c("Year", "Month", "DayofMonth", "Origin"))), with=FALSE]

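# Write the output in both JSON and CSV file formats
# getRowWiseJson() is not defined in this post; it is assumed to come from the earlier posts in this series
objs <- baa.hp.daily.flights.summary.weather[, getRowWiseJson(.SD), by=list(Airport)]
# You now have (AirportCode, JSONString) pairs; once again, they need to be stitched together.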
row.json <- apply(objs, 1, function(x) paste('{\"AirportCode\":"', x[1], '","Data\":', x[2], '}', sep=""))
json.st <- paste('[', paste(row.json, collapse=', '), ']')
writeLines(json.st, "baa-2005-2008.summary.json")                 
write.csv(baa.hp.daily.flights.summary.weather, "baa-2005-2008.summary.csv", row.names=FALSE)
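
The script above relies on getRowWiseJson(), which is not shown in this post. For anyone who wants to try the JSON export on its own, here is a rough stand-in in base R. The name rowsToJson and its behaviour are assumptions made for illustration, not the original helper: it turns each row of a table into a JSON object, wraps the rows in an array, writes missing values as null, and does no string escaping, so it only suits simple values like the ones in this data set.

```r
# Hypothetical stand-in for getRowWiseJson(); name and behaviour are assumed.
# Works on a data.frame or data.table and returns a single JSON array string.
rowsToJson <- function(df) {
  fields <- lapply(names(df), function(nm) {
    v <- df[[nm]]
    # Numbers are written bare; everything else (text, dates) is quoted
    val <- if (is.numeric(v)) as.character(v) else paste0('"', v, '"')
    # Missing values become JSON null
    val[is.na(v)] <- "null"
    paste0('"', nm, '":', val)
  })
  # Glue the fields of each row together, then wrap the rows in an array
  rows <- do.call(paste, c(fields, sep = ","))
  paste0("[", paste0("{", rows, "}", collapse = ","), "]")
}

# Toy input with made-up values
toy <- data.frame(Date = as.Date(c("2008-01-01", "2008-01-02")),
                  PercentDelayedFlights = c(0.3, NA),
                  Conditions = c("Rain", "Clear"),
                  stringsAsFactors = FALSE)
json <- rowsToJson(toy)
cat(json, "\n")
```

For anything beyond a quick experiment, a dedicated JSON package is the safer choice since it handles escaping and nested structures.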


Happy Coding!
