All Things R, by Jitender Aswani<br />
Treasure Trove of R Scripts for Auto Classification, Chart Generation, Solr, Mongo, MySQL and a Ton More (2015-10-07)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="box-sizing: border-box; color: #333333; font-family: 'Helvetica Neue', Helvetica, 'Segoe UI', Arial, freesans, sans-serif; font-size: 16px; line-height: 25.6px; margin-bottom: 16px;">
In this <a href="https://github.com/datadolphyn/R" target="_blank">repository hosted on GitHub</a>, the datadolph.in team is sharing the entire R codebase it developed to analyze large quantities of data.<br />
The datadolph.in team has benefited tremendously from fellow R bloggers and other open source communities and is proud to contribute its codebase back to the community.</div>
<div style="box-sizing: border-box; color: #333333; font-family: 'Helvetica Neue', Helvetica, 'Segoe UI', Arial, freesans, sans-serif; font-size: 16px; line-height: 25.6px; margin-bottom: 16px;">
The codebase includes ETL and integration scripts for:</div>
<ul style="box-sizing: border-box; color: #333333; font-family: 'Helvetica Neue', Helvetica, 'Segoe UI', Arial, freesans, sans-serif; font-size: 16px; line-height: 25.6px; margin-bottom: 0px !important; margin-top: 0px; padding: 0px 0px 0px 2em;">
<li style="box-sizing: border-box;">R-Solr Integration</li>
<li style="box-sizing: border-box;">R-Mongo Interaction</li>
<li style="box-sizing: border-box;">R-MySQL Interaction</li>
<li style="box-sizing: border-box;">Fetching, cleansing and transforming data</li>
<li style="box-sizing: border-box;">Classification (identify column types)</li>
<li style="box-sizing: border-box;">Default chart generation (based on simple heuristics and matching a dimension with a measure)</li>
</ul>
<div>
<span style="color: #333333; font-family: Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif;"><span style="line-height: 25.6px;"><br /></span></span></div>
<div>
<span style="color: #333333; font-family: Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif;"><span style="line-height: 25.6px;">Github Source: https://github.com/datadolphyn/R</span></span></div>
</div>
R and Solr Integration Using Solr's REST APIs (2013-11-19)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2Yqho4jOOuqa3IQDVJ82ezaYXCyPSw2vypuMZMvNyV8ZZMT9ULoISrwPXgapNJXfTkmJqFZx7cTCMWqO9YDqC8Vm2CC_qVQh1cPZdO_D8ESB7oeEl-SpNcZ-7OW_lLKTgOruT54v3FLw/s1600/R-Solr.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="62" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2Yqho4jOOuqa3IQDVJ82ezaYXCyPSw2vypuMZMvNyV8ZZMT9ULoISrwPXgapNJXfTkmJqFZx7cTCMWqO9YDqC8Vm2CC_qVQh1cPZdO_D8ESB7oeEl-SpNcZ-7OW_lLKTgOruT54v3FLw/s200/R-Solr.png" width="200" /></a></div>
<div style="text-align: left;">
Solr is a popular, fast and reliable open source enterprise search platform from the Apache Lucene project. Among its many features, we love its powerful full-text search, hit highlighting, faceted search, and near real-time indexing. Solr powers the search and navigation features of many of the world's largest internet sites. Written in Java, Solr uses the Lucene Java search library for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language, including R. </div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
We invested a significant amount of time integrating our R-based data-management platform with Solr through its HTTP/JSON-based REST interface. This integration allowed us to index millions of data-sets in Solr in real time as they are processed by R. It took us a few days to stabilize and optimize this approach, and we are very proud to share it and the source code with you. The full source code can be found and downloaded from <a href="https://github.com/datadolphyn/R/blob/master/r_solr_integration.R" target="_blank">datadolph.in's git repository</a>. </div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
The script has R functions for:</div>
<div style="text-align: left;">
<ul style="text-align: left;">
<li>querying Solr and returning matching docs</li>
<li>posting a document to Solr (taking a list and converting it to JSON before posting it)</li>
<li>deleting all indexes, deleting the indexes for a certain document type, and deleting the indexes for a certain category within a document type</li>
</ul>
</div>
<div style="text-align: left;">
<div>
<span style="color: #999999;"> # query a field for the text and return docs</span></div>
<div>
<span style="color: #999999;"> </span>querySolr <- function(queryText, queryfield="all") {</div>
<div>
response <- fromJSON(getURL(paste(getQueryURL(), queryfield, ":", queryText, sep="")))</div>
<div>
if(!response$responseHeader$status) #if 0</div>
<div>
return(response$response$docs)</div>
<div>
}</div>
<div>
<span style="color: #999999;"><br /></span></div>
<div>
<span style="color: #999999;"> # delete all indexes from solr server</span></div>
<div>
<span style="color: #999999;"> </span>deleteAllIndexes <-function() {</div>
<div>
response <- postForm(getUpdateURL(),</div>
<div>
.opts = list(postfields = '{"delete": {"query":"*:*"}}',</div>
<div>
httpheader = c('Content-Type' = 'application/json', </div>
<div>
                                       Accept = 'application/json'),</div>
<div>
ssl.verifypeer=FALSE</div>
<div>
)</div>
<div>
) #end of PostForm</div>
<div>
return(fromJSON(response)$responseHeader[1])</div>
<div>
}</div>
<div>
<br /></div>
</div>
<div style="text-align: left;">
<div>
<span style="color: #999999;"> # delete all indexes for a document type from solr server </span></div>
<div>
<span style="color: #999999;"> # in this example : type = sports</span></div>
<div>
deleteSportsIndexes <-function() {</div>
<div>
response <- postForm(getUpdateURL(),</div>
<div>
.opts = list(postfields = '{"delete": {"query":"type:sports"}}',</div>
<div>
httpheader = c('Content-Type' = 'application/json', </div>
<div>
Accept = 'application/json'),</div>
<div>
ssl.verifypeer=FALSE</div>
<div>
)</div>
<div>
) #end of PostForm</div>
<div>
return(fromJSON(response)$responseHeader[1])</div>
<div>
}</div>
<div>
<span style="color: #999999;"><br /></span></div>
<div>
<span style="color: #999999;">  # delete indexes for the basketball category in the sports type from the Solr server </span></div>
<div>
<span style="color: #999999;"> # in this example : type = sports and category: basketball</span></div>
<div>
deleteSportsIndexesForCat <-function(category) {</div>
<div>
response <- postForm(getUpdateURL(),</div>
<div>
.opts = list(postfields = </div>
<div>
paste('{"delete": {"query":"type:sports AND category:', category, '"}}', sep=""),</div>
<div>
httpheader = c('Content-Type' = 'application/json', </div>
<div>
Accept = 'application/json'),</div>
<div>
ssl.verifypeer=FALSE</div>
<div>
)</div>
<div>
) #end of PostForm</div>
<div>
return(fromJSON(response)$responseHeader[1])</div>
<div>
}</div>
<div>
#deleteSportsIndexesForCat("basketball")</div>
<div>
<br /></div>
<div>
<span style="color: #999999;"> #Post a new document to Solr</span></div>
<div>
<span style="color: #999999;"> </span>postDoc <- function(doc) { </div>
<div>
solr_update_url <- getUpdateURL()</div>
<div>
jsonst <- toJSON(list(doc))</div>
<div>
response <- postForm(solr_update_url,</div>
<div>
.opts = list(postfields = jsonst,</div>
<div>
httpheader = c('Content-Type' = 'application/json', </div>
<div>
Accept = 'application/json'),</div>
<div>
ssl.verifypeer=FALSE</div>
<div>
)) #end of PostForm</div>
<div>
return(fromJSON(response)$responseHeader[1])</div>
<div>
########## Commit - only if it doesn't work the other way ###############</div>
<div>
#return(fromJSON(getURL(getCommitURL())))</div>
<div>
}</div>
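<div>
The functions above call getQueryURL() and getUpdateURL(), which are not shown in this post. A minimal sketch of what such helpers could look like follows. The host, port and core name are assumptions for a local Solr install, the helper bodies are hypothetical rather than the ones in the repository, and buildQueryURL() is added only to show the resulting URL without contacting a server.</div>

```r
# Hypothetical helpers: the base URL and core name below are assumptions.
solr.base.url <- "http://localhost:8983/solr/collection1"

# URL prefix that querySolr() appends "<field>:<text>" to
getQueryURL <- function() {
  paste(solr.base.url, "/select?wt=json&q=", sep="")
}

# Update endpoint; commit=true asks Solr to commit after the request
getUpdateURL <- function() {
  paste(solr.base.url, "/update?commit=true", sep="")
}

# Build the full query URL (illustration only, no request is sent).
# URLencode() escapes spaces and reserved characters in the query text.
buildQueryURL <- function(queryText, queryfield="all") {
  paste(getQueryURL(), queryfield, ":", URLencode(queryText, reserved=TRUE), sep="")
}

print(buildQueryURL("san francisco", "title"))
```

<div>
Encoding the query text this way is a design choice that querySolr() above does not make; without it, a query containing spaces may produce an invalid URL.</div>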
<div>
<span style="color: #999999;"><br /></span></div>
<div>
Happy Coding!</div>
</div>
</div>
Simulating Map-Reduce in R for Big Data Analysis Using Flights Data (2013-06-14)<div dir="ltr" style="text-align: left;" trbidi="on">
We are constantly crunching through large amounts of data and designing unique and innovative ways to process large datasets on a single node and use distributed computing only when single node computing becomes time consuming and less efficient. <br />
<br />
We are happy to share with the R community one such map-reduce-like approach, which we designed in R for a single node to process the flights data (<a href="http://stat-computing.org/dataexpo/2009/" target="_blank">available here</a>). The data has ~122 million records and occupies 12GB of space when uncompressed. We used Matt Dowle's <a href="http://datatable.r-forge.r-project.org/" target="_blank">data.table package</a> heavily to load and analyze large datasets. <br />
<br />
It took us a few days to stabilize and optimize this approach, and we are very proud to share it and the source code with you. The full source code can be found and downloaded from <a href="https://github.com/datadolphyn/R" target="_blank">datadolph.in's git repository.</a><br />
<br />
Here is how we approached this problem: First, before loading the datasets in R, we compressed each of the 22 CSV files using gzip for faster reading in R. The read.csv function reads gzipped files directly, and we found this faster than reading the uncompressed files:<br />
<br />
<span style="color: #999999; font-family: Verdana, sans-serif; font-size: x-small;"># load list of all files</span><br />
<span style="color: #999999; font-family: Verdana, sans-serif; font-size: x-small;"> flights.files <- list.files(path=flights.folder.path, pattern="\\.csv\\.gz$")</span><br />
<span style="color: #999999; font-family: Verdana, sans-serif; font-size: x-small;"><br /></span>
<span style="color: #999999; font-family: Verdana, sans-serif; font-size: x-small;"># read files in data.table</span><br />
<span style="color: #999999; font-family: Verdana, sans-serif; font-size: x-small;"> flights <- data.table(read.csv(flights.files[i], stringsAsFactors=F))</span><br />
<div>
<br /></div>
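The gzip step can be tried on a small scale with base R alone. The sketch below uses made-up toy rows and a temporary folder instead of the flights data, and plain read.csv without data.table, so the file name and values are assumptions for illustration only.<br />

```r
# Toy rows standing in for one yearly flights file (values are made up)
toy <- data.frame(year=c(1987L, 1987L),
                  uniquecarrier=c("AA", "UA"),
                  depdelay=c(5L, -2L))
gz.path <- file.path(tempdir(), "1987.csv.gz")

# gzfile() compresses on the fly while write.csv() writes to the connection
con <- gzfile(gz.path, "w")
write.csv(toy, con, row.names=FALSE)
close(con)

# list.files() expects a regular expression, so escape the dots and anchor the end
flights.files <- list.files(path=tempdir(), pattern="\\.csv\\.gz$", full.names=TRUE)

# read.csv() detects the gzip compression and decompresses transparently
flights <- read.csv(gz.path, stringsAsFactors=FALSE)
print(flights)
```

Passing full.names=TRUE to list.files returns complete paths, which avoids having to paste the folder path back on before reading each file.<br />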
Next, we mapped the analysis we wanted to run to extract insights from each of the datasets. This step computed flight-level, airline-level and airport-level aggregates and wrote them out as intermediate results. Here is example code to get stats for each airline by year:<br />
<br />
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;">getFlightsStatusByAirlines <- function(flights, yr){ </span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> # by Year</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> if(verbose) cat("Getting stats for airlines:", '\n')</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> airlines.stats <- flights[, list(</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> dep_airports=length(unique(origin)),</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> flights=length(origin),</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> flights_cancelled=sum(cancelled, na.rm=T),</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> flights_diverted=sum(diverted, na.rm=T),</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> flights_departed_late=length(which(depdelay > 0)),</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> flights_arrived_late=length(which(arrdelay > 0)),</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> total_dep_delay_in_mins=sum(depdelay[which(depdelay > 0)]),</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> avg_dep_delay_in_mins=round(mean(depdelay[which(depdelay > 0)])),</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> median_dep_delay_in_mins=round(median(depdelay[which(depdelay > 0)])), </span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> miles_traveled=sum(distance, na.rm=T)</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> ), by=uniquecarrier][, year:=yr]</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> #change col order</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> setcolorder(airlines.stats, c("year", colnames(airlines.stats)[-ncol(airlines.stats)]))</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> #save this data</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> saveData(airlines.stats, paste(flights.folder.path, "stats/5/airlines_stats_", yr, ".csv", sep=""))</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> #clear up space</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> rm(airlines.stats) </span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> # continue.. see git full code</span></div>
<div style="text-align: left;">
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;">}</span></div>
<br />
Here is a copy of the map function:<br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"><br /></span>
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;">#map all calculations </span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;">mapFlightStats <- function(){</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> for(j in 1:period) {</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> yr <- as.integer(gsub("[^0-9]", "", gsub("(.*)(\\.csv)", "\\1", flights.files[j])))</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> flights.data.file <- paste(flights.folder.path, flights.files[j], sep="")</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> if(verbose) cat(yr, ": Reading : ", flights.data.file, "\n")</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> flights <- data.table(read.csv(flights.data.file, stringsAsFactors=F))</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> setkeyv(flights, c("year", "uniquecarrier", "dest", "origin", "month")) </span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> # call functions</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> getFlightStatsForYear(flights, yr)</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> getFlightsStatusByAirlines(flights, yr)</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> getFlightsStatsByAirport(flights, yr)</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> }</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;">} </span><br />
<br />
As one can see, we are generating intermediate results by airlines (and by airports / flights) for each year and storing them on disk. <b>The map function took less than 2 hours to run on a MacBook Pro with a 2.3 GHz dual-core processor and 8 GB of memory, and generated 132 intermediate datasets containing aggregated analysis. </b><br />
<br />
And finally, we call the reduce function to aggregate intermediate datasets into final output (for flights, airlines and airports):<br />
<br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;">#reduce all results</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;">reduceFlightStats <- function(){</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> n <- 1:6</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> folder.path <- paste("./raw-data/flights/stats/", n, "/", sep="")</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> print(folder.path)</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> for(i in n){</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;">    filenames <- paste(folder.path[i], list.files(path=folder.path[i], pattern="\\.csv$"), sep="") </span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> dt <- do.call("rbind", lapply(filenames, read.csv, stringsAsFactors=F))</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> print(nrow(dt))</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> saveData(dt, paste("./raw-data/flights/stats/", i, ".csv", sep=""))</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;"> }</span><br />
<span style="color: #999999; font-family: Trebuchet MS, sans-serif; font-size: x-small;">}</span><br />
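To make the map and reduce steps easy to try end to end, here is a small self-contained sketch of the same pattern. It is a simplification under stated assumptions: the two yearly tables are made-up toy data, the output folder is a temporary directory, mapYear is a hypothetical name, and base R's aggregate stands in for the data.table syntax used above.<br />

```r
# Toy stand-in for the yearly flight files (values are made up for illustration)
flights.by.year <- list(
  "1987" = data.frame(uniquecarrier=c("AA", "AA", "UA"), depdelay=c(10, -3, 25)),
  "1988" = data.frame(uniquecarrier=c("AA", "UA", "UA"), depdelay=c(0, 7, 12))
)
out.dir <- file.path(tempdir(), "airline_stats")
dir.create(out.dir, showWarnings=FALSE)

# Map: aggregate one year in isolation and persist the intermediate result
mapYear <- function(flights, yr) {
  stats <- aggregate(depdelay ~ uniquecarrier, data=flights,
                     FUN=function(d) sum(d > 0))
  names(stats)[2] <- "flights_departed_late"
  stats <- cbind(year=as.integer(yr), stats)
  write.csv(stats, file.path(out.dir, paste("airlines_stats_", yr, ".csv", sep="")),
            row.names=FALSE)
}
for (yr in names(flights.by.year)) mapYear(flights.by.year[[yr]], yr)

# Reduce: stack every intermediate file into one final table
filenames <- list.files(out.dir, pattern="\\.csv$", full.names=TRUE)
final <- do.call("rbind", lapply(filenames, read.csv, stringsAsFactors=FALSE))
print(final)
```

Because each year is aggregated and written out on its own, only one year of raw data needs to be in memory at a time, and the reduce step only ever touches the much smaller intermediate files.<br />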
<br /></div>
Pull Yahoo Finance Key-Statistics Instantaneously Using XML and XPath in R (2012-10-29)<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<a href="http://goo.gl/CBYQI">The two-part blog post I published a day ago</a> required <a href="http://finance.yahoo.com/q/ks?s=msft">key-stats from Yahoo Finance</a> for all the companies in the control group I created for my research. I wanted all the key-stats pulled, arranged in a data-frame and presented side by side so that I could form my opinions. <br />
<br />
The quantmod package has a "getQuote" function, which should return the desired metrics. The number and names of the metrics can be controlled via "yahooQF" (see the script below). Unfortunately, this function seems to be broken, as the resulting data-frame had a large number of null values for some metrics. Here is the script nonetheless, if one wishes to experiment:<br />
<br />
<div style="text-align: left;">
<i><span style="color: #c27ba0;">#######################################################################</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"># Script to download key metrics for a set of stock tickers using the quantmod package</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">#######################################################################</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">require(quantmod)</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">require("plyr")</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">what_metrics <- yahooQF(c("Price/Sales", </span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"> "P/E Ratio",</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"> "Price/EPS Estimate Next Year",</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"> "PEG Ratio",</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"> "Dividend Yield", </span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"> "Market Capitalization"))</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"><br /></span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">tickers <- c("AAPL", "FB", "GOOG", "HPQ", "IBM", "MSFT", "ORCL", "SAP")</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"># Not all the metrics are returned by Yahoo.</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">metrics <- getQuote(paste(tickers, sep="", collapse=";"), what=what_metrics)</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"><br /></span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">#Add tickers as the first column and remove the first column which had date stamps</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">metrics <- data.frame(Symbol=tickers, metrics[,2:length(metrics)]) </span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"><br /></span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">#Change colnames</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">colnames(metrics) <- c("Symbol", "Revenue Multiple", "Earnings Multiple", </span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"> "Earnings Multiple (Forward)", "Price-to-Earnings-Growth", "Div Yield", "Market Cap")</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"><br /></span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">#Persist this to the csv file</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">write.csv(metrics, "FinancialMetrics.csv", row.names=FALSE)</span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;"><br /></span></i></div>
<div style="text-align: left;">
<i><span style="color: #c27ba0;">#######################################################################</span></i></div>
<br />
After some digging around and staring at the raw HTML for Yahoo's KeyStats page for some time, I decided to use the XML package and XPath expressions to get all the nodes which host key stats (names and values). This turned out to be a lot simpler. Let's walk through this in three easy steps:<br />
<br />
1) The CSS class name for the HTML nodes which host the name of the metric such as Market Cap or Enterprise value is "yfnc_tablehead1". This made it quite easy to grab all the elements from the HTML tree with this class name:<br />
<div style="text-align: center;">
<i><span style="color: #c27ba0;"> nodes <- getNodeSet(html_text, "/*//td[@class='yfnc_tablehead1']")</span></i></div>
<br />
2) Now all I needed to do was read the text of each node with the xmlValue function to get the name of the metric (Enterprise Value, as an example):<br />
<div style="text-align: center;">
<i><span style="color: #c27ba0;"> measures <- sapply(nodes, xmlValue)</span></i></div>
<br />
3) Next, to get the value of any metric, I used the getSibling function to get the adjacent node (i.e. the sibling) and the xmlValue function to read its value. Here is how it was done:<br />
<div style="text-align: center;">
<span style="color: #c27ba0;">values <- sapply(nodes, function(x) xmlValue(getSibling(x)))</span></div>
<div style="text-align: center;">
<br /></div>
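The three steps can be tried offline on a small hand-written fragment. The HTML below is made up to mimic the layout described above (a name cell with class "yfnc_tablehead1" followed directly by a value cell); the numbers are placeholders, not real Yahoo data, and the XML package is assumed to be installed.<br />

```r
library(XML)

# Made-up fragment that mimics the page layout: name cell, then value cell
html_src <- paste(
  "<html><body><table>",
  "<tr><td class='yfnc_tablehead1'>Market Cap:</td><td>100B</td></tr>",
  "<tr><td class='yfnc_tablehead1'>Enterprise Value:</td><td>90B</td></tr>",
  "</table></body></html>", sep="")

# asText=TRUE tells htmlParse the argument is HTML content, not a file or URL
doc <- htmlParse(html_src, asText=TRUE)

# Step 1: all <td> nodes carrying the metric names
nodes <- getNodeSet(doc, "//td[@class='yfnc_tablehead1']")

# Step 2: the text of each node is the metric name (trailing colon removed)
measures <- sub(":$", "", sapply(nodes, xmlValue))

# Step 3: the next sibling of each name cell holds the value
values <- sapply(nodes, function(x) xmlValue(getSibling(x)))

stats <- setNames(values, measures)
print(stats)
```

The fragment is written without whitespace between the cells so that the next sibling of each name cell is the value cell itself rather than a text node.<br />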
That is it. I then used some other common functions to clean up the column names and constructed a data-frame to arrange the key-stats in a columnar fashion. Here is the final script; the result is shown in the graphic below. Please feel free to use this and share it with other R enthusiasts:<br />
<br />
<i><span style="color: #c27ba0;">#######################################################################</span></i><br />
<i><span style="color: #c27ba0;">##Alternate method to download all key stats using XML and x_path - PREFERRED WAY</span></i><br />
<i><span style="color: #c27ba0;">#######################################################################</span></i><br />
<i><span style="color: #c27ba0;"><br /></span></i>
<i><span style="color: #c27ba0;">setwd("C:/Users/i827456/Pictures/Blog/Oct-25")</span></i><br />
<i><span style="color: #c27ba0;">require(XML)</span></i><br />
<i><span style="color: #c27ba0;">require(plyr)</span></i><br />
<i><span style="color: #c27ba0;">getKeyStats_xpath <- function(symbol) {</span></i><br />
<i><span style="color: #c27ba0;"> yahoo.URL <- "http://finance.yahoo.com/q/ks?s="</span></i><br />
<i><span style="color: #c27ba0;"> html_text <- htmlParse(paste(yahoo.URL, symbol, sep = ""), encoding="UTF-8")</span></i><br />
<i><span style="color: #c27ba0;"><br /></span></i>
<i><span style="color: #c27ba0;"> #search for <td> nodes anywhere that have class 'yfnc_tablehead1'</span></i><br />
<i><span style="color: #c27ba0;"> nodes <- getNodeSet(html_text, "/*//td[@class='yfnc_tablehead1']")</span></i><br />
<i><span style="color: #c27ba0;"> </span></i><br />
<i><span style="color: #c27ba0;"> if(length(nodes) > 0 ) {</span></i><br />
<i><span style="color: #c27ba0;"> measures <- sapply(nodes, xmlValue)</span></i><br />
<i><span style="color: #c27ba0;"> </span></i><br />
<i><span style="color: #c27ba0;"> #Clean up the column name</span></i><br />
<i><span style="color: #c27ba0;"> measures <- gsub(" *[0-9]*:", "", gsub(" \\(.*?\\)[0-9]*:","", measures)) </span></i><br />
<i><span style="color: #c27ba0;"> </span></i><br />
<i><span style="color: #c27ba0;"> #Remove dups</span></i><br />
<i><span style="color: #c27ba0;"> dups <- which(duplicated(measures))</span></i><br />
<i><span style="color: #c27ba0;"> #print(dups) </span></i><br />
<i><span style="color: #c27ba0;">    for(i in seq_along(dups)) </span></i><br />
<i><span style="color: #c27ba0;"> measures[dups[i]] = paste(measures[dups[i]], i, sep=" ")</span></i><br />
<i><span style="color: #c27ba0;"> </span></i><br />
<i><span style="color: #c27ba0;"> #use siblings function to get value</span></i><br />
<i><span style="color: #c27ba0;"> values <- sapply(nodes, function(x) xmlValue(getSibling(x)))</span></i><br />
<i><span style="color: #c27ba0;"> </span></i><br />
<i><span style="color: #c27ba0;"> df <- data.frame(t(values))</span></i><br />
<i><span style="color: #c27ba0;"> colnames(df) <- measures</span></i><br />
<i><span style="color: #c27ba0;"> return(df)</span></i><br />
<i><span style="color: #c27ba0;"> } else {</span></i><br />
<i><span style="color: #c27ba0;">    return(NULL) # no key stats found for this symbol</span></i><br />
<i><span style="color: #c27ba0;"> }</span></i><br />
<i><span style="color: #c27ba0;">}</span></i><br />
<i><span style="color: #c27ba0;"><br /></span></i>
<i><span style="color: #c27ba0;">tickers <- c("AAPL")</span></i><br />
<i><span style="color: #c27ba0;">stats <- ldply(tickers, getKeyStats_xpath)</span></i><br />
<i><span style="color: #c27ba0;">rownames(stats) <- tickers</span></i><br />
<i><span style="color: #c27ba0;">write.csv(t(stats), "FinancialStats_updated.csv",row.names=TRUE) </span></i><br />
<br />
<i><span style="color: #c27ba0;">#######################################################################</span></i><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIgwwgmWa79V8hP1u2_-nukdZp6XPsc16wMGiRQUs1djRmqgc9xf1OXUeUx07Sq4ciTO8EIPDbwEn_qq5S1nM8RNDbujBR-oNIoDPoiVopqwlv2DxrVaM6PVRRlid1wGRo1Crvh-curHQ/s1600/Key-Stats-From-Yahoo-Finance.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIgwwgmWa79V8hP1u2_-nukdZp6XPsc16wMGiRQUs1djRmqgc9xf1OXUeUx07Sq4ciTO8EIPDbwEn_qq5S1nM8RNDbujBR-oNIoDPoiVopqwlv2DxrVaM6PVRRlid1wGRo1Crvh-curHQ/s1600/Key-Stats-From-Yahoo-Finance.png" /></a></div>
<div>
<i><span style="color: #c27ba0;"><br /></span></i></div>
<div>
<i><span style="color: #c27ba0;"><br /></span></i></div>
<br />
<br />
Happy Analyzing!<br />
All Things R &<br />
All Things Analytics (http://goo.gl/CBYQI)<br />
<br />
<br /></div>
If You Are an R Developer, Then You Must Try SAP HANA for Free (2012-05-23)<br />
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-size: small;">
<span style="font-family: Calibri, sans-serif;"><u><br class="Apple-interchange-newline" />This is a guest blog from </u></span><span style="font-family: Calibri, sans-serif;"><u>Alvaro Tejada Galindo</u>, my colleague and fellow R and SAP HANA enthusiast. I am thankful to Alvaro for coming and posting on "AllThingsR".</span></div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-size: small;">
<span style="font-family: Calibri, sans-serif;"><br /></span></div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-size: small;">
<span style="font-family: Calibri, sans-serif;">Are you an R developer? Have you ever heard of SAP HANA? Would you like to test</span><span style="font-family: Calibri, sans-serif;"> </span><span style="font-family: Calibri, sans-serif;">SAP HANA for free?</span></div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
<br />
SAP HANA is an in-memory database technology that allows developers to analyze big data in real time.</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
<br />
Processes that took hours now take seconds, thanks to SAP HANA's ability to keep everything in RAM.</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
<br />
As announced at the SAP Sapphire Now event in Orlando, Florida, SAP HANA is free for developers. You just need to download and install both the SAP HANA Client and the SAP HANA Studio, and create an SAP HANA Server on Amazon Web Services, as described in the following document:</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
Get your own SAP HANA DB server on Amazon Web Services - <a href="http://scn.sap.com/docs/DOC-28294" style="color: #1155cc;" target="_blank">http://scn.sap.com/docs/DOC-<wbr></wbr>28294</a></div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
<br />
Why should this interest you? Easy: SAP HANA is an agent of change that pushes speed to its limits, and it can also be integrated with R, as described in the following blog:</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
When SAP HANA met R - First kiss - <a href="http://scn.sap.com/community/developer-center/hana/blog/2012/05/21/when-sap-hana-met-r--first-kiss" style="color: #1155cc;" target="_blank">http://scn.sap.com/community/<wbr></wbr>developer-center/hana/blog/<wbr></wbr>2012/05/21/when-sap-hana-met-<wbr></wbr>r--first-kiss</a></div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
<br />
Want to know more about SAP HANA? Read everything you need here: <a href="http://developers.sap.com/" style="color: #1155cc;" target="_blank">http://developers.sap.com</a></div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
<br />
You're convinced but don't want to pay for Amazon Web Services? No problem. Just leave a comment including your name, company and email. We will reach out to you and send you an Amazon Gift Card so you can get started. Your feedback would be greatly appreciated. Of course, we only have a limited number of gift cards, so be quick or miss out.</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
<br />
<b>Author Alvaro Tejada Galindo</b>, mostly known as "Blag", is a Development Expert working for the Technology Innovation and Developer Experience team at SAP Labs. He can be contacted at <span style="color: #555555; font-family: arial, sans-serif; font-size: 13px; text-align: left; white-space: nowrap;">a.tejada.galindo@sap.com.</span></div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
<br />
Alvaro's background in his own words: I was an ABAP Consultant for 11 years. I worked on implementations in Peru and Canada. I’m also a die-hard developer using R, Python, Ruby, PHP, Flex and many more languages. Now, I work for SAP Labs and my main role is to evangelize SAP technologies by writing blogs and articles, helping people on the forums, and attending SAP events, besides many other “Developer engagement” activities.</div>
<div style="background-color: rgba(255, 255, 255, 0.917969); color: #222222; font-family: Calibri, sans-serif; font-size: small;">
I maintain a blog called “Blag’s bag of rants” at <a href="http://blagrants.blogspot.com/" style="color: #1155cc;" target="_blank">blagrants.blogspot.com</a></div>Jitender Aswanihttp://www.blogger.com/profile/07256452105548911708noreply@blogger.com112tag:blogger.com,1999:blog-7133039340481686842.post-79947981147275357022012-05-02T13:46:00.001-07:002012-05-02T13:52:23.078-07:00Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part III<br />
<div class="post-body entry-content" id="post-body-8571422585343504190" itemprop="articleBody" style="line-height: 1.4; width: 660px;">
<b style="background-color: white; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><span style="font-family: inherit; font-size: medium;">Mash-up Airlines Performance Data with Historical Weather Data to Pinpoint Weather Related Delays</span></b><br />
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white;"><span style="font-size: xx-small;"><span style="font-family: inherit;"><br /></span></span></span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="font-family: inherit;"><span style="background-color: white;">For this exercise, I combined the following four separate blogs that I did on Big Data, R and SAP HANA. </span>Historical airlines and weather data were used for the underlying analysis. The aggregated output of this analysis was written out as JSON, which was <span style="background-color: white;">visualized in HTML5, D3 and Google Maps. The previous blogs in this series are:</span></span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
</div>
<ol style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">
<li style="margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><a href="http://goo.gl/y9pQ4" style="background-color: white; color: #4d469c; text-decoration: none;">Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part II</a></li>
<li style="margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><a href="http://goo.gl/Uj8KU">Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps</a></li>
<li style="margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><a href="http://goo.gl/6UkVi" style="background-color: white; color: #4d469c; text-decoration: none;">Getting Historical Weather Data in R and SAP HANA </a></li>
<li style="margin-bottom: 0.25em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><a href="http://goo.gl/7nqXA">Tracking SFO Airport's Performance Using R, HANA and D3</a></li>
</ol>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="font-family: inherit;"><span style="background-color: white;">In this blog, I wanted to mash up disparate data sources in R and HANA by combining airlines data with weather </span><span style="background-color: white;">data to understand the reasons behind airport and airline delays. Why weather? Because weather is one of the most commonly cited </span></span><span style="background-color: white;">reasons</span><span style="background-color: white; font-family: inherit;"> in the airlines industry for flight delays. Fortunately, the airlines data breaks down the delay by weather, security, late aircraft etc., so weather-related delays can be isolated and then the actual weather data can be mashed up to validate the airlines' claims. However, I will not be doing that validation here; I will just display the mashed-up data.</span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit;"><br /></span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit;">I have intentionally focused on the three bay-area airports and have used the last 4 years of historical data to visualize each airport's performance using an HTML5 calendar built from scratch with D3.js. One can extend this example to all 20 years of data and all the airports. I had downloaded historical weather data for the same 2005-2008 period for SFO and SJC airports, as shown in my previous blog (for some strange reason, there is no weather data for OAK, huh?). Here is what the final result looks like in HTML5:</span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9zWMr8vb0DrATPU2qQdOaouxiPUwX21tK_t7pGcSfTbcxEKxL9sSTUgaXwSfh3Fzmm0G8wJJKI22FR2-GHOqxYzB75JJAmvPJBM0W1qLEbg0UvFzsPFd0p4qkbM5mtLenZy_fUY7stso/s1600/BAA-ATA.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="334" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9zWMr8vb0DrATPU2qQdOaouxiPUwX21tK_t7pGcSfTbcxEKxL9sSTUgaXwSfh3Fzmm0G8wJJKI22FR2-GHOqxYzB75JJAmvPJBM0W1qLEbg0UvFzsPFd0p4qkbM5mtLenZy_fUY7stso/s640/BAA-ATA.png" width="640" /></a></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit;"><br /></span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
</div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit;"><br /></span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<a href="http://goo.gl/SsiQz" style="color: #666666; font-family: inherit; text-decoration: none;">Click here to interact with the live example</a>.<span style="font-family: inherit;"> Hover over any cell in the live example and a </span>tooltip<span style="font-family: inherit;"> with comprehensive </span>analytics<span style="font-family: inherit;"> will show the breakdown of the performance delay for the selected cell, including the weather data and matching weather icons*, the result of the mash-up. Choose a different airport from the drop-down to change the performance calendar. </span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit; font-size: xx-small;">* W</span><span style="background-color: white; font-size: xx-small;">eather icons are the property of Weather Underground.</span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white;"><span style="font-family: inherit;"><br /></span></span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white;"><span style="font-family: inherit;">As anticipated, SFO airport had more red on the calendar than SJC and OAK. SJC </span>definitely is the best performing airport in the bay-area. Surprisingly, weather didn't cause as much havoc at SFO as one would expect. Strange?</span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit;"><br /></span></div>
<div class="separator" style="background-color: white; clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit;">Creating a mash-up in R for these two data-sets was super easy and a CSV output was produced to work with HTML5/D3. Here is the R code and, if it is not clear from all my previous blogs, I just love the <a href="http://datatable.r-forge.r-project.org/" style="color: #4d469c; text-decoration: none;">data.table package</a>.</span></div>
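<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: left;">
Before the full script, here is a minimal sketch of the same idea on toy data: summarise flights per airport per day, count the delayed ones, and attach the weather for the same airport and day. The tables <code>flights</code> and <code>weather</code>, their values and the <code>Conditions</code> column are made up for illustration and are not the real data sets. The sketch also uses <code>merge()</code> rather than the keyed <code>X[Y]</code> join in the script, simply to keep the example short.</div>

```r
library(data.table)

# Toy stand-in for the flight records (hypothetical values, not the real data)
flights <- data.table(
  Year = 2008L, Month = 1L, DayofMonth = c(1L, 1L, 1L, 2L, 2L),
  Origin = "SFO",
  DepDelay = c(5, 40, 20, 0, 30),
  WeatherDelay = c(NA, 10, 0, NA, 30)
)

# Toy stand-in for the daily weather table (the Conditions column is an assumption)
weather <- data.table(
  Year = 2008L, Month = 1L, DayofMonth = c(1L, 2L),
  Origin = "SFO",
  Conditions = c("Rain", "Fog")
)

keys <- c("Year", "Month", "DayofMonth", "Origin")

# One row per airport per day: total number of flights
daily <- flights[, .(TotalFlights = .N), by = keys]

# One row per airport per day: flights delayed by more than 15 minutes,
# and how many of those reported a weather delay (NAs are ignored)
delayed <- flights[DepDelay > 15,
  .(DelayedFlights = .N,
    WeatherDelayed = sum(WeatherDelay > 0, na.rm = TRUE)),
  by = keys]

# Join the two summaries, then attach the weather for the same airport and day
daily_summary <- merge(daily, delayed, by = keys, all.x = TRUE)
mashup <- merge(daily_summary, weather, by = keys, all.x = TRUE)
mashup[, PercentDelayed := round(DelayedFlights / TotalFlights, 2)]

print(mashup)
```

<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: left;">
Using <code>all.x = TRUE</code> keeps days that have flights but no delayed flights or no weather record, such as OAK in this post, at the cost of <code>NA</code> values in the joined columns.</div>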
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit; font-size: xx-small;"><br /></span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit; font-size: xx-small;"><br /></span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit; font-size: xx-small;"></span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<span style="background-color: white; font-family: inherit; font-size: xx-small;">########################################################################################### </span></div>
<span style="background-color: white;"><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><span style="font-family: inherit; font-size: xx-small;"></span></span></span><br />
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<span style="background-color: white; font-family: inherit; font-size: xx-small;"># Percent delayed flights from three bay area airports, a break up of the flights delay by various reasons, mash-up with weather data</span></div>
<span style="background-color: white;"><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><span style="font-family: inherit; font-size: xx-small;"></span></span></span><br />
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<span style="background-color: white; font-family: inherit; font-size: xx-small;">########################################################################################### </span></div>
<span style="background-color: white;"><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><span style="font-family: inherit; font-size: xx-small;"></span></span></span><br />
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<span style="background-color: white; font-family: inherit; font-size: xx-small;">baa.hp.daily.flights <- baa.hp[,list( <span style="font-family: inherit;">TotalFlights=length(DepDelay), CancelledFlights=sum(Cancelled, na.rm=TRUE)), </span></span></div>
<span style="background-color: white;"><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><span style="font-family: inherit; font-size: xx-small;"></span></span></span><br />
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<span style="background-color: white;"><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><span style="font-family: inherit; font-size: xx-small;"> by=list(Year, Month, DayofMonth, Origin)]</span></span></span></div>
<span style="background-color: white;"><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><span style="font-family: inherit; font-size: xx-small;">
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
setkey(baa.hp.daily.flights,Year, Month, DayofMonth, Origin)</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<br /></div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
baa.hp.daily.flights.delayed <- baa.hp[DepDelay>15,</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
list(<span style="font-family: inherit;">DelayedFlights=length(DepDelay), </span></div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
WeatherDelayed=sum(WeatherDelay>0, na.rm=TRUE), # count flights with a weather delay; NA entries are not counted</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
AvgDelayMins=round(sum(DepDelay, na.rm=TRUE)/length(DepDelay), digits=2),</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
CarrierCaused=round(sum(CarrierDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
WeatherCaused=round(sum(WeatherDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
NASCaused=round(sum(NASDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
SecurityCaused=round(sum(SecurityDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2),</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
LateAircraftCaused=round(sum(LateAircraftDelay, na.rm=TRUE)/sum(DepDelay, na.rm=TRUE), digits=2)<span style="font-family: inherit;">), </span><span style="font-family: inherit;">by=list(Year, Month, DayofMonth, Origin)]</span></div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
setkey(baa.hp.daily.flights.delayed, Year, Month, DayofMonth, Origin)</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<br /></div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
# Merge two data-tables</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
baa.hp.daily.flights.summary <- baa.hp.daily.flights.delayed[baa.hp.daily.flights,list(Airport=Origin,</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
TotalFlights, CancelledFlights, DelayedFlights, WeatherDelayed, </div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
PercentDelayedFlights=round(DelayedFlights/(TotalFlights-CancelledFlights), digits=2),</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
AvgDelayMins, CarrierCaused, WeatherCaused, NASCaused, SecurityCaused, LateAircraftCaused)]</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<span style="font-family: inherit;">setkey(baa.hp.daily.flights.summary, Year, Month, DayofMonth, Airport)</span></div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<br /></div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
# Merge with weather data</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
baa.hp.daily.flights.summary.weather <-baa.weather[baa.hp.daily.flights.summary]</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
baa.hp.daily.flights.summary.weather$Date <- as.Date(paste(baa.hp.daily.flights.summary.weather$Year, </div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
baa.hp.daily.flights.summary.weather$Month, </div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
baa.hp.daily.flights.summary.weather$DayofMonth, </div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
sep="-"),"%Y-%m-%d")</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
# Remove a few columns that are no longer needed</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
baa.hp.daily.flights.summary.weather <- baa.hp.daily.flights.summary.weather[, </div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
which(!(colnames(baa.hp.daily.flights.summary.weather) %in% c("Year", "Month", "DayofMonth", "Origin"))), with=FALSE]</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<span style="font-family: inherit;"><br /></span></div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<span style="font-family: inherit;">#Write the output in both JSON and CSV file formats</span></div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<span style="font-family: inherit;">objs <- baa.hp.daily.flights.summary.weather[, getRowWiseJson(.SD), by=list(Airport)]</span></div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
# objs now holds (Airport, JSON string) pairs; wrap each pair in a JSON object, then join the objects into one array.</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
row.json <- apply(objs, 1, function(x) paste('{"AirportCode":"', x[1], '","Data":', x[2], '}', sep=""))</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
json.st <- paste('[', paste(row.json, collapse=', '), ']')</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
writeLines(json.st, "baa-2005-2008.summary.json") </div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
write.csv(baa.hp.daily.flights.summary.weather, "baa-2005-2008.summary.csv", row.names=FALSE)</div>
<div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">
<br /></div>
</span></span></span><br />
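The JSON assembly step above can be sketched in base R alone. This is a minimal illustration, not the original pipeline: the <code>objs</code> data frame and its JSON fragments are hypothetical stand-ins for what the <code>getRowWiseJson</code> helper used above would return per airport.<br />

```r
# Hypothetical per-airport JSON fragments, standing in for the output of getRowWiseJson()
objs <- data.frame(
  Airport = c("SFO", "JFK"),
  Json    = c('[{"Flights":10}]', '[{"Flights":20}]'),
  stringsAsFactors = FALSE
)

# Wrap each fragment in an object that carries its airport code
row.json <- sprintf('{"AirportCode":"%s","Data":%s}', objs$Airport, objs$Json)

# Join the objects into a single JSON array
json.st <- paste0("[", paste(row.json, collapse = ","), "]")
cat(json.st, "\n")
```

<code>sprintf</code> is vectorized over its arguments, so no <code>apply</code> call is needed, and the data frame keeps its column types instead of being coerced to a character matrix first.<br />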
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
<span style="background-color: white; font-family: inherit; font-size: xx-small;">Happy Coding!</span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: none; outline-width: initial; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">
</div>
</div>Jitender Aswanihttp://www.blogger.com/profile/07256452105548911708noreply@blogger.com11tag:blogger.com,1999:blog-7133039340481686842.post-40291946652462293092012-04-25T12:00:00.000-07:002012-04-25T12:00:02.267-07:00Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 - Part II<br />
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: left;">
<span style="background-color: white;">In my last post, <a href="http://goo.gl/Uj8KU">Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps</a>, I analyzed a historical airline performance data set using R and SAP HANA and put the aggregated analysis on Google Maps. A map is undoubtedly an exciting canvas for viewing and analyzing big data sets. You can draw shapes (circles, polygons) under a marker pin to give pin-point information, and display aggregated information in the info-window when a marker is clicked. I enjoyed doing all of that, but I was craving some old-fashioned bubble charts and other chart types to provide comparative information on big data sets. Ultimately, all big data sets get aggregated into smaller analytical sets for viewing, sharing and reporting. An old-fashioned chart is the best way to tell a visual story!</span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: left;">
<span style="background-color: white;"><br /></span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: left;">
<span style="background-color: white;">A bubble chart can display four-dimensional data for comparative analysis. For this analysis, I used the same data set of 200M data points and went deeper, looking at finer slices of information with D3, R and SAP HANA. Here is some of that work: </span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: left;">
<span style="background-color: white;"><br /></span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: left;">
<span style="background-color: white;">The first graphic compares the performance of the top airlines for 2008. As expected, Southwest, the largest airline (using total number of flights as a proxy), performed well for its size: 1.2M flights and 64 destinations, with an average delay of ~10 mins. American and Continental were among the worst performers, along with Skywest. Note that I didn't remove outliers from this analysis. <a href="http://goo.gl/OGd4V" style="color: #4d469c; text-decoration: none;" target="_blank">Click here to interact with this example</a> (view source to get D3 code).</span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: left;">
<span style="background-color: white;"><br /></span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUJPbrAMGSZeBjmF9Ro4Oo7y56S-k1QPYx8xWeFhimUXhUED-g0etRxK8nADvSQVVKTcOiyKf1YDw8BBOjhDogOnOkgcCZEpXGMqGIvpqFbXUZPKcDurFcwyZM5VFhtgMXx-sO2e4HaqU/s1600/Airlines.png" imageanchor="1" style="background-color: white; color: #4d469c; margin-left: 1em; margin-right: 1em; text-decoration: none;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUJPbrAMGSZeBjmF9Ro4Oo7y56S-k1QPYx8xWeFhimUXhUED-g0etRxK8nADvSQVVKTcOiyKf1YDw8BBOjhDogOnOkgcCZEpXGMqGIvpqFbXUZPKcDurFcwyZM5VFhtgMXx-sO2e4HaqU/s640/Airlines.png" style="border-bottom-style: none; border-color: initial; border-image: initial; border-left-style: none; border-right-style: none; border-top-style: none; border-width: initial; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; position: relative;" width="640" /></a></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: center;">
<span style="background-color: white;"><br /></span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: left;">
<span style="background-color: white;">In the second analysis, I replaced the airline dimension with an airport dimension but kept all the other dimensions the same. To my disbelief, Newark is the worst-performing airport when it comes to departure delays. Chicago O'Hare, SFO and JFK follow. Atlanta is the largest airport, yet it has the best performance. What are they doing differently at ATL? <a href="http://goo.gl/jieen" style="color: #4d469c; text-decoration: none;" target="_blank">Click here to interact with this example</a> (</span><span style="background-color: white;">view source to get D3 code)</span><span style="background-color: white;">.</span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: left;">
<span style="background-color: white;"><br /></span></div>
<div class="separator" style="clear: both; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5f_ccBRQ89-cpL1eUlbguvxSE-cTQiNQMnxOS3SYA_pEF4qPvSYW-Zqsc-3du_Phny22i7grHdZgyx8YGaqFWpzQ88ZgnI6p7b2kPzggf1p0loqq1DvYhyphenhyphenbPuohlAtaSIBamlAA0Mzus/s1600/Airports.png" imageanchor="1" style="background-color: white; color: #4d469c; margin-left: 1em; margin-right: 1em; text-decoration: none;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5f_ccBRQ89-cpL1eUlbguvxSE-cTQiNQMnxOS3SYA_pEF4qPvSYW-Zqsc-3du_Phny22i7grHdZgyx8YGaqFWpzQ88ZgnI6p7b2kPzggf1p0loqq1DvYhyphenhyphenbPuohlAtaSIBamlAA0Mzus/s640/Airports.png" style="border-bottom-style: none; border-color: initial; border-image: initial; border-left-style: none; border-right-style: none; border-top-style: none; border-width: initial; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; position: relative;" width="640" /></a></div>
<span style="background-color: white;"><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">It was a hell of a lot of fun playing with D3, R and HANA, and good intellectual stimulation if nothing else! Happy analyzing, and remember, the possibilities are endless!</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">As always, my R modules are fairly simple and straightforward:</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">########################################################################################### </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">#ETL - Read the airport information, extract the major airport information and upload this </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">#transformed dataset into HANA</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; 
font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">###########################################################################################</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">major.airports <- data.table(read.csv("MajorAirports.csv", header=TRUE, sep=",", stringsAsFactors=FALSE))</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">setkey(major.airports, iata)</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"></span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">all.airports <- data.table(read.csv("AllAirports.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)) </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">setkey(all.airports, iata)</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"></span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, 
sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">airports.2008.hp <- data.table(read.csv("2008.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)) </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">setkey(airports.2008.hp, Origin, UniqueCarrier)</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"></span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">#Merge two datasets</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">airports.2008.hp <- major.airports[airports.2008.hp,]</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"></span><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"><br /></span></span><br />
<span style="background-color: white;"><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;">########################################################################################### </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"># Get airport statistics for all airports</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">###########################################################################################</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">airports.2008.hp.summary <- airports.2008.hp[major.airports, </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"> list(AvgDepDelay=round(mean(DepDelay, na.rm=TRUE), digits=2),</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"> TotalMiles=prettyNum(sum(Distance, na.rm=TRUE), big.mark=","),</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; 
font-size: xx-small;"> TotalFlights=length(Month),</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"> TotalDestinations=length(unique(Dest)),</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"> URL=paste("http://www.fly", Origin, ".com",sep="")), </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"> by=list(Origin)][order(-TotalFlights)]</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">setkey(airports.2008.hp.summary, Origin)</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">#merge two data tables</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">airports.2008.hp.summary <- major.airports[airports.2008.hp.summary, </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, 
FreeSans, sans-serif; font-size: xx-small;"> list(Airport=airport, </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"> AvgDepDelay, TotalMiles, TotalFlights, TotalDestinations, </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"> Address=paste(airport, city, state, sep=", "), </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;"> Lat=lat, Lng=long, URL)][order(-TotalFlights)]</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"><span style="line-height: 18px;"></span></span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">airports.2008.hp.summary.json <- getRowWiseJson(airports.2008.hp.summary)</span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">writeLines(airports.2008.hp.summary.json, "airports.2008.hp.summary.json") </span><br style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; 
line-height: 18px;" /><span style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: xx-small;">write.csv(airports.2008.hp.summary, "airports.2008.hp.summary.csv", row.names=FALSE)</span></span>Jitender Aswanihttp://www.blogger.com/profile/07256452105548911708noreply@blogger.com41tag:blogger.com,1999:blog-7133039340481686842.post-21439668193519406712012-04-19T12:59:00.000-07:002012-04-19T14:42:42.054-07:00Getting Historical Weather Data in R and SAP HANAFor many of my latest data blogs, I needed historical weather data to perform data mash-ups to pin-point the cause. For example, for my continued exploration into the airlines/airports historical data using SAP HANA and R, I wanted to find out whether the weather was behind the extreme delay experienced out of a particular airport for a particular day/hour. So I needed to mash-up the weather data with the airlines data for this analysis.<br />
<br />
I looked around but could not find a good source for the weather data, so I turned to R. To get historical weather data, I use <a href="http://www.wunderground.com/weather/api/d/documentation.html">Weather Underground's REST APIs</a>, and I put together a simple R program that returns the weather data in a data.frame. This R module gets called from SAP HANA and inserts a new table into HANA with the weather information. Once I have the data in HANA, I perform the mash-ups there and off I go on my intellectual pursuit.<br />
<br />
Weather Underground can return the data in XML or JSON format. The program logic is very simple [once you have spent hours cracking it, the end product looks simple anyway :-)], and the code below is commented for self-learning.<br />
<br />
Note that you are not limited to the historical view of weather data. You can also get the weather forecast for the next 10 days, perform your analysis and predict the future!<br />
<br />
Make sure to register with Weather Underground<a href="http://www.wunderground.com/weather/api/d/documentation.html"> (API documentation link)</a>, comply with their rules and get your own key to access their APIs.<br />
<span style="font-size: x-small;">############################################################################</span><br />
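<span style="font-size: x-small;"># Assumption: fromJSON() comes from a JSON package such as rjson; ldply() comes from plyr</span><br />
<span style="font-size: x-small;">library(rjson)</span><br />
<span style="font-size: x-small;">library(plyr)</span><br />
<span style="font-size: x-small;"># date is a string in YYYYMMDD form; it defaults to today</span><br />
<span style="font-size: x-small;">getHistoricalWeather <- function(airport.code="SFO", date=format(Sys.Date(), "%Y%m%d"))</span><br />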
<span style="font-size: x-small;">{</span><br />
<span style="font-size: x-small;"> base.url <- 'http://api.wunderground.com/api/<span style="background-color: yellow;">{your key here}</span>/'</span><br />
<span style="font-size: x-small;"> # compose final url</span><br />
<span style="font-size: x-small;"> final.url <- paste(base.url, 'history_', date, '/q/', airport.code, '.json', sep='')</span><br />
<span style="font-size: x-small;"><br /></span><br />
<span style="font-size: x-small;"> # reading in as raw lines from the web service</span><br />
<span style="font-size: x-small;"> conn <- url(final.url)</span><br />
<span style="font-size: x-small;"> raw.data <- readLines(conn, n=-1L, ok=TRUE)</span><br />
<span style="font-size: x-small;"> # Convert to a JSON</span><br />
<span style="font-size: x-small;"> weather.data <- fromJSON(paste(raw.data, collapse=""))</span><br />
<span style="font-size: x-small;"> close(conn)</span><br />
<span style="font-size: x-small;"> return(weather.data)</span><br />
<span style="font-size: x-small;">}</span><br />
<span style="font-size: x-small;"><br /></span><br />
<span style="font-size: x-small;"></span><br />
<span style="font-size: x-small;"># get data for 10 days - restriction by Weather Underground for free usage</span><br />
<span style="font-size: x-small;">date.range <- seq.Date(from=as.Date('2006-1-01'), to=as.Date('2006-1-10'), by='1 day')</span><br />
<span style="font-size: x-small;"><br /></span><br />
<span style="font-size: x-small;"># Initialize a data frame</span><br />
<span style="font-size: x-small;">hdwd <- data.frame()</span><br />
<span style="font-size: x-small;"><br /></span><br />
<br />
<span style="font-size: x-small;"># loop over dates, and fetch weather data</span><br />
<span style="font-size: x-small;">for(i in seq_along(date.range)) {</span><br />
<span style="font-size: x-small;"> weather.data <- getHistoricalWeather('SFO', format(date.range[i], "%Y%m%d")) </span><br />
<span style="font-size: x-small;"> hdwd <- rbind(hdwd, ldply(weather.data$history$dailysummary, </span><br />
<span style="font-size: x-small;"> function(x) c('SFO', date.range[i], x$fog, x$rain, x$snow, </span><span style="font-size: x-small;"> x$meantempi, x$meanvism, x$maxtempi, x$mintempi)))</span><br />
<span style="font-size: x-small;">}</span><br />
<span style="font-size: x-small;">colnames(hdwd) <- c("Airport", "Date", </span><span style="font-size: x-small;">'Fog', 'Rain', 'Snow','AvgTemp', </span><span style="font-size: x-small;">'AvgVisibility','MaxTemp','MinTemp')</span><br />
<span style="font-size: x-small;"><br /></span><br />
<span style="font-size: x-small;"># save to CSV</span><br />
<span style="font-size: x-small;">write.csv(hdwd, file=gzfile('SFO-Jan2006.csv.gz'), row.names=FALSE)</span><br />
<br />
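The loop above builds each row with <code>c()</code>, which coerces every value to a character string and drops the Date class. A minimal sketch of an alternative that keeps explicit column types follows. The <code>fake.response</code> list and the <code>summaryToRow</code> helper are hypothetical and exist only for illustration; the field names are the ones used in the code above, and the assumption that the service returns these fields as strings is mine.<br />

```r
# Hypothetical stand-in for one parsed API response; field names follow the code above
fake.response <- list(history = list(dailysummary = list(
  list(fog = "0", rain = "1", snow = "0", meantempi = "55",
       meanvism = "14", maxtempi = "62", mintempi = "47"))))

# Turn one day's summary into a one-row data frame with explicit column types
summaryToRow <- function(airport.code, day, x) {
  data.frame(Airport = airport.code,
             Date = day,
             Fog = as.integer(x$fog),
             Rain = as.integer(x$rain),
             Snow = as.integer(x$snow),
             AvgTemp = as.numeric(x$meantempi),
             AvgVisibility = as.numeric(x$meanvism),
             MaxTemp = as.numeric(x$maxtempi),
             MinTemp = as.numeric(x$mintempi),
             stringsAsFactors = FALSE)
}

day <- as.Date("2006-01-01")
rows <- lapply(fake.response$history$dailysummary,
               function(x) summaryToRow("SFO", day, x))

# Bind the per-day rows once, instead of growing a data frame inside a loop
hdwd.typed <- do.call(rbind, rows)
str(hdwd.typed)
```

Collecting the rows in a list and binding them once also avoids repeatedly copying the data frame, which matters when the date range grows.<br />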
<div>
<span style="font-size: x-small;">############################################################################</span><br />
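<span style="font-size: x-small;">Results (the Date column shows days since 1970-01-01, because c() drops the Date class; 13149 is 2006-01-01) - </span><br />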
<br />
<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse; width: 603px;">
<colgroup><col span="6" style="width: 48pt;" width="64"></col>
<col style="mso-width-alt: 3328; mso-width-source: userset; width: 68pt;" width="91"></col>
<col span="2" style="width: 48pt;" width="64"></col>
</colgroup><tbody>
<tr height="34" style="height: 25.5pt;">
<td class="xl65" height="34" style="height: 25.5pt; width: 48pt;" width="64">Airport</td>
<td class="xl65" style="width: 48pt;" width="64">Date</td>
<td class="xl65" style="width: 48pt;" width="64">Fog</td>
<td class="xl65" style="width: 48pt;" width="64">Rain</td>
<td class="xl65" style="width: 48pt;" width="64">Snow</td>
<td class="xl65" style="width: 48pt;" width="64">AvgTemp</td>
<td class="xl65" style="width: 68pt;" width="91">AvgVisibility</td>
<td class="xl65" style="width: 48pt;" width="64">MaxTemp</td>
<td class="xl65" style="width: 48pt;" width="64">MinTemp</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15.0pt; width: 48pt;" width="64">SFO</td>
<td class="xl66" style="width: 48pt;" width="64">13149</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">1</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">55</td>
<td class="xl66" style="width: 68pt;" width="91">14</td>
<td class="xl66" style="width: 48pt;" width="64">62</td>
<td class="xl66" style="width: 48pt;" width="64">47</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15.0pt; width: 48pt;" width="64">SFO</td>
<td class="xl66" style="width: 48pt;" width="64">13150</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">1</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">53</td>
<td class="xl66" style="width: 68pt;" width="91">11</td>
<td class="xl66" style="width: 48pt;" width="64">55</td>
<td class="xl66" style="width: 48pt;" width="64">50</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15.0pt; width: 48pt;" width="64">SFO</td>
<td class="xl66" style="width: 48pt;" width="64">13151</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">1</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">51</td>
<td class="xl66" style="width: 68pt;" width="91">14</td>
<td class="xl66" style="width: 48pt;" width="64">56</td>
<td class="xl66" style="width: 48pt;" width="64">46</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15.0pt; width: 48pt;" width="64">SFO</td>
<td class="xl66" style="width: 48pt;" width="64">13152</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">56</td>
<td class="xl66" style="width: 68pt;" width="91">16</td>
<td class="xl66" style="width: 48pt;" width="64">62</td>
<td class="xl66" style="width: 48pt;" width="64">50</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15.0pt; width: 48pt;" width="64">SFO</td>
<td class="xl66" style="width: 48pt;" width="64">13153</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">54</td>
<td class="xl66" style="width: 68pt;" width="91">14</td>
<td class="xl66" style="width: 48pt;" width="64">60</td>
<td class="xl66" style="width: 48pt;" width="64">48</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15.0pt; width: 48pt;" width="64">SFO</td>
<td class="xl66" style="width: 48pt;" width="64">13154</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">1</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">52</td>
<td class="xl66" style="width: 68pt;" width="91">14</td>
<td class="xl66" style="width: 48pt;" width="64">59</td>
<td class="xl66" style="width: 48pt;" width="64">45</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15.0pt; width: 48pt;" width="64">SFO</td>
<td class="xl66" style="width: 48pt;" width="64">13155</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">1</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">56</td>
<td class="xl66" style="width: 68pt;" width="91">14</td>
<td class="xl66" style="width: 48pt;" width="64">61</td>
<td class="xl66" style="width: 48pt;" width="64">50</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15.0pt; width: 48pt;" width="64">SFO</td>
<td class="xl66" style="width: 48pt;" width="64">13156</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">51</td>
<td class="xl66" style="width: 68pt;" width="91">16</td>
<td class="xl66" style="width: 48pt;" width="64">57</td>
<td class="xl66" style="width: 48pt;" width="64">45</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15.0pt; width: 48pt;" width="64">SFO</td>
<td class="xl66" style="width: 48pt;" width="64">13157</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">49</td>
<td class="xl66" style="width: 68pt;" width="91">16</td>
<td class="xl66" style="width: 48pt;" width="64">56</td>
<td class="xl66" style="width: 48pt;" width="64">41</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15.0pt; width: 48pt;" width="64">SFO</td>
<td class="xl66" style="width: 48pt;" width="64">13158</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">0</td>
<td class="xl66" style="width: 48pt;" width="64">54</td>
<td class="xl66" style="width: 68pt;" width="91">10</td>
<td class="xl66" style="width: 48pt;" width="64">61</td>
<td class="xl66" style="width: 48pt;" width="64">46</td>
</tr>
</tbody></table>
<br />
<br /></div>
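The Date values in the table above look like raw day counts rather than calendar dates. A minimal sketch for turning them back into dates, assuming they are R's usual days-since-1970-01-01 representation (the vector name day.counts is made up for this illustration):

```r
# Day counts copied from the first, second and last rows of the results table
day.counts <- c(13149, 13150, 13158)

# Assumption: the counts are days since 1970-01-01, which is how R stores Date values
dates <- as.Date(day.counts, origin = "1970-01-01")
print(format(dates, "%Y-%m-%d"))
```

Under that assumption the rows run from 2006-01-01 onward, which is consistent with the Jan2006 file name used in write.csv above.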
<div>
Happy Analyzing!</div>
<br />Jitender Aswanihttp://www.blogger.com/profile/07256452105548911708noreply@blogger.com307tag:blogger.com,1999:blog-7133039340481686842.post-20280069963920322232012-04-11T14:45:00.001-07:002012-04-11T15:12:02.454-07:00Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps<b style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">Technologies</b><span style="background-color: white; color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">: SAP HANA, R, HTML5, D3, Google Maps, JQuery and JSON</span><br />
<div class="post-header" style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 1.6; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
<div class="post-header-line-1">
</div>
</div>
<div class="post-body entry-content" id="post-body-1787276057412979652" itemprop="articleBody" style="color: #666666; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px; width: 660px;">
<div style="font-size: 13px;">
<span style="background-color: white;">For this fun exercise, I analyzed more than 200 million data points using SAP HANA and R and then brought the aggregated results into HTML5 using D3, JSON and the Google Maps APIs. The 2008 airline data comes from the Data Expo, and I have been using the entire data set (123 million rows and 29 columns) for quite some time. </span><a href="http://goo.gl/5ClmN">See my other blog posts</a>.</div>
<div style="font-size: 13px;">
<span style="background-color: white;"><br /></span></div>
<div style="font-size: 13px;">
<span style="background-color: white;">The results look beautiful:</span></div>
<div style="font-size: 13px;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCbD2Nof9_afONJPBP1LBpXcaZTJWBzZ12VxDEL1yXt4zOEUDth9Icdi2b4KulrFsyYEwru3YiCGBtMF4tiqqi48xMaXgcLqbcyFUK-IK2aUBPcFgl7pxDZ_sc7OlclksLiv0XQ32wKDU/s1600/Picture1.png" imageanchor="1" style="background-color: white; color: #4d469c; margin-left: 1em; margin-right: 1em; text-align: center; text-decoration: none;"><img border="0" height="323" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCbD2Nof9_afONJPBP1LBpXcaZTJWBzZ12VxDEL1yXt4zOEUDth9Icdi2b4KulrFsyYEwru3YiCGBtMF4tiqqi48xMaXgcLqbcyFUK-IK2aUBPcFgl7pxDZ_sc7OlclksLiv0XQ32wKDU/s640/Picture1.png" style="border-bottom-style: none; border-color: initial; border-image: initial; border-left-style: none; border-right-style: none; border-top-style: none; border-width: initial; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; position: relative;" width="640" /></a></div>
<div style="font-size: 13px;">
<span style="background-color: white;"><br /></span></div>
<div style="font-size: 13px;">
<span style="background-color: white;">Each airport icon is clickable and when clicked displays an info-window describing the key stats for the selected airport:</span><a href="http://jitenderaswani.info.s3-website-us-east-1.amazonaws.com/img/JA-R-HANA-2.png" imageanchor="1" style="background-color: white; color: #4d469c; margin-left: 1em; margin-right: 1em; text-align: center; text-decoration: none;"><img border="0" height="320" src="http://jitenderaswani.info.s3-website-us-east-1.amazonaws.com/img/JA-R-HANA-2.png" style="border-bottom-style: none; border-color: initial; border-image: initial; border-left-style: none; border-right-style: none; border-top-style: none; border-width: initial; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; position: relative;" width="640" /></a></div>
<div style="font-size: 13px;">
<span style="background-color: white;">I then used D3 to display the aggregated result set in the modal window (light box):</span></div>
<div class="separator" style="clear: both; font-size: 13px; text-align: center;">
<a href="http://jitenderaswani.info.s3-website-us-east-1.amazonaws.com/img/JA-R-HANA-3.png" imageanchor="1" style="background-color: white; color: #4d469c; margin-left: 1em; margin-right: 1em; text-decoration: none;"><img border="0" height="320" src="http://jitenderaswani.info.s3-website-us-east-1.amazonaws.com/img/JA-R-HANA-3.png" style="border-bottom-style: none; border-color: initial; border-image: initial; border-left-style: none; border-right-style: none; border-top-style: none; border-width: initial; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; position: relative;" width="640" /></a></div>
<div style="font-size: 13px;">
<span style="background-color: white;">D3 made it ridiculously simple to generate a table from a JSON file. </span></div>
<div style="font-size: 13px;">
<span style="background-color: white;">Unfortunately, I can't provide a live example: the Google Maps APIs impose usage restrictions, and I am approaching my free API limit. </span><br />
<span style="background-color: white;"><br /></span></div>
<div style="font-size: 13px;">
<b>Fun fact:</b><span style="background-color: white;"> The Atlanta airport was</span><u> the largest airport in 2008 on many dimensions</u><span style="background-color: white;">: Total Flights Departed, Total Miles Flown and Total Destinations. It also had a lower average departure delay in 2008 than Chicago O'Hare. I had always thought Chicago O'Hare was the largest US airport.</span></div>
<div style="font-size: 13px;">
<span style="background-color: white;"><br /></span></div>
<div style="font-size: 13px;">
<span style="background-color: white;">As always, I needed just 6 lines of R code, including two lines to write the data to JSON and CSV files:</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;">################################################################################</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;">airports.2008.hp.summary <- airports.2008.hp[major.airports, </span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;"> list(AvgDepDelay=round(mean(DepDelay, na.rm=TRUE), digits=2),</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;"> TotalMiles=prettyNum(sum(Distance, na.rm=TRUE), big.mark=","),</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;"> TotalFlights=length(Month),</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;"> TotalDestinations=length(unique(Dest)),</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;"> URL=paste("http://www.fly", Origin, ".com",sep="")), </span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;"> by=list(Origin)][order(-TotalFlights)]</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;">setkey(airports.2008.hp.summary, Origin)</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;">#merge the two data tables</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;">airports.2008.hp.summary <- major.airports[airports.2008.hp.summary, </span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;"> list(Airport=airport, </span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;"> AvgDepDelay, TotalMiles, TotalFlights, TotalDestinations, </span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;"> Address=paste(airport, city, state, sep=", "), </span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;"> Lat=lat, Lng=long, URL)][order(-TotalFlights)]</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;"><br /></span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;">airports.2008.hp.summary.json <- getRowWiseJson(airports.2008.hp.summary)</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;">writeLines(airports.2008.hp.summary.json, "airports.2008.hp.summary.json") </span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;">write.csv(airports.2008.hp.summary, "airports.2008.hp.summary.csv", row.names=FALSE)</span></div>
<div style="font-size: 13px;">
<span style="background-color: white; font-size: xx-small;">##############################################################################</span></div>
<div style="font-size: 13px;">
<span style="background-color: white;"><br /></span></div>
<div style="font-size: 13px;">
<b style="background-color: white;">Happy Coding and remember the possibilities are endless!</b></div>
</div>Jitender Aswanihttp://www.blogger.com/profile/07256452105548911708noreply@blogger.com11tag:blogger.com,1999:blog-7133039340481686842.post-7822795656366596872012-03-22T10:55:00.000-07:002012-03-22T10:55:35.294-07:00Tracking SFO Airport's Performance Using R, HANA and D3<br />
This is my first introduction to D3 and I am simply blown away. Mike Bostock (@mbostock), you are a genius and thanks for creating D3! With HANA, R, D3, HTML5 and an iPad, you've got yourself a KILLER combo!<br />
<br />
I have been burning the midnight oil piecing together my big data story using HANA, R, JSON and HTML5. If you recall, I did a technical session on R and SAP HANA last week at DKOM, SAP's Development Kickoff Event, where I showcased the supreme powers of R and HANA when analyzing 124 million records in real time. <a href="http://goo.gl/63vw7">R and SAP HANA: A Highly Potent Combo for Real Time Analytics on Big Data</a><br />
<br />
Since last week, I have been looking for other creative ways to analyze and then visualize this airlines data. I was very fortunate to come across D3. After spending a couple of hours with D3, I decided to build a calendar view for the airlines data I have. The calendar view is the first example Mike shows on his D3 page. Amazingly awesome!<br />
<br />
I created this calendar view to capture the daily percentage of delayed flights departing from SFO between 2005 and 2008. Here, a flight counts as delayed when its departure delay exceeds 15 minutes. For this analysis, I used HANA to pull the SFO data (out of 250-plus airports) for this four-year period in seconds, and then did all the aggregation in R, including creating the JSON and CSV files, again in seconds. Later, I moved to HTML5 and D3 to generate this beautiful calendar view showing SFO's performance. The graphic is shown below:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8hAYZRstWvMofDUI2Ulfjy5KsQp9zKP29Kjbxa_MC6TxmS7RoMfrV0fEGfd77Oerymh0lCGOXKcyOGSN6vgfSRNAc6BLcmMvw6BxM-9x3XR0Ve_DIiRcgQLw1U6NE-oHBtYVWvxhMUh4/s1600/D3-SFO-Delayed-Flights.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="385" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8hAYZRstWvMofDUI2Ulfjy5KsQp9zKP29Kjbxa_MC6TxmS7RoMfrV0fEGfd77Oerymh0lCGOXKcyOGSN6vgfSRNAc6BLcmMvw6BxM-9x3XR0Ve_DIiRcgQLw1U6NE-oHBtYVWvxhMUh4/s640/D3-SFO-Delayed-Flights.PNG" width="640" /></a></div>
<br />
As expected, December and January are two notorious months for flight delays. Have fun <a href="http://goo.gl/OGd4V">with the live example hosted in the Amazon cloud.</a><br />
<br />
<br />
Once again, my R code is very simple:<br />
<br />
## Departure Delay for SFO Airport<br />
ba.hp.sfo <- ba.hp[Origin=="SFO",]<br />
<br />
ba.hp.sfo.daily.flights <- ba.hp.sfo[,list(DailyFlights=length(DepDelay)), by=list(Year, Month, DayofMonth)][order(Year,Month,DayofMonth)]<br />
ba.hp.sfo.daily.flights.delayed <- ba.hp.sfo[DepDelay>15,list(DelayedDailyFlights=length(DepDelay)), by=list(Year, Month, DayofMonth)][order(Year,Month,DayofMonth)]<br />
setkey(ba.hp.sfo.daily.flights.delayed, Year, Month, DayofMonth)<br />
response <- ba.hp.sfo.daily.flights.delayed[ba.hp.sfo.daily.flights]<br />
response <- response[,list(Date=as.Date(paste(Year, Month, DayofMonth, sep="-"),"%Y-%m-%d"),<br />
#DailyFlights,DelayedDailyFlights,<br />
PercentDelayedFlights=round((DelayedDailyFlights/DailyFlights), digits=2))]<br />
objs <- apply(response, 1, toJSON)<br />
res <- paste('{"dailyFlightStats": [', paste(objs, collapse=', '), ']}')<br />
writeLines(res, "dailyFlightStatsForSFO.json") <br />
write.csv(response, "dailyFlightStatsForSFO.csv", row.names=FALSE)<br />
<br />
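One thing to watch in the join above: a day with no delayed flights has no row in the delayed table, so its percentage most likely comes out as NA rather than 0. Below is a minimal base-R sketch of the same calculation done in a single pass, which avoids that. The toy data frame and the helper name pct.delayed are made up for illustration; the column names and the 15-minute threshold follow the code above.

```r
# Toy stand-in for the SFO flights table (values invented for illustration)
flights <- data.frame(
  Year = 2008, Month = 1, DayofMonth = c(1, 1, 1, 1, 2, 2),
  DepDelay = c(5, 30, NA, 20, 0, 10)
)

# Share of flights delayed by more than 15 minutes; a missing DepDelay still
# counts in the denominator, mirroring length(DepDelay) in the code above
pct.delayed <- function(d) round(sum(d > 15, na.rm = TRUE) / length(d), 2)

# na.action = na.pass keeps rows with missing delays instead of dropping them
daily <- aggregate(DepDelay ~ Year + Month + DayofMonth, data = flights,
                   FUN = pct.delayed, na.action = na.pass)
names(daily)[names(daily) == "DepDelay"] <- "PercentDelayedFlights"
daily$Date <- as.Date(paste(daily$Year, daily$Month, daily$DayofMonth, sep = "-"),
                      "%Y-%m-%d")
print(daily[, c("Date", "PercentDelayedFlights")])
```

With this toy data the first day has two of four flights delayed and the second day has none, so the second day reports 0 instead of a missing value.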
For D3 and HTML code, please take a look at this example from <a href="http://goo.gl/HaOIO">D3 website</a>.<br />
<br />
Happy Analyzing and Keep That Midnight Oil Burning!<br />
<br />Jitender Aswanihttp://www.blogger.com/profile/07256452105548911708noreply@blogger.com4tag:blogger.com,1999:blog-7133039340481686842.post-59043236980172227472012-03-20T14:53:00.003-07:002012-03-20T14:56:30.890-07:00Geocode and reverse geocode your data using R, JSON and Google Maps' Geocoding API<br />
Geocode and reverse geocode your data using R, JSON and Google Maps' Geocoding API<br />
<br />
To geocode and reverse geocode my data, I use Google's Geocoding service, which returns the geocoded data as JSON. I recommend that you register with the Google Maps API and get a key if you have a large amount of data and will be geocoding repeatedly.<br />
<br />
<b>Geocode:</b><br />
<br />
<span style="font-size: x-small;">getGeoCode <- function(gcStr) {</span><br />
<span style="font-size: x-small;"> library("RJSONIO") #Load Library</span><br />
<span style="font-size: x-small;"> gcStr <- gsub(' ','%20',gcStr) #Encode URL Parameters</span><br />
<span style="font-size: x-small;"> #Open Connection</span><br />
<span style="font-size: x-small;"> connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=',gcStr, sep="") </span><br />
<span style="font-size: x-small;"> con <- url(connectStr)</span><br />
<span style="font-size: x-small;"> data.json <- fromJSON(paste(readLines(con), collapse=""))</span><br />
<span style="font-size: x-small;"> close(con)</span><br />
<span style="font-size: x-small;"> #Flatten the received JSON</span><br />
<span style="font-size: x-small;"> data.json <- unlist(data.json)</span><br />
<span style="font-size: x-small;"> if(data.json["status"]=="OK") {</span><br />
<span style="font-size: x-small;"> lat <- data.json["results.geometry.location.lat"]</span><br />
<span style="font-size: x-small;"> lng <- data.json["results.geometry.location.lng"]</span><br />
<span style="font-size: x-small;"> gcodes <- c(lat, lng)</span><br />
<span style="font-size: x-small;"> names(gcodes) <- c("Lat", "Lng")</span><br />
<span style="font-size: x-small;"> return (gcodes)</span><br />
<span style="font-size: x-small;"> }</span><br />
<span style="font-size: x-small;">}</span><br />
<span style="font-size: x-small;">geoCodes <- getGeoCode("Palo Alto,California")</span><br />
<span style="font-size: x-small;">> geoCodes</span><br />
<span style="font-size: x-small;"> Lat Lng </span><br />
<span style="font-size: x-small;"> "37.4418834" "-122.1430195" </span><br />
<br />
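Note that the coordinates printed above are quoted: flattening a response that mixes text and numbers with unlist turns everything into character strings. A minimal sketch of that effect and of converting the values back to numbers, using a hand-built list as a stand-in for the parsed JSON (the list layout is an assumption based on the field names used in getGeoCode):

```r
# Hand-built stand-in for the parsed response; structure assumed from the
# "results.geometry.location.*" names used in getGeoCode above
parsed <- list(
  status = "OK",
  results = list(
    list(geometry = list(location = list(lat = 37.4418834, lng = -122.1430195)))
  )
)

# unlist joins the nested names with dots and coerces all values to character
flat <- unlist(parsed)
print(names(flat))

# Convert the two coordinates back to numbers before doing arithmetic on them
coords <- as.numeric(flat[c("results.geometry.location.lat",
                            "results.geometry.location.lng")])
names(coords) <- c("Lat", "Lng")
print(coords)
```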
<b>Reverse Geocode:</b><br />
<span style="font-size: x-small;">reverseGeoCode <- function(latlng) {</span><br />
<span style="font-size: x-small;">latlngStr <- gsub(' ','%20', paste(latlng, collapse=","))#Collapse and Encode URL Parameters</span><br />
<span style="font-size: x-small;"> library("RJSONIO") #Load Library</span><br />
<span style="font-size: x-small;"> #Open Connection</span><br />
<span style="font-size: x-small;"> connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&latlng=',latlngStr, sep="")</span><br />
<span style="font-size: x-small;"> con <- url(connectStr)</span><br />
<span style="font-size: x-small;"> data.json <- fromJSON(paste(readLines(con), collapse=""))</span><br />
<span style="font-size: x-small;"> close(con)</span><br />
<span style="font-size: x-small;"> #Flatten the received JSON</span><br />
<span style="font-size: x-small;"> data.json <- unlist(data.json)</span><br />
<span style="font-size: x-small;"> address <- NA #Returned as-is when the status is not "OK"</span><br />
<span style="font-size: x-small;"> if(data.json["status"]=="OK")</span><br />
<span style="font-size: x-small;"> address <- data.json["results.formatted_address"]</span><br />
<span style="font-size: x-small;"> return (address)</span><br />
<span style="font-size: x-small;">}</span><br />
<span style="font-size: x-small;">address <- reverseGeoCode(c(37.4418834, -122.1430195))</span><br />
<span style="font-size: x-small;">> address</span><br />
<span style="font-size: x-small;"> results.formatted_address </span><br />
<span style="font-size: x-small;">"668 Coleridge Ave, Palo Alto, CA 94301, USA" </span><br />
<br />
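Both functions above only replace spaces when building the request URL, so an address containing characters such as "&" or "#" could produce a malformed query. A minimal sketch of a more defensive URL builder, using URLencode from R's utils package; buildGeocodeUrl is a hypothetical helper name and is not part of the original code, and no request is actually sent here:

```r
# Endpoint exactly as it appears in the functions above
base.url <- "http://maps.google.com/maps/api/geocode/json?sensor=false"

# buildGeocodeUrl is a hypothetical helper introduced for this illustration
buildGeocodeUrl <- function(param, value) {
  # reserved = TRUE also escapes reserved characters such as ',' '&' and '#'
  paste(base.url, "&", param, "=", utils::URLencode(value, reserved = TRUE), sep = "")
}

geocode.url <- buildGeocodeUrl("address", "Palo Alto,California")
reverse.url <- buildGeocodeUrl("latlng",
                               paste(c(37.4418834, -122.1430195), collapse = ","))
print(geocode.url)
print(reverse.url)
```

The resulting strings could then be passed to url() in place of connectStr.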
<b> Happy Coding!</b><br />
<br />
<br />
<br />Jitender Aswanihttp://www.blogger.com/profile/07256452105548911708noreply@blogger.com5tag:blogger.com,1999:blog-7133039340481686842.post-76586678677150605622012-01-30T11:28:00.001-08:002012-01-30T23:04:25.472-08:00Updated Sentiment Analysis and a Word Cloud for Netflix - The R Way!<br />
Netflix investors must be happy and cheerful, as the stock is up more than 78% since the beginning of the year (YES, 78%, <span style="font-size: xx-small;"><i>Source: Yahoo Finance!)</i></span>. I am not going to talk about what turned the stock around after the much-hyped Netflix debacle of late 2011, which earned Reed Hastings quite a few UNWANTED titles and had everyone demanding his resignation from the top post. Not so fast, Mr. Bear! Reed Hastings must be smiling! After a stellar performance this year, including carefully released stats on viewership and streaming hours as well as solid Q4'11 earnings, Netflix is back and, most importantly, viewers are back!<br />
<br />
Well, it is no coincidence that the sentiment for Netflix is also improving: 68% of the tweets that registered a positive or negative score (171 of 251) are now positive. See the table below:<br />
<br />
<br />
<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse; width: 582px;"><colgroup><col style="width: 75pt;" width="100"></col><col style="width: 82pt;" width="109"></col><col style="width: 76pt;" width="101"></col><col style="width: 79pt;" width="105"></col><col style="width: 68pt;" width="91"></col><col style="width: 57pt;" width="76"></col></colgroup><tbody>
<tr height="20" style="height: 15pt;"><td class="xl63" height="20" style="height: 15pt; width: 75pt;" width="100"><b><i>Total </i></b></td><td class="xl63" style="width: 82pt;" width="109"><b><i>Positive</i></b></td><td class="xl63" style="width: 76pt;" width="101"><b><i>Negative</i></b></td><td class="xl63" style="width: 79pt;" width="105"><b><i>Average</i></b></td><td class="xl63" style="width: 68pt;" width="91"><b><i>Total</i></b></td><td class="xl63" rowspan="2" style="width: 57pt;" width="76"><b><i>Sentiment</i></b></td></tr>
<tr height="34" style="height: 25.5pt;"><td class="xl63" height="34" style="height: 25.5pt; width: 75pt;" width="100"><b><i>Tweets<br /> Fetched</i></b></td><td class="xl63" style="width: 82pt;" width="109"><b><i>Tweets</i></b></td><td class="xl63" style="width: 76pt;" width="101"><b><i>Tweets</i></b></td><td class="xl63" style="width: 79pt;" width="105"><b><i>Score</i></b></td><td class="xl63" style="width: 68pt;" width="91"><b><i>Tweets</i></b></td></tr>
<tr height="20" style="height: 15pt;"><td class="xl64" height="20" style="height: 15pt; width: 75pt;" width="100"><i>499</i></td><td class="xl64" style="width: 82pt;" width="109"><i>171</i></td><td class="xl64" style="width: 76pt;" width="101"><i>80</i></td><td class="xl64" style="width: 79pt;" width="105"><i>0.281</i></td><td class="xl64" style="width: 68pt;" width="91"><i>251</i></td><td class="xl65" style="width: 57pt;" width="76"><i>68%</i></td></tr>
</tbody></table>
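The Sentiment figure in the table can be reproduced from the other columns. A short worked example with the numbers from the table (the variable names are chosen for this illustration):

```r
# Figures taken from the table above
total.fetched <- 499
positive <- 171
negative <- 80

# Only tweets with a non-zero score count towards the sentiment ratio
scored <- positive + negative
sentiment <- round(positive / scored, 2)

# Tweets that matched neither list, or whose matches cancelled out
neutral <- total.fetched - scored

print(c(Scored = scored, Neutral = neutral, Sentiment = sentiment))
```

So roughly half of the fetched tweets carry no net score, and the 68% applies only to the 251 that do.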
<br />
<br />
<br />
<span style="font-size: x-small;">*Make sure you understand and interpret this analysis correctly. This analysis is not based on NLP. </span><br />
<br />
I updated the sentiment analysis that I did last year, <a href="http://goo.gl/fkfPy">http://goo.gl/fkfPy</a> (I was then just beginning to play with the Twitter and text mining packages in R), this time using more advanced packages such as "tm" and "wordcloud". The new analysis is based on a lexicon of more than 6,800 opinion words that is commonly recommended in sentiment analysis blogs and books. (Check out Hu and Liu: <a href="http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html" style="font-family: arial;" target="_blank">http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html</a><span style="font-family: arial;">)</span><br />
<br />
I came across this excellent blog by Jeffrey Breen, @JeffreyBreen, (http://goo.gl/RPkFX) and his tutorial. Thank you, Mr. Breen! Please follow the instructions in Breen's slides and the R code listed there, along with the R code below.<br />
<br />
Here are the updated R code snippets:<br />
<span style="background-color: #fff2cc; font-family: arial; font-size: x-small;">#</span><span style="background-color: #fff2cc; font-family: arial; font-size: xx-small;">Populate the list of sentiment words from Hu and Liu (<a href="http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html" target="_blank">http://www.cs.uic.edu/~liub/<wbr></wbr>FBS/sentiment-analysis.html</a>)</span><br />
<br />
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">huliu.pwords <- scan('opinion-lexicon/positive-words.txt', what='character', comment.char=';')</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">huliu.nwords <- scan('opinion-lexicon/negative-words.txt', what='character', comment.char=';')</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"><br /></span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"># Add some words</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">huliu.nwords <- c(huliu.nwords,'wtf','wait','waiting','epicfail', 'crash', 'bug', 'buggy', 'bugs', 'slow', 'lie')</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">#Remove some words</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">huliu.nwords <- huliu.nwords[!huliu.nwords=='sap']</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">huliu.nwords <- huliu.nwords[!huliu.nwords=='cloud']</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">#which('sap' %in% huliu.nwords)</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"><br /></span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">twitterTag <- "@Netflix"</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"># Get 1500 tweets - an individual is only allowed to get 1500 tweets</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"> tweets <- searchTwitter(twitterTag, n=1500)</span></div>
<div style="font-family: arial;">
<div>
<span style="background-color: #fff2cc; font-size: xx-small;"> tweets.text <- laply(tweets,function(t)t$getText())</span></div>
<div>
<span style="background-color: #fff2cc; font-size: xx-small;"> sentimentScoreDF <- getSentimentScore(tweets.text)</span></div>
<div>
<span style="background-color: #fff2cc; font-size: xx-small;"> sentimentScoreDF$TwitterTag <- twitterTag</span></div>
</div>
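The getSentimentScore helper called above is not shown in this post. The following is only a guess at what such a helper might look like, written as a plain word-matching score (positive matches minus negative matches) in the spirit of the tutorial credited above. The signature, the default arguments and the Tweet column are assumptions, not the original implementation.

```r
# Hypothetical sketch of getSentimentScore; the defaults assume the
# huliu.pwords / huliu.nwords vectors built earlier in this post
getSentimentScore <- function(sentences, pos.words = huliu.pwords,
                              neg.words = huliu.nwords) {
  scores <- vapply(sentences, function(sentence) {
    # Strip punctuation, control characters and digits, then lower-case
    cleaned <- gsub("[[:punct:]]|[[:cntrl:]]|[[:digit:]]+", "", sentence)
    words <- unlist(strsplit(tolower(cleaned), "\\s+"))
    # Score = number of positive matches minus number of negative matches
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(SentimentScore = scores, Tweet = sentences, stringsAsFactors = FALSE)
}

# Toy usage with tiny word lists (tweets invented for illustration)
demo <- getSentimentScore(
  c("Love the new Netflix streaming, great picks!",
    "Netflix is slow and keeps crashing, wtf",
    "Watching a movie tonight"),
  pos.words = c("love", "great"),
  neg.words = c("slow", "wtf", "crash")
)
print(demo$SentimentScore)
```

Because matching is on whole words, "crashing" does not match "crash" in the second toy tweet. That is one reason this kind of scoring is only a rough indicator and not NLP.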
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"><br /></span></div>
<br />
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"># Flag positive and negative tweets; tweets with a zero score count as neither</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">sentimentScoreDF$posTweets <- as.numeric(sentimentScoreDF$SentimentScore >=1)</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">sentimentScoreDF$negTweets <- as.numeric(sentimentScoreDF$SentimentScore <=-1)</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"><br /></span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">#Summarize findings</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">summaryDF <- ddply(sentimentScoreDF,"TwitterTag", summarise, </span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"> TotalTweetsFetched=length(SentimentScore),</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"> PositiveTweets=sum(posTweets), NegativeTweets=sum(negTweets), </span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"> AverageScore=round(mean(SentimentScore),3))</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"><br /></span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">summaryDF$TotalTweets <- summaryDF$PositiveTweets + summaryDF$NegativeTweets</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;"><br /></span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">#Get Sentiment Score</span></div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: xx-small;">summaryDF$Sentiment <- round(summaryDF$<wbr></wbr>PositiveTweets/summaryDF$<wbr></wbr>TotalTweets, 2)</span></div>
<br />
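To try the summary step without the original tweets, here is a minimal sketch on a made-up data frame. It swaps plyr's ddply() for base R's aggregate() so that it runs without extra packages; the tags and scores are invented for illustration, and only the column names follow the snippet above.<br />

```r
# Toy input: one row per tweet. Tags and scores are made up for illustration.
sentimentScoreDF <- data.frame(
  TwitterTag     = c("#netflix", "#netflix", "#netflix", "#netflix", "#example", "#example"),
  SentimentScore = c(2, -1, 0, -3, 1, 0),
  stringsAsFactors = FALSE
)

# Flag positive and negative tweets; zero-score tweets get 0 in both columns.
sentimentScoreDF$posTweets <- as.numeric(sentimentScoreDF$SentimentScore >= 1)
sentimentScoreDF$negTweets <- as.numeric(sentimentScoreDF$SentimentScore <= -1)

# Per-tag counts, using base R aggregate() instead of plyr's ddply().
summaryDF <- aggregate(cbind(posTweets, negTweets) ~ TwitterTag,
                       data = sentimentScoreDF, FUN = sum)
names(summaryDF) <- c("TwitterTag", "PositiveTweets", "NegativeTweets")

# Only tweets with a non-zero score count towards the total.
summaryDF$TotalTweets <- summaryDF$PositiveTweets + summaryDF$NegativeTweets

# Share of scored tweets that are positive.
summaryDF$Sentiment <- round(summaryDF$PositiveTweets / summaryDF$TotalTweets, 2)
print(summaryDF)
```

Because zero-score tweets are left out of TotalTweets, Sentiment here is the positive share of scored tweets only, not of all tweets fetched.<br />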
<span style="background-color: #fff2cc; font-size: xx-small;"><br /></span><br />
<br />
Saving the best for last, here is a word cloud (also called a tag cloud) for Netflix, built in R:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitOOJ0GiG-aMdHyGvzV3Xf5qDbSAjLSRpG9rrmXxMyokQuC-2yAFlPIqbKh8nh_DO1XG50Ip9key8nXa8as3GFnAQj9cDeEJXrsPuqjoB5E0GQ8undSWsNtHEHAqm8n6yxaYffPBG8FgM/s1600/Netflix+Tag+Cloud.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="458" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitOOJ0GiG-aMdHyGvzV3Xf5qDbSAjLSRpG9rrmXxMyokQuC-2yAFlPIqbKh8nh_DO1XG50Ip9key8nXa8as3GFnAQj9cDeEJXrsPuqjoB5E0GQ8undSWsNtHEHAqm8n6yxaYffPBG8FgM/s640/Netflix+Tag+Cloud.png" width="640" /></a></div>
<br />
I will post the R code for building the word cloud here once I have cleaned it up.<br />
<br />
Happy Analyzing!<br />
<div>
<br /></div>Jitender Aswanihttp://www.blogger.com/profile/07256452105548911708noreply@blogger.com4tag:blogger.com,1999:blog-7133039340481686842.post-51724535822171805402012-01-30T11:28:00.000-08:002012-01-30T13:28:50.655-08:00Sentiment Analysis, the R way, on Netflix's September 18th Announcement<br />
Re-posting this entry from my other blog on analytics (<a href="http://allthingsbusinessanalytics.blogspot.com/">http://allthingsbusinessanalytics.blogspot.com/</a>)<br />
<br />
Did Netflix make a bad move or a bold move? Only time will tell, but for now here is a simple sentiment analysis of tweets mentioning Netflix, done in R with the twitteR package.<br />
<br />
<div>
<div>
In the aftermath of #netflix's supposedly bad strategic move, I thought it would be fun to do a little sentiment analysis on a sample of tweets from the past few days. I turned to my favorite, R, discovered a new package called twitteR, and four lines of code later I had the following outcome:</div>
<div>
<br /></div>
<div>
788 of the 1500 tweets, that is 52.5%, from the last three days contained the words bad, suck, terrible or disaster, or a :(, along with #netflix...</div>
<div>
<br /></div>
<div>
You be the judge of whether Netflix customers are unhappy and whether it was a bad (or bold) strategic move...</div>
</div>
<div>
<br /></div>
<div>
> library("twitteR")</div>
<div>
> searchNF <- searchTwitter("#netflix bad OR suck OR terrible OR disaster OR :(", n=1500, since=as.character(Sys.Date()-3))</div>
<div>
> negativeTweets <- length(searchNF)</div>
<div>
> negativeSentiment <- negativeTweets/1500</div>
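<div>
The search above needs live access to the Twitter API. As a rough offline illustration of the same counting idea, here is a minimal sketch that flags negative markers in a character vector with base R's grepl(). The sample tweets are invented, and dividing by the size of the sample (rather than by the requested 1500) is my assumption about a sensible denominator, not the method used above.</div>

```r
# Hypothetical sample of tweets; in the post these came from searchTwitter().
tweets <- c(
  "#netflix this price change is terrible",
  "Loving the new shows on #netflix",
  "#netflix what a disaster :(",
  "Watching a movie on #netflix tonight"
)

# Negative markers taken from the search query above.
# The "(" in ":(" has to be escaped inside a regular expression.
negativePattern <- "bad|suck|terrible|disaster|:\\("

# TRUE for every tweet that contains at least one marker.
# Note: this is plain substring matching, so "bad" also matches "badge".
isNegative <- grepl(negativePattern, tweets, ignore.case = TRUE)
negativeTweets <- sum(isNegative)

# Share of the sample that contains at least one negative marker.
negativeSentiment <- negativeTweets / length(tweets)
print(negativeSentiment)
```

<div>
Counting against a sample of all #netflix tweets, instead of only those that already match the negative query, makes the resulting share easier to interpret.</div>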
<div>
<br /></div>Jitender Aswanihttp://www.blogger.com/profile/07256452105548911708noreply@blogger.com1tag:blogger.com,1999:blog-7133039340481686842.post-24563775353730045632012-01-24T12:05:00.000-08:002012-01-30T23:04:00.957-08:00Geocode your data using R, JSON and Google Maps' Geocoding APIs<br />
<br />
<span style="font-family: Times, 'Times New Roman', serif;">Over the last year and a half, I have faced numerous challenges geocoding the data I use to showcase my passion for location analytics. In 2012, I decided to take matters into my own hands and turned to R. </span><span style="font-family: Times, 'Times New Roman', serif;">Here, I am sharing a simple R script that I wrote to geocode my data whenever I need to, even big data.</span><br />
<span style="font-family: Times, 'Times New Roman', serif;"><br /></span><br />
<span style="font-family: Times, 'Times New Roman', serif;">To geocode my data, I use Google's Geocoding service, which returns the geocoded data as JSON. I recommend that you register with the Google Maps API and get a key if you have a large amount of data and will be geocoding repeatedly.</span><br />
<br />
Here is a function that can be called repeatedly by other functions:<br />
<br />
<div style="text-align: left;">
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;">getGeoCode <- function(gcStr)</i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;">{</i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;"> library("RJSONIO") </i><i style="background-color: #fff2cc;">#Load Library</i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;"> gcStr <- gsub(' ','%20',gcStr) </i><i style="background-color: #fff2cc;">#Encode URL Parameters</i></span></div>
<div>
<i style="background-color: #fff2cc;"><span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"> #Open Connection</span></i></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;"> connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=',gcStr, sep="") </i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;"> </i><i style="background-color: #fff2cc;"> con <- url(connectStr)</i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;"> data.json <- fromJSON(paste(readLines(con), collapse=""))</i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;"> close(con)</i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;">#Flatten the received JSON</i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;"> data.json <- unlist(data.json)</i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;"> lat <- data.json["results.geometry.location.lat"]</i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;"> lng <- data.json["results.geometry.location.lng"]</i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;"> gcodes <- c(lat, lng)</i></span></div>
<div>
<i style="background-color: #fff2cc;"><span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"> names(gcodes) <- c("Lat", "Lng")</span></i></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;"> return (gcodes)</i></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif; font-size: x-small;"><i style="background-color: #fff2cc;">}</i></span></div>
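<div>
The function above only escapes spaces, and the current Geocoding API is served over HTTPS at maps.googleapis.com and expects an API key. Here is a minimal sketch of how the request URL and the result extraction could be handled under those assumptions. buildGeoCodeUrl and extractLatLng are hypothetical helper names, "YOUR_KEY" is a placeholder, and the parsed response is assumed to be a nested list of the shape produced by a JSON parser.</div>

```r
# Build a Geocoding API request URL. URLencode() with reserved = TRUE escapes
# spaces, commas and other reserved characters in the address.
# apiKey is a placeholder argument; supply your own key.
buildGeoCodeUrl <- function(gcStr, apiKey) {
  paste0("https://maps.googleapis.com/maps/api/geocode/json",
         "?address=", URLencode(gcStr, reserved = TRUE),
         "&key=", apiKey)
}

# Pull Lat/Lng out of an already-parsed response (a nested list).
# Returns NA for both values when the status is not "OK" or there are no results.
extractLatLng <- function(parsed) {
  if (!identical(parsed$status, "OK") || length(parsed$results) == 0) {
    return(c(Lat = NA_real_, Lng = NA_real_))
  }
  loc <- parsed$results[[1]]$geometry$location
  # [[ ]] works whether the parser returned a list or a named vector here.
  c(Lat = as.numeric(loc[["lat"]]), Lng = as.numeric(loc[["lng"]]))
}

# Hand-written stand-in for a parsed response, using the values shown in this post.
fakeResponse <- list(
  status  = "OK",
  results = list(list(geometry = list(
    location = list(lat = 37.4418834, lng = -122.1430195))))
)

print(buildGeoCodeUrl("Palo Alto,California", "YOUR_KEY"))
print(extractLatLng(fakeResponse))
```

<div>
Returning NA instead of failing keeps a long batch run going when a single address cannot be resolved.</div>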
<div style="text-align: -webkit-auto;">
<br /></div>
<div style="text-align: -webkit-auto;">
Let's put this function to the test:</div>
<div style="text-align: -webkit-auto;">
<span style="background-color: #fff2cc; font-size: x-small;"><i>geoCodes <- getGeoCode("Palo Alto,California")</i></span></div>
<div style="text-align: -webkit-auto;">
<span style="background-color: #fff2cc; font-size: x-small;"><i><br /></i></span></div>
<span style="background-color: #fff2cc; font-size: x-small;"><i>> geoCodes</i></span><br />
<span style="background-color: #fff2cc; font-size: x-small;"><i> Lat Lng </i></span><br />
<span style="background-color: #fff2cc; font-size: x-small;"><i> "37.4418834" "-122.1430195" </i></span><br />
<br /></div>
<div>
<div style="font-family: arial;">
<br /></div>
<span style="font-family: Times, 'Times New Roman', serif;">You can run this on the entire column of a data frame or a data table:</span></div>
<div style="font-size: small;">
<span style="font-family: Times, 'Times New Roman', serif;"><br /></span></div>
<div>
<div>
<span style="font-family: Times, 'Times New Roman', serif;">Here is my sample data frame with three columns: Opposition, Ground.Country and Toss. Two of the columns, you guessed it, need geocoding.</span></div>
<div style="font-size: small;">
<i><br /></i></div>
</div>
<div>
<div style="font-family: arial;">
<span style="background-color: #fff2cc; font-size: x-small;"><i>> head(shortDS,10)</i></span></div>
<div style="font-family: arial;">
<i style="background-color: white;"><span style="font-size: x-small;"> </span><span style="font-size: xx-small;"> Opposition Ground.Country Toss</span></i></div>
<div style="font-family: arial;">
<span style="font-size: xx-small;"><i style="background-color: white;">1 Pakistan Karachi,Pakistan won</i></span></div>
<div style="font-family: arial;">
<span style="font-size: xx-small;"><i style="background-color: white;">2 Pakistan Faisalabad,Pakistan lost</i></span></div>
<div style="font-family: arial;">
<span style="font-size: xx-small;"><i style="background-color: white;">3 Pakistan Lahore,Pakistan won</i></span></div>
<div style="font-family: arial;">
<span style="font-size: xx-small;"><i style="background-color: white;">4 Pakistan Sialkot,Pakistan lost</i></span></div>
<div style="font-family: arial;">
<span style="font-size: xx-small;"><i style="background-color: white;">5 New Zealand Christchurch,New Zealand lost</i></span></div>
<div style="font-family: arial;">
<span style="font-size: xx-small;"><i style="background-color: white;">6 New Zealand Napier,New Zealand won</i></span></div>
<div style="font-family: arial;">
<span style="font-size: xx-small;"><i style="background-color: white;">7 New Zealand Auckland,New Zealand won</i></span></div>
<div style="font-family: arial;">
<span style="font-size: xx-small;"><i style="background-color: white;">8 England Lord's,England won</i></span></div>
<div style="font-family: arial;">
<span style="font-size: xx-small;"><i style="background-color: white;">9 England Manchester,England lost</i></span></div>
<div style="font-family: arial;">
<span style="font-size: xx-small;"><i style="background-color: white;">10 England The Oval,England won</i></span></div>
<div style="font-family: arial; font-size: small;">
<br /></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif;">To geocode this, here is the simple one-liner I run (laply comes from the plyr package):</span><br />
<br />
<div>
<b><i style="background-color: #fff2cc;">shortDS <- with(shortDS, data.frame(Opposition, Ground.Country, Toss,</i></b></div>
<div>
<div style="font-family: arial; font-size: small;">
<b><i style="background-color: #fff2cc;"> laply(Ground.Country, function(val){getGeoCode(val)}<wbr></wbr>)))</i></b></div>
<div style="font-family: arial; font-size: small;">
<b style="background-color: #fff2cc;"><br /></b></div>
<span style="font-family: arial; font-size: x-small;"><i style="background-color: #fff2cc;">> head(shortDS, 10)</i></span><br />
<span style="font-family: arial; font-size: xx-small;"><i style="background-color: white;"> Opposition Ground.Country Toss Ground.Lat Ground.Lng</i></span><br />
<span style="font-family: arial; font-size: xx-small;"><i style="background-color: white;">1 Pakistan Karachi,Pakistan won 24.893379 67.028061</i></span><br />
<span style="font-family: arial; font-size: xx-small;"><i style="background-color: white;">2 Pakistan Faisalabad,Pakistan lost 31.408951 73.083458</i></span><br />
<span style="font-family: arial; font-size: xx-small;"><i style="background-color: white;">3 Pakistan Lahore,Pakistan won 31.54505 74.340683</i></span><br />
<span style="font-family: arial; font-size: xx-small;"><i style="background-color: white;">4 Pakistan Sialkot,Pakistan lost 32.4972222 74.5361111</i></span><br />
<span style="font-family: arial; font-size: xx-small;"><i style="background-color: white;">5 New Zealand Christchurch,New Zealand lost -43.5320544 172.6362254</i></span><br />
<span style="font-family: arial; font-size: xx-small;"><i style="background-color: white;">6 New Zealand Napier,New Zealand won -39.4928444 176.9120178</i></span><br />
<span style="font-family: arial; font-size: xx-small;"><i style="background-color: white;">7 New Zealand Auckland,New Zealand won -36.8484597 174.7633315</i></span><br />
<span style="font-family: arial; font-size: xx-small;"><i style="background-color: white;">8 England Lord's,England won 51.5294 -0.1727</i></span><br />
<span style="font-family: arial; font-size: xx-small;"><i style="background-color: white;">9 England Manchester,England lost 53.479251 -2.247926</i></span><br />
<span style="font-family: arial; font-size: xx-small;"><i style="background-color: white;">10 England The Oval,England won 51.369037 -2.378269</i></span><br />
<span style="background-color: white;"><br /></span><br />
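<div>
When the same ground appears in many rows, it can help to geocode each distinct value once and map the results back, which keeps the number of API calls down. Here is a minimal sketch of that idea in base R. stubGeoCode is a hypothetical offline stand-in for getGeoCode(), and the third row of the toy data frame is a made-up repeat added to show the de-duplication.</div>

```r
# Stub geocoder so the sketch runs offline; swap in getGeoCode() for real use.
# The coordinates are the ones shown in the output above.
stubGeoCode <- function(place) {
  known <- list(
    "Karachi,Pakistan" = c(Lat = 24.893379, Lng = 67.028061),
    "Lahore,Pakistan"  = c(Lat = 31.54505,  Lng = 74.340683)
  )
  if (place %in% names(known)) known[[place]] else c(Lat = NA_real_, Lng = NA_real_)
}

# Toy data frame; the last row is an invented repeat of the first ground.
shortDS <- data.frame(
  Opposition     = c("Pakistan", "Pakistan", "Pakistan"),
  Ground.Country = c("Karachi,Pakistan", "Lahore,Pakistan", "Karachi,Pakistan"),
  Toss           = c("won", "won", "lost"),
  stringsAsFactors = FALSE
)

# Geocode each distinct place once: vapply() gives a 2 x n matrix,
# t() turns it into one row per place with columns Lat and Lng.
places <- unique(shortDS$Ground.Country)
codes  <- t(vapply(places, stubGeoCode, numeric(2)))

# Look the coordinates up for every row of the data frame.
idx <- match(shortDS$Ground.Country, places)
shortDS$Ground.Lat <- unname(codes[idx, "Lat"])
shortDS$Ground.Lng <- unname(codes[idx, "Lng"])
print(shortDS)
```

<div>
Here the geocoder is called twice for three rows; on a full data set with a handful of grounds, the saving is much larger.</div>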
<div style="font-family: arial; font-size: small;">
<i style="background-color: white;"><br /></i></div>
<span style="background-color: white; font-family: Times, 'Times New Roman', serif;">Happy Coding!</span></div>
</div>
</div>
<br />Jitender Aswanihttp://www.blogger.com/profile/07256452105548911708noreply@blogger.com15