All Things R: Updated Sentiment Analysis and a Word Cloud for Netflix

Netflix investors must be happy and cheerful: the stock is up more than 78% since the beginning of the year (YES, 78%; source: Yahoo Finance!). I am not going to talk about what turned the stock around after the much-hyped Netflix debacle of late 2011, which earned Reed Hastings quite a few UNWANTED titles and had everyone demanding his resignation from the top post. Not so fast, Mr. Bear! Reed Hastings must be smiling! After a stellar performance this year, including carefully released stats on viewership and streaming hours as well as solid Q4'11 earnings, Netflix is back and, most importantly, viewers are back!

Well, it is no coincidence that sentiment for Netflix is also improving: 68% of the tweets that scored either positive or negative are now positive. See the table below:

Total Tweets Fetched	Positive Tweets	Negative Tweets	Average Score	Total Tweets	Sentiment
499	171	80	0.281	251	68%

*Make sure you understand and interpret this analysis correctly. It is not based on NLP; it simply counts the words in each tweet that match lists of positive and negative terms.
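To make that caveat concrete, here is a minimal sketch in base R of what word-list scoring does. The tiny word lists (`toy.pwords`, `toy.nwords`) and the helper `scoreText` are made up for illustration and are not part of the original analysis; the real run uses the full lexicon.

```r
# Toy word lists for illustration only; the real analysis uses a much larger lexicon
toy.pwords <- c("love", "great", "awesome")
toy.nwords <- c("slow", "crash", "hate")

# Hypothetical helper: positive-word count minus negative-word count for one text
scoreText <- function(text, pwords, nwords) {
  cleaned <- gsub("[[:punct:]]", "", tolower(text))  # lower-case and drop punctuation
  words <- unlist(strsplit(cleaned, "\\s+"))         # split on whitespace
  sum(words %in% pwords) - sum(words %in% nwords)
}

scoreText("I love Netflix, great selection!", toy.pwords, toy.nwords)  # 2
scoreText("Netflix is so slow tonight", toy.pwords, toy.nwords)        # -1
scoreText("Not great at all", toy.pwords, toy.nwords)                  # 1
```

The last call is the reason for the warning: "Not great" still scores +1, because negation and context are ignored.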

I updated the sentiment analysis that I did last year, http://goo.gl/fkfPy (I was then just beginning to play with the Twitter and text mining packages in R), and this time used more advanced packages such as "tm" and "wordcloud". The new analysis is based on a lexicon of more than 6,800 words that is commonly recommended in sentiment analysis blogs and books. (Check out Hu and Liu: http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html)
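The Hu and Liu word lists are plain-text files whose header lines start with a semicolon, which is why the code further down reads them with `comment.char=';'`. The sketch below shows that behaviour on a small stand-in file written to a temporary location; the file contents and the `demo.nwords` name are assumptions for illustration, not the real lexicon.

```r
# Write a small stand-in lexicon file; the ';' header lines mimic the real files
lexicon.file <- tempfile(fileext = ".txt")
writeLines(c("; Stand-in lexicon: header lines start with a semicolon",
             ";",
             "",
             "crash",
             "slow",
             "sap"), lexicon.file)

# comment.char=';' drops the header lines; blank lines are skipped by default
demo.nwords <- scan(lexicon.file, what = 'character', comment.char = ';', quiet = TRUE)

# Extend and prune the list, as is done for the real lexicon
demo.nwords <- c(demo.nwords, 'epicfail')
demo.nwords <- demo.nwords[demo.nwords != 'sap']

print(demo.nwords)   # "crash" "slow" "epicfail"
```

Pruning matters for brand analysis: a word like "sap" is negative in everyday English but is also a company name, so it can skew scores for technology topics.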

I came across an excellent blog and tutorial by Jeffrey Bean, @JeffreyBean (http://goo.gl/RPkFX). Thank you, Mr. Bean! Please follow the instructions in Bean's slides and the R code listed there, together with the R code below.

Here are the updated R code snippets:

# Packages used below: twitteR (searchTwitter), plyr (laply, ddply), stringr (str_split)
library(twitteR)
library(plyr)
library(stringr)

# Populate the lists of sentiment words from Hu and Liu (http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html)
huliu.pwords <- scan('opinion-lexicon/positive-words.txt', what='character', comment.char=';')
huliu.nwords <- scan('opinion-lexicon/negative-words.txt', what='character', comment.char=';')

# Add some words
huliu.nwords <- c(huliu.nwords, 'wtf', 'wait', 'waiting', 'epicfail', 'crash', 'bug', 'bugy', 'bugs', 'slow', 'lie')

# Remove some words
huliu.nwords <- huliu.nwords[huliu.nwords != 'sap']
huliu.nwords <- huliu.nwords[huliu.nwords != 'cloud']

# which('sap' %in% huliu.nwords)

twitterTag <- "@Netflix"

# Get up to 1500 tweets - an individual is only allowed to get 1500 tweets
tweets <- searchTwitter(twitterTag, n=1500)
tweets.text <- laply(tweets, function(t) t$getText())

# Score each tweet (getSentimentScore is listed in the comments below)
sentimentScoreDF <- getSentimentScore(tweets.text)
sentimentScoreDF$TwitterTag <- twitterTag

# Flag positive and negative tweets; tweets with a zero score count as neither
sentimentScoreDF$posTweets <- as.numeric(sentimentScoreDF$SentimentScore >= 1)
sentimentScoreDF$negTweets <- as.numeric(sentimentScoreDF$SentimentScore <= -1)

# Summarize findings
summaryDF <- ddply(sentimentScoreDF, "TwitterTag", summarise,
                   TotalTweetsFetched = length(SentimentScore),
                   PositiveTweets = sum(posTweets),
                   NegativeTweets = sum(negTweets),
                   AverageScore = round(mean(SentimentScore), 3))

# Tweets with a nonzero score (positive + negative)
summaryDF$TotalTweets <- summaryDF$PositiveTweets + summaryDF$NegativeTweets

# Get the sentiment score: share of scored tweets that are positive
summaryDF$Sentiment <- round(summaryDF$PositiveTweets / summaryDF$TotalTweets, 2)
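The summary arithmetic can be checked by hand. The sketch below redoes it in base R on a made-up vector of eight scores (`toy.scores` is an illustration, not real data), then applies the same ratio to the figures from the table.

```r
# Made-up scores for eight tweets, for illustration only
toy.scores <- c(2, 0, -1, 1, 0, 3, -2, 0)

toy.pos <- sum(toy.scores >= 1)      # tweets flagged positive
toy.neg <- sum(toy.scores <= -1)     # tweets flagged negative
toy.total <- toy.pos + toy.neg       # zero-score tweets are left out
toy.sentiment <- round(toy.pos / toy.total, 2)
toy.average <- round(mean(toy.scores), 3)   # the average still uses all eight tweets

# The same ratio with the figures from the table: 171 positive, 80 negative
table.sentiment <- round(171 / (171 + 80), 2)

print(c(pos = toy.pos, neg = toy.neg, sentiment = toy.sentiment,
        average = toy.average, table = table.sentiment))
```

With the table's figures, 171 / 251 rounds to 0.68, the 68% reported above. The 248 fetched tweets that scored zero affect the average score but not the sentiment ratio.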

Saving the best for last, here is a word cloud (also called a tag cloud) for Netflix, built in R:

I will post the R code for building the word cloud here once I have cleaned it up.
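Until then, here is a rough sketch of how such a cloud can be built. This is not the code behind the cloud above: the sample tweets and the tiny stop-word list are made up, the word counts are done in base R, and the drawing step assumes the "wordcloud" package is installed (it is skipped otherwise).

```r
# Made-up tweets for illustration; the real input would be the fetched tweet text
sample.tweets <- c("Loving the new Netflix streaming lineup",
                   "Netflix streaming keeps buffering tonight",
                   "Great movie night with Netflix")

# Lower-case, strip punctuation and split into words
cleaned <- gsub("[[:punct:]]", "", tolower(sample.tweets))
words <- unlist(strsplit(cleaned, "\\s+"))

# Drop a few common words (a tiny stand-in for a proper stop-word list)
words <- words[!words %in% c("the", "with", "new")]

# Word frequencies, most frequent first
word.freq <- sort(table(words), decreasing = TRUE)

# Draw the cloud only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(word.freq), as.numeric(word.freq),
                       min.freq = 1, random.order = FALSE)
}
```

Setting `random.order = FALSE` places the most frequent words near the centre, which usually makes the dominant terms easier to spot.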

Happy Analyzing!

4 comments:

Powell, April 12, 2012 at 7:53 AM
Hi,

May I ask how to get this function "getSentimentScore"?

Thanks,
Powell
Jitender Aswani, April 12, 2012 at 11:44 PM
Here you go Powell.

getSentimentScore <- function(tweets)
{
  # Requires plyr (laply) and stringr (str_split), plus the huliu.pwords and huliu.nwords lists
  scores <- laply(tweets, function(singleTweet) {
    # clean up tweets with R's regex-driven global substitute, gsub()
    singleTweet <- gsub('[[:punct:]]', '', singleTweet)
    singleTweet <- gsub('[[:cntrl:]]', '', singleTweet)
    singleTweet <- gsub('\\d+', '', singleTweet)
    # convert to lower case for comparison, split the tweet into single words and flatten the list
    tweetWords <- unlist(str_split(tolower(singleTweet), '\\s+'))
    # compare our words to the dictionaries of positive & negative terms
    # match() returns the position of the matched term or NA; !is.na() converts that to boolean
    pos.matches <- !is.na(match(tweetWords, huliu.pwords))
    neg.matches <- !is.na(match(tweetWords, huliu.nwords))
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score <- sum(pos.matches) - sum(neg.matches)
    return(score)
  })
  return(data.frame(SentimentScore=scores, Tweet=tweets))
}
Unknown, December 4, 2014 at 6:58 AM
Thanks for sharing this topic, Jitender. Nice work. My graduation paper was along the same lines.
I did some feature extraction and product sentiment analysis. I didn't get to the summarization part, though.

Monday, January 30, 2012
